Building GPT Applications on Open Source LangChain, Part 2
We’ll use the fast-rising LLM application framework for a practical example of how to use a GPT to help answer a question from a PDF document.
Jun 16th, 2023 8:15am by
SingleStore sponsored this post. Insight Partners is an investor in SingleStore and TNS.
Create a SingleStoreDB Cloud Account
First, sign up for a free SingleStoreDB Cloud account. Once logged in, select CLOUD > Create new workspace group from the left-hand navigation pane. Next, choose Create Workspace and work through the wizard. Here are the recommended settings for this example:
Create Workspace Group
Workspace Group Name: LangChain Demo Group
Cloud Provider: AWS
Region: US East 1 (N. Virginia)
Click Next.
Create Workspace
Workspace Name: langchain-demo
Size: S-00
Click Create Workspace.
Once the workspace is created and available, from the left-hand navigation pane, select DEVELOP > SQL Editor to create a new database, as follows:
CREATE DATABASE IF NOT EXISTS pdf_db;
Create a Notebook
From the left-hand navigation pane, select DEVELOP > Notebooks. In the top right of the web page, select New Notebook > New Notebook, as shown in Figure 1 below.
We’ll call the notebook langchain_demo. Select a Blank notebook template from the available options.
We’ll also select the Connection and Database using the drop-down menus above the notebook, as shown in Figure 2.

Figure 2. Connection and Database
Fill out the Notebook
First, we’ll install some libraries:
!pip install langchain --quiet
!pip install openai --quiet
!pip install pdf2image --quiet
!pip install tabulate --quiet
!pip install tiktoken --quiet
!pip install unstructured --quiet
from langchain.document_loaders import OnlinePDFLoader
loader = OnlinePDFLoader("http://leavcom.com/pdf/DBpdf.pdf")
data = loader.load()
from langchain.text_splitter import RecursiveCharacterTextSplitter
print (f"You have {len(data)} document(s) in your data")
print (f"There are {len(data[0].page_content)} characters in your document")
You have 1 document(s) in your data
There are 13040 characters in your document
text_splitter = RecursiveCharacterTextSplitter(chunk_size = 2000, chunk_overlap = 0)
texts = text_splitter.split_documents(data)
print (f"You have {len(texts)} chunks")
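To make the chunk_size and chunk_overlap parameters concrete, here is a minimal pure-Python sketch of fixed-width chunking. It is an illustration only: split_fixed is a hypothetical helper, not part of LangChain, and RecursiveCharacterTextSplitter itself prefers to break on separators such as paragraph and line boundaries, so the number of chunks it returns can differ from this sketch.

```python
def split_fixed(text, chunk_size=2000, chunk_overlap=0):
    """Hypothetical helper: cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "x" * 13040  # stand-in text with the same length as the loaded document
chunks = split_fixed(sample)
print(len(chunks), len(chunks[-1]))
```

With chunk_overlap set to 0, as in this article, the chunks tile the text without repetition, so joining them reproduces the input.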
%%sql
USE pdf_db;
DROP TABLE IF EXISTS pdf_docs;
CREATE TABLE IF NOT EXISTS pdf_docs (
    id INT PRIMARY KEY,
    text TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
    embedding BLOB
);
from sqlalchemy import create_engine
# connection_url is assumed to be supplied by the SingleStoreDB notebook environment
db_connection = create_engine(connection_url)
import openai
openai.api_key = "OpenAI API Key"
from langchain.embeddings import OpenAIEmbeddings
embedder = OpenAIEmbeddings(openai_api_key = openai.api_key)
db_connection.execute("TRUNCATE TABLE pdf_docs")
for i, document in enumerate(texts):
    text_content = document.page_content
    embedding = embedder.embed_documents([text_content])[0]
    stmt = """
        INSERT INTO pdf_docs (
            id,
            text,
            embedding
        )
        VALUES (
            %s,
            %s,
            JSON_ARRAY_PACK_F32(%s)
        )
    """
    db_connection.execute(stmt, (i+1, text_content, str(embedding)))
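The INSERT above passes the embedding as JSON array text and lets JSON_ARRAY_PACK_F32 convert it into a compact binary value for the BLOB column. The sketch below imitates that conversion locally so the storage format is easier to picture. The names pack_f32 and unpack_f32 are hypothetical, and the little-endian byte order is an assumption of this sketch rather than something the article states.

```python
import json
import struct


def pack_f32(json_array_text):
    """Hypothetical stand-in: JSON array text -> bytes of 32-bit floats.

    Assumes little-endian byte order.
    """
    values = json.loads(json_array_text)
    return struct.pack(f"<{len(values)}f", *values)


def unpack_f32(blob):
    """Inverse of pack_f32: bytes -> list of Python floats."""
    count = len(blob) // 4  # each 32-bit float occupies 4 bytes
    return list(struct.unpack(f"<{count}f", blob))


toy_embedding = [0.25, -0.5, 1.0]        # toy vector; real embeddings are much longer
blob = pack_f32(str(toy_embedding))      # str() of a list of floats is valid JSON array text
print(len(blob))
print(unpack_f32(blob))
```

Each value takes 4 bytes, so the blob size grows linearly with the embedding length. Values that are not exactly representable in 32 bits lose a little precision on the round trip.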
query_text = "Will object-oriented databases be commercially successful?"
query_embedding = embedder.embed_documents([query_text])[0]
stmt = """
SELECT
text,
DOT_PRODUCT_F32(JSON_ARRAY_PACK_F32(%s), embedding) AS score
FROM pdf_docs
ORDER BY score DESC
LIMIT 1
"""
results = db_connection.execute(stmt, str(query_embedding))
for row in results:
    print(row[0])
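The query above scores every stored chunk with DOT_PRODUCT_F32 and keeps the single highest-scoring row. The following sketch reproduces that ranking in plain Python on made-up two-dimensional vectors. The helper names and sample rows are hypothetical and exist only for illustration.

```python
def dot_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def top_match(query_vec, rows):
    """Return the (text, score) pair with the highest dot product.

    rows is an iterable of (text, embedding) pairs, mirroring
    ORDER BY score DESC LIMIT 1 in the SQL query.
    """
    scored = [(text, dot_product(query_vec, emb)) for text, emb in rows]
    return max(scored, key=lambda pair: pair[1])


# Made-up unit-length vectors standing in for stored chunk embeddings.
rows = [
    ("chunk about relational databases", [1.0, 0.0]),
    ("chunk about object-oriented databases", [0.6, 0.8]),
    ("chunk about networking", [0.0, 1.0]),
]
query_vec = [0.6, 0.8]
best_text, best_score = top_match(query_vec, rows)
print(best_text, round(best_score, 3))
```

When all vectors have unit length, as in this toy data, ranking by dot product gives the same order as ranking by cosine similarity.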
prompt = f"The user asked: {query_text}. The most similar text from the document is: {row[0]}"
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ]
)
print(response['choices'][0]['message']['content'])
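The prompt assembly can also be factored into a small function, which makes it easy to inspect the exact messages before they are sent. This is a sketch only: build_messages is a hypothetical helper, and the empty-context check is an added assumption for the case where the similarity query returns no rows.

```python
def build_messages(query_text, context):
    """Hypothetical helper: build the chat messages used in this article.

    Raises ValueError when no context was retrieved, instead of sending
    a prompt with nothing to ground the answer.
    """
    if not context:
        raise ValueError("no matching text was retrieved from the database")
    prompt = (
        f"The user asked: {query_text}. "
        f"The most similar text from the document is: {context}"
    )
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ]


messages = build_messages(
    "Will object-oriented databases be commercially successful?",
    "placeholder text standing in for the retrieved chunk",
)
for message in messages:
    print(message["role"], "->", message["content"][:60])
```

The returned list has the same shape as the messages argument in the call above, so it could be passed there directly.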
Summary
In this example, we saw the benefits of LangChain in the application development process. We also saw how easily we can convert documents from one format to another, store the content in a database system, generate vector embeddings and ask questions about the data stored in the database system. We also have the full power of SQL available if we are interested in performing additional query operations on the data. I will host a workshop on June 22 and will go through building a ChatGPT application using LangChain. I hope you can join. Sign up here.