Try with your Data#

Now it’s your turn to apply your data and specific domain knowledge.

You can use this notebook as a starting point and adapt it to your needs. You will need to develop the pre-processing stage for a RAG system. This includes document retrieval, cleaning, chunking, and ingestion into the vector database using an embedding model.

To help you, we’ve provided a few example code snippets in Jupyter notebooks found in the appendix.

Utility Functions#

A section for whatever utility functions you need. We have packaged up our utility functions in a Python package called ssec_tutorials. You can find the source code in this GitHub repository.

# Write your code here for whatever utility functions you need. This can be anything such as
# cleaning up document format, setting up prompt templates, etc.

# Uncomment the following for a simple document formatting function
# def format_docs(docs):
#     return "\n\n".join(doc.page_content for doc in docs)
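As one illustration of a clean-up utility, the sketch below normalizes text extracted from a PDF page. The function name `clean_text` and its rules are assumptions chosen for illustration; they are not part of ssec_tutorials, so adapt them to your own documents.

```python
import re


def clean_text(text: str) -> str:
    """Normalize raw text extracted from a PDF page (illustrative rules only)."""
    # Rejoin words split by a hyphen at a line break: "multi-\nplication" -> "multiplication".
    # Caveat: this also merges genuinely hyphenated words that happen to break at a line end.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Turn single line breaks inside a paragraph into spaces; blank lines are kept.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs into one space.
    text = re.sub(r"[ \t]+", " ", text)
    # Limit consecutive blank lines to a single paragraph break.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

You could apply it to each loaded document before chunking, for example `doc.page_content = clean_text(doc.page_content)`.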

Retrieve documents#

A section for document retrieval. This simply means getting your documents from whatever source you have, whether on your local computer or on the internet. See the Document Loaders integration list from LangChain for an extensive list of what’s possible.

For the purpose of this tutorial, we recommend a simple example: loading text from a file such as a PDF. If you have a large piece of text, you can split it into smaller chunks using LangChain’s RecursiveCharacterTextSplitter.

If you don’t have any data with you, you can try this out with the Algorithms textbook by Jeff Erickson. The textbook has been generously made available by Jeff Erickson under the Creative Commons Attribution 4.0 International license. You can find more information about it at https://jeffe.cs.illinois.edu/teaching/algorithms/.

Note

If you’re running things in a Codespace, refer to this link and upload your data to the resources/ folder.

# Write your code here for your retrieval step,
# see the documentation on PyMuPDF for more information:
# https://python.langchain.com/v0.2/docs/how_to/document_loader_pdf/#using-pymupdf

# Uncomment below for code to download the textbook
# import os
# from urllib.request import urlretrieve
# url = "http://jeffe.cs.illinois.edu/teaching/algorithms/book/Algorithms-JeffE.pdf"
# filename = os.path.basename(url)

# if not os.path.exists(filename):
#     # Download if file doesn't exist
#     pdf_path, headers = urlretrieve(url, filename)
# Write your code here to load the PDF document as LangChain Document objects
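A minimal sketch of the loading and chunking step, assuming the `langchain-community`, `langchain-text-splitters`, and `pymupdf` packages are installed. The function name `load_and_split_pdf` and the default chunk sizes are assumptions for illustration, not recommendations from the tutorial.

```python
def load_and_split_pdf(pdf_path, chunk_size=1000, chunk_overlap=100):
    """Load a PDF into LangChain Document objects and split them into chunks."""
    # Imported inside the function so this cell can be defined even before
    # the packages listed above are installed.
    from langchain_community.document_loaders import PyMuPDFLoader
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    # PyMuPDFLoader returns one Document per page, with page metadata attached.
    pages = PyMuPDFLoader(str(pdf_path)).load()

    # chunk_size and chunk_overlap are measured in characters by default.
    # The values used here are starting points to experiment with.
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )
    return splitter.split_documents(pages)
```

With the textbook downloaded, a call such as `docs = load_and_split_pdf("Algorithms-JeffE.pdf")` should give you a list of chunked Document objects ready for embedding.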

Document Embeddings to Qdrant Vector Database#

Once you’ve figured out how to retrieve your documents and load them into LangChain Document objects, you can proceed to loading them into a Qdrant vector database collection.

See the following documentation for some guidance on the LangChain Qdrant integration.

from langchain_huggingface import HuggingFaceEmbeddings
# Set up the embedding; we are using the MiniLM model here
embedding = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L12-v2")

Setup Vector DB#

# Write your code here to load your data into the database

# Uncomment below to set the Qdrant path and collection name
# for "local mode" on-disk storage
# See https://python.langchain.com/v0.2/docs/integrations/vectorstores/qdrant/#on-disk-storage
# qdrant_path = "./my_qdrant_database"
# qdrant_collection = "algorithms_book"
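The sketch below shows one way to ingest your chunks, assuming the `langchain-qdrant` package is installed and that you already have a list of Document objects and the `embedding` object defined earlier. The wrapper name `build_qdrant_collection` is hypothetical; the path and collection name defaults mirror the commented values above.

```python
def build_qdrant_collection(
    docs,
    embedding,
    path="./my_qdrant_database",
    collection_name="algorithms_book",
):
    """Embed the documents and store them in an on-disk ("local mode") Qdrant collection."""
    # Imported inside the function so this cell can be defined before
    # langchain-qdrant is installed.
    from langchain_qdrant import Qdrant

    # from_documents embeds every document with `embedding` and writes the
    # vectors and their payloads into the named collection under `path`.
    return Qdrant.from_documents(
        docs,
        embedding,
        path=path,
        collection_name=collection_name,
    )
```

A call such as `qdrant = build_qdrant_collection(docs, embedding)` would then give you the Qdrant object used in the next section. In local mode the storage folder can only be opened by one client at a time, so close or restart other notebooks that use the same path.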

Test out the Qdrant collection#

At this step, you should have a Qdrant object (langchain_qdrant.vectorstores.Qdrant) with your documents loaded into a collection. You can test the collection by querying for documents and checking whether the results are as expected.

To do this, you’ll need to create a VectorStoreRetriever.

Note

A sample question to ask about the document is "What is the most familiar method for multiplying large numbers?". An answer to this question can be found on page 3, section 0.2 Multiplication, Lattice Multiplication.

Tip

You’ll probably need to tweak the arguments for creating a VectorStoreRetriever object to find the best search type and to limit the number of documents returned. This part takes some trial and error, so don’t be afraid to experiment. Getting the right documents for the question is a critical part of a RAG system, because they are what the LLM uses to generate the answer.

# Write your code here to try out the vector database retrieval with a question query
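A possible starting point for this step, written against any LangChain vector store such as the Qdrant object. The helper names `make_retriever` and `preview_results` are hypothetical, the default `k=3` is an arbitrary choice, and the `page` metadata key is an assumption that depends on the loader you used.

```python
def make_retriever(vectorstore, k=3, search_type="similarity"):
    """Create a retriever from a LangChain vector store."""
    # search_type can be "similarity" or "mmr" (maximal marginal relevance);
    # search_kwargs={"k": k} limits how many documents come back.
    return vectorstore.as_retriever(
        search_type=search_type, search_kwargs={"k": k}
    )


def preview_results(retriever, question, width=200):
    """Run a query and print a short preview of each retrieved document."""
    docs = retriever.invoke(question)
    for i, doc in enumerate(docs, start=1):
        # Fall back to "?" when the loader did not record a page number.
        page = doc.metadata.get("page", "?")
        print(f"[{i}] page {page}: {doc.page_content[:width]!r}")
    return docs
```

For example, `preview_results(make_retriever(qdrant, k=2, search_type="mmr"), question)` lets you compare search types and values of `k` before wiring the retriever into the full chain.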

Setup OLMo Model#

At this stage we have the Retrieval-Augmented (RA) part of the RAG system. Let’s now set up the Generation (G) part with the OLMo model.

from ssec_tutorials import download_olmo_model

# This will download the OLMo model to the cache directory
OLMO_MODEL = download_olmo_model()
# Uncomment this line to understand your available options for LlamaCpp Class
# LlamaCpp?
from langchain_community.llms import LlamaCpp
from langchain_core.callbacks import StreamingStdOutCallbackHandler

# Here we've set up the LlamaCpp model,
# but you'll need to add additional arguments to `LlamaCpp`
# to make it work for your specific use case
olmo = LlamaCpp(
    model_path=str(OLMO_MODEL),
    callbacks=[StreamingStdOutCallbackHandler()],
    verbose=False,
)
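As a hedged example of the additional arguments mentioned in the comment above, the sketch below collects a few `LlamaCpp` options that matter for RAG: the context window has to hold the retrieved chunks plus the generated answer. The helper name `olmo_kwargs` and the specific values are assumptions; check them against your model's limits and your hardware.

```python
def olmo_kwargs(model_path, n_ctx=2048, max_tokens=512, temperature=0.8):
    """Collect keyword arguments for LlamaCpp (values are illustrative)."""
    return {
        "model_path": str(model_path),
        # Size of the context window in tokens: prompt plus generated answer.
        "n_ctx": n_ctx,
        # Upper bound on the number of tokens generated per call.
        "max_tokens": max_tokens,
        # Sampling temperature; lower values make answers more deterministic.
        "temperature": temperature,
        "verbose": False,
    }


# Usage inside the notebook, where LlamaCpp, OLMO_MODEL, and the callback
# handler are already imported or defined:
# olmo = LlamaCpp(
#     **olmo_kwargs(OLMO_MODEL),
#     callbacks=[StreamingStdOutCallbackHandler()],
# )
```

If the prompt is longer than `n_ctx` allows, reduce the number of retrieved documents or the chunk size rather than only raising `max_tokens`.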

Tip

Try asking OLMo some questions about the content of the documents you’ve loaded into the Qdrant collection. Because the OLMo model was not trained on your specific domain, it might not give you the best results.

_ = olmo.invoke(input="What is the most familiar method for multiplying large numbers?")

Prompt Engineering#

Rather than just a simple question, we’ll need to refine the prompt to include an instruction and context for the model to use when generating the answer. To do this, we’ll need to set up a proper string PromptTemplate.

from langchain_core.prompts import PromptTemplate

# Create the initial prompt template using OLMo's tokenizer chat template we saw in module 1.
prompt_template = PromptTemplate.from_template(
    template=olmo.client.metadata["tokenizer.chat_template"],
    template_format="jinja2",
    partial_variables={"add_generation_prompt": True, "eos_token": "<|endoftext|>"},
)

Set the question for the prompt.

question = "What is the most familiar method for multiplying large numbers?"

Set the context for the prompt. This is where you’ll need to use the VectorStoreRetriever and format the retrieved Document objects with format_docs, or simply assign your own text to the variable.

# Uncomment variable below to set the context
# context = <Enter code or string here>

Set the instruction for the prompt.

instruction = """You are a computer science professor.
Please answer the following question based on the given context."""

The original OLMo chat template takes in multiple messages with a role and content key. You can use this template to ask questions to the model. For simplicity, we’ll just use a single message.

# Uncomment below to set the input text template
# input_text_template = f"""\
# {instruction}

# Context: {context}

# Question: {question}
# """
# Uncomment below to set the message dictionary
# message = {
#     "role": "user",
#     "content": input_text_template,
# }
# Uncomment below to try out the prompt template
# print(prompt_template.format(
#     messages=[message]
# ))

You can see above what the final prompt looks like. Tags such as <|user|> signal to the model that this is user input, and so on. This final string is sent to the model to generate the answer.

RAG#

At this point you have all the parts of a RAG system set up. Now let’s chain the prompt engineering, the OLMo model, and the Qdrant collection to get a more accurate answer.

# Write your code here to create the retrieval chain
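One way to picture the chain is as a plain Python function that runs the four steps in order: retrieve, format the context, render the prompt, and generate. This is a sketch rather than the tutorial's own solution; the function name `answer_with_rag` is hypothetical, and it assumes the `retriever`, `olmo`, `prompt_template`, and `instruction` objects from the earlier sections.

```python
def format_docs(docs):
    """Join the text of the retrieved documents into one context string."""
    return "\n\n".join(doc.page_content for doc in docs)


def answer_with_rag(question, retriever, llm, prompt_template, instruction):
    """Retrieve context for a question and ask the model to answer with it."""
    # 1. Retrieve the most relevant chunks from the vector database.
    docs = retriever.invoke(question)
    context = format_docs(docs)

    # 2. Build the single user message, using the same layout as the
    #    input text template in the Prompt Engineering section.
    content = f"{instruction}\n\nContext: {context}\n\nQuestion: {question}\n"
    message = {"role": "user", "content": content}

    # 3. Render the chat template into the final prompt string.
    prompt = prompt_template.format(messages=[message])

    # 4. Send the prompt to the model and return its answer.
    return llm.invoke(prompt)
```

In the notebook this could be called as `answer_with_rag(question, retriever, olmo, prompt_template, instruction)`. The same steps can also be expressed with LangChain's runnable pipe syntax once each step works on its own.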

Bonus: Try to create a simple chat app by modifying the 1-olmo-chat-rag.ipynb notebook for your use case.

Please fill out the survey feedback form to help us improve the tutorial.