LangChain: The LLM Application Framework#

LangChain is an open-source library that gives developers a standardized, structured interface for building and integrating the components of an LLM application. Because it is model-agnostic, it works with models from multiple LLM providers, including OpenAI, Hugging Face, and others.

Using LangChain, we can build reusable components and link them together (“like a chain”) into complex, multi-step LLM-based applications clearly and succinctly.

You can learn about the different LangChain components in the LangChain documentation.

This tutorial focuses on a few LangChain components and introduces chaining, one of its most powerful features.

Prompt templates#

Prompt templates provide a way to design the prompts that are fed as inputs to LLMs. They let us define parameterized, reusable templates with multiple inputs.

Note

For this tutorial, we will only cover String PromptTemplates, as they give us finer control over the structure of the template string than ChatPromptTemplate does.

Below is an example of how to use a prompt template. We can import this class from the langchain-core package.

The langchain-core package contains base abstractions of different components and ways to compose them together.

from langchain_core.prompts import PromptTemplate

String PromptTemplates#

A String PromptTemplate is used to format a string input. By default, the template uses Python f-string format. There are currently 2 choices of template_format available: f-string and jinja2. We will see the jinja2 format later; the example below uses the f-string format.

prompt_template = PromptTemplate.from_template(
    "{planet_name} in the solar system is the "
)

prompt_template.format(planet_name="Mars")

'Mars in the solar system is the '
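To make the f-string format more concrete, here is a minimal plain-Python sketch of what such a template does: placeholder names are read from the template string, and formatting substitutes values for them. It does not use LangChain, and the helper name find_placeholders is hypothetical and only for illustration; LangChain's internals may differ.

```python
from string import Formatter

template = "{planet_name} in the solar system is the "


def find_placeholders(template_string):
    """Return the placeholder names found in an f-string-style template.

    Hypothetical helper for illustration; not part of LangChain.
    """
    return [
        field_name
        for _, field_name, _, _ in Formatter().parse(template_string)
        if field_name  # literal-only segments have no field name
    ]


placeholders = find_placeholders(template)
formatted = template.format(planet_name="Mars")

print(placeholders)
print(repr(formatted))
```

The list of placeholder names plays the same role as the input variables of a prompt template: it tells us which keyword arguments the template needs before it can be formatted.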

Let’s instantiate our OLMo model like in the previous section of the tutorial with llama-cpp-python.

from llama_cpp import Llama
from ssec_tutorials import download_olmo_model
from ssec_tutorials.scipy_conf import parse_text_generation_response

OLMO_MODEL = (
    download_olmo_model()
)  # It won't actually download again if it's already there
olmo = Llama(model_path=str(OLMO_MODEL), verbose=False)

Model already exists at /Users/lsetiawan/.cache/ssec_tutorials/OLMo-7B-Instruct-Q4_K_M.gguf

Now that we have our model ready to go, let’s try different prompt templates, starting with the previous one, which takes planet_name as input.

model_response = olmo(
    prompt=prompt_template.format(planet_name="Mars"),
    temperature=0.2,
    max_tokens=8,
    echo=True,
)  # Generate a completion, can also call olmo.create_completion

print(parse_text_generation_response(model_response))

Mars in the solar system is the 
4th largest planet from the sun

# Another example
prompt_template = PromptTemplate.from_template(
    "{entity_1} of the planet {entity_2} is "
)
prompt_template.format(entity_1="Size", entity_2="Earth")

'Size of the planet Earth is '

model_response = olmo(
    prompt=prompt_template.format(entity_1="Size", entity_2="Earth"),
    temperature=0.2,
    echo=True,
)

print(parse_text_generation_response(model_response))

Size of the planet Earth is 
5,147 kilometers or 3,158 miles. The diameter of the planet

Your turn 😎#

Create a String PromptTemplate that produces a text generation prompt, for example, “Sun is part of galaxy …”.

Feel free to experiment with the built-in Python f-string for the prompt input argument to the model.

# Write your prompt_template and model_response code here

LLM Interface#

LangChain has implemented a Runnable protocol that allows us to create custom “chains”. This protocol defines a standard interface for invoking LLMs, PromptTemplates, and other components, which makes them reusable. For more details, go to LangChain’s Runnable documentation.

Note

In this tutorial, you will see the .invoke method used on various LangChain objects. This method is the standard interface of the Runnable protocol.

Loading the model via LangChain’s LlamaCpp abstraction enables us to use the chaining feature. This class is part of the langchain-community package, which contains third-party integrations maintained by the LangChain community.

from langchain_community.llms import LlamaCpp

olmo = LlamaCpp(
    model_path=str(OLMO_MODEL),
    temperature=0.8,
    verbose=False,
)

As you can see below, we now have a LlamaCpp LangChain object rather than the Llama llama-cpp-python object from previous sections.

type(olmo)

langchain_community.llms.llamacpp.LlamaCpp

We learned above about the Runnable protocol. Let’s see how we can invoke the model using the standard interface compared to how we originally invoked the model with llama-cpp-python.

answer = olmo.invoke("What is the meaning of life?")

print(answer)

In one of his famous essays, Samuel Johnson replied to this question with a question: “What is the purpose of life?” He answered his own question by saying that the purpose of life was a philosophical problem, which could not be answered by an essay or a book.
Johnson’s response reflects a common attitude towards the meaning of life. Many people assume that the meaning of life is an intellectual or philosophical question that can only be answered through the study of philosophy, religion, or science. However, this view overlooks the fact that the meaning of life is not just a theoretical concept but also a practical one.
The meaning of life has real-world implications for our lives and for society as a whole. It affects how we live, what we do, and why we do it. For example, if the meaning of life is simply to be happy or to achieve success, then many people will focus on those goals without considering the broader implications of their actions. This can lead to a shallow and meaningless existence.
On the other hand, if the meaning of life is something more than just personal satisfaction, then it becomes important to consider how our actions contribute to the greater good of society. This may involve thinking about issues such as social justice, environmental responsibility,

If you’d like to access the underlying Llama object from the llama-cpp-python package, you can do so via the .client attribute of the LlamaCpp object.

type(olmo.client)

llama_cpp.llama.Llama

With access to the underlying Llama object, you can directly retrieve any metadata. In this example, we retrieve OLMo’s tokenizer chat template, which we saw in the previous notebook, to set up a String PromptTemplate.

The model’s built-in chat template uses jinja2 templating syntax; jinja2 is a popular templating engine for Python.

prompt_template = PromptTemplate.from_template(
    template=olmo.client.metadata["tokenizer.chat_template"], template_format="jinja2"
)

The PromptTemplate object has 2 main attributes that are very useful to explore the built-in prompt template of the model:

input_variables: This is a list of all the input variables that the prompt template expects.
template: This is the actual template string that the model uses.

prompt_template.input_variables

['add_generation_prompt', 'eos_token', 'messages']

For this particular template, we can see that it expects add_generation_prompt, eos_token and messages. But what are the variable types for these inputs? What do they mean?

We can answer the questions above by looking at the template string itself. The template string uses jinja2 templating syntax, so it may look confusing at first, but it is essentially Python-like code embedded in a template string.

print(prompt_template.template)

{{ eos_token }}{% for message in messages %}
{% if message['role'] == 'user' %}
{{ '<|user|>
' + message['content'] }}
{% elif message['role'] == 'assistant' %}
{{ '<|assistant|>
'  + message['content'] + eos_token }}
{% endif %}
{% if loop.last and add_generation_prompt %}
{{ '<|assistant|>' }}
{% endif %}
{% endfor %}

As we can see above, the template reads as follows:

eos_token is a string that is added at the top of the resulting string after the prompt is formatted. You can also see that eos_token is appended to the content string values from an assistant role. You can find this value by going to the model’s tokenizer_config.json file and looking for the eos_token key. Unfortunately, this is currently the only way to get this information; see ggerganov/llama.cpp#5040 for more details. In our case, the eos_token is <|endoftext|>.
messages is a list of dictionaries that is iterated over. Each dictionary should contain a role key and a content key.
add_generation_prompt is a boolean that determines whether to add a generation prompt. In this case, when it’s the last message and add_generation_prompt is True, the <|assistant|> string is added to the end of the prompt.
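The reading above can be sketched as an ordinary Python function. This is only an illustration of the template's logic: the function name render_chat is hypothetical, and the exact blank lines that jinja2 emits are deliberately ignored, so the whitespace will not match the real rendered prompt.

```python
def render_chat(messages, eos_token, add_generation_prompt):
    """Mimic the logic of the chat template in plain Python.

    Hypothetical helper; whitespace is simplified compared to jinja2.
    """
    parts = [eos_token]  # eos_token goes at the top of the prompt
    last_index = len(messages) - 1
    for index, message in enumerate(messages):
        if message["role"] == "user":
            parts.append("<|user|>\n" + message["content"])
        elif message["role"] == "assistant":
            # assistant turns are closed with eos_token
            parts.append("<|assistant|>\n" + message["content"] + eos_token)
        if index == last_index and add_generation_prompt:
            # cue the model to start generating its reply
            parts.append("<|assistant|>")
    return "\n".join(parts)


rendered = render_chat(
    messages=[{"role": "user", "content": "Hi"}],
    eos_token="<|endoftext|>",
    add_generation_prompt=True,
)
print(rendered)
```

Messages with any other role are silently skipped in this sketch, which matches the if/elif structure of the template: only user and assistant have branches.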

Now that we know what the template expects, we can create the final prompt string by passing in the expected input variables. This time, instead of using the .format method, let’s see what happens if we use the .invoke method on the PromptTemplate object.

prompt_template.invoke(
    messages=[
        {
            "role": "user",
            "content": "You are a helpful assistant. Tell me a joke about cats",
        }
    ],
    add_generation_prompt=True,
    eos_token="<|endoftext|>",
)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[20], line 1
----> 1 prompt_template.invoke(
      2     messages=[
      3         {
      4             "role": "user",
      5             "content": "You are a helpful assistant. Tell me a joke about cats",
      6         }
      7     ],
      8     add_generation_prompt=True,
      9     eos_token="<|endoftext|>",
     10 )

TypeError: BasePromptTemplate.invoke() got an unexpected keyword argument 'messages'

As you can see, passing the input variables directly as keyword arguments results in an error. This is because the .invoke method expects an argument called input, a dictionary of the input variables that is passed into the runnable. There is also a config argument that takes a RunnableConfig object. It is optional and can be omitted; we will use it later when invoking the model.

?prompt_template.invoke

Signature:
prompt_template.invoke(
    input: 'Dict',
    config: 'Optional[RunnableConfig]' = None,
) -> 'PromptValue'
Docstring:
Transform a single input into an output. Override to implement.

Args:
    input: The input to the runnable.
    config: A config to use when invoking the runnable.
       The config supports standard keys like 'tags', 'metadata' for tracing
       purposes, 'max_concurrency' for controlling how much work to do
       in parallel, and other keys. Please refer to the RunnableConfig
       for more details.

Returns:
    The output of the runnable.
File:      ~/mambaforge/envs/ssec-scipy2024/lib/python3.11/site-packages/langchain_core/prompts/base.py
Type:      method
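The signature above explains the error. As a minimal sketch, the stand-in function below has the same parameter names (input and config) but is otherwise hypothetical and unrelated to LangChain's implementation. Calling it with the template variables as keyword arguments fails, while wrapping them in a single dictionary works.

```python
def invoke(input, config=None):
    """Hypothetical stand-in that only mirrors the parameter names above."""
    return f"received keys: {sorted(input)}"


try:
    # Variables passed as keyword arguments: none of them is named `input`
    invoke(messages=[], add_generation_prompt=True)
    error_message = None
except TypeError as error:
    error_message = str(error)

# Variables wrapped in one dictionary and passed as `input`
ok_result = invoke({"messages": [], "add_generation_prompt": True})

print(error_message)
print(ok_result)
```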

Let’s try again, this time with the correct input type.

prompt_value = prompt_template.invoke(
    input=dict(
        messages=[
            {
                "role": "user",
                "content": "You are a helpful assistant. Tell me a joke about cats",
            }
        ],
        add_generation_prompt=True,
        eos_token="<|endoftext|>",
    )
)

This time, the output is a StringPromptValue object rather than a plain string. We can still get the string value by calling the .to_string method on the StringPromptValue object.

prompt_value.to_string()

'<|endoftext|>\n\n<|user|>\nYou are a helpful assistant. Tell me a joke about cats\n\n\n<|assistant|>\n\n'

The output string above contains the necessary signifier tokens for the OLMo Model to understand what the user input is and where the model should put generated responses. This whole string output will then become the full prompt for the model.

Note

For the rest of the tutorial, we won’t call the .invoke method on the PromptTemplate object directly; instead, we will use the .format method to get the final prompt string, which is more straightforward and easier to understand. The walkthrough above is just to show you how the .invoke method works.

Chain in LangChain#

Chaining allows us to combine multiple components, as described above, in series or parallel to develop a multi-step LLM pipeline. As shown in the image below, any number of components can be linked together to form a chain.

Figure: LangChain chain (image source: www.analyticsvidhya.com)

Internally, the chain works as follows:

STEP 1: Dictionary is processed as an input to the prompt template.
STEP 2: Prompt Template reads the variables to form the prompt text as output - “What are stars and moon?”
STEP 3: The prompt is given as input to the LLM model.
STEP 4: LLM Model produces output.
STEP 5: The output goes through StrOutputParser that parses it into string and gives the result.
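The five steps above can be sketched with plain Python functions. Everything here is a stand-in: format_prompt, fake_llm, and parse_output are hypothetical names, and fake_llm only echoes the prompt instead of calling a real model.

```python
def format_prompt(variables):
    # STEP 1 and 2: the input dictionary fills the template variables
    return "What are {first} and {second}?".format(**variables)


def fake_llm(prompt):
    # STEP 3 and 4: placeholder for a real model call (assumption: the
    # model returns a dictionary with a "text" field)
    return {"text": f"(model answer to: {prompt})"}


def parse_output(model_output):
    # STEP 5: reduce the model output to a plain string
    return model_output["text"]


prompt_text = format_prompt({"first": "stars", "second": "moon"})
result = parse_output(fake_llm(prompt_text))

print(prompt_text)
print(result)
```

Each function consumes exactly what the previous one produces, which is the property that lets components be linked into a chain.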

We can build the chain with the pipe operator (“|”), which is part of LCEL (LangChain Expression Language). The pipe operator arranges the components sequentially, similar to the image above.

llm_chain = prompt_template | olmo

When we check the type of the resulting chain, it’s just a RunnableSequence! So, essentially, it’s a series of runnables that are executed in sequence.

type(llm_chain)

langchain_core.runnables.base.RunnableSequence
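To see how a pipe operator can produce a sequence object, here is a minimal sketch that overloads Python's | operator. The Step and Sequence classes are hypothetical and much simpler than LangChain's Runnable and RunnableSequence; they only illustrate the general idea of composing objects that share an invoke method.

```python
class Step:
    """A callable wrapped behind a common `invoke` method."""

    def __init__(self, func):
        self.func = func

    def invoke(self, input):
        return self.func(input)

    def __or__(self, other):
        # `a | b` builds a sequence that runs a, then b
        return Sequence([self, other])


class Sequence(Step):
    """Runs its steps in order, feeding each output to the next step."""

    def __init__(self, steps):
        self.steps = steps

    def invoke(self, input):
        value = input
        for step in self.steps:
            value = step.invoke(value)
        return value

    def __or__(self, other):
        return Sequence(self.steps + [other])


to_prompt = Step(lambda variables: f"Tell me a joke about {variables['topic']}")
shouting_model = Step(lambda prompt: prompt.upper())  # stand-in for an LLM

chain = to_prompt | shouting_model
output = chain.invoke({"topic": "cats"})
print(type(chain).__name__, output)
```

Because the sequence itself exposes invoke and |, it can be chained further in the same way as any single step.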

llm_chain

PromptTemplate(input_variables=['add_generation_prompt', 'eos_token', 'messages'], template="{{ eos_token }}{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n'  + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}", template_format='jinja2')
| LlamaCpp(verbose=False, client=<llama_cpp.llama.Llama object at 0x125cbb6d0>, model_path='/Users/lsetiawan/.cache/ssec_tutorials/OLMo-7B-Instruct-Q4_K_M.gguf')

Like other Runnable types, it has an invoke method that expects the same input and config arguments that we’ve seen before with the LLM and PromptTemplate objects.

?llm_chain.invoke

Signature:
llm_chain.invoke(
    input: 'Input',
    config: 'Optional[RunnableConfig]' = None,
    **kwargs: 'Any',
) -> 'Output'
Docstring:
Transform a single input into an output. Override to implement.

Args:
    input: The input to the runnable.
    config: A config to use when invoking the runnable.
       The config supports standard keys like 'tags', 'metadata' for tracing
       purposes, 'max_concurrency' for controlling how much work to do
       in parallel, and other keys. Please refer to the RunnableConfig
       for more details.

Returns:
    The output of the runnable.
File:      ~/mambaforge/envs/ssec-scipy2024/lib/python3.11/site-packages/langchain_core/runnables/base.py
Type:      method

Just like the example above, we’ll need to pass in the input variables as a dictionary.

# Construct the prompt as expected by OLMo
llm_chain.invoke(
    {
        "messages": [
            {
                "role": "user",
                "content": "You are a helpful assistant. Tell me a joke about cats",
            }
        ],
        "add_generation_prompt": True,
        "eos_token": "<|endoftext|>",
    }
)

" Why don't cats play poker in the jungle? There are too many predators!\n\n — Jim Benton ☕️🐱🌊 (May The Beans Be With You)"

Instead of having to invoke llm_chain repeatedly with add_generation_prompt and eos_token, we can update our prompt_template.

# Create a prompt template using OLMo's tokenizer chat template we saw in module 1, but this time use partial variables.
prompt_template = PromptTemplate.from_template(
    template=olmo.client.metadata["tokenizer.chat_template"],
    template_format="jinja2",
    partial_variables={"add_generation_prompt": True, "eos_token": "<|endoftext|>"},
)

llm_chain = prompt_template | olmo
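The idea of fixing some variables ahead of time is similar to functools.partial in the standard library. The sketch below is an analogy only: render is a hypothetical, simplified stand-in for the chat template, not LangChain code.

```python
from functools import partial


def render(messages, add_generation_prompt, eos_token):
    """Hypothetical, simplified stand-in for the chat template."""
    text = eos_token
    for message in messages:
        text += f"<|{message['role']}|>\n{message['content']}\n"
    if add_generation_prompt:
        text += "<|assistant|>"
    return text


# Fix the two values that never change between calls
render_olmo = partial(render, add_generation_prompt=True, eos_token="<|endoftext|>")

# Only the messages still need to be supplied
prompt = render_olmo(messages=[{"role": "user", "content": "Hi"}])
print(repr(prompt))
```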

Let’s stream the output instead of waiting for OLMo to finish generating before displaying the text. We can use Callbacks to subscribe to various events in an LLM application pipeline. See the LangChain callbacks documentation for a list of events.

Below, we will use the StreamingStdOutCallbackHandler to stream the output to the console. To do this, we pass a dictionary to the config argument of the invoke method, with a callbacks key that contains a list of callback handlers. To see all the options, check out the RunnableConfig documentation.

from langchain_core.callbacks import StreamingStdOutCallbackHandler

llm_chain.invoke(
    {
        "messages": [
            {
                "role": "user",
                "content": "You are a helpful assistant. Tell me a joke about cats",
            }
        ]
    },
    config={"callbacks": [StreamingStdOutCallbackHandler()]},
)

 Sure, here's a cute cat joke for you:

Why don't cats like Wi-Fi? Because they prefer the old-school method of tracking down live cables to play with! 😊🐱💬 Remember, sharing is caring, but not when it comes to your wireless connection. 📶🌍❓

" Sure, here's a cute cat joke for you:\n\n\nWhy don't cats like Wi-Fi? Because they prefer the old-school method of tracking down live cables to play with! 😊🐱💬 Remember, sharing is caring, but not when it comes to your wireless connection. 📶🌍❓"

We will cover more LangChain concepts in upcoming notebooks.
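As a closing illustration of the streaming callback used above, here is a minimal sketch of the callback pattern in plain Python. The handler classes and fake_generate are hypothetical; the method name on_llm_new_token mirrors the hook name in LangChain's callback handlers, but the signature here is simplified and no real model is involved.

```python
import sys


class PrintTokenHandler:
    """Writes each token to standard output as soon as it arrives."""

    def on_llm_new_token(self, token):
        sys.stdout.write(token)
        sys.stdout.flush()


class CollectTokenHandler:
    """Stores tokens so they can be inspected afterwards."""

    def __init__(self):
        self.tokens = []

    def on_llm_new_token(self, token):
        self.tokens.append(token)


def fake_generate(prompt, callbacks):
    """Stand-in for a model: emits fixed tokens and notifies each handler."""
    tokens = ["Why ", "did ", "the ", "cat ", "sit?"]
    for token in tokens:
        for handler in callbacks:
            handler.on_llm_new_token(token)
    return "".join(tokens)


collector = CollectTokenHandler()
full_text = fake_generate(
    "Tell me a joke about cats", [PrintTokenHandler(), collector]
)
```

This also shows why the real streaming example displays the text twice: once token by token through the handler, and once as the return value of the call.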

Your turn 😎#

Try different messages value(s) and see how the output changes. But remember to follow the template structure. The dictionary keys must contain role and content and the allowed role values are only user and assistant.

# Write your llm_chain.invoke code here. Feel free to also create your own template and try partial_variables.
