With the rise of ChatGPT and other large language models (LLMs), the potential for AI to surpass human capabilities has become a topic of both fascination and concern. While LLMs excel at understanding language, following instructions, and reasoning, they often fall short when it comes to performing specific tasks. Simply inputting a prompt into ChatGPT may yield answers that are fabricated, irrelevant, or out of context, a phenomenon known as “hallucination.” To obtain relevant information, it is crucial to provide the model with the appropriate context. However, the size of the context window is limited, posing a challenge in capturing all necessary information. Although context sizes have grown over time, storing extensive information within a fixed context window remains impractical and expensive. This is where the augmentation of language models comes into play.
Augmenting large language models involves three primary approaches:
- retrieval,
- chains, and
- tools.
These methods aim to enhance the capabilities of LLMs by providing them with additional resources and functionalities.
Retrieval Augmentation
Retrieval augmentation involves giving the language model an external corpus of data to search through. Traditionally, retrieval algorithms use queries to rank relevant objects in a collection, which can include images, texts, documents, or other types of data. To enable efficient searching, the documents and their corresponding features are organized within an index, which maps each feature to the documents containing it, facilitating quick retrieval. Boolean search determines which documents match the query, while ranking is typically performed using algorithms like BM25 (Best Match 25).
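Before looking at BM25 in detail, here is a toy sketch of such an inverted index (not tied to any particular library): a mapping from each term to the set of documents that contain it, which is enough to answer boolean queries.

```python
from collections import defaultdict

docs = {
    1: "the cat sat on the mat",
    2: "the dog sat on the log",
    3: "cats and dogs are pets",
}

# inverted index: term -> set of document ids containing that term
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

print(index["sat"])                  # {1, 2}
print(index["sat"] & index["dog"])   # boolean AND query -> {2}
```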
BM25 (Best Match 25) is a ranking function commonly used in information retrieval to measure the relevance of a document to a given query. It is a probabilistic retrieval model that enhances the vector space model by incorporating document length normalization and term frequency saturation.
In BM25, the indexing process involves tokenizing each document in the collection into terms and calculating term statistics such as document frequency (df) and inverse document frequency (idf). Document frequency represents the number of documents in the collection containing a particular term, while inverse document frequency measures the rarity of the term across the collection.
During the querying phase, the query is tokenized into terms, and term statistics, including query term frequency (qtf) and query term inverse document frequency (qidf), are computed. These statistics capture the occurrence and relevance of terms in the query.
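Putting these statistics together, BM25 scores a document against a query as a sum of per-term contributions. Here is a minimal sketch; the free parameters k1 and b, and the exact idf variant, differ slightly between implementations:

```python
import math

def bm25_score(query_terms, doc_terms, df, N, avgdl, k1=1.5, b=0.75):
    """Score one tokenized document against a tokenized query.

    df    : dict mapping term -> number of documents containing the term
    N     : total number of documents in the collection
    avgdl : average document length in the collection
    """
    score = 0.0
    dl = len(doc_terms)
    for term in set(query_terms):
        tf = doc_terms.count(term)   # term frequency in the document
        if tf == 0 or term not in df:
            continue
        # rare terms get a higher idf weight
        idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)
        # tf saturates via k1; b controls document-length normalization
        score += idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * dl / avgdl))
    return score
```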
While traditional retrieval methods primarily rely on keyword matching and statistical techniques, modern approaches leverage AI-centric retrieval methods that utilize embeddings. These methods offer improved search capabilities and help retrieve contextually relevant information.
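As a sketch of the embedding-based approach (assuming the sentence-transformers library and its all-MiniLM-L6-v2 model; any embedding model would do), documents and the query are mapped to vectors and compared by cosine similarity:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

documents = [
    "Jean Dujardin won the Academy Award for Best Actor in 2012.",
    "The Artist won Best Picture at the 84th Academy Awards.",
    "Python is a popular programming language.",
]
doc_vecs = model.encode(documents, normalize_embeddings=True)

query_vec = model.encode("who won the oscar for best actor in 2012?",
                         normalize_embeddings=True)
# with unit-normalized vectors, the dot product is the cosine similarity
scores = doc_vecs @ query_vec
print(documents[int(np.argmax(scores))])
```

Note how the top result is matched on meaning rather than on exact keyword overlap.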
Chains
Chains involve using the output of one language model as the input for another. By cascading multiple models together, the output of each model becomes the input for the subsequent one. This chaining process allows the models to build upon each other’s knowledge and reasoning abilities, potentially leading to more accurate and contextually appropriate responses.
The sequential arrangement of models in a chain creates a pipeline of interconnected language models, where the output of one model serves as the input for the next. This pipeline allows for a cascading flow of information and reasoning, enabling the models to collectively enhance their understanding and generate more accurate responses. By leveraging a chain of language models, each model can contribute its specialized knowledge and capabilities to the overall task. For example, one model may excel at language comprehension, while another may possess domain-specific knowledge.
As the input passes through the chain, each model can refine and expand upon the information, leading to a more comprehensive and contextually relevant output. The chaining process in language models has the potential to address the limitations of individual models, such as hallucination or generating irrelevant responses. By combining the strengths of multiple models, the pipeline can help mitigate these issues and produce more reliable and accurate results.
Furthermore, the pipeline can be customized and tailored to specific use cases or tasks. Different models can be integrated into the chain based on their strengths and compatibility with the desired objectives. This flexibility allows for the creation of powerful and specialized systems that leverage the collective intelligence of multiple language models.
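At its core, a chain is function composition over model calls: each model's output becomes the next model's input. A minimal sketch, with hypothetical stand-in functions in place of real LLM calls:

```python
def comprehension_model(text: str) -> str:
    # hypothetical stand-in for an LLM call that interprets the input
    return "interpretation: " + text

def domain_model(text: str) -> str:
    # hypothetical stand-in for an LLM call that adds domain knowledge
    return "expert answer based on (" + text + ")"

def run_chain(models, prompt):
    """Feed the output of each model into the next one."""
    text = prompt
    for model in models:
        text = model(text)
    return text

print(run_chain([comprehension_model, domain_model], "Why is the sky blue?"))
```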
Langchain
Langchain has emerged as an immensely popular tool for constructing chains of language models, making it one of the fastest-growing open-source projects in this domain. With support for both Python and JavaScript, it provides a versatile platform for building applications and can be seamlessly integrated into production environments. Langchain serves as the fastest way to kickstart development and offers a wide range of pre-built chains tailored for various tasks. Many developers find inspiration from Langchain and end up creating their own customized chaining solutions. One of the key strengths of Langchain lies in its extensive repository, which houses numerous examples of different chaining patterns. These examples not only facilitate idea generation but also serve as valuable resources for learning and gaining insights into effective chaining techniques. Whether for rapid prototyping or constructing production-grade systems, Langchain strikes a balance between ease of use and flexibility, empowering developers to effortlessly create their own chaining systems when needed.
The building blocks of Langchain are chains. Chains can be simple (generic) or specialized. One simple chain is a generic chain containing a single LLM: it takes a prompt and uses the LLM to generate text based on that prompt. Let's see how to build a simple chain using OpenAI's gpt-3.5-turbo model.
```python
import os
os.environ["OPENAI_API_KEY"] = "..."

from langchain.prompts import PromptTemplate

template = """
Who won the oscar for the best actor in a leading role on {year}?
"""

prompt = PromptTemplate(
    input_variables=["year"],
    template=template,
)

print(prompt.format(year=2012))
```

Output:

```
Who won the oscar for the best actor in a leading role on 2012?
```
`PromptTemplate` helps you design prompts for your tasks, and you can provide the input variables you want, as below:
= """
template Who won the oscar for the best {role} on {year}?
"""
When creating a prompt template with multiple variables, you need to pass all of those variables in the `input_variables` argument:
```python
prompt = PromptTemplate(
    input_variables=["role", "year"],
    template=template,
)
```
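With the prompt in place, we can wire it to a model to complete the simple chain described above. Below is a minimal sketch using LangChain's classic `LLMChain` and `ChatOpenAI` interfaces; import paths have moved in newer LangChain releases, so treat them as version-dependent:

```python
from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain

# gpt-3.5-turbo as the underlying model; temperature=0 for deterministic answers
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
chain = LLMChain(llm=llm, prompt=prompt)

# run() fills the template variables and sends the prompt to the model
print(chain.run(role="actor in a leading role", year=2012))
```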
Tools
Another way to give LLMs access to the outside world is to let them use tools.
Using Tools in Langchain
Tools are a flexible way to augment a language model with external data. There are two ways to build tools into language models: the first is to manually create chains, while the second is to use plugins and let the model figure out which tool to call (an agent example of this appears at the end of this section). Example tools include Arxiv, Bash, Bing Search, Google, etc.
Tools can be used in Langchain with the following code snippet (in Python):
```python
from langchain.agents import load_tools

tool_names = [...]
tools = load_tools(tool_names)
tools
```
You name the tools that you are going to use and load them using the `load_tools` method.
Let's use Python's requests module as a tool to extract data from the web:
```python
from langchain.agents import load_tools

tool_names = ['requests_all']
requests_tools = load_tools(tool_names)
requests_tools
```
Output:

```
[RequestsGetTool(name='requests_get', description='A portal to the internet. Use this when you need to get specific content from a website. Input should be a url (i.e. https://www.google.com). The output will be the text response of the GET request.', args_schema=None, return_direct=False, verbose=False, callbacks=None, callback_manager=None, handle_tool_error=False, requests_wrapper=TextRequestsWrapper(headers=None, aiosession=None)),
 RequestsPostTool(name='requests_post', description='Use this when you want to POST to a website.\n Input should be a json string with two keys: "url" and "data".\n The value of "url" should be a string, and the value of "data" should be a dictionary of \n key-value pairs you want to POST to the url.\n Be careful to always use double quotes for strings in the json string\n The output will be the text response of the POST request.\n ', args_schema=None, return_direct=False, verbose=False, callbacks=None, callback_manager=None, handle_tool_error=False, requests_wrapper=TextRequestsWrapper(headers=None, aiosession=None)),
 RequestsPatchTool(name='requests_patch', description='Use this when you want to PATCH to a website.\n Input should be a json string with two keys: "url" and "data".\n The value of "url" should be a string, and the value of "data" should be a dictionary of \n key-value pairs you want to PATCH to the url.\n Be careful to always use double quotes for strings in the json string\n The output will be the text response of the PATCH request.\n ', args_schema=None, return_direct=False, verbose=False, callbacks=None, callback_manager=None, handle_tool_error=False, requests_wrapper=TextRequestsWrapper(headers=None, aiosession=None)),
 RequestsPutTool(name='requests_put', description='Use this when you want to PUT to a website.\n Input should be a json string with two keys: "url" and "data".\n The value of "url" should be a string, and the value of "data" should be a dictionary of \n key-value pairs you want to PUT to the url.\n Be careful to always use double quotes for strings in the json string.\n The output will be the text response of the PUT request.\n ', args_schema=None, return_direct=False, verbose=False, callbacks=None, callback_manager=None, handle_tool_error=False, requests_wrapper=TextRequestsWrapper(headers=None, aiosession=None)),
 RequestsDeleteTool(name='requests_delete', description='A portal to the internet. Use this when you need to make a DELETE request to a URL. Input should be a specific url, and the output will be the text response of the DELETE request.', args_schema=None, return_direct=False, verbose=False, callbacks=None, callback_manager=None, handle_tool_error=False, requests_wrapper=TextRequestsWrapper(headers=None, aiosession=None))]
```
Each tool inside `requests_all` contains a requests wrapper. We can work directly with these wrappers, as below:
```python
requests_tools[0].requests_wrapper
```

Output:

```
TextRequestsWrapper(headers=None, aiosession=None)
```
We can use `TextRequestsWrapper` to create a request object and use that object to extract data from the web:
```python
from langchain.utilities import TextRequestsWrapper

requests = TextRequestsWrapper()
requests.get("https://reqres.in/api/users?page=2")
```

Output:

```
'{"page":2,"per_page":6,"total":12,"total_pages":2,"data":[{"id":7,"email":"michael.lawson@reqres.in","first_name":"Michael","last_name":"Lawson","avatar":"https://reqres.in/img/faces/7-image.jpg"},{"id":8,"email":"lindsay.ferguson@reqres.in","first_name":"Lindsay","last_name":"Ferguson","avatar":"https://reqres.in/img/faces/8-image.jpg"},{"id":9,"email":"tobias.funke@reqres.in","first_name":"Tobias","last_name":"Funke","avatar":"https://reqres.in/img/faces/9-image.jpg"},{"id":10,"email":"byron.fields@reqres.in","first_name":"Byron","last_name":"Fields","avatar":"https://reqres.in/img/faces/10-image.jpg"},{"id":11,"email":"george.edwards@reqres.in","first_name":"George","last_name":"Edwards","avatar":"https://reqres.in/img/faces/11-image.jpg"},{"id":12,"email":"rachel.howell@reqres.in","first_name":"Rachel","last_name":"Howell","avatar":"https://reqres.in/img/faces/12-image.jpg"}],"support":{"url":"https://reqres.in/#support-heading","text":"To keep ReqRes free, contributions towards server costs are appreciated!"}}'
```