Image: Possessed Photography via Unsplash
Retrieval-Augmented Generation (RAG) is widely used in natural language processing, particularly for integrating external knowledge bases. However, it often struggles with poorly structured queries—leading to inaccuracies or missed information.
Agentic RAG addresses these issues by refining queries and improving the accuracy of responses. This guide will walk you through how Agentic RAG can elevate your AI systems—complete with a hands-on coding example.
RAG, or Retrieval-Augmented Generation, is a hybrid approach that combines traditional information retrieval methods with generative language models. Instead of relying solely on a model's internal knowledge, RAG retrieves relevant documents or data from an external knowledge base, using them to generate more accurate and contextually relevant responses.
In a standard RAG setup:
- The user's query is embedded and used to search an external knowledge base, typically via semantic similarity.
- The most relevant documents are retrieved and added to the prompt as context.
- The language model generates a response grounded in that retrieved context.
This method is useful in scenarios where the language model's internal knowledge might be outdated or insufficient. However, the effectiveness of RAG depends heavily on the quality of the retrieval process. If the search fails to retrieve the right information, the final output may be inaccurate or irrelevant.
In a typical RAG setup, a user’s query is processed through a semantic search to retrieve relevant data from a knowledge base. If the query isn't well-structured, the system might return irrelevant results or fail to find important information.
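To make that flow concrete, here is a minimal sketch of a standard RAG pipeline. It assumes a LangChain-style vector store (with a similarity_search method) and chat model (with an invoke method), which are exactly the pieces we build later in this guide.
def standard_rag(question: str, vectordb, llm) -> str:
    # Step 1: retrieve the chunks most similar to the query
    docs = vectordb.similarity_search(question, k=5)
    context = "\n\n".join(doc.page_content for doc in docs)
    # Step 2: generate an answer grounded in the retrieved context
    prompt = f"Answer the question using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
    return llm.invoke(prompt).content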
Agentic RAG offers a more robust solution by actively refining and re-evaluating queries to ensure higher accuracy and relevance. Here's how it works:
Agentic RAG uses AI agents that:
- Analyze the user's question and reformulate it into retrieval-friendly queries.
- Call the retriever, inspect the results, and decide whether more information is needed.
- Re-query the knowledge base with semantically different formulations until the question is covered, then compose the final answer.
This approach reduces errors and ensures more reliable results.
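Conceptually, the agent wraps retrieval in a reason-act loop: it rewrites the question, checks what came back, and retries with a different query when the results fall short. The sketch below is purely illustrative; refine_query, is_sufficient, and compose_answer are hypothetical helpers, and in the actual implementation this loop is handled for us by the ReactJsonAgent we configure later.
def agentic_rag(question: str, llm, vectordb, max_iterations: int = 3) -> str:
    # The agent rewrites the question into a retrieval-friendly query...
    query = refine_query(llm, question)              # hypothetical helper
    retrieved = []
    for _ in range(max_iterations):
        retrieved += vectordb.similarity_search(query, k=7)
        # ...judges whether the evidence covers the question...
        if is_sufficient(llm, question, retrieved):  # hypothetical helper
            break
        # ...and otherwise tries again with a semantically different query.
        query = refine_query(llm, question, previous_query=query)
    return compose_answer(llm, question, retrieved)  # hypothetical helper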
Now that you know the theory around Agentic RAG, let’s get practical and build an example. Follow my steps below to learn how to set it up.
Before diving into the code, make sure your virtual environment is set up. If not, follow this setup guide.
We'll use Hugging Face's transformers library, though you can explore alternatives as needed.
First, install the necessary libraries:
pip install langchain langchain-openai langchain-community langchain-chroma langchain-huggingface huggingface-hub python-dotenv sentence-transformers "transformers[agents]"
Now you should be able to see these packages inside your venv folder.
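If you want to confirm the packages landed in the active environment, pip can show them (an optional sanity check):
pip show langchain transformers langchain-chroma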
Create a main.py file and add the following imports:
import os
import datasets
from dotenv import load_dotenv
from transformers import AutoTokenizer
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_huggingface import HuggingFaceEmbeddings
from tqdm import tqdm
from transformers.agents import ReactJsonAgent
from langchain_openai import ChatOpenAI
import logging
from RetrieverTool import RetrieverTool
from OpenAIEngine import OpenAIEngine

if __name__ == "__main__":  # Runs main() when the file is executed directly (define main() above this guard)
    main()
To start, load your dataset. As an example, I’ll use a mental health dataset from Huggingface. You can find it here.
# Load the knowledge base
knowledge_base = datasets.load_dataset("TVRRaviteja/Mental-Health-Data", split="train")

# Convert the dataset into an array of LangChain Document objects
source_docs = [
    Document(page_content=doc["text"])
    for doc in knowledge_base
]
Next, set up a tokenizer and define a text splitter:
# Load the tokenizer and initialize the text splitter
tokenizer = AutoTokenizer.from_pretrained("thenlper/gte-small")
text_splitter = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(
tokenizer,
chunk_size=200,
chunk_overlap=20,
add_start_index=True,
strip_whitespace=True,
separators=["\n\n", "\n", ".", " ", ""],
)
Before generating embeddings, you need to set up a tokenizer and text splitter to process the dataset effectively.
1. Load the Pre-trained Tokenizer:
We use a pre-trained tokenizer from the Hugging Face model hub. The thenlper/gte-small model is used to convert text into tokens that the model can understand.
2. Set Up the Recursive Character Text Splitter:
The text splitter divides large documents into smaller, overlapping chunks that fit the embedding model. Here's how the arguments customize the splitting behavior:
- chunk_size=200: each chunk is at most 200 tokens, measured with the gte-small tokenizer.
- chunk_overlap=20: consecutive chunks share 20 tokens so that context isn't cut off at chunk boundaries.
- add_start_index=True: stores each chunk's starting position in the original document as metadata.
- strip_whitespace=True: trims leading and trailing whitespace from each chunk.
- separators: the splitter tries to break on paragraphs first ("\n\n"), then lines, sentences, words, and finally individual characters.
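The Chroma call below expects a docs_processed list, so we also need to run the splitter over the source documents. Here is a minimal sketch; the variable name docs_processed is chosen to match the code that follows, and the de-duplication step is optional:
docs_processed = []
unique_texts = set()
for doc in tqdm(source_docs):
    for chunk in text_splitter.split_documents([doc]):
        # Skip duplicate chunks so the vector store doesn't return repeated results
        if chunk.page_content not in unique_texts:
            unique_texts.add(chunk.page_content)
            docs_processed.append(chunk)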
Now that you've prepared the tokenizer and text splitter, it's time to generate embeddings and store them in a vector database (such as Chroma). Here's how to proceed:
Initialize the Embedding Model
You can use HuggingFace’s embedding model to create vector embeddings for your documents.
# Initialize the embedding model
embedding_model = HuggingFaceEmbeddings(model_name="thenlper/gte-small")
# Create the vector database
vectordb = Chroma.from_documents(
    documents=docs_processed,
    embedding=embedding_model,
    persist_directory="chroma",
)
With this, the generated embeddings are stored in a persistent Chroma vector database on disk (the chroma directory).
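To confirm the vector store is working before wiring it into an agent, you can run a quick similarity search; the query string here is just an example:
print(vectordb.similarity_search("feeling overwhelmed and anxious at work", k=2))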
Set Up the Retriever Tool:
Create a generic retriever tool that works with any vector store in LangChain. Since main.py imports it with from RetrieverTool import RetrieverTool, place this class in a file named RetrieverTool.py. This tool will handle document retrieval based on semantic similarity.
from transformers.agents import Tool

class RetrieverTool(Tool):
    name = "retriever"
    description = "Using semantic similarity, retrieves some documents from the knowledge base that have the closest embeddings to the input query."
    inputs = {
        "query": {
            "type": "text",
            "description": "The query to perform. This should be semantically close to your target documents. Use the affirmative form rather than a question.",
        }
    }
    output_type = "text"

    def __init__(self, vectordb, **kwargs):
        super().__init__(**kwargs)
        self.vectordb = vectordb

    def forward(self, query: str) -> str:
        assert isinstance(query, str), "Your search query must be a string"

        docs = self.vectordb.similarity_search(
            query,
            k=7,
        )
        return "\nRetrieved documents:\n" + "".join(
            [f"===== Document {str(i)} =====\n" + doc.page_content for i, doc in enumerate(docs)]
        )
1. Class Attributes:
- name and description: Tell the agent what the tool is called and when it should be used.
- inputs: Specifies the expected input for the tool, which is a query of type text. The description suggests the query should be in an affirmative form rather than a question.
- output_type = "text": Specifies that the output of the tool will be text.
2. __init__ Method:
The constructor initializes the RetrieverTool with a vectordb (a vector database used for similarity search) and any additional keyword arguments (kwargs).
3. forward Method:
Performs a similarity search against the vector database, retrieves the seven closest chunks, and returns them as a single formatted string the agent can read.
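Before handing the tool to an agent, you can exercise it directly to check that retrieval returns sensible text; the query string here is only an example:
retriever_tool = RetrieverTool(vectordb)
print(retriever_tool.forward("coping strategies for imposter syndrome"))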
Next, set up an engine that wraps the OpenAI API for the agent's LLM calls. Since main.py imports it with from OpenAIEngine import OpenAIEngine, place this class in a file named OpenAIEngine.py. The engine will handle the interaction with the OpenAI API to generate responses.
import os
from openai import OpenAI
from dotenv import load_dotenv
from transformers.agents.llm_engine import MessageRole, get_clean_message_list

load_dotenv()

openai_role_conversions = {
    MessageRole.TOOL_RESPONSE: MessageRole.USER,
}

class OpenAIEngine:
    def __init__(self, model_name="gpt-4-turbo"):
        self.model_name = model_name
        self.client = OpenAI(
            api_key=os.getenv("OPENAI_API_KEY"),
        )

    def __call__(self, messages, stop_sequences=[]):
        messages = get_clean_message_list(messages, role_conversions=openai_role_conversions)

        response = self.client.chat.completions.create(
            model=self.model_name,
            messages=messages,
            stop=stop_sequences,
            temperature=0.5,
        )
        return response.choices[0].message.content
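You can also check the engine in isolation with a minimal message list, assuming OPENAI_API_KEY is set in your .env file:
engine = OpenAIEngine()
print(engine([{"role": "user", "content": "Say hello in one short sentence."}]))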
Combine the retriever tool and OpenAI engine into an agent using ReactJsonAgent. This agent will process queries, retrieve relevant information, and generate responses.
retriever_tool = RetrieverTool(vectordb)
llm_engine = OpenAIEngine()
# Create the agent
agent = ReactJsonAgent(tools=[retriever_tool], llm_engine=llm_engine, max_iterations=3, verbose=2)
def run_agentic_rag(question: str) -> str:
    enhanced_question = f"""Using the information contained in your knowledge base, which you can access with the 'retriever' tool,
give a comprehensive answer to the question below.
Respond only to the question asked, response should be concise and relevant to the question.
If you cannot find information, do not give up and try calling your retriever again with different arguments!
Make sure to have covered the question completely by calling the retriever tool several times with semantically different queries.
Your queries should not be questions but affirmative form sentences: e.g. rather than "How to check personality scores of someone who is open and agreeable?", query should be "find me personality scores of someone who is open and agreeable".

Question:
{question}"""

    return agent.run(enhanced_question)
Implement a standard RAG method to compare with the Agentic RAG approach. This will help you understand the improvements and benefits of Agentic RAG.
def run_standard_rag(question: str) -> str:
    # Retrieve supporting documents once, then pass them to the model in a single prompt
    context = retriever_tool.forward(question)

    prompt = f"""Given the question and supporting documents below, give a comprehensive answer to the question.
Respond only to the question asked, response should be concise and relevant to the question.
Provide the number of the source document when relevant.

Question:
{question}

{context}
"""
    messages = [{"role": "user", "content": prompt}]
    reader_llm = ChatOpenAI(model="gpt-4-turbo", api_key=os.getenv("OPENAI_API_KEY"))

    ai_msg = reader_llm.invoke(messages)
    return ai_msg.content
Now, run both functions from the main() function that the if __name__ == "__main__" guard from the first step calls.
def main():
    # init() is assumed to wrap the setup shown above: loading the dataset,
    # splitting it, building the Chroma vector store, and creating the
    # retriever tool, LLM engine, and agent.
    init()

    question = """
    How can i check my score? If I am procarstinating and but at the same time I have imposter syndrome.
    """

    print(f"Question: {question}")

    agentic_answer = run_agentic_rag(question)
    print("Agentic RAG Answer:")
    print(f"Answer: {agentic_answer}")

    standard_answer = run_standard_rag(question)
    print("\nStandard RAG Answer:")
    print(f"Answer: {standard_answer}")
Open your terminal and run the script using the following command:
python main.py
After running the script, compare the outputs from Agentic RAG and standard RAG. Analyze how Agentic RAG refines the query and delivers more accurate and relevant results.
When implementing RAG, especially in enterprise or sensitive environments, securing the data becomes paramount. Your knowledge base may contain sensitive records, and credentials such as the OPENAI_API_KEY loaded from .env above must stay out of source control, so make sure your RAG implementation is both effective and secure.
Agentic RAG tackles the limitations of traditional RAG by refining and improving the accuracy of query responses. By following this guide, you can implement Agentic RAG in your AI systems, ensuring more precise and reliable outputs.
We would love to talk to you about implementing an Agentic RAG infrastructure—while ensuring security and compliance are covered from the get-go.
Please reach out to schedule a time with us, or connect with me on LinkedIn and explore the full code on GitHub.