
Agentic RAG with LangGraph: Build Adaptive RAG for Production


By Anuj Yadav

November 10, 2025


The next generation of Retrieval-Augmented Generation (RAG) is here: one that thinks, adapts, and self-corrects. Agentic RAG with LangGraph moves beyond static pipelines to build adaptive, intelligent retrieval systems that dynamically plan, evaluate, and refine their own workflows. For enterprise AI engineers, this shift means higher reliability, reduced latency, and smarter cost utilization.

If you’re new to LangGraph, it’s an open-source framework from the LangChain team designed for modeling stateful, cyclical agent workflows, a must-have foundation for production-grade adaptive RAG systems.

Why Naive RAG Fails in Production

Traditional RAG systems follow a rigid linear pipeline — Query → Retrieve → Generate. While simple to implement, this design creates scalability and quality problems.

The linear trap: Every query is treated the same, triggering expensive vector searches even for trivial questions like “What is Python?” This wastes computation, slows down response times, and increases cost.

The static trap: When data is missing or outdated, the system can’t adapt. It depends solely on the pre-indexed vector store, producing inconsistent or stale answers.

Enterprise teams deploying RAG at scale found that linear systems consumed 10–20× more GPU/CPU resources than necessary while still under-delivering on accuracy.

Introducing Agentic Adaptive RAG

Agentic Adaptive RAG solves these inefficiencies by making the LLM an active planner rather than a passive generator.
It introduces two core principles:

  • Agentic Control – The model decides what to retrieve, when to re-query, and how to verify accuracy. It can dynamically skip retrieval, re-run searches, or request web-based updates.
  • Adaptive Routing – A lightweight LLM “router” analyzes each query’s intent and complexity, then picks the optimal path (a minimal router sketch follows this list):
      • Direct Answer for general knowledge
      • Local RAG for internal documents
      • Web Search for external or recent topics
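In practice, the router is usually another structured-output LLM call. The sketch below is a minimal, illustrative version of that idea, reusing the same ChatGroq and Pydantic structured-output pattern used by the grader later in this article; the RouteQuery schema and router_model name are assumptions, not a fixed API.


from typing import Literal
from pydantic import BaseModel, Field
from langchain_groq import ChatGroq

# Illustrative routing schema: constrains the router LLM to one of the three paths
class RouteQuery(BaseModel):
    """Pick the retrieval strategy best suited to the user's query."""
    path: Literal["direct_answer", "local_rag", "web_search"] = Field(
        description="'direct_answer' for general knowledge, 'local_rag' for internal documents, 'web_search' for recent or external topics."
    )

# A low-temperature model is enough for intent classification
router_model = ChatGroq(model="llama-3.3-70b-versatile", temperature=0).with_structured_output(RouteQuery)

decision = router_model.invoke(
    [{"role": "user", "content": "Route this query: What changed in the EU AI Act this month?"}]
)
print(decision.path)  # A recency-sensitive query like this should lean toward 'web_search'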

This selective routing and validation reduce hallucinations by up to 78% compared to static RAG baselines, a significant leap in reliability and cost efficiency.
For developers, Agentic RAG with LangGraph provides the framework and flexibility to deploy this intelligence in production, blending reasoning, retrieval, and self-correction seamlessly.

Read more: GPU Optimization for Faster Model Training and Inference

LangGraph: Building the Agentic Backbone

LangGraph enables developers to model complex, non-linear, cyclical agent workflows using graphs. Each node performs a specific function (retrieval, grading, rewriting, or generation), and edges define transitions based on model-driven decisions: the foundation of Agentic RAG with LangGraph.

1. Modeling the State

The state is the shared memory that holds everything an agent needs: user messages, retrieved documents, routing decisions, and grading results.


# Install dependencies:
# pip install langgraph langchain-core langchain-groq python-dotenv pydantic duckduckgo-search

from typing import TypedDict, List, Literal

# Define the state schema using TypedDict for clear structure
class AdaptiveRAGState(TypedDict):
    """
    Represents the state of our Adaptive RAG Agent's execution.
    The 'messages' key is the standard channel for conversation history.
    """
    messages: List[dict] # Conversation history as simple {"content": ...} dicts
    documents: List[str] # Retrieved context documents (from vector store or web)
    query_transform: str # Rewritten or refined query if needed
    routing_decision: Literal['local_rag', 'web_search', 'direct_answer'] # Output from the initial router
    relevance_score: Literal['yes', 'no'] # Output from the document grader
    error_flag: bool # Flag to signal a critical node error for recovery/logging
    next: str # Name of the next node, written by the router/grader nodes and read by the conditional edges

This structure ensures that all nodes access consistent context. It’s also crucial for maintaining conversation continuity and debugging in production.

2. Intelligent Routing and Corrective Loops

The workflow starts with a Router LLM, which classifies the query and routes it through one of three paths — direct answer, local retrieval, or web search.
When retrieval is triggered, the system uses a Corrective RAG (CRAG) loop to self-verify results.

| Phase | Function | Description |
| --- | --- | --- |
| A | Router | Classifies query intent |
| B | Retriever | Fetches documents from vector store |
| C | Grader | Evaluates context relevance |
| D | Rewriter | Refines query if irrelevant |
| E | Generator | Produces final answer |

If the Grader marks retrieved documents as “irrelevant,” the Rewriter node reformulates the question, and the process repeats until relevance is confirmed or a retry limit is reached.

This cyclical flow is what distinguishes Agentic RAG with LangGraph: each iteration refines the query and the retrieved context, improving the quality of the final answer.
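One simple way to enforce that retry limit is to track a rewrite counter in the graph state. The sketch below is illustrative: the retry_count key and MAX_REWRITES constant are not part of the state schema defined earlier, but they show how the grader's routing decision could cap the loop and fall back to web search.


MAX_REWRITES = 2  # Illustrative budget for the rewrite-and-retry loop

def decide_after_grading(state: dict) -> dict:
    """Sketch of the grader's routing decision with a retry cap."""
    if state.get("relevance_score") == "yes":
        return {"next": "generate_answer"}
    if state.get("retry_count", 0) >= MAX_REWRITES:
        # Budget exhausted: broaden to web search instead of looping forever
        return {"next": "web_search"}
    return {"next": "rewrite_question", "retry_count": state.get("retry_count", 0) + 1}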

Example: Document Grader Node

The Document Grader ensures that retrieved context aligns with the user’s query before generation.
Using Pydantic schemas and structured outputs, the LLM is constrained to return consistent, machine-readable decisions.


from pydantic import BaseModel, Field
from typing import Literal
import os

from langchain_groq import ChatGroq
from dotenv import load_dotenv

load_dotenv()

# 1. Define Structured Output Schema for Grading
class GradeDocuments(BaseModel):
    """Binary score for relevance check."""
    binary_score: str = Field(
        description="Relevance score: 'yes' if relevant, or 'no' if not relevant."
    )

# LLM for grading documents (structured output for binary scoring)
grader_model = ChatGroq(
    model="llama-3.3-70b-versatile", temperature=0.3, api_key=os.getenv("GROQ_API_KEY")
).with_structured_output(GradeDocuments)
 
# LLM for generating answers (higher temperature for more natural responses)
answer_model = ChatGroq(model="llama-3.3-70b-versatile", temperature=0.7)
 
GRADE_PROMPT_TEMPLATE = (
    "You are a document relevance grader. Your task is to determine "
    "if the combined retrieved context is useful for answering the user's question. "
    "Do not answer the question itself. Just grade the context.\n\n"
    "Question: {question}\n"
    "Retrieved Context: {context}"
)
 
def grade_documents(state: AdaptiveRAGState) -> dict:
    """
    Determines whether the retrieved documents are relevant to the question.
    Returns a state update containing the next node name.
    """
    print("--- Grading Retrieved Documents ---")
    # Extract the query and context from the state
    question = state["messages"][-1]["content"]
    context = "\n---\n".join(state.get("documents", []))
    prompt = GRADE_PROMPT_TEMPLATE.format(question=question, context=context)
    try:
        # Structured invocation of the grader LLM
        response = grader_model.invoke([{"role": "user", "content": prompt}])
        score = response.binary_score.lower()

        if score == "yes":
            print("--- DECISION: Documents are relevant. Generating answer. ---")
            return {"next": "generate_answer", "relevance_score": score}
        else:
            print("--- DECISION: Documents are irrelevant. Triggering query rewrite. ---")
            # Production implementation would check retry count here before defaulting to web_search
            return {"next": "rewrite_question", "relevance_score": score}

    except Exception as e:
        # Crucial Production Hardening: Handle LLM/API errors gracefully
        print(f"--- GRADING ERROR: {e}. Defaulting to Web Search fallback. ---")
        # Return the error flag as part of the state update so it is persisted for monitoring
        return {"next": "web_search", "error_flag": True}
 
 
# Answer generation prompt template
ANSWER_PROMPT_TEMPLATE = (
    "You are a helpful assistant. Answer the user's question based on the provided context. "
    "If the context contains relevant information, use it to provide a comprehensive answer. "
    "If the context is not sufficient or is just a placeholder, provide the best answer you can based on your knowledge. "
    "Be clear, concise, and accurate. Do not just repeat the question.\n\n"
    "Question: {question}\n\n"
    "Context:\n{context}\n\n"
    "Answer:"
)
 
def generate_answer(state: AdaptiveRAGState):
    """
    Generates an answer based on the retrieved documents and user query using an LLM.
    Updates the state with the generated answer.
    """
    print("--- Generating Answer ---")
 
    # Extract the query and context from the state
    user_query = state["messages"][-1]["content"]
    context = "\n---\n".join(state.get("documents", []))
    # Create the prompt for answer generation
    prompt = ANSWER_PROMPT_TEMPLATE.format(question=user_query, context=context)
    try:
        # Generate answer using LLM
        response = answer_model.invoke([{"role": "user", "content": prompt}])
        # Extract the answer content from the response
        if hasattr(response, 'content'):
            answer = response.content
        elif isinstance(response, str):
            answer = response
        else:
            # Fallback if response structure is different
            answer = str(response)
        # Append the generated answer to messages
        state["messages"].append({"content": answer})
        return state
    except Exception as e:
        # Error handling: fallback to a basic answer
        print(f"--- ANSWER GENERATION ERROR: {e}. Using fallback answer. ---")
        state["error_flag"] = True
        fallback_answer = f"I apologize, but I encountered an error while generating the answer. Based on the available context: {context}"
        state["messages"].append({"content": fallback_answer})
        return state
 
 
# Import DuckDuckGo search
try:
    from duckduckgo_search import DDGS
except ImportError:
    print("Warning: duckduckgo-search not installed. Install it with: pip install duckduckgo-search")
    DDGS = None
 
def web_search_tool(state: AdaptiveRAGState):
    """
    Performs a web search using DuckDuckGo based on the user's query.
    Updates the state with the search results.
    """
    print("--- Performing Web Search with DuckDuckGo ---")
 
    user_query = state["messages"][-1]["content"]
    try:
        if DDGS is None:
            raise ImportError("duckduckgo-search library is not installed")
        # Perform DuckDuckGo search
        with DDGS() as ddgs:
            # Search for results (max_results limits the number of results)
            results = list(ddgs.text(
                keywords=user_query,
                max_results=5  # Get top 5 results
            ))
        if not results:
            print("--- No search results found ---")
            state["documents"] = [
                f"No search results found for: {user_query}. "
                "Please try rephrasing your query."
            ]
            return state
        # Format search results into document strings
        # Each result contains: title, body (snippet), href (URL)
        formatted_results = []
        for i, result in enumerate(results, 1):
            title = result.get('title', 'No title')
            body = result.get('body', 'No description')
            href = result.get('href', 'No URL')
            formatted_result = (
                f"Result {i}:\n"
                f"Title: {title}\n"
                f"Description: {body}\n"
                f"Source: {href}\n"
            )
            formatted_results.append(formatted_result)
        # Store formatted results in state
        state["documents"] = formatted_results
        print(f"--- Found {len(results)} search results ---")
    except ImportError as e:
        print(f"--- ERROR: {e} ---")
        print("Install duckduckgo-search with: pip install duckduckgo-search")
        state["error_flag"] = True
        state["documents"] = [
            f"Web search unavailable. Error: {str(e)}. "
            "Please install duckduckgo-search library."
        ]
    except Exception as e:
        print(f"--- Web Search Error: {e} ---")
        state["error_flag"] = True
        state["documents"] = [
            f"Error performing web search: {str(e)}. "
            "Please try again or rephrase your query."
        ]
    return state
 
def rewrite_question(state: AdaptiveRAGState):
    """
    Rewrites the user's query to improve retrieval results.
    Updates the state with the rewritten query.
    """
    print("--- Rewriting Question ---")
 
    # Example rewrite logic
    original_query = state["messages"][-1]["content"]
    rewritten_query = f"Refined: {original_query}"
    state["query_transform"] = rewritten_query
    return state
 
def retrieve_documents(state: AdaptiveRAGState):
    """
    Retrieves documents based on the user's query (or the rewritten query, if one exists).
    Updates the state with the retrieved documents.
    """
    print("--- Retrieving Documents ---")

    # Example retrieval logic: prefer the rewritten query once the CRAG loop has produced one
    user_query = state.get("query_transform") or state["messages"][-1]["content"]
    state["documents"] = [f"Document related to '{user_query}'"]
    return state
 
def route_query(state: AdaptiveRAGState) -> dict:
    """
    Determines the routing decision based on the user's query.
    Returns a state update containing the next node name.
    """
    print("--- Routing Query ---")

    # Example keyword-based routing; a production router would use an LLM classifier
    user_query = state["messages"][-1]["content"].lower()

    if "search" in user_query:
        return {"next": "web_search", "routing_decision": "web_search"}
    elif "answer" in user_query:
        return {"next": "direct_answer", "routing_decision": "direct_answer"}
    else:
        return {"next": "local_rag", "routing_decision": "local_rag"}

This approach improves resilience: if an LLM call fails, the system routes safely to an alternate node instead of crashing.

3. Assembling the LangGraph Workflow

LangGraph connects these nodes using conditional edges, creating an adaptive graph that mirrors intelligent reasoning and agentic control.


from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import InMemorySaver

# 1. Initialize the workflow with the defined state
workflow = StateGraph(AdaptiveRAGState)

# 2. Define Nodes 
workflow.add_node("route_query", route_query)
workflow.add_node("retrieve_documents", retrieve_documents) 
workflow.add_node("grade_documents_node", grade_documents) 
workflow.add_node("rewrite_question", rewrite_question)
workflow.add_node("web_search_tool", web_search_tool)
workflow.add_node("generate_answer", generate_answer)

# 3. Define Entry Point (where the user query enters the system)
workflow.set_entry_point("route_query")

# 4. Define Conditional Edges (The Agent's Decision Logic)

# --- Router Decisions (Entry Point) ---
workflow.add_conditional_edges(
    "route_query",
    lambda state: state["next"],  # Use the 'next' key from the returned dict
    {
        "local_rag": "retrieve_documents",
        "web_search": "web_search_tool",
        "direct_answer": "generate_answer"
    }
)

# --- CRAG Loop Decisions (Self-Correction) ---
workflow.add_conditional_edges(
    "grade_documents_node",
    lambda state: state["next"],  # Use the 'next' key from the returned dict
    {
        "generate_answer": "generate_answer",
        "rewrite_question": "rewrite_question",
        "web_search": "web_search_tool",
    }
)

# 5. Define Normal Edges (Fixed Transitions)
workflow.add_edge("retrieve_documents", "grade_documents_node")
workflow.add_edge("rewrite_question", "retrieve_documents")
workflow.add_edge("web_search_tool", "generate_answer")

# 6. Define End Points
workflow.add_edge("generate_answer", END)

# Compile the graph with state persistence
app = workflow.compile(checkpointer=InMemorySaver()) 

# Example invocation - Testing Local RAG path (retrieve -> grade -> generate)
# This query will route to "local_rag" since it doesn't contain "search" or "answer" keywords
final_state = app.invoke( 
    {
        "messages": [{"content": "What is the capital of France?"}],
        "documents": [],  # Will be populated by retrieve_documents
        "query_transform": "",
        "routing_decision": "",  # Will be set by route_query
        "relevance_score": "",  # Will be set by grade_documents
        "error_flag": False
    },
    config={"configurable": {"thread_id": "test_session_1"}}
)

# Display the answer
print("=" * 50)
print("ANSWER:")
print(final_state["messages"][-1]["content"])

This architecture forms the core of Agentic RAG with LangGraph, enabling autonomous routing, validation, and recovery within a single consistent workflow.

Performance Optimization and Caching

Even adaptive systems benefit from optimization. LangGraph offers built-in caching and parallelization to improve run-time efficiency.

Node-level caching:
Reusing results from identical queries avoids redundant LLM calls. For instance, grading the same document twice can simply fetch the cached result.
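LangGraph's built-in caching options vary by version, so the snippet below hand-rolls a minimal in-process memo cache instead of assuming a specific API; the key scheme, _grade_cache store, and grade_fn callable are all illustrative.


import hashlib
import json

# Illustrative in-process cache; production systems typically use Redis or a similar shared store
_grade_cache: dict = {}

def cached_grade(question: str, context: str, grade_fn) -> str:
    """Memoize grading results so identical (question, context) pairs skip the LLM call."""
    key = hashlib.sha256(json.dumps([question, context]).encode()).hexdigest()
    if key not in _grade_cache:
        _grade_cache[key] = grade_fn(question, context)  # e.g. a thin wrapper around grader_model.invoke
    return _grade_cache[key]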

Parallel retrieval:
For complex systems, running multiple retrievals (vector store, APIs, external search) simultaneously reduces overall latency.
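A minimal sketch of that fan-out is shown below; vector_search, call_internal_api, and web_search_async are placeholder async callables, each assumed to return a list of document strings.


import asyncio

async def parallel_retrieve(query: str) -> list:
    """Run several retrieval sources concurrently and merge their results."""
    results = await asyncio.gather(
        vector_search(query),
        call_internal_api(query),
        web_search_async(query),
        return_exceptions=True,  # One failing source should not sink the whole retrieval step
    )
    documents = []
    for result in results:
        if not isinstance(result, Exception):
            documents.extend(result)
    return documents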

Evaluation and Feedback Loops:
Integrate continuous monitoring through LangSmith or custom dashboards. Read our full article on Caching and Feedback Loops in RAG to learn how caching improves stability and cost performance in enterprise-level setups.
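For LangSmith specifically, tracing is typically switched on through environment variables before the graph is compiled; the project name below is illustrative.


import os

# Enable LangSmith tracing for every graph run
# (LANGCHAIN_API_KEY should already be set via your secrets manager, never hard-coded)
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "adaptive-rag-prod"  # Illustrative project name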

For production deployment, partnering with experts in Generative AI Development Services helps implement robust caching policies, query routing, and monitoring strategies that scale efficiently.


Quick Comparison: Linear vs. Agentic Adaptive RAG

| Feature | Naive RAG | Agentic RAG with LangGraph |
| --- | --- | --- |
| Workflow | Linear | Dynamic, conditional |
| Retrieval | Always active | Context-aware |
| Logic | Static | Intelligent, agent-driven |
| Accuracy | Moderate | High (self-correcting) |
| Cost | High | Optimized through routing |
| Observability | Limited | Full trace with LangSmith |

Explore more: Production-Ready RAG Architecture to Reduce AI Hallucinations

Production Hardening Best Practices

  • Persistence: Use durable state checkpointers (SQLite/PostgreSQL) for recovery after failures (a minimal SQLite sketch follows this list).
  • Observability: Monitor node latencies, route distributions, and grader accuracy via LangSmith or OpenTelemetry.
  • Error Management: Wrap all LLM calls with try/except and maintain an error_flag in state.
  • Security: Mask API keys, enforce TLS, and validate external web results.
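As a minimal sketch of the persistence point above, assuming the optional langgraph-checkpoint-sqlite package is installed (pip install langgraph-checkpoint-sqlite), the in-memory saver from the earlier example can be swapped for a durable SQLite checkpointer:


import sqlite3
from langgraph.checkpoint.sqlite import SqliteSaver

# Durable checkpointer: graph state survives process restarts
conn = sqlite3.connect("rag_checkpoints.db", check_same_thread=False)
checkpointer = SqliteSaver(conn)

# Reuse the workflow defined earlier, compiled against the durable store
app = workflow.compile(checkpointer=checkpointer)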

Conclusion

Agentic RAG with LangGraph is redefining how modern enterprises approach retrieval, reasoning, and generation. By combining adaptive routing, structured state management, and corrective learning, it allows AI systems to become truly autonomous and production-ready.

For teams scaling intelligent pipelines or deploying retrieval-augmented AI in real-world environments, collaborating with an experienced AI Development Agency and Company ensures a reliable, cost-efficient, and future-proof implementation.

FAQs

Q1. What is Agentic RAG with LangGraph?
Agentic RAG with LangGraph is an adaptive retrieval system where LLMs plan, route, and self-correct responses using a graph-based workflow.

Q2. How does Agentic RAG improve over traditional RAG?
Agentic RAG dynamically decides when to retrieve, rewrite, or skip steps, reducing latency and hallucinations in production AI systems.

Q3. Why use LangGraph for building Agentic RAG?
LangGraph provides a stateful, node-based framework that enables conditional routing, feedback loops, and persistent memory for complex AI agents.

Q4. Can Agentic RAG with LangGraph be used in enterprise AI apps?
Yes, it’s ideal for enterprise AI, offering scalable retrieval, self-correction, and cost-optimized pipelines for production deployments.


Written by Anuj Yadav

Anuj Yadav is a skilled AI Engineer with hands-on experience building end-to-end AI solutions across algorithmic trading, document intelligence, and conversational systems. He specializes in machine learning, deep learning, and generative AI, with strong expertise in Python, FastAPI, LangChain, and agentic AI workflows. Anuj has designed scalable pipelines for data ingestion, model development, retrieval-augmented generation, and workflow automation using Celery and Redis. With deep expertise in LLMs, embeddings, vector search, and intelligent retrieval systems, he focuses on delivering high-performance, production-ready AI platforms that solve complex business challenges with precision and reliability.
