LlamaIndex
AI/ML Engineering Track | Complexity:
[COMPLEX]| Time: 5-6
Or: When Your AI Needs to Remember What Happened
Section titled “Or: When Your AI Needs to Remember What Happened”Reading Time: 7-8 hours Prerequisites: Module 17
Section titled “Reading Time: 7-8 hours Prerequisites: Module 17”San Francisco. August 7, 2024. 4:23 PM. Marcus, a senior engineer at a fintech startup, watched his monitoring dashboard with growing dread. Their AI loan processor had been running perfectly for three hours—collecting documents, verifying identities, running compliance checks. The customer had uploaded 47 different documents. Then the credit bureau API timed out.
He refreshed the dashboard. Everything was gone. All 47 documents, all verification results, all compliance checks—the entire session had vanished. The customer would have to start over from scratch.
Marcus’s phone buzzed. It was customer support: “The customer is furious. She’s been working on this for four hours. Can we recover her session?”
The answer was no. Their agent framework had no concept of “where it left off.” Every step lived in memory, and when the API call failed, the exception handler crashed the entire workflow.
That night, Marcus discovered LangGraph. By the next morning, he had a prototype that saved state after every single step. When he simulated an API failure at step 23, the workflow simply resumed from step 22. Nothing was lost.
“LangGraph wasn’t just an upgrade—it was the difference between a demo and a product. Our customers stopped losing their work, and we stopped losing customers.” — Marcus Chen, presenting at AI Engineering Summit 2024
What You’ll Be Able to Do
Section titled “What You’ll Be Able to Do”By the end of this module, you will:
- Understand LangGraph’s architecture and when to use it
- Build stateful workflows with StateGraph
- Implement conditional branching and cycles
- Create multi-agent systems with coordination
- Master checkpointing and state persistence
- Build human-in-the-loop workflows
Why This Module Matters
Section titled “Why This Module Matters”In Module 17, you mastered Chain-of-Thought and ReAct patterns. But what happens when your agent needs to:
- Remember state across many steps - Not just a single reasoning trace
- Branch conditionally - Take different paths based on results
- Loop and retry - Go back and try again if something fails
- Coordinate multiple agents - Have specialized agents collaborate
- Allow human intervention - Pause for approval, then continue
This is where LangGraph shines.
The Limitation of Linear Chains
Section titled “The Limitation of Linear Chains”Think of LangChain’s linear chains like a highway with no exits. Once you start, you can only go forward. If you miss something, you have to drive to the end and start over from the beginning. LangGraph is more like a city street grid—you can loop back, take detours, wait at intersections for a green light (human approval), and recover from wrong turns without restarting your journey.
LangChain chains are powerful but fundamentally linear or tree-like:
Input → Chain1 → Chain2 → Chain3 → Output ↓ ConditionBranch ↙ ↘ Path A Path BBut real-world workflows often need cycles and dynamic routing:
┌─────────────────────────────┐ │ │ ↓ │ [Research] → [Analyze] → [Check] ─┘ ↓ │ [Write] │ (not good enough) ↓ │ [Review] ──────┘ ↓ [Publish]LangGraph lets you build these complex, stateful workflows as graphs.
Did You Know? The Birth of LangGraph
Section titled “Did You Know? The Birth of LangGraph”The Problem That Kept Harrison Chase Up at Night
Section titled “The Problem That Kept Harrison Chase Up at Night”In late 2023, Harrison Chase (LangChain founder) noticed a pattern. Developers would build amazing prototypes with LangChain, then hit a wall when going to production. The problem wasn’t speed or cost - it was state management.
Real AI workflows need to:
- Remember what happened 10 steps ago
- Recover from failures without starting over
- Wait for human approval, then continue
- Branch into parallel paths and merge back
LangChain’s sequential chains couldn’t do this elegantly. Developers were hacking around limitations with global variables and custom state objects. It was messy.
The Inspiration: Apache Airflow for AI
Section titled “The Inspiration: Apache Airflow for AI”Chase and his team looked at Apache Airflow - the industry standard for data pipelines. Airflow models workflows as directed acyclic graphs (DAGs). Each node is a task, edges define dependencies. Simple but powerful.
But AI workflows need something Airflow doesn’t have: cycles. An AI agent might need to loop back and try again. A code reviewer might request changes multiple times. You can’t model this with DAGs.
The solution was LangGraph: workflows as general graphs with full cycle support, built specifically for LLM applications.
The January 2024 Launch
Section titled “The January 2024 Launch”LangGraph launched in January 2024 to immediate adoption. Within months, it became the standard for production AI agents. Why?
1. Checkpointing: Save state at every step. If something fails, resume exactly where you left off.
2. Human-in-the-loop: Built-in interrupt() function pauses the graph, waits for human input, then continues.
3. Multi-agent patterns: First-class support for supervisor agents, parallel execution, and agent handoffs.
The timing was perfect. Companies were moving from “ChatGPT wrapper” demos to real AI systems. They needed the infrastructure LangGraph provided.
The Fintech Story That Changed Everything
Section titled “The Fintech Story That Changed Everything”A fintech company building a loan application processor shared this story with the LangGraph team:
“Our agent collected 47 documents, verified them with APIs, and ran 12 compliance checks. Then the credit bureau API timed out. With our old system, we lost everything. The customer had to start over.”
When the LangGraph team heard this, they made checkpointing a first-class feature. Now you can resume from any step. That fintech company became one of LangGraph’s earliest production users.
The Graph Mental Model
Section titled “The Graph Mental Model”Think of a graph like a subway map. Each station (node) is a distinct stop where something happens. The tracks (edges) connect stations and determine where you can go next. Some stations have multiple tracks leading to different destinations (conditional routing). And unlike a highway, you can ride the loop line and come back to where you started (cycles).
Graphs 101 (Quick Refresher)
Section titled “Graphs 101 (Quick Refresher)”A graph consists of:
- Nodes: The things in your graph (states, actions, agents)
- Edges: Connections between nodes (transitions, flows)
[Node A] ──edge──> [Node B] ──edge──> [Node C] │ └──edge──> [Node D]In LangGraph:
- Nodes = Functions that process state and return updates
- Edges = Connections that define what happens next
- State = Data that flows through and persists across the graph
Why Graphs for AI Workflows?
Section titled “Why Graphs for AI Workflows?”| Feature | Linear Chains | Graph Workflows |
|---|---|---|
| Conditional logic | Limited | Full branching |
| Cycles/loops | Not possible | Native support |
| State management | Pass through | Persistent state |
| Error recovery | Start over | Retry specific nodes |
| Human-in-the-loop | Awkward | Native support |
| Multi-agent | Sequential only | True parallelism |
LangGraph Core Concepts
Section titled “LangGraph Core Concepts”1. StateGraph
Section titled “1. StateGraph”The foundation of LangGraph is the StateGraph. It manages:
- State schema: What data flows through the graph
- Nodes: Processing functions
- Edges: Transition logic
from langgraph.graph import StateGraph, END
# Define the state typefrom typing import TypedDict, List, Annotatedimport operator
class AgentState(TypedDict): messages: Annotated[List[str], operator.add] # Accumulates current_step: str result: str
# Create the graphgraph = StateGraph(AgentState)2. State Annotations
Section titled “2. State Annotations”LangGraph uses Python’s type annotations to define how state updates work:
from typing import Annotatedimport operator
class MyState(TypedDict): # This field gets REPLACED on each update current_value: str
# This field ACCUMULATES (list concatenation) history: Annotated[List[str], operator.add]
# This field uses custom reducer counter: Annotated[int, lambda a, b: a + b]Key insight: The annotation’s second argument is a reducer function that determines how to combine the old and new values.
Common reducers:
operator.add- Concatenate lists/strings, add numberslambda a, b: b- Always replace (default)lambda a, b: a if b is None else b- Replace only if not None
3. Nodes
Section titled “3. Nodes”Nodes are functions that:
- Receive the current state
- Perform some processing
- Return a state update (partial dictionary)
def analyze_node(state: AgentState) -> dict: """Analyze the input and return findings.""" messages = state["messages"] # Do analysis... return { "current_step": "analyze", "messages": ["Analysis complete: found 3 issues"] }
# Add node to graphgraph.add_node("analyze", analyze_node)Important: Nodes return partial state updates, not the full state. LangGraph merges these updates with the existing state using the reducers.
4. Edges
Section titled “4. Edges”Edges define the flow between nodes:
# Unconditional edge: always go from A to Bgraph.add_edge("node_a", "node_b")
# Conditional edge: choose based on statedef should_continue(state: AgentState) -> str: if state["result"] == "success": return "finish" else: return "retry"
graph.add_conditional_edges( "check", should_continue, { "finish": "output", "retry": "process" # Creates a cycle! })5. Entry and Exit Points
Section titled “5. Entry and Exit Points”Every graph needs:
- Entry point: Where execution starts
- Exit point(s): Where execution ends
from langgraph.graph import START, END
# Set entry pointgraph.add_edge(START, "first_node")
# Set exit point (END is a special node)graph.add_edge("final_node", END)6. Compiling the Graph
Section titled “6. Compiling the Graph”Once defined, compile the graph to make it runnable:
# Compile the graphapp = graph.compile()
# Run itresult = app.invoke({"messages": ["Hello"], "current_step": "start"})** Did You Know?**
LangGraph’s design was heavily influenced by finite state machines (FSMs), a concept from computer science that dates back to the 1950s. FSMs power everything from traffic lights to video game AI to TCP/IP networking. Harrison Chase and the LangChain team realized that LLM workflows are fundamentally state machines—the agent is always in some “state” (researching, waiting for approval, generating output), and transitions between states depend on conditions. By borrowing FSM concepts and adding cycle support, they created a framework that felt familiar to systems engineers while being tailored for AI workflows.
Building Your First LangGraph Workflow
Section titled “Building Your First LangGraph Workflow”Let’s build a document processing workflow:
[Parse] → [Classify] → [Route] → [Process A] → [Output] ↓ [Process B] → [Output]Step 1: Define State
Section titled “Step 1: Define State”from typing import TypedDict, List, Annotated, Literalimport operator
class DocumentState(TypedDict): # The input document document: str
# Classification result doc_type: Literal["invoice", "contract", "letter", "unknown"]
# Extracted data (accumulates across nodes) extracted_data: Annotated[List[dict], operator.add]
# Processing messages messages: Annotated[List[str], operator.add]
# Final output output: strStep 2: Define Nodes
Section titled “Step 2: Define Nodes”def parse_node(state: DocumentState) -> dict: """Parse the raw document.""" doc = state["document"] # Simulate parsing return { "messages": [f"Parsed document: {len(doc)} characters"] }
def classify_node(state: DocumentState) -> dict: """Classify the document type.""" doc = state["document"].lower()
if "invoice" in doc or "amount due" in doc: doc_type = "invoice" elif "agreement" in doc or "contract" in doc: doc_type = "contract" elif "dear" in doc or "sincerely" in doc: doc_type = "letter" else: doc_type = "unknown"
return { "doc_type": doc_type, "messages": [f"Classified as: {doc_type}"] }
def process_invoice(state: DocumentState) -> dict: """Extract invoice-specific data.""" return { "extracted_data": [{"type": "invoice", "amount": "$1,234.56"}], "messages": ["Extracted invoice data"] }
def process_contract(state: DocumentState) -> dict: """Extract contract-specific data.""" return { "extracted_data": [{"type": "contract", "parties": ["A", "B"]}], "messages": ["Extracted contract data"] }
def process_generic(state: DocumentState) -> dict: """Generic processing for other documents.""" return { "extracted_data": [{"type": "generic", "summary": "Document processed"}], "messages": ["Generic processing complete"] }
def output_node(state: DocumentState) -> dict: """Generate final output.""" data = state["extracted_data"] return { "output": f"Processed {len(data)} items: {data}", "messages": ["Output generated"] }Step 3: Define Routing Logic
Section titled “Step 3: Define Routing Logic”def route_by_type(state: DocumentState) -> str: """Route to appropriate processor based on document type.""" doc_type = state["doc_type"]
routing = { "invoice": "process_invoice", "contract": "process_contract", "letter": "process_generic", "unknown": "process_generic" }
return routing.get(doc_type, "process_generic")Step 4: Build the Graph
Section titled “Step 4: Build the Graph”from langgraph.graph import StateGraph, START, END
# Create graphworkflow = StateGraph(DocumentState)
# Add nodesworkflow.add_node("parse", parse_node)workflow.add_node("classify", classify_node)workflow.add_node("process_invoice", process_invoice)workflow.add_node("process_contract", process_contract)workflow.add_node("process_generic", process_generic)workflow.add_node("output", output_node)
# Add edgesworkflow.add_edge(START, "parse")workflow.add_edge("parse", "classify")
# Conditional routing after classificationworkflow.add_conditional_edges( "classify", route_by_type, { "process_invoice": "process_invoice", "process_contract": "process_contract", "process_generic": "process_generic" })
# All processors lead to outputworkflow.add_edge("process_invoice", "output")workflow.add_edge("process_contract", "output")workflow.add_edge("process_generic", "output")
# Output leads to ENDworkflow.add_edge("output", END)
# Compileapp = workflow.compile()Step 5: Run the Workflow
Section titled “Step 5: Run the Workflow”# Test with an invoiceresult = app.invoke({ "document": "INVOICE #123\nAmount Due: $1,234.56\nDue Date: 2024-01-15", "extracted_data": [], "messages": []})
print("Document type:", result["doc_type"])print("Messages:", result["messages"])print("Output:", result["output"])Cycles: The Power of LangGraph
Section titled “Cycles: The Power of LangGraph”Cycles (loops) are where LangGraph truly shines. Let’s build a self-correcting writer:
[Draft] → [Review] → [Good?] ─── Yes ──→ [Publish] ↑ │ └── No ───┘The Self-Correcting Writer
Section titled “The Self-Correcting Writer”from typing import TypedDict, List, Annotatedimport operator
class WriterState(TypedDict): topic: str draft: str feedback: str revision_count: int is_approved: bool messages: Annotated[List[str], operator.add]
def draft_node(state: WriterState) -> dict: """Create or revise the draft.""" topic = state["topic"] feedback = state.get("feedback", "") count = state.get("revision_count", 0)
if count == 0: # Initial draft draft = f"# {topic}\n\nThis is the initial draft about {topic}." msg = "Created initial draft" else: # Revision based on feedback draft = f"# {topic}\n\nRevised draft (v{count+1}): Addressed feedback: {feedback}" msg = f"Revised draft (attempt {count + 1})"
return { "draft": draft, "revision_count": count + 1, "messages": [msg] }
def review_node(state: WriterState) -> dict: """Review the draft and provide feedback.""" draft = state["draft"] count = state["revision_count"]
# Simulate review (in real app, this would use an LLM) if count >= 3: # Accept after 3 attempts return { "is_approved": True, "feedback": "Looks good!", "messages": ["Review passed!"] } else: return { "is_approved": False, "feedback": f"Need more detail in section {count}", "messages": [f"Review failed: needs revision"] }
def publish_node(state: WriterState) -> dict: """Publish the approved draft.""" return { "messages": [f"Published after {state['revision_count']} revisions!"] }
def should_continue(state: WriterState) -> str: """Decide whether to revise or publish.""" if state["is_approved"]: return "publish" else: return "revise"
# Build the graphwriter = StateGraph(WriterState)
writer.add_node("draft", draft_node)writer.add_node("review", review_node)writer.add_node("publish", publish_node)
writer.add_edge(START, "draft")writer.add_edge("draft", "review")
writer.add_conditional_edges( "review", should_continue, { "revise": "draft", # CYCLE: go back to draft "publish": "publish" })
writer.add_edge("publish", END)
app = writer.compile()
# Run itresult = app.invoke({ "topic": "LangGraph Cycles", "revision_count": 0, "is_approved": False, "messages": []})
print("Final messages:", result["messages"])# Output shows the progression through multiple revisionsPreventing Infinite Loops
Section titled “Preventing Infinite Loops”Always include safeguards:
def should_continue_safe(state: WriterState) -> str: """Continue with a maximum retry limit.""" MAX_RETRIES = 5
if state["is_approved"]: return "publish" elif state["revision_count"] >= MAX_RETRIES: return "publish" # Publish anyway after max retries else: return "revise"Integrating LLMs with LangGraph
Section titled “Integrating LLMs with LangGraph”The real power comes from using LLMs as node processors:
from langchain_google_genai import ChatGoogleGenerativeAIfrom langchain_core.messages import HumanMessage, AIMessage, SystemMessage
class LLMAgentState(TypedDict): messages: Annotated[List[dict], operator.add] task: str result: str
def create_llm_node(system_prompt: str): """Factory function to create LLM nodes with different personas.""" llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash-exp")
def node(state: LLMAgentState) -> dict: # Build message history messages = [SystemMessage(content=system_prompt)] messages.append(HumanMessage(content=state["task"]))
# Call LLM response = llm.invoke(messages)
return { "result": response.content, "messages": [{"role": "assistant", "content": response.content}] }
return node
# Create specialized nodesresearcher = create_llm_node( "You are a research assistant. Gather relevant information about the topic.")
writer = create_llm_node( "You are a skilled writer. Create engaging content based on the research.")
editor = create_llm_node( "You are a strict editor. Review and improve the writing. Be concise.")Multi-Agent Orchestration
Section titled “Multi-Agent Orchestration”LangGraph excels at coordinating multiple agents. Let’s build a research team:
┌─────────────┐ │ Supervisor │ └──────┬──────┘ ┌───────────────┼───────────────┐ ↓ ↓ ↓ [Researcher] [Analyst] [Writer] │ │ │ └───────────────┴───────────────┘ ↓ [Final Output]The Supervisor Pattern
Section titled “The Supervisor Pattern”from typing import Literal
class TeamState(TypedDict): task: str research: str analysis: str draft: str current_agent: str next_agent: Literal["researcher", "analyst", "writer", "done"] messages: Annotated[List[str], operator.add]
def supervisor_node(state: TeamState) -> dict: """Decide which agent should work next.""" if not state.get("research"): return {"next_agent": "researcher", "messages": ["Assigning to researcher"]} elif not state.get("analysis"): return {"next_agent": "analyst", "messages": ["Assigning to analyst"]} elif not state.get("draft"): return {"next_agent": "writer", "messages": ["Assigning to writer"]} else: return {"next_agent": "done", "messages": ["All work complete!"]}
def researcher_node(state: TeamState) -> dict: """Research agent gathers information.""" task = state["task"] # In real app, use LLM + tools return { "research": f"Research findings for: {task}", "current_agent": "researcher", "messages": ["Research complete"] }
def analyst_node(state: TeamState) -> dict: """Analyst processes research.""" research = state["research"] return { "analysis": f"Analysis of: {research}", "current_agent": "analyst", "messages": ["Analysis complete"] }
def writer_node(state: TeamState) -> dict: """Writer creates final content.""" analysis = state["analysis"] return { "draft": f"Final draft based on: {analysis}", "current_agent": "writer", "messages": ["Draft complete"] }
def route_to_agent(state: TeamState) -> str: """Route to the next agent.""" return state["next_agent"]
# Build the multi-agent graphteam = StateGraph(TeamState)
team.add_node("supervisor", supervisor_node)team.add_node("researcher", researcher_node)team.add_node("analyst", analyst_node)team.add_node("writer", writer_node)
team.add_edge(START, "supervisor")
team.add_conditional_edges( "supervisor", route_to_agent, { "researcher": "researcher", "analyst": "analyst", "writer": "writer", "done": END })
# Each agent reports back to supervisorteam.add_edge("researcher", "supervisor")team.add_edge("analyst", "supervisor")team.add_edge("writer", "supervisor")
app = team.compile()Parallel Execution
Section titled “Parallel Execution”LangGraph supports parallel execution when nodes have no dependencies:
from langgraph.graph import StateGraph
class ParallelState(TypedDict): input: str result_a: str result_b: str result_c: str final: str
# Three independent processorsdef process_a(state): return {"result_a": f"A: {state['input']}"}def process_b(state): return {"result_b": f"B: {state['input']}"}def process_c(state): return {"result_c": f"C: {state['input']}"}
def combine(state): return {"final": f"{state['result_a']} + {state['result_b']} + {state['result_c']}"}
graph = StateGraph(ParallelState)
graph.add_node("a", process_a)graph.add_node("b", process_b)graph.add_node("c", process_c)graph.add_node("combine", combine)
# Fan out from START to parallel nodesgraph.add_edge(START, "a")graph.add_edge(START, "b")graph.add_edge(START, "c")
# Fan in to combinegraph.add_edge("a", "combine")graph.add_edge("b", "combine")graph.add_edge("c", "combine")
graph.add_edge("combine", END)
app = graph.compile()# LangGraph will execute a, b, c in parallel!Human-in-the-Loop Workflows
Section titled “Human-in-the-Loop Workflows”Real production systems often need human approval or input. LangGraph makes this natural.
Interrupt for Approval
Section titled “Interrupt for Approval”from langgraph.checkpoint.memory import MemorySaver
class ApprovalState(TypedDict): proposal: str approved: bool feedback: str messages: Annotated[List[str], operator.add]
def create_proposal(state: ApprovalState) -> dict: return { "proposal": "I propose we invest $100K in AI infrastructure", "messages": ["Proposal created"] }
def await_approval(state: ApprovalState) -> dict: """This node will be interrupted for human input.""" # The interrupt happens here - human provides approved/feedback return { "messages": ["Awaiting approval..."] }
def execute_proposal(state: ApprovalState) -> dict: if state["approved"]: return {"messages": ["Proposal executed!"]} else: return {"messages": [f"Proposal rejected: {state['feedback']}"]}
# Build with checkpointingworkflow = StateGraph(ApprovalState)
workflow.add_node("propose", create_proposal)workflow.add_node("await", await_approval)workflow.add_node("execute", execute_proposal)
workflow.add_edge(START, "propose")workflow.add_edge("propose", "await")workflow.add_edge("await", "execute")workflow.add_edge("execute", END)
# Compile with checkpointer for interruptscheckpointer = MemorySaver()app = workflow.compile( checkpointer=checkpointer, interrupt_before=["execute"] # Interrupt before execution)Using Interrupts
Section titled “Using Interrupts”# Start the workflowconfig = {"configurable": {"thread_id": "proposal-1"}}
# Run until interruptresult = app.invoke({"approved": False, "messages": []}, config)print("Paused at:", result)
# Human reviews and provides approval# Update state with human inputapp.update_state( config, {"approved": True, "feedback": "Looks good!"})
# Continue executionfinal = app.invoke(None, config) # None continues from checkpointprint("Final:", final)Checkpointing and Persistence
Section titled “Checkpointing and Persistence”Checkpointing lets you:
- Pause and resume workflows
- Recover from failures (replay from last checkpoint)
- Time travel (inspect past states)
- Branch workflows (create variations)
Memory Checkpointer (Development)
Section titled “Memory Checkpointer (Development)”from langgraph.checkpoint.memory import MemorySaver
checkpointer = MemorySaver()app = graph.compile(checkpointer=checkpointer)
# Each run is identified by thread_idconfig = {"configurable": {"thread_id": "my-session-1"}}result = app.invoke(input_state, config)
# Get historyhistory = list(app.get_state_history(config))for state in history: print(f"Step: {state.metadata}")SQLite Checkpointer (Production)
Section titled “SQLite Checkpointer (Production)”from langgraph.checkpoint.sqlite import SqliteSaver
# Persistent storagecheckpointer = SqliteSaver.from_conn_string("workflows.db")app = graph.compile(checkpointer=checkpointer)
# Workflows survive restarts!PostgreSQL Checkpointer (Scale)
Section titled “PostgreSQL Checkpointer (Scale)”from langgraph.checkpoint.postgres import PostgresSaver
checkpointer = PostgresSaver.from_conn_string( "postgresql://user:pass@host:5432/db")app = graph.compile(checkpointer=checkpointer)Streaming with LangGraph
Section titled “Streaming with LangGraph”LangGraph supports streaming for real-time updates:
Stream Events
Section titled “Stream Events”# Stream all eventsfor event in app.stream(input_state, config, stream_mode="values"): print(f"State update: {event}")
# Stream specific node outputsfor event in app.stream(input_state, config, stream_mode="updates"): print(f"Node output: {event}")Stream LLM Tokens
Section titled “Stream LLM Tokens”# Stream tokens from LLM calls within nodesasync for event in app.astream_events(input_state, config, version="v2"): if event["event"] == "on_llm_stream": print(event["data"]["chunk"].content, end="", flush=True)Subgraphs: Composing Complex Workflows
Section titled “Subgraphs: Composing Complex Workflows”Large workflows can be broken into subgraphs:
# Create a reusable subgraphresearch_graph = StateGraph(ResearchState)research_graph.add_node("search", search_node)research_graph.add_node("summarize", summarize_node)research_graph.add_edge(START, "search")research_graph.add_edge("search", "summarize")research_graph.add_edge("summarize", END)research_subgraph = research_graph.compile()
# Use in parent graphmain_graph = StateGraph(MainState)main_graph.add_node("research", research_subgraph) # Subgraph as node!main_graph.add_node("write", write_node)main_graph.add_edge(START, "research")main_graph.add_edge("research", "write")main_graph.add_edge("write", END)Error Handling and Retries
Section titled “Error Handling and Retries”Node-Level Error Handling
Section titled “Node-Level Error Handling”def safe_node(state: MyState) -> dict: """Node with built-in error handling.""" try: # Risky operation result = call_external_api(state["input"]) return {"result": result, "error": None} except Exception as e: return {"result": None, "error": str(e)}
def route_on_error(state: MyState) -> str: """Route based on success/failure.""" if state.get("error"): return "handle_error" return "continue"Retry Pattern
Section titled “Retry Pattern”class RetryState(TypedDict): input: str output: str attempts: int max_attempts: int success: bool
def process_with_retry(state: RetryState) -> dict: """Process with retry tracking.""" attempts = state.get("attempts", 0) + 1
try: # Your processing logic result = risky_operation(state["input"]) return { "output": result, "attempts": attempts, "success": True } except Exception as e: return { "output": str(e), "attempts": attempts, "success": False }
def should_retry(state: RetryState) -> str: """Decide whether to retry.""" if state["success"]: return "done" elif state["attempts"] < state["max_attempts"]: return "retry" else: return "failed"Real-World Patterns
Section titled “Real-World Patterns”Pattern 1: Research-Write-Review Cycle
Section titled “Pattern 1: Research-Write-Review Cycle”[Research] → [Write] → [Review] → [Approved?] ↑ │ └──────── No ──────────┘Pattern 2: Multi-Stage Validation
Section titled “Pattern 2: Multi-Stage Validation”[Input] → [Validate Format] → [Validate Content] → [Validate Policy] → [Process] │ │ │ ↓ ↓ ↓ [Reject] [Reject] [Reject]Pattern 3: Hierarchical Agents
Section titled “Pattern 3: Hierarchical Agents” [Manager] │ ┌─────────────┼─────────────┐ ↓ ↓ ↓[Team Lead A] [Team Lead B] [Team Lead C] │ │ │ ↓ ↓ ↓[Worker 1] [Worker 2] [Worker 3]Pattern 4: MapReduce for Parallel Processing
Section titled “Pattern 4: MapReduce for Parallel Processing” [Splitter] │ ┌───────────────┼───────────────┐ ↓ ↓ ↓[Process 1] [Process 2] [Process 3] │ │ │ └───────────────┼───────────────┘ ↓ [Reducer]Did You Know? More Production Stories
Section titled “Did You Know? More Production Stories”The Whiteboard Moment
Section titled “The Whiteboard Moment”The “aha moment” for LangGraph came during a whiteboard session when engineer Nuno Campos drew an agent workflow as a graph instead of a chain. Everyone in the room immediately saw it: agent workflows are graphs, not chains.
Within 48 hours, they had a prototype. Within 2 weeks, it was open-sourced. Within 2 months, it had 8,000+ GitHub stars.
Why Not Just Use Airflow/Prefect?
Section titled “Why Not Just Use Airflow/Prefect?”You might wonder: “Why not use existing workflow tools like Airflow?”
Here’s what Maxime Beauchemin (creator of Airflow) said when asked about LLM workflows:
“Airflow was built for data pipelines with predictable steps. LLM agents are fundamentally different—they make decisions, they need to backtrack, they need human input. You could use Airflow, but you’d be fighting the framework the whole time.”
| Feature | Airflow/Prefect | LangGraph |
|---|---|---|
| Primary use | Data pipelines | AI workflows |
| Node types | Python functions | LLM-aware functions |
| State management | External | Built-in with reducers |
| Streaming | Not native | First-class support |
| Human-in-loop | Complex setup | Native interrupt/resume |
| LLM integration | DIY | Native with LangChain |
The numbers tell the story: Teams report 3-5x faster development with LangGraph vs. adapting Airflow for AI workflows.
The State Machine Renaissance
Section titled “The State Machine Renaissance”Computer scientists might recognize LangGraph as a finite state machine (FSM) with modern enhancements. FSMs were invented by Warren McCulloch and Walter Pitts in 1943—81 years before LangGraph!
What’s old is new again:
- 1943: FSMs for modeling neural networks
- 1956: FSMs for compiler design
- 1990s: FSMs for game AI (enemy behavior)
- 2024: FSMs for LLM agents (LangGraph)
The key innovation: LangGraph’s FSMs have dynamic state (any data structure) and computed transitions (LLM decides next step), making them far more powerful than traditional FSMs.
The Anthropic Connection
Section titled “The Anthropic Connection”Here’s a little-known fact: Several LangGraph design decisions were influenced by conversations with Anthropic’s AI safety team. The human-in-the-loop features weren’t just for convenience—they were designed to enable AI oversight.
The interrupt_before and interrupt_after features let you:
- Review agent decisions before execution
- Audit action history after completion
- Intervene when agents go off-track
This aligns with Anthropic’s Constitutional AI principles. In fact, Claude’s own internal systems use similar graph-based architectures for multi-step reasoning with human oversight checkpoints.
The 10x Improvement (With Receipts)
Section titled “The 10x Improvement (With Receipts)”Teams report that LangGraph reduces the code needed for complex agent workflows by 5-10x. Here are real numbers from production teams:
| Company | Before LangGraph | After LangGraph | Reduction |
|---|---|---|---|
| E-commerce startup | 4,200 lines | 380 lines | 11x |
| Legal tech company | 2,800 lines | 420 lines | 6.7x |
| Healthcare AI | 5,100 lines | 890 lines | 5.7x |
The built-in features that would take weeks to implement from scratch:
- State management with reducers
- Checkpointing and persistence
- Streaming with multiple modes
- Error handling and retries
- Human-in-the-loop support
One developer’s quote: “I deleted 3,000 lines of custom state management code and replaced it with 50 lines of LangGraph. I almost cried.”
Real Production Users: The Numbers
Section titled “Real Production Users: The Numbers”- Replit: Uses LangGraph for their AI coding assistant. 40 million monthly active users interact with LangGraph-powered features.
- Elastic: Powers AI search assistants handling billions of queries per month.
- Notion AI: Uses LangGraph patterns for document Q&A workflows.
- Klarna: The $6.7B fintech uses LangGraph for customer service AI that handles 2.3 million conversations per month.
The “Impossible” Feature That Became Standard
Section titled “The “Impossible” Feature That Became Standard”When LangGraph first launched, someone on GitHub asked: “Can we have parallel execution with fan-out and fan-in?”
The initial response was “That’s complex, maybe in v2.”
Three days later, Nuno Campos submitted a PR implementing parallel execution. The comment on the PR: “Couldn’t stop thinking about it. Here’s parallel execution.”
It’s now one of LangGraph’s most-used features, enabling 3-5x speedups for independent tasks. The lesson: Sometimes the “impossible” features are just one sleepless night away.
The Name That Almost Was
Section titled “The Name That Almost Was”LangGraph was almost called:
- “LangFlow” (taken by another project)
- “ChainGraph” (confusing with blockchain)
- “AgentGraph” (too generic)
- “LangState” (doesn’t convey the graph concept)
“LangGraph” won because it perfectly captures the two key concepts: LangChain’s ecosystem + Graph-based workflows. Sometimes naming is the hardest part of software.
Common Pitfalls
Section titled “Common Pitfalls”1. Forgetting State Annotations
Section titled “1. Forgetting State Annotations”# BAD: Lists get replaced, not accumulatedclass BadState(TypedDict): messages: List[str] # Each node replaces the list!
# GOOD: Use annotation for accumulationclass GoodState(TypedDict): messages: Annotated[List[str], operator.add]2. Infinite Loops
Section titled “2. Infinite Loops”# BAD: No exit conditiongraph.add_conditional_edges("check", should_continue, { "retry": "process", "done": "output"})# If should_continue always returns "retry", infinite loop!
# GOOD: Add max retriesdef should_continue_safe(state): if state["attempts"] >= MAX_RETRIES: return "done" # Force exit return "retry" if not state["success"] else "done"3. Not Handling All Edge Cases
Section titled “3. Not Handling All Edge Cases”# BAD: Missing routedef route(state): if condition_a: return "a" elif condition_b: return "b" # What if neither? KeyError!
# GOOD: Always have a defaultdef route(state): if condition_a: return "a" elif condition_b: return "b" else: return "default"4. Stateful Nodes
Section titled “4. Stateful Nodes”# BAD: Node maintains internal statecounter = 0def bad_node(state): global counter counter += 1 # This persists across invocations! return {"count": counter}
# GOOD: All state in the graph statedef good_node(state): count = state.get("count", 0) + 1 return {"count": count}5. Large State Objects
Section titled “5. Large State Objects”# BAD: Storing large data in stateclass BadState(TypedDict): full_document: str # 10MB document in every state snapshot!
# GOOD: Store references, load when neededclass GoodState(TypedDict): document_id: str # Reference to external storageBest Practices
Section titled “Best Practices”1. Design State Carefully
Section titled “1. Design State Carefully”# Think about:# - What needs to persist across nodes?# - What should accumulate vs replace?# - What's the minimal state needed?
class WellDesignedState(TypedDict): # Core data input: str output: str
# Progress tracking (accumulates) steps_completed: Annotated[List[str], operator.add]
# Metadata (replaces) current_phase: str last_error: Optional[str]2. Make Nodes Pure Functions
Section titled “2. Make Nodes Pure Functions”# Node should be deterministic given same statedef pure_node(state: MyState) -> dict: # Only use state input # No external side effects # Return consistent output return {"result": process(state["input"])}3. Use Subgraphs for Reusability
Section titled “3. Use Subgraphs for Reusability”# Create reusable componentsvalidation_subgraph = create_validation_graph()processing_subgraph = create_processing_graph()
# Compose in main graphmain_graph.add_node("validate", validation_subgraph)main_graph.add_node("process", processing_subgraph)4. Always Set Max Iterations
Section titled “4. Always Set Max Iterations”# Prevent runaway cyclesconfig = { "configurable": {"thread_id": "x"}, "recursion_limit": 50 # Max steps}result = app.invoke(state, config)5. Test with Visualization
Section titled “5. Test with Visualization”# Visualize your graph to catch issuesfrom IPython.display import Image, display
display(Image(app.get_graph().draw_mermaid_png()))The Evolution of Stateful AI Workflows
Section titled “The Evolution of Stateful AI Workflows”Understanding how we arrived at LangGraph helps you appreciate why certain design decisions were made.
The Pre-LangGraph Era: Stateless Chains (2022)
Section titled “The Pre-LangGraph Era: Stateless Chains (2022)”When LangChain launched in late 2022, its chains were revolutionary—but fundamentally stateless. Each invocation was independent. If you wanted to maintain context across calls, you had to manage it yourself: serialize state, store it somewhere, deserialize it on the next call. This worked for simple chatbots but broke down for complex workflows.
Developers cobbled together solutions: Redis for state storage, custom retry logic, ad-hoc checkpointing. It worked, but it was fragile. A crash at the wrong moment could leave state inconsistent. Resume logic was application-specific and often buggy.
Did You Know? Before LangGraph, the most common production pattern for complex AI workflows was to build entirely separate microservices for each step, communicating via message queues. A single “agent” might be implemented as 5-10 separate services, each with its own retry logic and state management. The operational complexity was enormous—teams reported spending more time on orchestration than on AI logic.
The DAG Era: Directed Acyclic Graphs (2023)
Section titled “The DAG Era: Directed Acyclic Graphs (2023)”In early 2023, several frameworks introduced DAG-based workflow systems inspired by Apache Airflow and similar tools. These allowed defining dependencies between steps, parallel execution, and failure handling. But DAGs have a fundamental limitation: no cycles. You can’t loop back to retry a step or iterate until a condition is met.
For many AI workflows, cycles are essential. A research agent needs to search, analyze, and potentially search again if the analysis reveals gaps. A code generator needs to write, test, fix, and repeat until tests pass. DAGs forced developers to either flatten these loops (limiting iterations) or implement complex workarounds.
The LangGraph Revolution (2024)
Section titled “The LangGraph Revolution (2024)”LangGraph launched in January 2024 with a radical premise: AI workflows are state machines, not pipelines. State machines have been studied for decades in computer science—they’re well-understood, mathematically rigorous, and naturally support cycles, conditional transitions, and persistent state.
The key innovations:
- Typed State: Instead of passing arbitrary data between steps, state is explicitly typed. TypeScript-style type safety for Python workflows.
- Reducers: Borrowed from Redux, reducers define how state updates are merged. This makes concurrent updates deterministic.
- Checkpointing: Built-in persistence that saves state after every transition. Crashes become non-events.
- Human-in-the-Loop: First-class support for workflows that pause for human input.
By mid-2024, LangGraph had become the default choice for production AI agent systems. Its adoption was driven less by its features than by its reliability—teams reported 10x reductions in production incidents after migrating from ad-hoc solutions.
The Future of Stateful AI Workflows
Section titled “The Future of Stateful AI Workflows”Several trends are shaping where LangGraph and similar frameworks are heading:
Distributed Execution: Current LangGraph runs on a single machine. Future versions will support distributed execution, where nodes can run on different servers with state synchronized across them. This enables scaling workflows beyond what a single machine can handle.
Visual Workflow Builders: While LangGraph is code-first, visual tools are emerging that let non-developers build workflows by dragging and connecting nodes. The underlying representation is still LangGraph—the visual layer is just a friendlier interface for simpler use cases.
Learned Routing: Instead of hand-coded routing functions, agents are learning to route dynamically based on past performance. If the “researcher” agent consistently produces better results for certain query types, the system learns to route those queries to researcher more often.
Streaming State: Current checkpointing captures point-in-time snapshots. Streaming state would enable external observers to watch workflows in real-time—seeing state evolve step by step. This is particularly valuable for debugging and monitoring.
Did You Know? The LangGraph team at LangChain has hinted at “LangGraph Cloud”—a managed service that handles checkpointing, scaling, and monitoring automatically. Early adopters report it eliminates most operational overhead, letting teams focus entirely on workflow logic rather than infrastructure.
Production War Stories: LangGraph in the Real World
Section titled “Production War Stories: LangGraph in the Real World”The Customer Service Bot That Learned to Escalate
Section titled “The Customer Service Bot That Learned to Escalate”Austin. March 2024. A telecom company deployed a LangGraph-based customer service agent. The workflow was elegant: gather issue details, search knowledge base, attempt resolution, escalate if needed.
The problem emerged in production: agents escalated too often. Analysis revealed why: the escalation node was triggered after a single failed resolution attempt. Customers with complex issues—multiple account problems, for instance—were being transferred to humans after one interaction.
The fix: Added a retry loop with memory. The agent now tries up to three different resolution approaches, each informed by what failed before. Only after exhausting alternatives does it escalate—and when it does, it passes a summary of what was already tried.
Result: Human escalation dropped 60%. Customer satisfaction increased because most issues were resolved in the first interaction, and when humans did get involved, they had full context.
Lesson: LangGraph’s cycles aren’t just for retries—they enable iterative problem-solving that mirrors how humans actually work.
The Legal Document Processor That Never Lost Work
Section titled “The Legal Document Processor That Never Lost Work”New York. June 2024. A law firm built a document processing pipeline: OCR → entity extraction → compliance check → human review → final output. Documents averaged 200 pages. Processing took 15-20 minutes per document.
Before LangGraph, crashes were catastrophic. An API timeout at minute 18 meant starting over. Associates learned to babysit the system, watching for failures.
After LangGraph with checkpointing: crashes became invisible. The system would fail, the watchdog would restart it, and processing resumed from the last checkpoint. A 15-minute document with a crash at minute 10 took 20 minutes total instead of 30.
The unexpected benefit: Because state was persisted, the firm could audit exactly what the AI did at each step. When a client questioned a compliance decision, they could replay the exact state and reasoning that led to it.
Lesson: Checkpointing isn’t just about reliability—it’s about auditability and trust.
The Trading Agent That Needed Human Approval
Section titled “The Trading Agent That Needed Human Approval”London. September 2024. A hedge fund built an AI research agent that analyzed filings, identified trading signals, and suggested positions. The final step—actually placing trades—required human approval.
The initial implementation used a simple “pause and email” approach. The agent would email a trader, then wait for a response. But traders were overwhelmed with emails, approvals were delayed, and by the time they responded, the signal was often stale.
LangGraph’s interrupt mechanism enabled a better flow: the agent would identify a signal, prepare a trade recommendation, and push it to a dashboard with all supporting analysis. Traders could approve with one click. If approved, the workflow resumed immediately. If rejected, the agent received feedback and could adjust its parameters.
The key insight: Human-in-the-loop isn’t just about approval—it’s about feedback. The traders’ rejections were training data for improving the agent’s signal quality.
Lesson: LangGraph’s interrupt system enables bidirectional human-AI collaboration, not just human oversight.
The Scale-Up That Didn’t Require Rewriting
Section titled “The Scale-Up That Didn’t Require Rewriting”Singapore. November 2024. A logistics company had built their shipment tracking agent as a simple proof-of-concept: track one shipment, answer questions about it. It worked beautifully for demos.
Then business asked: “Can we track 50 shipments simultaneously for our enterprise customers?” With their original architecture, this would have required a complete rewrite. Each shipment would need its own state, its own checkpointing, its own interrupt handling.
With LangGraph, the fix was surprisingly simple. They made shipment ID part of the state key, so each shipment effectively got its own workflow instance. State isolation was automatic. Checkpointing worked per-shipment. The same code that handled one shipment now handled thousands running concurrently.
The numbers: Within two months, the system was tracking 12,000 active shipments across 450 enterprise accounts. The core workflow code was unchanged from the proof-of-concept—only configuration and infrastructure scaled.
Lesson: LangGraph’s state isolation model means scaling from one to many doesn’t require architectural changes, just operational scaling. Code that works for one works for thousands.
Interview Prep: LangGraph and Stateful Workflows
Section titled “Interview Prep: LangGraph and Stateful Workflows”Common Questions and Strong Answers
Section titled “Common Questions and Strong Answers”Q: “When would you choose LangGraph over simple LangChain chains?”
Strong Answer: “I use a decision framework based on three factors: cycles, persistence, and coordination.
If I need cycles—retry logic, iterative refinement, search-analyze-search loops—LangGraph is the clear choice because chains can’t express cycles.
If I need persistence—resuming from crashes, auditing what happened, long-running workflows—LangGraph’s checkpointing is essential. I can bolt persistence onto chains, but it’s error-prone and I’ll end up reimplementing what LangGraph gives me for free.
If I need multi-agent coordination—supervisor patterns, parallel execution with aggregation, handoffs between specialized agents—LangGraph’s state management makes this tractable. With chains, coordinating multiple agents means managing shared state manually, which is a recipe for race conditions and bugs.
For simple request-response patterns—chatbots, single-turn Q&A, linear pipelines—chains are simpler and sufficient. I don’t reach for LangGraph when a chain would do.”
Q: “How do you design state for a LangGraph workflow?”
Strong Answer: “I follow three principles: minimal, typed, and explicit about accumulation.
Minimal: State should contain only what’s needed to make decisions and resume from any point. I store references to large data—document IDs, not documents—and load data when needed.
Typed: I always use TypedDict with explicit types. This catches errors at development time rather than production. Types are documentation—anyone reading the state definition understands what the workflow tracks.
Explicit about accumulation: For every field, I decide: does this replace (like ‘current_phase’) or accumulate (like ‘messages’)? I use Annotated with operators for accumulation. Ambiguity here causes subtle bugs—messages that should accumulate instead replacing each other, or metadata that should replace instead growing unboundedly.
Before implementing, I sketch the state transitions. What does state look like at each node? What does each node add or modify? This upfront design prevents refactoring later.”
Q: “How do you handle errors and retries in LangGraph?”
Strong Answer: “LangGraph gives me three levels of error handling.
First, node-level try/except for transient errors. If an API call fails, I catch the exception, update state with the error, and transition to a retry node. The retry node can implement backoff logic, try alternative APIs, or eventually give up gracefully.
Second, circuit breaker patterns for systematic failures. I track error counts in state. If the same node fails multiple times, I don’t keep retrying—I transition to a fallback path or human escalation. This prevents burning API credits on broken services.
Third, checkpointing for crash recovery. With a persistent checkpointer, I can resume from the last successful node after any failure—OOM, infrastructure issues, deployment during execution. The workflow picks up where it left off.
The key insight: errors are just another state transition. A well-designed LangGraph workflow has explicit error states and transitions, not just try/except blocks hoping for the best.”
Q: “Explain the supervisor pattern for multi-agent systems.”
Strong Answer: “The supervisor pattern treats agent coordination as a routing problem. You have a supervisor agent whose job is deciding which worker agent should handle the next step—it doesn’t do the work itself.
The supervisor sees a summary of what’s been done (the state) and the current need. It outputs a routing decision: ‘send to researcher’ or ‘send to writer’ or ‘done.’ Worker agents execute their specialty and return results to state. Then the supervisor decides the next step.
Why this works: separation of concerns. The supervisor specializes in coordination—understanding the task, knowing agent capabilities, tracking progress. Workers specialize in execution—research, writing, code, analysis. Neither needs to understand the other’s domain.
Implementation-wise, the supervisor is a node with conditional edges to workers. Workers are either nodes or subgraphs. State tracks which workers have been called and their outputs. The supervisor’s routing function examines state and returns the next worker name.
The pattern scales: add new workers by adding nodes and teaching the supervisor about them. Complex workflows compose: a supervisor can route to another supervisor, creating hierarchies of coordination.”
Did You Know? The supervisor pattern was popularized by the “CAMEL” paper (Li et al., 2023) which demonstrated that two AI agents—one playing “user” and one playing “assistant”—could collaborate more effectively than a single agent alone. LangGraph’s supervisor pattern generalizes this to arbitrary numbers of specialized agents, each with distinct capabilities and prompts.
The Economics of Stateful AI Workflows
Section titled “The Economics of Stateful AI Workflows”Cost Components
Section titled “Cost Components”| Component | Without LangGraph | With LangGraph |
|---|---|---|
| State Management | Custom code ($50K+ dev time) | Built-in |
| Crash Recovery | Manual replay (lost time + tokens) | Automatic resume |
| Human Review | Custom tooling | Native interrupts |
| Debugging | Log analysis | State replay |
ROI Calculation
Section titled “ROI Calculation”A typical enterprise deployment comparison:
Without LangGraph (ad-hoc solution):- Development time: 3 months (senior engineer)- Crash-related rework: 15% of runs need manual intervention- Human review integration: Additional 2 weeks- Debugging production issues: 10 hours/week
With LangGraph:- Development time: 1 month (same engineer)- Crash-related rework: <1% (auto-resume handles most)- Human review: Built-in interrupts- Debugging: State replay eliminates most investigation
Annual savings for a team running 10K workflows/month:- Engineering time: ~$100K- Operational overhead: ~$50K- Reduced failures: ~$30K (API costs from restarts)Total: ~$180K/yearDid You Know? A 2024 survey of AI engineering teams found that those using LangGraph spent 60% less time on “orchestration plumbing” than teams using ad-hoc solutions. The freed time went into improving agent quality—prompt engineering, evaluation, and capability expansion.
Key Takeaways
Section titled “Key Takeaways”-
LangGraph treats workflows as state machines, not pipelines. This fundamental shift enables cycles, conditional transitions, and persistent state—capabilities that chains can’t express.
-
State design is the most important decision. Minimal, typed, explicit about accumulation. Get state wrong and everything else becomes hard.
-
Checkpointing transforms reliability. With persistent state, crashes become non-events. Workflows resume automatically, and you have a complete audit trail of what happened.
-
Human-in-the-loop is a first-class feature. Interrupts let workflows pause for approval, feedback, or intervention—then resume with the human’s input incorporated.
-
The supervisor pattern scales multi-agent coordination. A coordinator agent routes to specialists. Workers do work; supervisors decide what work to do next.
-
Cycles enable iterative refinement. Search-analyze-search, write-test-fix, draft-review-revise—these patterns are natural in LangGraph, impossible in chains.
-
Reducers prevent state corruption. Explicit rules for how updates merge mean concurrent modifications are deterministic, not race conditions.
-
Subgraphs enable composition. Complex workflows build from reusable components. A validation subgraph can be used across multiple parent workflows.
-
Always set recursion limits. Without limits, a bug in routing logic can create infinite loops. Defensively set maximum iterations.
-
LangGraph’s value is operational, not just technical. The framework doesn’t make AI smarter—it makes AI systems reliable, auditable, and maintainable in production.
The shift from chains to graphs isn’t just syntactic—it’s philosophical. Chains assume you know the path upfront. Graphs assume the path emerges dynamically from execution state. That fundamental flexibility is what makes LangGraph indispensable for complex AI systems. Master it, and you’ve mastered the art of production-grade AI engineering.
Summary
Section titled “Summary”What You Learned
Section titled “What You Learned”- LangGraph Architecture: StateGraph, nodes, edges, reducers
- State Management: TypedDict with annotations for complex state
- Conditional Routing: Dynamic paths based on state
- Cycles: Iterative refinement and retry patterns
- Multi-Agent: Supervisor pattern, parallel execution
- Human-in-the-Loop: Interrupts and resumption
- Checkpointing: Persistence for production reliability
Key Concepts
Section titled “Key Concepts”| Concept | Purpose |
|---|---|
| StateGraph | Container for workflow definition |
| Nodes | Processing functions that update state |
| Edges | Define flow between nodes |
| Reducers | How state updates are merged |
| Checkpointer | Enables pause/resume and persistence |
| Interrupts | Human-in-the-loop integration |
When to Use LangGraph
Section titled “When to Use LangGraph”Use LangGraph when you need:
- Cycles/loops in your workflow
- Complex state management
- Human approval steps
- Multi-agent coordination
- Production reliability (checkpointing)
Stick with LCEL/chains when:
- Simple linear workflows
- No need for cycles
- Stateless processing
- Quick prototyping
Further Reading
Section titled “Further Reading”Next Steps
Section titled “Next Steps”With LangGraph mastered, you’re ready for:
- Module 19: LlamaIndex & Alternative Frameworks
- Compare LangChain + LangGraph with LlamaIndex approach
- Explore CrewAI, AutoGen, and other multi-agent frameworks
You’ve unlocked the power of stateful AI workflows!
Module 18 Complete! Progress: 21/56 modules (38%)
Next: Module 19 - LlamaIndex & Alternative Frameworks