Skip to content

Building AI Agents

AI/ML Engineering Track | Complexity: [COMPLEX] | Time: 6-8

Building AI Agents: The Framework Ecosystem

Section titled “Building AI Agents: The Framework Ecosystem”

Reading Time: 6-8 hours Prerequisites: Module 18

In early 2024, Air Canada was forced by a civil tribunal to honor a non-existent bereavement fare policy that had been hallucinated by their custom-built customer service chatbot. Although the tribunal award itself was modest, the incident showed that customer-facing LLM systems can create legal, operational, and reputational risk when they are not grounded in current policy data.

In late 2022, developers building with early LLMs quickly ran into a recurring problem: models could reason fluently about public information but could not reliably answer questions grounded in private company data without retrieval and indexing systems.

The lesson here is foundational for modern AI engineering: models are commoditized compute, but your proprietary data is your competitive moat. Every enterprise has access to the same foundational models via APIs. The only differentiator is how effectively an organization can connect those models to their internal knowledge bases and orchestrate them to take meaningful action. In this module, you will explore the ecosystem of frameworks that have emerged to solve this problem, learning how to select, deploy, and scale the right framework for the right architectural challenge, avoiding the catastrophic failures of early adopters.

By the end of this module, you will be able to:

  • Evaluate the architectural trade-offs between composable chain frameworks and data-centric indexing systems.
  • Design a role-based multi-agent orchestration system to distribute complex reasoning tasks.
  • Implement a production-ready Retrieval-Augmented Generation pipeline bridging multiple framework paradigms.
  • Diagnose common failure modes in multi-agent deployments on Kubernetes, such as context window exhaustion and synchronization deadlocks.
  • Compare framework half-life and enterprise adoption metrics to formulate sustainable, long-term tooling strategies.

The Framework Landscape: A Map of the Territory

Section titled “The Framework Landscape: A Map of the Territory”

You have already explored the foundational concepts of LLM interaction. However, the AI framework ecosystem is rich with alternative paradigms, each with distinct philosophies, strengths, and optimal deployment targets. Choosing a framework is akin to choosing a primary programming language: Python, Rust, and Go all solve computational problems, but they make wildly different trade-offs regarding memory safety, execution speed, and developer ergonomics.

Different engineering teams solve AI challenges differently, compounding into distinct operational philosophies. Understanding these philosophies prevents costly architectural rewrites months into a project.

FrameworkPhilosophyBest For
LangChainComposable chains, flexibilityGeneral-purpose LLM apps
LlamaIndexData-centric, indexing focusRAG and knowledge systems
CrewAIRole-based agents, simplicityMulti-agent collaboration
AutoGenConversational agentsResearch, complex dialogues
Semantic KernelEnterprise, Microsoft stack.NET/Azure integration
HaystackSearch-firstProduction search systems

Think of the landscape as a set of specialized tools. LangChain provides a massive, general-purpose workshop where you can build anything from scratch. LlamaIndex acts as a highly optimized, automated factory specifically for document retrieval. CrewAI functions as a corporate org chart simulator, while AutoGen operates as a debate room for specialized scholars.

The hierarchy of modern AI application frameworks generally splits into three major domains: general composition, pure data indexing, and multi-agent orchestration.

graph TD
Root[AI Application Frameworks]
Root --> LC[LangChain / LangGraph]
Root --> LI[LlamaIndex]
Root --> MA[Multi-Agent]
LC --> C[Chains]
LC --> A[Agents]
LC --> M[Memory]
LC --> T[Tools]
LI --> IDX[Index Types]
LI --> QE[Query Engine]
LI --> RAG[RAG]
MA --> CR[CrewAI]
MA --> AU[AutoGen]
CR --> ROL[Roles]
AU --> AG[Agents]

Pause and predict: If you were tasked with building a system that automatically generates quarterly financial reports by pulling from 15 different relational database tables and 30 PDF compliance documents, which framework’s philosophy makes the most sense as your foundation?

LlamaIndex is a strict data framework for building LLM applications. While other frameworks focus heavily on how agents chain thoughts together, LlamaIndex focuses on a completely different trinity of operations:

  1. Data Ingestion: Standardizing connections to vastly different data sources.
  2. Data Indexing: Structuring the ingested data mathematically and hierarchically for ultra-efficient retrieval.
  3. Query Interface: Providing a natural language translation layer to query the structured indexes.

Consider LlamaIndex as an expert librarian. Other frameworks represent the patron asking questions and taking notes. The librarian knows exactly where every book is located, understands the taxonomy of the library, and can retrieve the exact paragraph needed in seconds.

The architecture functions identically to a modern data pipeline, treating the LLM not as a master controller, but as a final synthesis engine.

flowchart LR
subgraph LlamaIndex
direction LR
DL[Data Loaders] --> IT[Index Types]
IT --> QE[Query Engines]
end
DL -.-> PDF[PDF, CSV]
DL -.-> Web[Web, API]
DL -.-> DB[Database]
DL -.-> Notion[Notion]
IT -.-> Vec[Vector]
IT -.-> Sum[Summary]
IT -.-> KG[Knowledge]
IT -.-> Tree[Tree]
QE -.-> RAG2[RAG]
QE -.-> Chat[Chat]
QE -.-> SQL[SQL]
QE -.-> Agent[Agent]

LlamaIndex maintains a massive registry of data connectors. These act as flexible adapters, normalizing many heterogeneous data sources into a standard Document object. Whether the data is locked in a PostgreSQL database, a live webpage, or a massive PDF, the loader handles the extraction seamlessly.

from llama_index.core import SimpleDirectoryReader
from llama_index.readers.web import SimpleWebPageReader
from llama_index.readers.database import DatabaseReader
# Load from directory - handles PDF, DOCX, TXT, etc.
documents = SimpleDirectoryReader("./data").load_data()
# Load from web - scrapes and converts to text
web_docs = SimpleWebPageReader(html_to_text=True).load_data(
["https://example.com/page1", "https://example.com/page2"]
)
# Load from database - runs SQL, converts rows to documents
db_reader = DatabaseReader(
sql_database="postgresql://user:pass@host:5432/db"
)
documents = db_reader.load_data(query="SELECT * FROM articles")

Once data is loaded, it must be indexed. The choice of index dictates how the LLM will navigate the data. Choosing the wrong index leads to massive token waste and poor retrieval accuracy.

from llama_index.core import (
VectorStoreIndex,
SummaryIndex,
TreeIndex,
KeywordTableIndex
)
# Vector Index - best for semantic search
# Like organizing books by topic rather than title
vector_index = VectorStoreIndex.from_documents(documents)
# Summary Index - good for summarization
# Like having cliff notes for every book
summary_index = SummaryIndex.from_documents(documents)
# Tree Index - hierarchical organization
# Like organizing books by category > subcategory > title
tree_index = TreeIndex.from_documents(documents)
# Keyword Index - good for exact matches
# Like a traditional library card catalog
keyword_index = KeywordTableIndex.from_documents(documents)

Query engines are the interface between the user’s natural language and the structured index. They handle the complex routing, retrieval, and synthesis of the final answer. They abstract away the prompt formatting needed to inject context into the LLM.

# Basic query engine
query_engine = index.as_query_engine()
response = query_engine.query("What is the main topic?")
# With retrieval settings
query_engine = index.as_query_engine(
similarity_top_k=5, # Return top 5 most relevant chunks
response_mode="compact" # Summarize results
)
# Chat engine (maintains conversation context)
chat_engine = index.as_chat_engine()
response = chat_engine.chat("Tell me about the document")
follow_up = chat_engine.chat("Can you elaborate?") # Remembers context

Building a complete Retrieval-Augmented Generation pipeline is where LlamaIndex’s opinionated defaults shine. Notice how it handles chunking, embedding generation, and prompt formatting implicitly.

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
# 1. Load documents (handles chunking internally)
documents = SimpleDirectoryReader("./data").load_data()
# 2. Create index (embeds and stores)
index = VectorStoreIndex.from_documents(documents)
# 3. Create query engine
query_engine = index.as_query_engine(
llm=OpenAI(model="gpt-5"),
similarity_top_k=3
)
# 4. Query
response = query_engine.query("What are the key findings?")
print(response)

As enterprise requirements scale, basic vector similarity search falls short. LlamaIndex provides advanced query architectures out of the box to handle multi-hop reasoning and diverse data silos.

Complex queries often require multi-hop reasoning. The Sub-Question engine acts as an orchestrator, breaking down a complex prompt into discrete tasks, querying the appropriate indexes, and synthesizing the results.

from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata
# Create tools from multiple indexes
tools = [
QueryEngineTool(
query_engine=financial_index.as_query_engine(),
metadata=ToolMetadata(
name="financial_data",
description="Financial reports and metrics"
)
),
QueryEngineTool(
query_engine=product_index.as_query_engine(),
metadata=ToolMetadata(
name="product_data",
description="Product documentation"
)
)
]
# Sub-question engine decomposes complex queries automatically
query_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=tools)
response = query_engine.query(
"How did Q3 product launches affect revenue?"
)
# Internally: "What products launched in Q3?" + "What was Q3 revenue?" → synthesis

When dealing with massive organizational data, it is inefficient to query everything simultaneously. The Router Query Engine uses an LLM decision-maker to route the user’s query only to the relevant specialist index.

from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
query_engine = RouterQueryEngine(
selector=LLMSingleSelector.from_defaults(), # LLM decides which index to use
query_engine_tools=[
QueryEngineTool(
query_engine=technical_index.as_query_engine(),
metadata=ToolMetadata(
name="technical",
description="Technical documentation and code"
)
),
QueryEngineTool(
query_engine=business_index.as_query_engine(),
metadata=ToolMetadata(
name="business",
description="Business processes and policies"
)
)
]
)

For deep semantic relationships (e.g., “Company X acquired Company Y, which owns Patent Z”), vector search is insufficient. LlamaIndex can automatically extract triplets to build queryable knowledge graphs.

from llama_index.core import KnowledgeGraphIndex
# Create knowledge graph from documents
kg_index = KnowledgeGraphIndex.from_documents(
documents,
max_triplets_per_chunk=3, # Extract relationships
include_embeddings=True # Also enable semantic search
)
# Query with graph traversal
query_engine = kg_index.as_query_engine(
include_text=True,
retriever_mode="keyword",
response_mode="tree_summarize"
)

LangChain vs LlamaIndex: A Deep Comparison

Section titled “LangChain vs LlamaIndex: A Deep Comparison”

A common architectural failure is choosing a framework based on popularity rather than alignment with the engineering goal.

AspectLangChainLlamaIndex
Core FocusChains, agents, toolsData indexing, retrieval
Mental Model”Building blocks for LLM apps""Data framework for LLMs”
StrengthFlexibility, agent patternsRAG, knowledge management
ComplexityHigher learning curveMore opinionated, simpler
Use CaseGeneral-purposeData-intensive apps

Choose LangChain when:

  • Building complex agent systems that require executing diverse external tools.
  • Maximum flexibility in architectural flow is required.
  • You need deep, stateful workflows via LangGraph.

Choose LlamaIndex when:

  • Retrieval-Augmented Generation is the primary requirement.
  • You are working with massive, heterogeneous data sources that require unified ingestion.
  • You want a production-ready RAG setup with sensible, mathematically sound defaults.

Observe the difference in verbosity and control between the two frameworks. LangChain exposes the raw mechanics of chunking and embedding, while LlamaIndex abstracts them.

LangChain RAG (explicit, flexible):

from langchain_community.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import RetrievalQA
# Load and split - you control every step
loader = DirectoryLoader("./data")
documents = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000)
chunks = splitter.split_documents(documents)
# Create vector store - explicit embedding configuration
vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings())
# Create chain - wire up retriever and LLM
qa_chain = RetrievalQA.from_chain_type(
llm=ChatOpenAI(),
retriever=vectorstore.as_retriever()
)
# Query
response = qa_chain.invoke("What is the main topic?")

LlamaIndex RAG (opinionated, concise):

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
# Load and index - handles splitting internally with smart defaults
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
# Query - one line
query_engine = index.as_query_engine()
response = query_engine.query("What is the main topic?")

The most robust enterprise architectures do not treat these frameworks as mutually exclusive. You can utilize LlamaIndex for superior data ingestion and indexing, and wrap it as a tool for a LangChain orchestration agent. This combines the best of both worlds.

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain.tools import Tool
# LlamaIndex for indexing - leverage its data handling
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
# Wrap as LangChain tool - bridge the frameworks
def search_docs(query: str) -> str:
response = query_engine.query(query)
return str(response)
search_tool = Tool(
name="document_search",
func=search_docs,
description="Search internal documents for information"
)
# Use in LangChain agent - leverage its agent patterns
agent = create_tool_calling_agent(llm, [search_tool], prompt)
executor = AgentExecutor(agent=agent, tools=[search_tool])

CrewAI is a framework specifically engineered for orchestrating role-playing AI agents. Unlike LangGraph, which forces you to think in terms of state machines, cyclic graphs, and mathematical nodes, CrewAI’s abstraction is modeled entirely on human team dynamics. You construct teams by defining roles, responsibilities, and collaborative goals.

flowchart LR
subgraph CrewAI
direction LR
Agent[Agent<br/>- Role<br/>- Goal<br/>- Tools]
Task[Task<br/>- Goal<br/>- Agent<br/>- Output]
Crew[Crew<br/>- Team<br/>- Tasks<br/>- Flow]
Agent --> Task
Task --> Crew
end

Notice how declarative the configuration is. You define the “who” and the “what,” and the framework handles the “how” of the handoffs, passing the text output of one agent securely as the input context for the next.

from crewai import Agent, Task, Crew, Process
# Define agents with roles - like casting actors
researcher = Agent(
role="Senior Research Analyst",
goal="Find comprehensive information about AI frameworks",
backstory="You are an expert at finding and analyzing technical information.",
tools=[search_tool, web_scraper],
verbose=True
)
writer = Agent(
role="Technical Writer",
goal="Create clear, engaging technical content",
backstory="You specialize in making complex topics accessible.",
verbose=True
)
reviewer = Agent(
role="Quality Reviewer",
goal="Ensure content accuracy and clarity",
backstory="You have a keen eye for errors and unclear explanations.",
verbose=True
)
# Define tasks - like scenes in a script
research_task = Task(
description="Research the latest AI framework developments in 2024",
agent=researcher,
expected_output="Comprehensive research notes with sources"
)
writing_task = Task(
description="Write a blog post based on the research",
agent=writer,
expected_output="1500-word blog post in markdown",
context=[research_task] # Uses output from research
)
review_task = Task(
description="Review and improve the blog post",
agent=reviewer,
expected_output="Edited blog post with improvements",
context=[writing_task]
)
# Create crew - like assembling the production
crew = Crew(
agents=[researcher, writer, reviewer],
tasks=[research_task, writing_task, review_task],
process=Process.sequential # or Process.hierarchical
)
# Run the crew - action!
result = crew.kickoff()

When deciding how to orchestrate your agents, evaluate the strictness of your required workflow.

AspectCrewAILangGraph
Mental ModelHuman teams, rolesState machines, graphs
FlexibilityMore opinionatedMore flexible
Learning CurveLowerHigher
Complex WorkflowsLimitedExcellent
CustomizationModerateHigh

Conversational Multi-Agent Systems with AutoGen

Section titled “Conversational Multi-Agent Systems with AutoGen”

Originating from Microsoft Research, AutoGen focuses heavily on conversational multi-agent systems. Instead of executing rigid pipelines or passing JSON objects between nodes, AutoGen agents communicate natively through raw chat interfaces. They debate, correct each other, and dynamically decide when a task is finished based on conversation history.

sequenceDiagram
participant A as Agent A
participant B as Agent B
participant C as Agent C
A->>B: Messages (I'll research...)
B->>A: Based on that...
A->>C: Messages
C->>A: Let me synthesize...

AutoGen excels when incorporating human-in-the-loop feedback seamlessly into the workflow. In the example below, the system pauses execution and demands human validation before executing potentially destructive generated code.

from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager
# Create agents - each with a persona
assistant = AssistantAgent(
name="assistant",
llm_config={"model": "gpt-5"},
system_message="You are a helpful AI assistant."
)
coder = AssistantAgent(
name="coder",
llm_config={"model": "gpt-5"},
system_message="You write Python code to solve problems."
)
critic = AssistantAgent(
name="critic",
llm_config={"model": "gpt-5"},
system_message="You review code and suggest improvements."
)
# User proxy for human-in-the-loop
user_proxy = UserProxyAgent(
name="user",
human_input_mode="TERMINATE", # or "ALWAYS" for full control
code_execution_config={"work_dir": "coding"}
)
# Group chat - agents converse naturally
group_chat = GroupChat(
agents=[user_proxy, assistant, coder, critic],
messages=[],
max_round=10
)
manager = GroupChatManager(groupchat=group_chat)
# Start conversation
user_proxy.initiate_chat(
manager,
message="Create a Python function to calculate fibonacci numbers"
)

Stop and think: How would you containerize a multi-agent system like AutoGen? Would you run all the agents in a single Kubernetes Pod, or distribute them across multiple Deployments using a message broker? What are the profound trade-offs regarding state management and network latency in both approaches?

Beyond the major players, specialized frameworks dominate specific architectural niches in enterprise environments.

Semantic Kernel is tailored for enterprise environments deeply entrenched in the Microsoft ecosystem. It brings robust support for Azure, .NET, and strictly typed languages, bypassing the fragility of Python scripts in enterprise Windows shops.

import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion
kernel = sk.Kernel()
kernel.add_service(AzureChatCompletion(
deployment_name="gpt-5",
endpoint="https://your-resource.openai.azure.com/",
api_key="your-key"
))
# Create semantic function - natural language as code
summarize = kernel.create_semantic_function(
"Summarize this text: {{$input}}",
max_tokens=200
)
result = await kernel.invoke(summarize, input="Long text here...")

Haystack focuses on search-oriented pipelines with explicit control over retrieval, routing, memory, and generation in production applications.

from haystack import Pipeline
from haystack.components.retrievers import InMemoryEmbeddingRetriever
from haystack.components.generators import OpenAIGenerator
from haystack.components.builders import PromptBuilder
# Build pipeline - search-first design
pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store))
pipeline.add_component("prompt_builder", PromptBuilder(template="""
Context: {{documents}}
Question: {{query}}
Answer:
"""))
pipeline.add_component("generator", OpenAIGenerator())
pipeline.connect("retriever", "prompt_builder.documents")
pipeline.connect("prompt_builder", "generator")
result = pipeline.run({"query": "What is the capital of France?"})

DSPy fundamentally reimagines how LLM pipelines are built. Instead of hand-writing string prompts, developers define the signatures (inputs and outputs), and DSPy acts as a compiler, automatically optimizing the prompts mathematically for the specific model and dataset.

import dspy
# Configure
lm = dspy.OpenAI(model="gpt-5")
dspy.settings.configure(lm=lm)
# Define signature - what, not how
class QA(dspy.Signature):
"""Answer questions based on context."""
context = dspy.InputField()
question = dspy.InputField()
answer = dspy.OutputField()
# Create module - DSPy optimizes prompts automatically
qa = dspy.ChainOfThought(QA)
# Use - no manual prompt engineering needed
result = qa(context="Paris is the capital of France", question="What is the capital?")

Choosing incorrectly at the start of a project results in months of accumulated technical debt.

Follow this logic tree to narrow down your primary architectural foundation.

graph TD
Q[What's your primary use case?]
Q --> R[RAG / Knowledge]
Q --> A[Agent / Workflow]
Q --> M[Multi-Agent]
R --> LI[LlamaIndex<br/>simpler setup]
A --> LC[LangChain<br/>flexible]
LC --> LG[LangGraph<br/>complex state]
M --> CR[CrewAI<br/>roles]
M --> AU[AutoGen<br/>chat]
NeedRecommended Framework
Simple RAGLlamaIndex
Complex RAG with custom logicLangChain
Stateful agent workflowsLangGraph
Role-based team of agentsCrewAI
Conversational agent researchAutoGen
Enterprise/AzureSemantic Kernel
Search-first applicationHaystack
Prompt optimizationDSPy

Enterprise adoption proves that framework usage is highly specialized by domain. Notice that hyperscalers often build custom tooling, but utilize off-the-shelf frameworks for rapid iteration.

CompanyFrameworkScaleUse Case
Example knowledge platformretrieval-focused frameworklarge user basedocument-centric Q&A
Example developer platformgraph-based orchestrationlarge user baseAI coding assistance
Example customer-service deploymentframework-based orchestrationhigh conversation volumecustomer service
Example payments companycustom stack plus retrieval toolinglarge business scalefraud and risk workflows
Example commerce platformcustom stacklarge merchant baseproduct-content generation
Example streaming platformcustom recommendation stacklarge user baserecommendations

A recent engineering survey of Fortune 500 companies using AI frameworks revealed surprising patterns regarding adoption:

FrameworkFortune 500 AdoptionSurprise Factor
LangChainwidely usedExpected
LlamaIndexmeaningfully usedHigher than expected
Custom builtstill commonThey build their own!
CrewAIemergingFast for a new framework
AutoGenmore nicheMostly research teams

Do not over-engineer Day 1 architecture. Master the basics of retrieval before orchestrating a swarm of agents. Bringing in CrewAI before you understand basic vector math is a recipe for un-debuggable systems.

# Don't do this first:
from crewai import Agent, Task, Crew, Process
from langchain.agents import create_tool_calling_agent
from llama_index.core import VectorStoreIndex
# Do this first - learn one framework deeply:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

Because the AI ecosystem is incredibly volatile, never tightly couple your business logic to a specific framework’s syntax. Use abstract wrappers. If a framework is abandoned, your underlying application logic remains safe.

# Good: Framework-agnostic interface
class RAGSystem:
def __init__(self, implementation: str = "llamaindex"):
if implementation == "llamaindex":
self._engine = LlamaIndexRAG()
elif implementation == "langchain":
self._engine = LangChainRAG()
def query(self, question: str) -> str:
return self._engine.query(question)

Theoretical performance is irrelevant. Benchmark on your proprietary datasets. A framework that perfectly indexes Wikipedia might fail catastrophically on your internal legal PDFs.

# Compare frameworks on YOUR data
frameworks = ["llamaindex", "langchain", "haystack"]
test_queries = load_test_queries()
for framework in frameworks:
engine = create_engine(framework)
results = [engine.query(q) for q in test_queries]
print(f"{framework}: {evaluate(results)}")
  1. The LlamaIndex Origin: LlamaIndex emerged in late 2022 and quickly became one of the prominent open-source frameworks for retrieval-oriented LLM applications.
  2. The CrewAI Viral Moment: CrewAI rose quickly in early 2024 as developers looked for simpler multi-agent abstractions than lower-level orchestration frameworks.
  3. The DSPy Compiler: DSPy reframes prompt engineering as programming with declarative signatures and compiler-style optimization.
  4. The Framework Half-Life: The agent-framework ecosystem changes quickly enough that abstraction layers and migration planning matter.
MistakeWhy It HappensHow to Fix It
Defaulting to LangChain for simple RAGEngineers assume the most popular framework is best for every use case, leading to over-engineering.Start with a retrieval-focused framework or even a simpler custom baseline when document retrieval is the main requirement.
Ignoring document chunking strategyDevelopers use default ingestion paths without considering context window limits, causing truncation errors.Tune your chunking and retrieval settings to fit your model and dataset.
Hardcoding API keys in agent scriptsRapid prototyping habits leak into production, exposing secrets in logs or git history.Utilize Kubernetes Secrets, properly mounted as environment variables in your Pod definitions.
Over-complicating sequential tasksUsing LangGraph’s complex state machines for simple linear workflows creates debugging nightmares.Use CrewAI’s Process.sequential to define straightforward, role-based handoffs with zero boilerplate.
Over-prompting single agentsAssigning too many distinct responsibilities to one agent confuses the model’s attention mechanism.Enforce the Single Responsibility Principle: split complex agents into multiple, hyper-focused sub-agents.
Ignoring the framework half-lifeAdopting abandoned tools based on outdated blog posts leaves systems vulnerable and unpatchable.Audit GitHub commit velocity and enterprise backing before committing to a core orchestration framework.
Deploying agents without resource limitsAgents generating infinite loops or loading massive indexes crash nodes via OOMKilled events.Always define strict Kubernetes CPU/Memory requests and limits, and implement circuit breakers in agent loops.

Hands-On Exercise: Deploying a Multi-Agent RAG Pipeline to Kubernetes v1.35

Section titled “Hands-On Exercise: Deploying a Multi-Agent RAG Pipeline to Kubernetes v1.35”

In this lab, you will build a robust multi-framework application. You will use LlamaIndex to index a local dataset, wrap that index as a tool, and orchestrate a CrewAI team to research the data. Finally, you will containerize and deploy this pipeline to a modern Kubernetes cluster.

First, provision your Python environment and install the required dependencies.

Terminal window
python3 -m venv ai-env
source ai-env/bin/activate
pip install llama-index crewai langchain openai kubernetes==28.1.0
mkdir -p data
echo "KubeDojo provides advanced Kubernetes and AI training modules." > data/context.txt

Task 2: Implementing the Dual-Framework Script

Section titled “Task 2: Implementing the Dual-Framework Script”

Create a file named main.py. This script bridges the robust data capabilities of LlamaIndex with the precise orchestration power of CrewAI.

import os
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from langchain.tools import Tool
from crewai import Agent, Task, Crew, Process
# Ensure API Key is loaded
if "OPENAI_API_KEY" not in os.environ:
raise ValueError("OPENAI_API_KEY environment variable missing.")
# 1. LlamaIndex setup
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
# 2. Tool Bridge
def search_docs(query: str) -> str:
return str(query_engine.query(query))
search_tool = Tool(
name="Local Data Search",
func=search_docs,
description="Search local documents for specific training context."
)
# 3. CrewAI Setup
researcher = Agent(
role="Data Analyst",
goal="Extract core concepts from the provided tool.",
backstory="You are a precise data extraction specialist.",
tools=[search_tool],
allow_delegation=False,
verbose=True
)
research_task = Task(
description="Find out what KubeDojo provides.",
expected_output="A one sentence summary.",
agent=researcher
)
crew = Crew(
agents=[researcher],
tasks=[research_task],
process=Process.sequential
)
if __name__ == "__main__":
result = crew.kickoff()
print("FINAL RESULT:", result)

Before containerizing the application, you must verify the script executes flawlessly on your local machine.

Terminal window
# Execute the script locally to verify the dual-framework logic
python main.py

Create a Dockerfile to package your application securely. Using a slim Python image reduces the attack surface area and network overhead during cluster deployments.

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY main.py .
COPY data/ ./data/
CMD ["python", "main.py"]

Build the image locally.

Terminal window
pip freeze > requirements.txt
docker build -t ai-agent-job:v1.0 .

If you are running this lab in a local Minikube environment rather than a cloud provider, you must load the newly built image directly into the Minikube cluster node before deployment.

Terminal window
# If using Minikube, load the image directly into the cluster
minikube image load ai-agent-job:v1.0

Task 4: Provisioning the Kubernetes Environment

Section titled “Task 4: Provisioning the Kubernetes Environment”

Ensure your cluster is running Kubernetes v1.35+. Create a dedicated namespace and store your API key securely. Never commit secrets to source control.

Terminal window
kubectl create namespace ai-agents
kubectl create secret generic ai-secrets \
--from-literal=openai-key="sk-your-actual-key-here" \
-n ai-agents

Immediately verify that the namespace and secrets were generated correctly before proceeding.

Terminal window
# Verify the resources were created successfully
kubectl get namespace ai-agents
kubectl get secret ai-secrets -n ai-agents

Create a file named job.yaml targeting the batch API. Using a Job is usually appropriate for finite AI execution tasks, whereas Deployments are typically better suited to always-on API servers.

# Tested on Kubernetes v1.35
apiVersion: batch/v1
kind: Job
metadata:
name: rag-agent-job
namespace: ai-agents
spec:
backoffLimit: 2
template:
spec:
containers:
- name: agent-container
image: ai-agent-job:v1.0
imagePullPolicy: IfNotPresent
env:
- name: OPENAI_API_KEY
valueFrom:
secretKeyRef:
name: ai-secrets
key: openai-key
resources:
limits:
memory: "512Mi"
cpu: "500m"
restartPolicy: Never

Apply the configuration, wait for completion, and monitor the output to confirm success.

Terminal window
kubectl apply -f job.yaml
kubectl wait --for=condition=complete job/rag-agent-job -n ai-agents --timeout=120s
kubectl logs -l job-name=rag-agent-job -n ai-agents
Task Success Checklist
  • Python virtual environment activates without dependency conflicts.
  • main.py successfully executes locally and queries the text file.
  • Docker container builds successfully.
  • Kubernetes cluster validates the batch/v1 Job specification.
  • Pod logs output the final CrewAI result without OOMKilled or API credential errors.
Question 1: The High-Stakes Legal Firm (Scenario)
A global law firm is building a system to query thousands of historical case files. The system must cite exact paragraphs from the PDFs and handle massive document ingestion. They do not need the AI to take external actions, just retrieve information perfectly. Which framework is most appropriate and why?

Answer: LlamaIndex is the most appropriate framework for this scenario. The firm's primary requirement is robust document ingestion, chunking, and Retrieval-Augmented Generation (RAG) rather than complex tool orchestration. LlamaIndex is fundamentally designed as a data framework, offering optimized SimpleDirectoryReader components and advanced indexing strategies specifically built to connect large datasets to LLMs. LangChain would introduce unnecessary architectural complexity without adding value for a pure retrieval task.
Question 2: The E-Commerce Pipeline (Scenario)
An e-commerce startup wants to build a system where an "Analyst" agent evaluates product trends, a "Writer" agent drafts marketing copy, and a "Reviewer" agent approves the text. The workflow is strictly sequential and relies on specific persona instructions. Which framework minimizes boilerplate for this implementation?

Answer: CrewAI is the ideal choice for this scenario. It utilizes a declarative, role-based mental model that perfectly aligns with human-like team structures. Developers can define specific agents with backstories, assign them sequential tasks, and kick off the process with minimal boilerplate. Implementing this strictly sequential persona flow in LangGraph would require manually defining state schemas and node transitions, which is overkill for a simple linear workflow.
Question 3: The Persistent Backend (Scenario)
You are migrating a legacy backend to an AI-driven system that requires complex state management, conditional loops, and memory persistence across long-running, multi-step asynchronous processes. Which orchestration framework is best suited for this robust engineering requirement?

Answer: LangGraph is the best framework for this use case. Unlike standard LangChain chains or CrewAI tasks, LangGraph is built on a state-machine architecture designed explicitly for complex, non-linear workflows with branching logic and cycles. It allows engineers to persist state natively across long-running executions, making it resilient and highly customizable for deep backend integrations.
Question 4: The Microsoft Ecosystem Integration (Scenario)
Your enterprise application runs entirely on Azure, is written primarily in C#, and relies on heavily governed internal microservices. Your CTO wants to integrate LLM routing capabilities. Which framework presents the lowest friction for integration?

Answer: Semantic Kernel is the optimal choice. Developed by Microsoft, it is deeply integrated with the .NET ecosystem and Azure AI services, offering enterprise-grade support and native typed language bindings. Attempting to force Python-native frameworks like LlamaIndex or CrewAI into a strict C# enterprise environment would introduce significant operational friction and language barrier complexities.
Question 5: The Enterprise Architecture Debate (Scenario)
Your engineering team is divided. Half the team wants to build a system focused on agent orchestration and tool execution, treating the LLM as a central reasoning engine. The other half insists on centering the design around data ingestion, indexing topologies, and query optimization. You must decide which framework aligns with each camp. Which frameworks correspond to these two distinct architectural approaches, and why?

Answer: LangChain approaches design from the perspective of agent orchestration and tool execution (the "how"), treating the LLM as a central reasoning engine that interacts with the outside world. Conversely, LlamaIndex centers heavily on data ingestion, indexing topologies, and query optimization (the "what"), treating the LLM primarily as a synthesis interface for structured proprietary data.
Question 6: The Runaway Pod (Scenario)
You have deployed an AutoGen system to a Kubernetes v1.35 cluster. During a seemingly simple task, the memory consumption of the Pod rapidly inflates, eventually resulting in an OOMKilled crash. Based on AutoGen's primary mode of interaction, why did this happen and how does the framework's design cause this specific resource exhaustion?

Answer: AutoGen models agent interactions as a continuous conversational group chat where agents debate and share context natively. This matters for Kubernetes resource limits because conversational logs can grow exponentially in complex debates, rapidly inflating memory consumption within the Pod. If the context window is not strictly managed, the container will inevitably crash with an OOMKilled error.
Question 7: The Silent Freeze (Scenario)
A startup deployed a CrewAI application, but the sequential process occasionally hangs indefinitely without throwing a formal Python exception. What is the most likely diagnostic cause of this failure mode and how should it be addressed?

Answer: The most likely cause is that an agent is caught in a tool execution loop or is awaiting a response from an external API that lacks a proper timeout configuration. Because CrewAI abstractions hide the underlying execution loops, a hanging network call or a model continuously re-attempting a failed tool use will freeze the sequential process. Implementing strict timeouts and circuit breakers on all external tool calls is required to mitigate this.
Question 8: The Model Migration Crisis (Scenario)
Your company is migrating from GPT-4 to Claude 3.5 Sonnet to reduce costs. However, all of your hand-crafted, string-based prompts have completely broken down, and accuracy has plummeted. Your engineering lead suggests adopting DSPy to resolve this fragility. How does DSPy function as a "compiler" to solve this specific migration problem?

Answer: DSPy acts as a compiler by allowing developers to define declarative signatures (inputs and expected outputs) rather than hand-writing string-based prompts. It solves the fragility problem of prompt engineering: when switching between underlying models (e.g., from GPT-4 to Claude), hand-crafted prompts often break. DSPy automatically optimizes and rewrites the prompts mathematically based on training examples, ensuring consistent accuracy across different models.

The ecosystem of AI frameworks is vast, but understanding the core philosophies behind the tools prevents costly architectural mistakes.

FrameworkPhilosophyBest For
LlamaIndexData frameworkRAG, knowledge systems
LangChainComposable chainsGeneral LLM apps
LangGraphState machinesComplex workflows
CrewAIRole-based teamsMulti-agent collab
AutoGenConversationsResearch, dialogues
  1. No single “best” framework: Choose based strictly on your engineering use case.
  2. Frameworks are converging: Interoperability (e.g., using LlamaIndex inside LangChain) is the modern standard.
  3. LlamaIndex excels at RAG: It is significantly simpler and more robust than LangChain for deep indexing.
  4. CrewAI is beginner-friendly: The role-based paradigm is excellent for multi-agent prototypes.
  5. Your data is your moat: Focus on securely indexing your proprietary data; the LLM is merely the engine.

With your foundational framework knowledge complete, you are fully equipped for the complexities of advanced orchestration.

  • Module 20: Advanced Agentic AI
  • Explore agent memory systems (short-term, long-term, episodic).
  • Implement advanced planning algorithms (ReWOO, Plan-and-Execute).
  • Master advanced multi-agent collaborative topologies.

You are now prepared to choose the exact right tool for your engineering challenges.


Module 1.5 Complete!

Next: Module 20 - Advanced Agentic AI

  • github.com: llama index — The official LlamaIndex repository README describes LlamaIndex as a data framework and explicitly lists connectors, indices/graphs, and retrieval/query interfaces.
  • github.com: crewai — The official CrewAI repository describes the project as orchestrating role-playing autonomous AI agents and documents the hierarchical process with a manager.
  • github.com: autogen — The official AutoGen repository says the project was pioneered in Microsoft Research and documents AgentChat support for group-chat patterns.
  • github.com: semantic kernel — The official Semantic Kernel repository README describes the framework this way and lists its supported runtimes and model integrations.
  • github.com: dspy — The official DSPy repository README explicitly presents DSPy as programming, not prompting, and describes its declarative modular approach.
  • kubernetes.io: distribute credentials secure — The Kubernetes task documentation explicitly covers using Secrets and exposing them to containers as environment variables.
  • kubernetes.io: manage resources containers — The Kubernetes resource-management documentation covers requests/limits and explains OOM-related behavior under memory pressure.
  • LangGraph Repository — Useful counterpart to the module’s orchestration discussion because it documents stateful, long-running graph-based agents.
  • AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation — Primary paper for the conversational multi-agent model discussed in the AutoGen section.