Building AI Agents
Цей контент ще не доступний вашою мовою.
AI/ML Engineering Track | Complexity:
[COMPLEX]| Time: 6-8
Building AI Agents: The Framework Ecosystem
Section titled “Building AI Agents: The Framework Ecosystem”Reading Time: 6-8 hours Prerequisites: Module 18
Why This Module Matters
Section titled “Why This Module Matters”In early 2024, Air Canada was forced by a civil tribunal to honor a non-existent bereavement fare policy that had been hallucinated by their custom-built customer service chatbot. Although the tribunal award itself was modest, the incident showed that customer-facing LLM systems can create legal, operational, and reputational risk when they are not grounded in current policy data.
In late 2022, developers building with early LLMs quickly ran into a recurring problem: models could reason fluently about public information but could not reliably answer questions grounded in private company data without retrieval and indexing systems.
The lesson here is foundational for modern AI engineering: models are commoditized compute, but your proprietary data is your competitive moat. Every enterprise has access to the same foundational models via APIs. The only differentiator is how effectively an organization can connect those models to their internal knowledge bases and orchestrate them to take meaningful action. In this module, you will explore the ecosystem of frameworks that have emerged to solve this problem, learning how to select, deploy, and scale the right framework for the right architectural challenge, avoiding the catastrophic failures of early adopters.
What You’ll Be Able to Do
Section titled “What You’ll Be Able to Do”By the end of this module, you will be able to:
- Evaluate the architectural trade-offs between composable chain frameworks and data-centric indexing systems.
- Design a role-based multi-agent orchestration system to distribute complex reasoning tasks.
- Implement a production-ready Retrieval-Augmented Generation pipeline bridging multiple framework paradigms.
- Diagnose common failure modes in multi-agent deployments on Kubernetes, such as context window exhaustion and synchronization deadlocks.
- Compare framework half-life and enterprise adoption metrics to formulate sustainable, long-term tooling strategies.
The Framework Landscape: A Map of the Territory
Section titled “The Framework Landscape: A Map of the Territory”You have already explored the foundational concepts of LLM interaction. However, the AI framework ecosystem is rich with alternative paradigms, each with distinct philosophies, strengths, and optimal deployment targets. Choosing a framework is akin to choosing a primary programming language: Python, Rust, and Go all solve computational problems, but they make wildly different trade-offs regarding memory safety, execution speed, and developer ergonomics.
Different engineering teams solve AI challenges differently, compounding into distinct operational philosophies. Understanding these philosophies prevents costly architectural rewrites months into a project.
| Framework | Philosophy | Best For |
|---|---|---|
| LangChain | Composable chains, flexibility | General-purpose LLM apps |
| LlamaIndex | Data-centric, indexing focus | RAG and knowledge systems |
| CrewAI | Role-based agents, simplicity | Multi-agent collaboration |
| AutoGen | Conversational agents | Research, complex dialogues |
| Semantic Kernel | Enterprise, Microsoft stack | .NET/Azure integration |
| Haystack | Search-first | Production search systems |
Think of the landscape as a set of specialized tools. LangChain provides a massive, general-purpose workshop where you can build anything from scratch. LlamaIndex acts as a highly optimized, automated factory specifically for document retrieval. CrewAI functions as a corporate org chart simulator, while AutoGen operates as a debate room for specialized scholars.
The Big Picture
Section titled “The Big Picture”The hierarchy of modern AI application frameworks generally splits into three major domains: general composition, pure data indexing, and multi-agent orchestration.
graph TD Root[AI Application Frameworks] Root --> LC[LangChain / LangGraph] Root --> LI[LlamaIndex] Root --> MA[Multi-Agent]
LC --> C[Chains] LC --> A[Agents] LC --> M[Memory] LC --> T[Tools]
LI --> IDX[Index Types] LI --> QE[Query Engine] LI --> RAG[RAG]
MA --> CR[CrewAI] MA --> AU[AutoGen]
CR --> ROL[Roles] AU --> AG[Agents]Pause and predict: If you were tasked with building a system that automatically generates quarterly financial reports by pulling from 15 different relational database tables and 30 PDF compliance documents, which framework’s philosophy makes the most sense as your foundation?
LlamaIndex: The Data Framework for LLMs
Section titled “LlamaIndex: The Data Framework for LLMs”What is LlamaIndex?
Section titled “What is LlamaIndex?”LlamaIndex is a strict data framework for building LLM applications. While other frameworks focus heavily on how agents chain thoughts together, LlamaIndex focuses on a completely different trinity of operations:
- Data Ingestion: Standardizing connections to vastly different data sources.
- Data Indexing: Structuring the ingested data mathematically and hierarchically for ultra-efficient retrieval.
- Query Interface: Providing a natural language translation layer to query the structured indexes.
Consider LlamaIndex as an expert librarian. Other frameworks represent the patron asking questions and taking notes. The librarian knows exactly where every book is located, understands the taxonomy of the library, and can retrieve the exact paragraph needed in seconds.
LlamaIndex Architecture
Section titled “LlamaIndex Architecture”The architecture functions identically to a modern data pipeline, treating the LLM not as a master controller, but as a final synthesis engine.
flowchart LR subgraph LlamaIndex direction LR DL[Data Loaders] --> IT[Index Types] IT --> QE[Query Engines] end DL -.-> PDF[PDF, CSV] DL -.-> Web[Web, API] DL -.-> DB[Database] DL -.-> Notion[Notion]
IT -.-> Vec[Vector] IT -.-> Sum[Summary] IT -.-> KG[Knowledge] IT -.-> Tree[Tree]
QE -.-> RAG2[RAG] QE -.-> Chat[Chat] QE -.-> SQL[SQL] QE -.-> Agent[Agent]Key Components
Section titled “Key Components”1. Data Connectors (Loaders)
Section titled “1. Data Connectors (Loaders)”LlamaIndex maintains a massive registry of data connectors. These act as flexible adapters, normalizing many heterogeneous data sources into a standard Document object. Whether the data is locked in a PostgreSQL database, a live webpage, or a massive PDF, the loader handles the extraction seamlessly.
from llama_index.core import SimpleDirectoryReaderfrom llama_index.readers.web import SimpleWebPageReaderfrom llama_index.readers.database import DatabaseReader
# Load from directory - handles PDF, DOCX, TXT, etc.documents = SimpleDirectoryReader("./data").load_data()
# Load from web - scrapes and converts to textweb_docs = SimpleWebPageReader(html_to_text=True).load_data( ["https://example.com/page1", "https://example.com/page2"])
# Load from database - runs SQL, converts rows to documentsdb_reader = DatabaseReader( sql_database="postgresql://user:pass@host:5432/db")documents = db_reader.load_data(query="SELECT * FROM articles")2. Index Types
Section titled “2. Index Types”Once data is loaded, it must be indexed. The choice of index dictates how the LLM will navigate the data. Choosing the wrong index leads to massive token waste and poor retrieval accuracy.
from llama_index.core import ( VectorStoreIndex, SummaryIndex, TreeIndex, KeywordTableIndex)
# Vector Index - best for semantic search# Like organizing books by topic rather than titlevector_index = VectorStoreIndex.from_documents(documents)
# Summary Index - good for summarization# Like having cliff notes for every booksummary_index = SummaryIndex.from_documents(documents)
# Tree Index - hierarchical organization# Like organizing books by category > subcategory > titletree_index = TreeIndex.from_documents(documents)
# Keyword Index - good for exact matches# Like a traditional library card catalogkeyword_index = KeywordTableIndex.from_documents(documents)3. Query Engines
Section titled “3. Query Engines”Query engines are the interface between the user’s natural language and the structured index. They handle the complex routing, retrieval, and synthesis of the final answer. They abstract away the prompt formatting needed to inject context into the LLM.
# Basic query enginequery_engine = index.as_query_engine()response = query_engine.query("What is the main topic?")
# With retrieval settingsquery_engine = index.as_query_engine( similarity_top_k=5, # Return top 5 most relevant chunks response_mode="compact" # Summarize results)
# Chat engine (maintains conversation context)chat_engine = index.as_chat_engine()response = chat_engine.chat("Tell me about the document")follow_up = chat_engine.chat("Can you elaborate?") # Remembers contextLlamaIndex RAG Pipeline
Section titled “LlamaIndex RAG Pipeline”Building a complete Retrieval-Augmented Generation pipeline is where LlamaIndex’s opinionated defaults shine. Notice how it handles chunking, embedding generation, and prompt formatting implicitly.
from llama_index.core import VectorStoreIndex, SimpleDirectoryReaderfrom llama_index.llms.openai import OpenAI
# 1. Load documents (handles chunking internally)documents = SimpleDirectoryReader("./data").load_data()
# 2. Create index (embeds and stores)index = VectorStoreIndex.from_documents(documents)
# 3. Create query enginequery_engine = index.as_query_engine( llm=OpenAI(model="gpt-5"), similarity_top_k=3)
# 4. Queryresponse = query_engine.query("What are the key findings?")print(response)Advanced LlamaIndex Features
Section titled “Advanced LlamaIndex Features”As enterprise requirements scale, basic vector similarity search falls short. LlamaIndex provides advanced query architectures out of the box to handle multi-hop reasoning and diverse data silos.
Sub-Question Query Engine
Section titled “Sub-Question Query Engine”Complex queries often require multi-hop reasoning. The Sub-Question engine acts as an orchestrator, breaking down a complex prompt into discrete tasks, querying the appropriate indexes, and synthesizing the results.
from llama_index.core.query_engine import SubQuestionQueryEnginefrom llama_index.core.tools import QueryEngineTool, ToolMetadata
# Create tools from multiple indexestools = [ QueryEngineTool( query_engine=financial_index.as_query_engine(), metadata=ToolMetadata( name="financial_data", description="Financial reports and metrics" ) ), QueryEngineTool( query_engine=product_index.as_query_engine(), metadata=ToolMetadata( name="product_data", description="Product documentation" ) )]
# Sub-question engine decomposes complex queries automaticallyquery_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=tools)response = query_engine.query( "How did Q3 product launches affect revenue?")# Internally: "What products launched in Q3?" + "What was Q3 revenue?" → synthesisRouter Query Engine
Section titled “Router Query Engine”When dealing with massive organizational data, it is inefficient to query everything simultaneously. The Router Query Engine uses an LLM decision-maker to route the user’s query only to the relevant specialist index.
from llama_index.core.query_engine import RouterQueryEnginefrom llama_index.core.selectors import LLMSingleSelector
query_engine = RouterQueryEngine( selector=LLMSingleSelector.from_defaults(), # LLM decides which index to use query_engine_tools=[ QueryEngineTool( query_engine=technical_index.as_query_engine(), metadata=ToolMetadata( name="technical", description="Technical documentation and code" ) ), QueryEngineTool( query_engine=business_index.as_query_engine(), metadata=ToolMetadata( name="business", description="Business processes and policies" ) ) ])Knowledge Graph Index
Section titled “Knowledge Graph Index”For deep semantic relationships (e.g., “Company X acquired Company Y, which owns Patent Z”), vector search is insufficient. LlamaIndex can automatically extract triplets to build queryable knowledge graphs.
from llama_index.core import KnowledgeGraphIndex
# Create knowledge graph from documentskg_index = KnowledgeGraphIndex.from_documents( documents, max_triplets_per_chunk=3, # Extract relationships include_embeddings=True # Also enable semantic search)
# Query with graph traversalquery_engine = kg_index.as_query_engine( include_text=True, retriever_mode="keyword", response_mode="tree_summarize")LangChain vs LlamaIndex: A Deep Comparison
Section titled “LangChain vs LlamaIndex: A Deep Comparison”A common architectural failure is choosing a framework based on popularity rather than alignment with the engineering goal.
Philosophy Comparison
Section titled “Philosophy Comparison”| Aspect | LangChain | LlamaIndex |
|---|---|---|
| Core Focus | Chains, agents, tools | Data indexing, retrieval |
| Mental Model | ”Building blocks for LLM apps" | "Data framework for LLMs” |
| Strength | Flexibility, agent patterns | RAG, knowledge management |
| Complexity | Higher learning curve | More opinionated, simpler |
| Use Case | General-purpose | Data-intensive apps |
Choose LangChain when:
- Building complex agent systems that require executing diverse external tools.
- Maximum flexibility in architectural flow is required.
- You need deep, stateful workflows via LangGraph.
Choose LlamaIndex when:
- Retrieval-Augmented Generation is the primary requirement.
- You are working with massive, heterogeneous data sources that require unified ingestion.
- You want a production-ready RAG setup with sensible, mathematically sound defaults.
Code Comparison: Basic RAG
Section titled “Code Comparison: Basic RAG”Observe the difference in verbosity and control between the two frameworks. LangChain exposes the raw mechanics of chunking and embedding, while LlamaIndex abstracts them.
LangChain RAG (explicit, flexible):
from langchain_community.document_loaders import DirectoryLoaderfrom langchain.text_splitter import RecursiveCharacterTextSplitterfrom langchain_community.vectorstores import Chromafrom langchain_openai import OpenAIEmbeddings, ChatOpenAIfrom langchain.chains import RetrievalQA
# Load and split - you control every steploader = DirectoryLoader("./data")documents = loader.load()splitter = RecursiveCharacterTextSplitter(chunk_size=1000)chunks = splitter.split_documents(documents)
# Create vector store - explicit embedding configurationvectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings())
# Create chain - wire up retriever and LLMqa_chain = RetrievalQA.from_chain_type( llm=ChatOpenAI(), retriever=vectorstore.as_retriever())
# Queryresponse = qa_chain.invoke("What is the main topic?")LlamaIndex RAG (opinionated, concise):
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
# Load and index - handles splitting internally with smart defaultsdocuments = SimpleDirectoryReader("./data").load_data()index = VectorStoreIndex.from_documents(documents)
# Query - one linequery_engine = index.as_query_engine()response = query_engine.query("What is the main topic?")Using Both Together
Section titled “Using Both Together”The most robust enterprise architectures do not treat these frameworks as mutually exclusive. You can utilize LlamaIndex for superior data ingestion and indexing, and wrap it as a tool for a LangChain orchestration agent. This combines the best of both worlds.
from llama_index.core import VectorStoreIndex, SimpleDirectoryReaderfrom langchain.agents import create_tool_calling_agent, AgentExecutorfrom langchain.tools import Tool
# LlamaIndex for indexing - leverage its data handlingdocuments = SimpleDirectoryReader("./data").load_data()index = VectorStoreIndex.from_documents(documents)query_engine = index.as_query_engine()
# Wrap as LangChain tool - bridge the frameworksdef search_docs(query: str) -> str: response = query_engine.query(query) return str(response)
search_tool = Tool( name="document_search", func=search_docs, description="Search internal documents for information")
# Use in LangChain agent - leverage its agent patternsagent = create_tool_calling_agent(llm, [search_tool], prompt)executor = AgentExecutor(agent=agent, tools=[search_tool])Multi-Agent Orchestration with CrewAI
Section titled “Multi-Agent Orchestration with CrewAI”What is CrewAI?
Section titled “What is CrewAI?”CrewAI is a framework specifically engineered for orchestrating role-playing AI agents. Unlike LangGraph, which forces you to think in terms of state machines, cyclic graphs, and mathematical nodes, CrewAI’s abstraction is modeled entirely on human team dynamics. You construct teams by defining roles, responsibilities, and collaborative goals.
Core Concepts
Section titled “Core Concepts”flowchart LR subgraph CrewAI direction LR Agent[Agent<br/>- Role<br/>- Goal<br/>- Tools] Task[Task<br/>- Goal<br/>- Agent<br/>- Output] Crew[Crew<br/>- Team<br/>- Tasks<br/>- Flow] Agent --> Task Task --> Crew endCrewAI Example
Section titled “CrewAI Example”Notice how declarative the configuration is. You define the “who” and the “what,” and the framework handles the “how” of the handoffs, passing the text output of one agent securely as the input context for the next.
from crewai import Agent, Task, Crew, Process
# Define agents with roles - like casting actorsresearcher = Agent( role="Senior Research Analyst", goal="Find comprehensive information about AI frameworks", backstory="You are an expert at finding and analyzing technical information.", tools=[search_tool, web_scraper], verbose=True)
writer = Agent( role="Technical Writer", goal="Create clear, engaging technical content", backstory="You specialize in making complex topics accessible.", verbose=True)
reviewer = Agent( role="Quality Reviewer", goal="Ensure content accuracy and clarity", backstory="You have a keen eye for errors and unclear explanations.", verbose=True)
# Define tasks - like scenes in a scriptresearch_task = Task( description="Research the latest AI framework developments in 2024", agent=researcher, expected_output="Comprehensive research notes with sources")
writing_task = Task( description="Write a blog post based on the research", agent=writer, expected_output="1500-word blog post in markdown", context=[research_task] # Uses output from research)
review_task = Task( description="Review and improve the blog post", agent=reviewer, expected_output="Edited blog post with improvements", context=[writing_task])
# Create crew - like assembling the productioncrew = Crew( agents=[researcher, writer, reviewer], tasks=[research_task, writing_task, review_task], process=Process.sequential # or Process.hierarchical)
# Run the crew - action!result = crew.kickoff()CrewAI vs LangGraph
Section titled “CrewAI vs LangGraph”When deciding how to orchestrate your agents, evaluate the strictness of your required workflow.
| Aspect | CrewAI | LangGraph |
|---|---|---|
| Mental Model | Human teams, roles | State machines, graphs |
| Flexibility | More opinionated | More flexible |
| Learning Curve | Lower | Higher |
| Complex Workflows | Limited | Excellent |
| Customization | Moderate | High |
Conversational Multi-Agent Systems with AutoGen
Section titled “Conversational Multi-Agent Systems with AutoGen”What is AutoGen?
Section titled “What is AutoGen?”Originating from Microsoft Research, AutoGen focuses heavily on conversational multi-agent systems. Instead of executing rigid pipelines or passing JSON objects between nodes, AutoGen agents communicate natively through raw chat interfaces. They debate, correct each other, and dynamically decide when a task is finished based on conversation history.
Core Concept
Section titled “Core Concept”sequenceDiagram participant A as Agent A participant B as Agent B participant C as Agent C
A->>B: Messages (I'll research...) B->>A: Based on that... A->>C: Messages C->>A: Let me synthesize...AutoGen Example
Section titled “AutoGen Example”AutoGen excels when incorporating human-in-the-loop feedback seamlessly into the workflow. In the example below, the system pauses execution and demands human validation before executing potentially destructive generated code.
from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager
# Create agents - each with a personaassistant = AssistantAgent( name="assistant", llm_config={"model": "gpt-5"}, system_message="You are a helpful AI assistant.")
coder = AssistantAgent( name="coder", llm_config={"model": "gpt-5"}, system_message="You write Python code to solve problems.")
critic = AssistantAgent( name="critic", llm_config={"model": "gpt-5"}, system_message="You review code and suggest improvements.")
# User proxy for human-in-the-loopuser_proxy = UserProxyAgent( name="user", human_input_mode="TERMINATE", # or "ALWAYS" for full control code_execution_config={"work_dir": "coding"})
# Group chat - agents converse naturallygroup_chat = GroupChat( agents=[user_proxy, assistant, coder, critic], messages=[], max_round=10)
manager = GroupChatManager(groupchat=group_chat)
# Start conversationuser_proxy.initiate_chat( manager, message="Create a Python function to calculate fibonacci numbers")Stop and think: How would you containerize a multi-agent system like AutoGen? Would you run all the agents in a single Kubernetes Pod, or distribute them across multiple Deployments using a message broker? What are the profound trade-offs regarding state management and network latency in both approaches?
Other Notable Frameworks in the Ecosystem
Section titled “Other Notable Frameworks in the Ecosystem”Beyond the major players, specialized frameworks dominate specific architectural niches in enterprise environments.
Semantic Kernel (Microsoft)
Section titled “Semantic Kernel (Microsoft)”Semantic Kernel is tailored for enterprise environments deeply entrenched in the Microsoft ecosystem. It brings robust support for Azure, .NET, and strictly typed languages, bypassing the fragility of Python scripts in enterprise Windows shops.
import semantic_kernel as skfrom semantic_kernel.connectors.ai.open_ai import AzureChatCompletion
kernel = sk.Kernel()kernel.add_service(AzureChatCompletion( deployment_name="gpt-5", endpoint="https://your-resource.openai.azure.com/", api_key="your-key"))
# Create semantic function - natural language as codesummarize = kernel.create_semantic_function( "Summarize this text: {{$input}}", max_tokens=200)
result = await kernel.invoke(summarize, input="Long text here...")Haystack
Section titled “Haystack”Haystack focuses on search-oriented pipelines with explicit control over retrieval, routing, memory, and generation in production applications.
from haystack import Pipelinefrom haystack.components.retrievers import InMemoryEmbeddingRetrieverfrom haystack.components.generators import OpenAIGeneratorfrom haystack.components.builders import PromptBuilder
# Build pipeline - search-first designpipeline = Pipeline()pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store))pipeline.add_component("prompt_builder", PromptBuilder(template=""" Context: {{documents}} Question: {{query}} Answer:"""))pipeline.add_component("generator", OpenAIGenerator())
pipeline.connect("retriever", "prompt_builder.documents")pipeline.connect("prompt_builder", "generator")
result = pipeline.run({"query": "What is the capital of France?"})DSPy (Stanford)
Section titled “DSPy (Stanford)”DSPy fundamentally reimagines how LLM pipelines are built. Instead of hand-writing string prompts, developers define the signatures (inputs and outputs), and DSPy acts as a compiler, automatically optimizing the prompts mathematically for the specific model and dataset.
import dspy
# Configurelm = dspy.OpenAI(model="gpt-5")dspy.settings.configure(lm=lm)
# Define signature - what, not howclass QA(dspy.Signature): """Answer questions based on context.""" context = dspy.InputField() question = dspy.InputField() answer = dspy.OutputField()
# Create module - DSPy optimizes prompts automaticallyqa = dspy.ChainOfThought(QA)
# Use - no manual prompt engineering neededresult = qa(context="Paris is the capital of France", question="What is the capital?")Framework Selection Guide
Section titled “Framework Selection Guide”Choosing incorrectly at the start of a project results in months of accumulated technical debt.
Decision Framework
Section titled “Decision Framework”Follow this logic tree to narrow down your primary architectural foundation.
graph TD Q[What's your primary use case?] Q --> R[RAG / Knowledge] Q --> A[Agent / Workflow] Q --> M[Multi-Agent]
R --> LI[LlamaIndex<br/>simpler setup] A --> LC[LangChain<br/>flexible] LC --> LG[LangGraph<br/>complex state] M --> CR[CrewAI<br/>roles] M --> AU[AutoGen<br/>chat]Quick Decision Table
Section titled “Quick Decision Table”| Need | Recommended Framework |
|---|---|
| Simple RAG | LlamaIndex |
| Complex RAG with custom logic | LangChain |
| Stateful agent workflows | LangGraph |
| Role-based team of agents | CrewAI |
| Conversational agent research | AutoGen |
| Enterprise/Azure | Semantic Kernel |
| Search-first application | Haystack |
| Prompt optimization | DSPy |
Real Production Usage Examples
Section titled “Real Production Usage Examples”Enterprise adoption proves that framework usage is highly specialized by domain. Notice that hyperscalers often build custom tooling, but utilize off-the-shelf frameworks for rapid iteration.
| Company | Framework | Scale | Use Case |
|---|---|---|---|
| Example knowledge platform | retrieval-focused framework | large user base | document-centric Q&A |
| Example developer platform | graph-based orchestration | large user base | AI coding assistance |
| Example customer-service deployment | framework-based orchestration | high conversation volume | customer service |
| Example payments company | custom stack plus retrieval tooling | large business scale | fraud and risk workflows |
| Example commerce platform | custom stack | large merchant base | product-content generation |
| Example streaming platform | custom recommendation stack | large user base | recommendations |
Enterprise Adoption Patterns
Section titled “Enterprise Adoption Patterns”A recent engineering survey of Fortune 500 companies using AI frameworks revealed surprising patterns regarding adoption:
| Framework | Fortune 500 Adoption | Surprise Factor |
|---|---|---|
| LangChain | widely used | Expected |
| LlamaIndex | meaningfully used | Higher than expected |
| Custom built | still common | They build their own! |
| CrewAI | emerging | Fast for a new framework |
| AutoGen | more niche | Mostly research teams |
Engineering Best Practices
Section titled “Engineering Best Practices”1. Start Simple
Section titled “1. Start Simple”Do not over-engineer Day 1 architecture. Master the basics of retrieval before orchestrating a swarm of agents. Bringing in CrewAI before you understand basic vector math is a recipe for un-debuggable systems.
# Don't do this first:from crewai import Agent, Task, Crew, Processfrom langchain.agents import create_tool_calling_agentfrom llama_index.core import VectorStoreIndex
# Do this first - learn one framework deeply:from llama_index.core import VectorStoreIndex, SimpleDirectoryReaderdocuments = SimpleDirectoryReader("./data").load_data()index = VectorStoreIndex.from_documents(documents)2. Abstract Your Framework
Section titled “2. Abstract Your Framework”Because the AI ecosystem is incredibly volatile, never tightly couple your business logic to a specific framework’s syntax. Use abstract wrappers. If a framework is abandoned, your underlying application logic remains safe.
# Good: Framework-agnostic interfaceclass RAGSystem: def __init__(self, implementation: str = "llamaindex"): if implementation == "llamaindex": self._engine = LlamaIndexRAG() elif implementation == "langchain": self._engine = LangChainRAG()
def query(self, question: str) -> str: return self._engine.query(question)3. Benchmark Before Committing
Section titled “3. Benchmark Before Committing”Theoretical performance is irrelevant. Benchmark on your proprietary datasets. A framework that perfectly indexes Wikipedia might fail catastrophically on your internal legal PDFs.
# Compare frameworks on YOUR dataframeworks = ["llamaindex", "langchain", "haystack"]test_queries = load_test_queries()
for framework in frameworks: engine = create_engine(framework) results = [engine.query(q) for q in test_queries] print(f"{framework}: {evaluate(results)}")Did You Know?
Section titled “Did You Know?”- The LlamaIndex Origin: LlamaIndex emerged in late 2022 and quickly became one of the prominent open-source frameworks for retrieval-oriented LLM applications.
- The CrewAI Viral Moment: CrewAI rose quickly in early 2024 as developers looked for simpler multi-agent abstractions than lower-level orchestration frameworks.
- The DSPy Compiler: DSPy reframes prompt engineering as programming with declarative signatures and compiler-style optimization.
- The Framework Half-Life: The agent-framework ecosystem changes quickly enough that abstraction layers and migration planning matter.
Common Mistakes
Section titled “Common Mistakes”| Mistake | Why It Happens | How to Fix It |
|---|---|---|
| Defaulting to LangChain for simple RAG | Engineers assume the most popular framework is best for every use case, leading to over-engineering. | Start with a retrieval-focused framework or even a simpler custom baseline when document retrieval is the main requirement. |
| Ignoring document chunking strategy | Developers use default ingestion paths without considering context window limits, causing truncation errors. | Tune your chunking and retrieval settings to fit your model and dataset. |
| Hardcoding API keys in agent scripts | Rapid prototyping habits leak into production, exposing secrets in logs or git history. | Utilize Kubernetes Secrets, properly mounted as environment variables in your Pod definitions. |
| Over-complicating sequential tasks | Using LangGraph’s complex state machines for simple linear workflows creates debugging nightmares. | Use CrewAI’s Process.sequential to define straightforward, role-based handoffs with zero boilerplate. |
| Over-prompting single agents | Assigning too many distinct responsibilities to one agent confuses the model’s attention mechanism. | Enforce the Single Responsibility Principle: split complex agents into multiple, hyper-focused sub-agents. |
| Ignoring the framework half-life | Adopting abandoned tools based on outdated blog posts leaves systems vulnerable and unpatchable. | Audit GitHub commit velocity and enterprise backing before committing to a core orchestration framework. |
| Deploying agents without resource limits | Agents generating infinite loops or loading massive indexes crash nodes via OOMKilled events. | Always define strict Kubernetes CPU/Memory requests and limits, and implement circuit breakers in agent loops. |
Hands-On Exercise: Deploying a Multi-Agent RAG Pipeline to Kubernetes v1.35
Section titled “Hands-On Exercise: Deploying a Multi-Agent RAG Pipeline to Kubernetes v1.35”In this lab, you will build a robust multi-framework application. You will use LlamaIndex to index a local dataset, wrap that index as a tool, and orchestrate a CrewAI team to research the data. Finally, you will containerize and deploy this pipeline to a modern Kubernetes cluster.
Task 1: Environment Setup & Local Testing
Section titled “Task 1: Environment Setup & Local Testing”First, provision your Python environment and install the required dependencies.
python3 -m venv ai-envsource ai-env/bin/activatepip install llama-index crewai langchain openai kubernetes==28.1.0mkdir -p dataecho "KubeDojo provides advanced Kubernetes and AI training modules." > data/context.txtTask 2: Implementing the Dual-Framework Script
Section titled “Task 2: Implementing the Dual-Framework Script”Create a file named main.py. This script bridges the robust data capabilities of LlamaIndex with the precise orchestration power of CrewAI.
import osfrom llama_index.core import VectorStoreIndex, SimpleDirectoryReaderfrom langchain.tools import Toolfrom crewai import Agent, Task, Crew, Process
# Ensure API Key is loadedif "OPENAI_API_KEY" not in os.environ: raise ValueError("OPENAI_API_KEY environment variable missing.")
# 1. LlamaIndex setupdocuments = SimpleDirectoryReader("./data").load_data()index = VectorStoreIndex.from_documents(documents)query_engine = index.as_query_engine()
# 2. Tool Bridgedef search_docs(query: str) -> str: return str(query_engine.query(query))
search_tool = Tool( name="Local Data Search", func=search_docs, description="Search local documents for specific training context.")
# 3. CrewAI Setupresearcher = Agent( role="Data Analyst", goal="Extract core concepts from the provided tool.", backstory="You are a precise data extraction specialist.", tools=[search_tool], allow_delegation=False, verbose=True)
research_task = Task( description="Find out what KubeDojo provides.", expected_output="A one sentence summary.", agent=researcher)
crew = Crew( agents=[researcher], tasks=[research_task], process=Process.sequential)
if __name__ == "__main__": result = crew.kickoff() print("FINAL RESULT:", result)Before containerizing the application, you must verify the script executes flawlessly on your local machine.
# Execute the script locally to verify the dual-framework logicpython main.pyTask 3: Containerizing the AI Agent
Section titled “Task 3: Containerizing the AI Agent”Create a Dockerfile to package your application securely. Using a slim Python image reduces the attack surface area and network overhead during cluster deployments.
FROM python:3.11-slimWORKDIR /appCOPY requirements.txt .RUN pip install --no-cache-dir -r requirements.txtCOPY main.py .COPY data/ ./data/CMD ["python", "main.py"]Build the image locally.
pip freeze > requirements.txtdocker build -t ai-agent-job:v1.0 .If you are running this lab in a local Minikube environment rather than a cloud provider, you must load the newly built image directly into the Minikube cluster node before deployment.
# If using Minikube, load the image directly into the clusterminikube image load ai-agent-job:v1.0Task 4: Provisioning the Kubernetes Environment
Section titled “Task 4: Provisioning the Kubernetes Environment”Ensure your cluster is running Kubernetes v1.35+. Create a dedicated namespace and store your API key securely. Never commit secrets to source control.
kubectl create namespace ai-agentskubectl create secret generic ai-secrets \ --from-literal=openai-key="sk-your-actual-key-here" \ -n ai-agentsImmediately verify that the namespace and secrets were generated correctly before proceeding.
# Verify the resources were created successfullykubectl get namespace ai-agentskubectl get secret ai-secrets -n ai-agentsTask 5: Deploying to Kubernetes v1.35
Section titled “Task 5: Deploying to Kubernetes v1.35”Create a file named job.yaml targeting the batch API. Using a Job is usually appropriate for finite AI execution tasks, whereas Deployments are typically better suited to always-on API servers.
# Tested on Kubernetes v1.35apiVersion: batch/v1kind: Jobmetadata: name: rag-agent-job namespace: ai-agentsspec: backoffLimit: 2 template: spec: containers: - name: agent-container image: ai-agent-job:v1.0 imagePullPolicy: IfNotPresent env: - name: OPENAI_API_KEY valueFrom: secretKeyRef: name: ai-secrets key: openai-key resources: limits: memory: "512Mi" cpu: "500m" restartPolicy: NeverApply the configuration, wait for completion, and monitor the output to confirm success.
kubectl apply -f job.yamlkubectl wait --for=condition=complete job/rag-agent-job -n ai-agents --timeout=120skubectl logs -l job-name=rag-agent-job -n ai-agentsTask Success Checklist
- Python virtual environment activates without dependency conflicts.
-
main.pysuccessfully executes locally and queries the text file. - Docker container builds successfully.
- Kubernetes cluster validates the
batch/v1Job specification. - Pod logs output the final CrewAI result without OOMKilled or API credential errors.
Module Quiz
Section titled “Module Quiz”Question 1: The High-Stakes Legal Firm (Scenario)
A global law firm is building a system to query thousands of historical case files. The system must cite exact paragraphs from the PDFs and handle massive document ingestion. They do not need the AI to take external actions, just retrieve information perfectly. Which framework is most appropriate and why?
Answer: LlamaIndex is the most appropriate framework for this scenario. The firm's primary requirement is robust document ingestion, chunking, and Retrieval-Augmented Generation (RAG) rather than complex tool orchestration. LlamaIndex is fundamentally designed as a data framework, offering optimized
SimpleDirectoryReader components and advanced indexing strategies specifically built to connect large datasets to LLMs. LangChain would introduce unnecessary architectural complexity without adding value for a pure retrieval task.
Question 2: The E-Commerce Pipeline (Scenario)
An e-commerce startup wants to build a system where an "Analyst" agent evaluates product trends, a "Writer" agent drafts marketing copy, and a "Reviewer" agent approves the text. The workflow is strictly sequential and relies on specific persona instructions. Which framework minimizes boilerplate for this implementation?
Answer: CrewAI is the ideal choice for this scenario. It utilizes a declarative, role-based mental model that perfectly aligns with human-like team structures. Developers can define specific agents with backstories, assign them sequential tasks, and kick off the process with minimal boilerplate. Implementing this strictly sequential persona flow in LangGraph would require manually defining state schemas and node transitions, which is overkill for a simple linear workflow.
Question 3: The Persistent Backend (Scenario)
You are migrating a legacy backend to an AI-driven system that requires complex state management, conditional loops, and memory persistence across long-running, multi-step asynchronous processes. Which orchestration framework is best suited for this robust engineering requirement?
Answer: LangGraph is the best framework for this use case. Unlike standard LangChain chains or CrewAI tasks, LangGraph is built on a state-machine architecture designed explicitly for complex, non-linear workflows with branching logic and cycles. It allows engineers to persist state natively across long-running executions, making it resilient and highly customizable for deep backend integrations.
Question 4: The Microsoft Ecosystem Integration (Scenario)
Your enterprise application runs entirely on Azure, is written primarily in C#, and relies on heavily governed internal microservices. Your CTO wants to integrate LLM routing capabilities. Which framework presents the lowest friction for integration?
Answer: Semantic Kernel is the optimal choice. Developed by Microsoft, it is deeply integrated with the .NET ecosystem and Azure AI services, offering enterprise-grade support and native typed language bindings. Attempting to force Python-native frameworks like LlamaIndex or CrewAI into a strict C# enterprise environment would introduce significant operational friction and language barrier complexities.
Question 5: The Enterprise Architecture Debate (Scenario)
Your engineering team is divided. Half the team wants to build a system focused on agent orchestration and tool execution, treating the LLM as a central reasoning engine. The other half insists on centering the design around data ingestion, indexing topologies, and query optimization. You must decide which framework aligns with each camp. Which frameworks correspond to these two distinct architectural approaches, and why?
Answer: LangChain approaches design from the perspective of agent orchestration and tool execution (the "how"), treating the LLM as a central reasoning engine that interacts with the outside world. Conversely, LlamaIndex centers heavily on data ingestion, indexing topologies, and query optimization (the "what"), treating the LLM primarily as a synthesis interface for structured proprietary data.
Question 6: The Runaway Pod (Scenario)
You have deployed an AutoGen system to a Kubernetes v1.35 cluster. During a seemingly simple task, the memory consumption of the Pod rapidly inflates, eventually resulting in an OOMKilled crash. Based on AutoGen's primary mode of interaction, why did this happen and how does the framework's design cause this specific resource exhaustion?
Answer: AutoGen models agent interactions as a continuous conversational group chat where agents debate and share context natively. This matters for Kubernetes resource limits because conversational logs can grow exponentially in complex debates, rapidly inflating memory consumption within the Pod. If the context window is not strictly managed, the container will inevitably crash with an OOMKilled error.
Question 7: The Silent Freeze (Scenario)
A startup deployed a CrewAI application, but the sequential process occasionally hangs indefinitely without throwing a formal Python exception. What is the most likely diagnostic cause of this failure mode and how should it be addressed?
Answer: The most likely cause is that an agent is caught in a tool execution loop or is awaiting a response from an external API that lacks a proper timeout configuration. Because CrewAI abstractions hide the underlying execution loops, a hanging network call or a model continuously re-attempting a failed tool use will freeze the sequential process. Implementing strict timeouts and circuit breakers on all external tool calls is required to mitigate this.
Question 8: The Model Migration Crisis (Scenario)
Your company is migrating from GPT-4 to Claude 3.5 Sonnet to reduce costs. However, all of your hand-crafted, string-based prompts have completely broken down, and accuracy has plummeted. Your engineering lead suggests adopting DSPy to resolve this fragility. How does DSPy function as a "compiler" to solve this specific migration problem?
Answer: DSPy acts as a compiler by allowing developers to define declarative signatures (inputs and expected outputs) rather than hand-writing string-based prompts. It solves the fragility problem of prompt engineering: when switching between underlying models (e.g., from GPT-4 to Claude), hand-crafted prompts often break. DSPy automatically optimizes and rewrites the prompts mathematically based on training examples, ensuring consistent accuracy across different models.
Summary
Section titled “Summary”The ecosystem of AI frameworks is vast, but understanding the core philosophies behind the tools prevents costly architectural mistakes.
| Framework | Philosophy | Best For |
|---|---|---|
| LlamaIndex | Data framework | RAG, knowledge systems |
| LangChain | Composable chains | General LLM apps |
| LangGraph | State machines | Complex workflows |
| CrewAI | Role-based teams | Multi-agent collab |
| AutoGen | Conversations | Research, dialogues |
- No single “best” framework: Choose based strictly on your engineering use case.
- Frameworks are converging: Interoperability (e.g., using LlamaIndex inside LangChain) is the modern standard.
- LlamaIndex excels at RAG: It is significantly simpler and more robust than LangChain for deep indexing.
- CrewAI is beginner-friendly: The role-based paradigm is excellent for multi-agent prototypes.
- Your data is your moat: Focus on securely indexing your proprietary data; the LLM is merely the engine.
Further Reading
Section titled “Further Reading”- LlamaIndex Documentation
- CrewAI Documentation
- AutoGen Documentation
- DSPy Documentation
- Haystack Documentation
Next Steps
Section titled “Next Steps”With your foundational framework knowledge complete, you are fully equipped for the complexities of advanced orchestration.
- Module 20: Advanced Agentic AI
- Explore agent memory systems (short-term, long-term, episodic).
- Implement advanced planning algorithms (ReWOO, Plan-and-Execute).
- Master advanced multi-agent collaborative topologies.
You are now prepared to choose the exact right tool for your engineering challenges.
Module 1.5 Complete!
Next: Module 20 - Advanced Agentic AI
Sources
Section titled “Sources”- github.com: llama index — The official LlamaIndex repository README describes LlamaIndex as a data framework and explicitly lists connectors, indices/graphs, and retrieval/query interfaces.
- github.com: crewai — The official CrewAI repository describes the project as orchestrating role-playing autonomous AI agents and documents the hierarchical process with a manager.
- github.com: autogen — The official AutoGen repository says the project was pioneered in Microsoft Research and documents AgentChat support for group-chat patterns.
- github.com: semantic kernel — The official Semantic Kernel repository README describes the framework this way and lists its supported runtimes and model integrations.
- github.com: dspy — The official DSPy repository README explicitly presents DSPy as programming, not prompting, and describes its declarative modular approach.
- kubernetes.io: distribute credentials secure — The Kubernetes task documentation explicitly covers using Secrets and exposing them to containers as environment variables.
- kubernetes.io: manage resources containers — The Kubernetes resource-management documentation covers requests/limits and explains OOM-related behavior under memory pressure.
- LangGraph Repository — Useful counterpart to the module’s orchestration discussion because it documents stateful, long-running graph-based agents.
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation — Primary paper for the conversational multi-agent model discussed in the AutoGen section.