Module 9.6: LangChain & LlamaIndex - LLM Application Frameworks
Complexity: [COMPLEX]
Time to Complete: 90 minutes
Prerequisites: Module 9.4 (vLLM), Basic Python, Understanding of LLM APIs
Learning Objectives:
- Understand RAG architecture and implementation
- Build chains and agents with LangChain
- Create document indexes with LlamaIndex
- Deploy LLM applications on Kubernetes
What You’ll Be Able to Do
After completing this module, you will be able to:
- Deploy LangChain and LlamaIndex applications on Kubernetes with scalable RAG pipeline architectures
- Configure vector database integrations (Qdrant, Weaviate, Milvus) for retrieval-augmented generation
- Implement LLM application monitoring with trace collection and evaluation metrics for production quality
- Optimize RAG pipeline performance with chunking strategies, embedding models, and retrieval tuning
Why This Module Matters
Raw LLMs have limitations: they hallucinate, don’t know about your data, and can’t take actions. Building production LLM applications requires retrieval, context management, output parsing, and orchestration. Writing this from scratch takes months.
LangChain and LlamaIndex are the toolkits for LLM applications.
They provide the building blocks: document loaders, text splitters, vector stores, retrievers, chains, and agents. Instead of reinventing the wheel, you compose proven components into production applications.
“LangChain is to LLM apps what Rails is to web apps—you get the patterns, you focus on the product.”
Did You Know?
- LangChain has 600+ integrations with LLMs, vector stores, tools, and APIs
- LlamaIndex was originally called “GPT Index” before the naming got confusing
- The RAG pattern (Retrieval-Augmented Generation) can reduce hallucinations by 80%+
- LangChain’s agent framework powers many of the “AI assistants” you use daily
- LlamaIndex can index structured data, code, SQL databases, not just documents
- Both frameworks can run with local models (Ollama, vLLM) or cloud APIs (OpenAI, Anthropic)
RAG: The Core Pattern
```text
RAG (Retrieval-Augmented Generation)

1. INDEXING (offline)
   Documents ──▶ Chunks ──▶ Embeddings ──▶ Vector Store
   (PDFs, docs,  (512-token  ([0.1, 0.3,   (Pinecone, Weaviate,
    web pages)    chunks)      ...])         ChromaDB)

2. RETRIEVAL (online)
   Query ──▶ Embed ──▶ Similarity Search ──▶ Top-K Chunks
   ("What is K8s?" ──▶ [0.2, 0.4, ...] ──▶ Chunk 1, Chunk 5, Chunk 7)

3. GENERATION (online)
   Context + Query ──▶ LLM ──▶ Answer
   System: Use this context...
   Context: [Chunk 1, 5, 7]   ──▶ LLM (GPT-4, Llama, etc.)
   Query: What is K8s?        ──▶ "Kubernetes is an open-source container..."
```

LangChain vs LlamaIndex
| Aspect | LangChain | LlamaIndex |
|---|---|---|
| Focus | General LLM orchestration | Data-centric LLM apps |
| Strengths | Agents, chains, tools | Indexing, retrieval, data connectors |
| Philosophy | "Lego blocks for LLMs" | "Build knowledge bases" |
| Best For | Complex workflows, agents | RAG, document Q&A |
| Learning Curve | Steeper | Gentler |
| Integration | Work great together! | Work great together! |
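Before reaching for either framework, it helps to see the RAG pattern with no framework at all. This toy sketch uses bag-of-words counts as a stand-in for a real embedding model (all chunk names and texts are illustrative, not from a real corpus):

```python
import math
import re
from collections import Counter

# --- 1. Indexing (offline): "embed" chunks as bag-of-words vectors.
# A real pipeline would call an embedding model; Counter is a toy stand-in.
docs = {
    "chunk1": "Kubernetes is an open-source container orchestration platform",
    "chunk2": "Pods are the smallest deployable units in Kubernetes",
    "chunk3": "Paris, the capital of France, hosts the Eiffel Tower",
}

def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9-]+", text.lower()))

index = {cid: embed(text) for cid, text in docs.items()}

# --- 2. Retrieval (online): cosine similarity between query and chunks
def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list:
    qv = embed(query)
    return sorted(index, key=lambda cid: cosine(qv, index[cid]), reverse=True)[:k]

# --- 3. Generation (online): assemble the prompt an LLM would receive
top = retrieve("What is Kubernetes?")
context = "\n".join(docs[cid] for cid in top)
prompt = f"Use this context to answer.\nContext:\n{context}\nQuestion: What is Kubernetes?"
print(top)  # → ['chunk1', 'chunk2'] — the unrelated France chunk is not retrieved
```

Everything the frameworks add (loaders, splitters, vector stores) is infrastructure around these three steps.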
When to Use What
```text
USE LANGCHAIN WHEN:
├── Building agents that use tools
├── Creating multi-step reasoning chains
├── Integrating with many external systems
├── Need fine-grained control over prompts
└── Building conversational agents

USE LLAMAINDEX WHEN:
├── Building document Q&A systems
├── Creating knowledge bases from varied sources
├── Need sophisticated retrieval strategies
├── Working with structured + unstructured data
└── Want simpler RAG setup

USE BOTH WHEN:
├── Building production RAG applications
├── LlamaIndex for indexing/retrieval
└── LangChain for orchestration/agents
```

LangChain Fundamentals
Installation
```bash
pip install langchain langchain-openai langchain-community
pip install chromadb  # Vector store
pip install tiktoken  # Token counting
```

Basic Chain
```python
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser

# Initialize LLM (can also use local models)
llm = ChatOpenAI(model="gpt-4o", temperature=0)

# Create a simple prompt
prompt = ChatPromptTemplate.from_template(
    "Explain {concept} in the context of Kubernetes. Keep it under 100 words."
)

# Build the chain
chain = prompt | llm | StrOutputParser()

# Run
result = chain.invoke({"concept": "pods"})
print(result)
```

RAG Chain with LangChain
```python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA

# 1. Load documents
loader = WebBaseLoader("https://kubernetes.io/docs/concepts/overview/")
documents = loader.load()

# 2. Split into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)
chunks = text_splitter.split_documents(documents)

# 3. Create embeddings and vector store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(chunks, embeddings)

# 4. Create retrieval chain
llm = ChatOpenAI(model="gpt-4o", temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
)

# 5. Query
result = qa_chain.invoke("What are the main components of Kubernetes?")
print(result["result"])
```

Agent with Tools
```python
from langchain_openai import ChatOpenAI
from langchain.agents import create_openai_functions_agent, AgentExecutor
from langchain.tools import tool
from langchain import hub

# Define custom tools
@tool
def get_pod_count(namespace: str) -> str:
    """Get the number of pods in a Kubernetes namespace."""
    # In reality, this would call kubectl or the K8s API
    return f"There are 5 pods in namespace {namespace}"

@tool
def get_deployment_status(name: str) -> str:
    """Get the status of a Kubernetes deployment."""
    return f"Deployment {name} is running with 3/3 replicas ready"

# Create agent
llm = ChatOpenAI(model="gpt-4o", temperature=0)
tools = [get_pod_count, get_deployment_status]

# Get a prompt template
prompt = hub.pull("hwchase17/openai-functions-agent")

# Create the agent
agent = create_openai_functions_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# Run
result = agent_executor.invoke({
    "input": "How many pods are in the default namespace and what's the status of the nginx deployment?"
})
print(result["output"])
```

LlamaIndex Fundamentals
Installation
```bash
pip install llama-index
pip install llama-index-llms-openai
pip install llama-index-embeddings-openai
pip install llama-index-vector-stores-chroma
```

Simple Document Q&A
```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load documents from a directory
documents = SimpleDirectoryReader("./data").load_data()

# Create index (handles chunking and embedding)
index = VectorStoreIndex.from_documents(documents)

# Query
query_engine = index.as_query_engine()
response = query_engine.query("What is Kubernetes?")
print(response)
```

Advanced RAG with LlamaIndex
```python
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    Settings,
)
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

# Configure settings
Settings.llm = OpenAI(model="gpt-4o", temperature=0)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=50)

# Create persistent vector store
chroma_client = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = chroma_client.get_or_create_collection("k8s_docs")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Load and index documents
documents = SimpleDirectoryReader("./k8s_docs").load_data()
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
)

# Create query engine with custom settings
query_engine = index.as_query_engine(
    similarity_top_k=5,
    response_mode="tree_summarize",
)

# Query
response = query_engine.query(
    "How do I troubleshoot a pod in CrashLoopBackOff?"
)
print(response)
print("\nSources:")
for node in response.source_nodes:
    print(f"- {node.node.metadata.get('file_name', 'Unknown')}: {node.score:.3f}")
```

Hybrid Search (Semantic + Keyword)
```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor

# Create the vector index
documents = SimpleDirectoryReader("./data").load_data()
vector_index = VectorStoreIndex.from_documents(documents)

# Semantic retriever (a keyword retriever can be run alongside this
# and the two result lists fused for true hybrid search)
vector_retriever = VectorIndexRetriever(
    index=vector_index,
    similarity_top_k=3,
)

# Create query engine with a similarity cutoff as a light reranking step
query_engine = RetrieverQueryEngine(
    retriever=vector_retriever,
    node_postprocessors=[
        SimilarityPostprocessor(similarity_cutoff=0.7)
    ],
)
```
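The engine above is still purely semantic. A genuinely hybrid setup fuses keyword (e.g. BM25) and vector rankings; reciprocal rank fusion (RRF) is a common way to do that, sketched here framework-free with hard-coded result lists for illustration:

```python
from collections import defaultdict

def rrf(rankings, k: int = 60):
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists from a keyword retriever and a vector retriever
keyword_hits = ["doc3", "doc1", "doc7"]
vector_hits = ["doc1", "doc5", "doc3"]

print(rrf([keyword_hits, vector_hits]))
# → ['doc1', 'doc3', 'doc5', 'doc7'] — docs ranked by both retrievers rise to the top
```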
```python
response = query_engine.query("kubectl commands for debugging")
```

Production Patterns
1. Using Local Models (vLLM)
```python
# LangChain with vLLM
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    openai_api_base="http://vllm-server:8000/v1",
    openai_api_key="not-needed",
)
```

```python
# LlamaIndex with vLLM
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    api_base="http://vllm-server:8000/v1",
    api_key="not-needed",
)
```

2. Streaming Responses
```python
# LangChain streaming
from langchain_openai import ChatOpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

llm = ChatOpenAI(
    model="gpt-4o",
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
)
```

```python
# LlamaIndex streaming
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(streaming=True)

streaming_response = query_engine.query("Explain pods")
for text in streaming_response.response_gen:
    print(text, end="", flush=True)
```

3. Conversation Memory
```python
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferWindowMemory
from langchain.chains import ConversationChain

llm = ChatOpenAI(model="gpt-4o")
memory = ConversationBufferWindowMemory(k=5)  # Keep last 5 exchanges

conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True,
)

# Maintains context across calls
conversation.predict(input="What is a Kubernetes pod?")
conversation.predict(input="How does it relate to containers?")  # Knows "it" = pod
conversation.predict(input="Can you give me an example?")  # Still in context
```

4. FastAPI Integration
```python
from fastapi import FastAPI
from pydantic import BaseModel
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser

app = FastAPI()

# Initialize chain
llm = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_template(
    "Answer this Kubernetes question: {question}"
)
chain = prompt | llm | StrOutputParser()

class Query(BaseModel):
    question: str

@app.post("/ask")
async def ask_question(query: Query):
    result = await chain.ainvoke({"question": query.question})
    return {"answer": result}

# Run with: uvicorn main:app --host 0.0.0.0 --port 8080
```

Deploying on Kubernetes
Dockerfile
```dockerfile
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]
```

Kubernetes Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rag-service
  namespace: llm-apps
spec:
  replicas: 2
  selector:
    matchLabels:
      app: rag-service
  template:
    metadata:
      labels:
        app: rag-service
    spec:
      containers:
        - name: rag
          image: myregistry/rag-service:v1
          ports:
            - containerPort: 8080
          env:
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: openai-credentials
                  key: api-key
            - name: VLLM_ENDPOINT
              value: "http://vllm-server:8000/v1"
            - name: CHROMA_HOST
              value: "chromadb-service"
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "2Gi"
              cpu: "1"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: rag-service
spec:
  selector:
    app: rag-service
  ports:
    - port: 80
      targetPort: 8080
```

Vector Database (ChromaDB)
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: chromadb
  namespace: llm-apps
spec:
  serviceName: chromadb
  replicas: 1
  selector:
    matchLabels:
      app: chromadb
  template:
    metadata:
      labels:
        app: chromadb
    spec:
      containers:
        - name: chromadb
          image: chromadb/chroma:latest
          ports:
            - containerPort: 8000
          volumeMounts:
            - name: data
              mountPath: /chroma/chroma
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 20Gi
```

War Story: The Chatbot That Forgot Everything
A fintech company built a customer support chatbot using LangChain. It worked great in demos but failed in production.
The Problem: Customers would ask follow-up questions, and the bot would have no idea what they were talking about.
```text
Customer: What's my account balance?
Bot: Your balance is $1,234.56

Customer: Why is it so low?
Bot: I'm sorry, I don't have information about what is low. Could you clarify what you're referring to?
```

The Root Cause:
- No conversation memory between requests
- Each Lambda invocation was stateless
- The chat UI was sending only the latest message
The Solution:
```python
from langchain.memory import ConversationBufferWindowMemory
from langchain_community.chat_message_histories import RedisChatMessageHistory

# Store conversation history in Redis
def get_memory(session_id: str):
    message_history = RedisChatMessageHistory(
        session_id=session_id,
        url="redis://redis-cluster:6379",
        ttl=3600,  # Expire after 1 hour
    )
    return ConversationBufferWindowMemory(
        memory_key="chat_history",
        chat_memory=message_history,
        return_messages=True,
        k=10,  # Keep last 10 exchanges
    )

@app.post("/chat")
async def chat(request: ChatRequest):
    memory = get_memory(request.session_id)
    chain = ConversationChain(llm=llm, memory=memory)
    response = await chain.ainvoke({"input": request.message})
    return {"response": response["response"]}
```

Kubernetes changes:
```yaml
# Added Redis for session storage
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-cluster
spec:
  # ... Redis configuration
```

Results:
- Customer satisfaction: 45% → 82%
- Issue resolution rate: 30% → 65%
- Average conversation length: 2 messages → 8 messages
The lesson: LLM applications need state management. The model doesn’t remember—you have to build the memory.
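The fix generalizes to any stateful LLM app: keep a bounded, per-session window of messages keyed by session ID. A minimal in-process sketch (a dict stands in for Redis here; all names are illustrative):

```python
from collections import defaultdict, deque

MAX_TURNS = 10  # keep the last 10 exchanges, mirroring k=10 above

# session_id -> bounded message history (a dict stands in for Redis)
histories = defaultdict(lambda: deque(maxlen=2 * MAX_TURNS))

def build_prompt(session_id: str, user_message: str) -> str:
    """Record the user turn and return the windowed transcript for the LLM."""
    history = histories[session_id]
    history.append(("user", user_message))
    return "\n".join(f"{role}: {text}" for role, text in history)

def record_reply(session_id: str, reply: str) -> None:
    histories[session_id].append(("assistant", reply))

prompt = build_prompt("cust-42", "What's my account balance?")
record_reply("cust-42", "Your balance is $1,234.56")
prompt = build_prompt("cust-42", "Why is it so low?")
print(prompt)  # includes the earlier balance exchange, so "it" can be resolved
```

The deque's `maxlen` gives the same sliding-window behavior as `ConversationBufferWindowMemory`, just without persistence across processes.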
Common Mistakes
| Mistake | Problem | Solution |
|---|---|---|
| No chunk overlap | Missing context at boundaries | Use 10-20% overlap |
| Chunks too large | Diluted relevance | 500-1000 tokens typical |
| Ignoring metadata | Can’t filter results | Include source, date, etc. |
| No reranking | Poor retrieval quality | Add reranker after retrieval |
| Stateless design | No conversation memory | Add Redis/DB for sessions |
| Prompt injection | Security vulnerability | Validate inputs, use guards |
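The overlap and chunk-size advice in the table is easy to make concrete. This sliding-window splitter (where a "token" is simply a whitespace word, for illustration) keeps a fixed number of words shared between adjacent chunks:

```python
def split_with_overlap(words, chunk_size: int = 10, overlap: int = 2):
    """Slide a window of `chunk_size` words, stepping by chunk_size - overlap."""
    step = chunk_size - overlap
    return [words[i:i + chunk_size] for i in range(0, len(words), step)]

words = [f"w{i}" for i in range(24)]
chunks = split_with_overlap(words)
for c in chunks:
    print(c[0], "...", c[-1])
# Adjacent chunks share their boundary words, so a sentence split across
# a boundary still appears whole in at least one chunk.
```

With `overlap=2` out of `chunk_size=10`, that is the 20% overlap the table recommends; real splitters like `RecursiveCharacterTextSplitter` apply the same idea at the character/token level.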
Hands-On Exercise: Build a K8s Documentation Bot
Objective: Create a RAG chatbot that answers questions about Kubernetes.
Task 1: Set Up the Environment
```bash
pip install langchain langchain-openai chromadb

# Create data directory
mkdir -p ./k8s_data
```

Task 2: Download Sample Data
```python
import requests

urls = [
    "https://raw.githubusercontent.com/kubernetes/website/main/content/en/docs/concepts/overview/what-is-kubernetes.md",
    "https://raw.githubusercontent.com/kubernetes/website/main/content/en/docs/concepts/workloads/pods/_index.md",
]

for i, url in enumerate(urls):
    response = requests.get(url)
    with open(f"./k8s_data/doc_{i}.md", "w") as f:
        f.write(response.text)
    print(f"Downloaded doc_{i}.md")
```

Task 3: Build the RAG Application
```python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.vectorstores import Chroma
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

# Load documents
loader = DirectoryLoader("./k8s_data", glob="**/*.md")
documents = loader.load()
print(f"Loaded {len(documents)} documents")

# Split
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)
chunks = splitter.split_documents(documents)
print(f"Created {len(chunks)} chunks")

# Create vector store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(
    chunks, embeddings, persist_directory="./chroma_db"
)

# Create conversational chain
llm = ChatOpenAI(model="gpt-4o", temperature=0)
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
)

qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
    memory=memory,
    verbose=True,
)

# Interactive chat loop
print("\nK8s Documentation Bot ready! Type 'quit' to exit.\n")
while True:
    question = input("You: ")
    if question.lower() == "quit":
        break

    result = qa_chain.invoke({"question": question})
    print(f"\nBot: {result['answer']}\n")
```

Task 4: Test the Bot
```bash
python rag_bot.py

# Try these questions:
# - What is Kubernetes?
# - What are pods?
# - How do pods relate to containers?  # Tests memory
# - Can you give me an example?        # Tests context
```

Task 5: Wrap in FastAPI
```python
from fastapi import FastAPI
from pydantic import BaseModel
# ... import the chain from above

app = FastAPI()

class Query(BaseModel):
    session_id: str
    question: str

# Store chains per session (use Redis in production)
sessions = {}

@app.post("/chat")
async def chat(query: Query):
    if query.session_id not in sessions:
        # create_chain() builds a fresh Task 3 conversational chain
        sessions[query.session_id] = create_chain()

    chain = sessions[query.session_id]
    result = await chain.ainvoke({"question": query.question})
    return {"answer": result["answer"]}

@app.get("/health")
def health():
    return {"status": "ok"}
```

Success Criteria
- Documents loaded and chunked
- Vector store created and persisted
- Can ask questions and get relevant answers
- Bot remembers conversation context
- Follow-up questions work correctly
Question 1
What is RAG and why is it important?
Answer:
RAG (Retrieval-Augmented Generation) retrieves relevant context before generating responses
RAG reduces hallucinations by grounding the LLM’s responses in actual documents. Instead of relying solely on training data, the model uses retrieved context to answer questions accurately.
Question 2
When should you use LangChain vs LlamaIndex?
Answer:
LangChain for agents/chains/orchestration, LlamaIndex for indexing/retrieval
LangChain excels at building complex workflows with tools and agents. LlamaIndex focuses on turning data into queryable indexes. They work well together—LlamaIndex for retrieval, LangChain for orchestration.
Question 3
What is chunk overlap and why is it important?
Answer:
Chunk overlap ensures context isn’t lost at chunk boundaries
When splitting documents into chunks, overlap (e.g., 200 tokens) means adjacent chunks share some text. This prevents important context from being split between chunks and lost during retrieval.
Question 4
How do you use local models (vLLM) with LangChain?
Answer:
Point ChatOpenAI to vLLM’s OpenAI-compatible endpoint
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="model-name",
    openai_api_base="http://vllm-server:8000/v1",
    openai_api_key="not-needed",
)
```

Question 5
What is an agent in LangChain?
Answer:
An LLM that can decide which tools to use to accomplish a task
Agents use the LLM to reason about what actions to take, call tools (functions), observe results, and continue until the task is complete. They’re more flexible than fixed chains.
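That reason-act-observe loop can be shown without any framework. In this sketch a hard-coded rule-based function stands in for the LLM's reasoning step, and the tool mirrors the kubectl-style examples earlier (all names are illustrative):

```python
# Tools the agent may call, echoing the earlier kubectl-style examples
def get_pod_count(namespace: str) -> str:
    return f"5 pods in {namespace}"

TOOLS = {"get_pod_count": get_pod_count}

def fake_llm(task: str, observations: list) -> dict:
    """Stand-in for the LLM: decide the next action, or finish with an answer."""
    if not observations:
        return {"action": "get_pod_count", "input": "default"}
    return {"action": "finish", "answer": f"Answer: {observations[-1]}"}

def run_agent(task: str, max_steps: int = 5) -> str:
    observations = []
    for _ in range(max_steps):
        decision = fake_llm(task, observations)          # reason
        if decision["action"] == "finish":
            return decision["answer"]
        tool = TOOLS[decision["action"]]                 # act
        observations.append(tool(decision["input"]))     # observe
    return "Gave up after max_steps"

print(run_agent("How many pods are in the default namespace?"))
# → Answer: 5 pods in default
```

`AgentExecutor` runs exactly this loop, with the real LLM choosing actions and `max_iterations` playing the role of `max_steps`.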
Question 6
How do you add conversation memory to a LangChain application?
Answer:
Use ConversationBufferMemory or similar memory classes
```python
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

memory = ConversationBufferMemory(return_messages=True)
chain = ConversationChain(llm=llm, memory=memory)
```

For production, use Redis or a database backend for persistence.
Question 7
What is the purpose of a vector store in RAG?
Answer:
Store document embeddings and enable semantic similarity search
Vector stores (Chroma, Pinecone, Weaviate) store the numerical representations (embeddings) of text chunks. When a query comes in, it’s embedded and compared to stored vectors to find semantically similar content.
Question 8
What’s the typical chunk size for RAG applications?
Answer:
500-1000 tokens with 10-20% overlap
Chunks should be small enough to be specific but large enough to contain context. 512-1024 tokens is common. Too large dilutes relevance; too small loses context.
Key Takeaways
- RAG grounds LLMs in facts - retrieval + generation reduces hallucinations
- LangChain for orchestration - chains, agents, tools, memory
- LlamaIndex for data - indexing, retrieval, structured data
- Chunk size matters - 500-1000 tokens with overlap
- Memory is essential - use Redis/DB for conversation state
- Local models work - vLLM, Ollama via OpenAI-compatible APIs
- Streaming improves UX - show tokens as they generate
- Metadata enables filtering - include source, date, etc.
- Reranking improves quality - post-process retrieved results
- Production needs persistence - vector stores, session storage
Further Reading
- LangChain Documentation - Official guides
- LlamaIndex Documentation - Official guides
- RAG Survey Paper - Academic overview
- LangChain Templates - Production examples
Toolkit Complete!
Congratulations! You’ve completed the ML Platforms Toolkit. You now understand:
- Model training with Kubeflow
- Experiment tracking with MLflow
- Feature stores for ML
- High-throughput inference with vLLM
- Distributed serving with Ray Serve
- Building LLM applications with LangChain and LlamaIndex
Continue to other toolkits or apply these skills to build production ML systems!