Skip to content

Synthesis Apps: Building LLM-Native Applications on Kubernetes

AI/ML Engineering Track | Phase 7 | 3-module mini-arc


KubeDojo already teaches the ingredients of modern LLM systems in depth: vector databases, RAG, inference engines, GPU scheduling, AI agents, MLOps, and platform operations. The missing step is synthesis. Learners can complete those sections and still be left asking how the pieces become one Kubernetes-native application that starts reliably, communicates through internal services, survives pod restarts, exposes useful health checks, and gives the application layer enough failure vocabulary to implement retries and quality gates.

This mini-arc closes that gap by building one LLM-native application substrate in three passes. The first module wires the backing services. The second module adds the orchestration service and state handling. The third module adds production gates for evaluation, tracing, and scaling. The point is not to celebrate another tool stack; it is to teach the operational boundaries between the layers so you can replace individual tools later without losing the architecture.

The arc starts below the application code because that is where many LLM apps fail first. If inference and memory are not healthy as Kubernetes services, a LangGraph, LangChain, LlamaIndex, or custom orchestration layer has no stable substrate. Once those services are verified, the second module can teach application logic against known failure modes instead of hoping the backend is fine. The final module then makes quality and observability part of the delivery path, not a dashboard someone opens after users complain.

Module 3.1 Module 3.2 Module 3.3
+-------------+ +-------------------+ +----------------------+
| Backing |-->| Orchestration |-->| Production gates |
| services | | service + state | | evals + tracing + HPA|
+-------------+ +-------------------+ +----------------------+
vLLM + Qdrant LangGraph shape DeepEval / Promptfoo
GPU + storage Redis checkpointing LangFuse / metrics
NetworkPolicy retries + budgets readiness criteria
ModuleLearning Outcome
3.1 LLM-Native Stack: Inference and Memory on KubernetesDeploy vLLM and Qdrant as coordinated Kubernetes services, bound by namespace resource policy and NetworkPolicy, then verify and break the stack deliberately.
3.2 Wiring the LLM App: The Orchestration LayerBuild a Kubernetes-deployed orchestration service that calls the Module 3.1 services, persists state, and handles backend failure modes explicitly.
3.3 Production Gates: Evals, Observability, and ScalingGate deployments with automated LLM evals, trace requests across the stack, and scale inference with signals that match GPU serving behavior.

You should already be comfortable with Kubernetes workload objects, Services, ConfigMaps, Secrets, PersistentVolumeClaims, and namespace-level policy. The modules will explain why each object exists in this stack, but they will not re-teach every Kubernetes primitive from first principles. If those objects still feel unfamiliar, strengthen the Kubernetes certification tracks before treating this arc as a build path.

You should also have completed the inference and vector database prerequisites. Module 3.1 assumes you have seen vLLM as an inference engine and Qdrant as a vector database before. The new material here is the coordination layer between them: resource quota, service discovery, readiness, restart behavior, and verification probes.

This is not a broad survey of every LLM application framework. The arc mentions KServe, Ray Serve, BentoML, DeepEval, Promptfoo, and LangFuse only where they clarify a concrete integration pattern. You will not deploy all of them in this section. The goal is to learn the responsibilities they abstract or reinforce.

This is also not an agent-building section. Module 3.2 uses orchestration vocabulary, but the dedicated framework and agent material remains in the Frameworks & Agents phase. Here the question is narrower and more operational: when the app calls inference, memory, and state services inside a cluster, what must be true for the request to be reliable?

Read the three modules in order. The first module produces the backing service substrate that the second module depends on. The second module gives the application layer the failure handling vocabulary that the third module tests and observes. Skipping ahead is possible if you already operate this kind of stack, but it removes the main teaching benefit for most learners.

AI Infrastructure
|
v
Synthesis Apps 3.1
|
v
Synthesis Apps 3.2
|
v
Synthesis Apps 3.3
|
v
Advanced GenAI & Safety

This phase sits after AI Infrastructure because you need a working mental model for inference engines and GPU scheduling before you can design a Kubernetes application around them. It sits before Advanced GenAI & Safety because evaluation and safety gates make more sense once you can see the application path those gates protect. That placement is intentional: first learn the pieces, then synthesize them, then harden the resulting system.

If You Just FinishedCome Here To Practice
Vector Search & RAGRunning the vector database as a managed dependency rather than a local demo.
Frameworks & AgentsGiving the orchestration layer stable internal services and explicit failure modes.
MLOps & LLMOpsTranslating deployment discipline into multi-service LLM application boundaries.
AI InfrastructureMoving from an isolated inference endpoint to a full internal application stack.

By the end of the three modules, you will have a working shape for a private, Kubernetes-native LLM application. It will not be a toy notebook and it will not be a black-box platform service. It will be a service graph whose responsibilities you can name, probe, restart, restrict, and test. That is the skill the rest of the AI/ML Engineering track needs when isolated specializations become one production system.

Begin with LLM-Native Stack: Inference and Memory on Kubernetes. It builds the substrate: GPU allocation, vLLM, Qdrant, internal networking, readiness, and controlled failure injection. Once that layer is stable, the next module can safely teach application-level retry and orchestration logic.