ML Platforms Toolkit
4 modules are currently being reworked. Watch this section over the next few days.
Toolkit Track | 11 Modules | ~9.5 hours total
Overview
The ML Platforms Toolkit covers the infrastructure for production machine learning on Kubernetes. It spans traditional ML pipelines with Kubeflow and MLflow through LLM serving and application frameworks with vLLM and LangChain. Whether you’re running batch training, serving real-time predictions, or building RAG applications, these tools form the backbone of production AI systems.
This toolkit applies concepts from MLOps Discipline.
Prerequisites
Before starting this toolkit:
- MLOps Discipline
- Kubernetes fundamentals
- Basic ML concepts (training, inference)
- Python familiarity
Modules
| # | Module | Complexity | Time |
|---|---|---|---|
| 9.1 | Kubeflow | [COMPLEX] | 50-60 min |
| 9.2 | MLflow | [MEDIUM] | 40-45 min |
| 9.3 | Feature Stores | [MEDIUM] | 40-45 min |
| 9.4 | vLLM | [COMPLEX] | 50-60 min |
| 9.5 | Ray Serve | [COMPLEX] | 50-60 min |
| 9.6 | LangChain & LlamaIndex | [COMPLEX] | 50-60 min |
| 9.7 | GPU Scheduling | [COMPLEX] | 50 min |
| 9.8 | KServe | [COMPLEX] | 55-65 min |
| 9.9 | Seldon Core | [COMPLEX] | 55-65 min |
| 9.10 | BentoML | [COMPLEX] | 50-60 min |
| 9.11 | Bare-Metal MLOps | [COMPLEX] | 60-70 min |
Learning Outcomes
After completing this toolkit, you will be able to:
- Deploy Kubeflow — Pipelines, notebooks, model serving
- Track experiments — MLflow tracking and model registry
- Manage features — Feast offline and online stores
- Serve LLMs efficiently — vLLM with PagedAttention for high throughput
- Build distributed inference — Ray Serve for multi-model pipelines
- Create RAG applications — LangChain and LlamaIndex for LLM apps
- Deploy inference graphs — Seldon Core 2 pipelines, multi-model serving, and Alibi explainability
- Package and serve Python-first models — BentoML Service/runners/Bento lifecycle, K8s deployment, adaptive micro-batching
- Build bare-metal ML platforms — Assemble MinIO, MLflow, KServe, Argo Workflows, and kube-prometheus-stack into a production ML platform without managed cloud
Tool Selection Guide
    WHICH ML PLATFORM TOOL?
    ─────────────────────────────────────────────────────────────────

    "I need to orchestrate ML training pipelines"
    └──▶ Kubeflow Pipelines
         • Workflow orchestration
         • Artifact management
         • Kubernetes-native
         • GPU scheduling

    "I need to track experiments and models"
    └──▶ MLflow
         • Parameter/metric logging
         • Model versioning
         • Model registry
         • Framework-agnostic

    "I need to manage and serve features"
    └──▶ Feast
         • Feature definitions
         • Point-in-time correctness
         • Online/offline stores
         • Training-serving consistency

    "I need AutoML / hyperparameter tuning"
    └──▶ Kubeflow Katib
         • Bayesian optimization
         • Neural architecture search
         • Parallel trials
         • Early stopping

    "I need to serve models at scale"
    └──▶ KServe (with Kubeflow)
         • Auto-scaling
         • Canary deployments
         • Multi-framework support
         • GPU inference

    "I need inference graphs, multi-model serving, and explainability"
    └──▶ Seldon Core
         • Multi-framework model loading (sklearn, MLflow, Triton, HuggingFace)
         • Inference graph pipelines with chaining, joins, and conditional routing
         • Alibi explainability (anchors, integrated gradients, counterfactual)
         • Drift and outlier detection (Alibi-Detect)

    "I need to serve LLMs with high throughput"
    └──▶ vLLM
         • PagedAttention memory optimization
         • Continuous batching
         • OpenAI-compatible API
         • Multi-GPU tensor parallelism

    "I need distributed model serving"
    └──▶ Ray Serve
         • Model composition
         • Fractional GPU allocation
         • Auto-scaling
         • A/B testing built-in

    "I need to build LLM applications"
    └──▶ LangChain / LlamaIndex
         • RAG (Retrieval-Augmented Generation)
         • Chains and agents
         • Memory management
         • Document processing

    "I need a production ML platform without AWS/GCP/Azure"
    └──▶ Bare-Metal MLOps Recipe (Module 9.11)
         • kubeadm/k3s + MetalLB + Longhorn + MinIO
         • MLflow self-hosted + KServe/Seldon/BentoML
         • Argo Workflows + ArgoCD + kube-prometheus-stack

    "I need Python-first model packaging and custom serving logic"
    └──▶ BentoML
         • Python Service + runner architecture
         • Bento packaging with reproducible artifacts
         • Adaptive micro-batching per runner
         • K8s deployment via Helm or Yatai control plane

    ML PLATFORM STACK:
    ─────────────────────────────────────────────────────────────────
                               ML Workflow
    ─────────────────────────────────────────────────────────────────
                                    │
              ┌─────────────────────┼─────────────────────┐
              │                     │                     │
              ▼                     ▼                     ▼
         ┌─────────┐          ┌───────────┐          ┌─────────┐
         │Kubeflow │          │  MLflow   │          │  Feast  │
         │Pipelines│          │ Tracking  │          │Features │
         └─────────┘          └───────────┘          └─────────┘
              │                     │                     │
              ▼                     ▼                     ▼
          Training           Model Registry         Feature Store
        Orchestration          Versioning          Online/Offline

The ML Platform Stack
    ┌─────────────────────────────────────────────────────────────────┐
    │                    ML PLATFORM ARCHITECTURE                     │
    ├─────────────────────────────────────────────────────────────────┤
    │                                                                 │
    │  EXPERIMENTATION                                                │
    │  ┌───────────────────────────────────────────────────────────┐  │
    │  │  Kubeflow Notebooks                                       │  │
    │  │  • JupyterHub  • GPU access  • Shared storage             │  │
    │  │                                                           │  │
    │  │  MLflow Tracking                                          │  │
    │  │  • Parameters  • Metrics  • Artifacts                     │  │
    │  └───────────────────────────────────────────────────────────┘  │
    │                                │                                │
    │  TRAINING                                                       │
    │  ┌───────────────────────────────────────────────────────────┐  │
    │  │  Kubeflow Pipelines                                       │  │
    │  │  • Workflow DAGs  • Artifact tracking  • Caching          │  │
    │  │                                                           │  │
    │  │  Katib (AutoML)                                           │  │
    │  │  • Hyperparameter tuning  • Neural architecture search    │  │
    │  │                                                           │  │
    │  │  Training Operators                                       │  │
    │  │  • TFJob  • PyTorchJob  • MPIJob                          │  │
    │  └───────────────────────────────────────────────────────────┘  │
    │                                │                                │
    │  FEATURE MANAGEMENT                                             │
    │  ┌───────────────────────────────────────────────────────────┐  │
    │  │  Feast                                                    │  │
    │  │  • Offline store (training)  • Online store (inference)   │  │
    │  │  • Point-in-time joins       • Feature serving            │  │
    │  └───────────────────────────────────────────────────────────┘  │
    │                                │                                │
    │  SERVING & PRODUCTION                                           │
    │  ┌───────────────────────────────────────────────────────────┐  │
    │  │  MLflow Model Registry                                    │  │
    │  │  • Model versions  • Stages  • Aliases                    │  │
    │  │                                                           │  │
    │  │  KServe                                                   │  │
    │  │  • Model serving  • Auto-scaling  • Canary rollouts       │  │
    │  └───────────────────────────────────────────────────────────┘  │
    │                                                                 │
    └─────────────────────────────────────────────────────────────────┘

Study Path
    Module 9.1: Kubeflow
         │
         │  ML platform foundation
         │  Pipelines, notebooks, training
         ▼
    Module 9.2: MLflow
         │
         │  Experiment tracking
         │  Model registry
         ▼
    Module 9.3: Feature Stores
         │
         │  Feature management
         │  Training-serving consistency
         ▼
    Module 9.4: vLLM
         │
         │  High-throughput LLM serving
         │  PagedAttention optimization
         ▼
    Module 9.5: Ray Serve
         │
         │  Distributed inference
         │  Model composition
         ▼
    Module 9.6: LangChain & LlamaIndex
         │
         │  LLM application frameworks
         │  RAG and agents
         ▼
    Module 9.7: GPU Scheduling
         │
         │  GPU resource management
         │  Device plugins, NVIDIA operator
         ▼
    Module 9.8: KServe
         │
         │  Production model inference
         │  Serverless + raw deployment modes
         ▼
    Module 9.9: Seldon Core
         │
         │  Multi-framework model serving
         │  Inference graphs, Alibi explainability
         ▼
    Module 9.10: BentoML
         │
         │  Python-first model packaging
         │  Runners, micro-batching, K8s deployment
         ▼
    Module 9.11: Bare-Metal MLOps
         │
         │  Integration capstone
         │  Full platform assembly without managed cloud
         ▼
    [Toolkit Complete] → Production AI/ML!

Key Concepts
MLOps Platform Principles
| Principle | Tool | Implementation |
|---|---|---|
| Reproducibility | Kubeflow Pipelines | Containerized steps, artifacts |
| Experiment tracking | MLflow | Parameters, metrics, models |
| Feature consistency | Feast | Point-in-time correct features |
| Model lifecycle | MLflow Registry | Versions, stages, aliases |
| Scalable training | Kubeflow Operators | Distributed training |
| Model serving | KServe | Auto-scaling inference |
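
The "point-in-time correct features" entry in the table above can be illustrated with a small standard-library sketch. This is not Feast's implementation: the `feature_history` data and the `point_in_time_lookup` helper are hypothetical names introduced only for illustration. The idea is that each training row may only see feature values recorded at or before its own event timestamp, so nothing leaks in from the future.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical feature history for one user: (timestamp, value), sorted by time.
feature_history = [
    (datetime(2024, 1, 1), 3),
    (datetime(2024, 1, 10), 5),
    (datetime(2024, 1, 20), 9),
]


def point_in_time_lookup(history, event_time):
    """Return the latest feature value recorded at or before event_time.

    Returns None when no value existed yet, so the training row gets a
    missing feature instead of one taken from the future.
    """
    timestamps = [ts for ts, _ in history]
    # Index of the first timestamp strictly greater than event_time.
    idx = bisect_right(timestamps, event_time)
    return history[idx - 1][1] if idx > 0 else None


# Label events observed at different times (illustrative values).
label_events = [
    datetime(2023, 12, 31),
    datetime(2024, 1, 10),
    datetime(2024, 1, 15),
]

# Join each label event with the feature value that was valid at that moment.
training_rows = [
    (event_time, point_in_time_lookup(feature_history, event_time))
    for event_time in label_events
]
```

A feature store performs the same kind of lookup across many entities and feature views at once; the sketch only shows the rule for a single entity.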
Platform Components
    ML PLATFORM COMPONENTS
    ─────────────────────────────────────────────────────────────────

    KUBEFLOW
    ├── Notebooks          - Interactive development
    ├── Pipelines          - Workflow orchestration
    ├── Katib              - Hyperparameter tuning
    ├── Training Operators - Distributed training
    └── KServe             - Model serving

    MLFLOW
    ├── Tracking           - Experiment logging
    ├── Projects           - Reproducible packaging
    ├── Models             - Unified model format
    └── Registry           - Model lifecycle

    FEAST
    ├── Feature Views      - Feature definitions
    ├── Offline Store      - Historical features
    ├── Online Store       - Latest features
    └── Feature Server     - Real-time serving

Integration Patterns
Complete ML Pipeline

    INTEGRATED ML WORKFLOW
    ─────────────────────────────────────────────────────────────────

        Data              Features                Training
         │                   │                       │
         ▼                   ▼                       ▼
    ┌──────────┐        ┌──────────┐        ┌──────────────────┐
    │Raw Data  │───────▶│  Feast   │───────▶│Kubeflow Pipeline │
    │(S3, GCS) │        │(Features)│        │                  │
    └──────────┘        └──────────┘        │ 1. Load features │
                                            │ 2. Train model   │
                                            │ 3. Evaluate      │
                                            │ 4. Register      │
                                            └────────┬─────────┘
                                                     │
                        ┌────────────────────────────┼────────────┐
                        │                            │            │
                        ▼                            ▼            ▼
                   ┌──────────┐                 ┌──────────┐ ┌──────────┐
                   │ MLflow   │                 │ MLflow   │ │ KServe   │
                   │ Tracking │                 │ Registry │ │(Serving) │
                   └──────────┘                 └──────────┘ └──────────┘

Kubeflow + MLflow
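    # Pipeline step with MLflow tracking
    from kfp import dsl

    @dsl.component(packages_to_install=["mlflow", "scikit-learn"])
    def train_with_tracking(mlflow_uri: str):
        import mlflow
        from sklearn.datasets import load_iris
        from sklearn.linear_model import LogisticRegression

        mlflow.set_tracking_uri(mlflow_uri)

        with mlflow.start_run():
            # Train model (placeholder dataset and hyperparameters)
            X, y = load_iris(return_X_y=True)
            params = {"C": 1.0, "max_iter": 200}
            model = LogisticRegression(**params).fit(X, y)
            metrics = {"train_accuracy": model.score(X, y)}

            # Log the run and register the model under a fixed name
            mlflow.log_params(params)
            mlflow.log_metrics(metrics)
            mlflow.sklearn.log_model(model, "model",
                                     registered_model_name="my-model")

MLflow + Feast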
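    # Training with Feast features and MLflow tracking
    import mlflow
    import pandas as pd
    from feast import FeatureStore
    from sklearn.linear_model import LogisticRegression

    store = FeatureStore()  # expects feature_store.yaml in the working directory
    mlflow.set_tracking_uri("http://mlflow:5000")

    # Entity rows to enrich: keys, event timestamps, and the training label.
    # Placeholder values; the column names are assumptions about the feature repo.
    entity_df = pd.DataFrame({
        "user_id": [1001, 1002],
        "event_timestamp": pd.to_datetime(["2024-01-01", "2024-01-02"]),
        "label": [0, 1],
    })

    # Get training features
    training_df = store.get_historical_features(
        entity_df=entity_df,
        features=["user_features:feature1", "user_features:feature2"],
    ).to_df()

    with mlflow.start_run():
        mlflow.log_param("feature_store", "feast")
        mlflow.log_param("feature_view", "user_features")

        # Fit on the joined feature columns, using the label carried in entity_df
        model = LogisticRegression()
        model.fit(training_df[["feature1", "feature2"]], training_df["label"])
        mlflow.sklearn.log_model(model, "model")

Common Architectures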
Development to Production

    ML DEVELOPMENT TO PRODUCTION
    ─────────────────────────────────────────────────────────────────

      DEVELOPMENT            STAGING            PRODUCTION
    ─────────────────────────────────────────────────────────────────

    ┌─────────────┐      ┌─────────────┐      ┌─────────────┐
    │  Notebooks  │      │  Pipelines  │      │   KServe    │
    │ (Kubeflow)  │      │ (Kubeflow)  │      │  (Serving)  │
    └──────┬──────┘      └──────┬──────┘      └──────┬──────┘
           │                    │                    │
           ▼                    ▼                    ▼
    ┌─────────────┐      ┌─────────────┐      ┌─────────────┐
    │   MLflow    │      │   MLflow    │      │   MLflow    │
    │  Tracking   │─────▶│  Registry   │─────▶│  Registry   │
    │             │      │  (Staging)  │      │(Production) │
    └─────────────┘      └─────────────┘      └─────────────┘
           │                    │                    │
           ▼                    ▼                    ▼
    ┌─────────────┐      ┌─────────────┐      ┌─────────────┐
    │    Feast    │      │    Feast    │      │    Feast    │
    │ (Dev Store) │─────▶│(Stage Store)│─────▶│(Prod Store) │
    └─────────────┘      └─────────────┘      └─────────────┘

Real-Time ML System
    REAL-TIME ML SERVING
    ─────────────────────────────────────────────────────────────────

    Request                     Feast                      Model
       │                          │                          │
       │  user_id=123             │                          │
       │─────────────────────────▶│                          │
       │                          │                          │
       │  Get features            │                          │
       │─────────────────────────▶│                          │
       │                          │                          │
       │  features = [...]        │                          │
       │◀─────────────────────────│                          │
       │                          │                          │
       │                          │  Predict                 │
       │                          │─────────────────────────▶│
       │                          │                          │
       │  prediction              │  prediction              │
       │◀────────────────────────────────────────────────────│

    Latency: < 100ms total
    - Feature fetch: ~10ms (Redis)
    - Model inference: ~50ms (GPU)

Hands-On Focus
| Module | Key Exercise |
|---|---|
| Kubeflow | Deploy Pipelines, run training pipeline |
| MLflow | Track experiment, register model |
| Feature Stores | Define features, serve online/offline |
| vLLM | Deploy LLM with PagedAttention, benchmark throughput |
| Ray Serve | Build multi-model pipeline with composition |
| LangChain/LlamaIndex | Create RAG application with vector store |
| GPU Scheduling | Install NVIDIA operator, schedule GPU workloads |
| KServe | Deploy InferenceService, canary rollout, switch to raw mode |
| Seldon Core | Deploy inference graph with two model variants, A/B Experiment, and Alibi explainer |
| BentoML | Build two-runner Bento (embedder + classifier), deploy to K8s with adaptive micro-batching, observe throughput/latency |
| Bare-Metal MLOps | Deploy full platform: MinIO + MLflow + KServe + Argo Workflows + Prometheus; train sklearn model, register, serve, observe |
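
The KServe exercise above includes a canary rollout. As a rough mental model of what a canary traffic percentage means, the sketch below sends a fixed share of requests to a canary version. It is an illustration only: in a real deployment the serving stack performs the split for you, and the `route` helper and its hash-bucket approach are assumptions introduced here, not KServe code.

```python
import hashlib


def route(request_id: str, canary_percent: int) -> str:
    """Pick 'canary' or 'stable' for a request.

    The request ID is hashed into one of 100 buckets, so the same ID is
    always routed the same way and roughly canary_percent% of IDs land
    on the canary.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "canary" if bucket < canary_percent else "stable"


# Simulate a 10% canary over hypothetical request IDs and count the split.
counts = {"stable": 0, "canary": 0}
for i in range(10_000):
    counts[route(f"req-{i}", canary_percent=10)] += 1
```

Raising `canary_percent` step by step while watching error rates and latency is the essence of the rollout; setting it to 0 rolls back, and 100 completes the promotion.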
Tool Comparison
    ML PLATFORM TOOLS
    ─────────────────────────────────────────────────────────────────
                           Kubeflow      MLflow        Feast
    ─────────────────────────────────────────────────────────────────
    Primary focus          Workflows     Tracking      Features
    Kubernetes-native      ✓✓            ✓             ✓
    Standalone             ✗             ✓✓            ✓
    Experiment log         Basic         ✓✓            ✗
    Model registry         ✗             ✓✓            ✗
    Feature store          ✗             ✗             ✓✓
    Pipeline DAGs          ✓✓            Projects      ✗
    AutoML                 Katib         ✗             ✗
    Model serving          KServe        Basic         ✗
    ─────────────────────────────────────────────────────────────────

    RECOMMENDATION: Use all three together
    - Kubeflow: Orchestration & training
    - MLflow: Tracking & model registry
    - Feast: Feature management

Related Tracks
- Before: MLOps Discipline — MLOps concepts and practices
- Related: IaC Discipline — Infrastructure provisioning for ML platforms
- Related: IaC Tools Toolkit — Terraform modules for ML infrastructure
- Related: Observability Toolkit — Monitor ML systems
- Related: GitOps & Deployments Toolkit — Deploy ML infrastructure
- Related: Scaling & Reliability Toolkit — Scale ML workloads
“The best ML platform is invisible to data scientists. They focus on models; the platform handles everything else.”