
ML Platforms Toolkit

4 modules are currently being reworked. Watch this section over the next few days.

Toolkit Track | 11 Modules | ~9.5 hours total

The ML Platforms Toolkit covers the infrastructure for production machine learning on Kubernetes, from traditional ML pipelines with Kubeflow and MLflow to LLM serving and LLM applications with vLLM and LangChain. Whether you’re running batch training, serving real-time predictions, or building RAG applications, these tools form the backbone of production AI systems.

This toolkit applies concepts from MLOps Discipline.

Before starting this toolkit:

  • MLOps Discipline
  • Kubernetes fundamentals
  • Basic ML concepts (training, inference)
  • Python familiarity

#      Module                    Complexity   Time
9.1    Kubeflow                  [COMPLEX]    50-60 min
9.2    MLflow                    [MEDIUM]     40-45 min
9.3    Feature Stores            [MEDIUM]     40-45 min
9.4    vLLM                      [COMPLEX]    50-60 min
9.5    Ray Serve                 [COMPLEX]    50-60 min
9.6    LangChain & LlamaIndex    [COMPLEX]    50-60 min
9.7    GPU Scheduling            [COMPLEX]    50 min
9.8    KServe                    [COMPLEX]    55-65 min
9.9    Seldon Core               [COMPLEX]    55-65 min
9.10   BentoML                   [COMPLEX]    50-60 min
9.11   Bare-Metal MLOps          [COMPLEX]    60-70 min

After completing this toolkit, you will be able to:

  1. Deploy Kubeflow — Pipelines, notebooks, model serving
  2. Track experiments — MLflow tracking and model registry
  3. Manage features — Feast offline and online stores
  4. Serve LLMs efficiently — vLLM with PagedAttention for high throughput
  5. Build distributed inference — Ray Serve for multi-model pipelines
  6. Create RAG applications — LangChain and LlamaIndex for LLM apps
  7. Deploy inference graphs — Seldon Core 2 pipelines, multi-model serving, and Alibi explainability
  8. Package and serve Python-first models — BentoML Service/runners/Bento lifecycle, K8s deployment, adaptive micro-batching
  9. Build bare-metal ML platforms — Assemble MinIO, MLflow, KServe, Argo Workflows, and kube-prometheus-stack into a production ML platform without managed cloud

WHICH ML PLATFORM TOOL?
─────────────────────────────────────────────────────────────────

"I need to orchestrate ML training pipelines"
   └──▶ Kubeflow Pipelines
        • Workflow orchestration
        • Artifact management
        • Kubernetes-native
        • GPU scheduling

"I need to track experiments and models"
   └──▶ MLflow
        • Parameter/metric logging
        • Model versioning
        • Model registry
        • Framework-agnostic

"I need to manage and serve features"
   └──▶ Feast
        • Feature definitions
        • Point-in-time correctness
        • Online/offline stores
        • Training-serving consistency

"I need AutoML / hyperparameter tuning"
   └──▶ Kubeflow Katib
        • Bayesian optimization
        • Neural architecture search
        • Parallel trials
        • Early stopping

"I need to serve models at scale"
   └──▶ KServe (with Kubeflow)
        • Auto-scaling
        • Canary deployments
        • Multi-framework support
        • GPU inference

"I need inference graphs, multi-model serving, and explainability"
   └──▶ Seldon Core
        • Multi-framework model loading (sklearn, MLflow, Triton, HuggingFace)
        • Inference graph pipelines with chaining, joins, and conditional routing
        • Alibi explainability (anchors, integrated gradients, counterfactual)
        • Drift and outlier detection (Alibi-Detect)

"I need to serve LLMs with high throughput"
   └──▶ vLLM
        • PagedAttention memory optimization
        • Continuous batching
        • OpenAI-compatible API
        • Multi-GPU tensor parallelism

"I need distributed model serving"
   └──▶ Ray Serve
        • Model composition
        • Fractional GPU allocation
        • Auto-scaling
        • A/B testing built-in

"I need to build LLM applications"
   └──▶ LangChain / LlamaIndex
        • RAG (Retrieval-Augmented Generation)
        • Chains and agents
        • Memory management
        • Document processing

"I need a production ML platform without AWS/GCP/Azure"
   └──▶ Bare-Metal MLOps Recipe (Module 9.11)
        • kubeadm/k3s + MetalLB + Longhorn + MinIO
        • MLflow self-hosted + KServe/Seldon/BentoML
        • Argo Workflows + ArgoCD + kube-prometheus-stack

"I need Python-first model packaging and custom serving logic"
   └──▶ BentoML
        • Python Service + runner architecture
        • Bento packaging with reproducible artifacts
        • Adaptive micro-batching per runner
        • K8s deployment via Helm or Yatai control plane

ML PLATFORM STACK:
─────────────────────────────────────────────────────────────────
                          ML Workflow
─────────────────────────────────────────────────────────────────
         ┌─────────────────────┼─────────────────────┐
         │                     │                     │
         ▼                     ▼                     ▼
    ┌─────────┐          ┌───────────┐          ┌─────────┐
    │Kubeflow │          │  MLflow   │          │  Feast  │
    │Pipelines│          │ Tracking  │          │Features │
    └─────────┘          └───────────┘          └─────────┘
         │                     │                     │
         ▼                     ▼                     ▼
     Training           Model Registry         Feature Store
   Orchestration          Versioning          Online/Offline

┌─────────────────────────────────────────────────────────────────┐
│                    ML PLATFORM ARCHITECTURE                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  EXPERIMENTATION                                                │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  Kubeflow Notebooks                                       │  │
│  │  • JupyterHub  • GPU access  • Shared storage             │  │
│  │                                                           │  │
│  │  MLflow Tracking                                          │  │
│  │  • Parameters  • Metrics  • Artifacts                     │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                │                                │
│  TRAINING                                                       │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  Kubeflow Pipelines                                       │  │
│  │  • Workflow DAGs  • Artifact tracking  • Caching          │  │
│  │                                                           │  │
│  │  Katib (AutoML)                                           │  │
│  │  • Hyperparameter tuning  • Neural architecture search    │  │
│  │                                                           │  │
│  │  Training Operators                                       │  │
│  │  • TFJob  • PyTorchJob  • MPIJob                          │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                │                                │
│  FEATURE MANAGEMENT                                             │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  Feast                                                    │  │
│  │  • Offline store (training)  • Online store (inference)   │  │
│  │  • Point-in-time joins       • Feature serving            │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                │                                │
│  SERVING & PRODUCTION                                           │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  MLflow Model Registry                                    │  │
│  │  • Model versions  • Stages  • Aliases                    │  │
│  │                                                           │  │
│  │  KServe                                                   │  │
│  │  • Model serving  • Auto-scaling  • Canary rollouts       │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
Module 9.1: Kubeflow
│ ML platform foundation
│ Pipelines, notebooks, training
Module 9.2: MLflow
│ Experiment tracking
│ Model registry
Module 9.3: Feature Stores
│ Feature management
│ Training-serving consistency
Module 9.4: vLLM
│ High-throughput LLM serving
│ PagedAttention optimization
Module 9.5: Ray Serve
│ Distributed inference
│ Model composition
Module 9.6: LangChain & LlamaIndex
│ LLM application frameworks
│ RAG and agents
Module 9.7: GPU Scheduling
│ GPU resource management
│ Device plugins, NVIDIA operator
Module 9.8: KServe
│ Production model inference
│ Serverless + raw deployment modes
Module 9.9: Seldon Core
│ Multi-framework model serving
│ Inference graphs, Alibi explainability
Module 9.10: BentoML
│ Python-first model packaging
│ Runners, micro-batching, K8s deployment
Module 9.11: Bare-Metal MLOps
│ Integration capstone
│ Full platform assembly without managed cloud
[Toolkit Complete] → Production AI/ML!

Principle             Tool                 Implementation
Reproducibility       Kubeflow Pipelines   Containerized steps, artifacts
Experiment tracking   MLflow               Parameters, metrics, models
Feature consistency   Feast                Point-in-time correct features
Model lifecycle       MLflow Registry      Versions, stages, aliases
Scalable training     Kubeflow Operators   Distributed training
Model serving         KServe               Auto-scaling inference

ML PLATFORM COMPONENTS
─────────────────────────────────────────────────────────────────
KUBEFLOW
├── Notebooks - Interactive development
├── Pipelines - Workflow orchestration
├── Katib - Hyperparameter tuning
├── Training Operators - Distributed training
└── KServe - Model serving
MLFLOW
├── Tracking - Experiment logging
├── Projects - Reproducible packaging
├── Models - Unified model format
└── Registry - Model lifecycle
FEAST
├── Feature Views - Feature definitions
├── Offline Store - Historical features
├── Online Store - Latest features
└── Feature Server - Real-time serving
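
The "point-in-time" behaviour attributed to Feast above can be illustrated without Feast at all. The following is a minimal sketch in plain Python, not Feast's implementation: for each labelled event it picks the latest feature value recorded at or before the event time, so training never sees values from the future. The `feature_history` data and the `point_in_time_lookup` helper are hypothetical names made up for this illustration.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical feature history for one entity: (timestamp, value), sorted by time.
feature_history = {
    "user_123": [
        (datetime(2024, 1, 1), 0.10),
        (datetime(2024, 1, 5), 0.40),
        (datetime(2024, 1, 9), 0.90),
    ],
}


def point_in_time_lookup(entity_id, event_time):
    """Return the latest feature value recorded at or before event_time."""
    rows = feature_history.get(entity_id, [])
    timestamps = [ts for ts, _ in rows]
    idx = bisect_right(timestamps, event_time)
    if idx == 0:
        return None  # no feature value existed yet at that time
    return rows[idx - 1][1]


# Training labels carry their own event timestamps.
label_events = [
    ("user_123", datetime(2024, 1, 3)),
    ("user_123", datetime(2024, 1, 7)),
    ("user_123", datetime(2023, 12, 31)),
]

# Each training row gets the feature value that was valid at its event time.
training_rows = [
    (entity_id, ts, point_in_time_lookup(entity_id, ts))
    for entity_id, ts in label_events
]
```

A naive join on the latest value would hand 0.90 to every row here, which leaks information that did not exist when the earlier events happened.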

INTEGRATED ML WORKFLOW
─────────────────────────────────────────────────────────────────
    Data              Features                Training
     │                   │                       │
     ▼                   ▼                       ▼
┌──────────┐        ┌──────────┐        ┌──────────────────┐
│Raw Data  │───────▶│  Feast   │───────▶│Kubeflow Pipeline │
│(S3, GCS) │        │(Features)│        │                  │
└──────────┘        └──────────┘        │ 1. Load features │
                                        │ 2. Train model   │
                                        │ 3. Evaluate      │
                                        │ 4. Register      │
                                        └────────┬─────────┘
                    ┌────────────────────────────┼────────────┐
                    │                            │            │
                    ▼                            ▼            ▼
               ┌──────────┐                 ┌──────────┐ ┌──────────┐
               │  MLflow  │                 │  MLflow  │ │  KServe  │
               │ Tracking │                 │ Registry │ │(Serving) │
               └──────────┘                 └──────────┘ └──────────┘

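# Pipeline step with MLflow tracking
from kfp import dsl

@dsl.component(packages_to_install=["mlflow", "scikit-learn"])
def train_with_tracking(mlflow_uri: str):
    # Imports go inside the function: the component runs in its own container
    import mlflow
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    mlflow.set_tracking_uri(mlflow_uri)
    X, y = load_iris(return_X_y=True)      # placeholder dataset
    params = {"C": 1.0, "max_iter": 200}   # placeholder hyperparameters

    with mlflow.start_run():
        # Train model
        model = LogisticRegression(**params).fit(X, y)
        mlflow.log_params(params)
        mlflow.log_metrics({"train_accuracy": model.score(X, y)})
        # Log the model and register it in the MLflow Model Registry
        mlflow.sklearn.log_model(model, "model",
                                 registered_model_name="my-model")
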
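# Training with Feast features and MLflow tracking
import mlflow
import pandas as pd
from feast import FeatureStore
from sklearn.linear_model import LogisticRegression

store = FeatureStore(repo_path=".")  # directory containing feature_store.yaml
mlflow.set_tracking_uri("http://mlflow:5000")

# Entity dataframe: one row per (entity, timestamp) to join features against.
# The user_id key and label column are placeholders for your own data.
entity_df = pd.DataFrame({
    "user_id": [1001, 1002],
    "event_timestamp": pd.to_datetime(["2024-01-01", "2024-01-02"], utc=True),
    "label": [0, 1],
})

# Get training features (point-in-time correct join)
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["user_features:feature1", "user_features:feature2"],
).to_df()

X = training_df[["feature1", "feature2"]]
y = training_df["label"]

with mlflow.start_run():
    mlflow.log_param("feature_store", "feast")
    mlflow.log_param("feature_view", "user_features")
    model = LogisticRegression().fit(X, y)
    mlflow.sklearn.log_model(model, "model")
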
ML DEVELOPMENT TO PRODUCTION
─────────────────────────────────────────────────────────────────
  DEVELOPMENT            STAGING            PRODUCTION
─────────────────────────────────────────────────────────────────
┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│  Notebooks  │      │  Pipelines  │      │   KServe    │
│ (Kubeflow)  │      │ (Kubeflow)  │      │  (Serving)  │
└──────┬──────┘      └──────┬──────┘      └──────┬──────┘
       │                    │                    │
       ▼                    ▼                    ▼
┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│   MLflow    │      │   MLflow    │      │   MLflow    │
│  Tracking   │─────▶│  Registry   │─────▶│  Registry   │
│             │      │  (Staging)  │      │(Production) │
└─────────────┘      └─────────────┘      └─────────────┘
       │                    │                    │
       ▼                    ▼                    ▼
┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│    Feast    │      │    Feast    │      │    Feast    │
│ (Dev Store) │─────▶│(Stage Store)│─────▶│(Prod Store) │
└─────────────┘      └─────────────┘      └─────────────┘
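
The promotion flow in the diagram above (a model version moves from tracking to a staging entry, then to a production entry) amounts to moving a named pointer only when a quality gate passes. The sketch below is a toy in-memory model of that idea. It is deliberately not the MLflow API: `ToyRegistry`, `promote`, and the accuracy thresholds are all hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    version: int
    metrics: dict


@dataclass
class ToyRegistry:
    """In-memory stand-in for a model registry; not the MLflow API."""
    versions: dict = field(default_factory=dict)  # version number -> ModelVersion
    aliases: dict = field(default_factory=dict)   # alias -> version number

    def register(self, metrics):
        """Store a new version and return its number."""
        version = len(self.versions) + 1
        self.versions[version] = ModelVersion(version, metrics)
        return version

    def promote(self, version, alias, min_accuracy):
        """Point alias at version only if it clears the accuracy gate."""
        candidate = self.versions[version]
        if candidate.metrics.get("accuracy", 0.0) < min_accuracy:
            return False
        self.aliases[alias] = version
        return True


registry = ToyRegistry()
v1 = registry.register({"accuracy": 0.91})
v2 = registry.register({"accuracy": 0.84})

# v1 clears both gates; v2 fails the staging gate and is never promoted.
registry.promote(v1, "staging", min_accuracy=0.85)
registry.promote(v1, "production", min_accuracy=0.90)
rejected = not registry.promote(v2, "staging", min_accuracy=0.85)
```

Serving code that resolves a model by alias rather than by version number does not need to change when a new version is promoted, which is the main reason to route promotion through named pointers.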
REAL-TIME ML SERVING
─────────────────────────────────────────────────────────────────
Request Feast Model
│ │ │
│ user_id=123 │ │
│ ─────────────────▶ │ │
│ │ │
│ Get features │
│ ─────────────────────────────▶ │
│ │ │
│ features = [...] │
│ ◀───────────────────────────── │
│ │ │
│ │ Predict │
│ │ ─────────────────────▶│
│ │ │
│ prediction │ prediction │
│ ◀─────────────────────────────────────────────────│
Latency: < 100ms total
- Feature fetch: ~10ms (Redis)
- Model inference: ~50ms (GPU)
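
To make the latency budget above concrete, here is a minimal sketch of a request handler that times the feature fetch and the inference step separately and compares the total against the 100 ms target. Everything in it is a stand-in: `ONLINE_STORE` is a plain dict in place of Redis, `predict` is a fixed weighted sum in place of a real model, and the function names are hypothetical.

```python
import time

BUDGET_MS = 100.0  # total latency target quoted above

# Hypothetical in-memory stand-in for an online store such as Redis.
ONLINE_STORE = {123: [0.2, 0.7, 1.0]}


def fetch_features(user_id):
    """Look up the latest feature vector for a user."""
    return ONLINE_STORE[user_id]


def predict(features):
    """Placeholder model: a fixed weighted sum instead of real inference."""
    weights = [0.5, 0.25, 0.25]
    return sum(w * x for w, x in zip(weights, features))


def handle_request(user_id):
    """Serve one prediction and record per-stage latency in milliseconds."""
    timings = {}

    start = time.perf_counter()
    features = fetch_features(user_id)
    timings["feature_fetch_ms"] = (time.perf_counter() - start) * 1000

    start = time.perf_counter()
    prediction = predict(features)
    timings["inference_ms"] = (time.perf_counter() - start) * 1000

    timings["total_ms"] = timings["feature_fetch_ms"] + timings["inference_ms"]
    return prediction, timings


prediction, timings = handle_request(123)
within_budget = timings["total_ms"] < BUDGET_MS

# Headroom implied by the figures quoted above: 100 - (10 + 50) = 40 ms
headroom_ms = BUDGET_MS - (10 + 50)
```

With the quoted figures, roughly 40 ms remains for network hops, serialization, and queueing, so per-stage timing like this is usually what shows where a budget overrun comes from.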

Module                  Key Exercise
Kubeflow                Deploy Pipelines, run training pipeline
MLflow                  Track experiment, register model
Feature Stores          Define features, serve online/offline
vLLM                    Deploy LLM with PagedAttention, benchmark throughput
Ray Serve               Build multi-model pipeline with composition
LangChain/LlamaIndex    Create RAG application with vector store
GPU Scheduling          Install NVIDIA operator, schedule GPU workloads
KServe                  Deploy InferenceService, canary rollout, switch to raw mode
Seldon Core             Deploy inference graph with two model variants, A/B Experiment, and Alibi explainer
BentoML                 Build two-runner Bento (embedder + classifier), deploy to K8s with adaptive micro-batching, observe throughput/latency
Bare-Metal MLOps        Deploy full platform: MinIO + MLflow + KServe + Argo Workflows + Prometheus; train sklearn model, register, serve, observe

ML PLATFORM TOOLS
─────────────────────────────────────────────────────────────────
                     Kubeflow        MLflow          Feast
─────────────────────────────────────────────────────────────────
Primary focus        Workflows       Tracking        Features
Kubernetes-native    ✓✓              ✓               ✓
Standalone           ✗               ✓✓              ✓
Experiment log       Basic           ✓✓              ✗
Model registry       ✗               ✓✓              ✗
Feature store        ✗               ✗               ✓✓
Pipeline DAGs        ✓✓              Projects        ✗
AutoML               Katib           ✗               ✗
Model serving        KServe          Basic           ✗
─────────────────────────────────────────────────────────────────
RECOMMENDATION: Use all three together
  - Kubeflow: Orchestration & training
  - MLflow: Tracking & model registry
  - Feast: Feature management

“The best ML platform is invisible to data scientists. They focus on models; the platform handles everything else.”