Skip to content

ML Platforms Toolkit

Toolkit Track | 6 Modules | ~5 hours total

The ML Platforms Toolkit covers the infrastructure for production machine learning on Kubernetes. From traditional ML pipelines with Kubeflow and MLflow to the LLM revolution with vLLM and LangChain—this toolkit provides the complete foundation for modern AI/ML infrastructure. Whether you’re running batch training, serving real-time predictions, or building RAG applications, these tools form the backbone of production AI systems.

This toolkit applies concepts from MLOps Discipline.

Before starting this toolkit:

  • MLOps Discipline
  • Kubernetes fundamentals
  • Basic ML concepts (training, inference)
  • Python familiarity
#ModuleComplexityTime
9.1Kubeflow[COMPLEX]50-60 min
9.2MLflow[MEDIUM]40-45 min
9.3Feature Stores[MEDIUM]40-45 min
9.4vLLM[COMPLEX]50-60 min
9.5Ray Serve[COMPLEX]50-60 min
9.6LangChain & LlamaIndex[COMPLEX]50-60 min
9.7GPU Scheduling[COMPLEX]50 min

After completing this toolkit, you will be able to:

  1. Deploy Kubeflow — Pipelines, notebooks, model serving
  2. Track experiments — MLflow tracking and model registry
  3. Manage features — Feast offline and online stores
  4. Serve LLMs efficiently — vLLM with PagedAttention for high throughput
  5. Build distributed inference — Ray Serve for multi-model pipelines
  6. Create RAG applications — LangChain and LlamaIndex for LLM apps
WHICH ML PLATFORM TOOL?
─────────────────────────────────────────────────────────────────
"I need to orchestrate ML training pipelines"
└──▶ Kubeflow Pipelines
• Workflow orchestration
• Artifact management
• Kubernetes-native
• GPU scheduling
"I need to track experiments and models"
└──▶ MLflow
• Parameter/metric logging
• Model versioning
• Model registry
• Framework-agnostic
"I need to manage and serve features"
└──▶ Feast
• Feature definitions
• Point-in-time correctness
• Online/offline stores
• Training-serving consistency
"I need AutoML / hyperparameter tuning"
└──▶ Kubeflow Katib
• Bayesian optimization
• Neural architecture search
• Parallel trials
• Early stopping
"I need to serve models at scale"
└──▶ KServe (with Kubeflow)
• Auto-scaling
• Canary deployments
• Multi-framework support
• GPU inference
"I need to serve LLMs with high throughput"
└──▶ vLLM
• PagedAttention memory optimization
• Continuous batching
• OpenAI-compatible API
• Multi-GPU tensor parallelism
"I need distributed model serving"
└──▶ Ray Serve
• Model composition
• Fractional GPU allocation
• Auto-scaling
• A/B testing built-in
"I need to build LLM applications"
└──▶ LangChain / LlamaIndex
• RAG (Retrieval-Augmented Generation)
• Chains and agents
• Memory management
• Document processing
ML PLATFORM STACK:
─────────────────────────────────────────────────────────────────
ML Workflow
─────────────────────────────────────────────────────────────────
┌─────────────────────┼─────────────────────┐
│ │ │
▼ ▼ ▼
┌─────────┐ ┌───────────┐ ┌─────────┐
│Kubeflow │ │ MLflow │ │ Feast │
│Pipelines│ │ Tracking │ │Features │
└─────────┘ └───────────┘ └─────────┘
│ │ │
▼ ▼ ▼
Training Model Registry Feature Store
Orchestration Versioning Online/Offline
┌─────────────────────────────────────────────────────────────────┐
│ ML PLATFORM ARCHITECTURE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ EXPERIMENTATION │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Kubeflow Notebooks │ │
│ │ • JupyterHub • GPU access • Shared storage │ │
│ │ │ │
│ │ MLflow Tracking │ │
│ │ • Parameters • Metrics • Artifacts │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │ │
│ TRAINING │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Kubeflow Pipelines │ │
│ │ • Workflow DAGs • Artifact tracking • Caching │ │
│ │ │ │
│ │ Katib (AutoML) │ │
│ │ • Hyperparameter tuning • Neural architecture search │ │
│ │ │ │
│ │ Training Operators │ │
│ │ • TFJob • PyTorchJob • MPIJob │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │ │
│ FEATURE MANAGEMENT │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Feast │ │
│ │ • Offline store (training) • Online store (inference) │ │
│ │ • Point-in-time joins • Feature serving │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │ │
│ SERVING & PRODUCTION │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ MLflow Model Registry │ │
│ │ • Model versions • Stages • Aliases │ │
│ │ │ │
│ │ KServe │ │
│ │ • Model serving • Auto-scaling • Canary rollouts │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Module 9.1: Kubeflow
│ ML platform foundation
│ Pipelines, notebooks, training
Module 9.2: MLflow
│ Experiment tracking
│ Model registry
Module 9.3: Feature Stores
│ Feature management
│ Training-serving consistency
Module 9.4: vLLM
│ High-throughput LLM serving
│ PagedAttention optimization
Module 9.5: Ray Serve
│ Distributed inference
│ Model composition
Module 9.6: LangChain & LlamaIndex
│ LLM application frameworks
│ RAG and agents
[Toolkit Complete] → Production AI/ML!
PrincipleToolImplementation
ReproducibilityKubeflow PipelinesContainerized steps, artifacts
Experiment trackingMLflowParameters, metrics, models
Feature consistencyFeastPoint-in-time correct features
Model lifecycleMLflow RegistryVersions, stages, aliases
Scalable trainingKubeflow OperatorsDistributed training
Model servingKServeAuto-scaling inference
ML PLATFORM COMPONENTS
─────────────────────────────────────────────────────────────────
KUBEFLOW
├── Notebooks - Interactive development
├── Pipelines - Workflow orchestration
├── Katib - Hyperparameter tuning
├── Training Operators - Distributed training
└── KServe - Model serving
MLFLOW
├── Tracking - Experiment logging
├── Projects - Reproducible packaging
├── Models - Unified model format
└── Registry - Model lifecycle
FEAST
├── Feature Views - Feature definitions
├── Offline Store - Historical features
├── Online Store - Latest features
└── Feature Server - Real-time serving
INTEGRATED ML WORKFLOW
─────────────────────────────────────────────────────────────────
Data Features Training
│ │ │
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────────────┐
│Raw Data │───────▶│ Feast │───────▶│Kubeflow Pipeline │
│(S3, GCS) │ │(Features)│ │ │
└──────────┘ └──────────┘ │ 1. Load features │
│ 2. Train model │
│ 3. Evaluate │
│ 4. Register │
└────────┬─────────┘
┌────────────────────────────┼────────────┐
│ │ │
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ MLflow │ │ MLflow │ │ KServe │
│ Tracking │ │ Registry │ │(Serving) │
└──────────┘ └──────────┘ └──────────┘
# Pipeline step with MLflow tracking
@dsl.component(packages_to_install=["mlflow"])
def train_with_tracking(mlflow_uri: str):
import mlflow
mlflow.set_tracking_uri(mlflow_uri)
with mlflow.start_run():
# Train model
mlflow.log_params(params)
mlflow.log_metrics(metrics)
mlflow.sklearn.log_model(model, "model",
registered_model_name="my-model")
# Training with Feast features and MLflow tracking
import mlflow
from feast import FeatureStore
store = FeatureStore()
mlflow.set_tracking_uri("http://mlflow:5000")
# Get training features
training_df = store.get_historical_features(
entity_df=entity_df,
features=["user_features:feature1", "user_features:feature2"]
).to_df()
with mlflow.start_run():
mlflow.log_param("feature_store", "feast")
mlflow.log_param("feature_view", "user_features")
model.fit(training_df)
mlflow.sklearn.log_model(model, "model")
ML DEVELOPMENT TO PRODUCTION
─────────────────────────────────────────────────────────────────
DEVELOPMENT STAGING PRODUCTION
─────────────────────────────────────────────────────────────────
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Notebooks │ │ Pipelines │ │ KServe │
│ (Kubeflow) │ │ (Kubeflow) │ │ (Serving) │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
│ │ │
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ MLflow │ │ MLflow │ │ MLflow │
│ Tracking │─────▶│ Registry │─────▶│ Registry │
│ │ │ (Staging) │ │(Production) │
└─────────────┘ └─────────────┘ └─────────────┘
│ │ │
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Feast │ │ Feast │ │ Feast │
│ (Dev Store) │─────▶│(Stage Store)│─────▶│(Prod Store) │
└─────────────┘ └─────────────┘ └─────────────┘
REAL-TIME ML SERVING
─────────────────────────────────────────────────────────────────
Request Feast Model
│ │ │
│ user_id=123 │ │
│ ─────────────────▶ │ │
│ │ │
│ Get features │
│ ─────────────────────────────▶ │
│ │ │
│ features = [...] │
│ ◀───────────────────────────── │
│ │ │
│ │ Predict │
│ │ ─────────────────────▶│
│ │ │
│ prediction │ prediction │
│ ◀─────────────────────────────────────────────────│
Latency: < 100ms total
- Feature fetch: ~10ms (Redis)
- Model inference: ~50ms (GPU)
ModuleKey Exercise
KubeflowDeploy Pipelines, run training pipeline
MLflowTrack experiment, register model
Feature StoresDefine features, serve online/offline
vLLMDeploy LLM with PagedAttention, benchmark throughput
Ray ServeBuild multi-model pipeline with composition
LangChain/LlamaIndexCreate RAG application with vector store
ML PLATFORM TOOLS
─────────────────────────────────────────────────────────────────
Kubeflow MLflow Feast
─────────────────────────────────────────────────────────────────
Primary focus Workflows Tracking Features
Kubernetes-native ✓✓ ✓ ✓
Standalone ✗ ✓✓ ✓
Experiment log Basic ✓✓ ✗
Model registry ✗ ✓✓ ✗
Feature store ✗ ✗ ✓✓
Pipeline DAGs ✓✓ Projects ✗
AutoML Katib ✗ ✗
Model serving KServe Basic ✗
─────────────────────────────────────────────────────────────────
RECOMMENDATION: Use all three together
- Kubeflow: Orchestration & training
- MLflow: Tracking & model registry
- Feast: Feature management

“The best ML platform is invisible to data scientists. They focus on models; the platform handles everything else.”