ML Platforms Toolkit

Toolkit Track | 11 Modules | ~9.5 hours total

Overview

The ML Platforms Toolkit covers the infrastructure for production machine learning on Kubernetes. It spans traditional ML pipelines (Kubeflow, MLflow, Feast), model serving (KServe, Seldon Core, BentoML, Ray Serve), and LLM workloads (vLLM, LangChain, LlamaIndex), plus GPU scheduling and a bare-metal platform build. Whether you’re running batch training, serving real-time predictions, or building RAG applications, these are the tools you will deploy and operate.

This toolkit applies concepts from the MLOps Discipline.

Prerequisites

Before starting this toolkit:

MLOps Discipline
Kubernetes fundamentals
Basic ML concepts (training, inference)
Python familiarity

Modules

#	Module	Complexity	Time
9.1	Kubeflow	`[COMPLEX]`	50-60 min
9.2	MLflow	`[MEDIUM]`	40-45 min
9.3	Feature Stores	`[MEDIUM]`	40-45 min
9.4	vLLM	`[COMPLEX]`	50-60 min
9.5	Ray Serve	`[COMPLEX]`	50-60 min
9.6	LangChain & LlamaIndex	`[COMPLEX]`	50-60 min
9.7	GPU Scheduling	`[COMPLEX]`	50 min
9.8	KServe	`[COMPLEX]`	55-65 min
9.9	Seldon Core	`[COMPLEX]`	55-65 min
9.10	BentoML	`[COMPLEX]`	50-60 min
9.11	Bare-Metal MLOps	`[COMPLEX]`	60-70 min

Learning Outcomes

After completing this toolkit, you will be able to:

Deploy Kubeflow — Pipelines, notebooks, model serving
Track experiments — MLflow tracking and model registry
Manage features — Feast offline and online stores
Serve LLMs efficiently — vLLM with PagedAttention for high throughput
Build distributed inference — Ray Serve for multi-model pipelines
Create RAG applications — LangChain and LlamaIndex for LLM apps
Deploy inference graphs — Seldon Core 2 pipelines, multi-model serving, and Alibi explainability
Package and serve Python-first models — BentoML Service/runners/Bento lifecycle, K8s deployment, adaptive micro-batching
Build bare-metal ML platforms — Assemble MinIO, MLflow, KServe, Argo Workflows, and kube-prometheus-stack into a production ML platform without managed cloud

Tool Selection Guide

WHICH ML PLATFORM TOOL?
─────────────────────────────────────────────────────────────────

"I need to orchestrate ML training pipelines"
└──▶ Kubeflow Pipelines
     • Workflow orchestration
     • Artifact management
     • Kubernetes-native
     • GPU scheduling

"I need to track experiments and models"
└──▶ MLflow
     • Parameter/metric logging
     • Model versioning
     • Model registry
     • Framework-agnostic

"I need to manage and serve features"
└──▶ Feast
     • Feature definitions
     • Point-in-time correctness
     • Online/offline stores
     • Training-serving consistency

"I need AutoML / hyperparameter tuning"
└──▶ Kubeflow Katib
     • Bayesian optimization
     • Neural architecture search
     • Parallel trials
     • Early stopping

"I need to serve models at scale"
└──▶ KServe (with Kubeflow)
     • Auto-scaling
     • Canary deployments
     • Multi-framework support
     • GPU inference

"I need inference graphs, multi-model serving, and explainability"
└──▶ Seldon Core
     • Multi-framework model loading (sklearn, MLflow, Triton, HuggingFace)
     • Inference graph pipelines with chaining, joins, and conditional routing
     • Alibi explainability (anchors, integrated gradients, counterfactual)
     • Drift and outlier detection (Alibi-Detect)

"I need to serve LLMs with high throughput"
└──▶ vLLM
     • PagedAttention memory optimization
     • Continuous batching
     • OpenAI-compatible API
     • Multi-GPU tensor parallelism

"I need distributed model serving"
└──▶ Ray Serve
     • Model composition
     • Fractional GPU allocation
     • Auto-scaling
     • A/B testing built-in

"I need to build LLM applications"
└──▶ LangChain / LlamaIndex
     • RAG (Retrieval-Augmented Generation)
     • Chains and agents
     • Memory management
     • Document processing

"I need a production ML platform without AWS/GCP/Azure"
└──▶ Bare-Metal MLOps Recipe (Module 9.11)
     • kubeadm/k3s + MetalLB + Longhorn + MinIO
     • MLflow self-hosted + KServe/Seldon/BentoML
     • Argo Workflows + ArgoCD + kube-prometheus-stack

"I need Python-first model packaging and custom serving logic"
└──▶ BentoML
     • Python Service + runner architecture
     • Bento packaging with reproducible artifacts
     • Adaptive micro-batching per runner
     • K8s deployment via Helm or Yatai control plane

ML PLATFORM STACK:
─────────────────────────────────────────────────────────────────
                      ML Workflow
─────────────────────────────────────────────────────────────────
                          │
    ┌─────────────────────┼─────────────────────┐
    │                     │                     │
    ▼                     ▼                     ▼
┌─────────┐        ┌───────────┐        ┌─────────┐
│Kubeflow │        │  MLflow   │        │  Feast  │
│Pipelines│        │ Tracking  │        │Features │
└─────────┘        └───────────┘        └─────────┘
    │                     │                     │
    ▼                     ▼                     ▼
Training            Model Registry       Feature Store
Orchestration       Versioning          Online/Offline
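
The guide above lists PagedAttention memory optimization under vLLM. As a rough illustration of why paging the KV cache helps, the sketch below compares unused token slots when every sequence reserves a full maximum-length region up front against allocation in fixed-size blocks. The 16-token block size, the 2048-token maximum, and the sequence lengths are illustrative assumptions, and the helper names are hypothetical; this is back-of-the-envelope arithmetic, not vLLM's allocator.

```python
import math

BLOCK_SIZE = 16   # tokens per KV-cache block (assumed for illustration)
MAX_LEN = 2048    # slots reserved per sequence without paging (assumed)


def blocks_needed(num_tokens: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of fixed-size blocks a sequence of num_tokens occupies."""
    return math.ceil(num_tokens / block_size)


def wasted_slots_paged(num_tokens: int, block_size: int = BLOCK_SIZE) -> int:
    """Unused slots with paged allocation: only the tail of the last block."""
    return blocks_needed(num_tokens, block_size) * block_size - num_tokens


def wasted_slots_contiguous(num_tokens: int, max_len: int = MAX_LEN) -> int:
    """Unused slots when max_len slots are reserved up front."""
    return max_len - num_tokens


seq_lens = [37, 512, 90]  # current lengths of three in-flight sequences
paged = sum(wasted_slots_paged(n) for n in seq_lens)
contiguous = sum(wasted_slots_contiguous(n) for n in seq_lens)
print(f"paged={paged} contiguous={contiguous}")  # paged=17 contiguous=5505
```

With paging, waste is bounded by one partially filled block per sequence, which is what lets more sequences share the same GPU memory.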

The ML Platform Stack

┌─────────────────────────────────────────────────────────────────┐
│                    ML PLATFORM ARCHITECTURE                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  EXPERIMENTATION                                                │
│  ┌───────────────────────────────────────────────────────────┐ │
│  │  Kubeflow Notebooks                                        │ │
│  │  • JupyterLab  • GPU access  • Shared storage             │ │
│  │                                                            │ │
│  │  MLflow Tracking                                          │ │
│  │  • Parameters  • Metrics  • Artifacts                     │ │
│  └───────────────────────────────────────────────────────────┘ │
│                              │                                   │
│  TRAINING                                                       │
│  ┌───────────────────────────────────────────────────────────┐ │
│  │  Kubeflow Pipelines                                        │ │
│  │  • Workflow DAGs  • Artifact tracking  • Caching          │ │
│  │                                                            │ │
│  │  Katib (AutoML)                                           │ │
│  │  • Hyperparameter tuning  • Neural architecture search    │ │
│  │                                                            │ │
│  │  Training Operators                                        │ │
│  │  • TFJob  • PyTorchJob  • MPIJob                          │ │
│  └───────────────────────────────────────────────────────────┘ │
│                              │                                   │
│  FEATURE MANAGEMENT                                             │
│  ┌───────────────────────────────────────────────────────────┐ │
│  │  Feast                                                     │ │
│  │  • Offline store (training)  • Online store (inference)   │ │
│  │  • Point-in-time joins  • Feature serving                 │ │
│  └───────────────────────────────────────────────────────────┘ │
│                              │                                   │
│  SERVING & PRODUCTION                                           │
│  ┌───────────────────────────────────────────────────────────┐ │
│  │  MLflow Model Registry                                     │ │
│  │  • Model versions  • Stages  • Aliases                    │ │
│  │                                                            │ │
│  │  KServe                                                    │ │
│  │  • Model serving  • Auto-scaling  • Canary rollouts       │ │
│  └───────────────────────────────────────────────────────────┘ │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
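
The "point-in-time joins" listed under Feature Management can be illustrated without any feature store installed. The sketch below is a minimal pure-Python version of the idea: for each training row, take the latest feature value recorded at or before that row's event time, so no future data leaks into training. The function name and tuple layout are hypothetical, and the sketch ignores details a real store such as Feast also handles (for example feature TTLs and multiple feature views).

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_join(entity_rows, feature_rows):
    """Attach to each (entity_id, event_ts) the latest feature value with ts <= event_ts.

    entity_rows:  list of (entity_id, event_ts)
    feature_rows: list of (entity_id, feature_ts, value)
    Returns a list of (entity_id, event_ts, value_or_None).
    """
    # Group the feature history per entity, ordered by timestamp.
    history = {}
    for entity_id, ts, value in sorted(feature_rows, key=lambda r: (r[0], r[1])):
        history.setdefault(entity_id, []).append((ts, value))

    joined = []
    for entity_id, event_ts in entity_rows:
        rows = history.get(entity_id, [])
        timestamps = [ts for ts, _ in rows]
        # Index of the first feature row strictly after the event time.
        idx = bisect_right(timestamps, event_ts)
        value = rows[idx - 1][1] if idx > 0 else None
        joined.append((entity_id, event_ts, value))
    return joined


def day(d):
    return datetime(2024, 1, d)


features = [(1, day(1), 10.0), (1, day(5), 20.0), (2, day(3), 7.0)]
entities = [(1, day(4)), (1, day(6)), (2, day(2))]
joined = point_in_time_join(entities, features)
# Entity 2 has no feature row at or before its event time, so its value is None.
```

The row for entity 1 on day 4 receives the day-1 value rather than the day-5 value, which is the property that keeps training data consistent with what the model would have seen at serving time.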

Study Path

Module 9.1: Kubeflow
     │
     │  ML platform foundation
     │  Pipelines, notebooks, training
     ▼
Module 9.2: MLflow
     │
     │  Experiment tracking
     │  Model registry
     ▼
Module 9.3: Feature Stores
     │
     │  Feature management
     │  Training-serving consistency
     ▼
Module 9.4: vLLM
     │
     │  High-throughput LLM serving
     │  PagedAttention optimization
     ▼
Module 9.5: Ray Serve
     │
     │  Distributed inference
     │  Model composition
     ▼
Module 9.6: LangChain & LlamaIndex
     │
     │  LLM application frameworks
     │  RAG and agents
     ▼
Module 9.7: GPU Scheduling
     │
     │  GPU resource management
     │  Device plugins, NVIDIA operator
     ▼
Module 9.8: KServe
     │
     │  Production model inference
     │  Serverless + raw deployment modes
     ▼
Module 9.9: Seldon Core
     │
     │  Multi-framework model serving
     │  Inference graphs, Alibi explainability
     ▼
Module 9.10: BentoML
     │
     │  Python-first model packaging
     │  Runners, micro-batching, K8s deployment
     ▼
Module 9.11: Bare-Metal MLOps
     │
     │  Integration capstone
     │  Full platform assembly without managed cloud
     ▼
[Toolkit Complete] → Production AI/ML!

Key Concepts

MLOps Platform Principles

Principle	Tool	Implementation
Reproducibility	Kubeflow Pipelines	Containerized steps, artifacts
Experiment tracking	MLflow	Parameters, metrics, models
Feature consistency	Feast	Point-in-time correct features
Model lifecycle	MLflow Registry	Versions, stages, aliases
Scalable training	Kubeflow Operators	Distributed training
Model serving	KServe	Auto-scaling inference

Platform Components

ML PLATFORM COMPONENTS
─────────────────────────────────────────────────────────────────

KUBEFLOW
├── Notebooks - Interactive development
├── Pipelines - Workflow orchestration
├── Katib - Hyperparameter tuning
├── Training Operators - Distributed training
└── KServe - Model serving

MLFLOW
├── Tracking - Experiment logging
├── Projects - Reproducible packaging
├── Models - Unified model format
└── Registry - Model lifecycle

FEAST
├── Feature Views - Feature definitions
├── Offline Store - Historical features
├── Online Store - Latest features
└── Feature Server - Real-time serving

Integration Patterns

Complete ML Pipeline

INTEGRATED ML WORKFLOW
─────────────────────────────────────────────────────────────────

Data                    Features                Training
  │                        │                       │
  ▼                        ▼                       ▼
┌──────────┐        ┌──────────┐        ┌──────────────────┐
│Raw Data  │───────▶│  Feast   │───────▶│Kubeflow Pipeline │
│(S3, GCS) │        │(Features)│        │                  │
└──────────┘        └──────────┘        │ 1. Load features │
                                        │ 2. Train model   │
                                        │ 3. Evaluate      │
                                        │ 4. Register      │
                                        └────────┬─────────┘
                                                 │
                    ┌────────────────────────────┼────────────┐
                    │                            │            │
                    ▼                            ▼            ▼
             ┌──────────┐                 ┌──────────┐ ┌──────────┐
             │  MLflow  │                 │  MLflow  │ │  KServe  │
             │ Tracking │                 │ Registry │ │(Serving) │
             └──────────┘                 └──────────┘ └──────────┘

Kubeflow + MLflow

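# Pipeline step with MLflow tracking (KFP v2 SDK)
from kfp import dsl

@dsl.component(packages_to_install=["mlflow", "scikit-learn"])
def train_with_tracking(mlflow_uri: str):
    # Import inside the function: KFP runs the body in its own container
    import mlflow
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    mlflow.set_tracking_uri(mlflow_uri)
    X, y = load_iris(return_X_y=True)      # illustrative dataset
    params = {"C": 1.0, "max_iter": 200}   # illustrative hyperparameters

    with mlflow.start_run():
        # Train model
        model = LogisticRegression(**params).fit(X, y)
        metrics = {"train_accuracy": float(model.score(X, y))}
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
        mlflow.sklearn.log_model(model, "model",
            registered_model_name="my-model")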

MLflow + Feast

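# Training with Feast features and MLflow tracking
import mlflow
import pandas as pd
from feast import FeatureStore
from sklearn.linear_model import LogisticRegression

store = FeatureStore(repo_path=".")  # directory containing feature_store.yaml
mlflow.set_tracking_uri("http://mlflow:5000")

# Entity dataframe: entity keys, event timestamps, and the training label
# (illustrative values; "user_id" and "label" are assumed column names)
entity_df = pd.DataFrame({
    "user_id": [1001, 1002],
    "event_timestamp": pd.to_datetime(["2024-01-01", "2024-01-02"], utc=True),
    "label": [0, 1],
})

# Get training features (point-in-time join against the offline store)
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["user_features:feature1", "user_features:feature2"]
).to_df()

with mlflow.start_run():
    mlflow.log_param("feature_store", "feast")
    mlflow.log_param("feature_view", "user_features")

    # Fit on the feature columns only; the label comes from entity_df
    model = LogisticRegression()
    model.fit(training_df[["feature1", "feature2"]], training_df["label"])
    mlflow.sklearn.log_model(model, "model")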

Common Architectures

Development to Production

ML DEVELOPMENT TO PRODUCTION
─────────────────────────────────────────────────────────────────

DEVELOPMENT                 STAGING                 PRODUCTION
─────────────────────────────────────────────────────────────────

┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│  Notebooks  │      │  Pipelines  │      │   KServe    │
│  (Kubeflow) │      │  (Kubeflow) │      │  (Serving)  │
└──────┬──────┘      └──────┬──────┘      └──────┬──────┘
       │                    │                    │
       ▼                    ▼                    ▼
┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│   MLflow    │      │   MLflow    │      │   MLflow    │
│  Tracking   │─────▶│  Registry   │─────▶│  Registry   │
│             │      │  (Staging)  │      │(Production) │
└─────────────┘      └─────────────┘      └─────────────┘
       │                    │                    │
       ▼                    ▼                    ▼
┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│   Feast     │      │   Feast     │      │   Feast     │
│ (Dev Store) │─────▶│(Stage Store)│─────▶│(Prod Store) │
└─────────────┘      └─────────────┘      └─────────────┘
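
The Staging to Production hop in the registry row can be expressed as moving an alias between model versions. The sketch below assumes registry aliases named "staging" and "production" (names chosen here for illustration) and a hypothetical `promote` helper. It is written against the two `MlflowClient` methods it calls, `get_model_version_by_alias` and `set_registered_model_alias`, and takes the client as an argument so it can be dry-run with the in-memory stand-in included in the block; in a real setup you would pass `mlflow.MlflowClient()` instead.

```python
from types import SimpleNamespace


class InMemoryRegistry:
    """Stand-in exposing the two MlflowClient methods used by promote()."""

    def __init__(self):
        self.aliases = {}

    def set_registered_model_alias(self, name, alias, version):
        self.aliases[(name, alias)] = str(version)

    def get_model_version_by_alias(self, name, alias):
        return SimpleNamespace(name=name, version=self.aliases[(name, alias)])


def promote(client, model_name, version,
            from_alias="staging", to_alias="production"):
    """Point to_alias at version, but only if from_alias already points at it."""
    staged = client.get_model_version_by_alias(model_name, from_alias)
    if str(staged.version) != str(version):
        return False  # refuse to promote a version that was never staged
    client.set_registered_model_alias(model_name, to_alias, version)
    return True


registry = InMemoryRegistry()
registry.set_registered_model_alias("my-model", "staging", "3")
promoted = promote(registry, "my-model", "3")   # version 3 is staged
rejected = promote(registry, "my-model", "4")   # version 4 is not
```

Gating the production alias on the staging alias is one possible policy; teams often add metric checks or manual approval before this step.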

Real-Time ML System

REAL-TIME ML SERVING
─────────────────────────────────────────────────────────────────

Request                     Feast                    Model
  │                           │                        │
  │ user_id=123               │                        │
  │ ─────────────────────────▶│                        │
  │                           │                        │
  │ features = [...]          │                        │
  │ ◀─────────────────────────│                        │
  │                           │                        │
  │ Predict(features)         │                        │
  │ ──────────────────────────────────────────────────▶│
  │                           │                        │
  │ prediction                │                        │
  │ ◀──────────────────────────────────────────────────│

Latency: < 100ms total
  - Feature fetch: ~10ms (Redis)
  - Model inference: ~50ms (GPU)
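
The stated figures (about 10 ms for the feature fetch and about 50 ms for inference) leave roughly 40 ms of the 100 ms budget for network hops and serialization. A request handler that follows this flow and reports per-stage latency might look like the sketch below. The function and stub names are hypothetical; in a real service `fetch_features` would typically wrap an online-store lookup (for example Feast's `get_online_features`) and `predict` would call the model server.

```python
import time


def serve_prediction(user_id, fetch_features, predict, budget_ms=100.0):
    """Fetch online features, run the model, and report per-stage latency."""
    t0 = time.perf_counter()
    features = fetch_features(user_id)   # online store lookup
    t1 = time.perf_counter()
    prediction = predict(features)       # model inference
    t2 = time.perf_counter()

    timings = {
        "feature_fetch_ms": (t1 - t0) * 1000,
        "inference_ms": (t2 - t1) * 1000,
    }
    timings["total_ms"] = timings["feature_fetch_ms"] + timings["inference_ms"]
    timings["within_budget"] = timings["total_ms"] < budget_ms
    return prediction, timings


# Stubs standing in for the online store and the model (illustrative only).
def stub_fetch_features(user_id):
    return [1.0, 3.0]


def stub_predict(features):
    return sum(features) / len(features)


prediction, timings = serve_prediction(123, stub_fetch_features, stub_predict)
```

Recording the two stages separately makes it clear whether a latency regression comes from the feature store or from the model.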

Hands-On Focus

Module	Key Exercise
Kubeflow	Deploy Pipelines, run training pipeline
MLflow	Track experiment, register model
Feature Stores	Define features, serve online/offline
vLLM	Deploy LLM with PagedAttention, benchmark throughput
Ray Serve	Build multi-model pipeline with composition
LangChain/LlamaIndex	Create RAG application with vector store
GPU Scheduling	Install NVIDIA operator, schedule GPU workloads
KServe	Deploy InferenceService, canary rollout, switch to raw mode
Seldon Core	Deploy inference graph with two model variants, A/B Experiment, and Alibi explainer
BentoML	Build two-runner Bento (embedder + classifier), deploy to K8s with adaptive micro-batching, observe throughput/latency
Bare-Metal MLOps	Deploy full platform: MinIO + MLflow + KServe + Argo Workflows + Prometheus; train sklearn model, register, serve, observe

Tool Comparison

ML PLATFORM TOOLS
─────────────────────────────────────────────────────────────────

                   Kubeflow        MLflow          Feast
─────────────────────────────────────────────────────────────────
Primary focus      Workflows       Tracking        Features
Kubernetes-native  ✓✓              ✓               ✓
Standalone         ✗               ✓✓              ✓
Experiment log     Basic           ✓✓              ✗
Model registry     ✗               ✓✓              ✗
Feature store      ✗               ✗               ✓✓
Pipeline DAGs      ✓✓              Projects        ✗
AutoML             Katib           ✗               ✗
Model serving      KServe          Basic           ✗
─────────────────────────────────────────────────────────────────

RECOMMENDATION: Use all three together
- Kubeflow: Orchestration & training
- MLflow: Tracking & model registry
- Feast: Feature management

Before: MLOps Discipline — MLOps concepts and practices
Related: IaC Discipline — Infrastructure provisioning for ML platforms
Related: IaC Tools Toolkit — Terraform modules for ML infrastructure
Related: Observability Toolkit — Monitor ML systems
Related: GitOps & Deployments Toolkit — Deploy ML infrastructure
Related: Scaling & Reliability Toolkit — Scale ML workloads

“The best ML platform is invisible to data scientists. They focus on models; the platform handles everything else.”