MLOps Discipline

1 module is currently being reworked. Watch this section over the next few days.

Discipline Track | 12 Modules | ~10 hours total

MLOps brings engineering rigor to machine learning. Most ML projects fail not because of bad models, but because teams can’t operationalize them. Data scientists build prototypes; MLOps turns them into production systems.

This track covers the complete ML lifecycle—from experiment tracking and feature stores to data versioning, model serving, monitoring, and automated pipelines—giving you the skills to deploy and maintain ML systems at scale.

Before starting this track:

  • Observability Theory Track — Monitoring fundamentals
  • Basic machine learning concepts (training, inference, models)
  • Python programming experience
  • Understanding of CI/CD concepts
  • Kubernetes basics (helpful but not required)

#    | Module                                    | Complexity | Time
-----+-------------------------------------------+------------+----------
5.1  | MLOps Fundamentals                        | [MEDIUM]   | 35-40 min
5.2  | Feature Engineering & Stores              | [COMPLEX]  | 40-45 min
5.3  | Model Training & Experimentation          | [COMPLEX]  | 40-45 min
5.4  | Model Serving & Inference                 | [COMPLEX]  | 40-45 min
5.5  | Model Monitoring & Observability          | [COMPLEX]  | 40-45 min
5.6  | ML Pipelines & Automation                 | [COMPLEX]  | 40-45 min
5.7  | Data Versioning with DVC                  | [COMPLEX]  | 50-60 min
5.8  | Great Expectations Data Quality           | [COMPLEX]  | 50-60 min
5.9  | ML Repository Hygiene                     | [COMPLEX]  | 45-55 min
5.10 | Production Model-Serving Traffic Patterns | [COMPLEX]  | 55-65 min
5.11 | Drift-Triggered Auto-Retraining Loop      | [COMPLEX]  | 60-70 min
5.12 | CML for ML CI                             | [COMPLEX]  | 55-60 min

After completing this track, you will be able to:

  1. Understand MLOps maturity — From notebooks to automated pipelines
  2. Build feature stores — Ensure consistency between training and serving
  3. Track experiments — Reproduce results, compare approaches systematically
  4. Deploy models — KServe, canary deployments, A/B testing
  5. Monitor ML systems — Detect drift, track performance without labels
  6. Automate pipelines — Kubeflow, continuous training, CI/CD for ML
  7. Version data and models — Use DVC to connect Git commits, data hashes, model artifacts, and metrics
  8. Gate data quality — Use Great Expectations to validate schema, completeness, and distribution contracts before training or serving
  9. Maintain clean ML repositories — Keep data, models, notebooks, dependencies, hooks, and CI policies reviewable without bloating Git history
  10. Control serving exposure — Use canary, A/B, shadow, mirroring, and bandit patterns to promote models with measurable rollback and cost controls
  11. Close the model lifecycle loop — Combine drift-triggered automated retraining, gated promotion, and forensic rollback
  12. Surface model review evidence — Use CML on GitHub Actions or GitLab CI to post model metric deltas, validation reports, and deployment health into the pull request review
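
To make outcome 5 concrete, here is a minimal sketch of label-free drift detection using the Population Stability Index (PSI), which compares how a feature is distributed at training time and at serving time. The function name, the equal-width binning, the `eps` floor, and the 0.25 alert threshold are illustrative assumptions (0.25 is a commonly quoted rule of thumb, not something this track prescribes). Tools such as Evidently offer this kind of check out of the box.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Return the PSI between a reference sample and a current sample.

    Bin edges are equal-width over the reference range; values outside
    that range are clamped into the first or last bin.
    """
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant feature

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or a division by zero
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


if __name__ == "__main__":
    training = [i / 100 for i in range(1000)]             # values seen at training time
    serving_shifted = [5 + i / 200 for i in range(1000)]  # mass moved to the upper half

    psi = population_stability_index(training, serving_shifted)
    # 0.25 is an assumed alert threshold; tune it per feature in practice
    print("drift detected" if psi > 0.25 else "no significant drift")
```

Because PSI needs only feature values, it can run on live traffic long before ground-truth labels arrive.
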

┌──────────────────────────────────────────────────────────────────┐
│                           ML LIFECYCLE                           │
│                                                                  │
│  DATA                EXPERIMENTATION          PRODUCTION         │
│  ┌──────────┐        ┌──────────┐             ┌──────────┐       │
│  │   Data   │        │  Model   │             │  Model   │       │
│  │ Ingestion│───────▶│ Training │────────────▶│ Serving  │       │
│  └────┬─────┘        └────┬─────┘             └────┬─────┘       │
│       │                   │                        │             │
│  ┌────▼─────┐        ┌────▼─────┐             ┌────▼─────┐       │
│  │   Data   │        │  Model   │             │  Model   │       │
│  │Validation│        │Validation│             │Monitoring│       │
│  └────┬─────┘        └────┬─────┘             └────┬─────┘       │
│       │                   │                        │             │
│  ┌────▼─────┐        ┌────▼─────┐             ┌────▼─────┐       │
│  │ Feature  │        │  Model   │             │ Trigger  │       │
│  │  Store   │        │ Registry │             │ Retrain  │◀──────┘
│  └──────────┘        └──────────┘             └──────────┘       │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

  1. Reproducibility — Every training run must be reproducible
  2. Automation — Automate everything from training to deployment
  3. Versioning — Version code, data, AND models
  4. Monitoring — ML systems fail silently; monitor everything
  5. Continuous Training — Models degrade; keep them fresh
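
The reproducibility and versioning principles can be illustrated with a small sketch that fingerprints a training run from its three inputs: code version, data, and hyperparameters. Everything here is hypothetical scaffolding: `fingerprint_run` and `train` are invented names, and the "training" step is a stand-in that only shows how a fixed seed makes a result repeatable. DVC and MLflow provide the production-grade equivalents.

```python
import hashlib
import json
import random
from typing import Any, Dict


def fingerprint_run(code_version: str, data: bytes, params: Dict[str, Any]) -> str:
    """Hash the three inputs that determine a training run: code, data, config."""
    h = hashlib.sha256()
    h.update(code_version.encode("utf-8"))
    h.update(hashlib.sha256(data).digest())
    # sort_keys makes the hash independent of dict insertion order
    h.update(json.dumps(params, sort_keys=True).encode("utf-8"))
    return h.hexdigest()[:16]


def train(data: bytes, params: Dict[str, Any]) -> float:
    """Stand-in for a training job; the seeded RNG makes the 'metric' repeatable."""
    rng = random.Random(params["seed"])
    return round(sum(data) / len(data) + rng.random(), 6)


if __name__ == "__main__":
    data = b"example training rows"      # assumed: raw bytes of a dataset file
    params = {"seed": 42, "learning_rate": 0.01}
    commit = "abc1234"                   # assumed: Git commit of the training code

    run_id = fingerprint_run(commit, data, params)
    # Same code, data, and config give the same ID and the same metric
    assert run_id == fingerprint_run(commit, data, dict(reversed(list(params.items()))))
    assert train(data, params) == train(data, params)
    # Changing any input, here the data, gives a different ID
    assert run_id != fingerprint_run(commit, data + b"!", params)
    print("run", run_id, "is reproducible")
```

Storing the fingerprint alongside the model artifact lets you answer "which code, data, and config produced this model?" without relying on memory or naming conventions.
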

Aspect     | DevOps              | MLOps
-----------+---------------------+-----------------------------------
Artifact   | Code                | Code + Data + Model
Testing    | Unit, integration   | + Model validation, drift tests
Versioning | Git                 | Git + DVC/MLflow
Monitoring | Infrastructure      | + Data quality, model performance
CI/CD      | Build, test, deploy | + Train, validate, serve
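
The "model validation" entry in the Testing row is the main thing MLOps adds to a CI pipeline. As a rough sketch, a validation gate compares a candidate model's metrics against the current baseline and blocks promotion on regression. The names `GateResult` and `validation_gate`, the metric values, and the 0.01 tolerance are all assumptions for illustration, and the sketch treats every metric as higher-is-better.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GateResult:
    passed: bool
    failures: List[str]


def validation_gate(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    max_regression: float = 0.01,
) -> GateResult:
    """Fail if any higher-is-better metric drops by more than max_regression."""
    failures: List[str] = []
    for name, base_value in baseline.items():
        if name not in candidate:
            failures.append(f"{name}: missing from candidate")
            continue
        drop = base_value - candidate[name]
        if drop > max_regression:
            failures.append(f"{name}: dropped {drop:.3f} (limit {max_regression:.3f})")
    return GateResult(passed=not failures, failures=failures)


if __name__ == "__main__":
    baseline = {"accuracy": 0.91, "auc": 0.95}   # assumed metrics of the live model
    candidate = {"accuracy": 0.92, "auc": 0.93}  # assumed metrics of the new model

    result = validation_gate(baseline, candidate)
    for failure in result.failures:
        print("FAIL", failure)
    # In a CI job, a non-zero exit code here would block the deploy step
    print("promote" if result.passed else "reject")
```

In this example the candidate improves accuracy but loses more AUC than the tolerance allows, so the gate rejects it.
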

Category               | Tools
-----------------------+--------------------------------------------
Experiment Tracking    | MLflow, Weights & Biases, Neptune
Feature Stores         | Feast, Tecton, Hopsworks
Model Serving          | KServe, Seldon Core, BentoML, TorchServe
Pipeline Orchestration | Kubeflow Pipelines, Apache Airflow, Argo
Monitoring             | Evidently, WhyLabs, Arize, NannyML
Hyperparameter Tuning  | Optuna, Katib, Ray Tune
Platforms              | Kubeflow, SageMaker, Vertex AI, Databricks

Module 5.1: MLOps Fundamentals
   │  Why ML is different, maturity levels
   ▼
Module 5.2: Feature Engineering & Stores
   │  Training/serving skew, Feast
   ▼
Module 5.3: Model Training & Experimentation
   │  MLflow, HPO, reproducibility
   ▼
Module 5.4: Model Serving & Inference
   │  KServe, deployment patterns
   ▼
Module 5.5: Model Monitoring & Observability
   │  Drift detection, Evidently
   ▼
Module 5.6: ML Pipelines & Automation
   │  Kubeflow, CI/CD for ML
   ▼
Module 5.7: Data Versioning with DVC
   │  Git + DVC metadata, remotes, reproducible data pipelines
   ▼
Module 5.8: Great Expectations Data Quality
   │  Data contracts, checkpoints, Data Docs, K8s validation Jobs
   ▼
Module 5.9: ML Repository Hygiene
   │  src layout, ignore policy, lock files, notebook discipline, pre-commit gates
   ▼
Module 5.10: Production Model-Serving Traffic Patterns
   │  KServe canary, Istio A/B, shadow, mirroring, bandits, cost controls
   ▼
Module 5.11: Drift-Triggered Auto-Retraining Loop
   │  Drift signals, Argo triggers, retraining DAGs, validation gates, rollback
   ▼
Module 5.12: CML for ML CI
   │  PR comments, DVC metric deltas, validation reports, runner cost controls
   ▼
[Track Complete] → ML Platforms Toolkit

“A model is only as good as the system that serves it. MLOps is that system.”