Machine Learning

AI/ML Engineering Track | Phase 11

Overview

Classical machine learning is the engineering discipline behind most production ML systems on tabular and structured data. Despite the deep-learning headlines, the majority of business-critical ML in fraud detection, churn prediction, credit scoring, demand forecasting, and recommendation ranking still runs on the algorithms in this section: regularized linear models, regularized gradient-boosted machines (GBMs), random forests, calibrated classifiers, and time-series methods.

This section is organized as a Tier-1 spine of twelve practitioner essentials, followed by a Tier-2 set of advanced topics that production teams reach for once the basics are stable. Every module is taught at Bloom Level 3 or above (design, evaluate, debug), not “remember the API.”

Tier-1 Modules

#	Module	Status
1.1	Scikit-learn API & Pipelines	Available
1.2	Linear & Logistic Regression with Regularization	Available
1.3	Model Evaluation, Validation, Leakage & Calibration	Available
1.4	Feature Engineering & Preprocessing	Available
1.5	Decision Trees & Random Forests	Available
1.6	XGBoost & Gradient Boosting	Available
1.7	Naive Bayes, k-NN & SVMs	Available
1.8	Unsupervised Learning: Clustering	Available
1.9	Anomaly Detection & Novelty Detection	Available
1.10	Dimensionality Reduction	Available
1.11	Hyperparameter Optimization	Available
1.12	Time Series Forecasting	Available

Tier-2 Modules

#	Module	Status
2.1	Class Imbalance & Cost-Sensitive Learning	Available
2.2	ML Interpretability + Failure Slicing	Available
2.3	Probabilistic & Bayesian ML with PyMC	Available
2.4	Recommender Systems	Available
2.5	Conformal Prediction & Uncertainty Quantification	Available
2.6	Fairness & Bias Auditing	Available
2.7	Causal Inference for ML Practitioners	Available

Recommended Order

For first-time practitioners:

Start with 1.1 to internalize the sklearn estimator/transformer/pipeline contract.
Move to 1.3 (evaluation, validation, leakage, calibration) before any modeling work — most ML failures in production are evaluation failures, not modeling failures.
Build feature engineering muscle in 1.4.
Then walk through algorithms 1.2, 1.5, 1.6, 1.7 — each adds to your sense of which model to reach for.
Branch into 1.8–1.10 for unsupervised work, 1.11 for systematic tuning, 1.12 for time series.
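As a minimal sketch of why 1.1 and 1.3 come first, the snippet below chains a transformer (`StandardScaler`, which exposes `fit`/`transform`) and an estimator (`LogisticRegression`, which exposes `fit`/`predict`) in a scikit-learn `Pipeline`, then cross-validates the whole pipeline. Because preprocessing sits inside the pipeline, it is refit on each training fold only, which avoids a common form of leakage. The synthetic dataset from `make_classification`, the hyperparameters, and the ROC AUC metric are illustrative assumptions, not choices prescribed by the modules.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a tabular dataset (assumption: no real data is used here).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Transformer (fit/transform) chained into a final estimator (fit/predict).
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(C=1.0, max_iter=1000)),
])

# cross_val_score clones and refits the entire pipeline per fold, so the scaler
# only ever sees training-fold statistics; validation folds stay untouched.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
print(f"ROC AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Scaling `X` once before calling `cross_val_score` would let validation-fold statistics influence training; keeping every fitted step inside the pipeline is the simplest guard against that class of leakage.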

The Tier-2 set is sequence-independent — pick by problem.

Cross-Links

For deep learning architectures (CNNs, transformers, training loops): Deep Learning Foundations
For RL: Reinforcement Learning
For deploying these models on Kubernetes: MLOps & LLMOps
For drift, monitoring, and observability of these models in production: MLOps Module 1.10 (ML Monitoring)

See the full expansion plan in issue #677.