Machine Learning
AI/ML Engineering Track
Overview
Machine learning is the engineering discipline behind most production ML systems on tabular and structured data. Despite the deep-learning headlines, the majority of business-critical ML in fraud detection, churn, credit scoring, demand forecasting, and recommendation ranking still runs on the algorithms in this section: linear models with regularization, regularized GBMs, random forests, calibrated classifiers, and time-series methods.
This section is organized as a Tier-1 spine of twelve practitioner-essentials, followed by a Tier-2 set of advanced topics that production teams reach for once the basics are stable. Every module is taught at Bloom Level 3+ — design, evaluate, debug — not “remember the API.”
Tier-1 Modules
| # | Module | Status |
|---|---|---|
| 1.1 | Scikit-learn API & Pipelines | Available |
| 1.2 | Linear & Logistic Regression with Regularization | Available |
| 1.3 | Model Evaluation, Validation, Leakage & Calibration | Available |
| 1.4 | Feature Engineering & Preprocessing | Available |
| 1.5 | Decision Trees & Random Forests | Available |
| 1.6 | XGBoost & Gradient Boosting | Available |
| 1.7 | Naive Bayes, k-NN & SVMs | Available |
| 1.8 | Unsupervised Learning: Clustering | Available |
| 1.9 | Anomaly Detection & Novelty Detection | Available |
| 1.10 | Dimensionality Reduction | Available |
| 1.11 | Hyperparameter Optimization | Available |
| 1.12 | Time Series Forecasting | Available |
Tier-2 Modules
| # | Module | Status |
|---|---|---|
| 2.1 | Class Imbalance & Cost-Sensitive Learning | Available |
| 2.2 | ML Interpretability + Failure Slicing | Available |
| 2.3 | Probabilistic & Bayesian ML with PyMC | Available |
| 2.4 | Recommender Systems | Available |
| 2.5 | Conformal Prediction & Uncertainty Quantification | Available |
| 2.6 | Fairness & Bias Auditing | Available |
| 2.7 | Causal Inference for ML Practitioners | Available |
Recommended Order
For first-time practitioners:
- Start with 1.1 to internalize the sklearn estimator/transformer/pipeline contract.
- Move to 1.3 (evaluation, validation, leakage, calibration) before any modeling work — most ML failures in production are evaluation failures, not modeling failures.
- Build feature engineering muscle in 1.4.
- Then walk through algorithms 1.2, 1.5, 1.6, 1.7 — each adds to your sense of which model to reach for.
- Branch into 1.8–1.10 for unsupervised work, 1.11 for systematic tuning, 1.12 for time series.
The Tier-2 set is sequence-independent — pick by problem.
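As a rough preview of why 1.1 and 1.3 come first, the sketch below wraps preprocessing and a regularized linear model in a single scikit-learn `Pipeline` and scores it with cross-validation. Because the scaler lives inside the pipeline, it is refit on the training portion of every fold, which is one common guard against preprocessing leakage. The synthetic dataset, step names, fold count, and hyperparameters are illustrative assumptions, not material taken from the modules.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a tabular dataset (sizes and seed are arbitrary assumptions).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)

# Transformer + estimator chained under one fit/predict interface.
# The step names "scale" and "model" are hypothetical labels.
pipeline = Pipeline(
    steps=[
        ("scale", StandardScaler()),
        ("model", LogisticRegression(C=1.0, max_iter=1000)),
    ]
)

# cross_val_score clones and refits the whole pipeline per fold, so the
# scaler's mean and variance are learned from each fold's training rows only.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="roc_auc")

print("ROC AUC per fold:", scores.round(3))
print("Mean ROC AUC:", round(float(scores.mean()), 3))
```

Fitting the scaler on the full dataset before splitting would let statistics from held-out rows influence training; keeping every fitted step inside the pipeline avoids that without extra bookkeeping.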
Cross-Links
- For deep learning architectures (CNNs, transformers, training loops): Deep Learning Foundations
- For RL: Reinforcement Learning
- For deploying these models on Kubernetes: MLOps & LLMOps
- For drift, monitoring, and observability of these models in production: see MLOps Module 1.10 — ML Monitoring
See the full expansion plan in issue #677.