Machine Learning

AI/ML Engineering Track

Classical machine learning is the engineering discipline behind most production ML systems on tabular and structured data. Despite the deep-learning headlines, the majority of business-critical ML in fraud detection, churn prediction, credit scoring, demand forecasting, and recommendation ranking still runs on the algorithms in this section: regularized linear models, regularized gradient-boosted machines (GBMs), random forests, calibrated classifiers, and time-series methods.

This section is organized as a Tier-1 spine of twelve practitioner essentials, followed by a Tier-2 set of advanced topics that production teams reach for once the basics are stable. Every module is taught at Bloom Level 3+ (design, evaluate, debug), not “remember the API.”

| # | Module | Status |
|---|--------|--------|
| 1.1 | Scikit-learn API & Pipelines | Available |
| 1.2 | Linear & Logistic Regression with Regularization | Available |
| 1.3 | Model Evaluation, Validation, Leakage & Calibration | Available |
| 1.4 | Feature Engineering & Preprocessing | Available |
| 1.5 | Decision Trees & Random Forests | Available |
| 1.6 | XGBoost & Gradient Boosting | Available |
| 1.7 | Naive Bayes, k-NN & SVMs | Available |
| 1.8 | Unsupervised Learning: Clustering | Available |
| 1.9 | Anomaly Detection & Novelty Detection | Available |
| 1.10 | Dimensionality Reduction | Available |
| 1.11 | Hyperparameter Optimization | Available |
| 1.12 | Time Series Forecasting | Available |

| # | Module | Status |
|---|--------|--------|
| 2.1 | Class Imbalance & Cost-Sensitive Learning | Available |
| 2.2 | ML Interpretability + Failure Slicing | Available |
| 2.3 | Probabilistic & Bayesian ML with PyMC | Available |
| 2.4 | Recommender Systems | Available |
| 2.5 | Conformal Prediction & Uncertainty Quantification | Available |
| 2.6 | Fairness & Bias Auditing | Available |
| 2.7 | Causal Inference for ML Practitioners | Available |

For first-time practitioners:

  1. Start with 1.1 to internalize the sklearn estimator/transformer/pipeline contract.
  2. Move to 1.3 (evaluation, validation, leakage, calibration) before any modeling work — most ML failures in production are evaluation failures, not modeling failures.
  3. Build feature engineering muscle in 1.4.
  4. Then walk through algorithms 1.2, 1.5, 1.6, 1.7 — each adds to your sense of which model to reach for.
  5. Branch into 1.8–1.10 for unsupervised work, 1.11 for systematic tuning, 1.12 for time series.
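
As a rough preview of why steps 1 and 2 come first, the sketch below chains a transformer and an estimator into a single scikit-learn pipeline and evaluates it with cross-validation. It is only an illustration and is not taken from the modules themselves: the synthetic dataset, the choice of `StandardScaler` with `LogisticRegression`, and the ROC AUC metric are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real tabular dataset (illustrative assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# A transformer (fit/transform) followed by an estimator (fit/predict),
# combined so the whole chain behaves as one estimator.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# cross_val_score clones and refits the entire pipeline on each training fold,
# so the scaler's statistics are never computed from the held-out fold.
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print("ROC AUC per fold:", scores.round(3))
```

Fitting the scaler on the full dataset before splitting would leak held-out statistics into training; keeping preprocessing inside the pipeline is one common way to avoid that class of evaluation failure.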

The Tier-2 set is sequence-independent — pick by problem.

See the full expansion plan in issue #677.