What's New

June 2026

AI/ML Engineering — the track is complete and currency-reviewed end to end. The AI/ML Engineering track is now a full ~120-module curriculum across its phases, and every module has passed an independent cross-family quality review for technical accuracy and currency. Model names, framework versions, and CNCF project maturity levels were verified against upstream sources and quarantined into dated snapshots, so the fast-moving specifics can be refreshed without rewriting the lessons. This cycle rounded out the track beyond the from-scratch deep-learning and machine-learning arcs noted below: generative AI, vector search & RAG, agent frameworks, advanced generative AI, multimodal AI, AI infrastructure on Kubernetes, and reinforcement learning. Issues #2020, #2031.
Platform Engineering Foundations — the theory spine, built out across seven sections. Platform Engineering Foundations grew from stubs into full lessons that teach the durable concepts beneath the tooling — the primitives and tradeoffs that outlast any single vendor or release. Systems Thinking (feedback loops, mental models for operations, and complexity and emergent behavior), Reliability Engineering (failure modes and effects, redundancy and fault tolerance, and the theory behind SLIs, SLOs, and error budgets), Observability Theory (the three pillars, instrumentation principles, and turning data into insight), Security Principles (the security mindset, defense in depth, identity and access, and secure-by-default), Distributed Systems (consensus and coordination, eventual consistency, partial failure and timeouts, and clock skew and ordering), Advanced Networking (DNS at scale, CDN and edge, WAF and DDoS mitigation, BGP and core routing, cloud load balancing, zero-trust networking, and IPv6/dual-stack Kubernetes), and Engineering Leadership (incident command, blameless postmortems, on-call and burnout prevention, architecture decision records, stakeholder communication, and mentorship). The through-line: teach the primitives and tradeoffs, and quarantine volatile specifics into dated snapshots. Issues #1898–#1903, #1947/#1950.
FinOps — a new cost-engineering discipline (six modules). A new FinOps section under Platform Disciplines treats cloud cost as an engineering concern rather than a finance afterthought: FinOps Fundamentals & the Cloud Bill, Kubernetes Cost Allocation & Visibility (showback/chargeback, OpenCost/Kubecost, and idle and shared-cost handling), Workload Rightsizing & Optimization, Cluster Scaling & Compute Optimization, Storage & Network Cost Management, and FinOps Culture & Automation. Issues #1954/#1955.
Ukrainian translation — major coverage jump (~14% → ~35%). Up-to-date Ukrainian coverage more than doubled this cycle. Coverage is measured by currency (how fresh each translation is against the English source), not by whether a translated file exists. The entire Kubernetes certifications tree is now fully translated and calque-reviewed, including CKA, CKAD, CKS, KCNA, KCSA, and every certification landing page (CGOA, OTCA, CBA, and the rest), with the k8s track at 261/261 pages current. The Prerequisites track (Zero to Terminal and the early CKA/CKAD path) remains 100% translated and calque-reviewed for natural Ukrainian rather than word-for-word renders. Ukrainian translation of the AI History book has also begun: Parts 1 and 2 (the first ten chapters) are live, each reviewed by three independent model families before merge. Next up for translation: the AI and AI/ML Engineering tracks. See the Ukrainian changelog for the UK-side view; progress is tracked in epic #1911.
Synthesis Apps mini-arc extended to four modules — orchestration, eval gates, and observability. The AI/ML Engineering Synthesis Apps arc now goes past the backing-services substrate (3.1) into the full application path. 3.2 Wiring the LLM App: The Orchestration Layer builds a Kubernetes orchestration service that calls vLLM and Qdrant, with Redis checkpointing and explicit retries, budgets, and circuit breaking. 3.3 Production Gates: LLM Evals in CI uses promptfoo, ragas, and DeepEval suites as a rollout gate, with golden-vs-regression thresholds and gating through Argo Rollouts and GitHub Actions. 3.4 Agent Observability with OpenTelemetry covers the agent span tree, the current OpenTelemetry GenAI semantic conventions (gen_ai.* attributes), token-cost attribution, and closing the loop back into the eval set. Teaching content; runnable labs tracked separately (#386). Issue #1626.
Neural Networks from scratch — the deep-learning section, rebuilt (10 → 26 modules). The AI/ML Engineering deep-learning section is now a coherent from-scratch arc. It starts in pure NumPy (the perceptron, forward propagation, activation and loss functions, backprop by hand, a scalar autograd engine). It then bridges to PyTorch with a training-as-engineering block (initialization and signal propagation, optimizers, regularization, normalization layers, a training-diagnostics playbook, numerical stability and mixed precision) and builds the modern architecture primitives by hand: embeddings, residual connections, attention, and the transformer block. The final block, Block D, completes the picture with convolutional networks, recurrent networks and sequence models, and an end-to-end capstone (dataset → baseline → architecture → diagnostics → ablation → runbook). The through-line: a transformer, a CNN, and an RNN are each shown to be the same pattern of linear layers (nn.Linear or a weight-shared variant of it) plus nonlinearities, trained by the same backprop the learner built by hand. Epic #1793 (supersedes #1609).

May 2026

AI Engineering Foundations — the engineering spine for working with AI. A new AI Engineering Foundations section (12 modules) bridges AI literacy and production agent work: prompt design, reasoning, safety, and contracts; context fundamentals, repository engineering for agents, retrieval and memory boundaries, and dynamic context orchestration; harness layers, guardrails, and operating agent loops; plus Symphony-style work orchestration as an applied harness.
Open Models & Local Inference. A new Open Models & Local Inference section (7 modules) covers model hubs and model cards, Hugging Face for learners, quantization and formats, MLX on Apple Silicon, Linux local inference, and runtime choice across Ollama, MLX, Transformers, and vLLM.
AI Engineering Foundations prompt-safety module. Prompt Safety and Evaluation adds a prompt-eval harness lesson covering golden-set regression, LLM-as-judge calibration, direct and indirect prompt injection, prompt leakage, jailbreak probes, drift detection, and CI-ready safety gates.
Kafka on Kubernetes: MirrorMaker2, partition reassignment, and Cruise Control. Module 1.2: Apache Kafka on Kubernetes (Strimzi) expanded with three new operator-grade sections covering KafkaMirrorMaker2 CRD configuration for DR and multi-region topologies (offset translation, heartbeat/checkpoint connectors, active-active vs active-passive patterns), kafka-reassign-partitions.sh three-phase workflow (generate/execute/verify with bandwidth throttling), and Cruise Control continuous rebalancing via the KafkaRebalance CRD (state machine, goal tiers, self-healing anomaly detection, production tuning). Issue #1324.
Operator implementation decision framework. Module 7.15: Helm vs Ansible vs Go Operator Decision Framework gives platform engineers a 12-axis decision matrix for choosing between Helm, Ansible, and Go operator styles. It covers OperatorHub.io capability levels, three worked examples (AWX Operator, cert-manager, and Crossplane), code volume estimates, migration paths, and a hands-on lab that builds all three operator flavors against the same WebApp CRD on kind.
Platform Toolkits / IaC Tools: Module 7.13 Advanced watches.yaml Patterns covers multi-CRD operators, WATCH_NAMESPACE scoping, cluster-scoped RBAC, watchDependentResources and blacklist filtering, finalizer mapping, selector filters, and performance tuning via ANSIBLE_WORKERS (#1356, T2-13 Wave 2).
Platform Toolkits / IaC Tools: Module 7.17 Testing Ansible Operators with Molecule and Kuttl (#1360; completes the T2-13 arc) covers Molecule delegated, docker, and kind driver scenarios, Kuttl E2E tests (TestStep, TestAssert, errors.yaml), the operator-sdk scorecard, a GitHub Actions matrix pipeline, and coverage measurement. It also includes an 8-row common-mistakes table, 6 scenario-based quiz questions, and a hands-on lab with 4 tasks (~700 content lines).
Platform Toolkits / IaC Tools: Module 7.14 AWX, Tower, and Event-Driven Ansible (EDA) Integration (#1357) covers AWX Operator architecture, an AAP vs AWX vs standalone decision matrix, EDA rulebooks on Kubernetes via ansible.eda.webhook and the informer-forwarder pattern, webhook integration with Alertmanager and GitHub Actions, credential management, the k8s_info + add_host inventory pattern, the operator-as-AWX-job pattern, and production deployment and backup strategy (~450 lines of prose content).
AI Infrastructure inference benchmarking. Benchmarking LLM Inference: TTFT, TPOT, and Workload-Aware Load Shaping teaches time to first token (TTFT), time per output token (TPOT), percentile latency, structured vLLM benchmark runs, workload-aware load shaping, serving-parameter tuning, and bandwidth-math validation for production inference stacks.
Production LLM inference engine selection. Production-Tier LLM Inference Engines: Decision Framework maps ExLlamaV2/V3, vLLM, SGLang, TensorRT-LLM, NVIDIA Dynamo, TGI, LMDeploy, MLC LLM, and OpenVINO to hardware tiers, workload classes, migration paths, and production failure modes.
AI Infrastructure bandwidth math. GPU Memory Hierarchy and Bandwidth Math for LLM Inference teaches HBM/GDDR/DRAM/NVLink/PCIe tradeoffs, arithmetic intensity, decode tokens-per-second prediction, and benchmark validation before engine or GPU selection.
AI/ML Engineering: Module 3.1 LLM-Native Stack on Kubernetes (2026-05-19), the first module of the Synthesis Apps mini-arc (T2-7).
Platform Toolkits / IaC Tools: Module 7.11 HCP Terraform Workflow Operations (#1304 Wave 2, T2-23, ~2091 lines).
Platform Toolkits / GitOps & Deployments: Module 2.5 Dapr + Buildpacks application definition beyond Helm (#1304 Wave 2, T2-20, ~2189 lines).
MLOps: Module 5.12 CML for ML CI, which closes the MLOps Discipline track (#1304 Wave 2, T2-19, ~2069 lines).
MLOps: Module 5.11 Drift-Triggered Auto-Retraining Loop (#1304 Wave 2, T2-18, ~2001 lines).
MLOps model-serving traffic gap filled. Production Model-Serving Traffic Patterns now covers KServe canaries, Istio A/B routing, shadow traffic, mirroring, bandits, rollback gates, and cost controls for production model promotion.
MLOps repository-hygiene gap filled. ML Repository Hygiene now covers clean src/ layouts, DVC-aware ignore policy, uv lock discipline, notebook output stripping, pre-commit gates, CI split, and cost controls for ML repository bloat.
MLOps data-quality gap filled. Great Expectations Data Quality now covers GX Core 1.x suites, checkpoints, Data Docs, DVC baseline review, cost controls, and Kubernetes/Argo validation gates.
MLOps data-versioning gap filled. Data Versioning with DVC now covers Git-plus-DVC artifact lineage, DVC pipelines, MinIO remotes on kind, cost controls, and CI/Kubeflow integration boundaries.
Azure Essentials application-hosting gap filled. Azure App Service — Operator Path now covers App Service Plans and SKUs, deployment slots with slot-swap rollback, Hybrid Connections versus VNet integration versus Private Endpoint, managed identity, autoscale, App Service Environment v3, and the App Service versus Container Apps versus AKS decision.
Azure Essentials edge-operations gap filled. Azure Application Gateway — Operator Path now covers WAF policy tuning, Key Vault-backed TLS, AGIC versus Application Gateway for Containers, autoscaling capacity units, Log Analytics/KQL diagnostics, and v1-to-v2 migration gotchas.
Cloud governance gap filled. Cloud Custodian — Policy-as-Code Governance Across Multi-Cloud now covers declarative multi-cloud remediation, AWS/Azure examples, and production operations.
On-prem multi-cluster track complete. Four new modules in On-Premises Multi-Cluster: Gardener (open-source Kubernetes-as-a-Service with a three-tier Gardens/Seeds/Shoots architecture), Karmada + Liqo + kube-vip (federation and virtual-IP advertisement for on-prem clusters), OpenStack on Kubernetes (running OpenStack’s own control plane as Kubernetes workloads), and VMware Tanzu (the full Tanzu portfolio — TKG, vSphere with Tanzu, TMC, TAP — plus an honest look at enterprise decisions after the Broadcom acquisition).
ML model-serving toolkit. KServe, Seldon Core, and BentoML now have full modules — covering inference graphs, A/B experiments, and Python-first model packaging. The new bare-metal MLOps capstone shows how to wire all three into a complete production stack on your own hardware, from GPU scheduling and storage through to observability.
Machine Learning track — Tier-1 layer complete (twelve modules). The Machine Learning section now has a complete twelve-module Tier-1 layer (ten new modules plus two existing modules renumbered into slots 1.6 and 1.12) covering scikit-learn API and Pipelines; regression with regularization; evaluation, leakage, and calibration; feature engineering; decision trees and random forests; k-NN, Naive Bayes, and SVMs; clustering; anomaly detection; dimensionality reduction; and hyperparameter optimization.
Machine Learning track — seven advanced modules. The Tier-2 layer tackles the harder questions you hit in production: class imbalance and cost-sensitive learning, interpretability and failure slicing, Bayesian ML with PyMC, recommender systems, conformal prediction and uncertainty quantification, fairness and bias auditing, and causal inference for ML practitioners.
Deep Learning and Reinforcement Learning extensions. Two new deep learning modules cover self-supervised learning (SimCLR, DINO, and MAE, and when to use them versus supervised pretraining) and graph neural networks (GCN, GraphSAGE, and GAT, with an honest look at when a plain MLP still wins). The new Reinforcement Learning section opens with practitioner foundations, followed by offline RL and imitation learning.

April 2026

AI History Book — Part 1. The first nine chapters of the AI History book are live. Part 1 covers AI’s mathematical foundations from the 1840s to the 1950s — Boole, Turing, Shannon, McCulloch–Pitts, and the cybernetics movement. Each chapter includes a cast of characters, timeline, glossary, and a “Why this still matters” note. Start with Chapter 1.
New AI track. A beginner-friendly AI track covers AI literacy and practical working habits — what LLMs are, prompting, verification, privacy, and using AI for learning, writing, research, and coding. A gentler entry point than the advanced AI/ML Engineering path.
Local-first AI/ML path. Ten new modules show you how to build a working RAG system or fine-tune a model on home hardware, starting from a single GPU, with no cloud account required. The path covers environment setup (CUDA/ROCm), home-scale RAG, local inference stacks, single-GPU and multi-GPU fine-tuning, and home AI operations. See the full path.
Hub pages redesigned. Platform Engineering and Kubernetes Certifications now open with persona routes (SRE, DevEx Builder, and Platform Architect for Platform Engineering; Operator, Developer, and Security Specialist for Certifications), so you find the right starting point immediately. Bridge pages link K8s to On-Premises, K8s to Platform Engineering, and AI/ML to AI Platform Engineering.
Certification prep expanded. New exam-prep modules for LFCS, CNPE, CNPA, and CGOA added to the Certifications track.

March 2026

Site migrated to Starlight. The build now takes seconds instead of minutes. Broken links from the old site have been cleaned up.
New site design. Custom homepage, a sidebar that follows your current track, breadcrumbs, complexity and time chips, dark/light mode, and a Mark-Complete button with an exportable progress dashboard.
Linux Deep Dive promoted to its own top-level track — 37 modules, no longer buried under Fundamentals.
On-Premises Kubernetes — 30 new modules. A complete bare-metal track: planning and economics, provisioning (PXE, Talos, Flatcar, Sidero, Metal3), networking (spine-leaf, BGP, MetalLB, kube-vip), storage (Ceph/Rook), multi-cluster, security (air-gapped, HSM/TPM, AD/LDAP/OIDC), operations, and resilience (multi-site DR, hybrid connectivity, cloud repatriation). Start here.
New Platform Engineering disciplines. Five new Networking modules (CNI Architecture, Network Policy, Service Mesh, Ingress and Gateway API, Multi-Cluster Networking), five new Platform Leadership modules (Building Platform Teams, Developer Experience Strategy, Platform as Product, Adoption and Migration, Scaling Platform Organizations), and a four-section Supply Chain Defense guide covering transitive dependency auditing, registry quarantine, AI gateway security, and credential rotation.
Ecosystem updates. Zero to Terminal has 10 beginner modules. Ukrainian translation covers 115+ pages across Prerequisites, CKA, and CKAD. All content aligned with Kubernetes 1.35. Platform Engineering Toolkit expanded with FinOps, Kyverno, Chaos Engineering, Operators, CAPI, vCluster, and GPU Scheduling. All 21 CNCF certification learning paths are in the sidebar.