Platform Engineering
Principles, practices, and disciplines for running production systems on Kubernetes.
Kubernetes certifications teach you how to use Kubernetes. This track teaches you how to run production systems — the theory, disciplines, and leadership that separate operators from practitioners.
This is for people who:
- Have Kubernetes fundamentals (or certifications)
- Want to understand theory, not just tools
- Need to make technology decisions at work
- Want to implement best practices, not just pass exams
Looking for tool-specific guides? See Cloud Native Tools.
Structure
platform/
├── foundations/                   # Theory that doesn't change (32 modules)
│   ├── systems-thinking/          # Mental models for complex systems
│   ├── reliability-engineering/   # Failure theory, redundancy, risk
│   ├── observability-theory/      # What to measure and why
│   ├── security-principles/       # Zero trust, threat modeling
│   ├── distributed-systems/       # CAP, consensus, consistency
│   ├── advanced-networking/       # Network theory, protocols, design
│   └── engineering-leadership/    # Technical leadership, org design
│
└── disciplines/                   # Applied practices (81 modules)
    ├── core-platform/
    │   ├── sre/                   # Operations, reliability, on-call
    │   ├── platform-engineering/  # Developer experience, self-service
    │   └── platform-leadership/   # Strategy, adoption, evangelism
    │
    ├── delivery-automation/
    │   ├── release-engineering/   # Build, release, deploy lifecycle
    │   ├── gitops/                # Deployment, reconciliation
    │   └── iac/                   # IaC patterns, testing, drift
    │
    ├── reliability-security/
    │   ├── networking/            # Network architecture, policy
    │   ├── chaos-engineering/     # Failure injection, resilience
    │   └── devsecops/             # Security integration, compliance
    │
    ├── data-ai/
    │   ├── data-engineering/      # Pipelines, streaming, storage
    │   ├── mlops/                 # ML lifecycle, model serving
    │   ├── aiops/                 # AI-driven operations
    │   └── ai-infrastructure/     # GPU scheduling, model hosting
    │
    └── business-value/
        └── finops/                # Cloud cost optimization
Reading Order
Start with Foundations
Theory that applies everywhere. Read these first; they don’t change.
| Track | Why Start Here |
|---|---|
| Systems Thinking | Mental models for complex systems |
| Reliability Engineering | Failure theory, redundancy, risk |
| Distributed Systems | CAP, consensus, consistency |
| Observability Theory | What to measure and why |
| Security Principles | Zero trust, threat modeling |
| Advanced Networking | Network theory, protocols, design |
| Engineering Leadership | Technical leadership, org design |
Then Pick a Discipline
Applied practices: how to do the work.
Core Platform
| Discipline | Modules | Best For |
|---|---|---|
| SRE | 7 | Operations, reliability, on-call |
| Platform Engineering | 6 | Developer experience, self-service |
| Platform Leadership | 5 | Strategy, adoption, evangelism |
Delivery & Automation
| Discipline | Modules | Best For |
|---|---|---|
| Release Engineering | 5 | Build, release, deploy lifecycle |
| GitOps | 6 | Deployment, reconciliation |
| Infrastructure as Code | 6 | IaC patterns, testing, drift management |
Reliability & Security
| Discipline | Modules | Best For |
|---|---|---|
| Networking | 5 | Network architecture, policy, design |
| Chaos Engineering | 5 | Failure injection, resilience |
| DevSecOps | 7 | Security integration, compliance |
Data & AI
| Discipline | Modules | Best For |
|---|---|---|
| Data Engineering | 6 | Pipelines, streaming, storage |
| MLOps | 6 | ML lifecycle, model serving |
| AIOps | 6 | AI-driven operations, automation |
| AI Infrastructure | 6 | GPU scheduling, model hosting |
Business Value
| Discipline | Modules | Best For |
|---|---|---|
| FinOps | 6 | Cloud cost optimization |
Module Format
Every module includes:
- Why This Matters — Real-world motivation
- Theory — Principles and mental models
- Current Landscape — Tools that implement this
- Hands-On — Practical implementation
- Best Practices — What good looks like
- Common Mistakes — Anti-patterns to avoid
- Further Reading — Books, talks, papers
Status
| Section | Modules | Description |
|---|---|---|
| Foundations | 32 | 7 sections: Systems Thinking, Reliability Engineering, Observability Theory, Security Principles, Distributed Systems, Advanced Networking, Engineering Leadership |
| Disciplines | 81 | 14 disciplines across Core Platform, Delivery & Automation, Reliability & Security, Data & AI, and Business Value |
| Total | 113 | |
Tool-specific implementation guides (96 modules) are in Cloud Native Tools.
Prerequisites
Before starting this track, you should have:
- Kubernetes basics (or completed Prerequisites)
- Some production experience (helpful but not required)
- Curiosity about “why” not just “how”
“Tools change. Principles don’t.”