Observability Theory

Foundation Track | 4 Modules | ~2 hours total

The science of understanding system behavior from external outputs. Theory and principles that apply regardless of which tools you use.

Why Observability Theory?

You can’t fix what you can’t see. But observability is more than just seeing—it’s about understanding.

Observability theory teaches you to:

Distinguish monitoring (known-unknowns) from observability (unknown-unknowns)
Correlate signals across logs, metrics, and traces
Instrument systems for debuggability, not just alerting
Transform data into actionable insight

This isn’t about installing tools. It’s about building the mental models that make those tools useful.

Modules

#	Module	Time	Description
3.1	What is Observability?	25-30 min	Control theory origins, monitoring vs observability
3.2	The Three Pillars	30-35 min	Logs, metrics, traces, and correlation
3.3	Instrumentation Principles	30-35 min	What to instrument, patterns, context propagation
3.4	From Data to Insight	35-40 min	Alerting, debugging workflows, mental models

Learning Path

START HERE
    │
    ▼
┌─────────────────────────────────────┐
│  Module 3.1                         │
│  What is Observability?             │
│  └── Control theory origins         │
│  └── Monitoring vs observability    │
│  └── The observability equation     │
└──────────────────┬──────────────────┘
                   │
                   ▼
┌─────────────────────────────────────┐
│  Module 3.2                         │
│  The Three Pillars                  │
│  └── Logs: Events over time         │
│  └── Metrics: Aggregated numbers    │
│  └── Traces: Request journeys       │
│  └── Correlation: The fourth pillar │
└──────────────────┬──────────────────┘
                   │
                   ▼
┌─────────────────────────────────────┐
│  Module 3.3                         │
│  Instrumentation Principles         │
│  └── What to measure                │
│  └── Where to instrument            │
│  └── Context propagation            │
│  └── The cost of observability      │
└──────────────────┬──────────────────┘
                   │
                   ▼
┌─────────────────────────────────────┐
│  Module 3.4                         │
│  From Data to Insight               │
│  └── Alerting philosophy            │
│  └── Debugging workflows            │
│  └── Dashboard design               │
│  └── Mental models                  │
└──────────────────┬──────────────────┘
                   │
                   ▼
              COMPLETE
                   │
    ┌──────────────┼──────────────┐
    │              │              │
    ▼              ▼              ▼
 Security       SRE          Observability
Principles   Discipline        Toolkit

Key Concepts You’ll Learn

Concept	Module	What It Means
Observability	3.1	Ability to understand internal state from external outputs
Cardinality	3.1, 3.3	Number of unique values a dimension can have
Structured Logging	3.2	Machine-parseable log format (JSON)
Log Levels	3.2	ERROR, WARN, INFO, DEBUG hierarchy
Metric Types	3.2	Counter, gauge, histogram, summary
Spans	3.2	Individual operations within a trace
Trace Context	3.2, 3.3	Metadata that flows through distributed calls
RED Method	3.3	Rate, Errors, Duration for services
USE Method	3.3	Utilization, Saturation, Errors for resources
Golden Signals	3.3	Latency, traffic, errors, saturation
Context Propagation	3.3	Passing correlation data across boundaries
Signal-to-Noise	3.4	Ratio of useful alerts to total alerts
Alert Fatigue	3.4	Desensitization from too many alerts

Prerequisites

Required: Systems Thinking Track — Understanding feedback loops and emergence
Recommended: Reliability Engineering Track — SLIs, SLOs, error budgets
Helpful: Experience running any production system
Helpful: Basic understanding of HTTP and distributed systems

Where This Leads

After completing Observability Theory, you’re ready for:

Track	Why
Security Principles	Security monitoring uses same concepts
SRE Discipline	Put observability into SRE practice
Observability Toolkit	Learn specific tools (Prometheus, Grafana, OTel)
Platform Engineering	Build observability into your platform

Key Resources

Books referenced throughout this track:

“Observability Engineering” — Charity Majors, Liz Fong-Jones, George Miranda
“Distributed Systems Observability” — Cindy Sridharan
“Site Reliability Engineering” — Google (Chapters 4-6)
“The Art of Monitoring” — James Turnbull

Standards and Specifications:

OpenTelemetry — opentelemetry.io
W3C Trace Context — w3.org/TR/trace-context
Prometheus Data Model — prometheus.io/docs/concepts/data_model

The Mental Shift

Traditional Monitoring	Modern Observability
What’s broken?	Why is it broken?
Dashboard watching	Hypothesis exploration
Known failure modes	Novel failure modes
Alert on symptoms	Understand root causes
More metrics = better	Right metrics = better
Tools-first	Questions-first

“Observability is not about logs, metrics, and traces. It’s about being able to ask arbitrary questions about your system without having to know in advance what questions you’ll need to ask.”

— Charity Majors