
Observability Theory

Foundation Track | 4 Modules | ~2 hours total

The science of understanding system behavior from external outputs. Theory and principles that apply regardless of which tools you use.


You can’t fix what you can’t see. But observability is more than just seeing—it’s about understanding.

Observability theory teaches you to:

  • Distinguish monitoring (known-unknowns) from observability (unknown-unknowns)
  • Correlate signals across logs, metrics, and traces
  • Instrument systems for debuggability, not just alerting
  • Transform data into actionable insight
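Correlating signals usually depends on every signal carrying a shared identifier. The Python below is a minimal sketch. It emits structured JSON log lines tagged with a trace ID, then pulls one request's story out of an interleaved stream. The `trace_id` and `msg` field names and the `log_event` and `group_by_trace` helpers are hypothetical and do not come from any particular logging library.

```python
import json
import uuid
from collections import defaultdict


def log_event(trace_id: str, level: str, message: str, **fields) -> str:
    """Render one structured (JSON) log line that carries a trace ID."""
    record = {"level": level, "msg": message, "trace_id": trace_id, **fields}
    return json.dumps(record, sort_keys=True)


def group_by_trace(lines: list[str]) -> dict[str, list[dict]]:
    """Parse JSON log lines and group them by their trace_id field, preserving order."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for line in lines:
        record = json.loads(line)
        groups[record["trace_id"]].append(record)
    return dict(groups)


# Two requests interleaved in one log stream; each gets a random 32-hex-char ID.
trace_a = uuid.uuid4().hex
trace_b = uuid.uuid4().hex
stream = [
    log_event(trace_a, "INFO", "request received", route="/checkout"),
    log_event(trace_b, "INFO", "request received", route="/search"),
    log_event(trace_a, "ERROR", "payment call failed", upstream="payments"),
    log_event(trace_b, "INFO", "request completed", status=200),
]

by_trace = group_by_trace(stream)
for record in by_trace[trace_a]:
    # prints "INFO request received", then "ERROR payment call failed"
    print(record["level"], record["msg"])
```

In a real system, the same ID would also be attached to trace spans. Some metrics systems can also attach it to samples as exemplars, which is what makes pivoting between the three signal types possible.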

This isn’t about installing tools. It’s about building the mental models that make those tools useful.


#     Module                       Time        Description
3.1   What is Observability?       25-30 min   Control theory origins, monitoring vs observability
3.2   The Three Pillars            30-35 min   Logs, metrics, traces, and correlation
3.3   Instrumentation Principles   30-35 min   What to instrument, patterns, context propagation
3.4   From Data to Insight         35-40 min   Alerting, debugging workflows, mental models

              START HERE
┌─────────────────────────────────────┐
│ Module 3.1                          │
│ What is Observability?              │
│ ├── Control theory origins          │
│ ├── Monitoring vs observability     │
│ └── The observability equation      │
└──────────────────┬──────────────────┘
                   │
                   ▼
┌─────────────────────────────────────┐
│ Module 3.2                          │
│ The Three Pillars                   │
│ ├── Logs: Events over time          │
│ ├── Metrics: Aggregated numbers     │
│ ├── Traces: Request journeys        │
│ └── Correlation: The fourth pillar  │
└──────────────────┬──────────────────┘
                   │
                   ▼
┌─────────────────────────────────────┐
│ Module 3.3                          │
│ Instrumentation Principles          │
│ ├── What to measure                 │
│ ├── Where to instrument             │
│ ├── Context propagation             │
│ └── The cost of observability       │
└──────────────────┬──────────────────┘
                   │
                   ▼
┌─────────────────────────────────────┐
│ Module 3.4                          │
│ From Data to Insight                │
│ ├── Alerting philosophy             │
│ ├── Debugging workflows             │
│ ├── Dashboard design                │
│ └── Mental models                   │
└──────────────────┬──────────────────┘
                   │
                   ▼
               COMPLETE
                   │
    ┌──────────────┼──────────────┐
    │              │              │
    ▼              ▼              ▼
Security          SRE       Observability
Principles    Discipline       Toolkit

Concept               Module     What It Means
Observability         3.1        Ability to understand internal state from external outputs
Cardinality           3.1, 3.3   Number of unique values a dimension can have
Structured Logging    3.2        Machine-parseable log format (e.g., JSON)
Log Levels            3.2        ERROR, WARN, INFO, DEBUG severity hierarchy
Metric Types          3.2        Counter, gauge, histogram, summary
Spans                 3.2        Individual operations within a trace
Trace Context         3.2, 3.3   Metadata that flows through distributed calls
RED Method            3.3        Rate, Errors, Duration for services
USE Method            3.3        Utilization, Saturation, Errors for resources
Golden Signals        3.3        Latency, traffic, errors, saturation
Context Propagation   3.3        Passing correlation data across boundaries
Signal-to-Noise       3.4        Ratio of useful alerts to total alerts
Alert Fatigue         3.4        Desensitization from too many alerts

Prerequisites:

  • Required: Systems Thinking Track (feedback loops and emergence)
  • Recommended: Reliability Engineering Track (SLIs, SLOs, error budgets)
  • Helpful: Experience running any production system
  • Helpful: Basic understanding of HTTP and distributed systems

After completing Observability Theory, you’re ready for:

Track                    Why
Security Principles      Security monitoring uses the same concepts
SRE Discipline           Put observability into SRE practice
Observability Toolkit    Learn specific tools (Prometheus, Grafana, OTel)
Platform Engineering     Build observability into your platform

Books referenced throughout this track:

  • “Observability Engineering” — Charity Majors, Liz Fong-Jones, George Miranda
  • “Distributed Systems Observability” — Cindy Sridharan
  • “Site Reliability Engineering” — Google (Chapters 4-6)
  • “The Art of Monitoring” — James Turnbull

Standards and Specifications:

  • OpenTelemetry — opentelemetry.io
  • W3C Trace Context — w3.org/TR/trace-context
  • Prometheus Data Model — prometheus.io/docs/concepts/data_model
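The W3C Trace Context specification listed above defines the `traceparent` header, which carries trace context across service boundaries. The header has four fields: a version, a 32-hex-digit trace ID, a 16-hex-digit parent (span) ID, and 2 hex digits of trace flags. The Python below is a minimal sketch of propagating that context. It handles only version `00` and accepts only lowercase hex. It ignores the companion `tracestate` header, and its helper names (`parse_traceparent`, `outgoing_traceparent`) are hypothetical. It is an illustration of the format, not a substitute for an OpenTelemetry propagator.

```python
import re
import secrets
from typing import NamedTuple, Optional

# Version-00 layout: "00", 32-hex trace-id, 16-hex parent-id, 2-hex flags.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceContext(NamedTuple):
    trace_id: str   # shared by every span in the trace
    parent_id: str  # span ID of the caller that sent the header
    flags: int      # bit 0x01 means "sampled"


def parse_traceparent(header: str) -> Optional[TraceContext]:
    """Parse a version-00 traceparent value; return None if it is invalid."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # The spec treats all-zero trace and parent IDs as invalid.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceContext(trace_id, parent_id, int(flags, 16))


def outgoing_traceparent(ctx: TraceContext) -> str:
    """Header for a downstream call: same trace ID and flags, new span ID for this hop."""
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 lowercase hex chars
    return f"00-{ctx.trace_id}-{span_id}-{ctx.flags:02x}"


# Example value taken from the W3C Trace Context specification.
incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
ctx = parse_traceparent(incoming)
if ctx is not None:
    print(ctx.trace_id, ctx.parent_id, bool(ctx.flags & 0x01))
    print(outgoing_traceparent(ctx))
```

The outgoing header keeps the trace ID and flags but swaps in a new span ID. A tracing backend relies on exactly this to stitch the spans from every service into one trace.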

Traditional Monitoring   Modern Observability
What’s broken?           Why is it broken?
Dashboard watching       Hypothesis exploration
Known failure modes      Novel failure modes
Alert on symptoms        Understand root causes
More metrics = better    Right metrics = better
Tools-first              Questions-first

“Observability is not about logs, metrics, and traces. It’s about being able to ask arbitrary questions about your system without having to know in advance what questions you’ll need to ask.”

— Charity Majors