Module 1.1: What is Systems Thinking?

Complexity: [MEDIUM] | Time: 35-45 minutes | Prerequisites: None (entry point to Platform track)

After completing this module, you will be able to:

  1. Analyze a system failure by mapping interconnected components rather than investigating services in isolation
  2. Explain how emergent properties arise from component interactions and why they cannot be predicted from individual parts alone
  3. Apply systems thinking frameworks (stocks-and-flows, boundaries, leverage points) to real infrastructure architectures
  4. Diagnose incidents by zooming out to identify upstream and downstream dependencies before restarting individual services

Tuesday, 2:47 AM. The on-call engineer’s phone explodes with alerts.

“Payment service latency increased.” Standard stuff. They check the dashboards—CPU at 23%, memory looks fine, no error spikes in the logs. Weird.

They do what any reasonable engineer would do at 3 AM: restart the service. Latency drops immediately. Back to sleep.

3:52 AM. Same alert. Latency is back.

Restart again. Works again. Sleep again.

4:23 AM. Again.

Now they’re annoyed. They add more replicas. Check the code for memory leaks. Review recent deployments. Nothing. The service is fine. But the system is broken.

At 6 AM, exhausted and frustrated, they finally do what should have been done hours earlier: zoom out. Instead of staring at the payment service, they look at what’s around it.

That’s when they see it. A batch job—completely unrelated to payments—started running at 2:30 AM. It’s processing end-of-day reports. It’s hammering the shared database with massive analytical queries. Those queries are holding locks that block the payment service’s small, fast transactions.

The payment service was perfect. It was the victim of a system-level problem that couldn’t be seen by looking at individual components.

This is systems thinking. The ability to see the whole, not just the parts. To understand that behavior emerges from relationships, not components. To stop playing whack-a-mole with symptoms and start solving actual problems.

This scenario plays out in engineering teams every day. This module will teach you how to avoid it.

In this module:

  • Why looking at components in isolation is a trap
  • How behavior emerges from interactions (not from parts)
  • The iceberg model for seeing below the surface
  • The vocabulary that will change how you troubleshoot
  • How to apply this to your next incident (starting tonight)

Part 1: The Problem with Component Thinking

Here’s an analogy that helped me understand this:

A mechanic fixes cars by testing parts until they find the broken one. Alternator dead? Replace it. Brake pads worn? New ones. This works because cars are complicated but not complex—the same input always produces the same output.

An engineer understands how the car actually works. They know that a weak alternator doesn’t just fail—it causes the battery to drain, which causes the car to run lean, which damages the catalytic converter, which triggers the check engine light. They see the cascade.

Most ops teams are mechanics. They replace parts (restart services, scale pods, rollback deployments) until the alert goes away. Sometimes that’s enough. But for complex systems, you need to be an engineer.

THE MECHANIC VS THE ENGINEER
═══════════════════════════════════════════════════════════════════
MECHANIC APPROACH:
─────────────────────────────────────────────────────────────────
Alert fires
Check the broken thing
Restart it
Alert clears → Done! (until next time)
ENGINEER APPROACH:
─────────────────────────────────────────────────────────────────
Alert fires
What's the pattern? When did this start?
What changed in the system? What's connected?
What's the actual cause? (Often not where the alert fired)
Fix the cause → Problem actually solved

A system is a set of interconnected elements organized to achieve a purpose.

Three key parts:

| Part | What It Is | Example |
|---|---|---|
| Elements | The things you can point at | Pods, services, databases, queues |
| Interconnections | How elements affect each other | Network calls, shared resources, data flows |
| Purpose | Why the system exists | Process payments, serve users, store data |

Here’s the crucial insight: you can understand every element perfectly and still not understand the system.

COMPONENT VIEW SYSTEMS VIEW
─────────────────────────────────────────────────────────────────
┌─────────────────────────────────────────────────────────────────┐
│ │
│ ┌─────────┐ ┌─────────┐ │
│ │ Service │ │ Service │◀──────┐ │
│ │ A │ │ A │ │ │
│ └─────────┘ └────┬────┘ │ │
│ │ │ │
│ ┌─────────┐ ┌────▼────┐ │ Feedback│
│ │ Service │ ──▶ │ Service │───────┤ Loops │
│ │ B │ │ B │ │ │
│ └─────────┘ └────┬────┘ │ │
│ │ │ │
│ ┌─────────┐ ┌────▼────┐ │ │
│ │Database │ │Database │───────┘ │
│ └─────────┘ └─────────┘ │
│ │
│ "Three healthy boxes" "A system with behavior" │
│ (tells you nothing) (tells you everything) │
│ │
└─────────────────────────────────────────────────────────────────┘

💡 Did You Know? The word “system” comes from Greek systema, meaning “organized whole.” The ancient Greeks understood something we keep forgetting: the whole is fundamentally different from the sum of its parts. Aristotle wrote about this 2,400 years ago. We’re still learning the same lesson.

Emergence is when a system exhibits properties that none of its individual parts possess.

This is the most important concept in this entire module. Read it again.

Your brain is made of neurons. No single neuron is conscious—it’s just an electrochemical switch. But connect 86 billion of them in the right way, and suddenly you’re reading this sentence and thinking about it. Consciousness emerges.

In distributed systems:

  • Individual service metrics: Service A: 50ms, Service B: 40ms, Service C: 30ms
  • System behavior: p99 latency of 2000ms, random timeouts, cascade failures

Where did the 2000ms come from? Where did the cascades come from? Not from any individual service. They emerged from the interactions—retries that amplify load, connection pools that exhaust, locks that contend.

EMERGENCE: THE 50ms SERVICES THAT CREATE 2-SECOND LATENCY
═══════════════════════════════════════════════════════════════════
Individual service latencies:
───────────────────────────────────────────────────────────────────
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Service A │ │ Service B │ │ Service C │
│ 50ms avg │ │ 40ms avg │ │ 30ms avg │
│ Looking │ │ Looking │ │ Looking │
│ healthy! │ │ healthy! │ │ healthy! │
└─────────────┘ └─────────────┘ └─────────────┘
What the dashboards show: "Everything is fine ✅"
Actual system behavior:
───────────────────────────────────────────────────────────────────
Request hits A → A calls B → B times out → A retries
→ B finally responds → A calls C → C is slow because
B's retry exhausted the shared connection pool → C times out
→ Request fails → User retries → Adds more load → Everything
gets worse
What users experience: "Your system is broken 🔥"
The 2-second latency doesn't exist in any component.
It EMERGES from their interaction.
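The reinforcing loop in the diagram can be sketched numerically. This is a toy model, not real traffic data: the capacity, timeout, and latency curve below are all invented, but the tipping-point behavior is the point.

```python
# Toy model of the loop above: every component is healthy in isolation,
# but retries add load, load raises latency, latency causes timeouts,
# and timeouts cause more retries. All numbers are invented.

BASE_LATENCY_MS = 40      # Service B alone: perfectly healthy
TIMEOUT_MS = 100          # callers give up and retry past this
CAPACITY_RPS = 1500       # shared connection pool saturates near this

def latency_at(load_rps):
    """Latency blows up as load approaches capacity (M/M/1-style curve)."""
    utilization = min(load_rps / CAPACITY_RPS, 0.99)
    return BASE_LATENCY_MS / (1 - utilization)

def settle(base_load_rps, rounds=20):
    """Iterate the retry loop until effective load stops changing."""
    load = base_load_rps
    for _ in range(rounds):
        retrying = 1.0 if latency_at(load) > TIMEOUT_MS else 0.0
        load = base_load_rps * (1 + retrying)   # each timeout adds one retry
    return load, latency_at(load)

for rps in (500, 1000, 1400):
    load, lat = settle(rps)
    print(f"offered {rps:4d} rps -> effective {load:5.0f} rps, latency {lat:5.0f} ms")
```

Below the tipping point the system is calm; past it, the same 40 ms service produces multi-second latency. No component changed—only the interaction did.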

Thought Exercise (2 minutes)

Think of a behavior in your system that only exists when components interact.

Examples:

  • Shopping cart totals (product service + cart service + pricing rules)
  • Search ranking (search service + recommendation engine + user history)
  • Cascading failures (any service + retry logic + shared resources)

Where does that behavior “live”? Not in any single service’s code.

Reductionism is the scientific approach of understanding something by breaking it into parts and studying each part separately.

It works brilliantly for complicated machines. Want to understand a car engine? Take it apart. Study each piece. Reassemble. Done.

It fails catastrophically for complex systems. Here’s why:

| Aspect | Complicated (car engine) | Complex (distributed system) |
|---|---|---|
| Behavior | Predictable from parts | Emergent, surprising |
| Cause & Effect | Linear, traceable | Circular, networked |
| Analysis | Take apart, study pieces | Must observe whole in motion |
| Fixing | Replace broken part | Change relationships |
| Same input | Same output | Different output each time |

This is why “works on my machine” is such a meme. Your laptop isn’t the system. The system includes the network, other services, the database state, the load from other users, and a hundred other interacting factors.

The Optimization Trap

Here’s a counterintuitive truth: optimizing individual components often makes the system worse.

Example: You make Service A 10x faster. Congratulations! Now it hammers the database 10x harder, causing lock contention that slows Services B and C. Global latency increases. Users are angrier than before.

This is called suboptimization—winning locally while losing globally. It’s one of the most common mistakes in distributed systems.


Most troubleshooting happens at the surface level. An alert fires, we react. Another alert, another reaction. We’re playing an endless game of whack-a-mole.

The iceberg model teaches us to look deeper:

THE ICEBERG MODEL
═══════════════════════════════════════════════════════════════════
≈≈≈≈≈≈ VISIBLE (what we react to) ≈≈≈≈≈≈
╔═══════════════════════════════════════════════════╗
║ EVENTS ║
║ "Payment service is slow right now" ║
║ ║
║ Response: Restart it, scale it up ║
║ Mindset: REACTIVE ║
╠═══════════════════════════════════════════════════╣
║ ║
║ PATTERNS ║
║ "Payment is slow every Monday 6-9 AM" ║
║ ║
║ Response: Create runbook, schedule extra pods ║
║ Mindset: ADAPTIVE ║
╠═══════════════════════════════════════════════════╣
║ ║
║ STRUCTURES ║
║ "Batch job shares database with payments" ║
║ "No resource isolation between workloads" ║
║ ║
║ Response: Separate resources, add limits ║
║ Mindset: REDESIGN ║
╠═══════════════════════════════════════════════════╣
║ ║
║ MENTAL MODELS ║
║ "All workloads can safely share infrastructure" ║
║ "Batch jobs don't affect real-time traffic" ║
║ ║
║ Response: New policies, architecture reviews ║
║ Mindset: TRANSFORM ║
╚═══════════════════════════════════════════════════╝
THE DEEPER YOU GO,
THE MORE LEVERAGE YOU HAVE

Event level (what happened?):

  • “The payment service is slow right now.”
  • Response: Restart, scale up, add more resources
  • Limitation: You’ll be doing this forever

Pattern level (what’s been happening?):

  • “This happens every Monday morning.”
  • Response: Schedule extra capacity, create runbooks
  • Limitation: You’re managing the problem, not solving it

Structure level (what’s causing this?):

  • “The batch job and payment service share a database with no isolation.”
  • Response: Resource quotas, separate databases, connection pooling
  • Limitation: Fixes this problem, but similar ones will appear

Mental model level (what beliefs allow this to exist?):

  • “We assumed production workloads don’t need resource isolation.”
  • Response: New architecture principles, design review processes
  • Impact: Prevents entire categories of problems

War Story: The Monday Mystery (Extended Cut)

I want to tell you about the incident that taught me the iceberg model—the hard way.

A fintech company had “random” payment failures every Monday. They weren’t truly random, of course, but nobody had connected the dots. Every Monday morning, the on-call engineer would get paged, restart some services, add some pods, and things would stabilize. They wrote a runbook. They scheduled extra capacity for Mondays. They were world-class at managing the symptom.

For eight months.

One day, a new engineer—fresh out of university, hadn’t learned to accept dysfunction yet—asked an innocent question:

“Why only Mondays?”

The senior engineers looked at each other. Nobody knew. They’d always just… dealt with it.

The new engineer started digging. She pulled metrics from the past year. She correlated payment failures with everything she could find: deployment schedules, traffic patterns, marketing campaigns, infrastructure changes.

And there it was. At exactly 2:30 AM every Sunday night, CPU usage on one database server spiked. By 6 AM Monday, the spike ended—but the damage was done. Connection pools exhausted. Queries backed up. Payment timeouts cascading.

The culprit? A “Weekly Analytics Summary” batch job. Created two years ago. When the company was small. Processing a few thousand transactions. Now processing millions. What used to take 30 minutes now took 4 hours—and it was still growing.

Nobody owned this job anymore. The engineer who wrote it had left. It ran on the same database as real-time payments because, at the time, “it’s just a small report.”

The Fix:

| Level | What They Did |
|---|---|
| Event | (What they’d been doing) Restart services, add pods |
| Pattern | Added Monday runbook, scheduled extra capacity |
| Structure | Moved batch job to replica database, added connection pool limits |
| Mental Model | New policy: “No analytical workloads on transactional databases. Ever.” |

The Monday pages stopped. Not because they got better at responding—because they eliminated the cause.

Total time to fix once they understood the problem: 2 hours. Time they’d spent managing the symptom over 8 months: ~200 engineer-hours.

Lesson: The question “why only Mondays?” was worth hundreds of hours. The right question at the right level changes everything.


To see systems clearly, you need the right words. These terms will become essential to how you troubleshoot and communicate.

| Term | Definition | Example |
|---|---|---|
| System | Interconnected elements with a purpose | Your entire production stack |
| Boundary | What’s in vs. out of the system | Your services vs. AWS infrastructure |
| Stock | Accumulation within the system | Queue depth, connection count, error budget |
| Flow | Rate of change to a stock | Requests/second, pod creation rate |
| Feedback | When a system’s output influences its input | Autoscaler: high CPU → more pods → lower CPU |
| Delay | Time between cause and effect | Metric collection lag, autoscaler reaction time |

The easiest way to understand stocks and flows is to think of a bathtub:

STOCKS AND FLOWS: THE BATHTUB
═══════════════════════════════════════════════════════════════════
INFLOW (faucet)
│ 10 liters/min
┌───────────────────┐
│ │
│ STOCK: Water │
│ (liters in tub) │
│ │
│ Current: 50L │
│ │
└────────┬──────────┘
▼ 8 liters/min
OUTFLOW (drain)
RULES:
─────────────────────────────────────────────────────────────────
• If INFLOW > OUTFLOW → Stock rises (tub fills up)
• If INFLOW < OUTFLOW → Stock falls (tub drains)
• If INFLOW = OUTFLOW → Stock stable (water level constant)
Right now: 10 in, 8 out → Tub is filling at 2L/min → Overflow coming!

Now apply this to your systems:

STOCKS AND FLOWS: REQUEST QUEUE
═══════════════════════════════════════════════════════════════════
INFLOW: Incoming requests
│ 1000 req/sec
┌───────────────────┐
│ │
│ STOCK: Queue │
│ depth │
│ │
│ Current: 5000 │
│ requests │
│ │
└────────┬──────────┘
▼ 800 req/sec
OUTFLOW: Processed requests
You can't fix latency by looking at inflow or outflow alone.
The STOCK determines latency. 5000 queued ÷ 800/sec = 6.25 seconds.
The only ways to reduce latency:
1. Reduce inflow (rate limiting, shedding load)
2. Increase outflow (more capacity, faster processing)
3. Accept the backlog will drain eventually (if traffic drops)
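The bathtub arithmetic above is simple enough to check directly, using the numbers from the diagram:

```python
# Bathtub arithmetic for the request queue above (diagram's numbers).
inflow_rps = 1000    # requests arriving per second
outflow_rps = 800    # requests processed per second
queue_depth = 5000   # the stock: requests currently waiting

# A newly arriving request waits behind everything already queued:
latency_s = queue_depth / outflow_rps
print(f"current queueing latency: {latency_s:.2f} s")   # 5000 / 800 = 6.25 s

# While inflow exceeds outflow, the stock (and latency) keeps growing:
for t in range(1, 4):
    queue_depth += inflow_rps - outflow_rps             # +200 per second
    print(f"after {t}s: depth={queue_depth}, latency={queue_depth / outflow_rps:.2f}s")
```

Notice that latency is determined by the stock, not by either flow alone—which is exactly why per-service dashboards that only show rates miss it.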

Delays are everywhere in distributed systems:

| Delay | Typical Duration |
|---|---|
| Metric collection | 10-60 seconds |
| Alerting pipeline | 30-120 seconds |
| Autoscaler reaction | 1-5 minutes |
| DNS propagation | Seconds to hours |
| Human response | Minutes to hours |
| Rolling deployment | Minutes to hours |

Why delays matter: They cause oscillation and overshoot.

AUTOSCALER OSCILLATION: A TRAGEDY IN THREE ACTS
═══════════════════════════════════════════════════════════════════
Target
ACT 1: The Spike │
───────────────────────────────── │
Load: ═══════════════╗ │
║ │
Pods: ┌─────────╨────┐ │
│ │ │
│ Autoscaler: │ │
│ "Need more │ │
│ pods!" │ │
└──────────────┘ │
ACT 2: The Overshoot │
───────────────────────────────── │
Load: ══════════════════════════╗ │
║ │
║ │
Pods: ══════════════════════╬═══ │ ← Way too many pods!
║ │
New pods finally ready! │
But load already dropped. │
ACT 3: The Oscillation │
───────────────────────────────── │
Load: ╱╲ ╱╲ ╱╲ │
╱ ╲ ╱ ╲ ╱ ╲ │
────────╱────╲╱────╲╱────╲─────────────│── Target
╱ ╲ │
╱ ╲ │
Delay → Overshoot → Undershoot → Repeat
THE FIX: Account for delays. Scale gradually. Use predictive scaling.
The delay isn't a bug—it's physics. Design around it.
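A delayed control loop can be simulated in a few lines. Everything here is hypothetical (the traffic curve, pod capacity, and the 3-step delay), but it reproduces the tragedy above: the scaler is always sizing for traffic that no longer exists.

```python
import math

TARGET_UTIL = 0.6
POD_CAPACITY = 100   # rps one pod can serve (hypothetical)
DELAY = 3            # steps between observing load and pods being ready

def traffic(t):
    """Load swinging between roughly 600 and 1800 rps."""
    return 1200 + 600 * math.sin(t / 2)

utils = []
for t in range(20):
    observed = traffic(max(0, t - DELAY))   # the scaler sees stale traffic
    pods = max(1, round(observed / (TARGET_UTIL * POD_CAPACITY)))
    util = traffic(t) / (pods * POD_CAPACITY)
    utils.append(util)
    flag = " <- overloaded" if util > 0.75 else (
           " <- over-provisioned" if util < 0.45 else "")
    print(f"t={t:2d} load={traffic(t):6.0f} pods={pods:2d} util={util:.2f}{flag}")

# With DELAY = 0 every line would sit near util 0.60; the delay alone
# creates the swings between overloaded and over-provisioned.
```

Dampening (reacting to a moving average) or predictive scaling both work by shrinking the gap between what the scaler sees and what is actually happening.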

💡 Did You Know? The famous “thundering herd” problem is a delay-induced catastrophe. A cache expires. All requests hit the database simultaneously. The database slows down. Requests time out and retry. More load. More timeouts. More retries. The delay between cache miss and successful refill creates a feedback loop that amplifies the original problem exponentially.
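One common mitigation for the thundering herd is “single-flight” refill: exactly one caller recomputes the expired entry while everyone else waits for the result. A minimal threading-based sketch—the cache, lock, and query function are illustrative, not a real cache library:

```python
# "Single-flight" sketch: 50 concurrent readers hit an expired cache key,
# but only ONE of them is allowed to run the expensive database query.
import threading
import time

cache = {}
lock = threading.Lock()
refills = 0

def expensive_db_query():
    global refills
    refills += 1
    time.sleep(0.05)          # simulate a slow analytical query
    return "value"

def get(key):
    if key in cache:          # fast path: cache hit, no lock needed
        return cache[key]
    with lock:                # only one thread refills; the rest wait here
        if key not in cache:  # double-check after acquiring the lock
            cache[key] = expensive_db_query()
    return cache[key]

threads = [threading.Thread(target=get, args=("report",)) for _ in range(50)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(f"50 concurrent readers triggered {refills} database query")  # 1, not 50
```

The herd still arrives; the design just stops it from being amplified into 50 identical queries against an already-struggling database.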


When troubleshooting or designing systems, systems thinkers ask different questions:

| Normal Question | Systems Thinking Question |
|---|---|
| “Which service is broken?” | “What changed in the system as a whole?” |
| “Who deployed what?” | “What feedback loops are active?” |
| “What’s the error?” | “What’s the pattern over time?” |
| “How do I fix this?” | “What structure enables this problem?” |
| “Is this service healthy?” | “Is the system achieving its purpose?” |

A Systems Thinking Troubleshooting Session

Scenario: Users report intermittent slowness. Dashboards show all services green.

Component approach (what most people do):

  1. Check each service’s CPU, memory, errors
  2. Everything looks fine
  3. Blame the network
  4. Add more logging
  5. Wait for it to happen again
  6. Still confused

Systems approach:

Step 1: MAP THE SYSTEM
───────────────────────────────────────────────────────────────────
Draw connections, not just boxes. What calls what? What shares what?
┌────────────────────────────────────────────────┐
│ │
User ───▶│ API ──▶ Cache ──┬──▶ Service A ──┐ │
│ │ │ │ │
│ │ └──▶ Service B ──┼──▶ DB │
│ │ │ │ │
│ └───────────────▶ Service C ──┘ │ │
│ │ │ │
│ └───────────┘ │
│ │
│ ↑ Both B and C share DB connection │
│ pool. Did we check pool exhaustion? │
└────────────────────────────────────────────────┘
Step 2: IDENTIFY FEEDBACK LOOPS
───────────────────────────────────────────────────────────────────
• Cache misses → DB load → Slower queries → More timeouts →
More retries → More DB load → ... (Reinforcing loop!)
• Rate limiter → Rejected requests → Less load → Faster response →
Rate limiter allows more → ... (Balancing loop ✓)
Step 3: LOOK FOR STOCKS
───────────────────────────────────────────────────────────────────
• Queue depths? Growing.
• Connection pool? 100% utilized!
• Error budget? Almost gone.
Step 4: CHECK THE DELAYS
───────────────────────────────────────────────────────────────────
• How old are these metrics? 60 seconds old.
• Autoscaler cooldown? 5 minutes.
• When did the pattern start? Yesterday at 3 PM.
Step 5: GO DEEPER THAN EVENTS
───────────────────────────────────────────────────────────────────
• Is this a pattern? Yes—happens during peak hours.
• What structure enables it? Shared DB connection pool with no limits.
• What mental model? "Services are independent."

Root cause found: Service B and C share a database connection pool. During peak load, Service B takes all connections for a slow analytics query. Service C starves. Timeouts cascade.

Fix: Per-service connection pool limits.
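The fix can be sketched as explicit per-service budgets over the shared resource. This is an illustrative semaphore model, not a specific connection-pool library:

```python
# Illustrative sketch: cap each service's share of a shared resource so one
# slow consumer (Service B's analytics query) cannot starve the other.
import threading

class BoundedPool:
    """Grants at most `limit` of the shared connections to one service."""
    def __init__(self, limit):
        self._sem = threading.Semaphore(limit)

    def acquire(self, timeout=1.0):
        # Returns False instead of blocking forever: fail fast, don't pile up.
        return self._sem.acquire(timeout=timeout)

    def release(self):
        self._sem.release()

# Before: one pool of 20, first-come-first-served -> B can take all 20.
# After: explicit per-service budgets that sum to the total.
pools = {"service_b": BoundedPool(12), "service_c": BoundedPool(8)}

# Service B holding its entire budget no longer blocks Service C:
for _ in range(12):
    assert pools["service_b"].acquire()
assert not pools["service_b"].acquire(timeout=0.01)  # B is capped...
assert pools["service_c"].acquire()                  # ...but C still gets in
print("B capped at its budget; C unaffected")
```

The structural change is the partition itself; whether it lives in pgbouncer, a client-side pool, or database roles is an implementation detail.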


| Mistake | Why It Hurts | What To Do Instead |
|---|---|---|
| Treating symptoms | Problem recurs, wastes time | Use iceberg model—go deeper |
| Optimizing components | Can make system worse | Optimize for system-level goals |
| Ignoring delays | Causes oscillation, overshoot | Map delays explicitly |
| Tight system boundaries | Miss external dependencies | Include what affects behavior |
| Looking for THE root cause | Complex systems have multiple causes | Look for contributing factors |
| Assuming independence | Services affect each other | Map connections and shared resources |

“A system’s behavior is best understood by studying its individual components in isolation.” True or false?

Show Answer

False. A system’s behavior emerges from the interactions between components, not the components themselves. You can have five healthy services that together create an unhealthy system. Studying parts in isolation misses the relationships, feedback loops, and emergent properties that determine actual behavior.

This is why “all services green” and “users unhappy” can coexist.

What is “emergence” and why does it matter for troubleshooting?

Show Answer

Emergence is when a system exhibits properties that none of its individual parts possess.

Why it matters for troubleshooting:

  • The 2-second latency doesn’t exist in any single service
  • Cascading failures don’t exist in any single service
  • The behavior you’re trying to fix might not be in any component
  • You have to look at interactions, not just parts

Examples: Traffic jams emerge from cars (no car is a jam). Consciousness emerges from neurons (no neuron thinks). Cascading failures emerge from microservices (no service is a cascade).

In the iceberg model, why is addressing mental models more impactful than addressing events?

Show Answer

Mental models are the beliefs and assumptions that shape the structures we build. Structures create patterns. Patterns produce events.

| Level | Leverage | Example Fix |
|---|---|---|
| Events | Low | Restart service when it’s slow |
| Patterns | Medium | Schedule extra capacity for Mondays |
| Structures | High | Add connection pool isolation |
| Mental Models | Highest | “All workloads must declare resource needs” |

Changing mental models prevents entire categories of problems—not just the current incident, but all future incidents that would arise from the same flawed assumptions.

The Monday Mystery team spent 8 months managing events. Once they changed the mental model (“batch jobs don’t affect real-time traffic” → “all workloads must be isolated”), the problem and all similar future problems disappeared.

Why do delays in distributed systems cause oscillation?

Show Answer

Delays cause oscillation because by the time a corrective action takes effect, the original condition may have changed.

Example: Autoscaler sees high CPU → Adds pods → Pods take 3 minutes to start → By then, load dropped → Now we have too many pods → Autoscaler removes pods → Pods take 1 minute to terminate → By then, load increased → Now we have too few pods → Repeat.

Without delays, adjustments would be instant and smooth. Delays create the gap where overshoot and undershoot live.

Solutions:

  • Account for delays in your scaling algorithms
  • Use predictive scaling (ML-based) instead of reactive
  • Add dampening (don’t react to every fluctuation)
  • Accept some oscillation as normal, design for it

Part A: Observe Emergence in Kubernetes (15 minutes)

Objective: See how system behavior emerges from component interactions.

```shell
# Create namespace
kubectl create namespace systems-lab

# Deploy interconnected services
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
  namespace: systems-lab
spec:
  replicas: 2
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
        - name: backend
          image: nginx:alpine
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: backend
  namespace: systems-lab
spec:
  selector:
    app: backend
  ports:
    - port: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  namespace: systems-lab
spec:
  replicas: 2
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
        - name: frontend
          image: curlimages/curl:latest
          command: ["/bin/sh", "-c"]
          args:
            - |
              while true; do
                curl -s -o /dev/null -w "%{http_code}\n" http://backend/
                sleep 1
              done
EOF

# Watch the system
kubectl get pods -n systems-lab -w
```

Now break something and watch the system respond:

```shell
# In another terminal, kill a backend pod
# (kubectl delete cannot take both a name and -l; use the selector only
#  to look the name up, then delete by name)
kubectl delete pod -n systems-lab \
  "$(kubectl get pod -n systems-lab -l app=backend -o jsonpath='{.items[0].metadata.name}')" \
  --wait=false
```

What to observe:

  • Frontend continues working (load balances to surviving backend)
  • New backend pod is created automatically
  • System self-heals without human intervention

This is emergence. The self-healing behavior doesn’t exist in any individual pod. It emerges from the interactions between Deployment controller, ReplicaSet, Service, and kube-proxy.

Clean up:

```shell
kubectl delete namespace systems-lab
```

Part B: Apply the Iceberg Model (20 minutes)

Pick a recurring issue in your environment (or use this hypothetical):

“The checkout page is slow during sales events.”

Apply the iceberg model:

| Level | Analysis |
|---|---|
| Event | What happens? |
| Pattern | When? How often? What correlates? |
| Structure | What architecture/config enables this? |
| Mental Model | What assumption allowed this structure? |

Example answer:

| Level | Answer |
|---|---|
| Event | Checkout timeouts during Black Friday |
| Pattern | Happens every sale event. Correlates with >10x traffic. |
| Structure | Checkout service calls inventory synchronously. No caching. Single DB. |
| Mental Model | “Real-time inventory is always required” (but is it for checkout display?) |

Success Criteria:

  • Observed pod deletion and automatic recovery in Part A
  • Can explain what “emergence” you witnessed
  • Completed iceberg analysis for Part B
  • Identified at least one mental model that enables the problem

  • Systems thinking was born in biology, not engineering. Ludwig von Bertalanffy developed General Systems Theory in the 1930s to understand living organisms as integrated wholes, not collections of parts.

  • The Apollo program was one of the first engineering projects to formally apply systems thinking. With 2 million parts and 400,000 people, NASA couldn’t understand the spacecraft by studying components—they had to see the whole.

  • W. Edwards Deming, the quality management guru, estimated that 94% of problems are caused by the system, not the individual workers. When something goes wrong, the structure almost always matters more than the person.

  • Jeff Bezos attributes Amazon’s success to “working backwards from the customer”—a systems thinking approach. Instead of building components and hoping they create good outcomes, he defines the desired system behavior first.


  • “Thinking in Systems: A Primer” by Donella Meadows — The foundational text. Readable, profound, and changed how I see everything.

  • “How Complex Systems Fail” by Richard Cook — An 18-point paper that every engineer should read. Takes 10 minutes. Will change your career.

  • “The Fifth Discipline” by Peter Senge — Systems thinking applied to organizations. Explains why your team keeps having the same problems.

  • “Drift into Failure” by Sidney Dekker — How systems gradually drift toward catastrophe while every individual decision seems reasonable.


Module 1.2: Feedback Loops — Understanding reinforcing and balancing feedback, and why your autoscaler sometimes makes things worse.


“To understand is to perceive patterns.” — Isaiah Berlin

“You can’t understand a system by taking it apart. You can only understand it by seeing it in motion.” — Adapted from Russell Ackoff