Module 5.1: What Makes Systems Distributed

Цей контент ще не доступний вашою мовою.

Complexity: [MEDIUM]

Time to Complete: 25-30 minutes

Prerequisites: Systems Thinking Track (recommended)

Track: Foundations

What You’ll Be Able to Do

After completing this module, you will be able to:

Explain the eight fallacies of distributed computing and identify where each manifests in real cloud-native architectures
Analyze a system’s architecture to determine which components introduce distributed-system challenges (partial failure, network partitions, clock skew)
Design service boundaries that account for the inherent unreliability of network communication between components
Evaluate when a distributed architecture is justified versus when a simpler single-process design would be more reliable and maintainable

February 28, 2017. Amazon Web Services experiences what would become one of the most costly outages in cloud computing history.

A simple typo in a command to remove a small number of S3 servers accidentally removes a larger set of servers than intended. The servers being removed supported the S3 index and placement systems—the metadata layer that tracks where every object is stored.

For the next 4 hours, S3 in US-East-1 was essentially offline. But the real impact was just beginning. Hundreds of other AWS services depended on S3. Websites couldn’t load images. Applications couldn’t access configuration. CI/CD pipelines failed. The AWS status dashboard—itself hosted on S3—couldn’t display that S3 was down.

The cascade revealed what most engineers knew but rarely confronted: even the simplest web application is a distributed system, with hidden dependencies, partial failures, and emergent behaviors that no single person fully understands.

S&P 500 companies collectively lost an estimated $150 million during the outage. And it all started because distributed systems don’t behave like the single-machine programs most engineers learn to write.

This module teaches why distributed systems are fundamentally different—and why understanding their constraints is essential for building anything that scales.

Why This Module Matters

Every modern system is distributed. The moment you have a web server talking to a database, you have a distributed system. The moment you deploy to the cloud, you’re distributed across availability zones. The moment you scale beyond one machine, distribution becomes your reality.

But distributed systems don’t behave like single machines. Things that were easy become hard. Things that were guaranteed become probabilistic. Assumptions that held for decades suddenly break. The sooner you understand why, the sooner you can design systems that actually work.

This module introduces the fundamental challenges of distributed systems—the laws of physics and logic that constrain what’s possible, and the patterns that emerge when you can’t escape them.

The Space Station Analogy

Imagine Mission Control in Houston talking to astronauts on the International Space Station. Messages take time to travel. Signals can be lost. The astronauts can’t wait for permission before every action—they need autonomy. And Mission Control can’t know the exact state of the station at any instant. Every distributed system faces the same challenges: latency, unreliability, and uncertainty about remote state.

What You’ll Learn

What makes a system “distributed” (and why it matters)
The fundamental challenges: latency, partial failure, no global clock
The CAP theorem and what it really means
Why distributed systems require different thinking
How Kubernetes is a distributed system

Part 1: Defining Distributed Systems

1.1 What is a Distributed System?

DISTRIBUTED SYSTEM DEFINITION
═══════════════════════════════════════════════════════════════

A distributed system is one where:
- Components run on multiple networked computers
- Components coordinate by passing messages
- The system appears as a single coherent system to users

Leslie Lamport's definition:
"A distributed system is one in which the failure of a computer
you didn't even know existed can render your own computer unusable."

KEY PROPERTIES
─────────────────────────────────────────────────────────────
1. CONCURRENCY: Multiple things happen simultaneously
2. NO GLOBAL CLOCK: No single source of "now"
3. INDEPENDENT FAILURE: Parts can fail without others knowing
4. MESSAGE PASSING: Communication via network, not shared memory

1.2 Why Distribute?

REASONS TO DISTRIBUTE
═══════════════════════════════════════════════════════════════

SCALABILITY
─────────────────────────────────────────────────────────────
One machine has limits. Need more capacity? Add more machines.

    Single server: 10,000 requests/sec max
    10 servers: 100,000 requests/sec
    100 servers: 1,000,000 requests/sec

AVAILABILITY
─────────────────────────────────────────────────────────────
One machine will fail. Multiple machines can cover for each other.

    Single server: Server dies → System dies
    Multiple servers: Server dies → Others handle load

LATENCY
─────────────────────────────────────────────────────────────
Physics limits speed. Put data closer to users.

    US user → US server: 20ms
    US user → Europe server: 100ms
    US user → Asia server: 200ms

ORGANIZATIONAL
─────────────────────────────────────────────────────────────
Different teams, different services, different deployment cycles.

    Monolith: One team, one deployment, everyone waits
    Microservices: Independent teams, independent deploys

1.3 The Distribution Spectrum

DISTRIBUTION SPECTRUM
═══════════════════════════════════════════════════════════════

NOT DISTRIBUTED                                    FULLY DISTRIBUTED
      │                                                      │
      ▼                                                      ▼
┌──────────┐  ┌──────────────┐  ┌────────────┐  ┌──────────────┐
│ Single   │  │ Web + DB     │  │ Microsvcs  │  │ Global       │
│ Process  │  │ (2 servers)  │  │ (dozens)   │  │ Multi-region │
└──────────┘  └──────────────┘  └────────────┘  └──────────────┘
     │              │                  │                │
     │              │                  │                │
 No network    Simple          Complex          Maximum
 No latency    failure modes   coordination     complexity
 Easy debug    Easy reasoning  Hard to debug    Very hard

Most systems live in the middle—distributed enough to need
the mental models, but not so distributed that nothing works.

Try This (2 minutes)

Think about a system you work with. Where does it fall on the spectrum?

Component Runs Where Communicates Via

Is it more distributed than you initially thought?

Part 2: The Fundamental Challenges

2.1 Challenge #1: Latency

LATENCY: THE SPEED OF LIGHT PROBLEM
═══════════════════════════════════════════════════════════════

In a single machine:
    Function call: ~1 nanosecond
    Memory access: ~100 nanoseconds

Across a network:
    Same datacenter: ~500 microseconds (500,000 ns)
    Cross-country: ~30 milliseconds (30,000,000 ns)
    Cross-ocean: ~100 milliseconds (100,000,000 ns)

IMPACT
─────────────────────────────────────────────────────────────
┌────────────────────────────────────────────────────────────┐
│                                                            │
│  Local call:     1 ns                                      │
│  Network call:   1,000,000 ns (1 ms)                      │
│                                                            │
│  That's 1,000,000x slower!                                │
│                                                            │
│  Design that works locally can be disastrous distributed. │
│                                                            │
│  Example: Loop with 100 database calls                    │
│    Local: 100 × 1ns = 100ns                               │
│    Network: 100 × 1ms = 100ms (user notices!)             │
│                                                            │
└────────────────────────────────────────────────────────────┘

LATENCY NUMBERS EVERY PROGRAMMER SHOULD KNOW
─────────────────────────────────────────────────────────────
L1 cache reference:                    0.5 ns
L2 cache reference:                      7 ns
Main memory reference:                 100 ns
SSD random read:                    16,000 ns   (16 μs)
Network round trip (same DC):      500,000 ns   (500 μs)
Disk seek:                       2,000,000 ns   (2 ms)
Network round trip (US→EU):    100,000,000 ns   (100 ms)

2.2 Challenge #2: Partial Failure

PARTIAL FAILURE: THE UNRELIABILITY PROBLEM
═══════════════════════════════════════════════════════════════

In a single machine:
    Either the whole thing works, or the whole thing crashes.
    Failure is total and obvious.

In a distributed system:
    Part can fail while other parts continue.
    Failure can be partial, intermittent, or undetectable.

FAILURE MODES
─────────────────────────────────────────────────────────────
┌────────────────────────────────────────────────────────────┐
│                                                            │
│  CRASH FAILURE                                             │
│    Node stops responding entirely.                         │
│    At least you know it's dead.                           │
│                                                            │
│  OMISSION FAILURE                                          │
│    Messages lost. Did the server get my request?          │
│    Did it respond but I didn't receive?                   │
│                                                            │
│  TIMING FAILURE                                            │
│    Response came, but too slow. Did it timeout?           │
│    Did it complete? Is it still running?                  │
│                                                            │
│  BYZANTINE FAILURE                                         │
│    Component lies or behaves maliciously.                 │
│    Hardest to handle. Usually assume this won't happen.   │
│                                                            │
└────────────────────────────────────────────────────────────┘

THE FUNDAMENTAL UNCERTAINTY
─────────────────────────────────────────────────────────────
You send a request. No response. What happened?

    1. Request was lost in transit?
    2. Server crashed before processing?
    3. Server crashed after processing?
    4. Response was lost in transit?
    5. Server is just slow?

You cannot distinguish these cases from the client side.
This is fundamental. No protocol can solve it.

2.3 Challenge #3: No Global Clock

NO GLOBAL CLOCK: THE ORDERING PROBLEM
═══════════════════════════════════════════════════════════════

In a single machine:
    Events have a clear order.
    Thread A did X, then Thread B did Y.
    System clock is the arbiter.

In a distributed system:
    No shared clock. Each machine has its own.
    Clocks drift. Even GPS-synced clocks differ by milliseconds.
    "Before" and "after" become fuzzy concepts.

THE ORDERING PROBLEM
─────────────────────────────────────────────────────────────
                  Server A              Server B
                     │                     │
     10:00:00.001    │ ─── Request ───────▶│   10:00:00.003
                     │                     │
     10:00:00.005    │◀─── Response ───── │   10:00:00.004
                     │                     │

Which happened first? The request or the response?

Server A says: Request at 10:00:00.001, response at 10:00:00.005
Server B says: Request at 10:00:00.003, response at 10:00:00.004

Server B's clock is 2ms ahead. The timestamps are misleading.

CONSEQUENCES
─────────────────────────────────────────────────────────────
- Can't use timestamps to order events across machines
- "Last write wins" requires agreeing on what "last" means
- Debugging is hard: logs from different servers don't align
- Need logical clocks (Lamport clocks, vector clocks) for ordering

War Story: The $12 Million Clock Skew Incident

March 2018. A high-frequency trading firm discovers they’ve been losing money for months—not from bad trades, but from bad timestamps.

The firm operated trading servers in both New Jersey and Chicago. The servers used NTP to synchronize clocks, but NTP only guarantees accuracy within tens of milliseconds. The clocks had drifted 47ms apart.

When a trade executed in Chicago triggered a hedge trade in New Jersey, the timestamps sometimes showed the hedge happening before the original trade. The reconciliation system, which used timestamps to order events, processed them wrong. Sometimes hedges were cancelled as “duplicate trades.” Sometimes positions were calculated incorrectly.

Timeline of discovery:

Day 1: Risk analyst notices P&L discrepancies in overnight reports

Week 2: Engineering traces problem to event ordering logic

Week 3: Clock drift identified as root cause

Week 4: Logical clocks implemented (Lamport timestamps)

Month 2: Full audit reveals $12 million in losses over 6 months

The fix: The team replaced wall-clock ordering with Lamport timestamps. Every event now carries a logical clock value that increments with each operation. When server A sends a message to server B, it includes its clock value. Server B sets its clock to max(its_clock, received_clock) + 1. This guarantees that caused events always have higher timestamps than their causes—regardless of what the physical clocks say.

The lesson: In distributed systems, “when” something happened is less important than “what caused what.” Physical time is unreliable. Causal ordering is essential.

Part 3: The CAP Theorem

3.1 Understanding CAP

THE CAP THEOREM
═══════════════════════════════════════════════════════════════

In a distributed system, you can have at most TWO of:

    C - CONSISTENCY
        Every read receives the most recent write.
        All nodes see the same data at the same time.

    A - AVAILABILITY
        Every request receives a response.
        System is always operational.

    P - PARTITION TOLERANCE
        System continues operating despite network partitions.
        Messages between nodes can be lost or delayed.

THE CATCH
─────────────────────────────────────────────────────────────
Network partitions WILL happen. You don't get to choose.

So really, during a partition you choose between:
    CP: Consistent but unavailable during partition
    AP: Available but possibly inconsistent during partition

┌─────────────────────────────────────────────────────────────┐
│                                                             │
│     Normal operation: You can have all three!               │
│                                                             │
│     During partition: Choose C or A                         │
│                                                             │
│         CP: Reject requests to maintain consistency         │
│         AP: Accept requests, reconcile later                │
│                                                             │
└─────────────────────────────────────────────────────────────┘

3.2 CAP in Practice

CAP TRADE-OFFS IN REAL SYSTEMS
═══════════════════════════════════════════════════════════════

CP SYSTEMS (Consistent, Partition-tolerant)
─────────────────────────────────────────────────────────────
During partition: Refuse some requests to maintain consistency

Examples:
    - Traditional RDBMS with synchronous replication
    - ZooKeeper
    - etcd (Kubernetes' brain)
    - MongoDB with majority write concern

Use when: Correctness matters more than availability
    - Financial transactions
    - Inventory systems
    - Coordination services

AP SYSTEMS (Available, Partition-tolerant)
─────────────────────────────────────────────────────────────
During partition: Accept requests, data may diverge

Examples:
    - Cassandra
    - DynamoDB (eventually consistent reads)
    - DNS
    - Caches

Use when: Availability matters more than immediate consistency
    - Shopping carts
    - Social media feeds
    - Metrics/logging

REALITY CHECK
─────────────────────────────────────────────────────────────
Most systems need BOTH consistency AND availability.
The choice is: which to sacrifice DURING a partition?

Partitions are rare but not rare enough to ignore.
Design for partition. Choose your trade-off deliberately.

3.3 Beyond CAP: PACELC

PACELC: A MORE COMPLETE MODEL
═══════════════════════════════════════════════════════════════

CAP only talks about partitions. But what about normal operation?

PACELC: If Partition, then Availability vs Consistency,
        Else, Latency vs Consistency

┌─────────────────────────────────────────────────────────────┐
│                                                             │
│   During Partition (P):                                     │
│       Choose A (availability) or C (consistency)            │
│                                                             │
│   Else (normal operation):                                  │
│       Choose L (latency) or C (consistency)                 │
│                                                             │
│   PA/EL: Available during partition, low latency normally   │
│          (Cassandra, DynamoDB)                              │
│                                                             │
│   PC/EC: Consistent during partition, consistent normally   │
│          (Traditional RDBMS, etcd)                          │
│                                                             │
│   PA/EC: Available during partition, consistent normally    │
│          (Some configs of MongoDB)                          │
│                                                             │
└─────────────────────────────────────────────────────────────┘

This model acknowledges that even in normal operation,
you trade latency for consistency (synchronous replication)
or consistency for latency (asynchronous replication).

Try This (3 minutes)

For each scenario, which trade-off makes sense?

Scenario During Partition Why
Bank account balance CP Can’t show wrong balance
Social media likes count AP
Kubernetes pod state
Shopping cart
DNS

Scenario	During Partition	Why
Bank account balance	CP	Can’t show wrong balance
Social media likes count	AP
Kubernetes pod state
Shopping cart
DNS

Part 4: Distributed System Patterns

4.1 The Two Generals Problem

THE TWO GENERALS PROBLEM
═══════════════════════════════════════════════════════════════

Setup:
    Two generals must coordinate an attack.
    They can only communicate via messengers.
    Messengers may be captured (lost messages).
    Both must attack simultaneously or not at all.

The Problem:
─────────────────────────────────────────────────────────────
    General A                           General B
        │                                   │
        │ "Attack at dawn"                  │
        │ ─────────messenger──────────────▶ │
        │                                   │
        │           "Got it"                │
        │ ◀────────messenger─────────────── │
        │                                   │
        │   But did B receive my ack?       │
        │   "I got your ack"                │
        │ ─────────messenger──────────────▶ │
        │                                   │
        │   But did A receive...            │
        │   (infinite regress)              │

IMPOSSIBILITY
─────────────────────────────────────────────────────────────
There is NO protocol that guarantees agreement if messages
can be lost. This is proven mathematically impossible.

IMPLICATIONS FOR DISTRIBUTED SYSTEMS
─────────────────────────────────────────────────────────────
You cannot guarantee that two nodes agree on anything
if the network is unreliable. The best you can do:
    - Increase probability (retries, acknowledgments)
    - Accept uncertainty (eventual consistency)
    - Use consensus protocols (Paxos, Raft) when possible

4.2 The Byzantine Generals Problem

BYZANTINE GENERALS PROBLEM
═══════════════════════════════════════════════════════════════

Same as Two Generals, but worse:
    - Some generals might be traitors
    - Traitors send conflicting messages to different generals
    - Loyal generals must still reach agreement

THE CHALLENGE
─────────────────────────────────────────────────────────────
    General A: "Attack!"
    General B: "Attack!" (traitor, tells C "Retreat")
    General C: "Attack!"

    C receives: "Attack" from A, "Retreat" from B
    A receives: "Attack" from B (B lies), "Attack" from C

    How can A and C agree despite B's lies?

SOLUTION REQUIREMENTS
─────────────────────────────────────────────────────────────
To tolerate f Byzantine (lying) nodes, you need 3f + 1 total.

    1 traitor → need 4 generals
    2 traitors → need 7 generals

This is expensive! Most systems assume no Byzantine failures.

WHERE IT MATTERS
─────────────────────────────────────────────────────────────
- Blockchain (trustless networks)
- Safety-critical systems (aviation, medical)
- Systems where nodes might be compromised

Most internal systems assume nodes are honest but buggy.
They handle crash failures, not Byzantine failures.

4.3 Idempotency

IDEMPOTENCY: SAFE TO RETRY
═══════════════════════════════════════════════════════════════

An operation is IDEMPOTENT if doing it multiple times
has the same effect as doing it once.

WHY IT MATTERS
─────────────────────────────────────────────────────────────
Request sent. Timeout. Did it succeed?

    If NOT idempotent: Dangerous to retry
        "Transfer $100" → Retry → $200 transferred!

    If idempotent: Safe to retry
        "Set balance to $100" → Retry → Still $100

IDEMPOTENT OPERATIONS
─────────────────────────────────────────────────────────────
✓ SET x = 5           (result is always 5)
✓ DELETE user 123     (can't delete twice)
✓ PUT /users/123      (creates or replaces)
✗ INCREMENT x         (adds more each time)
✗ POST /users         (creates new each time)
✗ TRANSFER $100       (transfers more each time)

MAKING OPERATIONS IDEMPOTENT
─────────────────────────────────────────────────────────────
Add a unique request ID. Server tracks completed requests.

    Client: "Transfer $100, request_id=abc123"
    Server: Checks if abc123 already processed
            If yes: Return cached result
            If no: Process, save result, return

    Client timeout → Retry with same abc123 → Safe!

Part 5: Kubernetes as a Distributed System

5.1 Kubernetes Architecture

KUBERNETES: A DISTRIBUTED SYSTEM
═══════════════════════════════════════════════════════════════

CONTROL PLANE
─────────────────────────────────────────────────────────────
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│   ┌───────────┐  ┌───────────┐  ┌───────────┐             │
│   │ API Server│  │ API Server│  │ API Server│  (HA)       │
│   └─────┬─────┘  └─────┬─────┘  └─────┬─────┘             │
│         │              │              │                     │
│         └──────────────┼──────────────┘                     │
│                        │                                    │
│                        ▼                                    │
│              ┌─────────────────┐                           │
│              │      etcd       │  (distributed consensus)  │
│              │   ┌───┬───┬───┐ │                           │
│              │   │ 1 │ 2 │ 3 │ │  (Raft protocol)         │
│              │   └───┴───┴───┘ │                           │
│              └─────────────────┘                           │
│                                                             │
│   ┌────────────────┐    ┌────────────────┐                │
│   │ Controller Mgr │    │   Scheduler    │                │
│   │  (leader elect)│    │  (leader elect)│                │
│   └────────────────┘    └────────────────┘                │
│                                                             │
└─────────────────────────────────────────────────────────────┘

DATA PLANE (Nodes)
─────────────────────────────────────────────────────────────
┌────────────┐  ┌────────────┐  ┌────────────┐
│   Node 1   │  │   Node 2   │  │   Node 3   │
│  ┌──────┐  │  │  ┌──────┐  │  │  ┌──────┐  │
│  │kubelet│  │  │  │kubelet│  │  │  │kubelet│  │
│  └──────┘  │  │  └──────┘  │  │  └──────┘  │
│  ┌──────┐  │  │  ┌──────┐  │  │  ┌──────┐  │
│  │kube- │  │  │  │kube- │  │  │  │kube- │  │
│  │proxy │  │  │  │proxy │  │  │  │proxy │  │
│  └──────┘  │  │  └──────┘  │  │  └──────┘  │
│  [pods]    │  │  [pods]    │  │  [pods]    │
└────────────┘  └────────────┘  └────────────┘

5.2 How Kubernetes Handles Distribution

KUBERNETES DISTRIBUTED PATTERNS
═══════════════════════════════════════════════════════════════

CONSENSUS (etcd)
─────────────────────────────────────────────────────────────
etcd uses Raft consensus for strong consistency.
All cluster state lives in etcd.
Writes require majority agreement (quorum).

    3 etcd nodes: Can lose 1
    5 etcd nodes: Can lose 2

LEADER ELECTION
─────────────────────────────────────────────────────────────
Controller Manager and Scheduler use leader election.
Only one active at a time (avoid duplicate work).
If leader fails, another takes over.

    # Check current leader
    kubectl get leases -n kube-system

EVENTUAL CONSISTENCY (Controllers)
─────────────────────────────────────────────────────────────
Controllers reconcile desired vs actual state.
They're eventually consistent—changes propagate over time.

    You: "I want 3 replicas"
    Controller: Sees 2, creates 1
    Repeat until actual = desired

WATCH PATTERN
─────────────────────────────────────────────────────────────
Components watch etcd for changes, not poll.
Reduces load, provides near-real-time updates.

    kubelet watches for pod assignments
    Controller watches for resource changes

5.3 Kubernetes Failure Modes

KUBERNETES FAILURE SCENARIOS
═══════════════════════════════════════════════════════════════

NODE FAILURE
─────────────────────────────────────────────────────────────
kubelet stops sending heartbeats.
Node marked "NotReady" after timeout (default 40s).
Pods rescheduled after tolerationSeconds (default 300s).

    Timeline:
    0s: Node dies
    40s: Node marked NotReady
    300s: Pods evicted, rescheduled elsewhere

ETCD QUORUM LOSS
─────────────────────────────────────────────────────────────
If majority of etcd nodes fail, cluster becomes read-only.
Existing pods keep running (kubelet cached state).
No new pods, no updates until quorum restored.

    3-node etcd: Lose 2 → Quorum lost
    5-node etcd: Lose 3 → Quorum lost

NETWORK PARTITION
─────────────────────────────────────────────────────────────
Nodes can't reach control plane.
Existing pods keep running (autonomous kubelet).
No new scheduling until connectivity restored.

Split-brain scenario:
    Partition A: Some nodes + some etcd
    Partition B: Other nodes + other etcd

    If neither has etcd quorum: Both read-only
    Kubernetes chooses consistency over availability (CP)

API SERVER FAILURE
─────────────────────────────────────────────────────────────
With multiple API servers, load balancer routes around failure.
kubectl commands may fail temporarily.
Pods keep running (kubelet operates independently).

Did You Know?

The speed of light limits distributed systems. A message from New York to London takes at least 28ms—the time light needs to travel through fiber. You can’t engineer around physics.
Google’s Spanner uses atomic clocks and GPS to synchronize time across datacenters to within a few milliseconds. It’s one of the few systems that can offer both strong consistency and global distribution.
The term “Byzantine fault” comes from the Byzantine Generals Problem, named for the Byzantine Empire’s reputation for complex political intrigue. In distributed systems, a Byzantine node is one that might behave arbitrarily—including lying.
Leslie Lamport’s “Time, Clocks, and the Ordering of Events” paper (1978) introduced logical clocks and is one of the most cited papers in computer science. Lamport later won the Turing Award partly for this work. The key insight: you don’t need real time to order events—you just need to track causality (“happened before” relationships).

Common Mistakes

Mistake	Problem	Solution
Ignoring network latency	Design works locally, fails distributed	Measure latency, batch requests
Assuming networks are reliable	No retry logic, no timeouts	Implement timeouts, retries, circuit breakers
Using wall-clock time for ordering	Clock skew causes wrong order	Use logical clocks or version vectors
Not designing for partial failure	One failure takes down everything	Bulkheads, graceful degradation
Synchronous everything	Latency compounds, cascade failures	Async where possible, queues
Ignoring CAP trade-offs	Surprised by behavior during partition	Choose trade-offs explicitly

Quiz

What are the three fundamental challenges of distributed systems?
Answer

The three fundamental challenges are:
1. Latency: Network communication is millions of times slower than local function calls. Speed of light imposes physical limits.
2. Partial failure: Components can fail independently. You can’t always tell if a remote call succeeded, failed, or is just slow.
3. No global clock: Each machine has its own clock that drifts. You can’t rely on timestamps to order events across machines.
These challenges are fundamental—they can be mitigated but not eliminated. Any distributed system design must account for all three.
Explain the CAP theorem in your own words.
Answer

The CAP theorem states that during a network partition, a distributed system must choose between consistency and availability:
- Consistency: All nodes see the same data
- Availability: All requests get responses
- Partition tolerance: System handles network splits
Since network partitions are inevitable, the real choice is:
- CP: During partition, refuse some requests to maintain consistency
- AP: During partition, serve requests but data might diverge
In normal operation (no partition), you can have both. The choice matters when things go wrong.

Example: A banking system chooses CP—better to reject a transaction than show wrong balance. A social media site chooses AP—better to show slightly stale like count than fail entirely.
Why can’t you use timestamps to order events in a distributed system?
Answer

You can’t rely on timestamps because:
1. Clock drift: Each machine’s clock runs slightly fast or slow. Over time, clocks diverge.
2. No perfect sync: Even with NTP, clocks differ by milliseconds. Events happening close together may get wrong order.
3. Network delay: A message sent “earlier” might arrive “later” than one sent after it.
Example: Server A (clock running fast) timestamps event at 10:00:00.050. Server B (accurate clock) timestamps event at 10:00:00.045. B’s event actually happened after A’s, but has earlier timestamp.

Solutions:
- Logical clocks (Lamport clocks): Track causality, not wall time
- Vector clocks: Track causality per node
- Hybrid logical clocks: Combine physical and logical time
What makes Kubernetes a distributed system, and how does it handle distribution?
Answer

Kubernetes is distributed because:
- Multiple components (API server, etcd, controllers, kubelets)
- Running on multiple machines
- Coordinating via network messages
- No single point of failure (when properly configured)
How it handles distribution:
1. etcd for consensus: Uses Raft protocol for strong consistency of cluster state. Requires quorum for writes.
2. Leader election: Controller Manager and Scheduler elect leaders to avoid duplicate work.
3. Eventual consistency: Controllers continuously reconcile desired vs actual state. Changes propagate over time.
4. Watch pattern: Components watch etcd for changes instead of polling, reducing load.
5. Autonomous kubelets: Nodes keep running pods even if disconnected from control plane.
Kubernetes chooses CP during partitions—better to refuse changes than have inconsistent state.
A service makes 50 sequential database calls per request. Each call takes 2ms over the network. What’s the minimum latency for a user request? How could you improve it?
Answer

Calculation:
- 50 calls × 2ms = 100ms minimum latency
- This doesn’t include processing time, just network round trips
Improvement strategies:
1. Batch calls: Instead of 50 individual calls, combine into fewer queries
  - 5 batched calls × 2ms = 10ms (10x improvement)
2. Parallelize: If calls are independent, execute simultaneously
  - 50 parallel calls = 2ms (50x improvement)
3. Cache: Store frequently accessed data locally
  - Cache hit: 0.1ms vs 2ms network call
4. Denormalize: Store data together to reduce joins/calls
  - Single call with all data: 2ms total
The lesson: Network latency compounds. Design minimizes round trips.
You send a request to a service and don’t receive a response after 5 seconds. List all possible states the request could be in. Why is this uncertainty fundamental?
Answer

Possible states:
1. Request lost in transit - Never reached server
2. Server received but crashed before processing - No work done
3. Server processing but slow - Will eventually complete
4. Server processed successfully, response lost - Work completed!
5. Server processed, crashed after responding - Work completed!
6. Response delayed in network - Will eventually arrive
Why fundamental:

This uncertainty cannot be eliminated by any protocol. From the client’s perspective, states 2, 3, 4, 5, and 6 look identical: silence.

This is the Two Generals Problem—you cannot achieve certainty about remote state over an unreliable channel.

Practical implications:
- Must design operations to be idempotent (safe to retry)
- Use unique request IDs to detect duplicates
- Implement timeouts and retries with exponential backoff
- Accept that “at-most-once” or “at-least-once” delivery—never “exactly-once”
An etcd cluster has 5 nodes. How many can fail while maintaining quorum? If you lose quorum, what happens to the Kubernetes cluster?
Answer

Quorum calculation:
- Quorum requires majority: floor(n/2) + 1
- For 5 nodes: floor(5/2) + 1 = 3 nodes required
- Can lose 2 nodes and still have quorum (3 remaining)
If quorum is lost (3+ nodes fail):
1. etcd becomes read-only - No writes accepted
2. No new pods can be scheduled - Scheduler can’t write
3. No deployments can update - Controllers can’t write
4. Existing pods keep running - kubelets have cached state
5. kubectl get works - Reads still succeed
6. kubectl apply fails - Writes rejected
Recovery:
- Restore failed nodes to regain quorum
- Or restore from etcd backup to new cluster
- Never reduce below 3 nodes (can’t lose any)
Best practice: Use 5 nodes for production (tolerates 2 failures), 3 for smaller clusters (tolerates 1 failure).
A social media platform shows like counts that update every 30 seconds. During a network partition between US and EU datacenters, a post gets 1000 likes in US and 500 likes in EU. When the partition heals, what should the count be? What CAP trade-off did this system make?
Answer

The count should be 1500 (1000 + 500)

This requires a CRDT (Conflict-free Replicated Data Type)—specifically a G-Counter (grow-only counter) that merges by summing per-replica increments.

CAP trade-off:
- System chose AP (Availability over Consistency)
- During partition: Both regions accepted writes (available)
- Result: Temporary inconsistency (US saw 1000, EU saw 500)
- After partition: Merge brings eventual consistency
Why this is appropriate for likes:
- Temporary wrong count is acceptable
- Users don’t expect real-time accuracy
- Unavailability (can’t like) would be worse than inconsistency
Contrast with banking:
- Bank would choose CP
- During partition: Reject transactions
- Why: Incorrect balance is unacceptable
- Unavailability is better than double-spending
The pattern: Choose AP for data where “close enough” is acceptable, CP for data where correctness is critical.

Hands-On Exercise

Task: Explore distributed system behavior in Kubernetes.

Part 1: Observe Latency (10 minutes)

Run a pod and measure API server latency:

# Create a test pod
kubectl run latency-test --image=busybox --command -- sleep 3600

# Time various operations
time kubectl get pod latency-test
time kubectl get pod latency-test -o yaml
time kubectl describe pod latency-test

# Compare to local operations
time cat /etc/hostname

Record your findings:

Operation	Time	Notes
kubectl get pod
kubectl describe
Local file read

Part 2: Observe Eventual Consistency (10 minutes)

Watch a deployment scale:

# Create deployment
kubectl create deployment scale-test --image=nginx --replicas=1

# In terminal 1: Watch pods
kubectl get pods -w

# In terminal 2: Scale up rapidly
kubectl scale deployment scale-test --replicas=10

# Observe:
# - How long until all pods are Running?
# - Do pods appear in order?
# - What intermediate states do you see?

Part 3: Simulate Partial Failure (10 minutes)

If using kind or minikube with multiple nodes:

# List nodes
kubectl get nodes

# Cordon a node (no new pods)
kubectl cordon <node-name>

# Watch what happens to pods
kubectl get pods -o wide -w

# Uncordon
kubectl uncordon <node-name>

Success Criteria:

Measured API server latency vs local operations
Observed eventual consistency during scaling
Understood intermediate states during changes
(Optional) Observed behavior during node failure

Key Takeaways

Before moving on, ensure you understand:

The three fundamental challenges: Latency (network is slow), partial failure (parts fail independently), no global clock (can’t trust timestamps)
CAP theorem reality: During a network partition, you choose consistency OR availability—not both. Most systems need both, so you choose which to sacrifice during failures
PACELC extension: Even without partitions, you trade latency for consistency. Synchronous replication = consistent but slow. Async = fast but potentially stale
The Two Generals Problem: You cannot guarantee agreement over unreliable networks. This is mathematically proven impossible
Idempotency is essential: Design operations to be safe to retry. Use unique request IDs to detect duplicates
Kubernetes is CP: It chooses consistency over availability during partitions. etcd requires quorum; without it, the cluster becomes read-only
Logical clocks > wall clocks: Don’t use timestamps to order events. Use Lamport clocks or vector clocks to track causality
Distribution is a spectrum: A web server + database is distributed. Understanding where your system falls helps choose appropriate patterns

Next Module

Module 5.2: Consensus and Coordination - How distributed systems agree on anything, and why it’s so hard.