Module 1.4: Architecture Decision Records & Technical Writing
Complexity: [MEDIUM] | Time: 2 hours | Prerequisites: System Design basics
Track: Foundations / Engineering Leadership
What You’ll Be Able to Do
After completing this module, you will be able to:
- Build Architecture Decision Records that capture context, constraints, alternatives considered, and rationale in a format future engineers can act on
- Design a decision documentation workflow that integrates with existing pull request and review processes without adding ceremony
- Evaluate when a decision warrants an ADR versus informal documentation based on reversibility, impact scope, and team size
- Apply technical writing principles (clarity, conciseness, audience awareness) to produce documentation that engineers actually read and reference
The Decision Nobody Remembers
Thursday afternoon. Sprint planning.
“Why are we using RabbitMQ instead of Kafka?”
The room goes quiet. Engineers exchange glances. The tech lead who made that decision left 18 months ago. Someone mutters, “I think there was a Slack thread about it… maybe?” Another engineer opens Confluence and types “messaging” into search. Forty-seven results. None of them answer the question.
So the team does what every team does in this situation: they relitigate the decision from scratch. Two senior engineers spend a week benchmarking. A third writes a comparison document. The VP of Engineering weighs in during a 1:1 and mentions a constraint nobody else knew about. Three weeks later, they arrive at the same conclusion the previous tech lead reached in 2022---but they’ve burned a month of engineering time to get there.
This happens constantly. Not because engineers are careless, but because most organizations have no systematic way to record why decisions were made. They document what was built (API docs, runbooks, READMEs) but almost never why it was built that way.
Architecture Decision Records fix this. They’re one of the highest-leverage tools in engineering leadership, and they take about 30 minutes to write.
This module teaches you how to write them well, and more broadly, how to communicate technical decisions to different audiences---because the best decision in the world is worthless if nobody understands or remembers it.
Why This Module Matters
Every engineering team makes hundreds of decisions per quarter. Which database to use. How to handle authentication. Whether to build or buy. Monolith or microservices. Managed service or self-hosted.
Most of these decisions are invisible. They live in Slack threads that scroll away, in meeting notes nobody reads, in the heads of engineers who eventually leave. When new team members join, they inherit a codebase full of choices they don’t understand. They either:
- Accept everything blindly (“I guess there was a reason for this”), leading to cargo-cult engineering
- Question everything constantly (“Why are we doing it this way?”), burning time and frustrating the team
- Redo decisions from scratch (“Let’s migrate to Kafka”), wasting months on already-solved problems
ADRs solve all three. They give newcomers context, protect institutional knowledge, and prevent decision amnesia.
But ADRs are just one form of technical writing. The broader skill---communicating decisions clearly to different audiences---is what separates a good engineer from an engineering leader. You might need to explain the same Kafka decision to:
- Your team: with full technical depth, trade-off analysis, and benchmark data
- Your VP of Engineering: focused on risk, timeline, and cost implications
- Your CEO: one paragraph about business impact
This module teaches the discipline of capturing and communicating technical decisions effectively.
The Real Cost of Undocumented Decisions
Stripe’s 2018 Developer Coefficient report estimated that developers spend about 17.3 hours per week on maintenance work such as debugging, refactoring, and dealing with technical debt. Some of that time goes to reconstructing why things were built the way they were. ADRs won’t eliminate tech debt, but they reduce the cognitive overhead of working in a mature codebase.
What You’ll Learn
- Why documenting decisions matters more than documenting code
- The ADR format: Context, Options, Decision, Consequences
- How to write for different audiences (engineers, product, executives)
- The RFC process and how it connects to ADRs
- How to run effective design reviews
- Real-world ADR examples you can use as templates
Part 1: The Case for Decision Records
What Decisions Deserve Recording?
Not every decision needs an ADR. You don’t need one for choosing between camelCase and snake_case (that goes in a style guide). You don’t need one for which CI/CD tool to use if your company already standardized on one.
ADRs are for architecturally significant decisions---choices that are:
- Hard to reverse once implemented (database choice, messaging system, API paradigm)
- Cross-cutting across multiple teams or services
- Costly in terms of engineering time, infrastructure spend, or operational burden
- Contentious where reasonable engineers disagree
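One way to make these criteria concrete is a small checklist function. This is a minimal sketch rather than a prescribed tool: the `Decision` class and `needs_adr` function are hypothetical names, and treating any single criterion as sufficient is a simple default you can tighten.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    """Yes/no answers for the four significance criteria (hypothetical names)."""
    hard_to_reverse: bool
    cross_cutting: bool
    costly: bool
    contentious: bool


def needs_adr(decision: Decision) -> bool:
    """Return True if at least one significance criterion applies."""
    return any(
        (
            decision.hard_to_reverse,
            decision.cross_cutting,
            decision.costly,
            decision.contentious,
        )
    )


# Choosing a message broker: hard to reverse and touches several services.
broker_choice = Decision(hard_to_reverse=True, cross_cutting=True,
                         costly=False, contentious=False)
# camelCase vs snake_case: belongs in a style guide, not an ADR.
naming_style = Decision(False, False, False, False)

print(needs_adr(broker_choice))  # True
print(needs_adr(naming_style))   # False
```

The value is less in the code than in forcing an explicit yes/no answer for each criterion before the discussion starts.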
DECISION SIGNIFICANCE FILTER
======================================================================
Ask these questions about a technical decision:

  1. Would reversing this cost more than a sprint?        → YES = ADR
  2. Does it affect more than one team or service?        → YES = ADR
  3. Will people ask "why did we do this?" in 6 months?   → YES = ADR
  4. Are there trade-offs that won't be obvious later?    → YES = ADR
  5. Did multiple engineers disagree on the approach?     → YES = ADR

If you answered YES to any of these, write an ADR.
If you answered YES to three or more, write it TODAY.
Why Not Just Use Confluence/Notion/Google Docs?
You can use any tool, but there’s a strong argument for keeping ADRs in the repository alongside the code they describe:
| Approach | Pros | Cons |
|---|---|---|
| ADRs in repo (docs/adr/) | Versioned with code, survives tool changes, discoverable via grep, reviewed in PRs | Not accessible to non-engineers |
| Wiki (Confluence/Notion) | Accessible to everyone, rich formatting | Gets stale, hard to find, not versioned with code, dies when you switch tools |
| Slack threads | Fast, natural discussion | Disappears in weeks, unsearchable, no structure |
| Google Docs | Collaborative editing, comments | Disconnected from code, access control issues |
The recommendation: write the ADR in markdown in your repo, and link to it from your wiki if non-technical stakeholders need access.
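Because ADRs kept in the repo are plain markdown files, the "discoverable via grep" benefit is easy to reproduce in a script. The sketch below assumes the ADRs live as `*.md` files directly under `docs/adr/`; `search_adrs` is a hypothetical helper name, not part of any existing tool.

```python
from pathlib import Path


def search_adrs(adr_dir: str, keyword: str) -> list[tuple[str, int, str]]:
    """Return (file name, line number, line) for every case-insensitive match."""
    root = Path(adr_dir)
    if not root.is_dir():
        return []
    matches = []
    needle = keyword.lower()
    for path in sorted(root.glob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            if needle in line.lower():
                matches.append((path.name, number, line.strip()))
    return matches


if __name__ == "__main__":
    # Answers "why are we using RabbitMQ?" without digging through Slack.
    for name, number, line in search_adrs("docs/adr", "RabbitMQ"):
        print(f"{name}:{number}: {line}")
```

In practice `grep -ri rabbitmq docs/adr/` does the same job; the point is that the records are ordinary files that any tool can read.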
Part 2: ADR Structure
The Standard ADR Format
Michael Nygard proposed the original ADR format in 2011, and it has become the de facto standard. His original template has five parts (Title, Status, Context, Decision, Consequences); the version below adds Date and Options Considered, a common extension. Here’s the structure with explanations:
# ADR-NNN: Title of the Decision
## Status
[Proposed | Accepted | Deprecated | Superseded by ADR-XXX]
## Date
YYYY-MM-DD
## Context
What is the issue that we're seeing that is motivating this decision?
What are the forces at play? Technical constraints, business requirements,
team capabilities, timeline pressures?
## Options Considered
### Option 1: [Name]
Description, pros, cons, estimated effort.
### Option 2: [Name]
Description, pros, cons, estimated effort.
### Option 3: [Name]
Description, pros, cons, estimated effort.
## Decision
What is the change that we're proposing and/or doing?
State the decision clearly and unambiguously.
## Consequences
What becomes easier or harder as a result of this decision?
Both positive and negative consequences should be listed.

Let’s break down each section.
Context: The “Why” Section
This is the most important section. It captures the forces that led to the decision: the business requirements, technical constraints, team dynamics, and timeline pressures that shaped the choice.
Bad context:
We need to choose a message broker for our event-driven architecture.
Good context:
Our order processing system currently uses synchronous HTTP calls between 6 microservices. During Black Friday 2024, this caused cascading failures when the inventory service became slow (see incident report IR-2024-112). Peak load is 15,000 orders/minute with a target of 50,000/minute by Q3. The team has 4 backend engineers, 2 of whom have Kafka experience. Our infrastructure budget for messaging is approximately $3,000/month on AWS.
The good version tells you why the decision is being made, what constraints exist, and what success looks like. Someone reading this in 2 years will understand the full picture.
Options Considered: Show Your Work
List at least two options, ideally three, including the ones you rejected. This is critical because:
- It proves you didn’t just pick your favorite technology
- It helps future engineers understand what was available at the time
- It prevents re-evaluation of options that were already considered
For each option, include:
OPTION EVALUATION TEMPLATE
======================================================================
Option Name:   [Technology/Approach]

Description:   What is it? How would we use it?
Pros:          What does it do well for our use case?
Cons:          What are the downsides?
Effort:        How long to implement? What skills needed?
Cost:          Infrastructure, licensing, operational cost
Risk:          What could go wrong?
Team Fit:      Do we have the skills? Can we hire for it?
Decision: Be Unambiguous
State the decision clearly. Don’t hedge. Don’t say “we might” or “we’re leaning toward.” Say “We will use X because Y.”
Bad decision:
We’re going to try Kafka and see how it goes.
Good decision:
We will use AWS MSK (Managed Kafka) for asynchronous event processing between order, inventory, and notification services. We chose managed Kafka over self-hosted to reduce operational burden, accepting the higher per-hour cost in exchange for automatic patching, scaling, and monitoring.
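Hedging language is easy to spot mechanically. The sketch below flags the phrases called out in this section; the pattern list and the `find_hedges` name are illustrative assumptions, and a real check would need a list tuned to your team's writing.

```python
import re

# Illustrative phrase list drawn from the examples in this section.
HEDGE_PATTERNS = [
    r"\bwe might\b",
    r"\bwe['’]re leaning toward\b",
    r"\bsee how it goes\b",
]


def find_hedges(decision_text: str) -> list[str]:
    """Return the hedging phrases found in a Decision section, in pattern order."""
    found = []
    for pattern in HEDGE_PATTERNS:
        match = re.search(pattern, decision_text, flags=re.IGNORECASE)
        if match:
            found.append(match.group(0))
    return found


bad = "We’re going to try Kafka and see how it goes."
good = "We will use AWS MSK for asynchronous event processing."

print(find_hedges(bad))   # ['see how it goes']
print(find_hedges(good))  # []
```

A check like this only catches wording. It cannot tell whether the "because Y" half of the sentence is actually present or convincing.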
Consequences: The Honest Part
Every decision has trade-offs. List both positive and negative consequences. This is where you earn trust: by being honest about what you’re giving up.
CONSEQUENCES TEMPLATE
======================================================================
POSITIVE:
  + Decouples services, reducing cascading failure risk
  + Enables replay of events for debugging and recovery
  + Scales horizontally for Black Friday load requirements

NEGATIVE:
  - Adds operational complexity (topic management, consumer groups)
  - Introduces eventual consistency (team must handle this in code)
  - AWS MSK costs ~$2,800/month (vs ~$800 for self-hosted RabbitMQ)
  - Vendor lock-in to AWS for messaging infrastructure

RISKS:
  ! Team has limited Kafka experience (mitigated by MSK managed service)
  ! Schema evolution could cause consumer breakage (mitigated by Schema Registry)
Part 3: ADR Lifecycle
ADRs are not static documents. They have a lifecycle:
ADR LIFECYCLE
======================================================================

  PROPOSED      Someone writes the ADR and opens it for discussion.
     │          This might be a PR, a design review agenda item,
     │          or an RFC shared on Slack.
     ▼
  ACCEPTED      The team agrees on the decision. The ADR is merged.
     │          The decision is now the team's official stance.
     │          Implementation begins.
     ▼
  [Time passes, the world changes]
     │
     ▼
  DEPRECATED    The decision no longer applies. The technology was
     │          sunset, the requirements changed, or a better option
     │          emerged. A new ADR explains why.
     ▼
  SUPERSEDED    A new ADR explicitly replaces this one.
  BY ADR-XXX    The old ADR links to the new one, preserving
                the full decision history.
Never Delete ADRs
This is a common mistake. When a decision is overturned, teams want to delete the old ADR. Don’t. The old ADR is history. It explains why the previous approach was chosen, which helps people understand the current one.
Instead, mark the old ADR as Superseded by ADR-XXX and add a note at the top:
## Status
**Superseded by [ADR-042: Migration to Apache Pulsar](042-migration-to-pulsar/)**
> This ADR documented our 2023 decision to use AWS MSK. In 2025, we
> migrated to Apache Pulsar due to multi-cloud requirements. See ADR-042
> for the updated decision and migration plan.
Numbering and Organization
Keep ADRs in a dedicated directory:
docs/
└── adr/
    ├── README.md                          # Index of all ADRs with status
    ├── 001-use-postgresql.md
    ├── 002-event-driven-architecture.md
    ├── 003-managed-kafka.md
    ├── 004-graphql-api.md
    └── template.md                        # Blank template for new ADRs
The README.md acts as an index:
# Architecture Decision Records
| ADR | Title | Status | Date |
|-----|-------|--------|------|
| 001 | Use PostgreSQL for primary data store | Accepted | 2023-03-15 |
| 002 | Adopt event-driven architecture | Accepted | 2023-06-01 |
| 003 | Use AWS MSK for messaging | Superseded by 042 | 2023-08-20 |
| 004 | GraphQL for public API | Accepted | 2023-11-10 |
Part 4: Writing for Different Audiences
An ADR is written for engineers. But the same decision often needs to be communicated to product managers, VPs, and executives. The information is the same; the framing changes completely.
The Audience Pyramid
THE AUDIENCE PYRAMID
======================================================================

            ┌─────────┐
            │ C-SUITE │        Impact. One paragraph.
            │ / EXECS │        "What does this mean for the business?"
           ├──────────┤
           │   VP /   │        Risk, cost, timeline.
           │ DIRECTOR │        "What are the trade-offs and
           │          │         when will it be done?"
          ├───────────┤
          │  PRODUCT  │        User impact, feature implications.
          │ MANAGERS  │        "How does this affect the roadmap?"
         ├────────────┤
         │ ENGINEERING │       Full technical detail.
         │    TEAM     │       "How does this work and why?"
         └──────────────┘

Rule: As you go UP the pyramid, remove technical detail and add business context.
Same Decision, Four Audiences
Let’s say you’ve decided to migrate from a monolithic API to microservices. Here’s how you’d communicate it:
To your engineering team (ADR):
We will decompose the Order module into three microservices (order-api, order-processor, order-notifications) communicating via async events on Kafka. This reduces deployment coupling---currently a change to notifications requires redeploying the entire order system, causing 15-minute deploy windows. Independent services will deploy in under 2 minutes with zero-downtime rolling updates.
To product managers:
We’re splitting the order system into independent components. This means the team can ship notification features (like SMS alerts) without touching---or risking---the core order flow. Deploy frequency for order-related features will increase from weekly to multiple times per day.
To your VP of Engineering:
We’re investing 6 weeks of engineering time to decompose the order monolith. This reduces our deployment risk (3 incidents in the last quarter were caused by unrelated changes deployed together) and unblocks the notification team, who are currently blocked 2-3 days per sprint waiting for deploy windows. Expected ROI: the engineering time pays for itself within one quarter through reduced incident cost and faster shipping.
To the CTO/CEO:
We’re restructuring the order system so teams can ship features independently. This eliminates deployment bottlenecks that have delayed 3 features this quarter and reduces the risk of outages during deploys. Six weeks of investment, payback within one quarter.
The “So What?” Test
Before sending any technical communication, ask: “So what?”
- “We’re migrating to Kafka.” So what? Orders will process 10x faster during peak load.
- “We’re adding a caching layer.” So what? Page load times drop from 3 seconds to 200ms.
- “We’re refactoring the auth module.” So what? We can add SSO support, which Sales has been requesting for 6 months.
Every communication to a non-technical audience should lead with the “so what” answer, not the technical detail.
Part 5: The RFC Process
ADRs vs RFCs
ADRs and RFCs serve different purposes but complement each other:
| Aspect | ADR | RFC |
|---|---|---|
| Purpose | Record a decision | Propose a change and solicit feedback |
| Timing | Written when decision is made (or shortly after) | Written before work begins |
| Length | 1-2 pages | 3-15 pages |
| Audience | Future engineers | Current team + stakeholders |
| Tone | Declarative (“We decided…”) | Propositional (“I propose…”) |
| Approval | Team consensus or tech lead decision | Formal review process |
Think of it this way: an RFC is the discussion, and the ADR is the conclusion.
RFC Structure
A good RFC includes:
# RFC: [Title]
**Author**: [Name]
**Date**: YYYY-MM-DD
**Status**: [Draft | In Review | Accepted | Rejected | Withdrawn]
**Reviewers**: [Names of people whose input is required]

## Summary
One paragraph. What are you proposing and why?

## Motivation
Why is this needed? What problem does it solve?
Include data: error rates, customer complaints, engineering hours wasted.

## Detailed Design
The technical proposal. Architecture diagrams, API contracts,
data models, sequence diagrams. This is the meat of the RFC.

## Alternatives Considered
What else could we do? Why is this proposal better?

## Migration Plan
How do we get from here to there? What's the rollout strategy?
How do we roll back if things go wrong?

## Open Questions
What haven't you figured out yet? What do you need input on?
Being honest about unknowns builds trust.

## Timeline
Rough estimate of effort and milestones.
Running Effective Design Reviews
The RFC is the document. The design review is the meeting where you discuss it. Here’s how to run a good one:
DESIGN REVIEW BEST PRACTICES
======================================================================
BEFORE THE MEETING:
  - Share the RFC at least 2 business days before the review
  - Ask reviewers to leave async comments first
  - Identify the 2-3 most contentious decisions to focus on

DURING THE MEETING:
  - Don't read the RFC aloud (everyone should have read it)
  - Start with: "What questions do you have?"
  - Focus on trade-offs, not preferences
  - Time-box to 45 minutes (30 discussion + 15 decisions)
  - Assign someone to take notes on decisions made

AFTER THE MEETING:
  - Update the RFC with decisions and rationale
  - Write the ADR(s) capturing final decisions
  - Share the outcome with stakeholders

ANTI-PATTERNS TO AVOID:
  ✗ Bikeshedding (arguing about trivial details)
  ✗ "Let me present for 40 minutes" (it's a discussion, not a lecture)
  ✗ Design-by-committee (someone must own the decision)
  ✗ Blocking on perfection (good enough now > perfect later)
Part 6: Documenting “Why” Over “What”
Code Tells You What. Comments Tell You Why. ADRs Tell You Why at Scale.
This principle applies at every level:
import time

# BAD: Comment says what the code does (I can read the code)
# Retry 3 times with exponential backoff
for i in range(3):
    try:
        response = call_api()
        break
    except TimeoutError:
        time.sleep(2 ** i)

# GOOD: Comment says WHY (I can't read your mind)
# The payment gateway rate-limits aggressive retries.
# After incident INC-2024-089, we found that exponential backoff
# with max 3 retries keeps us under their 10-req/sec threshold.
for i in range(3):
    try:
        response = call_api()
        break
    except TimeoutError:
        time.sleep(2 ** i)

The same principle applies to ADRs. Don’t just document what you chose. Document why you chose it, what you rejected, and what trade-offs you accepted.
The “Future Engineer” Test
When writing any technical document, imagine someone joining your team in 18 months. They’re smart, but they don’t know your history. Ask:
- Will they understand what we decided? (The easy part)
- Will they understand why we decided it? (The hard part)
- Will they know what alternatives were considered? (Prevents re-litigation)
- Will they understand what constraints existed at the time? (Prevents unfair criticism)
If the answer to any of these is “no,” your documentation is incomplete.
Part 7: Real-World ADR Examples
Example 1: Database Selection
# ADR-012: Use PostgreSQL with Read Replicas Instead of DynamoDB

## Status
Accepted

## Date
2024-09-15

## Context
Our user profile service handles 2,000 reads/sec and 200 writes/sec.
Growth projections suggest 10,000 reads/sec within 12 months. The team
(5 backend engineers) has deep PostgreSQL expertise but no DynamoDB
experience. Our data model includes complex relationships (users →
organizations → roles → permissions) that benefit from JOIN operations.

We evaluated options during Sprint 42 after the CEO asked about
scaling concerns raised by a prospective enterprise customer.

## Options Considered

### Option 1: DynamoDB
- Fully managed, scales automatically
- Pay-per-request pricing at our scale: ~$400/month
- Requires denormalizing our relational data model
- Team would need 2-3 months to become proficient
- No JOIN support; complex queries require application-level logic

### Option 2: PostgreSQL with Read Replicas
- Team already proficient; no ramp-up time
- RDS managed instances with read replicas: ~$600/month
- Handles our relational data model naturally
- Read replicas handle read scaling to 50,000 reads/sec
- Requires manual scaling decisions (add/remove replicas)

### Option 3: CockroachDB
- Distributed SQL; scales horizontally
- Compatible with PostgreSQL wire protocol
- Starting at ~$1,200/month for managed service
- Team would need 1 month to learn operational differences
- Overkill for our current and projected scale

## Decision
We will use PostgreSQL on AWS RDS with read replicas.

Our data model is inherently relational, and the team's PostgreSQL
expertise means we can ship the scaling improvements in 2 weeks
instead of the 2-3 months DynamoDB would require. Read replicas
scale reads to about 50,000 reads/sec, 5x our 12-month projection
of 10,000 reads/sec.

## Consequences
+ Team ships immediately with no learning curve
+ Relational queries remain simple and maintainable
+ Cost-effective at current and projected scale
- We accept manual scaling decisions (adding read replicas)
- If we exceed 50,000 reads/sec, we'll need to re-evaluate
- Single-region limitation with RDS (acceptable for now)
Example 2: API Paradigm
# ADR-018: Use REST for Internal APIs, GraphQL for Public API

## Status
Accepted

## Date
2024-11-20

## Context
We maintain 12 internal microservices and a public API consumed by
~200 third-party integrators. Internal services need simple,
predictable communication. External consumers have diverse data
needs---some need full user profiles, others need just email and name.

Our current REST API forces external consumers to make 3-4 calls
to assemble data that could be fetched in one GraphQL query.
Support tickets about API inefficiency have increased 40% QoQ.

## Decision
Internal services will continue using REST with OpenAPI specs.
The public-facing API will add a GraphQL layer backed by the
existing REST services.

## Consequences
+ External consumers get flexible data fetching (reduced API calls)
+ Internal services remain simple and well-understood
+ GraphQL layer acts as a BFF (Backend for Frontend), insulating
  internal changes from external consumers
- Team must learn and maintain GraphQL (training budget approved)
- Two API paradigms to document and support
- GraphQL introduces query complexity risks (mitigated by depth limiting)
Common Mistakes
| Mistake | Why It’s a Problem | Better Approach |
|---|---|---|
| Writing ADRs long after the decision | Context and reasoning are forgotten; the ADR becomes a hollow justification | Write the ADR during or immediately after the decision, while the discussion is still fresh. |
| Listing only the chosen option | Future engineers don’t know what was considered and re-evaluate from scratch | Always include 2-3 alternatives with honest pros/cons for each. |
| Omitting negative consequences | Erodes trust; makes ADRs feel like marketing documents | List trade-offs honestly. Every decision has downsides---acknowledge them. |
| Making ADRs too long | Nobody reads 10-page ADRs. They become write-only documents. | Keep ADRs to 1-2 pages. Move detailed analysis to appendices or linked RFCs. |
| No clear owner or approval process | ADRs rot in “Proposed” status forever because nobody is responsible for driving them to acceptance | Assign an owner to each ADR. Set a review deadline (1-2 weeks). |
| Deleting superseded ADRs | Destroys decision history. Future engineers lose context about why the previous approach was abandoned. | Mark as “Superseded by ADR-XXX” and keep the file. It’s an archive, not a wiki. |
| Using ADRs for non-architectural decisions | Dilutes the value. Teams get ADR fatigue and stop reading them. | Reserve ADRs for significant, hard-to-reverse decisions. Use style guides, runbooks, and READMEs for everything else. |
| Writing for yourself instead of your audience | You understand the jargon and context today. Your reader in 18 months will not. | Apply the “Future Engineer” test. Write for someone smart but new to the team. |
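The "rots in Proposed" mistake is one of the few in this table that a script can catch. The sketch below flags proposals that have passed a review deadline. `AdrEntry`, `stale_proposals`, and the sample entries for ADR-019 and ADR-020 are hypothetical, and the 14-day default simply takes the upper end of the 1-2 week deadline suggested above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AdrEntry:
    number: int
    title: str
    status: str   # e.g. "Proposed", "Accepted", "Superseded by ADR-042"
    opened: date  # date the ADR was first proposed


def stale_proposals(entries, today, max_age_days=14):
    """Return Proposed ADRs that have been open longer than max_age_days."""
    return [
        e for e in entries
        if e.status == "Proposed" and (today - e.opened).days > max_age_days
    ]


entries = [
    AdrEntry(19, "Adopt feature flags", "Proposed", date(2025, 1, 6)),
    AdrEntry(20, "Use gRPC internally", "Proposed", date(2025, 2, 3)),
    AdrEntry(18, "REST internal, GraphQL public", "Accepted", date(2024, 11, 20)),
]

for entry in stale_proposals(entries, today=date(2025, 2, 10)):
    print(f"ADR-{entry.number:03d} has been Proposed since {entry.opened}")
# ADR-019 has been Proposed since 2025-01-06
```

A report like this is a nudge for the ADR's owner, not a substitute for having one.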
Test your understanding of ADRs and technical writing.
Question 1: What are the four core sections of a standard ADR?
Show Answer
Context, Options Considered, Decision, and Consequences.
The Context explains why the decision is needed. Options Considered shows the alternatives evaluated. Decision states what was chosen and why. Consequences lists both positive and negative outcomes.
Some templates also include Status and Date, which are important metadata but not part of the core reasoning structure.
Question 2: Your team decided to use Redis for caching 2 years ago. Now you’re migrating to Memcached. What should you do with the original Redis ADR?
Show Answer
Do not delete it. Mark it as “Superseded by ADR-XXX” (the Memcached ADR) and add a brief note explaining the transition. The original ADR is valuable history---it explains why Redis was chosen at the time, which helps future engineers understand the evolution of the system.
The new Memcached ADR should reference the original and explain what changed (requirements, scale, team expertise, cost) to motivate the migration.
Question 3: You need to explain a database migration to your CEO. Which of these is better?
A) “We’re migrating from MySQL 5.7 to PostgreSQL 16 because MySQL’s query optimizer doesn’t support hash joins and our analytical workloads require better parallel query execution.”
B) “We’re switching databases to handle 10x more reporting queries without slowing down the product. This supports the enterprise sales push by enabling the real-time dashboards large customers are requesting.”
Show Answer
B is better. The CEO cares about business impact (enterprise sales, customer features), not query optimizer internals. Option A provides technical detail that’s irrelevant to the CEO’s decision-making context.
Option A would be appropriate for the engineering team’s ADR. The skill is matching the depth and framing to your audience.
Question 4: What is the key difference between an ADR and an RFC?
Show Answer
An RFC (Request for Comments) is written before a decision to propose a change and solicit feedback. It’s a discussion document.
An ADR (Architecture Decision Record) is written when a decision is made to record the outcome. It’s a historical record.
Think of it this way: the RFC is the debate, the ADR is the verdict. An RFC often results in one or more ADRs.
Question 5: A junior engineer asks, “Why do we keep ADRs in the git repo instead of Confluence?” Give two strong reasons.
Show Answer
- Version control: ADRs in the repo are versioned alongside the code they describe. You can see exactly what decisions were in effect when a particular version of the code was written. Confluence doesn’t provide this temporal alignment.
- Durability: Companies switch wiki tools every few years (Confluence to Notion to Slite to whatever’s next). Files in a git repo survive tool migrations. The ADR from 2020 is still readable in 2030.
Bonus reasons: ADRs in the repo are discoverable via grep, can be reviewed in pull requests, and can be linked from code comments.
Question 6: You’re writing an ADR and you can only think of one option. What should you do?
Show Answer
If you can only think of one option, you haven’t done enough research---or the decision isn’t significant enough to warrant an ADR.
Every architectural decision has alternatives. At minimum, consider:
- Do nothing (keep the status quo)
- Build vs buy (if applicable)
- A competing technology in the same category
If after research you genuinely believe there’s only one viable option, document the alternatives you considered and explain why they were eliminated. The “Options Considered” section should show your reasoning process, even if the conclusion is obvious.
Question 7: What is the “So What?” test and when should you apply it?
Show Answer
The “So What?” test is a technique for ensuring technical communication is relevant to its audience. After writing a statement, ask “So what?”---if the audience wouldn’t care about the answer, you’re writing at the wrong level of abstraction.
Apply it whenever communicating with non-technical stakeholders:
- “We’re adding a CDN.” So what? “Pages will load 3x faster for international customers.”
- “We’re containerizing the app.” So what? “Deployments will go from 2 hours to 5 minutes, and we’ll ship features faster.”
The test forces you to connect technical decisions to outcomes that matter to your audience.
Hands-On Exercise: Draft an ADR
Scenario
Your team runs an event-driven e-commerce platform on Kubernetes. The current RabbitMQ cluster is struggling under load during peak sales events (Black Friday, seasonal promotions). Messages are being dropped, consumers can’t keep up, and the operations team spends significant time managing the RabbitMQ cluster.
You need to decide between:
- Option A: AWS MSK (Managed Kafka) --- AWS-managed Apache Kafka service
- Option B: Self-hosted Kafka on Kubernetes --- Running Kafka using Strimzi operator on your existing K8s clusters
Constraints:
- Team of 6 engineers; 2 have Kafka experience, all have Kubernetes experience
- Current message volume: 5,000 events/sec, projected 25,000 events/sec in 12 months
- Infrastructure budget: $5,000/month for messaging
- Company uses AWS for all infrastructure
- Must support at least 7-day message retention for replay capability
- Uptime requirement: 99.95%
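Two of these constraints are easier to reason about once converted into concrete numbers. The sketch below turns an availability percentage into a monthly downtime budget and the retention requirement into an event count. The 30-day month and the 1 KB average message size are assumptions for illustration only; measure your own traffic before sizing anything.

```python
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 (assumes a 30-day month)


def downtime_budget_minutes(availability_pct: float) -> float:
    """Allowed downtime per 30-day month for a given availability target."""
    return MINUTES_PER_30_DAY_MONTH * (100 - availability_pct) / 100


def retained_events(events_per_sec: int, retention_days: int) -> int:
    """Number of events held at steady state for a retention window."""
    return events_per_sec * 86_400 * retention_days


print(round(downtime_budget_minutes(99.95), 1))  # 21.6
print(round(downtime_budget_minutes(99.9), 1))   # 43.2

events = retained_events(25_000, 7)
print(events)  # 15120000000

# ASSUMPTION: 1 KB average message size; replace with a measured value.
avg_message_bytes = 1_000
print(events * avg_message_bytes / 1e12)  # 15.12 (TB, before replication)
```

The 99.95% requirement allows roughly 21.6 minutes of downtime a month, about half of what a 99.9% target allows. That gap is worth addressing explicitly in your Consequences section.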
Your Task
Write a complete ADR using the standard format. Your ADR must include:
- Context: Describe the current problem, constraints, and what success looks like
- Options Considered: Evaluate both options with honest pros and cons
- Decision: Choose one option and explain why
- Consequences: List positive and negative outcomes
Evaluation Hints
Consider these trade-off dimensions:
TRADE-OFF ANALYSIS FRAMEWORK
======================================================================
Dimension           AWS MSK                  Self-Hosted (Strimzi)
─────────────────────────────────────────────────────────────────
Operational Cost    Higher $/hour but        Lower $/hour but team
                    zero ops overhead        must manage upgrades,
                                             patches, scaling

Control             AWS manages broker       Full control over
                    config, limited          configuration, tuning,
                    customization            and version

Scaling             AWS handles broker       Team must plan and
                    scaling, you manage      execute scaling
                    partition scaling        operations manually

Skills Required     Kafka application        Kafka application +
                    knowledge only           Kafka operations +
                                             Strimzi operator

Vendor Lock-in      Tied to AWS MSK          Portable across any
                    (MSK-specific APIs)      Kubernetes cluster

Reliability         AWS SLA (99.9%),         Depends on your team's
                    managed failover         operational maturity

Cost at Scale       ~$3,500-4,500/mo         ~$1,200-2,000/mo
                    for projected load       for projected load
Success Criteria
- ADR follows the standard format (Status, Date, Context, Options, Decision, Consequences)
- Context section explains the current problem with specific numbers
- At least 2 options are evaluated with honest pros and cons
- Decision is clear, unambiguous, and justified
- Consequences include both positive and negative outcomes
- A non-technical reader could understand why the decision was made (even if not the technical details)
- The ADR is 1-2 pages (not a 10-page novel)
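The first success criterion is easy to check mechanically. The sketch below is one possible way to do it, assuming your ADR is written in Markdown with one `#`-style heading per section; the helper name `missing_sections` and the sample ADR title are hypothetical, not part of any standard tool.

```python
import re

# Section names taken from the first success criterion above.
REQUIRED_SECTIONS = ("Status", "Date", "Context", "Options", "Decision", "Consequences")


def missing_sections(adr_text: str) -> list[str]:
    """Return the required sections that have no matching Markdown heading.

    A heading matches when its text starts with the section name, so
    "## Options Considered" satisfies "Options". Matching is case-insensitive.
    """
    headings = set()
    for line in adr_text.splitlines():
        match = re.match(r"^#{1,6}\s+(.*\S)", line)
        if match:
            headings.add(match.group(1).lower())
    return [
        section
        for section in REQUIRED_SECTIONS
        if not any(h.startswith(section.lower()) for h in headings)
    ]


# Hypothetical draft that is still missing two sections.
draft = """\
# ADR-007: Replace RabbitMQ with managed Kafka
## Status
Proposed
## Context
Messages are dropped during peak sales events.
## Options Considered
AWS MSK vs. self-hosted Kafka with Strimzi.
## Decision
Adopt AWS MSK.
"""

print(missing_sections(draft))  # ['Date', 'Consequences']
```

A check like this could run in a pull request pipeline so that incomplete ADRs are flagged before review. It only verifies structure; whether the decision is clear and the pros and cons are honest still needs a human reader.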
Stretch Goals
- Write a 3-sentence summary of the decision for your VP of Engineering (focus on cost and risk)
- Write a 1-sentence summary for the CTO (focus on business impact)
- Identify which consequence is most likely to cause problems in 12 months
Did You Know?
- Michael Nygard proposed the ADR format in a blog post in November 2011. The post was barely 500 words long, yet it launched a practice now used by thousands of engineering teams worldwide. Sometimes the most impactful technical writing is the shortest.
- Amazon’s famous “6-pager” memos are essentially elaborate RFCs. Jeff Bezos banned PowerPoint in 2004, requiring teams to write structured prose documents instead. His reasoning was that the narrative structure of a good memo forces better thought than the bullet points of a slide deck. Meetings begin with 20 minutes of silent reading before any discussion.
- Google’s design documents are typically 5-20 pages and require approval from at least one “readability reviewer” who ensures the document is clear to someone outside the immediate team. Google engineers write an estimated 1,000+ design docs per week across the company.
- The Linux kernel has some of the most thorough decision documentation in open source---not as formal ADRs, but in mailing list discussions that are archived forever. Linus Torvalds’ emails explaining why a patch was rejected are legendary examples of technical communication. His 2006 email explaining Git’s design decisions is still cited today.
Further Reading
- “Documenting Architecture Decisions” by Michael Nygard --- The original blog post that started it all. Read this first; it takes 5 minutes.
- “Design Docs at Google” by Malte Ubl --- An article describing how Google engineers structure design docs and run the design review process.
- “Architecture Decision Records” on GitHub (joelparkerhenderson/architecture-decision-record) --- A comprehensive collection of ADR templates, examples, and tools.
- “The Staff Engineer’s Path” by Tanya Reilly --- Chapter on technical decision-making and communication. Excellent coverage of writing for different audiences.
- “Writing for Engineers” by Karan Goel --- Practical guide to technical writing that engineers will actually read.
Next Module
Module 1.5: Stakeholder Communication & Managing Expectations --- Translating tech debt into business risk, saying “No” effectively, and communicating during crises to non-technical stakeholders.
“The best time to write an ADR is when you make the decision. The second best time is now.” --- Engineering proverb
“Architecture is the decisions you wish you could get right early in a project, but that you are not necessarily more likely to get right than any others.” --- Ralph Johnson