Module 3.3: Environment Promotion
Discipline Module | Complexity: [MEDIUM] | Time: 30-35 min
Prerequisites
Before starting this module:
- Required: Module 3.2: Repository Strategies — Repository structure
- Required: Module 3.1: What is GitOps? — GitOps fundamentals
- Recommended: Experience with multi-environment deployments
What You’ll Be Able to Do
After completing this module, you will be able to:
- Design environment promotion pipelines that move changes safely from dev through staging to production
- Implement automated promotion gates with testing, approval, and rollback capabilities
- Build promotion strategies that handle dependencies between microservices during coordinated releases
- Evaluate promotion patterns — image tag updates, Kustomize overlays, Helm value overrides — for your context
Why This Module Matters
Your new feature works in dev. Now what?
The journey from dev to production is where most deployment problems occur:
- “It worked in staging!” (but prod is different)
- “Who promoted this?” (no audit trail)
- “Can we roll back?” (unclear what to roll back to)
Good promotion strategy means:
- Changes are tested before reaching users
- Promotions are auditable and reversible
- The path to production is clear and consistent
This module teaches you how to promote changes safely through environments using GitOps.
The Promotion Problem
In traditional deployments, promotion often means:
- Run CI/CD pipeline for staging
- Wait for manual approval
- Run CI/CD pipeline for prod
- Hope nothing is different
What can go wrong:
- Different pipeline runs = potentially different results
- Config drift between environments
- Manual steps introduce errors
- No clear record of what was promoted
GitOps changes this by making promotion a Git operation.
Directory-Based Promotion
The simplest and most common GitOps promotion pattern.
The Structure
```
my-service/
├── base/
│   ├── deployment.yaml
│   ├── service.yaml
│   └── kustomization.yaml
└── overlays/
    ├── dev/
    │   └── kustomization.yaml    # image: v1.2.3
    ├── staging/
    │   └── kustomization.yaml    # image: v1.2.2
    └── prod/
        └── kustomization.yaml    # image: v1.2.1
```
Promotion = Update Image Tag
```yaml
# overlays/dev/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
images:
  - name: my-service
    newTag: v1.2.3  # Latest in dev

# overlays/staging/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
images:
  - name: my-service
    newTag: v1.2.2  # Promoted from dev

# overlays/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
images:
  - name: my-service
    newTag: v1.2.1  # Promoted from staging
```
The Promotion Flow
```
┌─────────────────────────────────────────────────────────────┐
│                       Git Repository                        │
└─────────────────────────────────────────────────────────────┘
        │                    │                    │
        ▼                    ▼                    ▼
  overlays/dev/       overlays/staging/     overlays/prod/
  image: v1.2.3       image: v1.2.2         image: v1.2.1
        │                    │                    │
        │                    │                    │
     Promote:             Promote:            Deployed:
   PR to update         PR to update        GitOps syncs
   staging/tag            prod/tag
```
Manual Promotion
```bash
# Promote v1.2.3 from dev to staging
cd config-repo

# Update staging overlay
yq eval '.images[0].newTag = "v1.2.3"' -i overlays/staging/kustomization.yaml

# Commit and push
git add overlays/staging/
git commit -m "Promote my-service v1.2.3 to staging"
git push origin main

# GitOps agent syncs staging cluster
```
Automated Promotion
```yaml
# GitHub Action for auto-promotion
name: Promote to Staging

on:
  workflow_dispatch:
    inputs:
      version:
        description: 'Version to promote'
        required: true

jobs:
  promote:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Update staging image tag
        # Pass the input through an environment variable rather than
        # interpolating it into the script, so it cannot inject shell code
        env:
          VERSION: ${{ inputs.version }}
        run: |
          yq eval '.images[0].newTag = strenv(VERSION)' \
            -i overlays/staging/kustomization.yaml

      - name: Create PR
        uses: peter-evans/create-pull-request@v5
        with:
          title: "Promote my-service ${{ inputs.version }} to staging"
          branch: promote-staging-${{ inputs.version }}
          body: |
            Promoting my-service to staging.

            **Version**: ${{ inputs.version }}
            **Source**: dev environment

            Please review and merge to complete promotion.
```
Try This: Trace a Promotion
Think about a recent deployment to production:

1. What version/commit was promoted? _________________
2. When did it reach each environment?
   - Dev: _________________
   - Staging: _________________
   - Prod: _________________
3. Who approved each promotion? _________________
4. What testing happened between environments? _________________
5. How would you roll back? _________________

If you can’t answer these easily, your promotion process needs work.
Image Tag Promotion
The most common artifact to promote is the container image tag.
Why Image Tags?
An image tag represents:
- A specific code version
- A built artifact
- An immutable reference (if using proper tagging)

Promoting an image tag means deploying the exact same artifact.
Tag Strategies
Semantic Versioning:
```
my-service:v1.2.3
my-service:v1.2.4
my-service:v2.0.0
```
- Clear version progression
- Works well with release processes
- Requires version management

Git SHA:
```
my-service:abc123f
my-service:def4567
```
- Directly traceable to code
- No version management needed
- Less human-readable

Git SHA + Build Number:
```
my-service:abc123f-42
my-service:def4567-43
```
- Traceable + ordering
- Useful for debugging
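The three schemes above each have a recognizable shape, so a promotion script can refuse anything that does not match one of them. The following is a minimal sketch, not part of any tool: `classify_tag` is a hypothetical helper, and the patterns assume a `v`-prefixed three-part version and a 7 to 40 character lowercase hexadecimal SHA.

```bash
# Hypothetical helper: classify an image tag by the schemes described above.
# Prints "semver", "sha", or "sha-build"; returns 1 for anything else
# (for example "latest" or "staging").
classify_tag() {
  tag="$1"
  if printf '%s\n' "$tag" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'; then
    echo "semver"
  elif printf '%s\n' "$tag" | grep -Eq '^[0-9a-f]{7,40}$'; then
    echo "sha"
  elif printf '%s\n' "$tag" | grep -Eq '^[0-9a-f]{7,40}-[0-9]+$'; then
    echo "sha-build"
  else
    return 1
  fi
}
```

A script could call `classify_tag "$VERSION" >/dev/null || exit 1` before touching any overlay, so a mutable tag never enters the config repo.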
Avoid:
```
my-service:latest    # Mutable, ambiguous
my-service:staging   # Which version?
```
Promotion Script Example
```bash
#!/bin/bash
# promote.sh - Promote image tag through environments

set -e

VERSION=$1
FROM_ENV=$2
TO_ENV=$3

if [ -z "$VERSION" ] || [ -z "$FROM_ENV" ] || [ -z "$TO_ENV" ]; then
  echo "Usage: ./promote.sh <version> <from-env> <to-env>"
  echo "Example: ./promote.sh v1.2.3 dev staging"
  exit 1
fi

echo "Promoting $VERSION from $FROM_ENV to $TO_ENV"

# Verify version exists in source environment
CURRENT=$(yq eval '.images[0].newTag' "overlays/$FROM_ENV/kustomization.yaml")
if [ "$CURRENT" != "$VERSION" ]; then
  echo "Warning: $FROM_ENV has $CURRENT, not $VERSION"
  read -p "Continue anyway? (y/n) " -n 1 -r
  echo
  if [[ ! $REPLY =~ ^[Yy]$ ]]; then
    exit 1
  fi
fi

# Record the version being replaced before the file changes
PREVIOUS=$(yq eval '.images[0].newTag' "overlays/$TO_ENV/kustomization.yaml")

# Update target environment
yq eval ".images[0].newTag = \"$VERSION\"" -i "overlays/$TO_ENV/kustomization.yaml"

# Show diff
git diff "overlays/$TO_ENV/"

# Commit
git add "overlays/$TO_ENV/"
git commit -m "Promote my-service $VERSION to $TO_ENV

Promoted from: $FROM_ENV
Previous version: $PREVIOUS"

echo "Committed. Push to deploy."
```
Progressive Delivery
Beyond simple environment promotion, progressive delivery gradually shifts traffic.
Canary Deployments
```
 95% traffic  │████████████████████████│  v1.2.2 (stable)
              │                        │
  5% traffic  │█                       │  v1.2.3 (canary)
              │                        │
              └────────────────────────┘

Monitor canary for errors...

If OK:  Increase to 25%, 50%, 100%
If bad: Roll back canary
```
GitOps Canary with Flagger:
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-service
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  progressDeadlineSeconds: 60
  service:
    port: 80
  analysis:
    interval: 30s
    threshold: 5
    maxWeight: 50
    stepWeight: 10
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500
        interval: 30s
```
Blue-Green Deployments
```
Blue (current)  │████████████████████████│  v1.2.2 (100%)
                │                        │
Green (new)     │████████████████████████│  v1.2.3 (0%)
                │                        │
                └────────────────────────┘

Test Green environment...

If OK:  Switch traffic: Blue=0%, Green=100%
If bad: Keep traffic on Blue
```
GitOps Blue-Green with Argo Rollouts:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-service
spec:
  replicas: 5
  # A Rollout needs a selector that matches the pod template labels;
  # the label value here is illustrative
  selector:
    matchLabels:
      app: my-service
  strategy:
    blueGreen:
      activeService: my-service
      previewService: my-service-preview
      autoPromotionEnabled: false  # Manual promotion
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: my-service
          image: my-service:v1.2.3
```
When to Use Progressive Delivery
| Scenario | Strategy |
|---|---|
| Low risk, fast feedback | Direct promotion |
| Medium risk, need validation | Canary (gradual) |
| High risk, need full testing | Blue-green (parallel) |
| Critical services | Canary with automated rollback |
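When weighing the canary rows above, it helps to know how long a canary run takes. The sketch below applies the two timing formulas given in the Flagger documentation, `interval * (maxWeight / stepWeight)` for the minimum time to promote and `interval * threshold` for the time to roll back, to the Canary spec shown earlier (interval 30s, stepWeight 10, maxWeight 50, threshold 5). The function names are hypothetical, and the arithmetic assumes `maxWeight` is a multiple of `stepWeight`.

```bash
# Minimum seconds to validate and promote: interval * (maxWeight / stepWeight)
canary_min_promotion_seconds() {  # args: interval_s max_weight step_weight
  echo $(( $1 * ($2 / $3) ))
}

# Seconds until a failing canary is rolled back: interval * threshold
canary_rollback_seconds() {       # args: interval_s threshold
  echo $(( $1 * $2 ))
}

# Traffic weights the canary steps through on its way to maxWeight
canary_weights() {                # args: max_weight step_weight
  w=$2
  out=""
  while [ "$w" -le "$1" ]; do
    out="$out$w "
    w=$(( w + $2 ))
  done
  echo "${out% }"
}

canary_weights 50 10                   # prints: 10 20 30 40 50
canary_min_promotion_seconds 30 50 10  # prints: 150
canary_rollback_seconds 30 5           # prints: 150
```

With these settings a healthy release spends at least two and a half minutes as a canary, which is worth comparing against how quickly your metrics can actually surface a regression.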
Did You Know?
- Facebook deploys code to 2% of users first, then gradually rolls out. They call this “gatekeeper” and have done it for over a decade.
- “GitOps promotion” is fundamentally different from “CI/CD promotion”. In CI/CD, the pipeline pushes. In GitOps, the change is a Git commit and the cluster pulls.
- Some teams use “promotion bots” that automatically promote after tests pass and a timer expires. No human approval needed for staging, human approval only for prod.
- LinkedIn’s deployment system promotes changes through 5 stages before reaching all users: canary → early adopters → first tier → second tier → full rollout. Each stage has automated health checks that can halt the promotion.
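The "promotion bot" idea above reduces to two conditions: tests passed, and the version has run in the source environment for a minimum soak time. A minimal sketch of that gate follows; `ready_to_promote`, its argument order, and the `passed` status string are all hypothetical, not taken from any real bot.

```bash
# Hypothetical gate for a promotion bot.
# Succeeds only when tests passed AND the soak time has elapsed.
ready_to_promote() {  # args: test_status deployed_epoch now_epoch soak_seconds
  [ "$1" = "passed" ] || return 1
  [ $(( $3 - $2 )) -ge "$4" ]
}
```

A bot might run `ready_to_promote "$status" "$deployed_at" "$(date +%s)" 3600` on a schedule and open the staging promotion PR only when it succeeds, leaving the prod PR to a human.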
War Story: The Promotion That Skipped Staging
A team I worked with had an “emergency fix”:
The Situation:
- Bug in production causing customer issues
- Developer had a fix ready
- “We don’t have time for staging!”
What They Did:
```bash
# Direct to prod (bad idea!)
git checkout main
yq eval '.images[0].newTag = "hotfix-123"' -i overlays/prod/kustomization.yaml
git commit -am "Emergency fix for customer issue"
git push
```
What Happened:
The fix worked for the original bug. But the hotfix image:
- Had a different config dependency
- Config existed in dev (where it was developed)
- Config didn’t exist in prod
- New bug: service crashed on startup
Impact:
- 15 minutes of downtime
- Customer issue now worse
- Emergency rollback
- 3 AM debugging session
The Root Cause:
The fix wasn’t tested in an environment that matched prod. Skipping staging meant skipping validation.
The Better Approach:
Even for emergencies:
1. Deploy to staging first (10 min)
2. Quick smoke test (5 min)
3. Deploy to prod

Total: 15 min delayed, hours of debugging saved.
Policy Change:
They implemented:
- No direct-to-prod promotions, ever
- “Emergency” staging = fast track, not skip
- Automated smoke tests in staging
- Alert if prod has version not in staging
Lesson: Emergencies don’t justify skipping environments. They justify faster promotion through environments.
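The "alert if prod has version not in staging" policy can be checked from Git history alone. The sketch below is one possible approach, assuming each overlay pins a single image through `newTag` as in this module's examples: it scans patch output (such as `git log -p` for the staging overlay) for lines that added a `newTag`, and succeeds if the given tag was ever among them. `tag_seen_in_history` is a hypothetical helper name.

```bash
# Hypothetical helper: reads patch text on stdin and succeeds if TAG was
# ever added as a newTag value. Removed lines ("-" prefix) are ignored.
tag_seen_in_history() {  # args: tag
  sed -n 's/^+[[:space:]]*newTag:[[:space:]]*"\{0,1\}\([^"[:space:]]*\).*/\1/p' \
    | grep -Fxq -- "$1"
}

# Usage sketch (paths follow the layout used in this module):
#   prod_tag=$(yq '.images[0].newTag' overlays/prod/kustomization.yaml)
#   git log -p --format= -- overlays/staging/kustomization.yaml \
#     | tag_seen_in_history "$prod_tag" \
#     || echo "ALERT: prod runs $prod_tag, which never appeared in staging"
```

Checking history rather than the current staging tag avoids false alarms when staging has already moved on to a newer version.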
Approval Workflows
Promotion often requires human approval, especially for production.
Pull Request Approvals
```
# GitHub branch protection for config repo
# main branch rules:
- Require pull request before merging
- Require 1 approval for overlays/staging/
- Require 2 approvals for overlays/prod/
- Require status checks to pass
```
GitOps Tool Integration
ArgoCD Sync Windows:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: production
spec:
  syncWindows:
    - kind: allow
      schedule: '0 10 * * 1-5'  # Weekdays: opens at 10am, stays open until 4pm
      duration: 6h
      applications:
        - '*'
    - kind: deny
      schedule: '0 0 * * 0'     # No Sunday deploys
      duration: 24h
      applications:
        - '*'
```
Flux Notifications:
```yaml
apiVersion: notification.toolkit.fluxcd.io/v1beta2
kind: Alert
metadata:
  name: production-promotions
spec:
  providerRef:
    name: slack
  eventSeverity: info
  eventSources:
    - kind: Kustomization
      name: '*'
      namespace: flux-system
  inclusionList:
    - ".*overlays/prod.*"
```
Approval Patterns
| Pattern | How It Works | Best For |
|---|---|---|
| PR approval | Human approves PR | Most teams |
| Scheduled windows | Auto-approve during windows | Controlled deployments |
| Test gates | Auto-approve if tests pass | High automation |
| Change advisory | External approval system | Enterprise/compliance |
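The "scheduled windows" pattern can also be enforced outside the GitOps tool, for example as a status check on promotion PRs. The sketch below mirrors the weekday 10am to 4pm window used in the sync window example; `in_deploy_window` is a hypothetical helper, and it takes the day and hour as arguments so the caller decides which time zone applies.

```bash
# Hypothetical gate: succeeds only on weekdays between 10:00 and 15:59.
in_deploy_window() {  # args: day_of_week (1=Mon..7=Sun), hour (00-23)
  hour=${2#0}         # drop a leading zero such as "09"
  hour=${hour:-0}     # "0" becomes empty after stripping, so restore it
  [ "$1" -ge 1 ] && [ "$1" -le 5 ] && [ "$hour" -ge 10 ] && [ "$hour" -lt 16 ]
}

# Usage sketch:
#   in_deploy_window "$(date +%u)" "$(date +%H)" || { echo "Outside window"; exit 1; }
```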
Common Mistakes
| Mistake | Problem | Solution |
|---|---|---|
| Skipping environments | Bugs reach prod untested | Always promote through all envs |
| Manual copy-paste | Errors, inconsistency | Use scripts/automation |
| No approval for prod | Risky changes slip through | Require PR approval |
| Same image tag, different configs | Works in staging, fails in prod | Include config in testing |
| No rollback plan | Stuck with broken deploy | Always know how to roll back |
| Promote on Friday | Weekend incidents | Freeze before weekends |
Quiz: Check Your Understanding
Question 1
Why is promoting a specific image tag better than promoting “latest”?
Answer:
Specific image tags provide:
- Immutability: v1.2.3 is always the same image
- Traceability: Know exactly what’s deployed
- Reproducibility: Can deploy same version anywhere
- Rollback clarity: Roll back to v1.2.2, not “previous latest”
“latest” problems:
- Mutable: Changes over time
- Ambiguous: “latest” when? Built by whom?
- Not reproducible: Can’t redeploy same “latest” later
- Rollback unclear: What was previous “latest”?
Rule: Never use latest in production. Use immutable tags (semver or git SHA).
Question 2
Your prod deployment failed. How do you roll back with GitOps?
Answer:
Option 1: Git revert (recommended)
```bash
# Find the promotion commit
git log --oneline -- overlays/prod/

# Revert it
git revert <commit-sha>
git push origin main

# GitOps agent syncs reverted state
```
Option 2: Update to previous version
```bash
# Set image back to previous version
yq eval '.images[0].newTag = "v1.2.2"' -i overlays/prod/kustomization.yaml
git commit -am "Rollback: v1.2.3 -> v1.2.2 due to <reason>"
git push origin main
```
Option 3: GitOps tool rollback
```bash
# ArgoCD
argocd app rollback my-app

# Flux
flux suspend kustomization prod
# Fix in Git, then:
flux resume kustomization prod
```
Best practice: Use git revert to maintain history. The rollback itself is audited.
Question 3
How do you ensure staging is always tested before prod?
Answer:
Technical controls:
- PR rules: Prod changes require a staging version match (a CI check verifies that staging has the version being promoted to prod)
- Required checks: Tests must pass in staging before prod merge
- Sync windows: Prod only deploys after staging stabilization period
- Alerts: Notify if prod has version not in staging
Process controls:
- Policy: All promotions follow dev → staging → prod
- Review: Prod PRs require evidence of staging success
- Automation: Promotion scripts enforce the path
Example CI check:
```bash
STAGING_VERSION=$(yq '.images[0].newTag' overlays/staging/kustomization.yaml)
PROD_VERSION=$(yq '.images[0].newTag' overlays/prod/kustomization.yaml)
NEW_PROD_VERSION=$(git diff HEAD^ -- overlays/prod/ | grep newTag | ...)

if [ "$NEW_PROD_VERSION" != "$STAGING_VERSION" ]; then
  echo "Error: Promoting $NEW_PROD_VERSION to prod but staging has $STAGING_VERSION"
  exit 1
fi
```
Question 4
When would you use canary vs blue-green deployment?
Answer:
Canary (gradual rollout):
- Slowly increase traffic to new version
- Good for: catching issues with real traffic patterns
- Rollback: shift traffic back to stable
- Use when: you want production traffic validation
Blue-green (parallel environments):
- Run both versions simultaneously
- Test new version completely before switching
- Rollback: instant switch back to blue
- Use when: you need full testing before any production traffic
Decision factors:
| Factor | Canary | Blue-Green |
|---|---|---|
| Resource usage | Lower (gradual) | Higher (2x during deploy) |
| Rollback speed | Gradual | Instant |
| Production testing | Yes (partial traffic) | No (separate preview) |
| User impact on failure | Partial users affected | No users affected |
| Complexity | Medium | Medium |
Common approach: Blue-green for staging (safe testing), canary for prod (gradual exposure).
Hands-On Exercise: Design Promotion Pipeline
Design a complete promotion pipeline for a service.
Scenario
- Service: order-service
- Environments: dev, staging, prod
- Requirements:
  - Auto-deploy to dev on merge to main
  - Manual promotion to staging
  - Approval required for prod
  - Ability to roll back any environment
Part 1: Repository Structure
```markdown
## Repository Structure

config-repo/
├── order-service/
│   ├── base/
│   │   ├── deployment.yaml
│   │   ├── service.yaml
│   │   └── kustomization.yaml
│   └── overlays/
│       ├── dev/
│       │   └── kustomization.yaml
│       ├── staging/
│       │   └── kustomization.yaml
│       └── prod/
│           └── kustomization.yaml
```
Part 2: Promotion Commands
Define the exact commands/steps:
```markdown
## Promotion Steps

### Dev (automatic)
Trigger: Merge to main in app repo
Steps:
1. CI builds image: order-service:<git-sha>
2. CI updates: _______________
3. GitOps agent syncs

### Staging (manual trigger)
Trigger: Developer initiates promotion
Steps:
1. _______________
2. _______________
3. _______________

### Prod (with approval)
Trigger: Release manager initiates
Steps:
1. _______________
2. _______________
3. _______________
```
Part 3: Rollback Procedure
````markdown
## Rollback Procedure

### If bad deployment detected:

**Step 1**: Identify the bad version
```bash
# Command to see current deployed version:
_______________
```

**Step 2**: Find previous good version
```bash
# Command to see version history:
_______________
```

**Step 3**: Roll back
```bash
# Commands to roll back:
_______________
_______________
```

**Step 4**: Verify
```bash
# How to verify rollback successful:
_______________
```
````
Part 4: Approval Configuration
```markdown
## Approval Configuration

### Staging
- Required approvals: ___
- Who can approve: _______________

### Prod
- Required approvals: ___
- Who can approve: _______________
- Deployment windows: _______________
- Blackout periods: _______________
```
Success Criteria
- Defined promotion steps for all three environments
- Specified how auto-deploy to dev works
- Created rollback procedure with actual commands
- Documented approval requirements
- Included deployment windows/blackouts for prod
Key Takeaways
- Promotion = Git commit: Change the image tag in the environment directory
- Never skip environments: Even emergencies go through staging (just faster)
- Use immutable image tags: semver or git SHA, never latest
- Progressive delivery: Consider canary/blue-green for production
- Approval gates: PR approvals, sync windows, test gates
Further Reading
Books:
- “GitOps and Kubernetes” — Chapter on Environment Promotion
- “Continuous Delivery” — Jez Humble (foundational)
Articles:
- “Progressive Delivery” — RedMonk
- “Canary Deployments with Flagger” — Flux docs
Tools:
- Flagger: Progressive delivery for Kubernetes
- Argo Rollouts: Advanced deployment strategies
- Kustomize: Base/overlay pattern
Summary
Environment promotion in GitOps is a Git operation:
- Update the image tag in the environment’s directory
- Create a PR (for approval)
- Merge to deploy
This provides:
- Clear audit trail
- Easy rollback (git revert)
- Consistent process
- Approval integration
The key insight: promotion is declaring “this environment should have this version.” The GitOps agent makes it so.
Next Module
Continue to Module 3.4: Drift Detection and Remediation to learn how to detect and handle when cluster state doesn’t match Git.
“The best promotion is the one you can roll back in 30 seconds.” — GitOps Wisdom