Module 2.3: Flux
Toolkit Track | Complexity: [COMPLEX] | Time: 40-45 min
The platform engineer stared at the dashboard in disbelief. At 2:47 AM, her team's multi-cluster GitOps setup had saved them from what would have been a catastrophic misconfiguration. A developer had accidentally pushed a manifest with replicas: 0 for the payment service to the main branch. Within 90 seconds, Flux's source-controller picked up the commit, and because the team had configured proper health checks, the Kustomization failed to reconcile. The cluster stayed healthy while the bad commit was automatically flagged. “Before Flux,” she thought, “this would have taken down 12 clusters simultaneously.” The company later estimated the prevented outage would have cost $3.2 million in lost transactions and SLA penalties.
Prerequisites
Before starting this module:
- Module 2.1: ArgoCD — GitOps concepts (for comparison)
- GitOps Discipline — GitOps principles
- Kubernetes basics
- Git fundamentals
What You’ll Be Able to Do
After completing this module, you will be able to:
- Deploy Flux controllers and configure GitRepository, Kustomization, and HelmRelease resources
- Implement multi-cluster GitOps with Flux’s toolkit approach and dependency ordering
- Configure image automation to detect new container images and update Git repositories automatically
- Compare Flux’s controller-based architecture against ArgoCD for different organizational patterns
Why This Module Matters
Flux is the GitOps Toolkit: a set of specialized controllers that each do one thing well. While ArgoCD is an application, Flux is a framework. This gives you a lot of flexibility but requires understanding how the pieces fit together.
Flux was created by Weaveworks, the company that coined “GitOps.” It’s now a CNCF graduated project, running in production at companies like Deutsche Telekom, Volvo, and SAP.
Did You Know?
- Weaveworks coined the term “GitOps” in 2017, and Flux was the first tool to implement the concept
- Flux v2 was a complete rewrite—the original Flux was a monolith; Flux v2 is a toolkit of specialized controllers
- Flux can reconcile 1000+ resources per second—its controller architecture makes it extremely efficient
- Flux and Argo are both CNCF graduated projects: Flux graduated in November 2022, and Argo (which includes ArgoCD) followed in December 2022
Flux Architecture
```
FLUX ARCHITECTURE

GIT REPOSITORY
  clusters/
  ├── production/
  │   ├── flux-system/   (Flux components)
  │   └── apps/          (Applications)
  └── staging/
          │
          ▼
FLUX CONTROLLERS (GitOps Toolkit)
  Source Controller        Fetches Git, Helm repos, S3, OCI
  Kustomize Controller     Applies Kustomize overlays
  Helm Controller          Manages Helm releases
  Notification Controller  Slack, Teams alerts
  Image Reflector          Scans registries for tags
  Image Automation         Updates Git with new image tags
          │
          ▼
KUBERNETES CLUSTER
  Deploy | Service | Helm Release | ConfigMap
```
Core Controllers
| Controller | CRD | Purpose |
|---|---|---|
| source-controller | GitRepository, HelmRepository, OCIRepository, Bucket | Fetches artifacts from external sources |
| kustomize-controller | Kustomization | Applies Kustomize overlays |
| helm-controller | HelmRelease | Manages Helm chart installations |
| notification-controller | Alert, Provider | Sends notifications to Slack, Teams, etc. |
| image-reflector-controller | ImageRepository, ImagePolicy | Scans registries for new tags |
| image-automation-controller | ImageUpdateAutomation | Commits image tag updates to Git |
Installing Flux
Bootstrap with CLI
```bash
# Install Flux CLI
brew install fluxcd/tap/flux   # macOS
# or
curl -s https://fluxcd.io/install.sh | sudo bash

# Check prerequisites
flux check --pre

# Bootstrap with GitHub
flux bootstrap github \
  --owner=my-org \
  --repository=fleet-infra \
  --branch=main \
  --path=./clusters/my-cluster \
  --personal

# Bootstrap with GitLab
flux bootstrap gitlab \
  --owner=my-group \
  --repository=fleet-infra \
  --branch=main \
  --path=./clusters/my-cluster
```
Bootstrap Result
```
# Creates this structure in your repo:
fleet-infra/
└── clusters/
    └── my-cluster/
        └── flux-system/
            ├── gotk-components.yaml   # Flux controllers
            ├── gotk-sync.yaml         # Self-management
            └── kustomization.yaml
```
Manual Installation
```bash
# Install all components
kubectl apply -f https://github.com/fluxcd/flux2/releases/latest/download/install.yaml

# Or specific components
flux install \
  --components=source-controller,kustomize-controller,helm-controller \
  --export > flux-components.yaml
```
Source Management
GitRepository
```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: my-app
  namespace: flux-system
spec:
  interval: 1m          # How often to check for updates
  url: https://github.com/org/my-app.git
  ref:
    branch: main        # or tag: v1.0.0, semver: ">=1.0.0"

  # For private repos
  secretRef:
    name: git-credentials

  # Ignore certain paths
  ignore: |
    # Exclude all
    /*
    # Include only deploy folder
    !/deploy
---
# Secret for private repos
apiVersion: v1
kind: Secret
metadata:
  name: git-credentials
  namespace: flux-system
type: Opaque
data:
  username: <base64>
  password: <base64>
  # Or use SSH key with 'identity' field
```
HelmRepository
```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: bitnami
  namespace: flux-system
spec:
  interval: 1h
  url: https://charts.bitnami.com/bitnami
---
# OCI Registry (Helm 3.8+)
# Same apiVersion as above; older Flux releases serve this kind as v1beta2
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: podinfo
  namespace: flux-system
spec:
  interval: 1h
  url: oci://ghcr.io/stefanprodan/charts
  type: oci
```
OCIRepository
```yaml
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: OCIRepository
metadata:
  name: manifests
  namespace: flux-system
spec:
  interval: 5m
  url: oci://ghcr.io/org/manifests
  ref:
    tag: latest

  # For private registries
  secretRef:
    name: oci-credentials
```
Kustomization
Basic Kustomization
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: my-app
  namespace: flux-system
spec:
  interval: 10m
  retryInterval: 2m

  sourceRef:
    kind: GitRepository
    name: my-app

  path: ./deploy/overlays/production

  prune: true   # Delete removed resources
  wait: true    # Wait for all applied resources to be ready

  # Explicit checks; Flux ignores this list while wait: true is set
  healthChecks:
    - apiVersion: apps/v1
      kind: Deployment
      name: my-app
      namespace: production

  # Substitute variables
  postBuild:
    substitute:
      ENVIRONMENT: production
      CLUSTER_NAME: prod-us-east
    substituteFrom:
      - kind: ConfigMap
        name: cluster-config
      - kind: Secret
        name: cluster-secrets
```
Kustomization Dependencies
```yaml
# Install cert-manager first
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cert-manager
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: infrastructure
  path: ./cert-manager
---
# Then install ingress (depends on cert-manager)
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: ingress
  namespace: flux-system
spec:
  interval: 10m
  dependsOn:
    - name: cert-manager   # Wait for cert-manager
  sourceRef:
    kind: GitRepository
    name: infrastructure
  path: ./ingress
---
# Then install apps (depends on ingress)
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m
  dependsOn:
    - name: ingress        # Wait for ingress
  sourceRef:
    kind: GitRepository
    name: apps
  path: ./production
```
HelmRelease
Basic HelmRelease
```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: nginx
  namespace: web
spec:
  interval: 10m
  chart:
    spec:
      chart: nginx
      version: "15.x"   # Semver range
      sourceRef:
        kind: HelmRepository
        name: bitnami
        namespace: flux-system
      interval: 1h

  values:
    replicaCount: 3
    service:
      type: ClusterIP

  # Or from ConfigMap/Secret
  valuesFrom:
    - kind: ConfigMap
      name: nginx-values
      valuesKey: values.yaml
```
HelmRelease with Dependencies
```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: wordpress
  namespace: blog
spec:
  interval: 10m
  dependsOn:
    - name: mysql        # Wait for MySQL to be ready
      namespace: database
  chart:
    spec:
      chart: wordpress
      version: "18.x"
      sourceRef:
        kind: HelmRepository
        name: bitnami
        namespace: flux-system

  values:
    externalDatabase:
      host: mysql.database.svc.cluster.local
      database: wordpress
    mariadb:
      enabled: false
```
HelmRelease from Git
```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: my-app
  namespace: production
spec:
  interval: 10m
  chart:
    spec:
      chart: ./charts/my-app   # Path to chart in repo
      sourceRef:
        kind: GitRepository
        name: my-app
        namespace: flux-system
```
Image Automation
Automated Image Updates
```
IMAGE AUTOMATION FLOW

1. CI builds and pushes new image
   CI Build ──▶ Container Registry (myapp:v1.2.3)
        │
        ▼
2. Image Reflector scans
   ImageRepository: scans registry every 1m, finds new tag v1.2.3
        │
        ▼
3. Image Policy selects
   ImagePolicy: policy semver, filter ^v1\.2\.x, selected v1.2.3
        │
        ▼
4. Image Automation updates
   ImageUpdateAutomation: updates deployment.yaml in Git,
   commits "Update myapp to v1.2.3"
        │
        ▼
5. Flux syncs change
   Kustomization: detects Git change, applies new manifest
```
Configuration
```yaml
# 1. ImageRepository - Scans container registry
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageRepository
metadata:
  name: my-app
  namespace: flux-system
spec:
  image: ghcr.io/org/my-app
  interval: 1m
  secretRef:
    name: registry-credentials
---
# 2. ImagePolicy - Selects which tags to use
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy
metadata:
  name: my-app
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: my-app
  policy:
    semver:
      range: ">=1.0.0"   # Any version 1.0.0 or higher
---
# Or use alphabetical for date-based tags
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy
metadata:
  name: my-app-dev
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: my-app
  policy:
    alphabetical:
      order: asc   # Last tag in ascending order wins, i.e. the newest timestamp
  filterTags:
    pattern: '^main-[a-f0-9]+-(?P<ts>.*)'
    extract: '$ts'
---
# 3. ImageUpdateAutomation - Commits updates
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageUpdateAutomation
metadata:
  name: my-app
  namespace: flux-system
spec:
  interval: 30m
  sourceRef:
    kind: GitRepository
    name: fleet-infra

  git:
    checkout:
      ref:
        branch: main
    commit:
      author:
        name: fluxcdbot
        email: flux@example.com
      messageTemplate: |
        Update image to {{range .Updated.Images}}{{println .}}{{end}}
    push:
      branch: main

  update:
    path: ./clusters/production
    strategy: Setters
```
Marking Images for Update
```yaml
# In your deployment.yaml, add markers
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    spec:
      containers:
        - name: app
          image: ghcr.io/org/my-app:v1.0.0 # {"$imagepolicy": "flux-system:my-app"}
```
Notifications
Slack Notifications
```yaml
# Provider - Where to send
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: slack
  namespace: flux-system
spec:
  type: slack
  channel: gitops-alerts
  secretRef:
    name: slack-webhook
---
apiVersion: v1
kind: Secret
metadata:
  name: slack-webhook
  namespace: flux-system
data:
  address: <base64-encoded-webhook-url>
---
# Alert - What to send
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: all-alerts
  namespace: flux-system
spec:
  providerRef:
    name: slack
  eventSeverity: info
  eventSources:
    - kind: GitRepository
      name: "*"
    - kind: Kustomization
      name: "*"
    - kind: HelmRelease
      name: "*"
```
GitHub Commit Status
```yaml
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: github
  namespace: flux-system
spec:
  type: github
  address: https://github.com/org/repo
  secretRef:
    name: github-token
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: sync-status
  namespace: flux-system
spec:
  providerRef:
    name: github
  eventSources:
    - kind: Kustomization
      name: apps
```
Multi-Cluster Management
Repository Structure
```
fleet-infra/
├── clusters/
│   ├── production/
│   │   ├── flux-system/
│   │   │   └── gotk-sync.yaml
│   │   └── apps.yaml          # Points to apps/production
│   ├── staging/
│   │   ├── flux-system/
│   │   │   └── gotk-sync.yaml
│   │   └── apps.yaml          # Points to apps/staging
│   └── dev/
│       └── ...
├── infrastructure/
│   ├── base/                  # Shared infra (cert-manager, ingress)
│   ├── production/            # Prod-specific configs
│   └── staging/
└── apps/
    ├── base/                  # App definitions
    ├── production/            # Prod overlays
    └── staging/               # Staging overlays
```
Cluster Kustomization
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./apps/production
  prune: true

  # Cluster-specific substitutions
  postBuild:
    substitute:
      CLUSTER_NAME: prod-us-east
      ENVIRONMENT: production
```
Flux vs ArgoCD
| Aspect | Flux | ArgoCD |
|---|---|---|
| Architecture | Toolkit of controllers; CLI-driven; GitOps-native only | Monolithic application; Web UI-driven; GitOps + traditional |
| Strengths | Simpler to extend; image automation built-in; OCI artifacts native; lower resource usage | Beautiful UI; easier onboarding; rich RBAC/SSO; diff visualization |
| Best for | Platform teams; automation-first; multi-cluster at scale | Application teams; UI-first; developer self-service |
| Philosophy | "Everything is a CR" | "Applications are first-class" |

Recommendation:
- Small team, wants UI → ArgoCD
- Platform team, automation-heavy → Flux
- Both work great, pick one and master it

Common Mistakes
| Mistake | Why It’s Bad | Better Approach |
|---|---|---|
| No dependencies | Resources apply in random order | Use dependsOn for order |
| Missing healthChecks | Flux doesn’t wait for readiness | Add deployment health checks |
| Hardcoded values | Can’t reuse across environments | Use postBuild.substitute |
| No prune | Orphaned resources accumulate | Enable prune: true |
| Long intervals | Slow to detect changes | 1m for git, 10m for apps |
| No notifications | Silent failures | Set up Slack/Teams alerts |
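Several rows in this table can be checked mechanically before a manifest is committed. The following is a minimal sketch, assuming a plain-text scan is good enough for a first pass: `lint_kustomization` is a hypothetical helper (not part of the Flux CLI) that greps a Kustomization manifest for the `prune`, `healthChecks`/`wait`, and `dependsOn` fields. It does not parse YAML, so treat its output as a hint rather than a verdict.

```shell
#!/usr/bin/env bash
# Hypothetical lint helper: flags Kustomization manifests that skip common safeguards.
# It only pattern-matches text; it does not validate the manifest against the Flux API.

lint_kustomization() {
  # Reads one manifest on stdin and prints one message per missing safeguard.
  local manifest
  manifest="$(cat)"
  printf '%s\n' "$manifest" | grep -qE '^[[:space:]]*prune:[[:space:]]*true' \
    || echo "WARN: prune is not enabled"
  printf '%s\n' "$manifest" | grep -qE '^[[:space:]]*(healthChecks:|wait:[[:space:]]*true)' \
    || echo "WARN: no healthChecks or wait: true"
  printf '%s\n' "$manifest" | grep -qE '^[[:space:]]*dependsOn:' \
    || echo "INFO: no dependsOn declared"
}

# Example: a manifest that omits all three fields.
lint_kustomization <<'EOF'
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: my-app
spec:
  interval: 10m
  path: ./deploy
EOF
```

A check like this could run in CI on pull requests; a missing `dependsOn` is reported as informational only, since not every Kustomization has dependencies.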
War Story: The $2.1 Million Substitution Surprise
A fintech company managing 8 clusters across 3 regions used Flux’s postBuild.substitute to inject environment-specific values. Their setup worked flawlessly for 18 months, until it didn’t.
They had a core deployment template with:
```yaml
replicas: ${REPLICAS}
resources:
  limits:
    memory: ${MEMORY_LIMIT}
    cpu: ${CPU_LIMIT}
```
In production, all variables were defined in a ConfigMap: REPLICAS=5, MEMORY_LIMIT=2Gi, CPU_LIMIT=1000m. But when they added a new cluster in Asia-Pacific, a junior engineer copied the cluster bootstrap but forgot to copy the ConfigMap.
The substitutions kept the literal strings. Kubernetes rejected replicas: ${REPLICAS} as an invalid integer—but only for new deployments. Existing deployments kept running, masking the problem. The Kustomization showed Applied successfully because the YAML was syntactically valid.
```
THE INCIDENT TIMELINE
─────────────────────────────────────────────────────────────────
Day 1, 09:00 AM    New APAC cluster bootstrapped with Flux
Day 1, 09:15 AM    Kustomization reports "Ready" (no health checks configured)
Day 1-14           Cluster appears healthy, no new deployments
Day 15, 03:00 AM   Scheduled maintenance deploys new version across all clusters
Day 15, 03:05 AM   APAC deployment fails: "replicas: ${REPLICAS} is not valid"
Day 15, 03:05 AM   APAC cluster has zero running pods (old pods terminated)
Day 15, 03:06 AM   PagerDuty alerts: APAC region completely down
Day 15, 03:45 AM   Root cause identified: missing ConfigMap
Day 15, 04:15 AM   ConfigMap applied, services restored
Day 15, 04:15 AM   70 minutes of complete APAC downtime
```
Financial Impact:
```
INCIDENT COST BREAKDOWN
─────────────────────────────────────────────────────────────────
APAC revenue during outage (70 min):
  $1,450,000 × 100% traffic loss          = $1,450,000 lost revenue

SLA violation penalties:
  - Enterprise customers (12)             = $180,000
  - Contractual credits                   = $95,000

War room costs:
  - 8 engineers × 4 hours × $150/hr       = $4,800
  - Executive escalation                  = $25,000

Regulatory review (financial services):   = $150,000
Post-incident audit:                      = $75,000

Customer churn (attributed):              = $125,000

TOTAL COST:                                 $2,104,800
─────────────────────────────────────────────────────────────────
```
The Fix (Defense in Depth):
```yaml
# 1. Make ConfigMap mandatory
postBuild:
  substituteFrom:
    - kind: ConfigMap
      name: cluster-config
      optional: false   # ← Fail if missing

# 2. Add health checks to catch silent failures
healthChecks:
  - apiVersion: apps/v1
    kind: Deployment
    name: api-gateway
    namespace: production
  - apiVersion: apps/v1
    kind: Deployment
    name: payment-service
    namespace: production

# 3. Add timeout to prevent indefinite waiting
timeout: 5m
```
Additional Safeguards Added:
```bash
#!/bin/bash
# Pre-bootstrap validation script: check every object that substituteFrom reads.
# cluster-config is a ConfigMap; cluster-secrets is a Secret.
REQUIRED="configmap/cluster-config secret/cluster-secrets"
for obj in $REQUIRED; do
  if ! kubectl get "$obj" -n flux-system &>/dev/null; then
    echo "ERROR: Missing required object: $obj"
    exit 1
  fi
done
```
```yaml
# Notification on Kustomization failure
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: reconciliation-failures
spec:
  providerRef:
    name: pagerduty
  eventSeverity: error
  eventSources:
    - kind: Kustomization
      name: "*"
```
Lessons Learned:
- Never trust “Applied successfully”—it only means YAML was valid, not that apps are healthy
- Always use optional: false for required substitutions
- Always add healthChecks, because they’re the only way to know if deployments actually work
- Validate new clusters before production traffic with a checklist
- Alert on Kustomization failures, not just successes
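The failure mode in this story, placeholders surviving into applied manifests, can also be caught with a plain text check. Below is a minimal sketch, assuming you have the rendered manifest as text (for example from a local build step in CI). `find_unsubstituted` is a hypothetical helper name, and the sample manifest is illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: fail fast if a rendered manifest still contains ${VAR} placeholders.

find_unsubstituted() {
  # Reads a manifest on stdin, prints each distinct ${NAME} placeholder that
  # remains, and returns 1 if any were found (0 otherwise).
  local leftovers
  leftovers="$(grep -oE '\$\{[A-Za-z_][A-Za-z0-9_]*\}' | sort -u)"
  if [ -n "$leftovers" ]; then
    printf '%s\n' "$leftovers"
    return 1
  fi
  return 0
}

# Illustrative input: two variables were never substituted.
rendered='replicas: ${REPLICAS}
resources:
  limits:
    memory: 2Gi
    cpu: ${CPU_LIMIT}'

if ! printf '%s\n' "$rendered" | find_unsubstituted; then
  echo "ERROR: unsubstituted variables remain" >&2
fi
```

Running a gate like this before promoting a new cluster complements health checks: it reports the missing values directly instead of waiting for a workload to fail.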
Question 1
What’s the main architectural difference between Flux and ArgoCD?
Show Answer
Flux: A toolkit of independent controllers (source-controller, kustomize-controller, helm-controller, etc.). Each controller manages specific CRDs and can be installed independently. Configuration is entirely through Kubernetes resources.
ArgoCD: A monolithic application with a web UI, API server, and backend. It’s installed as a single unit and has its own Application CRD. Configuration can be through UI, CLI, or CRDs.
Flux is more composable and automation-friendly. ArgoCD is more user-friendly with better visualization. Both achieve the same GitOps outcomes.
Question 2
How does Flux’s image automation work?
Show Answer
Three components work together:
1. ImageRepository: Scans a container registry at intervals, finds all available tags
2. ImagePolicy: Selects which tag to use based on policy (semver, alphabetical, numerical)
3. ImageUpdateAutomation: Updates YAML files in Git with the selected tag and commits the change

The automation requires markers in your YAML:
```yaml
image: myapp:v1.0.0 # {"$imagepolicy": "flux-system:my-app"}
```
This closes the GitOps loop: CI pushes image → Flux updates Git → Flux applies from Git.
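To make the marker mechanism concrete, here is a rough shell emulation of the update step. It is a sketch of the idea only, not how the image-automation-controller is implemented: `update_marked_image` is a hypothetical helper that rewrites the tag on lines carrying a given `$imagepolicy` marker and leaves every other line alone. It assumes image references without a registry port (no extra `:` before the tag).

```shell
#!/usr/bin/env bash
# Sketch: rewrite the tag only on lines that carry the matching $imagepolicy marker.

update_marked_image() {
  # $1 = policy reference (namespace:name), $2 = new tag; manifest on stdin.
  local policy="$1" new_tag="$2" marker line
  marker="{\"\$imagepolicy\": \"${policy}\"}"
  while IFS= read -r line; do
    case "$line" in
      *"$marker"*)
        # Replace everything after the last ':' of the image reference.
        printf '%s\n' "$line" | sed -E "s|(image: [^ :]+):[^ ]+|\1:${new_tag}|"
        ;;
      *)
        printf '%s\n' "$line"
        ;;
    esac
  done
}

# Example: only the marked container is updated; the sidecar is untouched.
printf '%s\n' \
  '      - name: app' \
  '        image: ghcr.io/org/my-app:v1.0.0 # {"$imagepolicy": "flux-system:my-app"}' \
  '      - name: sidecar' \
  '        image: ghcr.io/org/sidecar:v0.3.0' \
  | update_marked_image 'flux-system:my-app' 'v1.2.3'
```

The point of the marker is visible here: images without it are never touched, so automation stays opt-in per container.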
Question 3
You have three Kustomizations: cert-manager, ingress, and apps. Apps depends on ingress, ingress depends on cert-manager. How would you configure this?
Show Answer
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cert-manager
spec:
  # No dependencies, runs first
  path: ./cert-manager
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: ingress
spec:
  dependsOn:
    - name: cert-manager
  path: ./ingress
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
spec:
  dependsOn:
    - name: ingress
  path: ./apps
```
Flux will:
- Apply cert-manager and wait for it to be healthy
- Then apply ingress and wait for it to be healthy
- Then apply apps
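The ordering that dependsOn produces is a topological sort of the dependency graph. As a minimal sketch (using the standard `tsort` utility, not anything from Flux itself), the same order can be derived from the edges by hand. Each input line is `<dependency> <dependent>`; `apply_order` is a hypothetical wrapper name.

```shell
#!/usr/bin/env bash
# Sketch: derive an apply order from dependsOn edges with tsort.
# Each input line reads "<dependency> <dependent>".

apply_order() {
  # tsort prints the nodes so that every dependency comes before its dependents.
  tsort
}

printf '%s\n' \
  'cert-manager ingress' \
  'ingress apps' \
  | apply_order
```

For this chain the output lists cert-manager, then ingress, then apps. `tsort` also exits with an error if the edges contain a cycle, which is a cheap way to sanity-check a larger dependency graph before committing it.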
Question 4
Your Flux reconciliation is failing. What commands would you use to debug?
Show Answer
```bash
# Check overall Flux health
flux check

# See all Flux resources and their status
flux get all

# Specific resource status
flux get sources git
flux get kustomizations
flux get helmreleases

# Wider status output for a failing resource
kubectl get kustomization my-app -n flux-system -o wide

# View events and conditions
kubectl describe kustomization my-app -n flux-system

# View controller logs
flux logs --kind=Kustomization --name=my-app

# Force immediate reconciliation
flux reconcile kustomization my-app --with-source

# Suspend to stop reconciliation during debugging
flux suspend kustomization my-app
flux resume kustomization my-app
```
Question 5
Your organization needs to deploy the same application to 15 clusters with minor variations (different domains, replica counts). Compare how you’d approach this in Flux vs ArgoCD.
Show Answer
Flux Approach:
```yaml
# Base Kustomization with substitutions
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: my-app
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: apps
  path: ./my-app/base
  prune: true

  postBuild:
    substituteFrom:
      - kind: ConfigMap
        name: cluster-config   # Each cluster has its own
```
Each cluster’s cluster-config:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-config
data:
  CLUSTER_NAME: prod-us-east-1
  DOMAIN: us-east.example.com
  REPLICAS: "5"
```
ArgoCD Approach:
```yaml
# ApplicationSet with cluster generator (fragment)
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: my-app
spec:
  generators:
    - clusters:
        selector:
          matchLabels:
            env: production
  template:
    spec:
      source:
        helm:
          parameters:
            - name: domain
              value: "{{metadata.labels.domain}}"
            - name: replicas
              value: "{{metadata.labels.replicas}}"
```
Comparison:
| Aspect | Flux | ArgoCD |
|---|---|---|
| Config location | ConfigMaps per cluster | Cluster labels or Git files |
| Scaling | Add ConfigMap to new cluster | Register cluster with labels |
| Visibility | flux get kustomizations | UI shows all ApplicationSet instances |
| Flexibility | Full Kustomize power | Generator types (cluster, git, list) |
Recommendation for 15 clusters:
- Flux: Better if clusters have complex, unique configurations
- ArgoCD: Better if variations are simple and you want UI visibility
Question 6
You’re implementing image automation for a development environment. You want to auto-deploy any image tagged with the git commit SHA from the develop branch. Write the ImageRepository, ImagePolicy, and ImageUpdateAutomation configuration.
Show Answer
```yaml
# 1. ImageRepository - Scan the registry
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageRepository
metadata:
  name: my-app-dev
  namespace: flux-system
spec:
  image: ghcr.io/myorg/my-app
  interval: 1m
  secretRef:
    name: ghcr-credentials
---
# 2. ImagePolicy - Select develop branch builds
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy
metadata:
  name: my-app-dev
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: my-app-dev

  # Match tags like: develop-abc123f-1702234567
  filterTags:
    pattern: '^develop-[a-f0-9]+-(?P<ts>[0-9]+)$'
    extract: '$ts'

  policy:
    numerical:
      order: asc   # Highest timestamp wins
---
# 3. ImageUpdateAutomation - Update Git
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageUpdateAutomation
metadata:
  name: my-app-dev
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: fleet-infra

  git:
    checkout:
      ref:
        branch: main
    commit:
      author:
        name: flux-bot
        email: flux@example.com
      messageTemplate: |
        [dev] Auto-update {{ .AutomationObject }}

        Images:
        {{ range .Updated.Images }}
        - {{ . }}
        {{ end }}
    push:
      branch: main

  update:
    path: ./clusters/development
    strategy: Setters
```
In your deployment YAML, add the marker:
```yaml
spec:
  containers:
    - name: app
      image: ghcr.io/myorg/my-app:develop-abc123f-1702234567 # {"$imagepolicy": "flux-system:my-app-dev"}
```
Tag pattern explained:
- develop-: branch name prefix
- [a-f0-9]+: git commit SHA
- (?P<ts>[0-9]+): named capture group for timestamp
- Numerical policy sorts by extracted timestamp, picking latest
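The filter-then-sort behaviour described above can be tried out locally without a cluster. The following is a sketch that imitates the selection with grep, sed, and sort; it is an approximation for building intuition, not the controller's code. `select_latest_tag` is a hypothetical helper and the tag values are made up.

```shell
#!/usr/bin/env bash
# Sketch: emulate filterTags + numerical policy (order: asc) over a list of tags.

select_latest_tag() {
  # Reads candidate tags on stdin, one per line.
  # 1. Keep only tags shaped like develop-<sha>-<timestamp>.
  # 2. Prefix each tag with its extracted timestamp.
  # 3. Sort numerically ascending and take the last (highest) entry.
  grep -E '^develop-[a-f0-9]+-[0-9]+$' \
    | sed -E 's/^(develop-[a-f0-9]+-([0-9]+))$/\2 \1/' \
    | sort -n -k1,1 \
    | tail -n 1 \
    | cut -d' ' -f2
}

# Hypothetical registry contents: only the develop-* tags with a hex SHA qualify.
printf '%s\n' \
  'develop-abc123f-1702234567' \
  'develop-9e8d7c6-1702239999' \
  'main-1a2b3c4-1702250000' \
  'develop-zzz-1702260000' \
  'v1.2.3' \
  | select_latest_tag
```

Here the selected tag is `develop-9e8d7c6-1702239999`: the `main-` and `v1.2.3` tags fail the prefix, and `develop-zzz-...` fails the hex-SHA part of the pattern. Sorting numerically rather than alphabetically matters once timestamps differ in length.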
Question 7
Calculate the resource requirements for Flux controllers managing 50 GitRepositories (checked every 1m), 100 Kustomizations, and 75 HelmReleases. What’s the expected API server load?
Show Answer
Controller Resource Estimation:
```
FLUX CONTROLLER RESOURCE REQUIREMENTS
─────────────────────────────────────────────────────────────────
BASE REQUIREMENTS (minimal installation):
source-controller:        128Mi memory, 100m CPU
kustomize-controller:     256Mi memory, 100m CPU
helm-controller:          256Mi memory, 100m CPU
notification-controller:   64Mi memory,  50m CPU
                          ─────────────────────────
Base total:               704Mi memory, 350m CPU

SCALING FACTORS:
─────────────────────────────────────────────────────────────────
GitRepositories (50 × 1m interval):
  - Each git fetch: ~5MB memory spike during clone
  - Concurrent fetches: 2 (default)
  - Memory buffer: 50 × 2MB = 100Mi
  - Add: +128Mi to source-controller

Kustomizations (100):
  - Each reconcile: ~10MB for manifest processing
  - Concurrent reconciles: 4 (default)
  - Memory buffer: 4 × 10MB = 40Mi
  - Add: +256Mi to kustomize-controller

HelmReleases (75):
  - Each release: ~20MB for chart rendering
  - Concurrent releases: 2 (default)
  - Memory buffer: 2 × 20MB = 40Mi
  - Add: +256Mi to helm-controller

RECOMMENDED PRODUCTION LIMITS:
─────────────────────────────────────────────────────────────────
source-controller:        512Mi memory, 500m CPU
kustomize-controller:     768Mi memory, 500m CPU
helm-controller:          768Mi memory, 500m CPU
notification-controller:  128Mi memory, 100m CPU
                          ─────────────────────────
Total:                    2176Mi memory, 1600m CPU
```
API Server Load Calculation:
```
API SERVER CALLS PER MINUTE
─────────────────────────────────────────────────────────────────
Source reconciliation:
  50 GitRepositories × 1/min × 3 API calls = 150 calls/min
  (status update, event, artifact update)

Kustomization reconciliation:
  100 Kustomizations × 1/10min × 15 API calls = 150 calls/min
  (get manifests, apply each resource, status)

HelmRelease reconciliation:
  75 HelmReleases × 1/10min × 10 API calls = 75 calls/min
  (get chart, render, apply, status)

Informer watches (constant):
  ~20 watches × heartbeat = minimal

TOTAL: ~375 API calls/minute (~6 calls/second)
─────────────────────────────────────────────────────────────────

This is VERY LOW for a Kubernetes API server.
Typical API servers handle 1000+ calls/second easily.
```
Optimization Tips:
```yaml
# If API server load becomes concern:
spec:
  interval: 5m        # Increase from 1m (reduces load 5x)
  retryInterval: 1m   # Keep retry fast for failures

# Reduce concurrent operations:
# In controller deployment args:
--concurrent=2        # Default is 4 for kustomize-controller
```
Question 8
Your Kustomization is stuck in “Not Ready” with the message “dependency ‘flux-system/cert-manager’ is not ready”. The cert-manager Kustomization shows “Applied successfully”. What’s wrong and how do you fix it?
Show Answer
The Problem:
“Applied successfully” means manifests were sent to the API server, but it does NOT mean resources are healthy. The dependent Kustomization waits for the dependency to be Ready, not just Applied.
Root Cause Investigation:
```bash
# Check cert-manager Kustomization status
flux get kustomization cert-manager

# Look for the actual status
kubectl get kustomization cert-manager -n flux-system -o yaml

# Common findings:
# - Status shows "Applied" but conditions show issues
# - Health checks are failing
# - Resources created but pods not running
```
Common Causes:

1. No health checks defined (most common):
```yaml
# BAD: No health checks, "Ready" based only on apply success
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cert-manager
spec:
  # ... no healthChecks
```
2. Pods failing to start:
```bash
# cert-manager pods might be CrashLooping
kubectl get pods -n cert-manager
kubectl logs -n cert-manager -l app=cert-manager
```
3. CRDs not yet available:
```bash
# cert-manager CRDs might not be registered yet
kubectl get crds | grep cert-manager
```
The Fix:
```yaml
# Add health checks to cert-manager Kustomization
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cert-manager
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: infrastructure
  path: ./cert-manager
  prune: true
  wait: true   # ← Wait for all applied resources to be ready

  # Explicit checks; Flux ignores this list while wait: true is set
  healthChecks:
    - apiVersion: apps/v1
      kind: Deployment
      name: cert-manager
      namespace: cert-manager
    - apiVersion: apps/v1
      kind: Deployment
      name: cert-manager-cainjector
      namespace: cert-manager
    - apiVersion: apps/v1
      kind: Deployment
      name: cert-manager-webhook
      namespace: cert-manager

  timeout: 5m  # Fail if not healthy within 5 minutes
```
Debugging Commands:
```bash
# Force reconciliation with source refresh
flux reconcile kustomization cert-manager --with-source

# Watch the reconciliation progress
flux get kustomization cert-manager --watch

# Check what's blocking
kubectl describe kustomization cert-manager -n flux-system | grep -A 20 "Conditions"

# If cert-manager pods are the issue
kubectl get events -n cert-manager --sort-by='.lastTimestamp'
```
Key Insight: Always add healthChecks (or set wait: true) for any Kustomization that other resources depend on. Without them, Flux considers a Kustomization “Ready” as soon as the apply succeeds, even if the actual pods never start.
Hands-On Exercise
Scenario: GitOps with Flux

Install Flux, deploy an application from a public Git repository, and deploy a chart with a HelmRelease.
```bash
# Create kind cluster
kind create cluster --name flux-lab

# Check Flux prerequisites
flux check --pre

# Since we don't have a real Git repo, we'll use local manifests
# Install Flux controllers only
flux install
```
Deploy Application Manually (Simulated GitOps)
```bash
# Create a GitRepository pointing to a public repo
flux create source git podinfo \
  --url=https://github.com/stefanprodan/podinfo \
  --branch=master \
  --interval=1m \
  --export > podinfo-source.yaml

kubectl apply -f podinfo-source.yaml

# Create Kustomization to deploy podinfo
flux create kustomization podinfo \
  --source=GitRepository/podinfo \
  --path="./kustomize" \
  --prune=true \
  --interval=10m \
  --export > podinfo-kustomization.yaml

kubectl apply -f podinfo-kustomization.yaml
```
Verify Deployment
```bash
# Check Flux resources
flux get sources git
flux get kustomizations

# Check deployed pods
kubectl get pods -A | grep podinfo

# Watch reconciliation
flux get kustomizations --watch
```
Deploy a HelmRelease
```bash
# Add Bitnami repository
flux create source helm bitnami \
  --url=https://charts.bitnami.com/bitnami \
  --interval=1h \
  --export > bitnami-source.yaml

kubectl apply -f bitnami-source.yaml

# Deploy NGINX via Helm
cat <<EOF | kubectl apply -f -
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: nginx
  namespace: default
spec:
  interval: 10m
  chart:
    spec:
      chart: nginx
      version: "15.x"
      sourceRef:
        kind: HelmRepository
        name: bitnami
        namespace: flux-system
  values:
    replicaCount: 2
    service:
      type: ClusterIP
EOF

# Check HelmRelease status
flux get helmreleases
```
Suspend and Resume
```bash
# Suspend reconciliation
flux suspend kustomization podinfo

# Make a change (it won't be reverted)
kubectl scale deployment podinfo --replicas=5

# Resume reconciliation (change will be reverted)
flux resume kustomization podinfo

# Verify pods went back to original count
kubectl get pods | grep podinfo
```
Success Criteria
- Flux controllers are running
- GitRepository source is synced
- Kustomization applies manifests
- HelmRelease deploys chart
- Understand suspend/resume behavior
Cleanup
```bash
kind delete cluster --name flux-lab
rm -f podinfo-*.yaml bitnami-source.yaml
```
Key Takeaways
Before moving on, ensure you can:
- Explain Flux’s toolkit architecture (source, kustomize, helm, notification controllers)
- Bootstrap Flux to a cluster with flux bootstrap github/gitlab
- Create GitRepository, HelmRepository, and OCIRepository sources
- Write Kustomizations with dependencies, health checks, and substitutions
- Configure HelmReleases with values from ConfigMaps and Secrets
- Set up image automation (ImageRepository, ImagePolicy, ImageUpdateAutomation)
- Configure Slack/Teams notifications for reconciliation events
- Debug failed reconciliations with flux get, flux logs, and kubectl describe
- Compare Flux vs ArgoCD trade-offs for different use cases
- Design multi-cluster GitOps with cluster-specific substitutions
Next Module
Continue to Module 2.4: Helm & Kustomize where we’ll dive deep into the package management tools that power GitOps.
“GitOps is not a tool, it’s a practice. Flux gives you the toolkit to practice it well.”