Module 10.8: Enterprise GitOps & Platform Engineering
Complexity: [COMPLEX] | Time to Complete: 2.5h | Prerequisites: GitOps Basics (ArgoCD/Flux), Kubernetes RBAC, Multi-Cloud Fleet Management (Module 10.5)
What You’ll Be Able to Do
Section titled “What You’ll Be Able to Do”After completing this module, you will be able to:
- Design enterprise GitOps architectures using Argo CD or Flux with multi-tenant repository structures
- Implement progressive delivery pipelines with Argo Rollouts for canary and blue/green deployments at scale
- Configure GitOps promotion workflows across dev, staging, and production environments with approval gates
- Deploy GitOps-based platform engineering patterns that enable self-service application deployment for development teams
Why This Module Matters
Section titled “Why This Module Matters”In 2024, a major European bank had 160 development teams deploying to 42 Kubernetes clusters. They had adopted ArgoCD two years earlier, and it had been a success — at first. But as adoption grew, so did the chaos. Every team had their own ArgoCD instance. Some teams had 3 ArgoCD instances for dev, staging, and prod. The bank was running 87 ArgoCD installations, each configured slightly differently. When a critical ArgoCD CVE was announced, patching took 11 weeks because each instance was managed independently. Their GitOps had become “87 small GitOps islands” instead of one unified platform.
Meanwhile, the developer experience was deteriorating. A new team joining the bank needed to: (1) request a namespace from the platform team (3-day SLA), (2) configure ArgoCD manually by editing a YAML file in a shared repository (error-prone), (3) set up their own monitoring dashboards (copy-paste from another team’s config), and (4) figure out secrets management by asking on Slack (no documentation). The average time from “new team formed” to “first deployment to production” was 6 weeks. Their competitors using modern Internal Developer Platforms were onboarding teams in days.
Enterprise GitOps is not just “install ArgoCD.” It is the discipline of building a self-service platform where teams can deploy, operate, and observe their applications through Git workflows, without needing to understand the underlying infrastructure. In this module, you will learn how to build a Backstage-powered Internal Developer Platform, scale ArgoCD with ApplicationSets and App of Apps patterns, design multi-tenant Git repository strategies, implement RBAC for GitOps, and manage secrets in an enterprise GitOps workflow.
The Internal Developer Platform
Section titled “The Internal Developer Platform”An Internal Developer Platform (IDP) is the self-service layer that sits between developers and infrastructure. It codifies organizational standards into templates and workflows that make it easy to do the right thing and hard to do the wrong thing.
Stop and think: If a developer has to ask a platform engineer to create a namespace, is it a platform or just a ticketing system? How does GitOps change this dynamic?
What a Good IDP Provides
Section titled “What a Good IDP Provides”flowchart TD subgraph IDP [Internal Developer Platform] subgraph Backstage [Backstage] direction LR Catalog["Service Catalog<br/>('What exists')"] Templates["Software Templates<br/>('Create new')"] Docs["Tech Docs<br/>('How to')"] Search["Search<br/>('Find stuff')"] end Argo["GitOps Engine<br/>(ArgoCD)"]
Backstage --> Argo Argo --> Dev["Dev Cluster"] Argo --> Staging["Staging Cluster"] Argo --> Prod["Prod Cluster"] endBackstage for Platform Engineering
Section titled “Backstage for Platform Engineering”Backstage provides four core capabilities that transform GitOps from “YAML editing in Git” to a self-service platform:
apiVersion: scaffolder.backstage.io/v1beta3kind: Templatemetadata: name: create-microservice title: Create a New Microservice description: | Provision a new microservice with CI/CD, monitoring, ArgoCD deployment, and Backstage catalog entry.spec: owner: platform-team type: service
parameters: - title: Service Details required: - serviceName - team - description properties: serviceName: title: Service Name type: string pattern: '^[a-z][a-z0-9-]{2,30}$' description: "Lowercase letters, numbers, hyphens. 3-31 characters." team: title: Owning Team type: string enum: ['payments', 'identity', 'notifications', 'search', 'platform'] description: title: Description type: string maxLength: 200
- title: Technical Configuration properties: language: title: Language type: string enum: ['go', 'python', 'typescript', 'java'] default: 'go' deployTarget: title: Deployment Target type: string enum: ['eks-prod', 'aks-prod', 'gke-prod'] default: 'eks-prod' requiresDatabase: title: Needs Database? type: boolean default: false publicFacing: title: Internet-Facing? type: boolean default: false
steps: - id: scaffold name: Generate Service Scaffold action: fetch:template input: url: ./skeletons/${{ parameters.language }} values: serviceName: ${{ parameters.serviceName }} team: ${{ parameters.team }} deployTarget: ${{ parameters.deployTarget }}
- id: create-repo name: Create GitHub Repository action: publish:github input: repoUrl: github.com?owner=company-services&repo=${{ parameters.serviceName }} defaultBranch: main protectDefaultBranch: true requireCodeOwnerReviews: true
- id: create-argocd-app name: Register with ArgoCD action: argocd:create-resources input: appName: ${{ parameters.serviceName }} projectName: ${{ parameters.team }} repoURL: ${{ steps['create-repo'].output.remoteUrl }} path: k8s/overlays/production destServer: ${{ parameters.deployTarget }} destNamespace: ${{ parameters.team }}
- id: register-catalog name: Register in Backstage Catalog action: catalog:register input: repoContentsUrl: ${{ steps['create-repo'].output.repoContentsUrl }} catalogInfoPath: /catalog-info.yaml
output: links: - title: Repository url: ${{ steps['create-repo'].output.remoteUrl }} - title: ArgoCD Application url: https://argocd.company.com/applications/${{ parameters.serviceName }}Scaling ArgoCD for Enterprise
Section titled “Scaling ArgoCD for Enterprise”The Problem with One ArgoCD Per Team
Section titled “The Problem with One ArgoCD Per Team”BAD: 87 ArgoCD instances - 87 separate upgrades when CVEs are announced - 87 different configurations to maintain - No cross-team visibility - Massive resource waste (each instance runs 3+ pods)
GOOD: 1-3 centralized ArgoCD instances - Centralized patching and configuration - Multi-tenant RBAC via ArgoCD Projects - Cross-team visibility and governance - Resource efficientApp of Apps Pattern
Section titled “App of Apps Pattern”The App of Apps pattern uses a root ArgoCD Application that manages other Applications. This creates a hierarchy where one top-level Application bootstraps the entire platform.
apiVersion: argoproj.io/v1alpha1kind: Applicationmetadata: name: platform-root namespace: argocdspec: project: platform source: repoURL: https://github.com/company/platform-config.git targetRevision: main path: apps destination: server: https://kubernetes.default.svc namespace: argocd syncPolicy: automated: prune: true selfHeal: trueplatform-config/├── apps/ # Root Application points here│ ├── monitoring.yaml # Application for monitoring stack│ ├── logging.yaml # Application for logging stack│ ├── cert-manager.yaml # Application for cert-manager│ ├── kyverno.yaml # Application for policy engine│ ├── team-payments.yaml # ApplicationSet for payments team│ ├── team-identity.yaml # ApplicationSet for identity team│ └── team-search.yaml # ApplicationSet for search team│├── platform/ # Platform services configs│ ├── monitoring/│ ├── logging/│ ├── cert-manager/│ └── kyverno/│└── teams/ # Team-specific configs ├── payments/ ├── identity/ └── search/ApplicationSets: Dynamic Application Generation
Section titled “ApplicationSets: Dynamic Application Generation”ApplicationSets generate ArgoCD Applications dynamically based on templates and generators. They are the key to scaling from tens to hundreds of applications.
Pause and predict: If you have 100 microservices, writing 100 Application manifests is tedious. How might ArgoCD automate the creation of these manifests?
# Generator: Create an Application for every directory in a Git repoapiVersion: argoproj.io/v1alpha1kind: ApplicationSetmetadata: name: team-payments-services namespace: argocdspec: generators: - git: repoURL: https://github.com/company/payments-services.git revision: main directories: - path: services/* template: metadata: name: 'payments-{{path.basename}}' labels: team: payments spec: project: payments source: repoURL: https://github.com/company/payments-services.git targetRevision: main path: '{{path}}/k8s/overlays/production' destination: server: https://kubernetes.default.svc namespace: payments syncPolicy: automated: prune: true selfHeal: true syncOptions: - CreateNamespace=true - ServerSideApply=true retry: limit: 3 backoff: duration: 5s factor: 2 maxDuration: 3m# Generator: Deploy to every cluster with matching labelsapiVersion: argoproj.io/v1alpha1kind: ApplicationSetmetadata: name: monitoring-fleet namespace: argocdspec: generators: - clusters: selector: matchLabels: monitoring: enabled template: metadata: name: 'monitoring-{{name}}' spec: project: platform source: repoURL: https://github.com/company/platform-config.git targetRevision: main path: platform/monitoring helm: valueFiles: - 'values-{{metadata.labels.environment}}.yaml' destination: server: '{{server}}' namespace: monitoring syncPolicy: automated: prune: true selfHeal: true# Generator: Matrix - combine clusters with servicesapiVersion: argoproj.io/v1alpha1kind: ApplicationSetmetadata: name: platform-services-matrix namespace: argocdspec: generators: - matrix: generators: - clusters: selector: matchLabels: environment: production - list: elements: - service: kyverno path: platform/kyverno - service: falco path: platform/falco - service: otel-collector path: platform/otel-collector template: metadata: name: '{{service}}-{{name}}' spec: project: platform source: repoURL: https://github.com/company/platform-config.git targetRevision: main path: '{{path}}' destination: server: '{{server}}' namespace: '{{service}}-system'Multi-Tenant Git Repository Strategy
Section titled “Multi-Tenant Git Repository Strategy”How you organize Git repositories determines how scalable and maintainable your GitOps platform is. There are three common patterns.
Stop and think: In a mono-repo, a syntax error in the root directory might break the platform. In a repo-per-team, how do you enforce a new security policy across 50 repositories?
Pattern 1: Mono-Repo
Section titled “Pattern 1: Mono-Repo”company-k8s/├── platform/ # Platform team owns this│ ├── monitoring/│ ├── logging/│ └── policy/├── teams/│ ├── payments/ # Payments team owns this subtree│ │ ├── service-a/│ │ ├── service-b/│ │ └── service-c/│ ├── identity/ # Identity team owns this subtree│ │ ├── auth-service/│ │ └── user-service/│ └── search/└── CODEOWNERS # Enforce ownership via GitHub CODEOWNERSPros: Single source of truth. Cross-cutting changes in one PR. Easy to search. Cons: Large repos are slow. CODEOWNERS is the only access control. One team’s bad merge affects everyone.
Pattern 2: Repo Per Team
Section titled “Pattern 2: Repo Per Team”company/├── platform-config/ # Platform team├── payments-k8s/ # Payments team├── identity-k8s/ # Identity team├── search-k8s/ # Search team└── fleet-config/ # ArgoCD configuration (platform team)Pros: Team autonomy. Clean access control per repo. Independent merge queues. Cons: Cross-cutting changes require PRs to multiple repos. Harder to enforce consistency.
Pattern 3: Hybrid (Recommended for Enterprise)
Section titled “Pattern 3: Hybrid (Recommended for Enterprise)”# Central platform repo (platform team owns)platform-config/├── argocd/ # App of Apps, ApplicationSets├── platform-services/ # Monitoring, logging, policy├── cluster-configs/ # Per-cluster configurations└── team-onboarding/ # Templates for new team repos
# Per-team repos (teams own their own)payments-k8s/├── services/│ ├── payment-processor/│ │ ├── base/│ │ └── overlays/│ │ ├── dev/│ │ ├── staging/│ │ └── production/│ └── invoice-service/├── shared/│ ├── network-policies/│ └── resource-quotas/└── catalog-info.yaml # Backstage catalog entry# CODEOWNERS for team repo# payments-k8s/.github/CODEOWNERS
# Platform team must approve changes to shared configs/shared/ @company/platform-team
# Payments team owns their services/services/ @company/payments-team
# Production overlays require senior review/services/*/overlays/production/ @company/payments-leads @company/platform-teamRBAC for Enterprise GitOps
Section titled “RBAC for Enterprise GitOps”ArgoCD’s RBAC system uses Projects to isolate teams and control what they can deploy, where they can deploy it, and which Git repos they can use.
ArgoCD Project per Team
Section titled “ArgoCD Project per Team”apiVersion: argoproj.io/v1alpha1kind: AppProjectmetadata: name: payments namespace: argocdspec: description: "Payments team services"
# Which repos can this project use? sourceRepos: - 'https://github.com/company/payments-k8s.git' - 'https://github.com/company/shared-charts.git'
# Where can this project deploy? destinations: - namespace: payments server: https://kubernetes.default.svc - namespace: payments server: https://eks-prod.company.internal - namespace: payments-staging server: https://eks-staging.company.internal
# What cluster resources can this project create? clusterResourceWhitelist: - group: '' kind: Namespace
# What namespaced resources are allowed? namespaceResourceWhitelist: - group: '' kind: '*' - group: apps kind: '*' - group: networking.k8s.io kind: '*'
# What resources are DENIED? namespaceResourceBlacklist: - group: '' kind: ResourceQuota # Only platform team sets quotas - group: '' kind: LimitRange # Only platform team sets limits
# Enforce signed commits signatureKeys: - keyID: "ABCDEF1234567890"
# Sync windows (no deployments during maintenance) syncWindows: - kind: deny schedule: '0 2 * * *' # Deny syncs between 2-6 AM duration: 4h applications: - '*' - kind: allow schedule: '0 6 * * 1-5' # Allow during business hours M-F duration: 14h applications: - '*'RBAC Policy for ArgoCD
Section titled “RBAC Policy for ArgoCD”# argocd-rbac-cm ConfigMap data# Format: p, <role>, <resource>, <action>, <project>/<object>, <allow/deny>
# Platform admins: full access to everythingp, role:platform-admin, applications, *, */*, allowp, role:platform-admin, clusters, *, *, allowp, role:platform-admin, repositories, *, *, allowp, role:platform-admin, projects, *, *, allow
# Payments team: manage their own apps, read-only on platformp, role:payments-team, applications, get, payments/*, allowp, role:payments-team, applications, sync, payments/*, allowp, role:payments-team, applications, action/*, payments/*, allowp, role:payments-team, applications, create, payments/*, allowp, role:payments-team, applications, delete, payments/*, allowp, role:payments-team, logs, get, payments/*, allowp, role:payments-team, exec, create, payments/*, denyp, role:payments-team, applications, get, platform/*, allow
# Read-only role for all teams (view platform apps)p, role:viewer, applications, get, */*, allowp, role:viewer, logs, get, */*, allow
# Map SSO groups to rolesg, company:platform-engineers, role:platform-adming, company:payments-developers, role:payments-teamg, company:all-engineers, role:viewerSecrets in Enterprise GitOps
Section titled “Secrets in Enterprise GitOps”The biggest challenge in GitOps is secrets: you cannot store plaintext secrets in Git, but GitOps requires everything to be in Git. Several solutions exist, each with different trade-offs.
Secrets Management Comparison
Section titled “Secrets Management Comparison”| Solution | How It Works | Pros | Cons |
|---|---|---|---|
| Sealed Secrets | Encrypt secrets with a cluster-specific key. Only the cluster can decrypt. | Simple, no external dependencies | Key per cluster, rotation is manual |
| External Secrets Operator (ESO) | Syncs secrets from AWS Secrets Manager, Azure Key Vault, HashiCorp Vault | External source of truth, centralized management | Dependency on external service |
| SOPS + age/KMS | Encrypt YAML values in-place in Git. Decrypt at sync time. | Secrets versioned in Git (encrypted), audit trail | Key management complexity |
| Vault Agent Injector | HashiCorp Vault injects secrets via sidecar | Rich policy engine, dynamic secrets | Vault is complex to operate, sidecar overhead |
| ArgoCD Vault Plugin | Decrypt/fetch secrets during ArgoCD sync | No sidecar, centralized vault | Tight coupling between ArgoCD and Vault |
External Secrets Operator (Recommended for Enterprise)
Section titled “External Secrets Operator (Recommended for Enterprise)”# Install ESO# helm install external-secrets external-secrets/external-secrets -n external-secrets --create-namespace
# Create a SecretStore that connects to AWS Secrets ManagerapiVersion: external-secrets.io/v1beta1kind: ClusterSecretStoremetadata: name: aws-secrets-managerspec: provider: aws: service: SecretsManager region: us-east-1 auth: jwt: serviceAccountRef: name: external-secrets-sa namespace: external-secrets
---# Create an ExternalSecret that syncs a specific secretapiVersion: external-secrets.io/v1beta1kind: ExternalSecretmetadata: name: database-credentials namespace: paymentsspec: refreshInterval: 1h secretStoreRef: name: aws-secrets-manager kind: ClusterSecretStore target: name: database-credentials creationPolicy: Owner data: - secretKey: DB_HOST remoteRef: key: payments/production/database property: host - secretKey: DB_PASSWORD remoteRef: key: payments/production/database property: password - secretKey: DB_USERNAME remoteRef: key: payments/production/database property: usernameSOPS for Git-Native Secrets
Section titled “SOPS for Git-Native Secrets”# Encrypt a secret with SOPS + AWS KMS# .sops.yaml in the repo root configures encryption rules
cat <<'EOF' > .sops.yamlcreation_rules: - path_regex: .*\.enc\.yaml$ kms: arn:aws:kms:us-east-1:123456789012:key/abc-123-def-456 encrypted_regex: ^(data|stringData)$EOF
# Create a secret and encrypt itcat <<'EOF' > database-secret.enc.yamlapiVersion: v1kind: Secretmetadata: name: database-credentials namespace: paymentstype: OpaquestringData: DB_HOST: prod-db.company.internal DB_PASSWORD: super-secret-password DB_USERNAME: payments_svcEOF
# Encrypt the secret (only data/stringData fields are encrypted)sops --encrypt --in-place database-secret.enc.yaml
# The encrypted file looks like:# apiVersion: v1# kind: Secret# metadata:# name: database-credentials# namespace: payments# type: Opaque# stringData:# DB_HOST: ENC[AES256_GCM,data:abc123...]# DB_PASSWORD: ENC[AES256_GCM,data:def456...]# DB_USERNAME: ENC[AES256_GCM,data:ghi789...]
# Commit the encrypted file to Git (safe!)git add database-secret.enc.yamlgit commit -m "feat: Add payments database credentials (encrypted)"Did You Know?
Section titled “Did You Know?”-
Backstage has been adopted by over 3,200 companies as of 2025, making it the de facto standard for Internal Developer Platforms. Spotify’s internal Backstage instance manages over 7,500 services, 4,500 templates, and serves 6,000 developers. The average developer at Spotify interacts with Backstage 18 times per day — more than any other internal tool except their IDE.
-
ArgoCD processes over 8 million sync operations per day across its global user base. The largest known ArgoCD installation manages over 12,000 Applications on a single ArgoCD instance, with 380 clusters registered. At that scale, the ArgoCD application controller consumes about 16GB of RAM and requires careful tuning of the
--app-resyncinterval to avoid overwhelming the Kubernetes API servers. -
The External Secrets Operator (ESO) was created to replace three competing projects: Kubernetes External Secrets, Secrets Manager CSI Driver, and the original External Secrets. GoDaddy, originally a major contributor to the Kubernetes External Secrets project, switched to ESO in 2022 after finding that maintaining a vendor-specific solution was unsustainable. ESO now supports 21 secret store providers.
-
SOPS (Secrets OPerationS) was created by Mozilla in 2015 for encrypting YAML files used in Firefox infrastructure automation. It was adopted by the Kubernetes community because it solved a fundamental problem: how to version-control secrets without storing plaintext in Git. Mozilla’s original use case — encrypting AWS CloudFormation parameters — is almost forgotten, but the tool’s Kubernetes adoption has made it one of Mozilla’s most widely used open-source contributions.
Common Mistakes
Section titled “Common Mistakes”| Mistake | Why It Happens | How to Fix It |
|---|---|---|
| One ArgoCD per team | Teams want autonomy. Platform team does not want to manage multi-tenant ArgoCD. | Invest in multi-tenant ArgoCD with Projects and RBAC. One or two centralized instances are far easier to maintain than 50+ team instances. |
| ApplicationSets without sync waves | All applications in an ApplicationSet try to sync simultaneously. CRDs not installed before resources that depend on them. Sync failures cascade. | Use sync waves and sync hooks to control deployment order. Deploy CRDs before custom resources. Deploy namespaces before workloads. |
| Git repo too large for ArgoCD | Mono-repo grows to 10,000+ files. ArgoCD clone takes 30+ seconds. Sync times increase to minutes. | Use shallow clones (--depth 1), split large repos, or use ApplicationSets with directory generators to limit what ArgoCD syncs per Application. |
| Plaintext secrets in Git | ”We will encrypt them later.” Later never comes. Credential scanning finds secrets in Git history even after removal. | Use External Secrets Operator or SOPS from day one. Add pre-commit hooks that detect secrets (gitleaks, talisman). Rotate any secrets found in Git history immediately. |
| No ArgoCD sync windows | Teams deploy at 3 AM on a Saturday, break production, and nobody is awake to respond. | Configure sync windows on ArgoCD Projects. Deny automatic syncs outside business hours for production. Allow manual overrides with justification. |
| RBAC too permissive | ”Just give everyone admin to unblock them.” ArgoCD becomes a free-for-all where anyone can deploy anything anywhere. | Design RBAC from the start. Projects per team. Source repo restrictions. Destination namespace restrictions. Deny cluster-scoped resources for non-platform teams. |
| No Backstage templates for common tasks | Developers still need to manually create repos, configure ArgoCD, set up monitoring. The platform exists but the self-service layer does not. | Invest in Backstage templates for the top 5 use cases: new service, new environment, new database, new team onboarding, debug session. |
| ArgoCD managing its own configuration | ArgoCD tries to sync its own ConfigMaps and Secrets. A bad configuration change locks ArgoCD out. | Keep ArgoCD self-management in a separate, manually-controlled Application with automated: false. Critical configs (RBAC, repos, clusters) require manual sync. |
Question 1: Scenario: Your rapidly expanding enterprise just acquired three smaller companies, bringing the total number of engineering teams to 40. The CTO asks if you should provision a dedicated ArgoCD instance for each team to ensure isolation. Should you use one ArgoCD instance or one per team? Justify your architecture decision.
Use one centralized ArgoCD instance (or at most 2-3 for HA across regions) rather than one per team. With 40 teams, running 40 individual instances means maintaining 40 separate RBAC configurations and performing 40 isolated upgrades whenever a critical CVE is announced, which is operationally unsustainable. A centralized instance managed via ArgoCD Projects provides cross-team visibility, uniform governance, and drastically reduces resource overhead by eliminating redundant controller pods. The trade-off is that this centralized instance becomes critical platform infrastructure, requiring high availability, rigorous monitoring, and dedicated platform engineers to operate it reliably at scale.
Question 2: Scenario: Your platform team is provisioning a new Kubernetes cluster. They need to install monitoring, logging, cert-manager, and Kyverno immediately upon creation. Instead of running `kubectl apply` for each tool, they want the cluster to bootstrap itself. How does the App of Apps pattern solve this bootstrapping problem, and what are its operational limitations?
The App of Apps pattern solves the bootstrapping problem by utilizing a single root ArgoCD Application that points to a Git directory containing the manifests for all core platform tools. Once you apply this single root Application to the new cluster, ArgoCD automatically discovers, creates, and syncs all the child Applications, ensuring the entire platform is provisioned in a consistent, declarative state without manual intervention. However, a major operational limitation is that if the root Application’s directory contains invalid YAML or an error occurs at the root level, the sync can fail globally, potentially cascading to all child applications. Additionally, relying solely on App of Apps makes dynamic cluster targeting difficult, which is why ApplicationSets are generally preferred for dynamic, multi-cluster environments.
Question 3: Scenario: A critical database is deployed in your production cluster, and the 'payments' team stores its credentials in AWS Secrets Manager. They use the External Secrets Operator to sync these credentials into Kubernetes. Suddenly, AWS Secrets Manager experiences a regional outage. What happens to the running applications, and what operational challenges might arise during the outage?
When AWS Secrets Manager experiences an outage, ESO cannot refresh the secrets, but the existing Kubernetes Secrets created during the last successful sync remain fully functional within the cluster. Pods that are currently running will continue to use the in-memory copy of the secrets, and newly scheduled pods can still mount the existing Kubernetes Secrets without issue. However, problems arise if a secret value needs to be rotated urgently or if a new ExternalSecret resource is created, as the initial sync cannot complete until the provider is back online. To mitigate the impact of temporary provider outages, you should configure a reasonable refresh interval (such as 1 hour) and ensure critical secrets are fully synced during the initial deployment phase.
Question 4: Scenario: Your centralized ArgoCD installation has grown to manage 500 Applications distributed across 20 distinct clusters. Recently, developers have started complaining that sync times are unacceptably slow and the ArgoCD user interface frequently times out. What specific tuning adjustments would you apply to stabilize the system under this load?
To optimize an ArgoCD instance at this scale, you should first increase the --app-resync interval from the default 180 seconds to a higher value like 300 or 600 seconds to reduce the frequency of application state evaluations. Next, enable controller sharding by configuring multiple application controller replicas with the --shard flag, allowing each controller to manage a dedicated subset of the 20 clusters. You can also improve performance by enabling Server-Side Apply to reduce the calculation overhead for diffs, and by relying on Git webhooks instead of polling to handle repository changes more efficiently. Finally, ensure the Redis cache has sufficient memory allocated (4-8GB for 500 apps) to store manifest states without eviction, and consider splitting the instance by environment if a single centralized installation continues to struggle under the load.
Question 5: Scenario: A rapidly growing fintech startup has expanded from 2 to 15 engineering teams over the last year. They currently store all their Kubernetes manifests in a single mono-repo. Recently, deployment pipelines have slowed to a crawl, and teams are stepping on each other's changes. Why is the mono-repo failing them at this scale, and how would migrating to a hybrid repository strategy resolve these operational bottlenecks?
The mono-repo becomes a bottleneck at scale because as the number of teams and manifests grows, Git clone and fetch operations take significantly longer, delaying ArgoCD sync times and slowing down continuous integration pipelines. Furthermore, a single repository means that one team introducing a syntax error or triggering a complex merge conflict can block the deployment queue for the entire engineering organization. By migrating to a hybrid repository strategy, the platform team maintains a central repository for global configurations and standardized policies, ensuring strict governance. Concurrently, each engineering team receives their own repository for their specific microservices, granting them independent merge queues, isolated access controls, and true deployment autonomy without impacting the broader platform.
Question 6: Scenario: The 'payments' team and 'identity' team share a central ArgoCD instance. A developer on the 'payments' team accidentally modifies their Application manifest to target the `identity-prod` namespace. How does ArgoCD's architecture prevent the 'payments' team from overwriting the 'identity' team's workloads, even if they commit this change to their repository?
This cross-tenant deployment attempt is blocked by enforcing strict boundary restrictions using ArgoCD AppProjects. The platform team configures the ‘payments’ AppProject to explicitly define allowed destination namespaces, restricting them solely to the ‘payments’ environment. When the developer commits the invalid manifest targeting the ‘identity’ namespace, the ArgoCD application controller evaluates the target against the AppProject’s destination whitelist and rejects the sync operation entirely because it is not permitted. To provide defense-in-depth, the platform team also uses Kubernetes RBAC to restrict the specific ArgoCD service account associated with the ‘payments’ project, ensuring it physically lacks the permissions to modify resources in the ‘identity’ namespace.
Hands-On Exercise: Build an Enterprise GitOps Platform
Section titled “Hands-On Exercise: Build an Enterprise GitOps Platform”In this exercise, you will set up a multi-tenant ArgoCD installation with Project-based RBAC, ApplicationSets, and External Secrets Operator.
Task 1: Create the Lab Cluster and Install ArgoCD
Section titled “Task 1: Create the Lab Cluster and Install ArgoCD”Solution
kind create cluster --name enterprise-gitops
# Install ArgoCDkubectl create namespace argocdkubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yamlkubectl wait --for=condition=available deployment/argocd-server -n argocd --timeout=120s
ARGOCD_PW=$(kubectl -n argocd get secret argocd-initial-admin-secret \ -o jsonpath='{.data.password}' | base64 -d)echo "ArgoCD admin password: $ARGOCD_PW"Task 2: Configure Multi-Tenant ArgoCD Projects
Section titled “Task 2: Configure Multi-Tenant ArgoCD Projects”Solution
# Create team namespacesfor TEAM in payments identity search platform; do kubectl create namespace $TEAMdone
# Create ArgoCD Projects for each teamcat <<'EOF' | kubectl apply -f -apiVersion: argoproj.io/v1alpha1kind: AppProjectmetadata: name: payments namespace: argocdspec: description: "Payments team" sourceRepos: - '*' destinations: - namespace: payments server: https://kubernetes.default.svc clusterResourceWhitelist: [] namespaceResourceWhitelist: - group: '' kind: '*' - group: apps kind: '*' - group: networking.k8s.io kind: '*'---apiVersion: argoproj.io/v1alpha1kind: AppProjectmetadata: name: identity namespace: argocdspec: description: "Identity team" sourceRepos: - '*' destinations: - namespace: identity server: https://kubernetes.default.svc clusterResourceWhitelist: [] namespaceResourceWhitelist: - group: '' kind: '*' - group: apps kind: '*'---apiVersion: argoproj.io/v1alpha1kind: AppProjectmetadata: name: platform namespace: argocdspec: description: "Platform team - full access" sourceRepos: - '*' destinations: - namespace: '*' server: '*' clusterResourceWhitelist: - group: '*' kind: '*'EOF
echo "ArgoCD Projects:"kubectl get appprojects -n argocdTask 3: Deploy Applications with the App of Apps Pattern
Section titled “Task 3: Deploy Applications with the App of Apps Pattern”Solution
# Create a simulated application for the payments teamcat <<'EOF' | kubectl apply -f -apiVersion: argoproj.io/v1alpha1kind: Applicationmetadata: name: payments-processor namespace: argocd labels: team: paymentsspec: project: payments source: repoURL: https://github.com/argoproj/argocd-example-apps.git targetRevision: HEAD path: guestbook destination: server: https://kubernetes.default.svc namespace: payments syncPolicy: automated: prune: true selfHeal: true syncOptions: - CreateNamespace=true---apiVersion: argoproj.io/v1alpha1kind: Applicationmetadata: name: identity-auth namespace: argocd labels: team: identityspec: project: identity source: repoURL: https://github.com/argoproj/argocd-example-apps.git targetRevision: HEAD path: guestbook destination: server: https://kubernetes.default.svc namespace: identity syncPolicy: automated: prune: true selfHeal: true syncOptions: - CreateNamespace=trueEOF
# Wait for syncsleep 15
echo "=== ArgoCD Applications ==="kubectl get applications -n argocd -o custom-columns=\NAME:.metadata.name,\PROJECT:.spec.project,\STATUS:.status.sync.status,\HEALTH:.status.health.status,\NAMESPACE:.spec.destination.namespaceTask 4: Test RBAC Enforcement
Section titled “Task 4: Test RBAC Enforcement”Solution
# Test: Try to create an application in the payments project targeting the identity namespaceecho "=== Test: Cross-namespace deployment (should fail) ==="cat <<'EOF' | kubectl apply -f - 2>&1 || trueapiVersion: argoproj.io/v1alpha1kind: Applicationmetadata: name: payments-in-wrong-namespace namespace: argocdspec: project: payments source: repoURL: https://github.com/argoproj/argocd-example-apps.git targetRevision: HEAD path: guestbook destination: server: https://kubernetes.default.svc namespace: identity syncPolicy: automated: prune: trueEOF
# The application might be created but sync should failecho ""echo "=== Check if the cross-namespace app synced ==="sleep 5kubectl get application payments-in-wrong-namespace -n argocd \ -o jsonpath='{.status.conditions[*].message}' 2>/dev/null || echo "Application rejected or sync failed as expected"
# Clean up the testkubectl delete application payments-in-wrong-namespace -n argocd 2>/dev/null || true
echo ""echo "=== Legitimate applications ==="kubectl get applications -n argocd -o custom-columns=\NAME:.metadata.name,PROJECT:.spec.project,SYNC:.status.sync.statusTask 5: Build a Platform Dashboard
Section titled “Task 5: Build a Platform Dashboard”Solution
cat <<'SCRIPT' > /tmp/platform-dashboard.sh#!/bin/bashecho "============================================="echo " ENTERPRISE GITOPS PLATFORM DASHBOARD"echo " $(date -u +%Y-%m-%dT%H:%M:%SZ)"echo "============================================="
echo ""echo "--- ArgoCD Health ---"ARGO_PODS=$(kubectl get pods -n argocd --no-headers | grep Running | wc -l | tr -d ' ')echo " ArgoCD Pods Running: $ARGO_PODS"
echo ""echo "--- Projects ---"kubectl get appprojects -n argocd --no-headers | while read LINE; do PROJECT=$(echo $LINE | awk '{print $1}') APP_COUNT=$(kubectl get applications -n argocd -o json | jq "[.items[] | select(.spec.project == \"$PROJECT\")] | length") echo " $PROJECT: $APP_COUNT applications"done
echo ""echo "--- Applications by Sync Status ---"SYNCED=$(kubectl get applications -n argocd -o json | jq '[.items[] | select(.status.sync.status == "Synced")] | length')OUTOFSYNC=$(kubectl get applications -n argocd -o json | jq '[.items[] | select(.status.sync.status == "OutOfSync")] | length')UNKNOWN=$(kubectl get applications -n argocd -o json | jq '[.items[] | select(.status.sync.status == "Unknown")] | length')echo " Synced: $SYNCED"echo " OutOfSync: $OUTOFSYNC"echo " Unknown: $UNKNOWN"
echo ""echo "--- Applications by Health ---"HEALTHY=$(kubectl get applications -n argocd -o json | jq '[.items[] | select(.status.health.status == "Healthy")] | length')PROGRESSING=$(kubectl get applications -n argocd -o json | jq '[.items[] | select(.status.health.status == "Progressing")] | length')DEGRADED=$(kubectl get applications -n argocd -o json | jq '[.items[] | select(.status.health.status == "Degraded")] | length')echo " Healthy: $HEALTHY"echo " Progressing: $PROGRESSING"echo " Degraded: $DEGRADED"
echo ""echo "--- Team Namespace Resources ---"for NS in payments identity search; do PODS=$(kubectl get pods -n $NS --no-headers 2>/dev/null | wc -l | tr -d ' ') SVCS=$(kubectl get services -n $NS --no-headers 2>/dev/null | wc -l | tr -d ' ') echo " $NS: $PODS pods, $SVCS services"done
echo ""echo "============================================="SCRIPT
chmod +x /tmp/platform-dashboard.shbash /tmp/platform-dashboard.shClean Up
Section titled “Clean Up”kind delete cluster --name enterprise-gitopsrm /tmp/platform-dashboard.shSuccess Criteria
Section titled “Success Criteria”- I installed ArgoCD with multi-tenant Projects for 3 teams
- I deployed applications scoped to team namespaces
- I verified that cross-namespace deployment is prevented by Project RBAC
- I built a platform dashboard showing application sync and health status
- I can explain the App of Apps pattern and when to use ApplicationSets
- I can describe at least 3 secrets management strategies for GitOps
- I can design a Git repository strategy for a 10+ team organization
Next Module
Section titled “Next Module”With the GitOps platform in place, it is time to secure it with Zero Trust principles. Head to Module 10.9: Zero Trust Architecture in Hybrid Cloud to learn about BeyondCorp, Identity-Aware Proxies, micro-segmentation, and how to remove VPNs from your enterprise architecture.