Module 13.1: Harbor - Enterprise Container Registry
Цей контент ще не доступний вашою мовою.
Toolkit Track | Complexity:
[COMPLEX]| Time: 50-60 minutes
Overview
Section titled “Overview”Harbor is the enterprise container registry that ate Docker Hub’s lunch. When organizations realized they couldn’t bet their production on rate limits and someone else’s infrastructure, Harbor emerged as the answer. It’s a CNCF Graduated project—the same tier as Kubernetes itself—trusted by thousands of enterprises worldwide.
This module teaches you to deploy, configure, and operate Harbor for enterprise container image management.
Prerequisites
Section titled “Prerequisites”- Docker fundamentals (building, pushing, tagging)
- Kubernetes basics (Deployments, Services, PVCs)
- DevSecOps Discipline - Supply chain security concepts
- Understanding of image registries and OCI format
What You’ll Be Able to Do
Section titled “What You’ll Be Able to Do”After completing this module, you will be able to:
- Deploy Harbor with high availability and configure project-based access control for multi-team environments
- Implement automated vulnerability scanning and image signing policies for production registries
- Configure replication rules between Harbor instances for geo-distributed container distribution
- Integrate Harbor with Kubernetes admission controllers to enforce image security policies
Why This Module Matters
Section titled “Why This Module Matters”Every container you run came from somewhere. In production, “somewhere” can’t be DockerHub with 100 pull/hour limits, or an unknown registry with unscanned images. Harbor gives you:
- Vulnerability scanning: Know what’s in your images before production
- Access control: Who can push to
production/namespace? - Audit trails: Who pushed what, when?
- Replication: Images where you need them, when you need them
- Compliance: Image signing, retention policies, quotas
When your security team asks “how do we know this image is safe?”, Harbor is your answer.
Did You Know?
Section titled “Did You Know?”- Origin Story: Harbor was created by VMware in 2014, open-sourced in 2016, and donated to CNCF in 2018. It graduated in 2020—faster than most projects.
- Scale: Some Harbor installations manage over 1 million images and handle 100,000+ pulls per day
- The Name: Harbor was named because it’s a “safe harbor” for your container images—protecting them from the open seas of the internet
- Trivy Integration: Harbor switched from Clair to Trivy as the default scanner in v2.2, reducing scan times by 10x
Harbor Architecture
Section titled “Harbor Architecture”Understanding Harbor’s components is essential for successful deployment and troubleshooting:
HARBOR ARCHITECTURE─────────────────────────────────────────────────────────────────────────────
┌─────────────────────────────────────────┐ │ NGINX Proxy │ │ (SSL termination, routing) │ └──────────────────┬──────────────────────┘ │ ┌──────────────────────────┼──────────────────────────┐ │ │ │ ▼ ▼ ▼ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │ Core │ │ Registry │ │ Portal │ │ (API, Auth) │ │ (Distribution)│ │ (UI) │ │ │ │ │ │ │ │ - Projects │ │ - OCI storage │ │ - Web UI │ │ - Users/RBAC │ │ - Blob/Manifest│ │ - Dashboard │ │ - Webhooks │ │ - GC │ │ - Admin │ └───────┬────────┘ └────────────────┘ └────────────────┘ │ ┌──────────┴──────────┬───────────────────────┬───────────────────────┐ │ │ │ │ ▼ ▼ ▼ ▼┌────────────────┐ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐│ Job Service │ │ Trivy │ │ Notary │ │ Redis/Cache ││ │ │ (Scanner) │ │ (Signing) │ │ ││ - Replication │ │ │ │ │ │ - Session ││ - GC jobs │ │ - CVE scan │ │ - Trust │ │ - Job queues ││ - Retention │ │ - DB updates │ │ - Cosign │ │ - Caching │└────────────────┘ └────────────────┘ └────────────────┘ └────────────────┘ │ │ │ │ └─────────────────────┴───────────────────────┴───────────────────────┘ │ ▼ ┌───────────────────────────────┐ │ PostgreSQL │ │ (Metadata, users, RBAC) │ └───────────────────────────────┘ │ ▼ ┌───────────────────────────────┐ │ Storage Backend │ │ (S3 / Azure / GCS / Local) │ └───────────────────────────────┘
COMPONENT RESPONSIBILITIES:─────────────────────────────────────────────────────────────────────────────Core → Authentication, authorization, project management, APIRegistry → OCI-compliant image storage (Docker Distribution v2)Portal → Web UI for administrationJob Service → Async jobs (replication, GC, scanning)Trivy → Vulnerability scanning (built-in since v2.2)Notary → Image signing (optional, Docker Content Trust)Redis → Job queues, session management, cachingPostgreSQL → Persistent metadata storageCore Concepts
Section titled “Core Concepts”Projects: The Organizational Unit
Section titled “Projects: The Organizational Unit”Projects in Harbor are like namespaces with superpowers:
PROJECT STRUCTURE─────────────────────────────────────────────────────────────────────────────
┌────────────────────────────────────────┐ │ Harbor Instance │ │ │ │ ┌────────────────────────────────┐ │ │ │ library (public) │ │ │ │ • Base images │ │ │ │ • Shared utilities │ │ │ │ • Anyone can pull │ │ │ └────────────────────────────────┘ │ │ │ │ ┌────────────────────────────────┐ │ │ │ team-backend (private) │ │ │ │ • API services │ │ │ │ • Team-only access │ │ │ │ • Auto-scan enabled │ │ │ └────────────────────────────────┘ │ │ │ │ ┌────────────────────────────────┐ │ │ │ production (private) │ │ │ │ • Signed images only │ │ │ │ • Strict retention │ │ │ │ • Scan required for pull │ │ │ └────────────────────────────────┘ │ │ │ └────────────────────────────────────────┘
IMAGE NAMING:─────────────────────────────────────────────────────────────────────────────harbor.example.com / team-backend / api-server : v1.2.3└──────┬─────────┘ └────┬─────┘ └───┬────┘ └─┬──┘ registry project repository tagRBAC: Who Can Do What
Section titled “RBAC: Who Can Do What”Harbor’s role-based access control is project-scoped:
| Role | Pull | Push | Delete | Manage Members | Project Admin |
|---|---|---|---|---|---|
| Guest | Yes | No | No | No | No |
| Developer | Yes | Yes | No | No | No |
| Maintainer | Yes | Yes | Yes | No | No |
| Project Admin | Yes | Yes | Yes | Yes | Yes |
Robot Accounts: For CI/CD pipelines, create robot accounts with scoped permissions:
# Robot account for CI/CDName: robot$ci-pushPermissions: - project: team-backend access: - resource: repository action: push - resource: tag action: create - project: library access: - resource: repository action: pullVulnerability Scanning
Section titled “Vulnerability Scanning”Harbor uses Trivy for vulnerability scanning:
VULNERABILITY SCANNING FLOW─────────────────────────────────────────────────────────────────────────────
Push Image Scan Triggered │ │ ▼ ▼ ┌─────────────────┐ ┌─────────────────┐ │ Image Layers │────────────▶│ Trivy │ │ Stored │ │ Scanner │ └─────────────────┘ └────────┬────────┘ │ ▼ ┌─────────────────┐ │ Trivy CVE DB │ │ (NVD, Alpine, │ │ Debian, etc.) │ └────────┬────────┘ │ ▼ ┌────────────────────────────────────────────────────┐ │ SCAN RESULTS │ │ │ │ ┌─────────────────────────────────────────────┐ │ │ │ CRITICAL: 2 | HIGH: 5 | MEDIUM: 12 | LOW: 8│ │ │ └─────────────────────────────────────────────┘ │ │ │ │ CVE-2024-1234 │ Critical │ openssl │ Fixed │ │ CVE-2024-5678 │ High │ curl │ Update │ │ CVE-2024-9012 │ Medium │ libc │ No fix │ │ │ └────────────────────────────────────────────────────┘
SCAN POLICY OPTIONS:─────────────────────────────────────────────────────────────────────────────Scan on Push → Auto-scan when image pushedScan on Schedule → Daily/weekly vulnerability DB updatesBlock on Pull → Prevent pull if CVE severity threshold exceededDeploying Harbor
Section titled “Deploying Harbor”Option 1: Docker Compose (Development/Testing)
Section titled “Option 1: Docker Compose (Development/Testing)”The fastest way to get Harbor running:
# Download the installerHARBOR_VERSION="v2.10.0"wget https://github.com/goharbor/harbor/releases/download/${HARBOR_VERSION}/harbor-offline-installer-${HARBOR_VERSION}.tgztar xzf harbor-offline-installer-${HARBOR_VERSION}.tgzcd harbor
# Copy and edit the configurationcp harbor.yml.tmpl harbor.ymlEdit harbor.yml:
# harbor.yml - key settingshostname: harbor.example.com # Your domain
# HTTPS configuration (production)https: port: 443 certificate: /your/certificate.crt private_key: /your/private.key
# For testing only - use HTTPS in production!# http:# port: 80
# Admin password - CHANGE THIS!harbor_admin_password: Harbor12345
# Database settingsdatabase: password: root123 # Change this! max_idle_conns: 100 max_open_conns: 900
# Storage backend# Default is local filesystemstorage_service: filesystem: maxthreads: 100 rootdirectory: /storage
# For S3-compatible storage:# storage_service:# s3:# accesskey: your-access-key# secretkey: your-secret-key# region: us-west-1# bucket: harbor-images# encrypt: true
# Trivy scanner settingstrivy: ignore_unfixed: false skip_update: false insecure: falseRun the installer:
# Install Harbor./install.sh --with-trivy
# Verify servicesdocker-compose ps
# Access UI at https://harbor.example.com# Login: admin / Harbor12345Option 2: Kubernetes with Helm (Production)
Section titled “Option 2: Kubernetes with Helm (Production)”For production Kubernetes deployments:
# Add Helm repositoryhelm repo add harbor https://helm.goharbor.iohelm repo update
# Create namespacekubectl create namespace harborCreate a values file for production:
expose: type: ingress tls: enabled: true certSource: secret secret: secretName: harbor-tls ingress: hosts: core: harbor.example.com className: nginx annotations: nginx.ingress.kubernetes.io/proxy-body-size: "0" nginx.ingress.kubernetes.io/ssl-redirect: "true"
externalURL: https://harbor.example.com
# Persistencepersistence: enabled: true resourcePolicy: "keep" persistentVolumeClaim: registry: storageClass: "fast-ssd" size: 100Gi database: storageClass: "fast-ssd" size: 5Gi redis: storageClass: "fast-ssd" size: 1Gi trivy: storageClass: "fast-ssd" size: 5Gi
# External database (recommended for production)database: type: external external: host: "postgresql.example.com" port: "5432" username: "harbor" password: "secretpassword" coreDatabase: "harbor"
# External Redis (recommended for production)redis: type: external external: addr: "redis.example.com:6379" password: "redis-password"
# Admin passwordharborAdminPassword: "SecurePassword123!"
# Resource requests/limitscore: resources: requests: memory: 256Mi cpu: 100m limits: memory: 512Mi cpu: 500m
registry: resources: requests: memory: 256Mi cpu: 100m
trivy: enabled: true resources: requests: cpu: 200m memory: 512Mi limits: cpu: 1 memory: 1Gi
# Enable metrics for Prometheusmetrics: enabled: true serviceMonitor: enabled: trueDeploy Harbor:
# Install Harborhelm install harbor harbor/harbor \ --namespace harbor \ -f harbor-values.yaml
# Watch deployment progresskubectl -n harbor get pods -w
# Verify all components are runningkubectl -n harbor get pods# Expected output:# harbor-core-xxx Running# harbor-database-xxx Running# harbor-jobservice-xxx Running# harbor-nginx-xxx Running# harbor-portal-xxx Running# harbor-redis-xxx Running# harbor-registry-xxx Running# harbor-trivy-xxx RunningConfiguring Docker to Use Harbor
Section titled “Configuring Docker to Use Harbor”# For HTTP (development only!)# Add to /etc/docker/daemon.json:{ "insecure-registries": ["harbor.example.com"]}sudo systemctl restart docker
# For HTTPS with self-signed cert# Copy CA certsudo mkdir -p /etc/docker/certs.d/harbor.example.comsudo cp ca.crt /etc/docker/certs.d/harbor.example.com/sudo systemctl restart docker
# Login to Harbordocker login harbor.example.com# Username: admin# Password: Harbor12345
# Push an imagedocker tag nginx:latest harbor.example.com/library/nginx:latestdocker push harbor.example.com/library/nginx:latestConfiguring Kubernetes to Pull from Harbor
Section titled “Configuring Kubernetes to Pull from Harbor”# Create image pull secretkubectl create secret docker-registry harbor-creds \ --docker-server=harbor.example.com \ --docker-username=admin \ --docker-password=Harbor12345 \ --docker-email=admin@example.com \ -n default
# Reference in Pod specapiVersion: v1kind: Podmetadata: name: nginxspec: containers: - name: nginx image: harbor.example.com/library/nginx:latest imagePullSecrets: - name: harbor-creds
# Or attach to ServiceAccount (recommended)kubectl patch serviceaccount default \ -p '{"imagePullSecrets": [{"name": "harbor-creds"}]}'Harbor Operations
Section titled “Harbor Operations”Creating Projects
Section titled “Creating Projects”Using the API:
# Create a projectcurl -X POST "https://harbor.example.com/api/v2.0/projects" \ -H "Content-Type: application/json" \ -u "admin:Harbor12345" \ -d '{ "project_name": "team-backend", "metadata": { "public": "false", "enable_content_trust": "false", "auto_scan": "true", "severity": "high", "reuse_sys_cve_allowlist": "true" }, "storage_limit": 10737418240 }'
# List projectscurl -s "https://harbor.example.com/api/v2.0/projects" \ -u "admin:Harbor12345" | jq .Setting Up Replication
Section titled “Setting Up Replication”Replicate images between Harbor instances or to/from external registries:
REPLICATION PATTERNS─────────────────────────────────────────────────────────────────────────────
PUSH-BASED (Harbor → Remote):┌───────────────┐ ┌───────────────┐│ Harbor │────Push Images────▶│ Remote ││ (Primary) │ │ Registry ││ │ │ (DR/Edge) │└───────────────┘ └───────────────┘
PULL-BASED (Remote → Harbor):┌───────────────┐ ┌───────────────┐│ Harbor │◀───Pull Images─────│ DockerHub ││ (Cache) │ │ (Upstream) ││ │ │ │└───────────────┘ └───────────────┘Create a replication rule via API:
# First, create a registry endpointcurl -X POST "https://harbor.example.com/api/v2.0/registries" \ -H "Content-Type: application/json" \ -u "admin:Harbor12345" \ -d '{ "name": "dockerhub", "type": "docker-hub", "url": "https://hub.docker.com", "credential": { "type": "basic", "access_key": "your-dockerhub-username", "access_secret": "your-dockerhub-token" } }'
# Create a pull-based replication rulecurl -X POST "https://harbor.example.com/api/v2.0/replication/policies" \ -H "Content-Type: application/json" \ -u "admin:Harbor12345" \ -d '{ "name": "pull-nginx-from-dockerhub", "src_registry": { "id": 1 }, "dest_namespace": "library", "dest_namespace_replace_count": 1, "trigger": { "type": "scheduled", "trigger_settings": { "cron": "0 0 * * *" } }, "filters": [ { "type": "name", "value": "library/nginx" }, { "type": "tag", "value": "1.*" } ], "enabled": true, "deletion": false, "override": true, "speed": -1 }'Retention Policies
Section titled “Retention Policies”Prevent storage bloat with retention rules:
# Create retention policy for a projectcurl -X POST "https://harbor.example.com/api/v2.0/retentions" \ -H "Content-Type: application/json" \ -u "admin:Harbor12345" \ -d '{ "algorithm": "or", "rules": [ { "disabled": false, "action": "retain", "params": { "latestPushedK": 10 }, "scope_selectors": { "repository": [ { "kind": "doublestar", "pattern": "**" } ] }, "tag_selectors": [ { "kind": "doublestar", "pattern": "**" } ] }, { "disabled": false, "action": "retain", "params": { "latestPulledN": 5 }, "scope_selectors": { "repository": [ { "kind": "doublestar", "pattern": "**" } ] }, "tag_selectors": [ { "kind": "doublestar", "pattern": "v*" } ] } ], "scope": { "level": "project", "ref": 1 }, "trigger": { "kind": "Schedule", "settings": { "cron": "0 0 0 * * *" } } }'Common retention patterns:
| Pattern | Rule | Use Case |
|---|---|---|
| Keep latest N | latestPushedK: 10 | Development images |
| Keep by age | nDaysSinceLastPush: 30 | Clean old builds |
| Keep pulled | latestPulledN: 5 | Active images |
| Keep tags | pattern: "v*" | Release versions |
| Keep always | always: true | Critical base images |
Garbage Collection
Section titled “Garbage Collection”Clean up deleted image layers:
# Trigger garbage collection via APIcurl -X POST "https://harbor.example.com/api/v2.0/system/gc/schedule" \ -H "Content-Type: application/json" \ -u "admin:Harbor12345" \ -d '{ "schedule": { "type": "Manual" }, "parameters": { "delete_untagged": true, "dry_run": false } }'
# Check GC job statuscurl -s "https://harbor.example.com/api/v2.0/system/gc" \ -u "admin:Harbor12345" | jq '.[-1]'
# Schedule automatic GCcurl -X PUT "https://harbor.example.com/api/v2.0/system/gc/schedule" \ -H "Content-Type: application/json" \ -u "admin:Harbor12345" \ -d '{ "schedule": { "type": "Weekly", "cron": "0 0 0 * * 0" }, "parameters": { "delete_untagged": true } }'CI/CD Integration
Section titled “CI/CD Integration”GitHub Actions
Section titled “GitHub Actions”name: Build and Push to Harbor
on: push: branches: [main] tags: ['v*']
env: REGISTRY: harbor.example.com PROJECT: team-backend IMAGE_NAME: api-server
jobs: build: runs-on: ubuntu-latest steps: - name: Checkout uses: actions/checkout@v4
- name: Set up Docker Buildx uses: docker/setup-buildx-action@v3
- name: Login to Harbor uses: docker/login-action@v3 with: registry: ${{ env.REGISTRY }} username: ${{ secrets.HARBOR_USERNAME }} password: ${{ secrets.HARBOR_PASSWORD }}
- name: Extract metadata id: meta uses: docker/metadata-action@v5 with: images: ${{ env.REGISTRY }}/${{ env.PROJECT }}/${{ env.IMAGE_NAME }} tags: | type=ref,event=branch type=semver,pattern={{version}} type=sha,prefix=
- name: Build and push uses: docker/build-push-action@v5 with: context: . push: true tags: ${{ steps.meta.outputs.tags }} labels: ${{ steps.meta.outputs.labels }} cache-from: type=registry,ref=${{ env.REGISTRY }}/${{ env.PROJECT }}/${{ env.IMAGE_NAME }}:buildcache cache-to: type=registry,ref=${{ env.REGISTRY }}/${{ env.PROJECT }}/${{ env.IMAGE_NAME }}:buildcache,mode=max
- name: Wait for scan run: | # Wait for Harbor to scan the image sleep 30
# Check scan results TAG=$(echo "${{ steps.meta.outputs.tags }}" | head -1 | cut -d: -f2) SCAN_URL="${{ env.REGISTRY }}/api/v2.0/projects/${{ env.PROJECT }}/repositories/${{ env.IMAGE_NAME }}/artifacts/${TAG}/additions/vulnerabilities"
RESULT=$(curl -s -u "${{ secrets.HARBOR_USERNAME }}:${{ secrets.HARBOR_PASSWORD }}" "$SCAN_URL")
CRITICAL=$(echo "$RESULT" | jq -r '.["application/vnd.security.vulnerability.report; version=1.1"].summary.critical // 0') HIGH=$(echo "$RESULT" | jq -r '.["application/vnd.security.vulnerability.report; version=1.1"].summary.high // 0')
echo "Critical: $CRITICAL, High: $HIGH"
if [ "$CRITICAL" -gt 0 ]; then echo "::error::Found $CRITICAL critical vulnerabilities!" exit 1 fiGitLab CI
Section titled “GitLab CI”variables: REGISTRY: harbor.example.com PROJECT: team-backend IMAGE_NAME: api-server
stages: - build - scan - deploy
build: stage: build image: docker:24 services: - docker:24-dind variables: DOCKER_TLS_CERTDIR: "/certs" before_script: - echo "$HARBOR_PASSWORD" | docker login $REGISTRY -u "$HARBOR_USERNAME" --password-stdin script: - | if [ -n "$CI_COMMIT_TAG" ]; then TAG="$CI_COMMIT_TAG" else TAG="$CI_COMMIT_SHORT_SHA" fi
docker build -t $REGISTRY/$PROJECT/$IMAGE_NAME:$TAG . docker push $REGISTRY/$PROJECT/$IMAGE_NAME:$TAG
# Also tag as latest for main branch if [ "$CI_COMMIT_BRANCH" == "main" ]; then docker tag $REGISTRY/$PROJECT/$IMAGE_NAME:$TAG $REGISTRY/$PROJECT/$IMAGE_NAME:latest docker push $REGISTRY/$PROJECT/$IMAGE_NAME:latest fi
check-vulnerabilities: stage: scan image: curlimages/curl:latest script: - | if [ -n "$CI_COMMIT_TAG" ]; then TAG="$CI_COMMIT_TAG" else TAG="$CI_COMMIT_SHORT_SHA" fi
# Wait for scan to complete sleep 30
# Get scan results SCAN_URL="https://$REGISTRY/api/v2.0/projects/$PROJECT/repositories/$IMAGE_NAME/artifacts/$TAG/additions/vulnerabilities" RESULT=$(curl -s -u "$HARBOR_USERNAME:$HARBOR_PASSWORD" "$SCAN_URL")
CRITICAL=$(echo "$RESULT" | jq -r '.["application/vnd.security.vulnerability.report; version=1.1"].summary.critical // 0')
if [ "$CRITICAL" -gt 0 ]; then echo "Found $CRITICAL critical vulnerabilities!" exit 1 fi needs: - buildJenkins Pipeline
Section titled “Jenkins Pipeline”// Jenkinsfilepipeline { agent any
environment { REGISTRY = 'harbor.example.com' PROJECT = 'team-backend' IMAGE_NAME = 'api-server' HARBOR_CREDS = credentials('harbor-credentials') }
stages { stage('Build') { steps { script { def imageTag = env.TAG_NAME ?: env.GIT_COMMIT.take(7)
sh """ docker login ${REGISTRY} -u ${HARBOR_CREDS_USR} -p ${HARBOR_CREDS_PSW} docker build -t ${REGISTRY}/${PROJECT}/${IMAGE_NAME}:${imageTag} . docker push ${REGISTRY}/${PROJECT}/${IMAGE_NAME}:${imageTag} """
env.IMAGE_TAG = imageTag } } }
stage('Check Vulnerabilities') { steps { script { sleep(30) // Wait for scan
def scanUrl = "https://${REGISTRY}/api/v2.0/projects/${PROJECT}/repositories/${IMAGE_NAME}/artifacts/${env.IMAGE_TAG}/additions/vulnerabilities"
def result = sh( script: """ curl -s -u ${HARBOR_CREDS_USR}:${HARBOR_CREDS_PSW} "${scanUrl}" | \ jq -r '.["application/vnd.security.vulnerability.report; version=1.1"].summary' """, returnStdout: true ).trim()
echo "Scan results: ${result}"
def critical = sh( script: "echo '${result}' | jq -r '.critical // 0'", returnStdout: true ).trim().toInteger()
if (critical > 0) { error("Found ${critical} critical vulnerabilities!") } } } } }}Monitoring Harbor
Section titled “Monitoring Harbor”Prometheus Metrics
Section titled “Prometheus Metrics”Harbor exposes metrics on /metrics:
- job_name: 'harbor' static_configs: - targets: - harbor-core:8001 - harbor-registry:8001 - harbor-jobservice:8001 - harbor-exporter:8001 metrics_path: /metricsKey metrics to monitor:
| Metric | Description | Alert Threshold |
|---|---|---|
harbor_project_total | Number of projects | N/A |
harbor_repo_total | Total repositories | N/A |
harbor_artifact_pulled | Pull count | Baseline + 50% |
harbor_artifact_pushed | Push count | Baseline + 50% |
harbor_task_concurrency | Active jobs | > 100 |
harbor_gc_completion_time_seconds | GC duration | > 3600 |
Grafana Dashboard
Section titled “Grafana Dashboard”{ "dashboard": { "title": "Harbor Overview", "panels": [ { "title": "Image Pulls/min", "targets": [{ "expr": "rate(harbor_artifact_pulled[5m])" }] }, { "title": "Image Pushes/min", "targets": [{ "expr": "rate(harbor_artifact_pushed[5m])" }] }, { "title": "Storage Used", "targets": [{ "expr": "harbor_project_quota_usage_byte" }] }, { "title": "Scan Queue Length", "targets": [{ "expr": "harbor_task_queue_size{type=\"SCAN\"}" }] } ] }}War Story: The DockerHub Dependency
Section titled “War Story: The DockerHub Dependency”A healthcare company learned this lesson the hard way during a critical production deployment:
The Setup:
- 50-node Kubernetes cluster
- All images pulled from DockerHub
- Rolling update to new application version
The Disaster: At 9 AM on a Monday (peak DockerHub traffic), they initiated the deployment. Ten nodes successfully pulled the new image. Then the rate limiting kicked in.
Failed to pull image "nginx:1.23":toomanyrequests: You have reached your pull rate limitThe Fallout:
- 40 nodes stuck in ImagePullBackOff
- Half running old version, half running new (version skew)
- API inconsistencies confusing users
- 4-hour outage during business hours
The Fix: They deployed Harbor over a weekend:
Week 1: Deploy Harbor on KubernetesWeek 2: Set up proxy cache for DockerHubWeek 3: Migrate CI/CD to push to HarborWeek 4: Update all deployments to pull from HarborWeek 5: Replicate to DR siteThe Result:
- Zero rate limit issues
- Average pull time: 2s (was 15s from DockerHub)
- Full visibility into what’s running where
- Security team happy with vulnerability scanning
The Lesson: Your container registry is production infrastructure. Treat it that way.
Common Mistakes
Section titled “Common Mistakes”| Mistake | Problem | Solution |
|---|---|---|
| HTTP in production | Man-in-the-middle attacks, credential theft | Always use HTTPS with valid certificates |
| Default admin password | Trivial to compromise | Change immediately, use strong password |
| No quotas | Single project fills storage | Set project and user quotas |
| No retention | Storage grows unbounded | Configure retention policies |
| Skip scanning | Unknown vulnerabilities deployed | Enable auto-scan, enforce policies |
| Single instance | SPOF for all deployments | Multi-site replication |
| No monitoring | Issues discovered too late | Prometheus + alerting |
| Robot accounts everywhere | Over-privileged automation | Scope robot accounts narrowly |
Test your understanding of Harbor:
1. What is the relationship between Projects and Repositories in Harbor?
Answer: Projects are organizational units that contain repositories. A project defines access policies, quotas, and scanning rules. Repositories within a project inherit these settings. Think of projects as namespaces—harbor.example.com/project/repository:tag.
2. What happens when you delete an image tag in Harbor?
Answer: Deleting a tag only removes the tag reference, not the underlying layers. The actual blob data remains until garbage collection runs. This is because multiple tags might reference the same layers. GC identifies orphaned blobs and reclaims storage.
3. How does Harbor's proxy cache feature help with DockerHub rate limits?
Answer: Harbor acts as a pull-through cache. When a node requests library/nginx:latest, Harbor checks its cache first. If cached, it serves locally (no DockerHub hit). If not cached, Harbor pulls once from DockerHub and caches it. All subsequent pulls from any node hit the local cache.
4. What's the difference between push-based and pull-based replication?
Answer: Push-based: Harbor actively sends images to remote registries (DR site, edge locations). Pull-based: Harbor fetches images from remote registries (proxy cache, sync from upstream). Push is better for disaster recovery; pull is better for caching upstream registries.
5. Why would you use robot accounts instead of user credentials in CI/CD?
Answer: Robot accounts are: (1) Scoped to specific projects and actions, (2) Don’t expire with user offboarding, (3) Can be revoked independently, (4) Auditable separately from user activity, (5) Don’t count against user license limits. Never use personal credentials in automation.
6. How can you enforce that only scanned images can be pulled?
Answer: In project settings, enable “Prevent vulnerable images from running” and set a severity threshold (e.g., Critical, High). Harbor will block pulls of images exceeding that threshold. Combined with auto-scan on push, this ensures only scanned, passing images deploy.
7. What components are required for a highly available Harbor deployment?
Answer: HA Harbor needs: (1) Multiple Harbor replicas behind load balancer, (2) External PostgreSQL with replication, (3) External Redis cluster, (4) Shared storage (S3/GCS/Azure) for registry data, (5) Session affinity or shared session store. The Helm chart supports all these.
8. How does Harbor integrate with Kubernetes admission controllers for image policy?
Answer: Harbor doesn’t directly integrate with admission controllers, but works with them. Typical pattern: OPA/Gatekeeper policy requiring images from harbor.example.com/* + Cosign verification of signatures. Harbor stores the signed images and signatures; the admission controller enforces the policy.
Hands-On Exercise: Deploy Harbor and Secure a Pipeline
Section titled “Hands-On Exercise: Deploy Harbor and Secure a Pipeline”Objective
Section titled “Objective”Deploy Harbor on a local Kubernetes cluster, configure a project with security policies, and create a CI/CD pipeline that enforces vulnerability scanning.
Environment Setup
Section titled “Environment Setup”# Create kind cluster with extra port mappingscat <<EOF | kind create cluster --name harbor-lab --config -kind: ClusterapiVersion: kind.x-k8s.io/v1alpha4nodes:- role: control-plane extraPortMappings: - containerPort: 30080 hostPort: 80 - containerPort: 30443 hostPort: 443EOF
# Install NGINX Ingress Controllerkubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml
# Wait for ingress to be readykubectl wait --namespace ingress-nginx \ --for=condition=ready pod \ --selector=app.kubernetes.io/component=controller \ --timeout=90sStep 1: Deploy Harbor with Helm
Section titled “Step 1: Deploy Harbor with Helm”# Add Harbor helm repohelm repo add harbor https://helm.goharbor.iohelm repo update
# Create namespacekubectl create namespace harbor
# Create minimal values for local developmentcat <<EOF > harbor-values.yamlexpose: type: nodePort tls: enabled: false nodePort: ports: http: nodePort: 30080
externalURL: http://localhost
harborAdminPassword: "Harbor12345"
persistence: enabled: true persistentVolumeClaim: registry: size: 5Gi database: size: 1Gi redis: size: 1Gi trivy: size: 1Gi
trivy: enabled: trueEOF
# Install Harborhelm install harbor harbor/harbor \ --namespace harbor \ -f harbor-values.yaml
# Wait for all pods to be ready (this takes 3-5 minutes)kubectl -n harbor get pods -w
# Verify Harbor is runningcurl -s http://localhost/api/v2.0/health | jq .Step 2: Configure Harbor
Section titled “Step 2: Configure Harbor”# Login to Harbor CLIdocker login localhost -u admin -p Harbor12345
# Create a project via APIcurl -X POST "http://localhost/api/v2.0/projects" \ -H "Content-Type: application/json" \ -u "admin:Harbor12345" \ -d '{ "project_name": "secure-apps", "metadata": { "public": "false", "auto_scan": "true", "severity": "high", "prevent_vul": "true" } }'
# Create a robot account for CI/CDROBOT_RESPONSE=$(curl -s -X POST "http://localhost/api/v2.0/robots" \ -H "Content-Type: application/json" \ -u "admin:Harbor12345" \ -d '{ "name": "ci-push", "description": "CI/CD push robot", "duration": -1, "level": "project", "permissions": [ { "namespace": "secure-apps", "kind": "project", "access": [ {"resource": "repository", "action": "push"}, {"resource": "repository", "action": "pull"}, {"resource": "tag", "action": "create"}, {"resource": "artifact", "action": "read"} ] } ] }')
echo "Robot token: $(echo $ROBOT_RESPONSE | jq -r '.secret')"Step 3: Build and Push a Test Image
Section titled “Step 3: Build and Push a Test Image”# Create a simple Dockerfilemkdir -p /tmp/harbor-test && cd /tmp/harbor-test
cat <<EOF > DockerfileFROM nginx:1.25COPY index.html /usr/share/nginx/html/EOF
cat <<EOF > index.html<h1>Harbor Test</h1>EOF
# Build and pushdocker build -t localhost/secure-apps/test-app:v1 .docker push localhost/secure-apps/test-app:v1
# Check scan results (wait a moment for scan to complete)sleep 30curl -s "http://localhost/api/v2.0/projects/secure-apps/repositories/test-app/artifacts/v1" \ -u "admin:Harbor12345" | jq '.scan_overview'Step 4: Test Vulnerability Policy
Section titled “Step 4: Test Vulnerability Policy”# Build an image with known vulnerabilities (old alpine)cat <<EOF > Dockerfile.vulnerableFROM alpine:3.7RUN apk add --no-cache curlEOF
docker build -t localhost/secure-apps/vulnerable-app:v1 -f Dockerfile.vulnerable .docker push localhost/secure-apps/vulnerable-app:v1
# Wait for scansleep 30
# Try to pull (should fail if vulnerabilities exceed threshold)docker pull localhost/secure-apps/vulnerable-app:v1
# Check vulnerability detailscurl -s "http://localhost/api/v2.0/projects/secure-apps/repositories/vulnerable-app/artifacts/v1/additions/vulnerabilities" \ -u "admin:Harbor12345" | jq '.["application/vnd.security.vulnerability.report; version=1.1"].summary'Step 5: Configure Retention Policy
Section titled “Step 5: Configure Retention Policy”# Create retention policycurl -X POST "http://localhost/api/v2.0/retentions" \ -H "Content-Type: application/json" \ -u "admin:Harbor12345" \ -d '{ "algorithm": "or", "rules": [ { "action": "retain", "params": {"latestPushedK": 5}, "scope_selectors": { "repository": [{"kind": "doublestar", "pattern": "**"}] }, "tag_selectors": [ {"kind": "doublestar", "pattern": "**"} ] } ], "scope": {"level": "project", "ref": 2}, "trigger": {"kind": "Schedule", "settings": {"cron": "0 0 0 * * *"}} }'
# Link retention policy to projectRETENTION_ID=$(curl -s "http://localhost/api/v2.0/retentions" -u "admin:Harbor12345" | jq '.[0].id')curl -X PUT "http://localhost/api/v2.0/projects/secure-apps" \ -H "Content-Type: application/json" \ -u "admin:Harbor12345" \ -d "{\"metadata\": {\"retention_id\": \"$RETENTION_ID\"}}"Success Criteria
Section titled “Success Criteria”- Harbor is running on Kubernetes
-
secure-appsproject created with auto-scan enabled - Robot account created for CI/CD
- Test image pushed and scanned
- Vulnerable image blocked from pull
- Retention policy configured
Cleanup
Section titled “Cleanup”# Delete kind clusterkind delete cluster --name harbor-lab
# Clean up test directoryrm -rf /tmp/harbor-testKey Takeaways
Section titled “Key Takeaways”- Harbor is enterprise-ready: CNCF Graduated, battle-tested at scale, actively maintained
- Projects are namespaces with policies: Use them to organize teams, environments, and access
- Scanning is non-negotiable: Enable auto-scan and enforce vulnerability thresholds
- Robot accounts for automation: Never use personal credentials in CI/CD pipelines
- Replication enables resilience: Multi-site for DR, proxy cache for upstream registries
- Garbage collection is essential: Schedule regular GC to reclaim storage
- Retention prevents bloat: Define policies before storage becomes a problem
- Monitor everything: Prometheus metrics, alerting, and dashboards
- Plan for HA: External database, Redis, and shared storage for production
- It’s production infrastructure: Your registry is as critical as your Kubernetes cluster
Next Module
Section titled “Next Module”Continue to Module 13.2: Zot — The lightweight, OCI-native alternative for minimal footprint requirements.
“If you don’t control your container registry, you don’t control your supply chain. Harbor gives you that control.”