Module 6.6: Knative -- Serverless Workloads on Kubernetes
Toolkit Track | Complexity: [COMPLEX] | Time: ~55 minutes
Overview
You have 200 microservices running in your cluster. Most of them sit idle 80% of the time, burning CPU and memory doing nothing. Knative brings serverless to Kubernetes: your workloads scale to zero when nobody is using them and spin back up in seconds when traffic arrives. No proprietary FaaS lock-in, no cloud-specific APIs. Just Kubernetes, with an off switch.
What You’ll Learn:
- What “serverless on Kubernetes” actually means (and what it does not mean)
- Knative Serving: Services, Configurations, Revisions, Routes, and scale-to-zero
- Knative Eventing: CloudEvents, Brokers, Triggers, and Sources
- How the activator proxy and cold starts work under the hood
- Traffic splitting for blue-green and canary deployments
- Installing Knative with different networking layers
- When Knative is the right tool and when it is not
Prerequisites:
- Kubernetes Deployments, Services, and Ingress basics
- KEDA — Understanding scale-to-zero concepts
- Reliability Engineering — SLO awareness
- Familiarity with Helm and kubectl
What You’ll Be Able to Do
After completing this module, you will be able to:
- Deploy Knative Serving for serverless workloads with automatic scaling and revision management
- Configure Knative traffic splitting for canary deployments and A/B testing across service revisions
- Implement Knative Eventing with brokers, triggers, and event sources for event-driven architectures
- Compare Knative’s serverless model against traditional Kubernetes Deployments for variable-traffic workloads
Why This Module Matters
A mid-size fintech company was running 200 microservices on EKS. The team had sized every deployment for peak load because nobody wanted to be the person whose service fell over during the monthly billing run. The result: 80% of services sat idle most of the day, consuming reserved CPU and memory. The monthly cloud bill was $45,000.
A platform engineer analyzed the traffic patterns and found that 140 of those 200 services received fewer than 10 requests per hour outside of business hours. Many received zero. The team migrated those 140 low-traffic services to Knative Serving with scale-to-zero enabled. Services that previously ran 24/7 now spun down after 60 seconds of inactivity and cold-started in under 2 seconds when traffic returned.
The result: the monthly bill dropped to $12,000. The 60 always-on services kept their regular Deployments. The 140 bursty services got Knative. Nobody noticed the difference in latency because a 1.5-second cold start on an internal admin dashboard is invisible. The only thing that changed was the bill.
The lesson: not every workload needs to run 24/7. The hard part is knowing which ones can sleep.
Did You Know?
- Knative was originally created by Google, Pivotal, IBM, Red Hat, and SAP in 2018. Google Cloud Run implements the Knative Serving API, so a service you deploy to Cloud Run is described by the same Service resource you would apply to a Knative cluster.
- Knative can scale a service from 0 to 1,000 pods in under 30 seconds. The activator proxy buffers incoming requests during cold start so that no requests are dropped — they just wait.
- The Knative project removed its Build component in 2019 and handed that responsibility to Tekton (Module 3.2). This is a rare example of a project intentionally shrinking its scope to do fewer things better.
- Knative joined the CNCF as an incubating project in March 2022. It powers serverless platforms at companies including IBM (Code Engine), VMware (Tanzu), and Google (Cloud Run), processing billions of requests daily across these platforms.
What Is Serverless on Kubernetes?
Let us clear up the biggest misconception first.

    "SERVERLESS" DOES NOT MEAN "NO SERVERS"
    ================================================================

    What marketing says:     "No servers to manage!"
    What it actually means:  "Servers that manage themselves."

    Specifically:
      1. Scale to zero when idle (no pods running = no cost)
      2. Scale up automatically when requests arrive
      3. Developer provides container, platform handles the rest
      4. Pay for what you use, not what you reserve

    ================================================================

    TRADITIONAL DEPLOYMENT vs KNATIVE
    ================================================================

    Traditional:
      3 replicas ──────────────────────────── 3 replicas
      Running 24/7, even at 3 AM with zero traffic

    Knative:
      0 pods ─── request ──▶ 1 pod ─── idle ──▶ 0 pods
      Only running when actively serving traffic

    Cost comparison (per service):
      Traditional: ~$50/month (always on)
      Knative:     ~$3/month (runs 2 hours/day average)
    ================================================================

Serverless on Kubernetes means your cluster still exists, your nodes still run, but individual workloads can sleep. Think of it like the difference between leaving every light in your house on 24/7 versus installing motion sensors.
Knative Serving
Knative Serving is the component that manages your serverless workloads. It introduces four Kubernetes Custom Resources that work together.
The Four Resources
    KNATIVE SERVING RESOURCE MODEL
    ================================================================

    ┌─────────────────────────────────────────────────────────────┐
    │  SERVICE (ksvc)                                             │
    │  Top-level resource. You create this.                       │
    │  Manages everything below automatically.                    │
    │                                                             │
    │  ┌───────────────────────────────────────────────────────┐  │
    │  │ CONFIGURATION                                         │  │
    │  │ Desired state of your workload.                       │  │
    │  │ Each change creates a new Revision.                   │  │
    │  │                                                       │  │
    │  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐    │  │
    │  │  │ Revision v1 │  │ Revision v2 │  │ Revision v3 │    │  │
    │  │  │ (image:1.0) │  │ (image:1.1) │  │ (image:1.2) │    │  │
    │  │  │             │  │             │  │  (latest)   │    │  │
    │  │  └─────────────┘  └─────────────┘  └─────────────┘    │  │
    │  └───────────────────────────────────────────────────────┘  │
    │                                                             │
    │  ┌───────────────────────────────────────────────────────┐  │
    │  │ ROUTE                                                 │  │
    │  │ Maps traffic to Revisions.                            │  │
    │  │ Enables traffic splitting.                            │  │
    │  │                                                       │  │
    │  │ 100% ──▶ Revision v3 (latest)                         │  │
    │  │ OR                                                    │  │
    │  │ 90% ──▶ Revision v2, 10% ──▶ Revision v3 (canary)     │  │
    │  └───────────────────────────────────────────────────────┘  │
    └─────────────────────────────────────────────────────────────┘

    Key insight: You only create the SERVICE.
    Knative creates Configuration, Revisions, and Route for you.
    ================================================================

- Service (ksvc): The only resource you need to create. It manages the entire lifecycle.
- Configuration: Describes the desired state (container image, env vars, resource limits). Every update creates a new Revision.
- Revision: An immutable snapshot of your Configuration at a point in time. Think of it like a Git commit for your deployment.
- Route: Maps network traffic to one or more Revisions. This is how traffic splitting works.
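The relationship between a Configuration and its Revisions can be sketched in a few lines of Python. This is only a toy model, not Knative's implementation: the class names mirror the resources above, the image names are hypothetical, and the sequential suffix imitates the style of name Knative generates by default (for example hello-00001).

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True)
class Revision:
    """Immutable snapshot of a Configuration (illustrative model only)."""
    name: str
    image: str


@dataclass
class Configuration:
    """Toy stand-in for a Configuration: each update stamps a new Revision."""
    service: str
    revisions: list = field(default_factory=list)

    def update(self, image: str) -> Revision:
        # Sequential suffix in the style of Knative's generated names.
        name = f"{self.service}-{len(self.revisions) + 1:05d}"
        revision = Revision(name=name, image=image)
        self.revisions.append(revision)
        return revision


config = Configuration(service="hello")
v1 = config.update(image="example.com/hello:1.0")  # hypothetical image
v2 = config.update(image="example.com/hello:1.1")  # hypothetical image

try:
    v1.image = "example.com/hello:9.9"  # editing a snapshot is rejected
except FrozenInstanceError:
    print(f"{v1.name} is immutable; changes must produce a new Revision")

print([r.name for r in config.revisions])
```

Because old snapshots are never modified, a Route can keep pointing at any of them, which is what makes rollbacks and traffic splitting cheap.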
Your First Knative Service
    # Save as hello-service.yaml
    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: hello
      namespace: default
    spec:
      template:
        metadata:
          annotations:
            # Keep the last pod for at least 60s once the autoscaler decides to scale to zero
            autoscaling.knative.dev/scale-to-zero-pod-retention-period: "60s"
        spec:
          containers:
            - image: gcr.io/knative-samples/helloworld-go
              ports:
                - containerPort: 8080
              env:
                - name: TARGET
                  value: "World"
              resources:
                requests:
                  cpu: 100m
                  memory: 128Mi
                limits:
                  cpu: 500m
                  memory: 256Mi

    # Apply the service
    kubectl apply -f hello-service.yaml

    # Check the Knative service
    kubectl get ksvc hello

    # Output:
    # NAME    URL                                LATESTCREATED   LATESTREADY   READY
    # hello   http://hello.default.example.com   hello-00001     hello-00001   True

    # Check the automatically created resources
    kubectl get configuration hello
    kubectl get revision -l serving.knative.dev/service=hello
    kubectl get route hello

Autoscaling: From 0 to N
Knative uses the Knative Pod Autoscaler (KPA) by default, which is more responsive than the standard HPA for serverless workloads.
    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: autoscale-demo
    spec:
      template:
        metadata:
          annotations:
            # Autoscaler class: kpa (default) or hpa
            autoscaling.knative.dev/class: "kpa.autoscaling.knative.dev"

            # Target concurrent requests per pod
            autoscaling.knative.dev/target: "10"

            # Minimum replicas (0 enables scale-to-zero)
            autoscaling.knative.dev/min-scale: "0"

            # Maximum replicas
            autoscaling.knative.dev/max-scale: "50"

            # Minimum time the last pod is kept once the autoscaler decides to scale to zero
            autoscaling.knative.dev/scale-to-zero-pod-retention-period: "30s"

            # Number of pods a new Revision starts with when it is first created
            autoscaling.knative.dev/initial-scale: "1"
        spec:
          containers:
            - image: gcr.io/knative-samples/autoscale-go
              ports:
                - containerPort: 8080

The key metric is concurrency: how many simultaneous requests each pod should handle. If target is 10 and 50 requests arrive simultaneously, Knative scales to roughly 5 pods (50 / 10). By default the autoscaler aims for 70% of the configured target (the target utilization percentage), so the real count can come out somewhat higher.
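The concurrency arithmetic can be written out as a small function. This is a simplified estimate, not the KPA's actual algorithm (which also averages over stable and panic windows); the function name and the `utilization` parameter are illustrative, with 0.7 standing in for the default target utilization.

```python
import math


def desired_pods(concurrent_requests: int, target_per_pod: float,
                 utilization: float = 1.0, max_scale: int = 0) -> int:
    """Rough replica estimate: ceil(load / effective per-pod target)."""
    if concurrent_requests <= 0:
        return 0  # no load, so scale-to-zero is possible
    effective_target = target_per_pod * utilization
    pods = math.ceil(concurrent_requests / effective_target)
    # max_scale == 0 is treated as "no upper bound" in this sketch.
    return min(pods, max_scale) if max_scale > 0 else pods


print(desired_pods(50, 10))                    # simple 50 / 10 estimate
print(desired_pods(50, 10, utilization=0.7))   # aiming for 70% of the target
print(desired_pods(500, 10, max_scale=50))     # capped by max-scale
```

The second call shows why a cluster may run more pods than the back-of-the-envelope division suggests: the autoscaler leaves headroom below the target rather than running every pod at its limit.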
Scale to Zero: How It Actually Works
This is the magic of Knative, and understanding the mechanics helps you tune it properly.
    SCALE-TO-ZERO LIFECYCLE
    ================================================================

    Phase 1: ACTIVE (pods running, serving traffic)
    ─────────────────────────────────────────────────
      Client ──▶ Ingress ──▶ Pod (serving requests)
                             Pod
                             Pod

    Phase 2: GRACE PERIOD (no traffic, counting down)
    ─────────────────────────────────────────────────
      No requests for 30s...
      Knative Autoscaler: "Scale-to-zero timer started."
      Pods still running, ready to serve.

    Phase 3: SCALED TO ZERO (no pods, activator watching)
    ─────────────────────────────────────────────────
      Knative terminates all pods.
      Ingress now points to the ACTIVATOR (a Knative system component).
      No application pods exist. Zero resource consumption.

    Phase 4: COLD START (request arrives, waking up)
    ─────────────────────────────────────────────────
      Client ──▶ Ingress ──▶ Activator (holds the request)
                                │
                                ├── Signals autoscaler: "Wake up!"
                                ├── Autoscaler creates pod(s)
                                ├── Waits for pod to be Ready
                                └── Forwards buffered request to pod
                                            │
      Client ◀── Response ◀── Pod ◀─────────┘

    Phase 5: ACTIVE AGAIN
    ─────────────────────────────────────────────────
      Ingress switches from Activator back to pod directly.
      Subsequent requests go straight to pods (no activator overhead).
    ================================================================

The Activator
The activator is Knative’s secret weapon. It is a reverse proxy that sits in the data path while a service is scaled to zero (and, depending on the target-burst-capacity setting, while a service has little spare capacity). When a request arrives for a sleeping service:
- The activator receives the request and buffers it (the client waits)
- It tells the autoscaler to create pods
- Once a pod passes its readiness probe, the activator forwards the buffered request
- The ingress layer switches to routing directly to pods
- The activator steps out of the data path
No requests are dropped. The client just experiences a delay (the cold start latency).
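The buffering steps above can be sketched as a tiny state machine. This is a conceptual model only, assuming a single service and ignoring timeouts; the `Activator` class and its method names are hypothetical and do not correspond to Knative's internal API.

```python
from collections import deque


class Activator:
    """Toy model of request buffering while a service is scaled to zero."""

    def __init__(self):
        self.buffer = deque()            # requests held during cold start
        self.served = []                 # requests that reached a pod
        self.pod_ready = False
        self.scale_up_requested = False

    def handle(self, request: str) -> None:
        if self.pod_ready:
            # Warm path: the request goes straight to a pod.
            self.served.append(request)
            return
        # Cold path: hold the request and ask the autoscaler for pods.
        self.buffer.append(request)
        self.scale_up_requested = True

    def on_pod_ready(self) -> None:
        # The readiness probe passed: flush buffered requests in arrival order.
        self.pod_ready = True
        while self.buffer:
            self.served.append(self.buffer.popleft())


activator = Activator()
activator.handle("GET /report/1")   # arrives while scaled to zero
activator.handle("GET /report/2")
activator.on_pod_ready()            # cold start finishes
activator.handle("GET /report/3")   # served without buffering
print(activator.served)
```

In a real cluster the held requests are still subject to request timeouts, so a very slow cold start can surface as an error to the client even though nothing was dropped by the activator itself.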
Cold Start Latency
Cold start is the time between “request arrives” and “pod is ready to serve.” It depends on:
    COLD START BREAKDOWN
    ================================================================

    Component                        Typical Time
    ─────────────────────────────────────────────────
    Activator receives request       ~5ms
    Autoscaler decision              ~50ms
    API server creates pod           ~100ms
    Scheduler places pod             ~50ms
    Container image pull             0ms (cached) to 30s+ (not cached)
    Container startup                ~200ms to 5s (app dependent)
    Readiness probe passes           ~100ms to 10s (app dependent)
    ─────────────────────────────────────────────────
    TOTAL (image cached, fast app)   ~500ms to 2s
    TOTAL (image not cached, slow)   5s to 45s
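As a quick sanity check, the stage timings can be summed to see where the totals come from. The sketch below copies the typical figures from the breakdown (taking the low and high end of each range); the dictionary layout and function name are illustrative.

```python
# (best_case_ms, worst_case_ms) per stage, taken from the breakdown.
STAGES = {
    "activator receives request": (5, 5),
    "autoscaler decision": (50, 50),
    "API server creates pod": (100, 100),
    "scheduler places pod": (50, 50),
    "container image pull": (0, 30_000),
    "container startup": (200, 5_000),
    "readiness probe passes": (100, 10_000),
}


def cold_start_range_ms(stages=STAGES):
    """Sum the best and worst case of every stage."""
    best = sum(low for low, _ in stages.values())
    worst = sum(high for _, high in stages.values())
    return best, worst


best, worst = cold_start_range_ms()
print(f"best case:  {best} ms")
print(f"worst case: {worst / 1000:.1f} s")

# Largest contributors in the worst case, most expensive first.
for name, (_, high) in sorted(STAGES.items(), key=lambda kv: -kv[1][1])[:3]:
    print(f"{name}: up to {high} ms")
```

The fixed platform overhead is only about 200 ms; image pull, application startup, and the readiness probe dominate, which is why the tuning advice focuses on those three.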
    HOW TO MINIMIZE COLD START:
    1. Use small container images (distroless, alpine)
    2. Pre-pull images on nodes (DaemonSet trick)
    3. Keep application startup fast (lazy initialization)
    4. Set min-scale: 1 for latency-sensitive services
    5. Use activation-scale to start multiple pods on wake
       (initial-scale only applies when a Revision is first created)
    ================================================================

Knative Eventing
Knative Serving handles request/response workloads. Knative Eventing handles event-driven architectures: reacting to things that happen rather than requests that arrive.
Core Concepts
    KNATIVE EVENTING ARCHITECTURE
    ================================================================

    ┌─────────────────────────────────────────────────────────┐
    │  SOURCES                                                │
    │  (Where events come from)                               │
    │                                                         │
    │  ┌─────────┐  ┌──────────┐ ┌──────────┐ ┌──────────┐    │
    │  │  Kafka  │  │  GitHub  │ │   Cron   │ │   API    │    │
    │  │ Source  │  │  Source  │ │  Source  │ │  Source  │    │
    │  └────┬────┘  └────┬─────┘ └────┬─────┘ └────┬─────┘    │
    └───────┼────────────┼────────────┼────────────┼──────────┘
            │            │            │            │
            ▼            ▼            ▼            ▼
    ┌─────────────────────────────────────────────────────────┐
    │  BROKER                                                 │
    │  (Central event bus, receives all events)               │
    │  Events are CloudEvents (standard format)               │
    │                                                         │
    │ ┌─────────────────────────────────────────────────────┐ │
    │ │ Event: {type: "dev.knative.kafka.event",            │ │
    │ │         source: "kafka-cluster",                    │ │
    │ │         data: {...}}                                │ │
    │ └─────────────────────────────────────────────────────┘ │
    └──────────────────────┬──────────────────────────────────┘
                           │
           ┌───────────────┼───────────────┐
           ▼               ▼               ▼
    ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
    │  Trigger    │ │  Trigger    │ │  Trigger    │
    │  filter:    │ │  filter:    │ │  filter:    │
    │  type=order │ │ type=payment│ │  type=*     │
    │      │      │ │      │      │ │      │      │
    │      ▼      │ │      ▼      │ │      ▼      │
    │  Order Svc  │ │  Payment    │ │  Audit Log  │
    │   (ksvc)    │ │  Svc (ksvc) │ │   (ksvc)    │
    └─────────────┘ └─────────────┘ └─────────────┘
    ================================================================

- Source: Produces events. Connects external systems (Kafka, GitHub webhooks, cron schedules, APIs) to the Knative eventing mesh.
- Broker: A central event bus. Receives events and distributes them to Triggers.
- Trigger: A filter that routes events from a Broker to a subscriber (typically a Knative Service). You define which event types each subscriber cares about.
- CloudEvents: A CNCF specification for describing events in a standard way. Every event in Knative Eventing is a CloudEvent with type, source, id, data, and other metadata.
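To make the CloudEvent shape concrete, here is a minimal sketch that builds one as a plain dictionary and serializes it to JSON. It uses only the standard library rather than a CloudEvents SDK; the helper names, the event type, and the source path are hypothetical, while the four required attribute names (specversion, id, source, type) come from the CloudEvents 1.0 specification.

```python
import json
import uuid

# Attributes every CloudEvent must carry under the 1.0 specification.
REQUIRED_ATTRIBUTES = ("specversion", "id", "source", "type")


def make_cloud_event(event_type: str, source: str, data: dict) -> dict:
    """Build a CloudEvent as a plain dict (structured JSON form)."""
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),            # unique per event
        "source": source,                   # who produced the event
        "type": event_type,                 # what Triggers usually filter on
        "datacontenttype": "application/json",
        "data": data,                       # the payload itself
    }


def missing_attributes(event: dict) -> list:
    """Return the required attributes that are absent or empty."""
    return [name for name in REQUIRED_ATTRIBUTES if not event.get(name)]


event = make_cloud_event(
    event_type="order.created",             # hypothetical event type
    source="/shop/checkout",                # hypothetical producer
    data={"orderId": 42},
)
print(json.dumps(event, indent=2))
print("missing:", missing_attributes(event))
```

Because every producer fills in the same envelope, a Broker can route on type or source without understanding the payload.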
Eventing Example
    # 1. Create a Broker (event bus)
    apiVersion: eventing.knative.dev/v1
    kind: Broker
    metadata:
      name: default
      namespace: default
    ---
    # 2. Create an event Source (cron job that fires every minute)
    apiVersion: sources.knative.dev/v1
    kind: PingSource
    metadata:
      name: heartbeat
      namespace: default
    spec:
      schedule: "*/1 * * * *"
      contentType: "application/json"
      data: '{"message": "heartbeat"}'
      sink:
        ref:
          apiVersion: eventing.knative.dev/v1
          kind: Broker
          name: default
    ---
    # 3. Create a Trigger (route heartbeat events to our service)
    apiVersion: eventing.knative.dev/v1
    kind: Trigger
    metadata:
      name: heartbeat-trigger
      namespace: default
    spec:
      broker: default
      filter:
        attributes:
          type: dev.knative.sources.ping
      subscriber:
        ref:
          apiVersion: serving.knative.dev/v1
          kind: Service
          name: event-display
    ---
    # 4. Create the subscriber service
    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: event-display
      namespace: default
    spec:
      template:
        spec:
          containers:
            - image: gcr.io/knative-releases/knative.dev/eventing/cmd/event_display

Events flow: PingSource fires every minute, sends a CloudEvent to the Broker, the Trigger matches the event type and forwards it to the event-display service. If the service is scaled to zero, it wakes up to process the event.
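The matching step a Trigger performs can be approximated as an exact comparison of attributes. The sketch below is a simplification, not the Broker's implementation: the function names and the audit-trigger entry are hypothetical, and a Trigger with no filter is modeled as matching every event.

```python
def trigger_matches(filter_attributes: dict, event: dict) -> bool:
    """Exact-match filter: every listed attribute must equal the event's value."""
    return all(event.get(key) == value for key, value in filter_attributes.items())


def route(event: dict, triggers: dict) -> list:
    """Return the names of all Triggers whose filter accepts the event."""
    return [name for name, flt in triggers.items() if trigger_matches(flt, event)]


triggers = {
    "heartbeat-trigger": {"type": "dev.knative.sources.ping"},
    "audit-trigger": {},                       # hypothetical: no filter, sees everything
    "order-trigger": {"type": "order.created"},
}

ping = {"type": "dev.knative.sources.ping", "source": "heartbeat", "id": "1"}
order = {"type": "order.created", "source": "/shop/checkout", "id": "2"}

print(route(ping, triggers))
print(route(order, triggers))
```

One event can fan out to several subscribers, and the producer never learns which ones received it.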
Traffic Splitting
One of Knative’s most powerful features is built-in traffic splitting across Revisions. No service mesh required.
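A traffic block is just a list of revision names and percentages that must add up to 100. The following sketch, with hypothetical helper names and revision names, checks that rule and estimates how many requests each revision would receive on average.

```python
def validate_traffic(targets: list) -> None:
    """Raise ValueError unless percentages are non-negative and sum to 100."""
    for target in targets:
        if target["percent"] < 0:
            raise ValueError(f"negative percent for {target['revisionName']}")
    total = sum(target["percent"] for target in targets)
    if total != 100:
        raise ValueError(f"traffic percentages sum to {total}, expected 100")


def expected_requests(targets: list, total_requests: int) -> dict:
    """Average number of requests each revision receives for a given load."""
    validate_traffic(targets)
    return {
        target["revisionName"]: total_requests * target["percent"] // 100
        for target in targets
    }


canary_split = [
    {"revisionName": "my-app-stable", "percent": 90},  # hypothetical names
    {"revisionName": "my-app-canary", "percent": 10},
]
print(expected_requests(canary_split, 1000))
```

The split is probabilistic per request, so real counts fluctuate around these averages, and a 10% canary on a low-traffic service may need a while before it has seen enough requests to judge.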
Blue-Green Deployment
    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: my-app
    spec:
      template:
        metadata:
          name: my-app-v2  # Name this revision explicitly
        spec:
          containers:
            - image: my-registry/my-app:2.0
      traffic:
        # All traffic goes to the new revision
        - revisionName: my-app-v2
          percent: 100
        # Tag the old revision so you can access it directly
        - revisionName: my-app-v1
          percent: 0
          tag: previous

With the previous tag, you can access the old version at previous-my-app.default.example.com for testing, even though it gets 0% of production traffic.
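The hostnames follow a predictable pattern, which the sketch below reproduces. It assumes Knative's default naming (name.namespace.domain, with tag- prefixed for tagged revisions); the function name is illustrative, and clusters that customize the domain or tag templates will produce different hosts.

```python
from typing import Optional


def service_host(name: str, namespace: str,
                 domain: str = "example.com", tag: Optional[str] = None) -> str:
    """Build the hostname Knative would assign under default naming templates."""
    prefix = f"{tag}-{name}" if tag else name
    return f"{prefix}.{namespace}.{domain}"


print(service_host("my-app", "default"))
print(service_host("my-app", "default", tag="previous"))
print(service_host("hello", "default", domain="127.0.0.1.sslip.io"))
```

Knowing the pattern is handy in scripts and smoke tests, where you want to hit a tagged revision directly without looking up its URL first.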
Canary Deployment
    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: my-app
    spec:
      template:
        metadata:
          name: my-app-v3
        spec:
          containers:
            - image: my-registry/my-app:3.0
      traffic:
        # 90% to current stable version
        - revisionName: my-app-v2
          percent: 90
          tag: stable
        # 10% to new canary version
        - revisionName: my-app-v3
          percent: 10
          tag: canary

    # Gradually shift traffic
    # 10% -> 25% -> 50% -> 100%

    # Check current traffic split
    kubectl get ksvc my-app -o jsonpath='{.status.traffic[*]}'

    # Each tagged revision gets its own URL:
    # stable-my-app.default.example.com (always hits v2)
    # canary-my-app.default.example.com (always hits v3)
    # my-app.default.example.com (split 90/10)

Installation
Knative requires a networking layer. You have three main choices.
Option 1: Knative with Kourier (Lightweight)
Kourier is the simplest option: an Envoy-based ingress built specifically for Knative.
    # Install Knative Serving CRDs and core
    kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.16.0/serving-crds.yaml
    kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.16.0/serving-core.yaml

    # Install Kourier networking layer
    kubectl apply -f https://github.com/knative/net-kourier/releases/download/knative-v1.16.0/kourier.yaml

    # Configure Knative to use Kourier
    kubectl patch configmap/config-network \
      --namespace knative-serving \
      --type merge \
      --patch '{"data":{"ingress-class":"kourier.ingress.networking.knative.dev"}}'

    # Install Knative Eventing (optional)
    kubectl apply -f https://github.com/knative/eventing/releases/download/knative-v1.16.0/eventing-crds.yaml
    kubectl apply -f https://github.com/knative/eventing/releases/download/knative-v1.16.0/eventing-core.yaml

    # Install the in-memory channel (for development)
    kubectl apply -f https://github.com/knative/eventing/releases/download/knative-v1.16.0/in-memory-channel.yaml

    # Install the MT-Channel-based broker
    kubectl apply -f https://github.com/knative/eventing/releases/download/knative-v1.16.0/mt-channel-broker.yaml

    # Verify
    kubectl get pods -n knative-serving
    kubectl get pods -n knative-eventing

Option 2: Knative Operator (Production)
    # Install the Knative Operator
    kubectl apply -f https://github.com/knative/operator/releases/download/knative-v1.16.0/operator.yaml

    # Create a KnativeServing instance (the target namespace must exist first)
    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: Namespace
    metadata:
      name: knative-serving
    ---
    apiVersion: operator.knative.dev/v1beta1
    kind: KnativeServing
    metadata:
      name: knative-serving
      namespace: knative-serving
    spec:
      ingress:
        kourier:
          enabled: true
      config:
        network:
          ingress-class: "kourier.ingress.networking.knative.dev"
    EOF

    # Create a KnativeEventing instance (again creating its namespace first)
    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: Namespace
    metadata:
      name: knative-eventing
    ---
    apiVersion: operator.knative.dev/v1beta1
    kind: KnativeEventing
    metadata:
      name: knative-eventing
      namespace: knative-eventing
    EOF

Networking Layer Comparison
| Feature | Kourier | Istio | Contour |
|---|---|---|---|
| Complexity | Low | High | Medium |
| Resource usage | ~100MB | ~500MB+ | ~200MB |
| mTLS | No | Yes | No |
| Service mesh features | No | Yes | No |
| Best for | Dev, simple production | Already using Istio | Already using Contour |
| Setup time | 2 minutes | 15+ minutes | 5 minutes |
Recommendation: Use Kourier unless you already have Istio or Contour in your cluster. Adding Istio just for Knative is overkill.
Comparison: Knative vs the Alternatives
| Feature | Knative | KEDA | AWS Lambda | Cloud Run |
|---|---|---|---|---|
| Runs on | Any Kubernetes | Any Kubernetes | AWS only | GCP only |
| Scale to zero | Yes | Yes | Yes | Yes |
| Cold start | 1-5s typical | 1-5s typical | 100ms-10s | 0-5s |
| Max execution time | Unlimited | Unlimited | 15 minutes | 60 minutes |
| Container support | Any container | Any container | Custom runtime | Any container |
| Event-driven | Yes (Eventing) | Yes (60+ scalers) | Yes (native) | Yes (via Eventarc) |
| Traffic splitting | Built-in | No | Weighted aliases | Built-in |
| Vendor lock-in | None | None | High | Medium |
| Networking | Configurable | Uses existing | VPC/API Gateway | Managed |
| Serving + Eventing | Both | Scaling only | Both | Serving only |
| Best for | Portable serverless | Scaling existing deployments | AWS-native functions | GCP-native containers |
When to Choose What
- Knative: You want serverless on Kubernetes without cloud vendor lock-in. You need both serving and eventing. You want built-in traffic splitting.
- KEDA: You already have Deployments and just want smarter autoscaling (especially event-driven). You do not need the full Knative serving model.
- AWS Lambda: You are all-in on AWS, your functions run under 15 minutes, and you want the lowest possible cold start times.
- Cloud Run: You are on GCP and want managed Knative without running the control plane yourself.
When to Use Knative
    GOOD FIT FOR KNATIVE
    ================================================================

    1. LOW-TRAFFIC SERVICES
       Internal tools, admin dashboards, reporting APIs that get
       0-100 requests per hour. Scale-to-zero saves significant resources.

    2. EVENT PROCESSORS
       Webhook receivers, queue consumers, notification handlers.
       Wake up when an event arrives, process it, go back to sleep.

    3. BATCH JOBS WITH HTTP TRIGGERS
       Report generation, data exports, PDF rendering.
       Triggered on demand, no need to run 24/7.

    4. DEV/STAGING ENVIRONMENTS
       200 microservices in staging, most idle.
       Knative can cut staging costs by 60-80%.

    5. MULTI-TENANT PLATFORMS
       Each tenant gets their own service instance.
       Inactive tenants scale to zero.

    6. CANARY DEPLOYMENTS WITHOUT A SERVICE MESH
       Built-in traffic splitting means you do not need Istio
       just for canary releases.
    ================================================================

    BAD FIT FOR KNATIVE
    ================================================================

    1. LATENCY-SENSITIVE SERVICES
       If p99 < 100ms matters, cold starts are unacceptable.
       Set min-scale: 1 (but then you lose scale-to-zero).

    2. STATEFUL WORKLOADS
       Databases, caches, message brokers.
       These cannot be stopped and restarted on demand.

    3. ALWAYS-ON HIGH-TRAFFIC SERVICES
       If your API handles 1000+ req/s 24/7, it will never scale
       to zero anyway. Knative adds overhead with no benefit.

    4. LONG-RUNNING CONNECTIONS
       WebSockets, gRPC streams, SSE.
       Knative's activator does not handle persistent connections well.

    5. WORKLOADS WITH EXPENSIVE STARTUP
       If your app takes 30+ seconds to start (JVM with huge classpath,
       ML model loading), cold starts become painful.

    6. SERVICES WITH LARGE PERSISTENT VOLUMES
       PVCs cannot be dynamically attached/detached on scale events.
       Use regular Deployments with volume mounts.
    ================================================================

Common Mistakes
| Mistake | Problem | Solution |
|---|---|---|
| Using Knative for all services | Always-on services gain nothing; Knative adds overhead and complexity | Profile traffic patterns first; only migrate services that are idle 50%+ of the time |
| Ignoring cold start latency | Users see 2-5 second delays on first request after idle period | Set min-scale: 1 for user-facing services, or use scale-to-zero only for internal/async workloads |
| Large container images | 500MB+ images cause 10-30 second cold starts when not cached on the node | Use distroless or alpine base images; keep images under 100MB; pre-pull with a DaemonSet |
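| Not setting resource requests | The Kubernetes scheduler cannot place pods sensibly without knowing their resource needs, so newly started pods can land on overcommitted nodes and cold-start slowly | Always set CPU and memory requests; the scheduler uses these for placement (the KPA itself scales on concurrency or requests per second, not on resource usage) |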
| Installing Istio just for Knative | Adds 500MB+ of memory overhead and significant operational complexity | Use Kourier unless you already have Istio for other reasons |
| Forgetting to configure DNS | Knative generates URLs like hello.default.example.com that do not resolve | Configure a real domain with a wildcard DNS record, or use Magic DNS (sslip.io) for development |
| Not testing cold start path | The warm path works fine, but the cold start path has different failure modes | Always test by letting the service scale to zero (watch kubectl get pods until no pods remain for the service) and then sending a request |
| Setting scale-to-zero window too short | Service oscillates between 0 and 1 pods, wasting resources on repeated cold starts | Set scale-to-zero-pod-retention-period to at least 60s; longer for services with bursty traffic patterns |
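| No readiness probe | Without a custom probe, Knative falls back to a basic TCP check on the container port, which can pass before the app is actually ready and cause 503 errors during cold start | Define a readiness probe that reflects real readiness; Knative waits for it to pass before sending traffic |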
| Mixing Knative and regular Services on same port | Network routing conflicts between Knative’s ingress and regular Kubernetes Services | Use separate namespaces or ensure no port/hostname overlap between Knative and standard services |
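The fit criteria from this section can be folded into a simple checklist. The sketch below is a rule-of-thumb helper, not an official sizing tool: the `Workload` fields and function name are hypothetical, and the thresholds (idle at least 50% of the time, p99 budget under 100 ms, startup of 30 seconds or more) are the ones quoted in this module.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    idle_fraction: float             # share of time with no traffic, 0.0 to 1.0
    startup_seconds: float           # time until the app can serve requests
    p99_budget_ms: float             # latency the service must stay under
    stateful: bool = False
    long_lived_connections: bool = False


def scale_to_zero_concerns(workload: Workload) -> list:
    """List the reasons a workload may be a poor fit for scale-to-zero."""
    concerns = []
    if workload.stateful:
        concerns.append("stateful workload")
    if workload.long_lived_connections:
        concerns.append("long-running connections")
    if workload.p99_budget_ms < 100:
        concerns.append("latency-sensitive (p99 under 100 ms)")
    if workload.startup_seconds >= 30:
        concerns.append("expensive startup (30+ seconds)")
    if workload.idle_fraction < 0.5:
        concerns.append("idle less than 50% of the time")
    return concerns


admin = Workload("admin-dashboard", idle_fraction=0.9,
                 startup_seconds=1.5, p99_budget_ms=3000)
checkout = Workload("checkout-api", idle_fraction=0.05,
                    startup_seconds=2.0, p99_budget_ms=80)

for w in (admin, checkout):
    concerns = scale_to_zero_concerns(w)
    verdict = "good candidate" if not concerns else "; ".join(concerns)
    print(f"{w.name}: {verdict}")
```

An empty list does not guarantee a good migration, but any entry is a prompt to keep the service on a regular Deployment or to set min-scale: 1.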
Question 1
What are the four Knative Serving resources, and which one do you actually create?
Answer:
The four resources are Service (ksvc), Configuration, Revision, and Route.
You only create the Service. Knative automatically creates and manages the Configuration, Revisions, and Route. The Configuration describes the desired state, each change produces an immutable Revision (like a Git commit), and the Route determines how traffic is split across Revisions.
Question 2
Explain how the activator enables scale-to-zero without dropping requests.
Answer:
When a Knative service scales to zero, the ingress layer routes traffic to the activator instead of directly to application pods (which no longer exist). When a request arrives:
- The activator buffers the request (the client waits)
- It signals the autoscaler to create pods
- The autoscaler schedules pod(s) and waits for the readiness probe to pass
- The activator forwards the buffered request to the now-ready pod
- The ingress switches to routing directly to pods, removing the activator from the data path
The client experiences a delay (cold start latency) but never a dropped request or an error.
Question 3
What is a Knative Revision and why is it immutable?
Answer:
A Revision is an immutable snapshot of a Knative Configuration at a point in time. Every time you update a Knative Service (change the image, environment variables, resource limits, etc.), a new Revision is created.
Immutability is critical because it enables:
- Traffic splitting: You can route percentages of traffic to different Revisions for canary deployments
- Rollback: You can instantly shift 100% of traffic back to a previous Revision without redeploying
- Auditability: You can see exactly what was running at any point in time
Think of Revisions like Git commits — each one is a permanent record of a specific configuration state.
Question 4
You have a Knative Service with autoscaling.knative.dev/target: "10" and 75 concurrent requests arrive. How many pods will Knative create?
Answer:
Knative will target 8 pods (75 / 10 = 7.5, rounded up to 8). The KPA (Knative Pod Autoscaler) uses the concurrency target to determine the desired replica count. With a target of 10 concurrent requests per pod and 75 total concurrent requests, it needs at least 8 pods to keep each pod at or below the target concurrency.
In practice, the autoscaler also considers a panic window for rapid scaling and a stable window for gradual adjustment, and by default it aims for 70% of the configured target (the target utilization percentage), so the actual pod count may be higher or lower than this simple estimate depending on how quickly the traffic ramp occurred.
Question 5
What is the difference between Knative Eventing’s Broker and a Trigger?
Answer:
A Broker is a central event bus that receives CloudEvents from Sources. It holds events and makes them available for filtering.
A Trigger is a subscription with a filter. It watches a Broker and routes matching events to a subscriber (typically a Knative Service). You define filter criteria on the Trigger — for example, only events with type: order.created — and the Broker delivers matching events to the Trigger’s subscriber.
The Broker/Trigger model decouples event producers from consumers. Producers send events to the Broker without knowing who will consume them. Consumers subscribe via Triggers without knowing who produces the events.
Question 6
Why is Kourier recommended over Istio for most Knative installations?
Answer:
Kourier is recommended because:
- Resource usage: Kourier uses ~100MB of memory; Istio uses 500MB+ and adds sidecar proxies to every pod
- Complexity: Kourier is a single component purpose-built for Knative; Istio is a full service mesh with many components to manage
- Setup time: Kourier installs in 2 minutes; Istio takes 15+ minutes and requires ongoing configuration
- Scope: Unless you need mTLS, distributed tracing, or other service mesh features for your entire cluster, Istio is overkill
The only time to choose Istio is when you already have it installed for other reasons (mTLS, advanced traffic policies) and want Knative to use the existing infrastructure.
Question 7
A team deploys all 50 of their microservices on Knative with scale-to-zero. Users complain about intermittent slowness. What went wrong?
Answer:
The team used Knative for services that should not scale to zero. The intermittent slowness is caused by cold starts — when a user hits a service that has scaled to zero, they wait 1-5 seconds (or more) while the pod starts up.
The fix is to profile traffic patterns and categorize services:
- High-traffic, user-facing services: Set min-scale: 1 (or use regular Deployments)
- Internal/async services with bursty traffic: Scale-to-zero is appropriate
- Latency-sensitive services: Never scale to zero
Not every service benefits from scale-to-zero. The savings only matter for services that are idle a significant portion of the time, and the cold start penalty must be acceptable for the use case.
Question 8
How would you implement a canary deployment with Knative that sends 5% of traffic to a new version?
Answer:
Update the Knative Service with a traffic split in the traffic section:
    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: my-app
    spec:
      template:
        metadata:
          name: my-app-v2
        spec:
          containers:
            - image: my-registry/my-app:2.0
      traffic:
        - revisionName: my-app-v1
          percent: 95
          tag: stable
        - revisionName: my-app-v2
          percent: 5
          tag: canary

Each tagged revision gets its own URL (for example, canary-my-app.default.example.com) for direct testing. To promote the canary, gradually increase the percentage (5 -> 25 -> 50 -> 100). To roll back, set the canary to 0% and stable to 100%.
No service mesh is needed — Knative handles traffic splitting natively through its Route resource.
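The promotion sequence can be generated rather than typed by hand. This sketch produces one traffic block per step for the 5 -> 25 -> 50 -> 100 progression; the function name is hypothetical, and applying each step (for example by patching the Service and checking metrics in between) is left out.

```python
def promotion_plan(stable: str, canary: str, steps=(5, 25, 50, 100)) -> list:
    """Return one traffic list per step, shifting load from stable to canary."""
    plan = []
    for canary_percent in steps:
        plan.append([
            # The stable revision stays listed at 0% so rollback is one edit away.
            {"revisionName": stable, "percent": 100 - canary_percent, "tag": "stable"},
            {"revisionName": canary, "percent": canary_percent, "tag": "canary"},
        ])
    return plan


for step in promotion_plan("my-app-v1", "my-app-v2"):
    summary = ", ".join(f"{t['revisionName']}={t['percent']}%" for t in step)
    print(summary)
```

Keeping both revisions in every step means a rollback is just re-applying an earlier entry of the plan.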
Hands-On Exercise
Objective
Deploy a Knative service, watch it scale to zero, send a request to trigger cold start, observe it scale back up, and then perform a traffic split between two revisions.
Environment Setup
    # Create a kind cluster with port mapping for Kourier
    cat <<EOF | kind create cluster --name knative-lab --config=-
    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    nodes:
    - role: control-plane
      extraPortMappings:
      - containerPort: 31080
        hostPort: 8080
        protocol: TCP
    EOF

    # Install Knative Serving
    kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.16.0/serving-crds.yaml
    kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.16.0/serving-core.yaml

    # Install Kourier
    kubectl apply -f https://github.com/knative/net-kourier/releases/download/knative-v1.16.0/kourier.yaml

    # Configure Knative to use Kourier
    kubectl patch configmap/config-network \
      --namespace knative-serving \
      --type merge \
      --patch '{"data":{"ingress-class":"kourier.ingress.networking.knative.dev"}}'

    # Configure Magic DNS (sslip.io) for local development
    kubectl patch configmap/config-domain \
      --namespace knative-serving \
      --type merge \
      --patch '{"data":{"127.0.0.1.sslip.io":""}}'

    # Patch Kourier to use NodePort for kind
    kubectl patch service kourier -n kourier-system \
      --type merge \
      --patch '{"spec":{"type":"NodePort","ports":[{"port":80,"targetPort":8080,"nodePort":31080}]}}'

    # Wait for Knative to be ready
    kubectl wait --for=condition=Ready pods --all -n knative-serving --timeout=120s
    kubectl wait --for=condition=Ready pods --all -n kourier-system --timeout=120s

    # Verify installation
    kubectl get pods -n knative-serving
    kubectl get pods -n kourier-system

Step 1: Deploy a Knative Service.

    # Save as hello-knative.yaml
    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: hello
      namespace: default
    spec:
      template:
        metadata:
          annotations:
            autoscaling.knative.dev/scale-to-zero-pod-retention-period: "30s"
        spec:
          containers:
            - image: gcr.io/knative-samples/helloworld-go
              ports:
                - containerPort: 8080
              env:
                - name: TARGET
                  value: "KubeDojo Student"
              resources:
                requests:
                  cpu: 100m
                  memory: 128Mi

Apply the manifest and inspect the resources it creates:

    kubectl apply -f hello-knative.yaml

    # Check the service
    kubectl get ksvc hello

    # Watch all created resources
    kubectl get configuration hello
    kubectl get revision -l serving.knative.dev/service=hello
    kubectl get route hello

Step 2: Send a request and observe pods.
Open a second terminal to watch pods:

    # Terminal 2: watch pods
    kubectl get pods -l serving.knative.dev/service=hello -w

In the first terminal:

    # Send a request
    curl -H "Host: hello.default.127.0.0.1.sslip.io" http://localhost:8080

    # Expected output: "Hello KubeDojo Student!"

Step 3: Watch scale-to-zero.
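
    # Wait 30+ seconds with no traffic. The 30s retention period is a minimum:
    # the autoscaler's stable window also has to pass, so this often takes a minute or more.
    # In Terminal 2, you should see the pod terminate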
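
Rather than watching the second terminal, the wait can be polled from a script. This is a minimal sketch, assuming the `hello` service and label selector from this exercise; `wait_until_zero` and `count_hello_pods` are hypothetical helper names introduced here for illustration. The poller takes the counting command as an argument, so it can be tried without a cluster.

```shell
#!/usr/bin/env bash
# wait_until_zero TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Runs COMMAND repeatedly until it prints 0, or gives up after TIMEOUT_SECONDS.
# Hypothetical helper for illustration.
wait_until_zero() {
  local timeout="$1" interval="$2" waited=0 count
  shift 2
  while :; do
    # Strip whitespace because some wc implementations pad their output.
    count="$("$@" | tr -d '[:space:]')"
    if [ "$count" = "0" ]; then
      echo "reached zero after ~${waited}s"
      return 0
    fi
    if (( waited >= timeout )); then
      echo "still at ${count} after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Counts pods that belong to the hello service (assumes the lab cluster above).
count_hello_pods() {
  kubectl get pods -l serving.knative.dev/service=hello --no-headers 2>/dev/null | wc -l
}

# Example: poll every 5 seconds for up to 3 minutes.
#   wait_until_zero 180 5 count_hello_pods
```

The reported time is approximate because it counts only the sleep intervals, not the time spent in each `kubectl` call.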

    # Verify: no pods running
    kubectl get pods -l serving.knative.dev/service=hello
    # Expected: No resources found

Step 4: Trigger cold start.

    # Time the cold start
    time curl -H "Host: hello.default.127.0.0.1.sslip.io" http://localhost:8080

    # Note the total time -- this includes cold start latency
    # Expected: 1-5 seconds depending on your system
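
A single timed request gives one data point. To compare the cold start against warm requests, the timings can be collected with curl's `-w '%{time_total}\n'` write-out and summarized. The sketch below assumes one latency value in seconds per input line; `summarize_latency` is a hypothetical helper name, not part of any tool used in this module.

```shell
#!/usr/bin/env bash
# summarize_latency: reads one latency value (in seconds) per line on stdin
# and prints the sample count plus min, average, and max.
# Hypothetical helper for illustration.
summarize_latency() {
  awk '
    NF == 0 { next }                      # skip blank lines
    { v = $1 + 0 }                        # force numeric interpretation
    n == 0 || v < min { min = v }
    n == 0 || v > max { max = v }
    { sum += v; n++ }
    END {
      if (n == 0) { print "no samples"; exit 1 }
      printf "n=%d min=%.3fs avg=%.3fs max=%.3fs\n", n, min, sum / n, max
    }'
}

# Example (assumes the lab cluster above; run after the service has scaled to zero):
#   for i in 1 2 3 4 5; do
#     curl -s -o /dev/null -w '%{time_total}\n' \
#       -H "Host: hello.default.127.0.0.1.sslip.io" http://localhost:8080
#   done | summarize_latency
```

When the loop starts with zero pods, the first sample includes the cold start, so the gap between max and min is a rough estimate of the cold start penalty.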

    # Check pods again -- one should be running now
    kubectl get pods -l serving.knative.dev/service=hello

Step 5: Deploy a second revision and split traffic.

    # Save as hello-v2.yaml
    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: hello
      namespace: default
    spec:
      template:
        metadata:
          annotations:
            autoscaling.knative.dev/scale-to-zero-pod-retention-period: "30s"
        spec:
          containers:
            - image: gcr.io/knative-samples/helloworld-go
              ports:
                - containerPort: 8080
              env:
                - name: TARGET
                  value: "KubeDojo Graduate"
              resources:
                requests:
                  cpu: 100m
                  memory: 128Mi
      traffic:
        - latestRevision: false
          revisionName: hello-00001
          percent: 80
          tag: stable
        - latestRevision: true
          percent: 20
          tag: canary

Apply the manifest and send traffic through the split:

    kubectl apply -f hello-v2.yaml

    # Verify two revisions exist
    kubectl get revision -l serving.knative.dev/service=hello

    # Send multiple requests and observe the split
    for i in $(seq 1 20); do
      curl -s -H "Host: hello.default.127.0.0.1.sslip.io" http://localhost:8080
    done

    # You should see ~80% "Hello KubeDojo Student!" and ~20% "Hello KubeDojo Graduate!"
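
Counting responses by eye is error-prone, and 20 requests is a small sample. The following is a minimal sketch that tallies response bodies, assuming each response is a single line of text (as with the hello service here); `tally_responses` is a hypothetical helper name introduced for illustration.

```shell
#!/usr/bin/env bash
# tally_responses: reads one response body per line on stdin and prints
# "<count> <percent>% <body>" for each distinct body, most frequent first.
# Hypothetical helper for illustration.
tally_responses() {
  sort | uniq -c | sort -rn | awk '
    {
      count[NR] = $1          # first field is the count from uniq -c
      $1 = ""                 # drop the count, leaving the body
      sub(/^ +/, "")
      body[NR] = $0
      total += count[NR]
    }
    END {
      for (i = 1; i <= NR; i++)
        printf "%d %.0f%% %s\n", count[i], 100 * count[i] / total, body[i]
    }'
}

# Example (assumes the lab cluster above):
#   for i in $(seq 1 100); do
#     curl -s -H "Host: hello.default.127.0.0.1.sslip.io" http://localhost:8080
#   done | tally_responses
```

The split is probabilistic, so the observed percentages will usually drift a little from 80/20; a larger sample brings them closer.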

    # Test tagged routes directly
    curl -H "Host: stable-hello.default.127.0.0.1.sslip.io" http://localhost:8080
    # Always returns: "Hello KubeDojo Student!"

    curl -H "Host: canary-hello.default.127.0.0.1.sslip.io" http://localhost:8080
    # Always returns: "Hello KubeDojo Graduate!"

Step 6: Promote the canary.

    # Shift all traffic to the new revision
    kubectl patch ksvc hello --type merge --patch '{
      "spec": {
        "traffic": [
          {"revisionName": "hello-00001", "percent": 0, "tag": "previous"},
          {"latestRevision": true, "percent": 100}
        ]
      }
    }'

    # Verify
    curl -H "Host: hello.default.127.0.0.1.sslip.io" http://localhost:8080
    # Should always return: "Hello KubeDojo Graduate!"

Success Criteria
- Knative Serving is installed and all pods are running in knative-serving namespace
- A Knative Service is deployed and accessible via curl
- You observed the service scale to zero (0 pods) after the idle period
- You triggered a cold start and measured the latency
- You deployed a second revision and verified traffic splitting (80/20)
- You accessed individual revisions via tagged routes
- You promoted the canary to 100% traffic
Bonus Challenge
Install Knative Eventing and create a PingSource that fires every 30 seconds, a Broker, a Trigger, and an event-display service. Verify that the event-display service scales to zero between events and wakes up each time a CloudEvent arrives.
Further Reading
- Knative Documentation
- Knative Serving Autoscaling
- CloudEvents Specification
- Knative Cookbook (O’Reilly)
- Google Cloud Run Documentation (managed Knative)
Next Module
Return to Module 6.2: KEDA to compare event-driven autoscaling approaches, or explore Module 6.4: FinOps with OpenCost to measure the cost savings from scale-to-zero.
“The cheapest pod is the one that is not running.”