Module 1.8: Coroot - Zero-Instrumentation Observability
Complexity: MEDIUM
Time to Complete: 90 minutes
Prerequisites: Module 1.1 (Prometheus), Module 1.2 (OpenTelemetry), basic Kubernetes concepts
Learning Objectives:
- Understand eBPF-based auto-instrumentation observability
- Deploy Coroot on Kubernetes
- Use service maps and automatic SLO tracking
- Correlate metrics, traces, logs, and profiles without code changes
What You’ll Be Able to Do
After completing this module, you will be able to:
- Deploy Coroot for automated Kubernetes observability with eBPF-based service map discovery
- Configure Coroot’s SLO monitoring with automatic latency, error rate, and availability tracking
- Implement Coroot’s continuous profiling integration for performance bottleneck identification
- Compare Coroot’s automated approach against manual Prometheus/Grafana setups for operational efficiency
Why This Module Matters
The engineering director stared at the spreadsheet, calculating the true cost of their “modern” observability stack. Fifty-three microservices. Each one needed instrumentation.
| Service Type | Count | Instrumentation Time | Engineer Cost |
|---|---|---|---|
| Node.js services | 23 | 4 hours each | $13,800 |
| Python services | 18 | 6 hours each | $16,200 |
| Go services | 8 | 5 hours each | $6,000 |
| Legacy Java | 4 | 12 hours each | $7,200 |
| Total | 53 | 288 hours | $43,200 |
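A quick sanity check of the table's arithmetic, using the implied blended rate of $150 per engineer-hour (the script itself is illustrative; the per-type numbers mirror the table):

```python
# Verify the instrumentation-cost table: hours per service type times count,
# at the implied blended rate of $150/engineer-hour ($13,800 / 92h for Node.js).
services = {
    "Node.js":     (23, 4),   # (service count, hours each)
    "Python":      (18, 6),
    "Go":          (8, 5),
    "Legacy Java": (4, 12),
}
RATE = 150  # dollars per engineer-hour (implied by the table)

total_hours = sum(count * hours for count, hours in services.values())
total_cost = total_hours * RATE
print(total_hours, total_cost)  # → 288 43200
```

Note the hours column sums to 288, not 276; the dollar totals in the table are consistent with 288 hours.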
And that was just the initial setup. Every new service needed instrumentation. Every framework upgrade risked breaking telemetry. Two full-time engineers spent 40% of their time maintaining observability code—not application features.
Then there was the incident that changed everything. A production outage lasted 47 minutes because traces ended at a legacy Java service with no instrumentation. The root cause, database connection pool exhaustion, was invisible. Post-mortem cost: $127,000 in SLA credits.
“What if we didn’t need to instrument anything?” an SRE asked during the incident review.
Coroot eliminates the instrumentation burden.
Using eBPF to observe applications at the kernel level, Coroot automatically discovers services, tracks their dependencies, monitors SLOs, and provides distributed tracing—all without a single line of instrumentation code. It’s like having a full observability stack installed on day one, even for legacy applications you can’t modify.
The company deployed Coroot on a Friday afternoon. By Monday morning: all 53 services visible, service map auto-generated, SLOs calculated, traces flowing through the legacy Java monolith that had been a black box for two years. Time invested: 2 hours of Helm commands. Not 288 hours of SDK integration.
Did You Know?
- One fintech saved $380,000 annually by replacing Datadog with Coroot: their 200-service deployment cost $31K/month in APM fees, while Coroot (open source) plus ClickHouse hosting came to ~$800/month. They also got better visibility into legacy systems that Datadog's agents couldn't instrument.
- Coroot detected a $2.1M incident in 3 minutes that traditional APM missed entirely: a cryptocurrency exchange experienced a trading halt caused by TCP retransmissions between their matching engine and database. Application metrics showed nothing wrong; Coroot's kernel-level TCP monitoring caught the network degradation immediately.
- Zero-instrumentation tracing saves 2-4 weeks per microservice: traditional distributed tracing requires SDK integration, context propagation code, and careful testing, with companies reporting 40-80 hours of engineering time per service. Coroot captures trace context at the kernel level, giving instant tracing for any application in any language.
- eBPF profiling found a $450K memory leak invisible to APM: a SaaS company's Go service had a native memory leak in a C library (CGO). Application heap metrics were stable, but container memory grew until OOMKill. Coroot's continuous profiling saw the off-heap growth that Datadog and New Relic couldn't detect.
Coroot Architecture
```
Kubernetes Cluster
├── Coroot Server
│     ├── API server, UI dashboard, SLO engine, alerting
│     ├── pulls metrics from Prometheus
│     └── stores data in ClickHouse
├── ClickHouse (time-series storage)
│     └── metrics, traces, logs, profiles
├── Prometheus
│     └── scrapes coroot-node-agent on every node
└── coroot-node-agent (DaemonSet, one per node)
      └── eBPF programs capturing kernel events
```
Components
| Component | Role | Description |
|---|---|---|
| coroot-node-agent | Data collection | DaemonSet running eBPF programs to capture kernel-level events |
| Coroot Server | Processing & UI | Aggregates data, calculates SLOs, serves dashboard |
| ClickHouse | Storage | Columnar database for metrics, traces, logs, profiles |
| Prometheus | Metrics pipeline | Optional; Coroot can use an existing Prometheus or deploy its own |
What Coroot Captures Automatically
Application Metrics (No SDK Required)
```
Auto-Captured Metrics

HTTP/gRPC requests:  request rate, error rate, latency (p50/95/99), status codes
TCP connections:     connection count, retransmits, RTT latency, failed connects
System:              CPU usage, memory, disk I/O, network
DNS queries:         query count, resolution time, NXDOMAIN errors
Container metrics:   restarts, OOM kills, resource limits
Profiling:           CPU, memory, off-CPU
```
Automatic Service Discovery
Coroot discovers services by observing actual network traffic:
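For intuition, service discovery amounts to aggregating observed connections into a dependency graph. This hypothetical sketch (sample data and names are illustrative, not Coroot's actual data model) shows how kernel-observed source→destination pairs collapse into a service map:

```python
# Illustrative: derive a service dependency map from observed TCP connections,
# the way an eBPF agent sees them (src service -> dst service).
from collections import defaultdict

# Hypothetical sample of observed connections
connections = [
    ("frontend", "api-gateway"),
    ("frontend", "redis"),
    ("api-gateway", "user-service"),
    ("api-gateway", "postgres"),
    ("user-service", "postgres"),
    ("frontend", "api-gateway"),  # duplicate observations collapse into one edge
]

service_map = defaultdict(set)
for src, dst in connections:
    service_map[src].add(dst)

for svc in sorted(service_map):
    print(f"{svc} -> {sorted(service_map[svc])}")
```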
```
Discovered Services (auto-detected from traffic):
├── frontend (10 pods)
│   ├── Talks to: api-gateway, redis
│   └── SLO: 99.9% availability, P99 < 200ms
├── api-gateway (5 pods)
│   ├── Talks to: user-service, order-service, postgres
│   └── SLO: 99.95% availability, P99 < 100ms
├── user-service (3 pods)
│   ├── Talks to: postgres, redis
│   └── SLO: 99.9% availability, P99 < 50ms
└── postgres (1 pod)
    ├── Accepts from: api-gateway, user-service, order-service
    └── SLO: 99.99% availability
```
Installing Coroot
Prerequisites
```shell
# Coroot requires:
# - Kubernetes 1.21+
# - Linux kernel 4.16+ (for eBPF)
# - BTF (BPF Type Format) support in the kernel

# Check BTF support
cat /sys/kernel/btf/vmlinux >/dev/null 2>&1 && echo "BTF supported" || echo "BTF not supported"

# Most modern distributions (Ubuntu 20.04+, RHEL 8+, Debian 11+) support BTF
```
Installation with Helm
```shell
# Add the Coroot Helm repository
helm repo add coroot https://coroot.github.io/helm-charts
helm repo update

# Create namespace
kubectl create namespace coroot

# Install Coroot with ClickHouse
helm install coroot coroot/coroot \
  --namespace coroot \
  --set clickhouse.enabled=true \
  --set clickhouse.persistence.size=50Gi

# Install the node agent (DaemonSet)
helm install coroot-node-agent coroot/coroot-node-agent \
  --namespace coroot \
  --set coroot.url=http://coroot.coroot:8080
```
Verify Installation
```shell
# Check all pods are running
kubectl get pods -n coroot

# Expected output:
# NAME                      READY   STATUS    RESTARTS   AGE
# coroot-0                  1/1     Running   0          2m
# coroot-clickhouse-0       1/1     Running   0          2m
# coroot-node-agent-xxxxx   1/1     Running   0          2m
# coroot-node-agent-yyyyy   1/1     Running   0          2m

# Check the node agent is collecting data
kubectl logs -n coroot -l app.kubernetes.io/name=coroot-node-agent --tail=20

# Access the UI
kubectl port-forward -n coroot svc/coroot 8080:8080
# Open http://localhost:8080
```
The Coroot Dashboard
Service Map
The service map is automatically generated from observed traffic:
```
Coroot Service Map

Internet
  └─▶ ingress-nginx ──▶ frontend (React) ──▶ api (Node.js)
                                               ├──▶ users (Python)   ──▶ postgres
                                               ├──▶ orders (Go)      ──▶ redis
                                               └──▶ products (Rust)  ──▶ postgres

Legend: ──▶ HTTP   ─·─▶ TCP   Color: green = healthy, red = issues
```
Automatic SLO Tracking
Coroot calculates SLOs for every service without configuration:
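The error-budget arithmetic behind availability SLOs is simple enough to sketch. This illustrative helper (not Coroot code) converts an availability target into allowed downtime per 30-day window:

```python
# Error-budget math: an availability target implies a fixed amount of
# tolerable downtime per window; the "budget remaining" counts down from it.
def error_budget_minutes(target_pct: float, window_days: int = 30) -> float:
    """Minutes of allowed downtime in the window for a given availability target."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - target_pct / 100)

print(round(error_budget_minutes(99.9), 1))   # → 43.2 minutes per 30 days
print(round(error_budget_minutes(99.99), 2))  # → 4.32 minutes per 30 days
```

This is why a 99.99% target is so much stricter than 99.9%: the budget shrinks tenfold.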
```
Service: api-gateway

Availability SLO         target: 99.9%   current: 99.95% ✓   budget remaining: 4h 32m
Latency SLO (P99)        target: 100ms   current: 87ms   ✓   budget remaining: 2h 15m
Error rate (last hour)   0.05%
Request rate             1,245 req/s

Latency distribution: P50 12ms · P90 45ms · P95 67ms · P99 87ms · max 234ms
```
Key Features
Section titled “Key Features”1. Zero-Instrumentation Distributed Tracing
Coroot captures distributed traces without any SDK:
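Kernel-level tracing of this kind relies on the W3C Trace Context `traceparent` header already flowing between services. A minimal, illustrative parser for that header format (not Coroot's implementation):

```python
# Parse a W3C Trace Context `traceparent` header:
#   version "-" trace-id (32 hex) "-" parent-id (16 hex) "-" flags
def parse_traceparent(header: str) -> dict:
    version, trace_id, parent_id, flags = header.split("-")
    return {
        "version": version,
        "trace_id": trace_id,      # shared by every span in the trace
        "parent_id": parent_id,    # the calling span's ID
        "sampled": int(flags, 16) & 0x01 == 1,
    }

hdr = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
ctx = parse_traceparent(hdr)
print(ctx["trace_id"], ctx["sampled"])  # → 4bf92f3577b34da6a3ce929d0e0e4736 True
```

Because this header rides inside ordinary HTTP traffic, an eBPF program observing the wire can correlate requests across services without any in-process SDK.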
```
Trace: 7a8b9c0d-1234-5678-abcd-ef0123456789   (total duration: 234ms)

frontend               45ms   GET /checkout
└── api-gateway       180ms
    ├── user-service    23ms   (validate token)
    └── order-service   89ms
        └── postgres    34ms   (SELECT)

Timeline: 0ms ── 50ms ── 100ms ── 150ms ── 200ms ── 234ms
```
2. Continuous Profiling
Built-in CPU and memory profiling without code changes:
```
CPU Profile: api-gateway (last 15 minutes)

Function                             CPU %    Self %
├── main.handleRequest               45.2%     2.1%
│   ├── json.Unmarshal               18.7%    18.7%
│   ├── db.Query                     15.3%     1.2%
│   │   └── net.(*conn).Read         14.1%    14.1%
│   └── http.ResponseWriter.Write     8.9%     8.9%
└── runtime.gcBgMarkWorker            5.4%     5.4%

Top CPU consumers: json.Unmarshal (18.7%), net.Read (14.1%)
Recommendation: consider a faster JSON library (jsoniter, sonic)
```
3. Network-Level Insights
eBPF captures TCP-level details invisible to application metrics:
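For intuition, a retransmission-rate figure in metrics like these is plain arithmetic over kernel counters. A tiny illustrative helper (the sample counts are hypothetical):

```python
# Retransmission rate: retransmitted segments as a percentage of segments sent.
# The kernel tracks both counters per connection; eBPF can read them directly.
def retransmission_rate_pct(segments_sent: int, retransmits: int) -> float:
    return 100.0 * retransmits / segments_sent if segments_sent else 0.0

# e.g. 2 retransmits out of 10,000 segments
print(retransmission_rate_pct(10_000, 2))  # → 0.02
```

Rates well under 1% are normal; a sustained jump usually signals network congestion or packet loss between two specific services.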
```
Network Health: api-gateway → postgres

Connection stats (last hour)
├── Active connections: 25
├── New connections/s: 12.4
├── Failed connections: 3
└── Connection pool efficiency: 98.2%

TCP quality metrics
├── Round-trip time (P99): 1.2ms
├── Retransmission rate: 0.02%
├── Zero-window events: 0
└── Connection resets: 2

⚠ Alert: 3 failed connections detected
  Cause: connection timeout to postgres:5432
  Likely reason: connection pool exhausted
```
4. Log Analysis
Coroot correlates logs with metrics and traces:
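For intuition, log-to-trace correlation amounts to grouping log lines by a shared trace ID. This hypothetical sketch uses illustrative field names and records, not Coroot's storage schema:

```python
# Group log lines by trace ID so all lines belonging to one request
# can be shown next to that request's trace.
from collections import defaultdict

logs = [  # hypothetical structured log records
    {"ts": "14:23:01.123", "trace_id": "7a8b9c0d", "level": "INFO",
     "service": "api-gateway", "msg": "Processing checkout request"},
    {"ts": "14:23:01.223", "trace_id": "7a8b9c0d", "level": "ERROR",
     "service": "order-service", "msg": "Payment failed: insufficient funds"},
    {"ts": "14:23:05.001", "trace_id": "deadbeef", "level": "INFO",
     "service": "api-gateway", "msg": "Health check"},
]

by_trace = defaultdict(list)
for line in logs:
    by_trace[line["trace_id"]].append(line)

for line in by_trace["7a8b9c0d"]:
    print(line["ts"], line["level"], line["service"], line["msg"])
```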
```
# Logs are automatically associated with traces
# Example log correlation in Coroot:

Trace: 7a8b9c0d... → Related Logs:
├── 14:23:01.123 [INFO]  api-gateway:   Processing checkout request
├── 14:23:01.145 [INFO]  user-service:  Token validated for user_id=12345
├── 14:23:01.189 [INFO]  order-service: Creating order
├── 14:23:01.223 [ERROR] order-service: Payment failed: insufficient funds
└── 14:23:01.234 [INFO]  api-gateway:   Returning 402 Payment Required
```
Coroot vs Traditional Observability
Comparison Matrix
| Feature | Coroot | Traditional Stack | APM (Datadog, etc.) |
|---|---|---|---|
| Setup time | Minutes | Hours/days | Hours |
| Code changes | None | Extensive | SDK integration |
| Language support | All (eBPF) | Per-language | Per-language |
| Legacy app support | Full | Limited | Limited |
| Distributed tracing | Automatic | Manual SDK | Manual SDK |
| Continuous profiling | Built-in | Separate tool | Add-on ($$$) |
| Cost | Open source | Open source | $$$$$ |
| Network visibility | TCP-level | Application-level | Application-level |
When to Choose Coroot
```
Choose Coroot when:
├── You have many services with no instrumentation
├── You can't modify application code (legacy, third-party)
├── You want fast time-to-value (minutes, not days)
├── Budget constraints prevent commercial APM
├── You need network-level visibility (TCP, DNS)
└── You want unified metrics, traces, logs, profiles
```
```
Choose Traditional Stack when:
├── You need detailed custom metrics
├── Your kernel doesn't support eBPF/BTF
├── You have extensive existing instrumentation
└── You need very specific trace attributes
```
War Story: The Mysterious Memory Leak
A fintech company was experiencing periodic OOMKills in their payment processing service. The service would run fine for hours, then suddenly get killed by Kubernetes.
The Problem:
- Application memory metrics (from the app itself) showed stable heap usage
- Container memory kept growing
- No memory leaks visible in code review
- Restarting the service “fixed” it temporarily
Enter Coroot:
They deployed Coroot without any code changes:
```
Coroot Discovery:
└── payment-service
    ├── Memory profile (continuous)
    │   ├── Heap: 512MB (stable)
    │   ├── Stack: 24MB (stable)
    │   └── Off-heap: growing!
    │       └── Native memory: CGO allocations not freed
    └── Memory timeline
        ├── Container RSS: 512MB → 2.1GB over 6 hours
        └── Heap stays at 512MB
            └── Difference: native memory leak
```
The Discovery:
Coroot’s continuous profiling showed that the Go service was using CGO to call a C library for encryption. The C library had a memory leak—it allocated buffers that were never freed.
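The detection logic can be sketched in a few lines: if container RSS grows while the runtime-reported heap stays flat, the difference is native (off-heap) memory. The sample values below are illustrative, not data from the incident:

```python
# Native-leak heuristic: off-heap memory = container RSS minus runtime heap.
# A flat heap with growing RSS points at native allocations (e.g. CGO) leaking.
rss_samples  = [512, 780, 1050, 1600, 2100]  # container RSS in MB over 6 hours
heap_samples = [512, 510, 515, 512, 511]     # Go heap (MB) as the app reports it

native = [rss - heap for rss, heap in zip(rss_samples, heap_samples)]
leaking = native[-1] - native[0] > 100  # grew by more than 100MB over the window
print(native[-1], leaking)  # → 1589 True
```

Application-level heap metrics alone can never produce this signal, because the leaked memory is outside the runtime's accounting.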
```
CPU profile over time:
└── crypto.Encrypt (C library)
    └── malloc() calls: increasing
        └── free() calls: NOT matching

Memory allocation flame graph:
└── libcrypto.so → EVP_EncryptInit → malloc (no matching free)
```
The Fix: Updating the crypto library to the latest version fixed the leak.
Financial Impact
| Category | Before Coroot | With Coroot | Impact |
|---|---|---|---|
| OOMKill incidents/month | 12 | 0 | -100% |
| Incident cost (avg $8K each) | $96,000/year | $0 | $96,000 saved |
| Engineering time investigating | 6 hrs/incident × 12 | 2 hrs (one-time) | $10,800 saved |
| Customer churn from instability | 3%/year | 0.5%/year | $42,000 ARR saved |
| APM license (couldn’t see issue) | $18,000/year | $0 (open source) | $18,000 saved |
| ClickHouse hosting | $0 | $2,400/year | -$2,400 |
| Total Annual Impact | | | $164,400 |
The CTO’s post-mortem summary: “We paid $18,000/year for an APM that literally couldn’t see this bug. Coroot—which is free—found it in 15 minutes. The memory leak had been causing OOMKills for 8 months.”
The Lesson: Application-level metrics only showed the Go heap. Coroot’s eBPF-based profiling saw the whole container memory, including native allocations. Traditional APM would have missed this entirely.
Advanced Configuration
Custom SLO Definitions
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coroot-config
  namespace: coroot
data:
  config.yaml: |
    slo:
      # Custom availability targets
      availability:
        default: 99.9
        overrides:
          - service: "payment-*"
            target: 99.99
          - service: "internal-*"
            target: 99.0

      # Custom latency targets
      latency:
        default:
          p99: 100ms
        overrides:
          - service: "api-gateway"
            p99: 50ms
          - service: "batch-processor"
            p99: 5s
```
Alerting Configuration
```yaml
# Alert rules
apiVersion: v1
kind: ConfigMap
metadata:
  name: coroot-alerts
  namespace: coroot
data:
  alerts.yaml: |
    alerts:
      - name: SLO Breach
        condition: slo.availability < target
        for: 5m
        severity: critical

      - name: High Error Rate
        condition: error_rate > 1%
        for: 5m
        severity: warning

      - name: Latency Degradation
        condition: latency.p99 > baseline * 2
        for: 10m
        severity: warning

      - name: Memory Pressure
        condition: container.memory.usage > 90%
        for: 5m
        severity: warning
```
Integration with Prometheus
```shell
# Use an existing Prometheus as the data source
helm upgrade coroot coroot/coroot \
  --namespace coroot \
  --set prometheus.enabled=false \
  --set prometheus.external.url=http://prometheus.monitoring:9090
```
Integration with OpenTelemetry
```shell
# Export Coroot data to an OTel Collector
helm upgrade coroot coroot/coroot \
  --namespace coroot \
  --set opentelemetry.enabled=true \
  --set opentelemetry.endpoint=http://otel-collector:4317
```
Production Best Practices
1. Resource Allocation
```yaml
# Recommended resources for the node agent
resources:
  coroot-node-agent:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi

  # Coroot server (scales with cluster size)
  coroot:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 2000m
      memory: 4Gi

  # ClickHouse (scales with retention)
  clickhouse:
    requests:
      cpu: 1000m
      memory: 4Gi
    limits:
      cpu: 4000m
      memory: 16Gi
```
2. Data Retention
```yaml
# Configure retention in ClickHouse
clickhouse:
  retention:
    metrics: 30d
    traces: 7d
    logs: 14d
    profiles: 3d
```
3. High Availability
```shell
# HA setup
helm install coroot coroot/coroot \
  --namespace coroot \
  --set replicas=3 \
  --set clickhouse.replicas=3 \
  --set clickhouse.zookeeper.enabled=true
```
Common Mistakes
| Mistake | Problem | Solution |
|---|---|---|
| Kernel without BTF | Node agent fails to start | Use kernel 5.4+ or install BTF manually |
| Insufficient ClickHouse storage | Data loss after a few days | Size storage for your retention needs |
| Not setting SLO targets | Default targets may not match your needs | Configure custom SLOs per service |
| Ignoring network metrics | Missing TCP-level issues | Review network tab for retransmits, timeouts |
| Running on all nodes | Overhead on system nodes | Use nodeSelector to exclude control plane |
| Not using labels | Hard to find services | Ensure pods have meaningful labels |
Troubleshooting
Node Agent Not Starting
```shell
# Check logs
kubectl logs -n coroot -l app.kubernetes.io/name=coroot-node-agent

# Common issues:
# 1. BTF not available
#    Error: "failed to load BTF from /sys/kernel/btf/vmlinux"
#    Fix: upgrade the kernel or install BTF
# 2. Insufficient permissions
#    Error: "operation not permitted"
#    Fix: ensure privileged: true in the securityContext
```
No Data in Dashboard
```shell
# Verify the node agent is collecting
kubectl exec -n coroot -it $(kubectl get pod -n coroot -l app.kubernetes.io/name=coroot-node-agent -o jsonpath='{.items[0].metadata.name}') -- /coroot-node-agent --test

# Check Prometheus scraping
kubectl port-forward -n coroot svc/coroot-prometheus 9090:9090
# Visit http://localhost:9090/targets
```
High Memory Usage
```shell
# Check ClickHouse storage
kubectl exec -n coroot clickhouse-0 -- clickhouse-client -q "SELECT table, formatReadableSize(sum(bytes)) FROM system.parts GROUP BY table"

# Reduce retention if needed, or increase ClickHouse resources
```
Hands-On Exercise: Zero-Code Observability
Objective: Deploy Coroot and get full observability for a demo application without any instrumentation.
```shell
# Create demo namespace
kubectl create namespace demo

# Deploy a sample application (no instrumentation)
kubectl apply -n demo -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 2
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
        - name: frontend
          image: nginx:alpine
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: frontend
spec:
  selector:
    app: frontend
  ports:
    - port: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: hashicorp/http-echo
          args: ["-text=Hello from API"]
          ports:
            - containerPort: 5678
---
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  selector:
    app: api
  ports:
    - port: 5678
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: database
spec:
  replicas: 1
  selector:
    matchLabels:
      app: database
  template:
    metadata:
      labels:
        app: database
    spec:
      containers:
        - name: redis
          image: redis:7-alpine
          ports:
            - containerPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: database
spec:
  selector:
    app: database
  ports:
    - port: 6379
EOF

# Wait for pods
kubectl wait --for=condition=ready pod -l app=frontend -n demo --timeout=60s
kubectl wait --for=condition=ready pod -l app=api -n demo --timeout=60s
kubectl wait --for=condition=ready pod -l app=database -n demo --timeout=60s
```
Task 1: Install Coroot
```shell
# Add Helm repo
helm repo add coroot https://coroot.github.io/helm-charts
helm repo update

# Install Coroot
kubectl create namespace coroot
helm install coroot coroot/coroot \
  --namespace coroot \
  --set clickhouse.enabled=true \
  --set clickhouse.persistence.size=10Gi

# Install node agent
helm install coroot-node-agent coroot/coroot-node-agent \
  --namespace coroot \
  --set coroot.url=http://coroot.coroot:8080

# Wait for Coroot
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=coroot -n coroot --timeout=120s
```
Task 2: Generate Traffic
```shell
# In a separate terminal, generate traffic
kubectl run traffic-generator --rm -i --tty --image=curlimages/curl -- sh -c 'while true; do
  curl -s http://frontend.demo.svc.cluster.local
  curl -s http://api.demo.svc.cluster.local:5678
  sleep 0.5
done'
```
Task 3: Explore the Dashboard
```shell
# Port-forward the Coroot UI
kubectl port-forward -n coroot svc/coroot 8080:8080

# Open http://localhost:8080
```
In the dashboard:
- Navigate to the Service Map - you should see frontend, api, database
- Click on any service to see:
  - Automatic SLO metrics
  - Request rate and error rate
  - Latency percentiles
- Go to Traces - see requests flowing through services
- Check Logs - correlated with traces
Task 4: Verify Auto-Discovery
```shell
# Services discovered by Coroot (no configuration needed)
# Expected:
# - frontend (2 pods)
# - api (3 pods)
# - database (1 pod)
# - Dependencies automatically mapped
```
Task 5: Simulate an Issue
```shell
# Scale down api to cause errors
kubectl scale deployment api -n demo --replicas=0

# Watch the dashboard - you should see:
# - Error rate spike
# - SLO breach
# - Service map showing api in red

# Scale back up
kubectl scale deployment api -n demo --replicas=3
```
Success Criteria
- Coroot deployed and running
- All demo services discovered automatically
- Service map shows correct dependencies
- SLO metrics displayed for each service
- Traces show request flow between services
- Issue simulation detected in dashboard
Cleanup
```shell
kubectl delete namespace demo
kubectl delete namespace coroot
```
Question 1
What technology does Coroot use to capture observability data without code changes?
Show Answer
eBPF (Extended Berkeley Packet Filter)
Coroot uses eBPF programs running in the kernel to observe all network traffic, system calls, and resource usage. This allows capturing metrics, traces, and profiles without modifying application code.
Question 2
What are the main components of a Coroot deployment?
Show Answer
Coroot Server, coroot-node-agent (DaemonSet), and ClickHouse
- coroot-node-agent runs on every node to collect eBPF data
- Coroot Server processes data and serves the UI
- ClickHouse stores metrics, traces, logs, and profiles
Question 3
How does Coroot provide distributed tracing without SDK integration?
Show Answer
By capturing trace context headers at the kernel level
eBPF intercepts HTTP requests and extracts trace context headers (like traceparent from W3C Trace Context). This allows correlating requests across services without application changes.
Question 4
What kernel requirement must be met for Coroot to work?
Show Answer
BTF (BPF Type Format) support, typically kernel 4.16+ (5.4+ recommended)
BTF enables Coroot’s eBPF programs to work across different kernel versions without recompilation. Most modern Linux distributions include BTF support.
Question 5
What type of metrics can Coroot capture that traditional APM tools miss?
Show Answer
TCP-level network metrics like retransmits, RTT, connection failures, and DNS resolution times
Because Coroot observes at the kernel level, it sees network issues invisible to application-level metrics, such as packet retransmissions and connection timeouts.
Question 6
How does Coroot automatically calculate SLOs?
Show Answer
By measuring request success rate for availability and latency percentiles from observed traffic
Coroot calculates availability (% of successful requests) and latency SLOs (P50, P95, P99) automatically from traffic patterns, without manual configuration.
Question 7
What is the advantage of Coroot’s continuous profiling feature?
Show Answer
It shows CPU and memory hotspots in production without code changes or performance impact
Unlike traditional profilers that require instrumentation and add overhead, Coroot’s eBPF-based profiling runs continuously with minimal impact, catching issues in production that don’t reproduce in testing.
Question 8
When would you NOT choose Coroot over traditional observability?
Show Answer
When you need detailed custom metrics or your kernel doesn’t support eBPF/BTF
If you need highly specific business metrics, custom trace attributes, or run on older kernels without BTF support, traditional instrumentation may be necessary.
Key Takeaways
- Zero instrumentation - Coroot uses eBPF to observe applications without code changes
- Automatic service discovery - Services and dependencies detected from actual traffic
- Built-in distributed tracing - Trace context captured at kernel level
- Continuous profiling - CPU and memory profiling without SDK integration
- Network-level visibility - TCP metrics, DNS, retransmits invisible to app metrics
- Automatic SLO tracking - Availability and latency calculated for every service
- Unified observability - Metrics, traces, logs, profiles in one tool
- Open source - Apache 2.0 license, no vendor lock-in
- Fast time-to-value - Minutes to full observability vs days with traditional stacks
- Legacy support - Works with any application you can’t or won’t modify
Further Reading
- Coroot Documentation - Official documentation
- Coroot GitHub - Source code and issues
- eBPF.io - Understanding the underlying technology
- Coroot Blog - Technical deep-dives and use cases
Next Module
Continue to Module 2.1: ArgoCD for GitOps continuous delivery, or explore Module 1.6: Pixie to compare another eBPF-based observability tool.