Module 1.2: OTel Collector Advanced
Complexity:
[COMPLEX]- Multiple interacting components, pipeline logicTime to Complete: 60-75 minutes
Prerequisites: Module 1 (OpenTelemetry Fundamentals), basic Kubernetes knowledge
OTCA Domain: Domain 3 - OTel Collector (26% of exam)
What You’ll Be Able to Do
Section titled “What You’ll Be Able to Do”After completing this module, you will be able to:
- Design multi-pipeline Collector configurations that route traces, metrics, and logs through distinct receiver/processor/exporter chains
- Configure advanced processors (filter, transform, tail-sampling, batch) to reduce volume while preserving critical signals
- Deploy the Collector as a DaemonSet (agent) and Deployment (gateway) with proper resource limits, health checks, and scaling
- Debug pipeline issues using the debug exporter, zpages, and Collector internal metrics to identify data loss or bottlenecks
Why This Module Matters
Section titled “Why This Module Matters”The OpenTelemetry Collector is the backbone of every production observability pipeline. It receives, processes, and exports telemetry data — traces, metrics, and logs — and it does so at scale, reliably, and vendor-neutrally. Domain 3 accounts for 26% of your OTCA exam. You cannot pass without mastering the Collector.
If OpenTelemetry is the universal language of observability, the Collector is the postal service. It picks up signals from your applications, routes them through processing, and delivers them wherever they need to go. Misconfigure it and you get data loss, memory explosions, or a pipeline that silently drops the traces you need most.
War Story: The Silent Pipeline
A platform team deployed the OTel Collector to replace their vendor-specific agents. Everything looked green — health checks passed, pods were running. But after two weeks, the on-call engineer noticed zero traces for their payment service. The culprit? A
filterprocessor with a regex that accidentally matched thepayment-service prefix. The Collector was healthy. The pipeline was working. It was just filtering out exactly the data they needed most. Lesson: always validate your pipeline end-to-end with thedebugexporter before going to production. Trust, but verify.
Did You Know?
Section titled “Did You Know?”- The OTel Collector can process over 1 million spans per second on a single instance with proper tuning. Most teams hit config issues long before they hit performance limits.
- The
spanmetricsconnector can generate RED metrics (Rate, Errors, Duration) automatically from traces — meaning you get metrics for free without instrumenting anything twice. - The Collector’s
transformprocessor uses OTTL (OpenTelemetry Transformation Language), a purpose-built language that lets you modify, filter, and route telemetry using SQL-like expressions. - There are three official distributions of the Collector: Core (minimal), Contrib (batteries-included), and Custom (build your own with
ocb). The exam expects you to know when to use each.
Part 1: Collector Architecture and Config Structure
Section titled “Part 1: Collector Architecture and Config Structure”1.1 The config.yaml Anatomy
Section titled “1.1 The config.yaml Anatomy”Every Collector configuration has five top-level sections. Think of it as assembling a factory: you define what comes in (receivers), how it gets processed (processors), where it goes out (exporters), and then you wire them together (service/pipelines).
# The five building blocks of every Collector configreceivers: # How data gets IN to the Collectorprocessors: # How data gets TRANSFORMED inside the Collectorexporters: # How data gets OUT of the Collectorconnectors: # Bridge between pipelines (output of one, input of another)extensions: # Auxiliary services (health checks, auth, debugging)
service: # Wires everything together into pipelines extensions: [health_check, zpages] pipelines: traces: receivers: [otlp] processors: [batch] exporters: [otlp/backend] metrics: receivers: [otlp, prometheus] processors: [batch] exporters: [prometheusremotewrite] logs: receivers: [otlp, filelog] processors: [batch] exporters: [otlp/backend]Key rules for the service section:
- A component declared in
receivers/processors/exportersdoes nothing until it appears in a pipeline underservice. - Processors execute in the order listed — order matters.
- A single receiver/exporter can appear in multiple pipelines.
- Pipeline names under
traces,metrics, andlogsare the three signal types.
1.2 Data Flow
Section titled “1.2 Data Flow”┌─────────────────────────────────────────────────────────────────┐│ OTel Collector ││ ││ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ││ │Receivers │──▶│Processors│──▶│Exporters │──▶│ Backends │ ││ │ │ │ │ │ │ │ │ ││ │ otlp │ │ batch │ │ otlp │ │ Jaeger │ ││ │ prometheus│ │ filter │ │ prometheus│ │ Prometheus│ ││ │ filelog │ │ transform│ │ debug │ │ Loki │ ││ └──────────┘ └──────────┘ └──────────┘ └──────────┘ ││ ││ ┌────────────────────────────────────────────────────────────┐ ││ │ Extensions: health_check, zpages, pprof, bearertokenauth │ ││ └────────────────────────────────────────────────────────────┘ │└─────────────────────────────────────────────────────────────────┘Part 2: Receivers — Getting Data In
Section titled “Part 2: Receivers — Getting Data In”Receivers listen for or pull telemetry data. They are the entry point of every pipeline.
2.1 OTLP Receiver (The Universal Input)
Section titled “2.1 OTLP Receiver (The Universal Input)”The otlp receiver is the most important receiver on the exam. It accepts data over both gRPC (port 4317) and HTTP (port 4318).
receivers: otlp: protocols: grpc: endpoint: 0.0.0.0:4317 max_recv_msg_size_mib: 4 # Default: 4 MiB http: endpoint: 0.0.0.0:4318 cors: allowed_origins: ["*"] # For browser-based appsgRPC vs HTTP — when to use each:
| Aspect | gRPC (:4317) | HTTP (:4318) |
|---|---|---|
| Performance | Higher throughput, streaming | Slightly lower |
| Compression | Built-in (gzip, zstd) | Requires config |
| Firewall-friendly | No (HTTP/2, specific ports) | Yes (standard HTTP) |
| Browser support | No (needs proxy) | Yes (for web apps) |
| Best for | Service-to-collector, collector-to-collector | Browser RUM, edge ingestion |
Exam tip: Default OTLP ports — 4317 for gRPC, 4318 for HTTP. These are tested frequently.
2.2 Prometheus Receiver
Section titled “2.2 Prometheus Receiver”Scrapes Prometheus-format metrics endpoints. Useful when you want the Collector to replace or augment a Prometheus server.
receivers: prometheus: config: scrape_configs: - job_name: 'k8s-pods' scrape_interval: 15s kubernetes_sd_configs: - role: pod2.3 Filelog Receiver
Section titled “2.3 Filelog Receiver”Reads logs from files on disk — essential for collecting container logs from nodes.
receivers: filelog: include: [/var/log/pods/*/*/*.log] operators: - type: json_parser timestamp: parse_from: attributes.time layout: '%Y-%m-%dT%H:%M:%S.%LZ'2.4 Host Metrics Receiver
Section titled “2.4 Host Metrics Receiver”Collects system-level metrics (CPU, memory, disk, network) from the host.
receivers: hostmetrics: collection_interval: 30s scrapers: cpu: {} memory: {} disk: {} filesystem: {} network: {} load: {}2.5 Kubernetes Cluster Receiver
Section titled “2.5 Kubernetes Cluster Receiver”Collects cluster-level metrics from the Kubernetes API server — node count, pod phases, resource quotas.
receivers: k8s_cluster: collection_interval: 30s node_conditions_to_report: [Ready, MemoryPressure] allocatable_types_to_report: [cpu, memory]This receiver needs RBAC access to the Kubernetes API. It typically runs in gateway mode (one instance), not on every node.
Part 3: Processors — Transforming Data In-Flight
Section titled “Part 3: Processors — Transforming Data In-Flight”Processors modify telemetry between receivers and exporters. Order matters — they execute sequentially as listed in the pipeline.
3.1 Batch Processor
Section titled “3.1 Batch Processor”Groups data into batches before sending. This is almost always the first processor you should add — it dramatically reduces export overhead.
processors: batch: send_batch_size: 8192 # Number of items per batch send_batch_max_size: 10000 # Hard upper limit timeout: 200ms # Flush interval even if batch isn't full3.2 Memory Limiter Processor
Section titled “3.2 Memory Limiter Processor”Prevents the Collector from running out of memory. Should be the first processor in every pipeline.
processors: memory_limiter: check_interval: 1s limit_mib: 512 # Hard limit spike_limit_mib: 128 # Buffer for spikesWhen memory exceeds limit_mib - spike_limit_mib (384 MiB in this example), the processor starts refusing data. When it exceeds limit_mib, it force-drops data. This prevents OOM kills.
3.3 Filter Processor
Section titled “3.3 Filter Processor”Drops telemetry that matches (or doesn’t match) conditions. Use it to reduce costs by discarding noisy, low-value data.
processors: filter: error_mode: ignore traces: span: - 'attributes["http.route"] == "/healthz"' # Drop health checks - 'attributes["http.route"] == "/readyz"' metrics: metric: - 'name == "http.server.duration" and resource.attributes["service.name"] == "debug-svc"'3.4 Attributes Processor
Section titled “3.4 Attributes Processor”Add, update, delete, or hash attributes on spans, metrics, or logs.
processors: attributes: actions: - key: environment value: production action: upsert # Insert or update - key: db.password action: delete # Remove sensitive data - key: user.email action: hash # Hash PII3.5 Transform Processor (OTTL)
Section titled “3.5 Transform Processor (OTTL)”The most powerful processor. Uses OTTL (OpenTelemetry Transformation Language) for arbitrary transformations.
processors: transform: error_mode: ignore trace_statements: - context: span statements: - set(attributes["deployment.env"], "prod") where resource.attributes["k8s.namespace.name"] == "production" - truncate_all(attributes, 256) # Limit attribute value length - replace_pattern(attributes["http.url"], "token=([^&]*)", "token=***") metric_statements: - context: datapoint statements: - convert_sum_to_gauge() where metric.name == "system.cpu.time" log_statements: - context: log statements: - merge_maps(attributes, ParseJSON(body), "insert") where IsMatch(body, "^\\{")OTTL is a high-priority exam topic. Know these key functions: set, delete, truncate_all, replace_pattern, merge_maps, ParseJSON, IsMatch.
3.6 Tail Sampling Processor
Section titled “3.6 Tail Sampling Processor”Makes sampling decisions after seeing complete traces. Runs only in gateway mode (needs full traces).
processors: tail_sampling: decision_wait: 10s # Wait for trace to complete num_traces: 100000 # Traces held in memory policies: - name: errors-always type: status_code status_code: {status_codes: [ERROR]} - name: slow-traces type: latency latency: {threshold_ms: 1000} - name: low-volume-sample type: probabilistic probabilistic: {sampling_percentage: 10}This keeps 100% of errors, 100% of slow traces, and 10% of everything else. The decision_wait must be long enough for all spans in a trace to arrive.
Part 4: Exporters — Sending Data Out
Section titled “Part 4: Exporters — Sending Data Out”Exporters send processed telemetry to backends.
4.1 OTLP Exporter (gRPC)
Section titled “4.1 OTLP Exporter (gRPC)”The default for Collector-to-Collector or Collector-to-backend communication.
exporters: otlp: endpoint: tempo.observability.svc.cluster.local:4317 tls: insecure: false cert_file: /certs/client.crt key_file: /certs/client.key compression: gzip # or zstd retry_on_failure: enabled: true initial_interval: 5s max_interval: 30s max_elapsed_time: 300s4.2 OTLP HTTP Exporter
Section titled “4.2 OTLP HTTP Exporter”Same protocol, HTTP transport. Use when gRPC is blocked by firewalls or proxies.
exporters: otlphttp: endpoint: https://ingest.example.com compression: gzip headers: Authorization: "Bearer ${env:API_TOKEN}"4.3 Prometheus Exporter
Section titled “4.3 Prometheus Exporter”Exposes a /metrics endpoint that Prometheus can scrape. Converts OTLP metrics to Prometheus format.
exporters: prometheus: endpoint: 0.0.0.0:8889 namespace: otel resource_to_telemetry_conversion: enabled: true # Promote resource attributes to labels4.4 Debug Exporter
Section titled “4.4 Debug Exporter”Prints telemetry to stdout. Essential for development and troubleshooting.
exporters: debug: verbosity: detailed # basic | normal | detailed sampling_initial: 5 # First N items logged sampling_thereafter: 200 # Then every Nth item4.5 File Exporter
Section titled “4.5 File Exporter”Writes telemetry to files in JSON format. Useful for audit trails or offline analysis.
exporters: file: path: /data/otel-output.json rotation: max_megabytes: 100 max_days: 7 max_backups: 5Part 5: Connectors — Bridging Pipelines
Section titled “Part 5: Connectors — Bridging Pipelines”Connectors are both an exporter for one pipeline and a receiver for another. They transform one signal type into another.
5.1 Span Metrics Connector
Section titled “5.1 Span Metrics Connector”Generates RED metrics from traces automatically — no additional instrumentation needed.
connectors: spanmetrics: histogram: explicit: buckets: [5ms, 10ms, 25ms, 50ms, 100ms, 500ms, 1s, 5s] dimensions: - name: http.method - name: http.status_code namespace: traces.spanmetrics
service: pipelines: traces: receivers: [otlp] processors: [batch] exporters: [otlp/tempo, spanmetrics] # spanmetrics is an exporter here metrics: receivers: [otlp, spanmetrics] # spanmetrics is a receiver here processors: [batch] exporters: [prometheus]This is one of the most elegant features in OTel. Traces flow into the spanmetrics connector, and metrics flow out — giving you traces.spanmetrics.calls_total, traces.spanmetrics.duration_*, and error counts.
5.2 Count Connector
Section titled “5.2 Count Connector”Counts spans, metrics, or log records and emits the counts as metrics.
connectors: count: traces: spans: - name: span.count description: "Count of spans" logs: log_records: - name: log.record.count description: "Count of log records"Part 6: Extensions
Section titled “Part 6: Extensions”Extensions provide auxiliary capabilities that are not part of the data pipeline but support it.
extensions: health_check: endpoint: 0.0.0.0:13133 # Liveness/readiness probe target
zpages: endpoint: 0.0.0.0:55679 # Internal debug UI at /debug/tracez, /debug/pipelinez
pprof: endpoint: 0.0.0.0:1777 # Go pprof profiling
bearertokenauth: token: "${env:OTEL_AUTH_TOKEN}"
service: extensions: [health_check, zpages, pprof, bearertokenauth]| Extension | Purpose | Default Port |
|---|---|---|
health_check | K8s liveness/readiness probes | 13133 |
zpages | Debug UI: pipeline status, trace samples | 55679 |
pprof | Performance profiling | 1777 |
bearertokenauth | Authenticate incoming/outgoing requests | N/A |
Exam tip: zpages at /debug/pipelinez shows if pipelines are running. /debug/tracez shows sample traces passing through the Collector. This is your first stop when debugging.
Part 7: Deployment Patterns
Section titled “Part 7: Deployment Patterns”7.1 Agent vs Gateway
Section titled “7.1 Agent vs Gateway”Agent Mode (DaemonSet) Gateway Mode (Deployment)────────────────────── ─────────────────────────
┌─────────────────────┐ ┌─────────────────────┐│ Node 1 │ │ Node 1 ││ ┌─────┐ ┌────────┐ │ │ ┌─────┐ ││ │App A│─▶│Collector│─┤ │ │App A│──┐ ││ └─────┘ │(Agent) │ │ │ └─────┘ │ ││ ┌─────┐ │ │ │ │ ┌─────┐ │ ││ │App B│─▶│ │ │ │ │App B│──┤ ││ └─────┘ └───┬────┘ │ │ └─────┘ │ │└─────────────┼──────┘ └──────────┼──────────┘ │ │ ▼ │┌─────────────────────┐ ││ Node 2 │ ┌─────────▼──────────┐│ ┌─────┐ ┌────────┐ │ │ Gateway Collector ││ │App C│─▶│Collector│─┤───▶Backend │ (Deployment, 2+ ││ └─────┘ │(Agent) │ │ │ replicas) │──▶Backend│ └───┬────┘ │ │ │└─────────────┼──────┘ └─────────▲──────────┘ │ │ ▼ ┌──────────┼──────────┐ Backend │ Node 2 │ │ ┌─────┐ │ │ │ │App C│──┘ │ │ └─────┘ │ └─────────────────────┘| Aspect | Agent (DaemonSet) | Gateway (Deployment) |
|---|---|---|
| Deployment | One per node | Shared pool (2+ replicas) |
| Resource use | Light per node | Heavier but centralized |
| Tail sampling | Not possible (incomplete traces) | Yes (full traces arrive) |
| Host metrics | Yes (local access) | No |
| Filelog | Yes (local files) | No |
| Scaling | Scales with nodes | HPA on CPU/memory |
| Best for | Collection, basic processing | Aggregation, sampling, routing |
Production pattern: Use both. Agents on every node collect and forward. A gateway pool handles sampling, enrichment, and export.
Apps ──▶ Agent (DaemonSet) ──▶ Gateway (Deployment) ──▶ Backends - hostmetrics - tail_sampling - filelog - spanmetrics - memory_limiter - routing - batch - export to N backends7.2 Scaling the Gateway
Section titled “7.2 Scaling the Gateway”For horizontal scaling, use the load balancing exporter on agents to distribute traces across gateway replicas. This is critical for tail sampling — all spans of a trace must reach the same gateway instance.
# On the Agentexporters: loadbalancing: protocol: otlp: tls: insecure: true resolver: dns: hostname: otel-gateway-headless.observability.svc.cluster.local port: 4317The loadbalancing exporter uses trace ID-based routing — all spans with the same trace ID go to the same gateway. This is what makes tail sampling possible in a scaled deployment.
Part 8: OTLP Protocol Deep-Dive
Section titled “Part 8: OTLP Protocol Deep-Dive”OTLP (OpenTelemetry Protocol) is the native wire protocol of OpenTelemetry.
8.1 Protocol Comparison
Section titled “8.1 Protocol Comparison”| Feature | OTLP/gRPC | OTLP/HTTP |
|---|---|---|
| Transport | HTTP/2 with Protocol Buffers | HTTP/1.1 with Protobuf or JSON |
| Port | 4317 | 4318 |
| Compression | gzip, zstd (built-in) | gzip (via Content-Encoding) |
| Streaming | Yes (bidirectional) | No (request/response) |
| Path (traces) | N/A (gRPC service) | /v1/traces |
| Path (metrics) | N/A | /v1/metrics |
| Path (logs) | N/A | /v1/logs |
| Proxy support | Needs HTTP/2-aware proxy | Works with any HTTP proxy |
When to choose gRPC: Internal service-to-collector and collector-to-collector traffic where performance matters and you control the network.
When to choose HTTP: Browser telemetry (RUM), crossing firewalls/load balancers that don’t support HTTP/2, or when you need JSON-encoded payloads for debugging.
8.2 Compression
Section titled “8.2 Compression”Always enable compression in production. The difference is significant:
exporters: otlp: endpoint: gateway:4317 compression: zstd # Best ratio for telemetry data otlphttp: endpoint: https://ingest.example.com compression: gzip # More widely supportedzstd offers better compression ratios and speed than gzip but is less universally supported. For internal traffic, prefer zstd. For external endpoints, use gzip.
Part 9: OTel Operator for Kubernetes
Section titled “Part 9: OTel Operator for Kubernetes”The OpenTelemetry Operator extends Kubernetes to manage Collectors and auto-instrument applications.
9.1 Installing the Operator
Section titled “9.1 Installing the Operator”# Install cert-manager first (required dependency)kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.5/cert-manager.yaml
# Install the OTel Operatorkubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml9.2 OpenTelemetryCollector CRD
Section titled “9.2 OpenTelemetryCollector CRD”The Operator manages Collector instances via a custom resource:
apiVersion: opentelemetry.io/v1beta1kind: OpenTelemetryCollectormetadata: name: otel-agent namespace: observabilityspec: mode: daemonset # daemonset | deployment | statefulset | sidecar image: otel/opentelemetry-collector-contrib:0.98.0 config: receivers: otlp: protocols: grpc: endpoint: 0.0.0.0:4317 http: endpoint: 0.0.0.0:4318 processors: memory_limiter: check_interval: 1s limit_mib: 512 batch: {} exporters: otlp: endpoint: otel-gateway.observability.svc.cluster.local:4317 tls: insecure: true service: pipelines: traces: receivers: [otlp] processors: [memory_limiter, batch] exporters: [otlp]9.3 Auto-Instrumentation with the Instrumentation CRD
Section titled “9.3 Auto-Instrumentation with the Instrumentation CRD”The Operator can inject instrumentation into pods automatically — no code changes required.
apiVersion: opentelemetry.io/v1alpha1kind: Instrumentationmetadata: name: auto-instrumentation namespace: observabilityspec: exporter: endpoint: http://otel-agent-collector.observability.svc.cluster.local:4318 propagators: - tracecontext - baggage sampler: type: parentbased_traceidratio argument: "0.25" java: image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest python: image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:latest nodejs: image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:latestThen annotate pods to opt in:
apiVersion: apps/v1kind: Deploymentmetadata: name: my-java-appspec: template: metadata: annotations: instrumentation.opentelemetry.io/inject-java: "true" # Java auto-instrument # Other options: # instrumentation.opentelemetry.io/inject-python: "true" # instrumentation.opentelemetry.io/inject-nodejs: "true" # instrumentation.opentelemetry.io/inject-dotnet: "true" spec: containers: - name: app image: my-java-app:latestThe Operator injects an init container with the instrumentation agent. The application starts with zero code changes and produces traces automatically.
Part 10: Collector Distributions
Section titled “Part 10: Collector Distributions”10.1 Core vs Contrib vs Custom
Section titled “10.1 Core vs Contrib vs Custom”| Distribution | Components | Use Case |
|---|---|---|
Core (otel/opentelemetry-collector) | ~20 components (otlp, batch, debug, etc.) | Minimal footprint, security-sensitive environments |
Contrib (otel/opentelemetry-collector-contrib) | 200+ components (all community receivers, processors, exporters) | Development, when you need specific integrations |
Custom (built with ocb) | Exactly what you choose | Production — include only what you use |
10.2 Building a Custom Collector
Section titled “10.2 Building a Custom Collector”The OpenTelemetry Collector Builder (ocb) creates purpose-built distributions:
dist: name: my-collector description: "Production collector" output_path: ./dist otelcol_version: "0.98.0"
receivers: - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.98.0 - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/filelogreceiver v0.98.0
processors: - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.98.0 - gomod: go.opentelemetry.io/collector/processor/memorylimiterprocessor v0.98.0
exporters: - gomod: go.opentelemetry.io/collector/exporter/otlpexporter v0.98.0 - gomod: go.opentelemetry.io/collector/exporter/debugexporter v0.98.0# Build itocb --config builder-config.yamlWhy custom? Smaller binary (50MB vs 200MB+), smaller attack surface, faster startup, only the dependencies you actually audit.
Part 11: Debug Pipeline
Section titled “Part 11: Debug Pipeline”When things go wrong (and they will), here is your debugging toolkit.
11.1 Debug Exporter
Section titled “11.1 Debug Exporter”Add it to any pipeline to see what data is flowing:
exporters: debug: verbosity: detailed
service: pipelines: traces: receivers: [otlp] processors: [batch] exporters: [otlp/backend, debug] # Add debug alongside real exporter11.2 Internal Telemetry
Section titled “11.2 Internal Telemetry”The Collector can report its own metrics:
service: telemetry: logs: level: debug # debug | info | warn | error encoding: json # For structured log parsing metrics: level: detailed # none | basic | normal | detailed address: 0.0.0.0:8888 # Collector's own /metrics endpointKey internal metrics to watch:
otelcol_receiver_accepted_spans— Are spans arriving?otelcol_processor_dropped_spans— Is the filter/sampling dropping too much?otelcol_exporter_sent_spans— Are spans leaving?otelcol_exporter_send_failed_spans— Is the backend rejecting data?
11.3 zpages for Live Debugging
Section titled “11.3 zpages for Live Debugging”With the zpages extension enabled at port 55679:
| Endpoint | What It Shows |
|---|---|
/debug/pipelinez | Active pipelines and their components |
/debug/tracez | Sample traces flowing through the Collector |
/debug/rpcz | gRPC call statistics |
/debug/extensionz | Running extensions |
# Port-forward to access zpageskubectl port-forward svc/otel-collector 55679:55679Complete Multi-Pipeline Configuration Example
Section titled “Complete Multi-Pipeline Configuration Example”This is a production-grade config with traces, metrics, and logs flowing through separate pipelines, with spanmetrics bridging traces to metrics:
receivers: otlp: protocols: grpc: endpoint: 0.0.0.0:4317 http: endpoint: 0.0.0.0:4318 prometheus: config: scrape_configs: - job_name: 'k8s-pods' scrape_interval: 15s kubernetes_sd_configs: - role: pod filelog: include: [/var/log/pods/*/*/*.log] operators: - type: json_parser hostmetrics: collection_interval: 30s scrapers: cpu: {} memory: {} disk: {}
processors: memory_limiter: check_interval: 1s limit_mib: 1024 spike_limit_mib: 256 batch: send_batch_size: 8192 timeout: 200ms filter/healthz: error_mode: ignore traces: span: - 'attributes["http.route"] == "/healthz"' - 'attributes["http.route"] == "/readyz"' transform/redact: error_mode: ignore trace_statements: - context: span statements: - replace_pattern(attributes["http.url"], "token=([^&]*)", "token=REDACTED") log_statements: - context: log statements: - replace_pattern(body, "password=\\S+", "password=***")
exporters: otlp/tempo: endpoint: tempo.observability.svc.cluster.local:4317 tls: insecure: true otlp/loki: endpoint: loki.observability.svc.cluster.local:3100 tls: insecure: true prometheus: endpoint: 0.0.0.0:8889 debug: verbosity: basic
connectors: spanmetrics: histogram: explicit: buckets: [5ms, 10ms, 25ms, 50ms, 100ms, 500ms, 1s, 5s] dimensions: - name: http.method - name: http.status_code
extensions: health_check: endpoint: 0.0.0.0:13133 zpages: endpoint: 0.0.0.0:55679
service: extensions: [health_check, zpages] telemetry: logs: level: info metrics: address: 0.0.0.0:8888 pipelines: traces: receivers: [otlp] processors: [memory_limiter, filter/healthz, transform/redact, batch] exporters: [otlp/tempo, spanmetrics, debug] metrics: receivers: [otlp, prometheus, hostmetrics, spanmetrics] processors: [memory_limiter, batch] exporters: [prometheus, debug] logs: receivers: [otlp, filelog] processors: [memory_limiter, transform/redact, batch] exporters: [otlp/loki, debug]Common Mistakes
Section titled “Common Mistakes”| Mistake | What Happens | Fix |
|---|---|---|
| Declaring a component but not adding it to a pipeline | Component is silently ignored | Always check the service.pipelines section |
| Wrong processor order (batch before memory_limiter) | OOM kills under load | memory_limiter first, batch last |
| Tail sampling on agents (DaemonSet) | Incomplete traces, bad sampling decisions | Tail sampling only in gateway mode |
| Using Contrib image in production | 200MB+ image, unused attack surface | Build custom distribution with ocb |
Forgetting error_mode: ignore on filter/transform | One bad record crashes the pipeline | Always set error_mode: ignore in production |
OTLP exporter with tls.insecure: false but no certs | Connection refused, export failures | Set insecure: true for internal traffic or provide valid certs |
| Not enabling compression on exporters | 3-5x more bandwidth usage | Always set compression: gzip or zstd |
| Running k8s_cluster receiver on every agent | Duplicate metrics, API server overload | Run k8s_cluster on a single gateway or Deployment |
Test your understanding of Domain 3 content.
Q1: What are the default OTLP ports for gRPC and HTTP?
Answer
gRPC: 4317, HTTP: 4318. These are standard across all OTel components.
Q2: In what order should memory_limiter and batch appear in a pipeline’s processor list?
Answer
memory_limiter should be first, batch should be last. The memory limiter needs to reject data before it accumulates in the batch buffer. The batch processor should be the final step before export to maximize batch efficiency.
Q3: Why can’t you run tail sampling on a DaemonSet (agent mode) Collector?
Answer
Tail sampling needs to see the complete trace before making a sampling decision. In agent mode, spans from different services land on different nodes, so no single agent sees all spans of a distributed trace. Tail sampling must run in gateway mode where all spans are forwarded to a central pool, and the loadbalancing exporter ensures all spans with the same trace ID reach the same gateway instance.
Q4: What is the purpose of a connector like spanmetrics?
Answer
A connector acts as both an exporter in one pipeline and a receiver in another. The spanmetrics connector receives traces from the traces pipeline and emits RED metrics (Rate, Errors, Duration) into the metrics pipeline. This lets you generate metrics from traces automatically without double-instrumentation.
Q5: What is the difference between Core, Contrib, and Custom Collector distributions?
Answer
- Core: Minimal set of ~20 components maintained by the OTel project. Small binary, limited functionality.
- Contrib: Community-maintained, 200+ components. Large binary, everything included.
- Custom: Built with
ocb(OpenTelemetry Collector Builder) to include only the specific components you need. Best for production — minimal attack surface, optimized size.
Q6: How does the loadbalancing exporter route data?
Answer
It routes based on trace ID using consistent hashing. All spans belonging to the same trace are sent to the same backend (gateway) instance. This is essential for tail sampling to work correctly in a horizontally scaled gateway deployment. It uses a DNS resolver to discover backend instances via a headless Kubernetes Service.
Q7: What annotation would you add to a Java Deployment to enable auto-instrumentation via the OTel Operator?
Answer
instrumentation.opentelemetry.io/inject-java: "true"This tells the OTel Operator’s webhook to inject an init container with the Java auto-instrumentation agent. The Instrumentation CRD must exist in the same namespace (or the annotation must reference a specific one).
Q8: You added a filter processor but telemetry is not being filtered. What is the most likely cause?
Answer
The filter processor is defined in the processors section but not included in the pipeline under service.pipelines. A component must appear in both its definition section and in a pipeline to be active. Check service.pipelines.<signal>.processors to confirm it is listed.
Q9: What zpages endpoint shows the status of active pipelines?
Answer
/debug/pipelinez — It shows all configured pipelines and their component status (receivers, processors, exporters). Access it by port-forwarding to port 55679 on the Collector.
Q10: When should you use OTLP/HTTP instead of OTLP/gRPC?
Answer
Use OTLP/HTTP when:
- Sending telemetry from browsers (gRPC doesn’t work in browsers)
- Crossing firewalls or proxies that don’t support HTTP/2
- You need JSON encoding for debugging
- Working with load balancers that only support HTTP/1.1
Use gRPC for internal collector-to-collector and service-to-collector traffic where performance matters.
Hands-On Exercise: Build a Multi-Signal Pipeline
Section titled “Hands-On Exercise: Build a Multi-Signal Pipeline”Objective: Deploy an OTel Collector that receives all three signals, generates spanmetrics, and outputs to debug.
# Create a kind cluster (skip if you already have one)kind create cluster --name otel-lab
# Create namespacekubectl create namespace observabilityStep 1: Deploy the Collector
Section titled “Step 1: Deploy the Collector”kubectl apply -n observability -f - <<'EOF'apiVersion: v1kind: ConfigMapmetadata: name: otel-collector-configdata: config.yaml: | receivers: otlp: protocols: grpc: endpoint: 0.0.0.0:4317 http: endpoint: 0.0.0.0:4318
processors: memory_limiter: check_interval: 1s limit_mib: 256 spike_limit_mib: 64 batch: send_batch_size: 1024 timeout: 1s
connectors: spanmetrics: dimensions: - name: http.method
exporters: debug: verbosity: detailed
extensions: health_check: endpoint: 0.0.0.0:13133 zpages: endpoint: 0.0.0.0:55679
service: extensions: [health_check, zpages] telemetry: metrics: address: 0.0.0.0:8888 pipelines: traces: receivers: [otlp] processors: [memory_limiter, batch] exporters: [debug, spanmetrics] metrics: receivers: [otlp, spanmetrics] processors: [memory_limiter, batch] exporters: [debug] logs: receivers: [otlp] processors: [memory_limiter, batch] exporters: [debug]---apiVersion: apps/v1kind: Deploymentmetadata: name: otel-collectorspec: replicas: 1 selector: matchLabels: app: otel-collector template: metadata: labels: app: otel-collector spec: containers: - name: collector image: otel/opentelemetry-collector-contrib:0.98.0 args: ["--config=/etc/otel/config.yaml"] ports: - containerPort: 4317 - containerPort: 4318 - containerPort: 13133 - containerPort: 55679 volumeMounts: - name: config mountPath: /etc/otel livenessProbe: httpGet: path: / port: 13133 readinessProbe: httpGet: path: / port: 13133 volumes: - name: config configMap: name: otel-collector-config---apiVersion: v1kind: Servicemetadata: name: otel-collectorspec: selector: app: otel-collector ports: - name: otlp-grpc port: 4317 - name: otlp-http port: 4318 - name: health port: 13133 - name: zpages port: 55679EOFStep 2: Send Test Telemetry
Section titled “Step 2: Send Test Telemetry”# Wait for the collector to be readykubectl wait --for=condition=ready pod -l app=otel-collector -n observability --timeout=60s
# Port-forward to send datakubectl port-forward -n observability svc/otel-collector 4318:4318 &
# Send a test trace via OTLP/HTTPcurl -X POST http://localhost:4318/v1/traces \ -H "Content-Type: application/json" \ -d '{ "resourceSpans": [{ "resource": {"attributes": [{"key": "service.name", "value": {"stringValue": "test-service"}}]}, "scopeSpans": [{ "spans": [{ "traceId": "5b8aa5a2d2c872e8321cf37308d69df2", "spanId": "051581bf3cb55c13", "name": "GET /api/users", "kind": 2, "startTimeUnixNano": "1000000000", "endTimeUnixNano": "2000000000", "attributes": [ {"key": "http.method", "value": {"stringValue": "GET"}}, {"key": "http.status_code", "value": {"intValue": "200"}} ] }] }] }] }'Step 3: Verify
Section titled “Step 3: Verify”# Check collector logs — you should see the trace in debug outputkubectl logs -n observability -l app=otel-collector --tail=50
# Check zpageskubectl port-forward -n observability svc/otel-collector 55679:55679 &# Open http://localhost:55679/debug/pipelinez in your browser
# Check Collector's own metricskubectl port-forward -n observability svc/otel-collector 8888:8888 &curl -s http://localhost:8888/metrics | grep otelcol_receiver_acceptedSuccess Criteria
Section titled “Success Criteria”You should see:
- The debug exporter printing the trace span with service name
test-service - The
spanmetricsconnector generating metrics (visible in the debug output for the metrics pipeline) otelcol_receiver_accepted_spans> 0 in the Collector’s own metrics- zpages showing all three pipelines active at
/debug/pipelinez
Key Takeaways for the Exam
Section titled “Key Takeaways for the Exam”- Config structure: Five sections (receivers, processors, exporters, connectors, extensions) + service to wire them.
- OTLP ports: gRPC = 4317, HTTP = 4318. Know these cold.
- Processor order:
memory_limiterfirst,batchlast. - Agent vs Gateway: Agents (DaemonSet) collect, Gateways (Deployment) aggregate and sample.
- Tail sampling: Gateway-only. Requires
loadbalancingexporter for horizontal scaling. - Connectors: Bridge pipelines.
spanmetricsgenerates RED metrics from traces. - Auto-instrumentation: OTel Operator +
InstrumentationCRD + pod annotation. - Distributions: Core (minimal), Contrib (everything), Custom via
ocb(production). - Debug toolkit:
debugexporter, zpages (/debug/pipelinez), internal telemetry metrics on:8888. - OTTL: The transform processor’s language —
set,delete,replace_pattern,merge_maps.
Next Module: OTCA Track Overview — Instrument applications using OTel SDKs across multiple languages.