
Module 1.2: OTel Collector Advanced

Complexity: [COMPLEX] - Multiple interacting components, pipeline logic

Time to Complete: 60-75 minutes

Prerequisites: Module 1 (OpenTelemetry Fundamentals), basic Kubernetes knowledge

OTCA Domain: Domain 3 - OTel Collector (26% of exam)


After completing this module, you will be able to:

  1. Design multi-pipeline Collector configurations that route traces, metrics, and logs through distinct receiver/processor/exporter chains
  2. Configure advanced processors (filter, transform, tail-sampling, batch) to reduce volume while preserving critical signals
  3. Deploy the Collector as a DaemonSet (agent) and Deployment (gateway) with proper resource limits, health checks, and scaling
  4. Debug pipeline issues using the debug exporter, zpages, and Collector internal metrics to identify data loss or bottlenecks

The OpenTelemetry Collector is the backbone of every production observability pipeline. It receives, processes, and exports telemetry data — traces, metrics, and logs — and it does so at scale, reliably, and vendor-neutrally. Domain 3 accounts for 26% of your OTCA exam. You cannot pass without mastering the Collector.

If OpenTelemetry is the universal language of observability, the Collector is the postal service. It picks up signals from your applications, routes them through processing, and delivers them wherever they need to go. Misconfigure it and you get data loss, memory explosions, or a pipeline that silently drops the traces you need most.

War Story: The Silent Pipeline

A platform team deployed the OTel Collector to replace their vendor-specific agents. Everything looked green — health checks passed, pods were running. But after two weeks, the on-call engineer noticed zero traces for their payment service. The culprit? A filter processor with a regex that accidentally matched the payment- service-name prefix. The Collector was healthy. The pipeline was working. It was just filtering out exactly the data they needed most. Lesson: always validate your pipeline end-to-end with the debug exporter before going to production. Trust, but verify.
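A cheap guard against this class of bug is to tee the suspect pipeline into the debug exporter and confirm the spans you care about still survive the filter. A minimal sketch (the filter/noise and otlp/backend names are placeholders):

```yaml
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [filter/noise]        # the filter under suspicion
      exporters: [otlp/backend, debug]  # debug prints whatever actually survives
```

Send a known trace through (for example, one tagged with the payment service name) and grep the Collector logs for it before trusting the pipeline.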


  • The OTel Collector can process over 1 million spans per second on a single instance with proper tuning. Most teams hit config issues long before they hit performance limits.
  • The spanmetrics connector can generate RED metrics (Rate, Errors, Duration) automatically from traces — meaning you get metrics for free without instrumenting anything twice.
  • The Collector’s transform processor uses OTTL (OpenTelemetry Transformation Language), a purpose-built language that lets you modify, filter, and route telemetry using SQL-like expressions.
  • There are three official distributions of the Collector: Core (minimal), Contrib (batteries-included), and Custom (build your own with ocb). The exam expects you to know when to use each.

Part 1: Collector Architecture and Config Structure

Every Collector configuration is built from five component sections plus a service section. Think of it as assembling a factory: you define what comes in (receivers), how it gets processed (processors), where it goes out (exporters), optionally bridge signals between pipelines (connectors) and attach auxiliary services (extensions), and then you wire everything together (service/pipelines).

# The five building blocks of every Collector config
receivers:    # How data gets IN to the Collector
processors:   # How data gets TRANSFORMED inside the Collector
exporters:    # How data gets OUT of the Collector
connectors:   # Bridge between pipelines (output of one, input of another)
extensions:   # Auxiliary services (health checks, auth, debugging)

service:      # Wires everything together into pipelines
  extensions: [health_check, zpages]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/backend]
    metrics:
      receivers: [otlp, prometheus]
      processors: [batch]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp, filelog]
      processors: [batch]
      exporters: [otlp/backend]

Key rules for the service section:

  • A component declared in receivers/processors/exporters does nothing until it appears in a pipeline under service.
  • Processors execute in the order listed — order matters.
  • A single receiver/exporter can appear in multiple pipelines.
  • Pipeline names must begin with one of the three signal types (traces, metrics, or logs) and may take an optional suffix after a slash, e.g. traces/sampled.
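To make these rules concrete, here is a small sketch (exporter names are placeholders) in which a single otlp receiver feeds both a default and a suffixed trace pipeline:

```yaml
service:
  pipelines:
    traces:                 # name starts with the signal type
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/primary]
    traces/audit:           # optional suffix after a slash
      receivers: [otlp]     # the same receiver, reused
      processors: [batch]
      exporters: [file]
```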
┌─────────────────────────────────────────────────────────────────┐
│ OTel Collector │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │Receivers │──▶│Processors│──▶│Exporters │──▶│ Backends │ │
│ │ │ │ │ │ │ │ │ │
│ │ otlp │ │ batch │ │ otlp │ │ Jaeger │ │
│ │ prometheus│ │ filter │ │ prometheus│ │ Prometheus│ │
│ │ filelog │ │ transform│ │ debug │ │ Loki │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Extensions: health_check, zpages, pprof, bearertokenauth │ │
│ └────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

Part 2: Receivers — Getting Data In

Receivers listen for or pull telemetry data. They are the entry point of every pipeline.

The otlp receiver is the most important receiver on the exam. It accepts data over both gRPC (port 4317) and HTTP (port 4318).

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        max_recv_msg_size_mib: 4   # Default: 4 MiB
      http:
        endpoint: 0.0.0.0:4318
        cors:
          allowed_origins: ["*"]   # For browser-based apps

gRPC vs HTTP — when to use each:

| Aspect            | gRPC (:4317)                                 | HTTP (:4318)                |
|-------------------|----------------------------------------------|-----------------------------|
| Performance       | Higher throughput, streaming                 | Slightly lower              |
| Compression       | Built-in (gzip, zstd)                        | Requires config             |
| Firewall-friendly | No (HTTP/2, specific ports)                  | Yes (standard HTTP)         |
| Browser support   | No (needs proxy)                             | Yes (for web apps)          |
| Best for          | Service-to-collector, collector-to-collector | Browser RUM, edge ingestion |

Exam tip: Default OTLP ports — 4317 for gRPC, 4318 for HTTP. These are tested frequently.

The prometheus receiver scrapes Prometheus-format metrics endpoints. Useful when you want the Collector to replace or augment a Prometheus server.

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'k8s-pods'
          scrape_interval: 15s
          kubernetes_sd_configs:
            - role: pod

The filelog receiver reads logs from files on disk — essential for collecting container logs from nodes.

receivers:
  filelog:
    include: [/var/log/pods/*/*/*.log]
    operators:
      - type: json_parser
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'

The hostmetrics receiver collects system-level metrics (CPU, memory, disk, network) from the host.

receivers:
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu: {}
      memory: {}
      disk: {}
      filesystem: {}
      network: {}
      load: {}

The k8s_cluster receiver collects cluster-level metrics from the Kubernetes API server — node count, pod phases, resource quotas.

receivers:
  k8s_cluster:
    collection_interval: 30s
    node_conditions_to_report: [Ready, MemoryPressure]
    allocatable_types_to_report: [cpu, memory]

This receiver needs RBAC access to the Kubernetes API. It typically runs in gateway mode (one instance), not on every node.
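For reference, a minimal sketch of the kind of ClusterRole the Collector's service account needs for k8s_cluster (the exact resource list depends on which metrics you enable; the role name is a placeholder):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-k8s-cluster-reader   # placeholder name
rules:
  - apiGroups: [""]
    resources: [events, namespaces, nodes, pods, replicationcontrollers, resourcequotas, services]
    verbs: [get, list, watch]
  - apiGroups: [apps]
    resources: [daemonsets, deployments, replicasets, statefulsets]
    verbs: [get, list, watch]
```

Bind it to the gateway Collector's service account with a ClusterRoleBinding.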


Part 3: Processors — Transforming Data In-Flight

Processors modify telemetry between receivers and exporters. Order matters — they execute sequentially as listed in the pipeline.

The batch processor groups data into batches before sending. It is almost always the first processor to add to any config (though it runs last in the pipeline order) because it dramatically reduces export overhead.

processors:
  batch:
    send_batch_size: 8192        # Number of items per batch
    send_batch_max_size: 10000   # Hard upper limit
    timeout: 200ms               # Flush interval even if batch isn't full

The memory_limiter processor prevents the Collector from running out of memory. It should be the first processor in every pipeline.

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512         # Hard limit
    spike_limit_mib: 128   # Buffer for spikes

When memory exceeds limit_mib - spike_limit_mib (384 MiB in this example), the processor starts refusing data. When it exceeds limit_mib, it force-drops data. This prevents OOM kills.

The filter processor drops telemetry that matches (or doesn’t match) conditions. Use it to reduce costs by discarding noisy, low-value data.

processors:
  filter:
    error_mode: ignore
    traces:
      span:
        - 'attributes["http.route"] == "/healthz"'   # Drop health checks
        - 'attributes["http.route"] == "/readyz"'
    metrics:
      metric:
        - 'name == "http.server.duration" and resource.attributes["service.name"] == "debug-svc"'

The attributes processor adds, updates, deletes, or hashes attributes on spans, metrics, or logs.

processors:
  attributes:
    actions:
      - key: environment
        value: production
        action: upsert   # Insert or update
      - key: db.password
        action: delete   # Remove sensitive data
      - key: user.email
        action: hash     # Hash PII

The transform processor is the most powerful of the set. It uses OTTL (OpenTelemetry Transformation Language) for arbitrary transformations.

processors:
  transform:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          - set(attributes["deployment.env"], "prod") where resource.attributes["k8s.namespace.name"] == "production"
          - truncate_all(attributes, 256)   # Limit attribute value length
          - replace_pattern(attributes["http.url"], "token=([^&]*)", "token=***")
    metric_statements:
      - context: datapoint
        statements:
          - convert_sum_to_gauge() where metric.name == "system.cpu.time"
    log_statements:
      - context: log
        statements:
          - merge_maps(attributes, ParseJSON(body), "insert") where IsMatch(body, "^\\{")

OTTL is a high-priority exam topic. Know these key functions: set, delete, truncate_all, replace_pattern, merge_maps, ParseJSON, IsMatch.

The tail_sampling processor makes sampling decisions after seeing complete traces. It runs only in gateway mode, where full traces are available.

processors:
  tail_sampling:
    decision_wait: 10s   # Wait for trace to complete
    num_traces: 100000   # Traces held in memory
    policies:
      - name: errors-always
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: slow-traces
        type: latency
        latency: {threshold_ms: 1000}
      - name: low-volume-sample
        type: probabilistic
        probabilistic: {sampling_percentage: 10}

This keeps 100% of errors, 100% of slow traces, and 10% of everything else. The decision_wait must be long enough for all spans in a trace to arrive.


Part 4: Exporters — Sending Data Out

Exporters send processed telemetry to backends.

The otlp exporter is the default for Collector-to-Collector or Collector-to-backend communication.

exporters:
  otlp:
    endpoint: tempo.observability.svc.cluster.local:4317
    tls:
      insecure: false
      cert_file: /certs/client.crt
      key_file: /certs/client.key
    compression: gzip   # or zstd
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s

The otlphttp exporter speaks the same protocol over HTTP transport. Use it when gRPC is blocked by firewalls or proxies.

exporters:
  otlphttp:
    endpoint: https://ingest.example.com
    compression: gzip
    headers:
      Authorization: "Bearer ${env:API_TOKEN}"

The prometheus exporter exposes a /metrics endpoint that Prometheus can scrape, converting OTLP metrics to Prometheus format.

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889
    namespace: otel
    resource_to_telemetry_conversion:
      enabled: true   # Promote resource attributes to labels

The debug exporter prints telemetry to stdout. Essential for development and troubleshooting.

exporters:
  debug:
    verbosity: detailed        # basic | normal | detailed
    sampling_initial: 5        # First N items logged
    sampling_thereafter: 200   # Then every Nth item

The file exporter writes telemetry to files in JSON format. Useful for audit trails or offline analysis.

exporters:
  file:
    path: /data/otel-output.json
    rotation:
      max_megabytes: 100
      max_days: 7
      max_backups: 5

A connector acts as an exporter in one pipeline and a receiver in another, often converting one signal type into another along the way.

The spanmetrics connector generates RED metrics from traces automatically — no additional instrumentation needed.

connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [5ms, 10ms, 25ms, 50ms, 100ms, 500ms, 1s, 5s]
    dimensions:
      - name: http.method
      - name: http.status_code
    namespace: traces.spanmetrics

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo, spanmetrics]   # spanmetrics is an exporter here
    metrics:
      receivers: [otlp, spanmetrics]         # spanmetrics is a receiver here
      processors: [batch]
      exporters: [prometheus]

This is one of the most elegant features in OTel. Traces flow into the spanmetrics connector, and metrics flow out — giving you traces.spanmetrics.calls_total, traces.spanmetrics.duration_*, and error counts.

The count connector counts spans, metrics, or log records and emits the counts as metrics.

connectors:
  count:
    traces:
      spans:
        - name: span.count
          description: "Count of spans"
    logs:
      log_records:
        - name: log.record.count
          description: "Count of log records"

Extensions provide auxiliary capabilities that are not part of the data pipeline but support it.

extensions:
  health_check:
    endpoint: 0.0.0.0:13133   # Liveness/readiness probe target
  zpages:
    endpoint: 0.0.0.0:55679   # Internal debug UI at /debug/tracez, /debug/pipelinez
  pprof:
    endpoint: 0.0.0.0:1777    # Go pprof profiling
  bearertokenauth:
    token: "${env:OTEL_AUTH_TOKEN}"

service:
  extensions: [health_check, zpages, pprof, bearertokenauth]

| Extension       | Purpose                                  | Default Port |
|-----------------|------------------------------------------|--------------|
| health_check    | K8s liveness/readiness probes            | 13133        |
| zpages          | Debug UI: pipeline status, trace samples | 55679        |
| pprof           | Performance profiling                    | 1777         |
| bearertokenauth | Authenticate incoming/outgoing requests  | N/A          |

Exam tip: zpages at /debug/pipelinez shows if pipelines are running. /debug/tracez shows sample traces passing through the Collector. This is your first stop when debugging.


Agent Mode (DaemonSet) Gateway Mode (Deployment)
────────────────────── ─────────────────────────
┌─────────────────────┐ ┌─────────────────────┐
│ Node 1 │ │ Node 1 │
│ ┌─────┐ ┌────────┐ │ │ ┌─────┐ │
│ │App A│─▶│Collector│─┤ │ │App A│──┐ │
│ └─────┘ │(Agent) │ │ │ └─────┘ │ │
│ ┌─────┐ │ │ │ │ ┌─────┐ │ │
│ │App B│─▶│ │ │ │ │App B│──┤ │
│ └─────┘ └───┬────┘ │ │ └─────┘ │ │
└─────────────┼──────┘ └──────────┼──────────┘
│ │
▼ │
┌─────────────────────┐ │
│ Node 2 │ ┌─────────▼──────────┐
│ ┌─────┐ ┌────────┐ │ │ Gateway Collector │
│ │App C│─▶│Collector│─┤───▶Backend │ (Deployment, 2+ │
│ └─────┘ │(Agent) │ │ │ replicas) │──▶Backend
│ └───┬────┘ │ │ │
└─────────────┼──────┘ └─────────▲──────────┘
│ │
▼ ┌──────────┼──────────┐
Backend │ Node 2 │
│ ┌─────┐ │ │
│ │App C│──┘ │
│ └─────┘ │
└─────────────────────┘

| Aspect        | Agent (DaemonSet)                | Gateway (Deployment)           |
|---------------|----------------------------------|--------------------------------|
| Deployment    | One per node                     | Shared pool (2+ replicas)      |
| Resource use  | Light per node                   | Heavier but centralized        |
| Tail sampling | Not possible (incomplete traces) | Yes (full traces arrive)       |
| Host metrics  | Yes (local access)               | No                             |
| Filelog       | Yes (local files)                | No                             |
| Scaling       | Scales with nodes                | HPA on CPU/memory              |
| Best for      | Collection, basic processing     | Aggregation, sampling, routing |

Production pattern: Use both. Agents on every node collect and forward. A gateway pool handles sampling, enrichment, and export.

Apps ──▶ Agent (DaemonSet) ──▶ Gateway (Deployment) ──▶ Backends
         - hostmetrics          - tail_sampling
         - filelog              - spanmetrics
         - memory_limiter       - routing
         - batch                - export to N backends

For horizontal scaling, use the load balancing exporter on agents to distribute traces across gateway replicas. This is critical for tail sampling — all spans of a trace must reach the same gateway instance.

# On the Agent
exporters:
  loadbalancing:
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      dns:
        hostname: otel-gateway-headless.observability.svc.cluster.local
        port: 4317

The loadbalancing exporter uses trace ID-based routing — all spans with the same trace ID go to the same gateway. This is what makes tail sampling possible in a scaled deployment.
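For completeness, a sketch of the matching gateway-side configuration (endpoints and resource limits are illustrative): the gateway simply receives OTLP from the agents and applies the trace-complete processing that agents cannot.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # agents' loadbalancing exporter targets this
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 2048
    spike_limit_mib: 512
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: errors-always
        type: status_code
        status_code: {status_codes: [ERROR]}
  batch: {}
exporters:
  otlp:
    endpoint: tempo.observability.svc.cluster.local:4317   # backend is an assumption
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling, batch]
      exporters: [otlp]
```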


OTLP (OpenTelemetry Protocol) is the native wire protocol of OpenTelemetry.

| Feature        | OTLP/gRPC                    | OTLP/HTTP                      |
|----------------|------------------------------|--------------------------------|
| Transport      | HTTP/2 with Protocol Buffers | HTTP/1.1 with Protobuf or JSON |
| Port           | 4317                         | 4318                           |
| Compression    | gzip, zstd (built-in)        | gzip (via Content-Encoding)    |
| Streaming      | Yes (bidirectional)          | No (request/response)          |
| Path (traces)  | N/A (gRPC service)           | /v1/traces                     |
| Path (metrics) | N/A                          | /v1/metrics                    |
| Path (logs)    | N/A                          | /v1/logs                       |
| Proxy support  | Needs HTTP/2-aware proxy     | Works with any HTTP proxy      |

When to choose gRPC: Internal service-to-collector and collector-to-collector traffic where performance matters and you control the network.

When to choose HTTP: Browser telemetry (RUM), crossing firewalls/load balancers that don’t support HTTP/2, or when you need JSON-encoded payloads for debugging.

Always enable compression in production. The difference is significant:

exporters:
  otlp:
    endpoint: gateway:4317
    compression: zstd   # Best ratio for telemetry data
  otlphttp:
    endpoint: https://ingest.example.com
    compression: gzip   # More widely supported

zstd offers better compression ratios and speed than gzip but is less universally supported. For internal traffic, prefer zstd. For external endpoints, use gzip.


The OpenTelemetry Operator extends Kubernetes to manage Collectors and auto-instrument applications.

# Install cert-manager first (required dependency)
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.5/cert-manager.yaml
# Install the OTel Operator
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml

The Operator manages Collector instances via a custom resource:

apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel-agent
  namespace: observability
spec:
  mode: daemonset   # daemonset | deployment | statefulset | sidecar
  image: otel/opentelemetry-collector-contrib:0.98.0
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      memory_limiter:
        check_interval: 1s
        limit_mib: 512
      batch: {}
    exporters:
      otlp:
        endpoint: otel-gateway.observability.svc.cluster.local:4317
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [otlp]

9.3 Auto-Instrumentation with the Instrumentation CRD

The Operator can inject instrumentation into pods automatically — no code changes required.

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: auto-instrumentation
  namespace: observability
spec:
  exporter:
    endpoint: http://otel-agent-collector.observability.svc.cluster.local:4318
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: parentbased_traceidratio
    argument: "0.25"
  java:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest
  python:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:latest
  nodejs:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:latest

Then annotate pods to opt in:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-java-app
spec:
  selector:
    matchLabels:
      app: my-java-app
  template:
    metadata:
      labels:
        app: my-java-app
      annotations:
        instrumentation.opentelemetry.io/inject-java: "true"   # Java auto-instrument
        # Other options:
        # instrumentation.opentelemetry.io/inject-python: "true"
        # instrumentation.opentelemetry.io/inject-nodejs: "true"
        # instrumentation.opentelemetry.io/inject-dotnet: "true"
    spec:
      containers:
        - name: app
          image: my-java-app:latest

The Operator injects an init container with the instrumentation agent. The application starts with zero code changes and produces traces automatically.
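Roughly, the mutated pod ends up looking like the sketch below (simplified; the exact volumes, paths, and env vars the webhook injects vary by language and Operator version, so treat every name here as illustrative):

```yaml
spec:
  initContainers:
    - name: opentelemetry-auto-instrumentation
      image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest
      command: [cp, /javaagent.jar, /otel-auto-instrumentation/javaagent.jar]
      volumeMounts:
        - name: opentelemetry-auto-instrumentation
          mountPath: /otel-auto-instrumentation
  containers:
    - name: app
      image: my-java-app:latest
      env:
        - name: JAVA_TOOL_OPTIONS   # the JVM picks up the agent at startup
          value: " -javaagent:/otel-auto-instrumentation/javaagent.jar"
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: http://otel-agent-collector.observability.svc.cluster.local:4318
      volumeMounts:
        - name: opentelemetry-auto-instrumentation
          mountPath: /otel-auto-instrumentation
  volumes:
    - name: opentelemetry-auto-instrumentation
      emptyDir: {}
```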


| Distribution | Components | Use Case |
|---|---|---|
| Core (otel/opentelemetry-collector) | ~20 components (otlp, batch, debug, etc.) | Minimal footprint, security-sensitive environments |
| Contrib (otel/opentelemetry-collector-contrib) | 200+ components (all community receivers, processors, exporters) | Development, when you need specific integrations |
| Custom (built with ocb) | Exactly what you choose | Production — include only what you use |

The OpenTelemetry Collector Builder (ocb) creates purpose-built distributions:

# builder-config.yaml
dist:
  name: my-collector
  description: "Production collector"
  output_path: ./dist
  otelcol_version: "0.98.0"
receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.98.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/filelogreceiver v0.98.0
processors:
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.98.0
  - gomod: go.opentelemetry.io/collector/processor/memorylimiterprocessor v0.98.0
exporters:
  - gomod: go.opentelemetry.io/collector/exporter/otlpexporter v0.98.0
  - gomod: go.opentelemetry.io/collector/exporter/debugexporter v0.98.0
# Build it
ocb --config builder-config.yaml

Why custom? Smaller binary (50MB vs 200MB+), smaller attack surface, faster startup, only the dependencies you actually audit.


When things go wrong (and they will), here is your debugging toolkit.

Add the debug exporter to any pipeline to see what data is flowing:

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/backend, debug]   # Add debug alongside real exporter

The Collector can report its own metrics:

service:
  telemetry:
    logs:
      level: debug     # debug | info | warn | error
      encoding: json   # For structured log parsing
    metrics:
      level: detailed         # none | basic | normal | detailed
      address: 0.0.0.0:8888   # Collector's own /metrics endpoint

Key internal metrics to watch:

  • otelcol_receiver_accepted_spans — Are spans arriving?
  • otelcol_processor_dropped_spans — Is the filter/sampling dropping too much?
  • otelcol_exporter_sent_spans — Are spans leaving?
  • otelcol_exporter_send_failed_spans — Is the backend rejecting data?
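These counters compose naturally into alerts. A sketch of a Prometheus alerting rule on export failures (the scraped metric name may carry a _total suffix depending on your setup; the threshold and labels are illustrative):

```yaml
groups:
  - name: otel-collector
    rules:
      - alert: CollectorExportFailures
        expr: rate(otelcol_exporter_send_failed_spans[5m]) > 0
        for: 5m   # sustained failures, not a blip
        labels:
          severity: warning
        annotations:
          summary: "OTel Collector is failing to export spans to the backend"
```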

With the zpages extension enabled at port 55679:

| Endpoint          | What It Shows                               |
|-------------------|---------------------------------------------|
| /debug/pipelinez  | Active pipelines and their components       |
| /debug/tracez     | Sample traces flowing through the Collector |
| /debug/rpcz       | gRPC call statistics                        |
| /debug/extensionz | Running extensions                          |

# Port-forward to access zpages, then open http://localhost:55679/debug/pipelinez
kubectl port-forward svc/otel-collector 55679:55679

Complete Multi-Pipeline Configuration Example

This is a production-grade config with traces, metrics, and logs flowing through separate pipelines, with spanmetrics bridging traces to metrics:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  prometheus:
    config:
      scrape_configs:
        - job_name: 'k8s-pods'
          scrape_interval: 15s
          kubernetes_sd_configs:
            - role: pod
  filelog:
    include: [/var/log/pods/*/*/*.log]
    operators:
      - type: json_parser
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu: {}
      memory: {}
      disk: {}

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1024
    spike_limit_mib: 256
  batch:
    send_batch_size: 8192
    timeout: 200ms
  filter/healthz:
    error_mode: ignore
    traces:
      span:
        - 'attributes["http.route"] == "/healthz"'
        - 'attributes["http.route"] == "/readyz"'
  transform/redact:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          - replace_pattern(attributes["http.url"], "token=([^&]*)", "token=REDACTED")
    log_statements:
      - context: log
        statements:
          - replace_pattern(body, "password=\\S+", "password=***")

exporters:
  otlp/tempo:
    endpoint: tempo.observability.svc.cluster.local:4317
    tls:
      insecure: true
  otlphttp/loki:
    endpoint: http://loki.observability.svc.cluster.local:3100/otlp   # Loki's native OTLP ingest path
  prometheus:
    endpoint: 0.0.0.0:8889
  debug:
    verbosity: basic

connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [5ms, 10ms, 25ms, 50ms, 100ms, 500ms, 1s, 5s]
    dimensions:
      - name: http.method
      - name: http.status_code

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  zpages:
    endpoint: 0.0.0.0:55679

service:
  extensions: [health_check, zpages]
  telemetry:
    logs:
      level: info
    metrics:
      address: 0.0.0.0:8888
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, filter/healthz, transform/redact, batch]
      exporters: [otlp/tempo, spanmetrics, debug]
    metrics:
      receivers: [otlp, prometheus, hostmetrics, spanmetrics]
      processors: [memory_limiter, batch]
      exporters: [prometheus, debug]
    logs:
      receivers: [otlp, filelog]
      processors: [memory_limiter, transform/redact, batch]
      exporters: [otlphttp/loki, debug]

| Mistake | What Happens | Fix |
|---|---|---|
| Declaring a component but not adding it to a pipeline | Component is silently ignored | Always check the service.pipelines section |
| Wrong processor order (batch before memory_limiter) | OOM kills under load | memory_limiter first, batch last |
| Tail sampling on agents (DaemonSet) | Incomplete traces, bad sampling decisions | Tail sampling only in gateway mode |
| Using Contrib image in production | 200MB+ image, unused attack surface | Build custom distribution with ocb |
| Forgetting error_mode: ignore on filter/transform | One bad record crashes the pipeline | Always set error_mode: ignore in production |
| OTLP exporter with tls.insecure: false but no certs | Connection refused, export failures | Set insecure: true for internal traffic or provide valid certs |
| Not enabling compression on exporters | 3-5x more bandwidth usage | Always set compression: gzip or zstd |
| Running k8s_cluster receiver on every agent | Duplicate metrics, API server overload | Run k8s_cluster on a single gateway or Deployment |

Test your understanding of Domain 3 content.

Q1: What are the default OTLP ports for gRPC and HTTP?

Answer

gRPC: 4317, HTTP: 4318. These are standard across all OTel components.

Q2: In what order should memory_limiter and batch appear in a pipeline’s processor list?

Answer

memory_limiter should be first, batch should be last. The memory limiter needs to reject data before it accumulates in the batch buffer. The batch processor should be the final step before export to maximize batch efficiency.

Q3: Why can’t you run tail sampling on a DaemonSet (agent mode) Collector?

Answer

Tail sampling needs to see the complete trace before making a sampling decision. In agent mode, spans from different services land on different nodes, so no single agent sees all spans of a distributed trace. Tail sampling must run in gateway mode where all spans are forwarded to a central pool, and the loadbalancing exporter ensures all spans with the same trace ID reach the same gateway instance.

Q4: What is the purpose of a connector like spanmetrics?

Answer

A connector acts as both an exporter in one pipeline and a receiver in another. The spanmetrics connector receives traces from the traces pipeline and emits RED metrics (Rate, Errors, Duration) into the metrics pipeline. This lets you generate metrics from traces automatically without double-instrumentation.

Q5: What is the difference between Core, Contrib, and Custom Collector distributions?

Answer
  • Core: Minimal set of ~20 components maintained by the OTel project. Small binary, limited functionality.
  • Contrib: Community-maintained, 200+ components. Large binary, everything included.
  • Custom: Built with ocb (OpenTelemetry Collector Builder) to include only the specific components you need. Best for production — minimal attack surface, optimized size.

Q6: How does the loadbalancing exporter route data?

Answer

It routes based on trace ID using consistent hashing. All spans belonging to the same trace are sent to the same backend (gateway) instance. This is essential for tail sampling to work correctly in a horizontally scaled gateway deployment. It uses a DNS resolver to discover backend instances via a headless Kubernetes Service.

Q7: What annotation would you add to a Java Deployment to enable auto-instrumentation via the OTel Operator?

Answer
instrumentation.opentelemetry.io/inject-java: "true"

This tells the OTel Operator’s webhook to inject an init container with the Java auto-instrumentation agent. The Instrumentation CRD must exist in the same namespace (or the annotation must reference a specific one).

Q8: You added a filter processor but telemetry is not being filtered. What is the most likely cause?

Answer

The filter processor is defined in the processors section but not included in the pipeline under service.pipelines. A component must appear in both its definition section and in a pipeline to be active. Check service.pipelines.<signal>.processors to confirm it is listed.

Q9: What zpages endpoint shows the status of active pipelines?

Answer

/debug/pipelinez — It shows all configured pipelines and their component status (receivers, processors, exporters). Access it by port-forwarding to port 55679 on the Collector.

Q10: When should you use OTLP/HTTP instead of OTLP/gRPC?

Answer

Use OTLP/HTTP when:

  • Sending telemetry from browsers (gRPC doesn’t work in browsers)
  • Crossing firewalls or proxies that don’t support HTTP/2
  • You need JSON encoding for debugging
  • Working with load balancers that only support HTTP/1.1

Use gRPC for internal collector-to-collector and service-to-collector traffic where performance matters.


Hands-On Exercise: Build a Multi-Signal Pipeline

Objective: Deploy an OTel Collector that receives all three signals, generates spanmetrics, and outputs to debug.

# Create a kind cluster (skip if you already have one)
kind create cluster --name otel-lab
# Create namespace
kubectl create namespace observability
kubectl apply -n observability -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
data:
  config.yaml: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      memory_limiter:
        check_interval: 1s
        limit_mib: 256
        spike_limit_mib: 64
      batch:
        send_batch_size: 1024
        timeout: 1s
    connectors:
      spanmetrics:
        dimensions:
          - name: http.method
    exporters:
      debug:
        verbosity: detailed
    extensions:
      health_check:
        endpoint: 0.0.0.0:13133
      zpages:
        endpoint: 0.0.0.0:55679
    service:
      extensions: [health_check, zpages]
      telemetry:
        metrics:
          address: 0.0.0.0:8888
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [debug, spanmetrics]
        metrics:
          receivers: [otlp, spanmetrics]
          processors: [memory_limiter, batch]
          exporters: [debug]
        logs:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [debug]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
spec:
  replicas: 1
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: collector
          image: otel/opentelemetry-collector-contrib:0.98.0
          args: ["--config=/etc/otel/config.yaml"]
          ports:
            - containerPort: 4317
            - containerPort: 4318
            - containerPort: 8888   # Collector's own metrics
            - containerPort: 13133
            - containerPort: 55679
          volumeMounts:
            - name: config
              mountPath: /etc/otel
          livenessProbe:
            httpGet:
              path: /
              port: 13133
          readinessProbe:
            httpGet:
              path: /
              port: 13133
      volumes:
        - name: config
          configMap:
            name: otel-collector-config
---
apiVersion: v1
kind: Service
metadata:
  name: otel-collector
spec:
  selector:
    app: otel-collector
  ports:
    - name: otlp-grpc
      port: 4317
    - name: otlp-http
      port: 4318
    - name: metrics
      port: 8888
    - name: health
      port: 13133
    - name: zpages
      port: 55679
EOF
# Wait for the collector to be ready
kubectl wait --for=condition=ready pod -l app=otel-collector -n observability --timeout=60s

# Port-forward to send data
kubectl port-forward -n observability svc/otel-collector 4318:4318 &

# Send a test trace via OTLP/HTTP
curl -X POST http://localhost:4318/v1/traces \
  -H "Content-Type: application/json" \
  -d '{
    "resourceSpans": [{
      "resource": {"attributes": [{"key": "service.name", "value": {"stringValue": "test-service"}}]},
      "scopeSpans": [{
        "spans": [{
          "traceId": "5b8aa5a2d2c872e8321cf37308d69df2",
          "spanId": "051581bf3cb55c13",
          "name": "GET /api/users",
          "kind": 2,
          "startTimeUnixNano": "1000000000",
          "endTimeUnixNano": "2000000000",
          "attributes": [
            {"key": "http.method", "value": {"stringValue": "GET"}},
            {"key": "http.status_code", "value": {"intValue": "200"}}
          ]
        }]
      }]
    }]
  }'
# Check collector logs — you should see the trace in debug output
kubectl logs -n observability -l app=otel-collector --tail=50
# Check zpages
kubectl port-forward -n observability svc/otel-collector 55679:55679 &
# Open http://localhost:55679/debug/pipelinez in your browser
# Check Collector's own metrics
kubectl port-forward -n observability svc/otel-collector 8888:8888 &
curl -s http://localhost:8888/metrics | grep otelcol_receiver_accepted

You should see:

  • The debug exporter printing the trace span with service name test-service
  • The spanmetrics connector generating metrics (visible in the debug output for the metrics pipeline)
  • otelcol_receiver_accepted_spans > 0 in the Collector’s own metrics
  • zpages showing all three pipelines active at /debug/pipelinez

  1. Config structure: Five sections (receivers, processors, exporters, connectors, extensions) + service to wire them.
  2. OTLP ports: gRPC = 4317, HTTP = 4318. Know these cold.
  3. Processor order: memory_limiter first, batch last.
  4. Agent vs Gateway: Agents (DaemonSet) collect, Gateways (Deployment) aggregate and sample.
  5. Tail sampling: Gateway-only. Requires loadbalancing exporter for horizontal scaling.
  6. Connectors: Bridge pipelines. spanmetrics generates RED metrics from traces.
  7. Auto-instrumentation: OTel Operator + Instrumentation CRD + pod annotation.
  8. Distributions: Core (minimal), Contrib (everything), Custom via ocb (production).
  9. Debug toolkit: debug exporter, zpages (/debug/pipelinez), internal telemetry metrics on :8888.
  10. OTTL: The transform processor’s language — set, delete, replace_pattern, merge_maps.

Next Module: OTCA Track Overview — Instrument applications using OTel SDKs across multiple languages.