Module 1.1: OTel API & SDK Deep Dive
Complexity:
[COMPLEX]- Core domain, 46% of OTCA exam weightTime to Complete: 90-120 minutes
Prerequisites: Basic familiarity with distributed systems, HTTP, and at least one programming language (Python or Go)
What You’ll Be Able to Do
Section titled “What You’ll Be Able to Do”After completing this module, you will be able to:
- Instrument an application with the OpenTelemetry SDK, configuring TracerProvider, MeterProvider, and LoggerProvider pipelines
- Implement custom spans with attributes, events, and status codes that capture business-meaningful telemetry
- Configure exporters (OTLP, Prometheus, console) and processors (batch, filter) to control how signals leave your application
- Evaluate the tradeoffs between automatic and manual instrumentation and select the right approach for each service
Why This Module Matters
Section titled “Why This Module Matters”A team at a mid-size fintech company had Jaeger deployed, Prometheus scraping, and Fluentd shipping logs. Three separate instrumentation libraries, three configuration surfaces, three vendor lock-in vectors. When they needed to switch from Jaeger to Grafana Tempo, it took six weeks to re-instrument 40 services. Then OpenTelemetry arrived, and the next migration — Tempo to an OTLP-native backend — took an afternoon. One SDK, one wire protocol, swap the exporter.
That is the entire point of OpenTelemetry: decouple instrumentation from backends. Domain 2 of the OTCA exam (46% of your score) tests whether you understand the machinery that makes this possible — the Provider/Processor/Exporter pipelines for traces, metrics, and logs.
Get this module right and you have nearly half the exam locked down.
Did You Know?
Section titled “Did You Know?”- OpenTelemetry is the second most active CNCF project after Kubernetes itself, with 1,000+ contributors across 11 language SIGs.
- The OTel Collector is optional. The SDK can export directly to backends. The Collector adds routing, batching, and multi-backend fan-out, but it is a deployment choice, not a requirement.
- Semantic conventions took years to stabilize. The rename from
http.methodtohttp.request.method(v1.23+) broke dashboards everywhere — which is why the exam tests whether you know the current stable names. - Zero-code instrumentation in Java modifies bytecode at load time. The Java agent uses
java.lang.instrumentand byte-buddy to rewrite class files before the JVM executes them — no source changes needed.
Part 1: The Three Pillars — Provider Pipelines
Section titled “Part 1: The Three Pillars — Provider Pipelines”Every signal in OTel follows the same pattern: Provider → Processor/Reader → Exporter. Understand one and you understand all three.
1.1 TracerProvider Pipeline
Section titled “1.1 TracerProvider Pipeline”┌──────────────────────────────────────────────────────────────────┐│ TracerProvider ││ ││ Resource: { service.name="cart-svc", deployment.env="prod" } ││ ││ ┌────────┐ ┌──────────────────┐ ┌──────────────────────┐ ││ │ Tracer │───▶│ SpanProcessor │───▶│ SpanExporter │ ││ │ │ │ │ │ │ ││ │ start │ │ SimpleProcessor │ │ OTLPSpanExporter │ ││ │ span() │ │ (sync, dev) │ │ ConsoleSpanExporter │ ││ │ │ │ │ │ JaegerExporter │ ││ │ end │ │ BatchProcessor │ │ ZipkinExporter │ ││ │ span() │ │ (async, prod) │ │ │ ││ └────────┘ └──────────────────┘ └──────────────────────┘ │└──────────────────────────────────────────────────────────────────┘SpanProcessor — controls when spans are handed to the exporter:
| Processor | Behavior | Use Case |
|---|---|---|
SimpleSpanProcessor | Exports each span immediately, synchronously | Local dev, debugging |
BatchSpanProcessor | Buffers spans, exports in batches on a timer | Production (always) |
BatchSpanProcessor tuning knobs (and their env vars):
| Parameter | Env Var | Default | What It Does |
|---|---|---|---|
maxQueueSize | OTEL_BSP_MAX_QUEUE_SIZE | 2048 | Buffer capacity before dropping |
scheduledDelayMillis | OTEL_BSP_SCHEDULE_DELAY | 5000 | Flush interval in ms |
maxExportBatchSize | OTEL_BSP_MAX_EXPORT_BATCH_SIZE | 512 | Spans per export call |
SpanExporter — controls where spans go:
| Exporter | Protocol | Notes |
|---|---|---|
| OTLP (gRPC) | grpc on port 4317 | Default, preferred |
| OTLP (HTTP) | http/protobuf on port 4318 | Firewall-friendly |
| Console | stdout | Debugging only |
| Jaeger | Thrift/gRPC | Legacy — use OTLP instead |
| Zipkin | HTTP/JSON | Legacy — use OTLP instead |
Exam tip: OTLP is the only exporter the spec mandates SDKs implement. Jaeger and Zipkin exporters are community add-ons.
1.2 MeterProvider Pipeline
Section titled “1.2 MeterProvider Pipeline”┌──────────────────────────────────────────────────────────────────┐│ MeterProvider ││ ││ Resource: { service.name="cart-svc" } ││ ││ ┌────────┐ ┌──────────────────┐ ┌──────────────────────┐ ││ │ Meter │───▶│ MetricReader │───▶│ MetricExporter │ ││ │ │ │ │ │ │ ││ │counter │ │ PeriodicReader │ │ OTLPMetricExporter │ ││ │histo │ │ (push, 60s) │ │ ConsoleExporter │ ││ │gauge │ │ │ │ │ ││ │ │ │ PrometheusReader │ │ (pull: /metrics) │ ││ └────────┘ └──────────────────┘ └──────────────────────┘ │└──────────────────────────────────────────────────────────────────┘MetricReader — controls how metric data is collected:
| Reader | Model | Use Case |
|---|---|---|
PeriodicExportingMetricReader | Push (exports on interval) | OTLP backends, Collector |
PrometheusMetricReader | Pull (exposes /metrics) | Prometheus scrape targets |
1.3 LoggerProvider Pipeline
Section titled “1.3 LoggerProvider Pipeline”┌──────────────────────────────────────────────────────────────────┐│ LoggerProvider ││ ││ ┌──────────────┐ ┌──────────────────┐ ┌────────────────┐ ││ │ Log Bridge │───▶│ LogProcessor │───▶│ LogExporter │ ││ │ │ │ │ │ │ ││ │ Python │ │ SimpleProcessor │ │ OTLP │ ││ │ stdlib │ │ BatchProcessor │ │ Console │ ││ │ Java Log4j │ │ │ │ │ ││ │ .NET ILogger │ │ │ │ │ ││ └──────────────┘ └──────────────────┘ └────────────────┘ │└──────────────────────────────────────────────────────────────────┘Key concept: OTel does NOT replace your logging library. A log bridge connects existing loggers (Python logging, Java Log4j/SLF4J) to the OTel pipeline. You keep writing logger.info("order placed") — the bridge enriches log records with trace context and routes them through the OTel exporter.
1.4 Signal Maturity — Know This for the Exam
Section titled “1.4 Signal Maturity — Know This for the Exam”| Signal | Spec Status | SDK Status | Notes |
|---|---|---|---|
| Traces | Stable | Stable in all major languages | Safe for production |
| Metrics | Stable | Stable in most languages | GA since 2023 |
| Logs | Stable (data model) | Mixed (bridge APIs vary) | Bridges are the integration point |
| Baggage | Stable | Stable | Propagation mechanism |
| Profiling | Experimental | Early development | Not on exam yet |
Part 2: Span Anatomy
Section titled “Part 2: Span Anatomy”A span is not just “a timer with a name.” The exam expects you to know every field.
2.1 Span Structure
Section titled “2.1 Span Structure”Span { trace_id: "abc123..." ← 16-byte, shared across all spans in trace span_id: "def456..." ← 8-byte, unique to this span parent_id: "789abc..." ← links to parent span (or empty for root) name: "GET /api/cart" ← operation name kind: SERVER ← see below start_time: 1711234567.123 ← nanosecond precision end_time: 1711234567.456 status: { code: OK } ← Unset, Ok, Error attributes: { "http.request.method": "GET", "http.response.status_code": 200 } events: [ { name: "cache.miss", timestamp: ... } ] links: [ { trace_id: "other_trace", span_id: "other_span" } ] resource: { "service.name": "cart-svc", "host.name": "node-1" }}2.2 Span Kinds
Section titled “2.2 Span Kinds”| Kind | Who Creates It | Example |
|---|---|---|
INTERNAL | Default, in-process operations | Business logic, helper functions |
SERVER | The service receiving a request | HTTP handler, gRPC server method |
CLIENT | The service making a request | HTTP client call, DB query |
PRODUCER | Enqueuing a message (async) | Kafka producer, SQS send |
CONSUMER | Processing a queued message | Kafka consumer, SQS handler |
Exam trap: A span for “calling the database” is
CLIENT, notINTERNAL. The span kind describes the role in the communication, not whether it is “internal to your code.”
2.3 Attributes vs Resources
Section titled “2.3 Attributes vs Resources”This distinction trips people up constantly.
| Attributes | Resources | |
|---|---|---|
| Scope | Per-span (or per-metric data point) | Per-Provider (applies to ALL telemetry) |
| Set when | During span/metric creation | At SDK initialization |
| Example | http.request.method=GET | service.name=cart-svc |
| Changes | Different per operation | Same for entire process lifetime |
2.4 Events and Exceptions
Section titled “2.4 Events and Exceptions”Events are timestamped annotations within a span. The most common use is recording exceptions:
from opentelemetry import trace
span = trace.get_current_span()try: process_order(order_id)except Exception as e: span.set_status(trace.StatusCode.ERROR, str(e)) span.record_exception(e) # adds an event with exception.type, exception.message, exception.stacktrace raiserecord_exception is syntactic sugar — it creates an event named exception with attributes from semantic conventions.
2.5 Span Links
Section titled “2.5 Span Links”Links connect spans across separate traces. Unlike parent-child (which is within one trace), links say “this span is related to that other span.”
Use cases:
- A batch job that processes items from multiple incoming requests
- Fan-in: a span that aggregates results from several independent traces
- Retries where each attempt creates a new trace
Part 3: Metric Instruments
Section titled “Part 3: Metric Instruments”3.1 Sync vs Async Instruments
Section titled “3.1 Sync vs Async Instruments”Synchronous instruments are called inline with your code — you call .add() or .record() at the point of measurement.
Asynchronous (observable) instruments register a callback. The SDK calls your function at collection time to read the current value.
| Instrument | Sync/Async | Monotonic? | Example |
|---|---|---|---|
| Counter | Sync | Yes (only increases) | requests_total, bytes_sent |
| UpDownCounter | Sync | No (increases and decreases) | active_connections, queue_depth |
| Histogram | Sync | N/A (records distributions) | request_duration_ms, response_size |
| Observable Counter | Async | Yes | CPU time (read from /proc) |
| Observable UpDownCounter | Async | No | Current memory usage |
| Observable Gauge | Async | No (point-in-time value) | Temperature, current CPU % |
When to use async: When the value already exists somewhere (OS counter, connection pool stats) and you just want to observe it at collection time. Do NOT poll in a loop — let the SDK invoke your callback.
3.2 Metric Temporality
Section titled “3.2 Metric Temporality”This is one of the most tested concepts in Domain 2.
| Temporality | What It Reports | Who Prefers It |
|---|---|---|
| Cumulative | Total since process start (e.g., counter = 1042) | Prometheus, OTLP default for counters |
| Delta | Change since last collection (e.g., counter += 17) | StatsD, some commercial backends |
Why it matters: If your exporter expects delta but the SDK sends cumulative, you get nonsense numbers. The SDK’s MetricReader determines temporality preference.
Cumulative: |---100---|---250---|---430---| (running total)Delta: |---100---|---150---|---180---| (per-interval change)3.3 Aggregation
Section titled “3.3 Aggregation”The SDK aggregates raw measurements before export:
| Instrument | Default Aggregation |
|---|---|
| Counter | Sum |
| Histogram | Explicit Bucket Histogram |
| Gauge | Last Value |
3.4 Exemplars — Linking Metrics to Traces
Section titled “3.4 Exemplars — Linking Metrics to Traces”Exemplars attach a trace_id and span_id to a metric data point. When you see a latency spike on a dashboard, exemplars let you click through to the exact trace that caused it.
histogram bucket [200ms-500ms]: count=47, exemplar={trace_id="abc123", value=423ms}Exemplars are configured on the SDK level and typically sampled (not every data point gets one).
Part 4: Context Propagation
Section titled “Part 4: Context Propagation”4.1 How Context Crosses Service Boundaries
Section titled “4.1 How Context Crosses Service Boundaries” Service A Service B┌──────────────┐ HTTP Request ┌──────────────┐│ │ │ ││ span-A │ headers: │ span-B ││ trace=abc │──────────────────▶│ trace=abc ││ │ traceparent: │ parent=A ││ │ 00-abc...-A.. │ │└──────────────┘ └──────────────┘The SDK uses propagators to inject context into carriers (HTTP headers, message attributes) and extract it on the receiving side.
4.2 W3C TraceContext (Default)
Section titled “4.2 W3C TraceContext (Default)”The traceparent header:
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01 ^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^ ^^ version trace-id span-id flags (sampled)The tracestate header carries vendor-specific data (e.g., sampling decisions for a particular backend).
4.3 Baggage
Section titled “4.3 Baggage”Baggage propagates arbitrary key-value pairs across service boundaries. Unlike trace context (which carries IDs), baggage carries business data.
from opentelemetry import baggage, context
ctx = baggage.set_baggage("user.tier", "premium")# attach ctx — now all downstream services can read user.tierWarning: Baggage is sent in headers to every downstream service. Do NOT put sensitive data (PII, tokens) in baggage. It is visible on the wire.
Part 5: SDK Configuration
Section titled “Part 5: SDK Configuration”5.1 Environment Variables (Exam Favorites)
Section titled “5.1 Environment Variables (Exam Favorites)”| Variable | Purpose | Example |
|---|---|---|
OTEL_SERVICE_NAME | Sets service.name resource attribute | cart-service |
OTEL_EXPORTER_OTLP_ENDPOINT | Base URL for all OTLP signals | http://collector:4317 |
OTEL_EXPORTER_OTLP_PROTOCOL | Wire protocol | grpc, http/protobuf |
OTEL_EXPORTER_OTLP_HEADERS | Auth headers | api-key=secret123 |
OTEL_TRACES_EXPORTER | Trace exporter type | otlp, console, none |
OTEL_METRICS_EXPORTER | Metric exporter type | otlp, prometheus, none |
OTEL_LOGS_EXPORTER | Log exporter type | otlp, console, none |
OTEL_TRACES_SAMPLER | Sampling strategy | always_on, traceidratio |
OTEL_TRACES_SAMPLER_ARG | Sampler parameter | 0.1 (10% sampling) |
OTEL_RESOURCE_ATTRIBUTES | Additional resource attrs | deployment.env=prod,service.version=2.1 |
Exam tip: Env vars take lowest precedence. Programmatic config overrides env vars. Signal-specific env vars (e.g.,
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT) override the general one (OTEL_EXPORTER_OTLP_ENDPOINT).
5.2 Semantic Conventions (Stable Naming Standards)
Section titled “5.2 Semantic Conventions (Stable Naming Standards)”The exam tests whether you know the correct attribute names. These are the most common:
| Convention | Attribute | Example Value |
|---|---|---|
| HTTP | http.request.method | GET |
| HTTP | http.response.status_code | 200 |
| HTTP | url.full | https://api.example.com/cart |
| Service | service.name | cart-service |
| Service | service.version | 1.4.2 |
| Database | db.system | postgresql |
| Database | db.statement | SELECT * FROM orders |
| Messaging | messaging.system | kafka |
| Messaging | messaging.operation | publish |
Watch out: The old convention
http.methodwas renamed tohttp.request.method. The exam uses the current stable names from semconv v1.23+.
Part 6: Zero-Code Instrumentation
Section titled “Part 6: Zero-Code Instrumentation”Zero-code (auto) instrumentation wraps library calls without changing your source.
6.1 How It Works
Section titled “6.1 How It Works”| Language | Mechanism | Command |
|---|---|---|
| Java | Bytecode manipulation via java.lang.instrument agent | java -javaagent:opentelemetry-javaagent.jar -jar myapp.jar |
| Python | Monkey-patching libraries at import time | opentelemetry-instrument python myapp.py |
| .NET | CLR profiler API and startup hooks | Via env vars: CORECLR_ENABLE_PROFILING=1 |
| Node.js | Require hooks / module patching | node --require @opentelemetry/auto-instrumentations-node myapp.js |
6.2 What Gets Instrumented Automatically
Section titled “6.2 What Gets Instrumented Automatically”Auto-instrumentation covers libraries, not business logic:
- HTTP servers and clients (Flask, requests, net/http)
- Database drivers (psycopg2, JDBC, database/sql)
- Messaging (Kafka clients, RabbitMQ)
- gRPC
You still need manual instrumentation for:
- Custom business spans (“process-order”, “validate-cart”)
- Custom metrics (order value, queue depth)
- Adding attributes specific to your domain
Part 7: Code Examples
Section titled “Part 7: Code Examples”7.1 Python — Manual Tracing with TracerProvider
Section titled “7.1 Python — Manual Tracing with TracerProvider”from opentelemetry import tracefrom opentelemetry.sdk.trace import TracerProviderfrom opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporterfrom opentelemetry.sdk.resources import Resource
# 1. Create a Resource (identifies this service)resource = Resource.create({"service.name": "order-service", "service.version": "1.0.0"})
# 2. Build the pipeline: Provider → Processor → Exporterprovider = TracerProvider(resource=resource)processor = BatchSpanProcessor(ConsoleSpanExporter())provider.add_span_processor(processor)
# 3. Register globallytrace.set_tracer_provider(provider)
# 4. Get a tracer (scoped by instrumentation library name + version)tracer = trace.get_tracer("order.instrumentation", "0.1.0")
# 5. Create spanswith tracer.start_as_current_span("process-order", kind=trace.SpanKind.INTERNAL) as span: span.set_attribute("order.id", "ORD-12345") span.set_attribute("order.item_count", 3)
with tracer.start_as_current_span("validate-payment", kind=trace.SpanKind.CLIENT) as child: child.set_attribute("payment.provider", "stripe") # ... call payment API child.set_status(trace.StatusCode.OK)
span.add_event("order.confirmed", {"order.total": 59.99})
# 6. Flush before exitprovider.shutdown()7.2 Python — Custom Metric Instruments
Section titled “7.2 Python — Custom Metric Instruments”from opentelemetry import metricsfrom opentelemetry.sdk.metrics import MeterProviderfrom opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader, ConsoleMetricExporterfrom opentelemetry.sdk.resources import Resource
resource = Resource.create({"service.name": "order-service"})reader = PeriodicExportingMetricReader(ConsoleMetricExporter(), export_interval_millis=5000)provider = MeterProvider(resource=resource, metric_readers=[reader])metrics.set_meter_provider(provider)
meter = metrics.get_meter("order.instrumentation", "0.1.0")
# Sync Counter — total orders placed (only goes up)order_counter = meter.create_counter("orders.placed", unit="1", description="Total orders placed")
# Sync Histogram — order processing durationduration_hist = meter.create_histogram("orders.duration", unit="ms", description="Order processing time")
# Sync UpDownCounter — items currently in cartscart_items = meter.create_up_down_counter("cart.items", unit="1", description="Active cart items")
# Async Gauge — current queue depth (observed, not recorded)def read_queue_depth(options): depth = get_queue_depth() # your function that reads the real value options.observe(depth, {"queue.name": "orders"})
meter.create_observable_gauge("queue.depth", callbacks=[read_queue_depth])
# Usageorder_counter.add(1, {"order.type": "standard"})duration_hist.record(142.5, {"order.type": "standard"})cart_items.add(3, {"user.id": "u-123"}) # customer adds 3 itemscart_items.add(-1, {"user.id": "u-123"}) # customer removes 1 item7.3 Go — Manual Tracing with Context Propagation
Section titled “7.3 Go — Manual Tracing with Context Propagation”package main
import ( "context" "net/http"
"go.opentelemetry.io/otel" "go.opentelemetry.io/otel/attribute" "go.opentelemetry.io/otel/propagation" sdktrace "go.opentelemetry.io/otel/sdk/trace" "go.opentelemetry.io/otel/sdk/resource" semconv "go.opentelemetry.io/otel/semconv/v1.24.0" "go.opentelemetry.io/otel/trace" "go.opentelemetry.io/otel/exporters/stdout/stdouttrace")
func initTracer() *sdktrace.TracerProvider { exporter, _ := stdouttrace.New(stdouttrace.WithPrettyPrint()) res, _ := resource.New(context.Background(), resource.WithAttributes(semconv.ServiceName("order-service")), ) tp := sdktrace.NewTracerProvider( sdktrace.WithBatcher(exporter), sdktrace.WithResource(res), ) otel.SetTracerProvider(tp) otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator( propagation.TraceContext{}, propagation.Baggage{}, )) return tp}
func handleOrder(w http.ResponseWriter, r *http.Request) { // Extract incoming context from headers (continues the trace from upstream) ctx := otel.GetTextMapPropagator().Extract(r.Context(), propagation.HeaderCarrier(r.Header)) tracer := otel.Tracer("order.instrumentation")
ctx, span := tracer.Start(ctx, "handle-order", trace.WithSpanKind(trace.SpanKindServer)) defer span.End() span.SetAttributes(attribute.String("order.id", "ORD-12345"))
callPaymentService(ctx)}
func callPaymentService(ctx context.Context) { tracer := otel.Tracer("order.instrumentation") ctx, span := tracer.Start(ctx, "call-payment", trace.WithSpanKind(trace.SpanKindClient)) defer span.End()
// Inject context into outgoing request — payment-svc continues the trace req, _ := http.NewRequestWithContext(ctx, "POST", "http://payment-svc/charge", nil) otel.GetTextMapPropagator().Inject(ctx, propagation.HeaderCarrier(req.Header)) http.DefaultClient.Do(req)}7.4 Go — Custom Metrics
Section titled “7.4 Go — Custom Metrics”func recordMetrics() { meter := otel.Meter("order.instrumentation")
orderCounter, _ := meter.Int64Counter("orders.placed", metric.WithUnit("1"), metric.WithDescription("Total orders placed"))
durationHist, _ := meter.Float64Histogram("orders.duration", metric.WithUnit("ms"), metric.WithDescription("Order processing time"))
// Async Gauge — SDK calls this callback at collection time meter.Float64ObservableGauge("queue.depth", metric.WithFloat64Callback(func(_ context.Context, o metric.Float64Observer) error { o.Observe(42.0, metric.WithAttributes(attribute.String("queue.name", "orders"))) return nil }), )
ctx := context.Background() orderCounter.Add(ctx, 1, metric.WithAttributes(attribute.String("order.type", "standard"))) durationHist.Record(ctx, 142.5, metric.WithAttributes(attribute.String("order.type", "standard")))}Common Mistakes
Section titled “Common Mistakes”| Mistake | Why It’s Wrong | Fix |
|---|---|---|
Using SimpleSpanProcessor in production | Exports synchronously on the hot path, adds latency | Always use BatchSpanProcessor in prod |
Setting service.name as a span attribute | It’s a resource attribute, not per-span | Set it on the Resource passed to the Provider |
Using INTERNAL kind for outgoing HTTP calls | Misrepresents the span’s role in the trace | Use CLIENT for outgoing requests, SERVER for incoming |
| Putting PII in Baggage | Baggage is propagated in cleartext HTTP headers | Only put non-sensitive routing/context data in Baggage |
| Confusing cumulative and delta temporality | Backends interpret the numbers differently | Match temporality to your backend’s expectation |
Calling record_exception without setting Error status | Exception is logged as event but span shows OK | Always call set_status(ERROR) alongside record_exception |
Using deprecated http.method attribute name | Old semconv, will fail exam questions | Use http.request.method (semconv v1.23+) |
| Creating a new TracerProvider per request | Massive overhead, breaks batching | Create ONE provider at startup, share it globally |
Test yourself — these mirror OTCA exam style questions.
Q1: Which SpanProcessor should you use in production and why?
BatchSpanProcessor. It buffers spans and exports them asynchronously in batches, avoiding the latency overhead of synchronous per-span exports that SimpleSpanProcessor introduces.
Q2: A service receives an HTTP request and then calls a database. What span kinds are correct for (a) the HTTP handler span and (b) the database call span?
(a) SERVER — the service is receiving an inbound request.
(b) CLIENT — the service is making an outbound call to the database.
Q3: What is the difference between span attributes and resources?
Resources are set once at SDK initialization and apply to ALL telemetry from that provider (e.g., service.name, host.name). Attributes are per-span (or per-metric data point) and describe the specific operation (e.g., http.request.method, db.statement).
Q4: You set OTEL_EXPORTER_OTLP_ENDPOINT=http://collector:4317 and OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://traces-collector:4317. Which endpoint do traces go to?
http://traces-collector:4317. Signal-specific env vars override the general OTEL_EXPORTER_OTLP_ENDPOINT.
Q5: Name three differences between a Counter and a Histogram instrument.
- Counter records monotonically increasing sums (e.g., total requests); Histogram records distributions of values (e.g., latency).
- Counter uses
.add(); Histogram uses.record(). - Counter aggregates to a Sum; Histogram aggregates to bucket counts, sum, min, max.
Q6: What does a log bridge do, and why doesn't OTel replace existing logging libraries?
A log bridge connects existing language-native loggers (Python logging, Java Log4j) to the OTel LoggerProvider pipeline. OTel does not replace loggers because applications already have extensive logging — the bridge enriches existing logs with trace context and routes them to OTel exporters without requiring code changes.
Q7: Explain cumulative vs delta temporality for a Counter that receives: +10, +20, +5 over three collection intervals.
Cumulative reports the running total: 10, 30, 35. Delta reports the per-interval change: 10, 20, 5.
Q8: What header does W3C TraceContext use for propagation, and what are its four fields?
The traceparent header. Its four fields are: version (2 hex chars, currently 00), trace-id (32 hex chars), parent-id/span-id (16 hex chars), and trace-flags (2 hex chars, 01 = sampled).
Q9: A Java application needs OpenTelemetry instrumentation but the team cannot modify source code. What approach should they use?
Use the OpenTelemetry Java agent (-javaagent:opentelemetry-javaagent.jar). It uses bytecode manipulation at class load time to automatically instrument supported libraries (HTTP, gRPC, JDBC, etc.) without any source code changes.
Q10: What are exemplars and when would you use them?
Exemplars are sample trace_id/span_id references attached to metric data points. They link metrics to traces — when you see a latency spike in a histogram bucket on your dashboard, exemplars let you jump directly to a representative trace that contributed to that bucket.
Hands-On Exercise: Build an Instrumented Python Service
Section titled “Hands-On Exercise: Build an Instrumented Python Service”Goal: Wire up a complete OTel pipeline with traces and metrics, then verify the output.
pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp-proto-grpc- Create a Python script that initializes a
TracerProviderwith aBatchSpanProcessorandConsoleSpanExporter. - Create a
MeterProviderwith aPeriodicExportingMetricReader(5-second interval) andConsoleMetricExporter. - Set the resource to
service.name=exercise-svc. - Create a parent span called
handle-requestwith kindSERVER. - Inside it, create a child span
call-databasewith kindCLIENTand attributedb.system=postgresql. - Record a counter increment for
requests.totalwith attributehttp.request.method=GET. - Record a histogram observation for
request.durationwith value 142.5 ms. - In the child span, simulate an exception: call
record_exceptionand set status toERROR. - Flush and verify you see both spans and metrics in console output.
Success Criteria
Section titled “Success Criteria”- Console output shows two spans with correct parent-child relationship (same
trace_id, child’sparent_idmatches parent’sspan_id) - Parent span kind is
SPAN_KIND_SERVER, child isSPAN_KIND_CLIENT - Child span has status
ERRORand anexceptionevent - Metrics output shows
requests.totalcounter andrequest.durationhistogram - Resource on all telemetry shows
service.name=exercise-svc
Key Takeaways
Section titled “Key Takeaways”- All three signals follow the same pattern: Provider → Processor/Reader → Exporter. Learn one, understand all.
- BatchSpanProcessor for production, always. SimpleSpanProcessor is for debugging only.
- Resources describe the service, attributes describe the operation. Never put
service.nameon a span attribute. - Span kind describes the communication role, not whether code is “internal.” Outgoing calls are CLIENT.
- OTLP is the standard export protocol. Jaeger and Zipkin exporters exist but OTLP is preferred and spec-mandated.
- Env vars configure everything but programmatic config wins. Signal-specific vars override general ones.
- Auto-instrumentation covers libraries, not business logic. You still need manual spans for domain-specific operations.
- Cumulative vs delta temporality must match your backend. Getting this wrong produces garbage data.
Next Module: Module 2: OTel Collector Architecture — how to receive, process, and export telemetry at scale.