Skip to content

Module 1.3: Istio Security & Troubleshooting

Complexity: [COMPLEX] Time to Complete: 70-90 minutes Kubernetes version: 1.35+


Before starting this module, you should be comfortable reading Istio custom resources, following traffic through an injected sidecar, and translating a failed HTTP request into a hypothesis about identity, routing, or policy. The module assumes you have already worked through the earlier ICA modules and can run kubectl and istioctl against a disposable Kubernetes 1.35+ cluster without needing command-by-command orientation.


After completing this module, you will be able to connect symptoms to the right Istio security layer, apply policies in a controlled sequence, and prove the result from proxy state rather than trusting that YAML was accepted by the Kubernetes API server. Each outcome is intentionally testable in the quiz or hands-on exercise because the ICA exam expects operational reasoning, not memorized manifest fragments.

  1. Diagnose mTLS handshake failures, rejected requests, and complex policy conflicts using istioctl analyze and Envoy access logs.
  2. Implement PeerAuthentication policies to enforce mutual TLS systematically across distributed namespaces and workloads reliably.
  3. Design fine-grained AuthorizationPolicy rules that securely integrate JWT validation and role-based access controls for deep defense.
  4. Evaluate complex proxy states using istioctl proxy-status and proxy-config to identify critical synchronization anomalies.
  5. Compare STRICT and PERMISSIVE mTLS modes to formulate completely safe migration strategies for sensitive legacy services.

Hypothetical scenario: your team enables STRICT mTLS for a namespace after several weeks of successful traffic-management work, and the first visible symptom is not a helpful policy error but a flood of connection reset by peer, 503 no healthy upstream, and 403 Forbidden responses. The Kubernetes Services still exist, the Deployments still report ready replicas, and the application logs may show only timeout handling. In a mesh, the failing component is often the policy or proxy layer between two healthy Pods, which means ordinary application debugging can waste time unless you know where Istio makes each decision.

Istio security is powerful because it moves authentication, encryption, and authorization out of each application and into a consistent data-plane layer. That same separation is also why failures feel indirect: the app did not reject the request, the local Envoy proxy did after combining PeerAuthentication, RequestAuthentication, AuthorizationPolicy, DestinationRule, Gateway, and xDS state. The operational skill is learning to ask which layer made the decision, then proving that answer with istioctl analyze, istioctl proxy-status, istioctl proxy-config, and Envoy access logs.

Security and troubleshooting are tightly coupled on the ICA exam because secure configuration is only useful when you can keep it understandable under pressure. A correct AuthorizationPolicy can still become an outage if it silently blocks kubelet probes, an mTLS migration can still fail if one workload lacks a sidecar, and a Gateway can still terminate TLS incorrectly if its credential reference is in the wrong namespace. This module teaches the mental model first, then uses the original hands-on assets to practice controlled changes, deliberate breakage, and evidence-based repair.

Building security analogy

Istio security works like a modern high-security office building. PeerAuthentication is the lock on the service door because it decides whether the destination workload accepts plaintext, mTLS, or both. RequestAuthentication is the badge reader because it validates a JWT if a client presents one, but it does not decide which room the badge holder may enter. AuthorizationPolicy is the access-control rulebook because it combines peer identity, request identity, HTTP attributes, and custom authorization results into the final allow or deny decision.

Exercise scenario: a platform engineer wants to enable mTLS across the entire production mesh and starts with a mesh-wide PeerAuthentication resource in istio-system. The manifest is syntactically valid, but one critical dependency still communicates from outside the mesh and therefore cannot present an Istio-issued workload certificate. The lesson is not that STRICT mode is unsafe; the lesson is that STRICT mode is a server-side promise that every required caller can complete the mutual TLS handshake.

apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
name: default
namespace: istio-system
spec:
mtls:
mode: STRICT

The safer migration starts by accepting both encrypted and plaintext traffic while you inventory which workloads actually have sidecars and which ports still represent legacy paths. PERMISSIVE mode is useful because it gives you time to observe traffic without breaking callers, but it is not the destination for sensitive service-to-service paths. As soon as the inventory is clear, you move enforcement closer to the workload or namespace where you can validate both clients and dependencies.

# Step 1: Start with PERMISSIVE (accepts both mTLS and plaintext)
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
name: default
namespace: istio-system
spec:
mtls:
mode: PERMISSIVE
# Step 2: Identify services without sidecars
# istioctl proxy-status (shows which pods have proxies)
# Step 3: Exclude specific ports or services
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
name: default
namespace: istio-system
spec:
mtls:
mode: STRICT
portLevelMtls:
8080:
mode: DISABLE # Legacy service port
# Step 4: Or apply STRICT per-namespace, not mesh-wide
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
name: default
namespace: payments # Only this namespace
spec:
mtls:
mode: STRICT

The hard lesson is to treat mesh security as a rollout, not a toggle. Start with PERMISSIVE where unknown callers exist, use istioctl proxy-status to verify that expected workloads have connected Envoy proxies, apply STRICT at a namespace or workload boundary, and use port-level exceptions only for explicitly documented legacy interfaces. Pause and predict: if you enable STRICT on the destination but the source has no sidecar, which side will reject the connection, and what evidence would you expect in proxy status or logs?


At its core, mTLS provides mutual authentication between two workloads, not merely encryption between two IP addresses. Each side proves its identity with a certificate issued for its Kubernetes service account, and the Envoy sidecars complete the handshake before the application container receives ordinary plaintext HTTP or gRPC. This split is important operationally because the application might be healthy while the proxy refuses the connection, so a failed request can be a security decision rather than an application crash.

Without mTLS:
Pod A ──── plaintext HTTP ────► Pod B
(anyone can intercept)
With mTLS:
Pod A Pod B
┌──────────────┐ ┌──────────────┐
│ App │ │ App │
│ ↓ │ │ ↑ │
│ Envoy Proxy │◄── encrypted TLS ────►│ Envoy Proxy │
│ (has cert A) │ (mutual verify) │ (has cert B) │
└──────────────┘ └──────────────┘
Both sides verify each other's identity via SPIFFE certificates
issued by istiod's built-in CA.

The same flow appears below as a Mermaid diagram because it helps separate application traffic from sidecar-to-sidecar transport. Notice that the application containers do not need TLS libraries, private keys, or per-service certificate reload logic; Istio centralizes that responsibility in the data plane and distributes identity through the control plane.

flowchart LR
subgraph Pod A
AppA[App]
ProxyA[Envoy Proxy<br/>has cert A]
AppA -->|plaintext HTTP| ProxyA
end
subgraph Pod B
ProxyB[Envoy Proxy<br/>has cert B]
AppB[App]
ProxyB -->|plaintext HTTP| AppB
end
ProxyA <-->|encrypted TLS<br/>mutual verify| ProxyB

Certificate identity formulation: Each individual workload automatically receives a unique SPIFFE (Secure Production Identity Framework for Everyone) identity. This standard allows systems to securely identify each other:

spiffe://cluster.local/ns/default/sa/reviews
└─ trust domain └─ namespace └─ service account

Stop and think: why would a legacy application without a sidecar fail to communicate with a mesh workload when STRICT mode is enabled? The destination proxy is not checking whether the source Pod is important to the business; it is checking whether the incoming connection can prove a trusted workload identity, so a caller without Envoy cannot present the expected Istio certificate.

PeerAuthentication resources dictate the inbound mTLS behavior for workloads residing in the mesh, which makes them server-side enforcement policies rather than client-side routing hints. A common mistake is to assume that enabling a policy makes every client automatically speak mTLS, but the actual enforcement point is the destination Envoy listener. The destination decides whether it will accept plaintext, require mutual TLS, or inherit a parent policy, while client behavior is either auto-detected by Istio or influenced by DestinationRule TLS settings.

The mesh-wide example is powerful because a policy named default in istio-system establishes the broadest baseline. Use this only after you have measured mesh participation because it changes the acceptance rule for every workload that does not have a more specific namespace or workload policy.

apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
name: default
namespace: istio-system # Mesh-wide when in istio-system
spec:
mtls:
mode: STRICT # Require mTLS for all services

The namespace-level example is the safer next step for most migrations because it limits the blast radius to one team or one application boundary. It is also easier to explain during an incident because the policy’s namespace matches the destination workloads affected by the rule.

apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
name: default
namespace: payments # Only affects this namespace
spec:
mtls:
mode: STRICT

The workload-level example narrows enforcement to Pods selected by label, which is useful when one service in a namespace can be hardened before its neighbors. Label selection is also where configuration drift becomes dangerous, so verify labels with kubectl get pod --show-labels before assuming the policy targets what you intended.

apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
name: reviews-mtls
namespace: default
spec:
selector:
matchLabels:
app: reviews # Only affects pods with this label
mtls:
mode: STRICT

The port-level example is a surgical override for the rare case where a workload has both mesh-native and legacy-facing ports. Treat it as temporary technical debt: it should name the exact port that cannot yet use mTLS, and it should be tracked so the exception does not become invisible platform behavior.

apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
name: reviews-mtls
namespace: default
spec:
selector:
matchLabels:
app: reviews
mtls:
mode: STRICT
portLevelMtls:
8080:
mode: DISABLE # Disable mTLS on port 8080 only

The modes table summarizes the behavior you are choosing when you write the policy. The important exam distinction is that STRICT is a rejection rule for plaintext inbound traffic, while PERMISSIVE is a migration rule that accepts both mTLS and plaintext until you can prove every required path is mesh-ready.

ModeBehaviorUse Case
STRICTOnly accepts mTLS trafficProduction (full encryption)
PERMISSIVEAccepts both mTLS and plaintextMigration period
DISABLENo mTLSLegacy services, debugging
UNSETInherits from parentDefault behavior

When several PeerAuthentication resources could apply, Istio resolves them by specificity rather than by creation time. This priority model protects a workload-specific exception from being overwritten by a namespace or mesh baseline, but it also means you need to search for selectors when a workload behaves differently from the rest of its namespace.

Workload-level > Namespace-level > Mesh-level
(selector) (namespace) (istio-system)

While PeerAuthentication controls the server side acceptance rules, DestinationRule can control the client side outbound transmission behavior. Modern Istio usually auto-detects when an in-mesh destination should use ISTIO_MUTUAL, but explicit DestinationRule TLS settings still matter when you need to override defaults, originate TLS to an external service, or diagnose why a client proxy is not forming the expected upstream cluster.

apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
name: reviews
spec:
host: reviews
trafficPolicy:
tls:
mode: ISTIO_MUTUAL # Use Istio's mTLS certs

DestinationRule TLS modes describe the client-side origination choice, not the destination’s willingness to accept the connection. This is why mTLS troubleshooting often requires checking both sides: a destination may require STRICT mTLS while the source proxy has a stale or incorrect cluster configuration.

ModeDescription
DISABLENo TLS
SIMPLEOriginate TLS (client verifies server)
MUTUALOriginate mTLS (both verify each other)
ISTIO_MUTUALUse Istio’s built-in mTLS certificates

Exam tip: in the vast majority of in-mesh cases, you do not need to set the DestinationRule TLS mode explicitly because Istio can use auto mTLS for eligible destinations. You should reach for explicit TLS settings when you are intentionally overriding the default, integrating with an external service, or proving a client-side cluster configuration during troubleshooting. Before running this, what output do you expect from istioctl proxy-config clusters if the source proxy has already received an ISTIO_MUTUAL cluster for the destination?


Part 2: Request Authentication (JWT) Implementation

Section titled “Part 2: Request Authentication (JWT) Implementation”

RequestAuthentication resources validate JSON Web Tokens attached to incoming requests, but they deliberately stop short of making authorization decisions. That design keeps cryptographic validation separate from access control: one object answers “is this token valid for the configured issuer,” while another object answers “is this request allowed to reach this workload.” The distinction is subtle enough to cause real operational surprises because a request with an invalid token is rejected, while a request with no token can still pass unless an AuthorizationPolicy requires an authenticated request principal.

When implementing a RequestAuthentication policy, you provide the issuer and the public key endpoint so the proxy can cryptographically verify the signature of incoming JWTs. Envoy uses that public key material to validate token structure, issuer, signature, and expiry before forwarding the request upstream. If you configure multiple rules, the proxy can accept tokens from multiple issuers, which is useful during identity-provider migrations but dangerous if old issuers remain trusted longer than intended.

apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
name: jwt-auth
namespace: default
spec:
selector:
matchLabels:
app: productpage
jwtRules:
- issuer: "https://accounts.google.com"
jwksUri: "https://www.googleapis.com/oauth2/v3/certs"
- issuer: "https://my-auth.example.com"
jwksUri: "https://my-auth.example.com/.well-known/jwks.json"
forwardOriginalToken: true # Forward JWT to upstream
outputPayloadToHeader: "x-jwt-payload" # Extract claims to header

The RequestAuthentication step has three practical outcomes that are worth memorizing because they explain most confusing JWT failures:

  1. If a request arrives bearing a JWT, it meticulously validates it (checking issuer, structural signature, and active expiry status).
  2. If the attached JWT is fundamentally invalid, it immediately rejects the request with an HTTP 401 Unauthorized response.
  3. If the request arrives with NO JWT at all, it allows the request to pass through completely unhindered (this deeply surprises many engineers!).

To actually require the presence of a valid JWT, you must deploy a corresponding AuthorizationPolicy. That extra policy usually matches requestPrincipals when you want to allow only authenticated requests, or notRequestPrincipals in a DENY rule when you want to reject missing identity before considering more specific ALLOW rules. Keep the two resources separate in your mental model: RequestAuthentication creates verified request identity, and AuthorizationPolicy consumes that identity.

You can extract internal JWT claims and project them into HTTP headers for downstream authorization or routing logic. This is useful when an application needs a normalized identity attribute but should not parse the whole token itself, and it is also useful when a routing policy needs to branch on a claim such as group, tenant, or subject. The tradeoff is exposure: once claims are copied into headers, you must ensure that only the proxy can set or trust those headers.

apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
name: jwt-auth
namespace: default
spec:
selector:
matchLabels:
app: frontend
jwtRules:
- issuer: "https://auth.example.com"
jwksUri: "https://auth.example.com/.well-known/jwks.json"
outputClaimToHeaders:
- header: x-jwt-sub
claim: sub
- header: x-jwt-groups
claim: groups

This extraction process bridges the gap between raw cryptographic validation and business-level application logic. It should not become a substitute for application authorization when domain rules are complex, but it is effective for coarse-grained enforcement at the mesh edge. Which approach would you choose here and why: require a token at the ingress gateway only, require it again on the backend workload, or combine both with different policies for north-south and east-west traffic?


AuthorizationPolicy is Istio’s final access-control mechanism, and it becomes the place where peer identity, request identity, HTTP method, path, namespace, IP range, and external authorization results are combined into an allow or deny decision. The rule language is expressive enough to model most service-to-service boundaries, but the default behavior changes depending on which policy actions exist. That is why policy design should start with the evaluation order before it starts with YAML snippets.

Understanding the exact evaluation sequence is crucial to preventing accidental lockouts or security breaches. Istio checks CUSTOM policies first, then DENY policies, and finally ALLOW policies, with a default allow only when no applicable AuthorizationPolicy exists for the workload. Once any ALLOW policy applies to a workload, traffic that does not match an ALLOW rule is denied, which is an intentional zero-trust behavior but a common source of surprise.

Request arrives
┌─ CUSTOM policies ─┐ (if any, checked first via external authz)
│ Match? → delegate │
└────────────────────┘
┌─── DENY policies ──┐ (checked second)
│ Match? → REJECT │
└─────────────────────┘
┌── ALLOW policies ──┐ (checked third)
│ Match? → ALLOW │
│ No match? → DENY │ ← If ANY allow policy exists, default is deny
└─────────────────────┘
No policies? → ALLOW (default)

The logical flowchart below captures the same ordering in a form you can use during incident triage. If you see a 403, do not jump straight to the ALLOW rule you just edited; first ask whether a DENY or CUSTOM policy matched earlier and short-circuited the request.

flowchart TD
Req[Request arrives] --> CPol[CUSTOM policies]
CPol -- Match --> Del[Delegate to external authz]
CPol -- No Match --> DPol[DENY policies]
DPol -- Match --> Rej[REJECT Request]
DPol -- No Match --> APol[ALLOW policies]
APol -- Match --> Allow1[ALLOW Request]
APol -- No Match --> Deny1[DENY Request]
APol -- No policies exist? --> Allow2[ALLOW Request default]

Critical security warning: if there are no AuthorizationPolicies defined for a particular workload, all traffic is permitted by default. However, the exact moment you instantiate any applicable ALLOW policy, all traffic that does not explicitly match an ALLOW rule is automatically denied. Pause and predict: if you apply an ALLOW policy to a workload that permits traffic from the frontend service, what happens to critical health-check requests originating from the Kubernetes kubelet, and how would you verify whether probes are affected?

An ALLOW policy explicitly permits selected traffic patterns while defaulting the rest to a denied state for the selected workload. This is the right model for sensitive services because it makes the intended callers visible, but it requires you to think about non-user traffic such as probes, metrics scraping, batch jobs, and administrative endpoints. In policy reviews, read the selector first, then read the source identities and operations, because a perfect rule on the wrong workload is still the wrong rule.

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
name: allow-reviews
namespace: default
spec:
selector:
matchLabels:
app: reviews
action: ALLOW
rules:
- from:
- source:
principals: ["cluster.local/ns/default/sa/productpage"]
to:
- operation:
methods: ["GET"]
paths: ["/reviews/*"]

This highly specific policy allows HTTP GET requests to the /reviews/* endpoint from the productpage service account and denies everything else that reaches the selected reviews workload. The principal string is derived from the SPIFFE identity, so it couples namespace and service account rather than Pod name or Deployment name. That coupling is a strength because Pods churn constantly, but it also means a service account rename can become an authorization outage.

A DENY policy acts as a high-priority rejection rule. Because it is evaluated before ALLOW policies, it is useful for blocking known-dangerous paths, namespaces, or IP ranges even when a broader ALLOW policy would otherwise permit the request. Use DENY sparingly and document it clearly, because a matching DENY can make a correct ALLOW policy look broken until you remember the evaluation order.

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
name: deny-external
namespace: default
spec:
selector:
matchLabels:
app: internal-api
action: DENY
rules:
- from:
- source:
notNamespaces: ["default", "backend"]
to:
- operation:
paths: ["/admin/*"]

This directive denies any request aimed at the /admin/* endpoint on the selected internal API when the request originates outside the listed namespace boundaries. It is a useful pattern for administrative paths, but it should be paired with logs and tests so teams can distinguish intentional rejection from accidental lockout. When debugging, look for the path, method, source namespace, and response flag in the proxy access log before changing the policy.

3.4 Require JWT Integration (Combining Policies)

Section titled “3.4 Require JWT Integration (Combining Policies)”

Because RequestAuthentication alone does not inherently require a token to be present, you must combine it with an explicit AuthorizationPolicy to enforce secure token presence across your architecture. The two-step process below first validates a token if one exists, then rejects requests that do not produce a request principal. This pattern is especially common at ingress gateways and public-facing APIs where anonymous requests should not reach the application at all.

# Step 1: Validate JWT if present
apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
name: require-jwt
namespace: default
spec:
selector:
matchLabels:
app: productpage
jwtRules:
- issuer: "https://auth.example.com"
jwksUri: "https://auth.example.com/.well-known/jwks.json"
# Step 2: DENY requests without valid JWT
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
name: require-jwt
namespace: default
spec:
selector:
matchLabels:
app: productpage
action: DENY
rules:
- from:
- source:
notRequestPrincipals: ["*"] # No valid JWT principal = deny

You can apply policies broadly to an entire namespace to establish a baseline security posture, but broad policies are where small YAML mistakes have the largest operational effect. A namespace-level ALLOW policy can intentionally create a same-namespace boundary, while an empty policy can create a deny-all posture that must be opened with additional policies. Use these patterns when a team owns the whole namespace and agrees that services should not be reachable until explicitly allowed.

# Allow all traffic within the namespace
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
name: allow-same-namespace
namespace: backend
spec:
action: ALLOW
rules:
- from:
- source:
namespaces: ["backend"]
# Deny all traffic (explicit deny-all)
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
name: deny-all
namespace: backend
spec:
{} # Empty spec = deny all

Mastering these common syntactical patterns will speed up policy authoring, but the deeper skill is knowing what each pattern proves. Method and path rules prove HTTP intent, principal rules prove workload identity, claim rules prove authenticated request identity, and IP rules prove network origin only as seen by the proxy. Combine them when you need defense in depth, and avoid combining them when a simpler principal-based rule is easier to test and maintain.

The following snippets are deliberately small so you can recognize the matching fields inside larger policies. Read each as a reusable fragment rather than a complete policy; the selector and action still determine where and how the fragment applies.

rules:
- to:
- operation:
methods: ["GET", "HEAD"]

Allow explicitly from strictly defined service accounts

Section titled “Allow explicitly from strictly defined service accounts”
rules:
- from:
- source:
principals: ["cluster.local/ns/frontend/sa/webapp"]

Allow dynamically based on embedded JWT claims

Section titled “Allow dynamically based on embedded JWT claims”
rules:
- from:
- source:
requestPrincipals: ["https://auth.example.com/*"]
when:
- key: request.auth.claims[role]
values: ["admin"]

Allow strictly from specific IP CIDR blocks

Section titled “Allow strictly from specific IP CIDR blocks”
rules:
- from:
- source:
ipBlocks: ["10.0.0.0/8"]

Securing the ingress gateway edge is where mesh security meets ordinary client-facing TLS. Inside the mesh, workload identity is normally represented by Istio-issued SPIFFE certificates; at the edge, users and external systems usually expect public DNS names, browser-trusted certificates, and sometimes client certificate authentication. Treat the ingress Gateway as a separate security boundary because it has to translate internet-facing TLS requirements into mesh routing decisions without weakening the policies that protect internal services.

4.1 Simple TLS Configuration (Server Certificate Only)

Section titled “4.1 Simple TLS Configuration (Server Certificate Only)”

In SIMPLE mode, the client verifies the identity of the server, but the server does not demand a certificate back from the client. This is the normal HTTPS model for browser traffic and many public APIs: the gateway presents a certificate for app.example.com, the client verifies that certificate chain, and Istio forwards accepted requests through the configured Gateway and VirtualService path. The secret reference is operationally important because the gateway cannot terminate TLS if the credential is missing, malformed, or created in the wrong namespace for the deployment model.

First, inject your TLS material into the appropriate system namespace. In production you would normally automate certificate issuance and renewal, but the command below preserves the mechanical shape of the secret that the Gateway expects.

Terminal window
# Create TLS secret
kubectl create -n istio-system secret tls my-tls-secret \
--key=server.key \
--cert=server.crt

Next, reference it within your edge Gateway. The credentialName value must match the Kubernetes Secret name because Envoy receives the certificate material from Istio’s secret distribution path, not from a file path inside your application container.

apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
name: secure-gateway
spec:
selector:
istio: ingressgateway
servers:
- port:
number: 443
name: https
protocol: HTTPS
hosts:
- "app.example.com"
tls:
mode: SIMPLE
credentialName: my-tls-secret

4.2 Mutual TLS at Ingress (Client Certificates)

Section titled “4.2 Mutual TLS at Ingress (Client Certificates)”

In MUTUAL mode, the gateway demands that the connecting client present a valid certificate, creating a fully authenticated two-way bridge at the edge. This is common for partner APIs, internal corporate clients, and machine-to-machine integrations where possession of a trusted client certificate is part of the access contract. It is not a replacement for AuthorizationPolicy inside the mesh; it authenticates the external TLS client at the gateway, while internal service access may still need workload identity, JWT claims, or path-specific policy.

Terminal window
# Create secret with CA cert for client verification
kubectl create -n istio-system secret generic my-mtls-secret \
--from-file=tls.key=server.key \
--from-file=tls.crt=server.crt \
--from-file=ca.crt=ca.crt
apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
name: mtls-gateway
spec:
selector:
istio: ingressgateway
servers:
- port:
number: 443
name: https
protocol: HTTPS
hosts:
- "secure.example.com"
tls:
mode: MUTUAL # Require client certificate
credentialName: my-mtls-secret

When a service mesh fails, the symptom is often separated from the cause by several layers of generated proxy configuration. A 403 can come from a DENY policy, an ALLOW policy defaulting everything else to deny, a missing JWT principal, or an external authorization service; a 503 can come from an undefined subset, an empty endpoint list, a stale proxy, or a healthy upstream that is unreachable because mTLS negotiation failed. A systematic approach prevents you from rewriting random manifests while the real signal is already visible in Istio analysis output or Envoy configuration.

istioctl analyze is the first command you should run when something is not working as expected because it checks Kubernetes and Istio resources together before you chase runtime symptoms. It catches errors such as references to missing hosts, undefined subsets, invalid schemas, and gateway credential problems that are easy to miss during manual YAML review. It does not prove that live traffic is healthy, but it narrows the problem space quickly and often explains why a proxy never received the configuration you expected.

Terminal window
# Analyze all namespaces
istioctl analyze --all-namespaces
# Analyze specific namespace
istioctl analyze -n default
# Analyze a specific file before applying
istioctl analyze my-virtualservice.yaml
# Common warnings/errors:
# IST0101: Referenced host not found
# IST0104: Gateway references missing secret
# IST0106: Schema validation error
# IST0108: Unknown annotation
# IST0113: VirtualService references undefined subset

After static analysis, check whether the distributed proxies are connected to istiod and fundamentally in sync with the control plane. istioctl proxy-status gives you a mesh-wide synchronization view across xDS resources, which is exactly what you need when one workload seems to ignore a recent policy or route. A stale proxy can make a fixed manifest appear broken because the local Envoy process has not acknowledged the latest configuration.

Terminal window
istioctl proxy-status

The output interpretation matters because the status columns map to different parts of Envoy behavior. A proxy can have routes synchronized while endpoints are stale, or listeners synchronized while clusters are wrong, so avoid reducing the whole table to a single healthy or unhealthy label.

NAME CDS LDS EDS RDS ECDS ISTIOD
productpage-v1-xxx.default SYNCED SYNCED SYNCED SYNCED SYNCED istiod-xxx
reviews-v1-xxx.default SYNCED SYNCED SYNCED SYNCED SYNCED istiod-xxx
ratings-v1-xxx.default STALE SYNCED SYNCED SYNCED SYNCED istiod-xxx ← Problem!
StatusMeaningAction
SYNCEDProxy has latest config from istiodNormal
NOT SENTistiod hasn’t sent config (no changes)Usually normal
STALEProxy hasn’t acknowledged latest configInvestigate — restart pod or check connectivity

Decoding the xDS config types gives you the vocabulary for deeper debugging. When an upstream service is missing, think CDS or EDS; when an HTTP path is routed incorrectly, think RDS; when traffic does not reach the application port at all, think LDS.

TypeFull NameWhat It Configures
CDSCluster Discovery ServiceUpstream clusters (services)
LDSListener Discovery ServiceInbound/outbound listeners
EDSEndpoint Discovery ServiceEndpoints (pod IPs)
RDSRoute Discovery ServiceHTTP routing rules
ECDSExtension Config DiscoveryWASM extensions

The istioctl proxy-config suite lets you inspect what the Envoy proxy is actually configured to execute under the hood. This is the decisive step when Kubernetes objects look correct but runtime behavior disagrees, because the proxy config is the compiled result that Envoy will enforce. The goal is not to memorize every JSON field; the goal is to know which view answers your current question.

Terminal window
# List all clusters (upstream services) for a pod
istioctl proxy-config clusters productpage-v1-xxx.default
# List listeners (what ports Envoy is listening on)
istioctl proxy-config listeners productpage-v1-xxx.default
# List routes (HTTP routing rules)
istioctl proxy-config routes productpage-v1-xxx.default
# List endpoints (actual pod IPs)
istioctl proxy-config endpoints productpage-v1-xxx.default
# Show the full Envoy config dump
istioctl proxy-config all productpage-v1-xxx.default -o json
# Filter by specific service
istioctl proxy-config endpoints productpage-v1-xxx.default \
--cluster "outbound|9080||reviews.default.svc.cluster.local"

Enabling proxy access logs is essential for visualizing each request as it flows through the abstracted mesh layers. Access logs show status codes, response flags, upstream clusters, request paths, duration, and authority values that often tell you whether the request reached the proxy, matched a route, reached an upstream, or failed at a policy boundary. Because logs can be noisy, enable a format and retention strategy that fit your environment rather than leaving raw debug output as your only incident tool.

Terminal window
# Enable via mesh config
istioctl install --set meshConfig.accessLogFile=/dev/stdout -y
# View logs for a specific pod's sidecar
kubectl logs productpage-v1-xxx -c istio-proxy
# Sample log entry:
# [2024-01-15T10:30:00.000Z] "GET /reviews/1 HTTP/1.1" 200 - via_upstream
# - 0 325 45 42 "-" "curl/7.68.0" "xxx" "reviews:9080"
# "10.244.0.15:9080" outbound|9080||reviews.default.svc.cluster.local
# 10.244.0.10:50542 10.96.10.15:9080 10.244.0.10:50540

The log format breakdown below is not just a reference; it is a checklist for interpretation. If there are no logs, traffic may not be reaching the proxy; if logs show 4xx responses, focus on authentication and authorization; if logs show 5xx or upstream flags, inspect clusters, endpoints, and mTLS.

[timestamp] "METHOD PATH PROTOCOL" STATUS_CODE FLAGS
- REQUEST_BYTES RESPONSE_BYTES DURATION_MS UPSTREAM_DURATION
"USER_AGENT" "REQUEST_ID" "AUTHORITY"
"UPSTREAM_HOST" UPSTREAM_CLUSTER
DOWNSTREAM_LOCAL DOWNSTREAM_REMOTE DOWNSTREAM_PEER

The table below preserves the common failure modes from the original module, but you should read it as a diagnostic map rather than a list of unrelated facts. Each row connects one visible symptom to one command that can confirm the hypothesis, then to the smallest safe fix. In an incident, that discipline keeps the first repair small enough to validate before you change the next layer.

IssueSymptomsDiagnosticFix
Missing sidecarService not in mesh, no mTLSkubectl get pod -o jsonpath='{.spec.containers[*].name}'Label namespace + restart pods
VirtualService not appliedTraffic ignores routing rulesistioctl analyze (IST0113)Check hosts match, gateway reference exists
mTLS STRICT with non-mesh serviceconnection reset by peeristioctl proxy-status (missing pod)Use PERMISSIVE or add sidecar
Stale proxy configOld routing rules in effectistioctl proxy-status (STALE)Restart the pod
Gateway TLS misconfiguredTLS handshake failureistioctl analyze (IST0104)Check credentialName matches K8s Secret
AuthorizationPolicy blocking403 Forbiddenkubectl logs <pod> -c istio-proxyCheck RBAC filters in access logs
Subset not defined503 no healthy upstreamistioctl analyze (IST0113)Create DestinationRule with matching subsets
Port name wrongProtocol detection failskubectl get svc -o yaml (check port names)Name ports as http-api, grpc-api, tcp-api

When something in your mesh inevitably stops working, follow this workflow in order unless you already have a stronger piece of evidence. The sequence starts with cheap static checks, then confirms proxy synchronization, then reads the exact route and upstream state, and only then changes configuration. Before running the second step, predict what you should see if the workload is missing a sidecar; that prediction helps you avoid interpreting an absent proxy as a stale one.

Step 1: istioctl analyze -n <namespace>
→ Catches 80% of misconfigurations
Step 2: istioctl proxy-status
→ Is the proxy connected? Is config synced?
Step 3: istioctl proxy-config routes <pod>
→ Does the proxy have the expected routing rules?
Step 4: kubectl logs <pod> -c istio-proxy
→ What does the access log show? 4xx? 5xx? Timeout?
Step 5: istioctl proxy-config clusters <pod>
→ Can the proxy see the upstream service?
Step 6: istioctl proxy-config endpoints <pod> --cluster <cluster>
→ Are there healthy endpoints?
Debugging Decision Tree
Something isn't working
Run istioctl analyze
│ │
Issues found No issues
│ │
Fix them ▼
Is sidecar injected?
│ │
No Yes
│ │
Inject it ▼
proxy-status SYNCED?
│ │
No Yes
│ │
Restart pod ▼
Check access logs
│ │
4xx/5xx No logs
│ │
Policy/route Traffic not
issue reaching proxy

The decision flow below repeats the workflow visually for exam practice and operational runbooks. The most important branch is the split between “proxy has bad evidence” and “traffic never reached the proxy,” because those failures require different owners and different tools.

flowchart TD
Start[Something isn't working] --> Analyze[Run istioctl analyze]
Analyze --> Issues[Issues found]
Issues --> Fix[Fix them]
Analyze --> NoIssues[No issues]
NoIssues --> CheckSidecar[Is sidecar injected?]
CheckSidecar -- No --> Inject[Inject it]
CheckSidecar -- Yes --> ProxyStatus[proxy-status SYNCED?]
ProxyStatus -- No --> Restart[Restart pod]
ProxyStatus -- Yes --> AccessLogs[Check access logs]
AccessLogs -- 4xx/5xx --> PolicyIssue[Policy/route issue]
AccessLogs -- No logs --> TrafficIssue[Traffic not reaching proxy]

5.7 Worked Example: Separating Policy Failure from Routing Failure

Section titled “5.7 Worked Example: Separating Policy Failure from Routing Failure”

Imagine a learner reports that productpage can no longer call reviews, but they do not yet know whether the symptom is security or routing. Start by asking for the observable response. A 403 points you toward AuthorizationPolicy, RequestAuthentication, or external authorization, while a 503 points you toward route selection, subsets, clusters, endpoints, or upstream health. The exact status code is not everything, but it is enough to choose the next diagnostic branch.

If the response is 403, inspect the selected AuthorizationPolicies before changing any VirtualService. Confirm whether an applicable DENY rule matches first, then check whether an ALLOW policy exists and fails to include the caller’s principal, method, or path. The strongest evidence is an access log entry from the destination sidecar that shows the denied request, because it proves the request reached Envoy and was rejected at the policy layer rather than disappearing in service discovery.

If the response is 503, pivot to route and endpoint evidence. Run istioctl analyze to catch an undefined subset, then inspect the client proxy routes to see whether Envoy has the expected route, and finally inspect endpoints for the upstream cluster. This order keeps you from editing authorization rules when the proxy simply has no healthy target for the route that matched the request.

The same habit applies to mTLS migration. A plaintext caller rejected by STRICT PeerAuthentication may look like a networking problem from the application perspective, but the right evidence is sidecar presence, proxy synchronization, and mTLS-capable cluster configuration. When you can explain why each command belongs to a layer, your troubleshooting becomes faster because every result either confirms or eliminates one class of failure.

Keep a small incident note while you troubleshoot: symptom, first command, observed evidence, and next hypothesis. That note prevents duplicated checks when another operator joins, and it gives you a clean explanation for why the final fix was a policy edit, a route edit, a sidecar restart, or an mTLS migration rollback.


Strong Istio security implementations share a rollout shape: they begin with observation, move enforcement toward the smallest boundary that can be tested, and only then generalize the policy. That does not mean security should be slow; it means the first policy should be paired with a way to prove who it affects. In a mesh, proof comes from sidecar presence, xDS synchronization, access logs, and a narrow request that demonstrates the intended allow or deny path.

PatternWhen to Use ItWhy It WorksScaling Consideration
PERMISSIVE-to-STRICT migrationYou have unknown callers, mixed sidecar coverage, or legacy dependenciesIt accepts both plaintext and mTLS while you inventory real traffic and then hardens one boundary at a timeTrack every exception and convert namespace policies to workload policies when only one service still needs flexibility
RequestAuthentication plus AuthorizationPolicyYou need JWT validation and enforcement rather than best-effort token checkingRequestAuthentication validates the token while AuthorizationPolicy requires and consumes the resulting request principalKeep issuers and JWKS URIs owned by platform or identity teams so stale issuers do not linger
Principal-based workload authorizationYou can express trust in terms of namespace and service account identitySPIFFE principals survive Pod churn and Deployment rollouts better than IP or Pod-name matchingStandardize service account naming so policies remain readable across many namespaces
Evidence-first troubleshootingYou face 403, 503, route mismatch, or handshake symptoms under time pressureanalyze, proxy-status, proxy-config, and access logs each answer a different layer-specific questionCapture the workflow in runbooks so teams do not skip directly to restarts or broad policy deletion

Anti-patterns usually appear when teams try to make the mesh disappear behind one giant default. Mesh security is not a single switch because every request combines source identity, destination policy, request attributes, routing configuration, and proxy sync state. The safer alternative is to make each control visible enough that a reviewer can explain both the expected success path and the expected failure path.

Anti-PatternWhat Goes WrongBetter Alternative
Mesh-wide STRICT before inventoryNon-mesh callers and legacy ports fail immediately, often with confusing transport errorsStart with PERMISSIVE, list connected proxies, test critical paths, then enforce per namespace or workload
RequestAuthentication without an enforcing policyInvalid tokens fail, but anonymous requests may still reach the serviceAdd an AuthorizationPolicy that requires requestPrincipals or denies notRequestPrincipals
Broad ALLOW policy without probe awarenessHealth checks, scrapers, and automation can be denied even though user traffic worksInclude explicit rules for required operational callers or place probe endpoints outside the protected path
Debugging by deleting all policiesYou restore traffic but lose the evidence needed to understand which rule failedDisable the smallest suspected policy, preserve logs, and verify the exact route or principal before widening access

Use the decision framework below when you need to choose the next action under uncertainty. Start by naming the symptom as precisely as possible, then choose the tool that can falsify one hypothesis at a time. This structure is intentionally conservative because most service-mesh incidents become worse when an operator changes PeerAuthentication, DestinationRule, and AuthorizationPolicy together before checking which layer made the original decision.

flowchart TD
Symptom[Request fails or policy change behaves unexpectedly] --> AnalyzeFirst[Run istioctl analyze]
AnalyzeFirst --> StaticIssue{Static config issue found?}
StaticIssue -- Yes --> FixStatic[Fix host, subset, schema, or secret reference]
StaticIssue -- No --> Status[Run istioctl proxy-status]
Status --> Synced{Relevant proxy synced?}
Synced -- No --> SyncRepair[Check sidecar, istiod connectivity, restart only affected pod]
Synced -- Yes --> Code{Status code or symptom}
Code -- 401 or 403 --> Authz[Inspect RequestAuthentication, AuthorizationPolicy, principals, and claims]
Code -- 503 or reset --> Network[Inspect clusters, endpoints, DestinationRule TLS, and mTLS mode]
Code -- Route mismatch --> Routes[Inspect proxy-config routes and VirtualService hosts]
Authz --> Logs[Confirm with istio-proxy access logs]
Network --> Logs
Routes --> Logs
SymptomFirst QuestionBest First ToolLikely LayerSafe Next Move
401 UnauthorizedWas a JWT present and valid for the configured issuer?Access logs and RequestAuthentication reviewRequest authenticationVerify issuer, JWKS URI, expiry, and token location before editing authorization
403 ForbiddenDid a DENY match or did ALLOW default-deny reject the request?Access logs plus AuthorizationPolicy reviewAuthorizationCheck source principal, request principal, path, method, and DENY precedence
503 no healthy upstreamDoes the proxy have clusters and endpoints for the intended destination?istioctl proxy-config clusters/endpointsRouting or endpoint discoveryConfirm DestinationRule subsets, Service selectors, and endpoint readiness
connection reset by peer after STRICTCan both sides participate in Istio mTLS?istioctl proxy-status and sidecar inspectionPeer authenticationRevert to PERMISSIVE for the boundary or inject the missing sidecar
TLS handshake failure at ingressCan the gateway load the named credential and match the host?istioctl analyze and gateway logsGateway TLSVerify Secret namespace, credentialName, hosts, and certificate chain

The framework also helps with exam pacing. If the question gives you a broken subset name, do not waste time inspecting JWT policies; if it gives you a missing token and a RequestAuthentication object, look for the AuthorizationPolicy that actually requires identity; if it gives you a stale proxy, do not assume your YAML is wrong until you have refreshed or repaired the proxy connection. Good troubleshooting is not a longer list of commands; it is choosing the command whose result will change your next decision.

  • Istio graduated its core security APIs to security.istio.io/v1, which means PeerAuthentication, RequestAuthentication, and AuthorizationPolicy examples should use the stable API version in modern Kubernetes 1.35+ practice environments.
  • Istio workload identities follow the SPIFFE URI pattern spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>, so changing a service account name changes the principal that AuthorizationPolicy rules match.
  • Envoy receives separate xDS resource types for clusters, listeners, endpoints, routes, and extension configuration, which is why proxy-status can show one column stale while the others remain synchronized.
  • RequestAuthentication rejects invalid JWTs with authentication failure, but a request with no JWT is not rejected by that resource alone; authorization is the layer that makes identity mandatory.

MistakeWhy It HappensHow to Fix It
Enabling STRICT mTLS with non-mesh servicesThe destination starts requiring Istio certificates, but one caller or port cannot present a sidecar-issued identityUse PERMISSIVE during migration, inject the missing sidecar, or add a documented port-level exception while legacy traffic is removed
Using RequestAuthentication without AuthorizationPolicyThe policy validates tokens that are present but does not require anonymous requests to carry oneAdd an AuthorizationPolicy that allows expected requestPrincipals or denies notRequestPrincipals: ["*"]
Creating an ALLOW policy without accounting for operational trafficAny applicable ALLOW policy creates default deny for traffic that does not match the rulesAdd explicit rules for probes, metrics, and required automation, then test both user and non-user paths
Treating spec: {} as harmless boilerplateAn empty AuthorizationPolicy is a deny-all policy for the selected scopeUse it only as an intentional baseline and pair it with explicit allow policies for every required caller
Placing a mesh-wide PeerAuthentication in the wrong namespaceMesh-wide default only works from istio-system, while the same object elsewhere affects only that namespacePut the mesh default in istio-system and use namespace or workload policies for narrower boundaries
Referencing a missing Gateway credentialNameThe gateway cannot load the certificate or client CA material needed for TLS terminationCreate the Kubernetes Secret in the expected namespace and rerun istioctl analyze before retesting traffic
Ignoring port naming conventionsProtocol detection and policy matching can behave unexpectedly when service ports are ambiguousName ports with protocol-aware prefixes such as http-, grpc-, or tcp- and review generated listeners
Forgetting DENY-before-ALLOW precedenceA DENY policy can reject a request before the ALLOW policy you are reading gets a chance to matchSearch all applicable policies, inspect access logs, and remove or narrow the DENY only after proving it matched

The mistakes above are deliberately specific because generic advice like “check your policy” is not useful during an outage. Each fix asks you to change the smallest boundary that explains the symptom, then verify from proxy evidence. That habit is what separates safe service-mesh operations from repeated broad rollbacks.


1. Your team enables STRICT mTLS with PeerAuthentication policies in a namespace and one legacy caller immediately receives connection resets. What should you check first, and why?

Start by checking whether both sides of the critical path have Envoy sidecars and appear in istioctl proxy-status. PeerAuthentication policies enforce mutual TLS at the destination, so a caller without a sidecar cannot present the required workload certificate when STRICT mode is active. If the caller is intentionally outside the mesh, use PERMISSIVE for the migration boundary or add a narrow port-level exception while you remove the legacy dependency. Do not first rewrite AuthorizationPolicy because a reset during handshake happens before HTTP authorization can explain the request.

2. You create RequestAuthentication for a service, but anonymous requests still reach the application. What resource is missing?

The missing resource is an AuthorizationPolicy that requires a valid request principal or denies requests without one. RequestAuthentication validates a JWT when a request carries one, and it rejects invalid tokens, but it does not require every request to include a token. A DENY policy using notRequestPrincipals: ["*"] is a common way to reject anonymous traffic after JWT validation has been configured.

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
name: require-jwt
spec:
selector:
matchLabels:
app: myservice
action: DENY
rules:
- from:
- source:
notRequestPrincipals: ["*"]
3. A DENY policy blocks `/admin/*`, and a later ALLOW policy appears to permit your service account. Why does the request still fail?

The request still fails because Istio evaluates DENY policies before ALLOW policies. Once the DENY rule matches the request path and source conditions, evaluation short-circuits and the ALLOW rule never gets a chance to permit the request. The right fix is not to broaden the ALLOW rule; it is to narrow the DENY rule or adjust the administrative path policy after confirming the match in access logs. This is why policy debugging always starts with the action evaluation order.

4. You need only the `frontend` service account to call `backend` with GET requests on `/api/*`. What policy shape matches that requirement?

Use an ALLOW policy selected onto the backend workload, with a principals match for the frontend service account and an operation match for the GET method and /api/* path. This policy works because it expresses the trust relationship in workload identity rather than Pod IP address, and it limits the allowed operation to the API path instead of opening the whole service. Any non-matching caller or method is denied because an applicable ALLOW policy exists.

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
name: backend-policy
namespace: default
spec:
selector:
matchLabels:
app: backend
action: ALLOW
rules:
- from:
- source:
principals: ["cluster.local/ns/default/sa/frontend"]
to:
- operation:
methods: ["GET"]
paths: ["/api/*"]
5. A service returns `503 no healthy upstream` after a VirtualService change. Which Istio tools separate a route problem from an endpoint problem?

Run istioctl analyze first to catch missing subset or host references, then inspect the client proxy with istioctl proxy-config routes and istioctl proxy-config endpoints. If the route points at a subset that does not exist, analysis should usually identify the missing DestinationRule subset before runtime inspection. If the route exists but endpoints are empty, the problem is more likely Service selectors, readiness, endpoint discovery, or stale EDS. This separation matters because rewriting AuthorizationPolicy will not fix an upstream cluster with no healthy endpoints.

6. `istioctl proxy-status` shows one workload with STALE EDS while other columns are SYNCED. What does that imply?

It implies the proxy has not acknowledged the latest endpoint discovery configuration, even if other configuration types are current. The workload may still have valid listeners and routes while using stale endpoint data for upstream service selection. Check istiod connectivity, proxy health, and whether restarting only the affected Pod causes a fresh sync. Do not assume the Service or DestinationRule is wrong until you know whether the local proxy has the latest endpoint view.

7. You enable ingress SIMPLE TLS and clients report handshake failures. What configuration evidence should you gather before changing routes?

Gather evidence that the Gateway references a real certificate secret with the expected credentialName, that the secret is in the namespace where the gateway deployment expects credentials, and that the Gateway hosts match the client SNI or Host header. istioctl analyze can catch missing credentials or invalid references before you chase VirtualService routes. Route changes will not repair a gateway that cannot complete TLS termination. Only after TLS succeeds should you inspect route matching and upstream service behavior.

Terminal window
# During installation
istioctl install --set meshConfig.accessLogFile=/dev/stdout -y
# Or via IstioOperator
spec:
meshConfig:
accessLogFile: /dev/stdout
8. A VirtualService looks correct in Kubernetes, but traffic still follows the old path. What proxy command proves what Envoy actually knows?

Use istioctl proxy-config routes <pod-name>.<namespace> against the client-side proxy that is making the outbound request. Kubernetes storing the VirtualService only proves the API server accepted the object; the route dump proves whether Envoy has received and compiled the route into its active configuration. If the expected route is absent, return to istioctl analyze, proxy synchronization, gateway references, and host matching. If the route is present, inspect access logs and upstream clusters to find the next layer.

Terminal window
istioctl proxy-config routes <pod-name>.<namespace>
Terminal window
istioctl proxy-config routes <pod-name>.<namespace> -o json

Hands-On Exercise: Security & Troubleshooting

Section titled “Hands-On Exercise: Security & Troubleshooting”

Configure mTLS, implement fine-grained authorization policies, and actively practice troubleshooting common Istio configuration issues using CLI diagnostic tools. The exercise deliberately uses Bookinfo because it gives you multiple services, service accounts, subsets, and HTTP paths without requiring you to build a separate application. Your goal is not only to make the commands succeed; your goal is to explain which layer each command proves and which failure would appear if that layer were wrong.

Run this exercise in a disposable Kubernetes 1.35+ cluster with Istio available on your workstation. The demo profile is intentionally convenient rather than production-grade, and access logging is enabled so you can inspect the sidecar evidence instead of relying only on client symptoms. If your cluster already has Istio installed, adapt the installation step rather than reinstalling over a shared environment.

Terminal window
# Ensure Istio is installed with demo profile
istioctl install --set profile=demo \
--set meshConfig.accessLogFile=/dev/stdout -y
kubectl label namespace default istio-injection=enabled --overwrite
# Deploy Bookinfo
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.25/samples/bookinfo/platform/kube/bookinfo.yaml
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.25/samples/bookinfo/networking/destination-rule-all.yaml
kubectl wait --for=condition=ready pod --all -n default --timeout=120s

Begin by enforcing mesh-wide cryptographic security requirements after the Bookinfo workloads are ready and sidecar injection is enabled. This is the controlled version of the migration scenario from the lesson: every Bookinfo service should have an injected proxy, so STRICT mode should not break ordinary service-to-service calls. If the verification request fails, do not immediately remove the policy; first confirm sidecar presence and proxy synchronization.

Terminal window
# Apply mesh-wide STRICT mTLS
kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
name: default
namespace: istio-system
spec:
mtls:
mode: STRICT
EOF
# Verify mTLS is working
istioctl proxy-config clusters productpage-v1-$(kubectl get pods -l app=productpage -o jsonpath='{.items[0].metadata.name}' | cut -d'-' -f3-) | grep reviews

Verify securely: traffic should still work between all services because they all possess injected sidecars and can participate in Istio mTLS. A successful response proves the migration boundary is internally consistent, while a failed response gives you a clean opportunity to practice checking proxy-status, container lists, and cluster configuration before changing policy.

Terminal window
kubectl exec $(kubectl get pod -l app=ratings -o jsonpath='{.items[0].metadata.name}') -c ratings -- curl -s productpage:9080/productpage | head -20
Task 1 solution notes

The expected solution is a successful Bookinfo response after STRICT mTLS is applied. If the request fails, confirm that every Bookinfo Pod has an istio-proxy container and that istioctl proxy-status shows the relevant proxies synchronized before changing the PeerAuthentication policy.

Apply specific access control boundaries to prevent unauthorized peer identity from reaching the reviews workload. The policy below is written as a DENY rule so you can observe how a non-matching service account is rejected while the expected productpage identity remains allowed. In a production review, you would also ask whether the rule name matches the action clearly enough for another engineer to understand it during an incident.

Terminal window
# Deny all traffic to reviews (start restrictive)
kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
name: deny-all-reviews
namespace: default
spec:
selector:
matchLabels:
app: reviews
action: DENY
rules:
- from:
- source:
notPrincipals: ["cluster.local/ns/default/sa/bookinfo-productpage"]
EOF

Verify securely: Only the productpage workload can reach reviews endpoints. Unauthorized requests from alternate services should definitively get blocked with a 403 Forbidden response:

Terminal window
# This should work (productpage → reviews)
kubectl exec $(kubectl get pod -l app=productpage -o jsonpath='{.items[0].metadata.name}') \
-c productpage -- curl -s -o /dev/null -w "%{http_code}" http://reviews:9080/reviews/1
# This should fail with 403 (ratings → reviews)
kubectl exec $(kubectl get pod -l app=ratings -o jsonpath='{.items[0].metadata.name}') \
-c ratings -- curl -s -o /dev/null -w "%{http_code}" http://reviews:9080/reviews/1
Task 2 solution notes

The productpage request should return a successful status, while the ratings request should return 403. If both calls succeed, the policy selector, service account principal, or DENY match is not selecting the intended traffic; if both calls fail, check whether the expected productpage principal differs from the service account used by the deployed workload.

Deliberately break a traffic-management configuration and systematically deduce the error before fixing it. The subset typo is a useful practice fault because it is visible to static analysis, route inspection, and runtime behavior, which lets you compare several sources of evidence. Resist the temptation to fix the typo before running the diagnostic commands; the learning value comes from seeing how each tool reports the same underlying mismatch.

Terminal window
# Create a VirtualService with a typo in the subset name
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
name: reviews-broken
spec:
hosts:
- reviews
http:
- route:
- destination:
host: reviews
subset: v99 # This subset doesn't exist!
EOF
# Now diagnose:
# Step 1: Analyze
istioctl analyze -n default
# Expected: IST0113 - Referenced subset not found
# Step 2: Check proxy config
istioctl proxy-config routes $(kubectl get pod -l app=productpage \
-o jsonpath='{.items[0].metadata.name}').default | grep reviews
# Step 3: Fix it
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
name: reviews-broken
spec:
hosts:
- reviews
http:
- route:
- destination:
host: reviews
subset: v1 # Fixed!
EOF
# Step 4: Verify
istioctl analyze -n default
Task 3 solution notes

The expected diagnostic evidence is an analyzer warning for the nonexistent subset and a route view that shows Envoy trying to use the broken route once it is distributed. The fix is the smallest possible route correction: change the subset to one that exists in the DestinationRule, then rerun analysis to confirm the static configuration is clean.

Dive into the low-level proxy primitives to visually confirm routing architecture from the client proxy’s point of view. This is the point where Istio troubleshooting becomes concrete: clusters show upstream service abstractions, listeners show what Envoy accepts, routes show HTTP decisions, endpoints show actual Pod IP targets, and access logs show the request that exercised those structures. Capture one example from each view and connect it back to a symptom you could diagnose later.

Terminal window
# Get the productpage pod name
PP_POD=$(kubectl get pod -l app=productpage -o jsonpath='{.items[0].metadata.name}')
# View all clusters (upstream services)
istioctl proxy-config clusters $PP_POD.default
# View listeners
istioctl proxy-config listeners $PP_POD.default
# View routes
istioctl proxy-config routes $PP_POD.default
# View endpoints for reviews service
istioctl proxy-config endpoints $PP_POD.default \
--cluster "outbound|9080||reviews.default.svc.cluster.local"
# Check access logs
kubectl logs $PP_POD -c istio-proxy --tail=10
Task 4 solution notes

The solution is not a single output string; it is a mapping between each command and the proxy layer it proves. Clusters should include upstream service abstractions, listeners should show accepted ports, routes should show HTTP routing decisions, endpoints should show concrete upstream Pods, and logs should show at least one request that exercised the path.

Use the checklist as a self-assessment rather than a completion form. A box is only complete when you can describe the evidence you observed and the layer it proves, because the ICA exam will often give you symptoms and ask for the next diagnostic move rather than asking you to paste a manifest.

  • PeerAuthentication policies enforce mutual TLS mesh-wide and all participating services communicate successfully
  • AuthorizationPolicy cleanly and correctly restricts backend reviews access to the productpage client only
  • You can accurately identify the IST0113 error surfaced from istioctl analyze regarding the broken VirtualService
  • You can properly use proxy-config to inspect and trace clusters, listeners, dynamic routes, and live endpoints
  • Raw access logs clearly show nuanced request details inside the istio-proxy streaming container

Restore your environment to a pristine baseline when you are done, especially if the cluster will be reused for later modules. Cleanup matters in service-mesh labs because security policies and VirtualServices can affect future exercises in ways that look unrelated to the learner who inherits the namespace.

Terminal window
kubectl delete peerauthentication default -n istio-system
kubectl delete authorizationpolicy deny-all-reviews -n default
kubectl delete virtualservice reviews-broken -n default
kubectl delete -f https://raw.githubusercontent.com/istio/istio/release-1.25/samples/bookinfo/platform/kube/bookinfo.yaml
kubectl delete -f https://raw.githubusercontent.com/istio/istio/release-1.25/samples/bookinfo/networking/destination-rule-all.yaml
istioctl uninstall --purge -y
kubectl delete namespace istio-system


Continue your journey and transition to Module 4: Istio Observability to learn about tracking Istio metrics, configuring distributed tracing telemetry, parsing access logging pipelines, and constructing brilliant visual dashboards with Kiali and Grafana. Mastering observability constitutes 10% of the ICA exam.