Advanced · Scenario · 15 min

Service Mesh mTLS Certificate Rotation Failure

Service Mesh · Security · Networking
Interview Question

After a certificate rotation, services in the mesh begin failing with 503s. How do you diagnose and restore traffic?
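A practical first step is to bucket the 503s by the sidecar's own view of the failure. The sketch below assumes Envoy-based sidecars (as in Istio) and assumes each access-log entry has already been parsed into a dict. The key names `response_code`, `response_flags`, and `upstream_transport_failure_reason` are illustrative, not a fixed schema. The flag table is a small subset of Envoy's response flags, and treating any failure reason that mentions "tls", "ssl", or "certificate_verify_failed" as an mTLS handshake problem is a heuristic.

```python
"""Bucket 503s from sidecar access-log entries by likely cause (sketch)."""

from collections import Counter

# Subset of Envoy response flags that commonly accompany 503s.
FLAG_HINTS = {
    "UF": "upstream connection failure (check the TLS/mTLS handshake)",
    "UH": "no healthy upstream hosts",
    "UO": "upstream overflow (circuit breaking)",
    "NR": "no route configured",
    "URX": "upstream retry limit exceeded",
    "UC": "upstream connection termination",
}

# Heuristic: substrings that suggest a TLS failure in the reason field.
TLS_MARKERS = ("tls", "ssl", "certificate_verify_failed")


def classify(entry: dict) -> str:
    """Return a coarse cause label for one log entry."""
    reason = (entry.get("upstream_transport_failure_reason") or "").lower()
    if any(marker in reason for marker in TLS_MARKERS):
        return "mtls_handshake"
    # Multiple flags are assumed to be comma-separated, e.g. "UF,URX".
    for flag in (entry.get("response_flags") or "-").split(","):
        if flag in FLAG_HINTS:
            return flag
    return "other"


def triage(entries: list[dict]) -> Counter:
    """Count likely causes across 503 responses only."""
    return Counter(classify(e) for e in entries if e.get("response_code") == 503)


if __name__ == "__main__":
    sample = [
        {"response_code": 503, "response_flags": "UF,URX",
         "upstream_transport_failure_reason": "TLS error: CERTIFICATE_VERIFY_FAILED"},
        {"response_code": 503, "response_flags": "UH"},
        {"response_code": 200, "response_flags": "-"},
    ]
    print(triage(sample))
```

A spike in the `mtls_handshake` bucket right after a rotation points at trust bundles or identities rather than at application code.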

Key Points to Cover
  • Check control plane health and CA/key rotation events
  • Validate sidecar proxy versions and trust bundles
  • Inspect SNI/identity mismatches and policy enforcement
  • Roll back or re-rotate certificates in canary stages rather than mesh-wide at once
  • Add alerts on cert expiry and rotation failures
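For the alerting point, here is a minimal sketch that classifies a certificate's validity window, assuming the `notBefore`/`notAfter` timestamps have already been extracted (for example with `openssl x509 -noout -dates`). The `CertWindow` record and the thresholds `alert_before` and `rotate_fraction` are hypothetical values to tune, not defaults of any particular mesh.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class CertWindow:
    """Validity window of one certificate (timestamps extracted beforehand)."""
    name: str
    not_before: datetime
    not_after: datetime


def expiry_status(cert: CertWindow, now: datetime,
                  alert_before: timedelta = timedelta(hours=24),
                  rotate_fraction: float = 0.8) -> str:
    """Classify a certificate for alerting; thresholds are illustrative."""
    if now < cert.not_before:
        # Often a clock-skew symptom right after a rotation.
        return "not_yet_valid"
    if now >= cert.not_after:
        return "expired"
    if cert.not_after - now <= alert_before:
        return "expiring_soon"
    lifetime = cert.not_after - cert.not_before
    elapsed = now - cert.not_before
    if elapsed / lifetime >= rotate_fraction:
        # Past the point where rotation should already have happened.
        return "rotation_overdue"
    return "ok"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    cert = CertWindow("workload-a", now - timedelta(hours=1), now + timedelta(hours=23))
    print(cert.name, expiry_status(cert, now))
```

Alerting on `rotation_overdue` catches a stuck rotation while the old certificate is still valid, which leaves time to act before traffic breaks.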
Evaluation Rubric
Inspects control plane and rotation status (35% weight)
Validates proxies/trust and identity (25% weight)
Restores traffic with safe steps (20% weight)
Prevents recurrence via alerting (20% weight)
Hints
  • 💡 Mismatched trust domains commonly break mTLS.
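To make the hint concrete: mesh workload identities are typically SPIFFE IDs of the form `spiffe://<trust-domain>/<path>`. Istio uses `spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>`, with `cluster.local` as the default trust domain. The sketch below extracts the trust domain and checks it against an accepted set (the local domain plus any aliases). `check_peer` and the `prod.example.com` domain are hypothetical, and real validation also involves the certificate chain, not only the ID string.

```python
from urllib.parse import urlsplit


def trust_domain(spiffe_id: str) -> str:
    """Return the trust domain (authority part) of a SPIFFE ID."""
    parts = urlsplit(spiffe_id)
    if parts.scheme != "spiffe" or not parts.netloc:
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    return parts.netloc


def check_peer(peer_id: str, accepted: set[str]) -> tuple[bool, str]:
    """Hypothetical helper: accept a peer only if its trust domain is accepted."""
    domain = trust_domain(peer_id)
    if domain in accepted:
        return True, f"trust domain {domain!r} accepted"
    return False, f"trust domain {domain!r} not in {sorted(accepted)}"


if __name__ == "__main__":
    # Peer re-issued under a new trust domain; this side only accepts the old one.
    print(check_peer("spiffe://prod.example.com/ns/default/sa/api", {"cluster.local"}))
```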
Common Pitfalls to Avoid
  • ⚠️ Focusing solely on application logs and neglecting the service mesh control plane and sidecar logs.
  • ⚠️ Assuming all sidecar proxies are already on the latest compatible version.
  • ⚠️ Validating only the root CA instead of the complete trust chain, including intermediates.
  • ⚠️ Overlooking the impact of newly rotated certificates on existing authorization policies.
  • ⚠️ Failing to consider network segmentation or firewall rules that new TLS configurations might implicitly affect.
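The trust-chain pitfall can be illustrated with a structural check that walks from the leaf through the intermediates and requires the topmost issuer to be a root in the current trust bundle. This is a simplified model built on hypothetical `Cert` records matched by subject name. It does not verify signatures, so in practice pair it with a real verifier such as `openssl verify -CAfile <roots> -untrusted <intermediates> <leaf>`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Cert:
    """Hypothetical, already-parsed certificate record (no signature data)."""
    subject: str
    issuer: str
    not_before: datetime
    not_after: datetime
    is_ca: bool = False


def _valid_at(cert: Cert, now: datetime) -> bool:
    return cert.not_before <= now < cert.not_after


def chain_problems(chain: list[Cert], bundle: list[Cert], now: datetime) -> list[str]:
    """Check a presented chain (leaf first, root excluded) against a trust bundle.

    Structural checks only: validity windows, CA flag on intermediates,
    issuer/subject linkage, and a trusted anchor. Signatures are NOT verified.
    """
    if not chain:
        return ["empty chain"]
    problems = []
    for i, cert in enumerate(chain):
        if not _valid_at(cert, now):
            problems.append(f"{cert.subject}: outside validity window")
        if i > 0 and not cert.is_ca:
            problems.append(f"{cert.subject}: intermediate is not a CA")
        if i + 1 < len(chain) and cert.issuer != chain[i + 1].subject:
            problems.append(
                f"{cert.subject}: issuer {cert.issuer!r} does not match "
                f"next cert {chain[i + 1].subject!r}"
            )
    roots = {root.subject: root for root in bundle}
    top = chain[-1]
    anchor = roots.get(top.issuer)
    if anchor is None:
        problems.append(f"{top.subject}: issuer {top.issuer!r} not in trust bundle")
    elif not _valid_at(anchor, now):
        problems.append(f"root {anchor.subject}: outside validity window")
    return problems


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    start, end = now - timedelta(days=1), now + timedelta(days=365)
    root_v1 = Cert("root-v1", "root-v1", start, end, is_ca=True)
    inter_v2 = Cert("intermediate-v2", "root-v2", start, end, is_ca=True)
    leaf = Cert("frontend", "intermediate-v2", start, end)
    # The proxy still holds only the old root, so the new chain does not anchor.
    print(chain_problems([leaf, inter_v2], [root_v1], now))
```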
Potential Follow-up Questions
  • How would you stage trust bundle rollouts?
  • How do you test rotation in staging?
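A common way to stage a bundle rollout is in three phases: distribute a bundle containing both the old and new roots, reissue workload certificates from the new root, and only then remove the old root. The sketch below encodes the two gates between those phases as checks over a hypothetical inventory of proxies. The step names and `Proxy` fields are assumptions; in staging, the same gates can be run against a canary subset first and should come back empty before the fleet-wide step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proxy:
    """Hypothetical inventory record for one sidecar."""
    name: str
    trusted_roots: frozenset[str]  # roots in the proxy's current trust bundle
    issuing_root: str              # root behind the workload cert it presents


def blockers(step: str, proxies: list[Proxy], old_root: str, new_root: str) -> list[str]:
    """Return the names of proxies that make `step` unsafe to start.

    Steps (hypothetical names):
      "issue_from_new": every proxy must already trust new_root, otherwise
                        peers reject certificates signed by it.
      "remove_old":     no proxy may still present a certificate from old_root.
    """
    if step == "issue_from_new":
        return [p.name for p in proxies if new_root not in p.trusted_roots]
    if step == "remove_old":
        return [p.name for p in proxies if p.issuing_root == old_root]
    raise ValueError(f"unknown step: {step!r}")


if __name__ == "__main__":
    fleet = [
        Proxy("frontend", frozenset({"root-v1", "root-v2"}), "root-v2"),
        Proxy("payments", frozenset({"root-v1"}), "root-v1"),
    ]
    for step in ("issue_from_new", "remove_old"):
        print(step, blockers(step, fleet, "root-v1", "root-v2"))
```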