Advanced, Scenario
15 min
Service Mesh mTLS Certificate Rotation Failure
Service Mesh, Security, Networking
Interview Question
After a certificate rotation, services in the mesh begin failing with 503s. How do you diagnose and restore traffic?
Key Points to Cover
- Check control plane health and CA/key rotation events
- Validate sidecar proxy versions and trust bundles
- Inspect SNI/identity mismatches and policy enforcement
- Roll back or re-rotate certificates behind a canary rollout
- Add alerts on cert expiry and rotation failures
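The last key point, alerting on certificate expiry, can be sketched as a small classifier. This is a minimal illustration and not any mesh's built-in alerting: the function name `expiry_status`, the `WARN_WINDOW` threshold of 7 days, and the status labels are all assumptions chosen for the example. The only library call relied on is the standard library's `ssl.cert_time_to_seconds`, which parses the `notAfter` string format that certificates use.

```python
import ssl
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: tune it to the certificate lifetime used in your mesh.
WARN_WINDOW = timedelta(days=7)


def expiry_status(not_after: str, now: datetime,
                  warn_window: timedelta = WARN_WINDOW) -> str:
    """Classify a certificate by its notAfter string.

    not_after uses the certificate text format, e.g. 'Jan  5 09:34:43 2030 GMT'.
    now must be a timezone-aware datetime.
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    if now >= expires:
        return "expired"
    if expires - now <= warn_window:
        return "expiring_soon"
    return "ok"


if __name__ == "__main__":
    # Illustrative input only; a real check would read notAfter from live certs.
    print(expiry_status("Jan  5 09:34:43 2030 GMT", datetime.now(timezone.utc)))
```

A check like this would typically run on a schedule against each workload and CA certificate, with "expiring_soon" raising a warning and "expired" paging someone.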
Evaluation Rubric
- Inspects control plane and rotation status (35% weight)
- Validates proxies/trust and identity (25% weight)
- Restores traffic with safe steps (20% weight)
- Prevents recurrence via alerting (20% weight)
Hints
- 💡 Mismatched trust domains commonly break mTLS.
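To make the trust-domain hint concrete, here is a rough sketch that compares the trust domains of two workload identities. It assumes the mesh issues SPIFFE-style identities of the form `spiffe://<trust-domain>/<path>`; the helper names `trust_domain` and `same_trust_domain` and the example IDs are hypothetical.

```python
from urllib.parse import urlsplit


def trust_domain(spiffe_id: str) -> str:
    """Return the trust domain (the authority part) of a SPIFFE-style ID."""
    parts = urlsplit(spiffe_id)
    if parts.scheme != "spiffe" or not parts.netloc:
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    return parts.netloc


def same_trust_domain(client_id: str, server_id: str) -> bool:
    """True when both identities belong to the same trust domain."""
    return trust_domain(client_id) == trust_domain(server_id)


if __name__ == "__main__":
    # Hypothetical identities: a client still in the old domain, a server in the new one.
    client = "spiffe://cluster.local/ns/shop/sa/cart"
    server = "spiffe://prod.example.com/ns/shop/sa/payments"
    print(same_trust_domain(client, server))
```

If a rotation also changed the trust domain, peers that still present identities from the old domain would fail this comparison, which is one way the handshake or policy check can start rejecting traffic.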
Common Pitfalls to Avoid
- ⚠️ Focusing solely on application logs and neglecting the service mesh control plane and sidecar logs.
- ⚠️ Assuming all sidecar proxies are updated to the latest compatible version.
- ⚠️ Not validating the complete trust chain, only the root CA.
- ⚠️ Overlooking the impact of newly rotated certificates on existing authorization policies.
- ⚠️ Failing to consider network segmentation or firewall rules that might be implicitly affected by new TLS configurations.
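The pitfall about validating only the root CA can be illustrated with a simplified chain walk. This sketch checks only that issuer and subject names link up from the leaf to a root in the trust bundle; it does not verify signatures, validity periods, or extensions, so it is no substitute for real X.509 validation. The `CertInfo` type, the function `first_chain_break`, and all certificate names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set


@dataclass(frozen=True)
class CertInfo:
    """Only the two name fields this sketch needs (assumed already extracted)."""
    subject: str
    issuer: str


def first_chain_break(chain: Sequence[CertInfo],
                      trusted_roots: Set[str]) -> Optional[str]:
    """Describe the first naming break in a leaf-first chain, or return None.

    Name chaining only: signatures and expiry are NOT checked here.
    """
    if not chain:
        return "empty chain"
    # Each certificate must be issued by the next one in the chain.
    for child, parent in zip(chain, chain[1:]):
        if child.issuer != parent.subject:
            return (f"{child.subject!r} was issued by {child.issuer!r}, "
                    f"but the next certificate is {parent.subject!r}")
    # The last certificate must chain to a root the proxy actually trusts.
    top = chain[-1]
    if top.issuer not in trusted_roots:
        return (f"{top.subject!r} chains to {top.issuer!r}, "
                f"which is not in the trust bundle")
    return None


if __name__ == "__main__":
    chain = [CertInfo("workload", "intermediate-new"),
             CertInfo("intermediate-new", "root-new")]
    # A proxy that still holds only the old root rejects the new chain.
    print(first_chain_break(chain, {"root-old"}))
    # During a staged rollout the bundle carries both roots, so the chain links up.
    print(first_chain_break(chain, {"root-old", "root-new"}))
```

The two calls mirror the rotation scenario: a trust bundle that was not updated before the new certificates were issued breaks the chain even though every individual certificate is valid.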
Potential Follow-up Questions
- ❓ How do you stage trust bundle rollouts?
- ❓ How do you test rotation in staging?