Intermediate | Scenario
10 min
DNS Resolution Failure
DNS, Networking, Reliability
Interview Question
Your services suddenly cannot resolve domain names, breaking connectivity to dependencies. Walk me through your triage.
Key Points to Cover
- Test resolution with dig/nslookup from pods/hosts
- Check DNS service health (CoreDNS, Route53, etc.)
- Review config changes in resolv.conf or VPC DNS
- Fallback to alternate resolvers if needed
- Enable DNS cache and set alerts for QPS failures
Evaluation Rubric
- Validates DNS resolution directly: 30% weight
- Checks DNS service health/config: 30% weight
- Considers temporary resolver fallback: 20% weight
- Proposes long-term monitoring: 20% weight
Hints
- 💡DNS outages often stem from upstream providers.
Common Pitfalls to Avoid
- ⚠️**Assuming the scope immediately:** Jumping to conclusions about the cause without first determining the scope (e.g., one pod vs. all pods vs. all hosts).
- ⚠️**Neglecting to check local configuration:** Overlooking `resolv.conf` or equivalent settings on the client side and focusing solely on the server.
- ⚠️**Ignoring network connectivity:** Assuming network paths are clear to DNS servers without explicit verification (e.g., firewalls, routing).
- ⚠️**Failing to correlate with recent changes:** Not checking deployment logs or configuration history, which often reveals the trigger.
- ⚠️**Not testing both internal and external resolution:** Focusing only on external domains while missing issues with internal service discovery.
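The scoping steps from the key points and pitfalls above (read the client's `resolv.conf`, test each configured resolver with both an internal and an external name, then try a fallback resolver) can be written as a small decision procedure. This is an added example, not part of the original question card; the sample file, the names queried, the fallback address and the `probe` callback (a stand-in for running `dig @server name +short`) are all assumptions.

```python
# Added example: DNS triage decision logic. All sample data is constructed.
SAMPLE_RESOLV_CONF = """\
# constructed sample, shaped like a Kubernetes pod's /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.96.0.10
options ndots:5 timeout:2
"""

def parse_resolv_conf(text):
    """Return nameservers, search domains and options from resolv.conf text."""
    cfg = {"nameservers": [], "search": [], "options": {}}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0][0] in "#;":   # blank line or comment
            continue
        key, args = fields[0], fields[1:]
        if key == "nameserver" and args:
            cfg["nameservers"].append(args[0])
        elif key in ("search", "domain"):
            cfg["search"] = args                 # the last search/domain line wins
        elif key == "options":
            for opt in args:
                name, _, value = opt.partition(":")
                cfg["options"][name] = value or True
    return cfg

def triage(cfg, probe, internal_name, external_name, fallback):
    """probe(server, name) -> bool is a hypothetical hook for `dig @server name`."""
    if not cfg["nameservers"]:
        return "client config: no nameserver lines in resolv.conf"
    results = {s: (probe(s, internal_name), probe(s, external_name))
               for s in cfg["nameservers"]}
    healthy = [s for s, (i, e) in results.items() if i and e]
    if len(healthy) == len(results):
        return "configured resolvers answer: look at the application or its cache"
    if healthy:
        return "partial outage: only %s answer both names" % ", ".join(healthy)
    if any(i and not e for i, e in results.values()):
        return "internal ok, external fails: check upstream forwarders"
    if any(e and not i for i, e in results.values()):
        return "external ok, internal fails: check internal zone or service discovery"
    if probe(fallback, external_name):
        return "configured resolvers down, fallback answers: check DNS service health"
    return "nothing answers: check the network path to port 53"
```

Running the same `triage` from one pod, several pods and a host separates a single-client problem from a cluster-wide one before anyone touches the DNS service.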
Potential Follow-up Questions
- ❓How to prevent single points of DNS failure?
- ❓What about DNSSEC issues?