Intermediate · Scenario
10 min

DNS Resolution Failure

DNS · Networking · Reliability
Interview Question

Your services suddenly cannot resolve domain names, breaking connectivity to dependencies. Walk me through your triage.

Key Points to Cover
  • Test resolution with dig/nslookup from pods/hosts
  • Check DNS service health (CoreDNS, Route53, etc.)
  • Review config changes in resolv.conf or VPC DNS
  • Fall back to alternate resolvers if needed
  • Enable DNS caching and set alerts on query failure rates and QPS anomalies
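
As a rough illustration of the first triage step, the sketch below resolves a mix of internal and external names through the system resolver and classifies which group is failing. It is a minimal sketch, not a prescribed tool: the function names `check_names` and `classify` are hypothetical, and it uses only Python's standard `socket.getaddrinfo`, so it exercises the same resolver path (`resolv.conf`, search domains) that an application on that host or pod would use.

```python
import socket
import time


def check_names(names, resolve=socket.getaddrinfo):
    """Resolve each name via the system resolver; record outcome and latency.

    `resolve` is injectable so the logic can be exercised without a network.
    """
    results = {}
    for name in names:
        start = time.monotonic()
        try:
            infos = resolve(name, None)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the resolved address.
            addrs = sorted({info[4][0] for info in infos})
            results[name] = {"ok": True, "addrs": addrs, "error": None}
        except socket.gaierror as exc:
            results[name] = {"ok": False, "addrs": [], "error": str(exc)}
        results[name]["seconds"] = round(time.monotonic() - start, 3)
    return results


def classify(results, internal):
    """Summarize which group of names is failing (hypothetical labels)."""
    failed = {name for name, r in results.items() if not r["ok"]}
    if not failed:
        return "none"
    if failed == set(results):
        return "all"            # suspect the resolver itself or the path to it
    if failed <= set(internal):
        return "internal-only"  # suspect cluster/internal DNS (e.g., CoreDNS)
    if failed.isdisjoint(internal):
        return "external-only"  # suspect upstream forwarders or egress
    return "mixed"
```

Running `check_names` against one internal service name and one public name from an affected pod, and again from a healthy host, gives a quick read on scope before digging into the DNS service itself. The `seconds` field helps separate slow timeouts from fast NXDOMAIN-style failures.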
Evaluation Rubric
Validates DNS resolution directly (30% weight)
Checks DNS service health/config (30% weight)
Considers temporary resolver fallback (20% weight)
Proposes long-term monitoring (20% weight)
Hints
  • 💡DNS outages often stem from upstream providers.
Common Pitfalls to Avoid
  • ⚠️**Assuming the cause before scoping:** Jumping to conclusions about the cause without first determining the scope (e.g., one pod vs. all pods vs. all hosts).
  • ⚠️**Neglecting to check local configuration:** Overlooking `resolv.conf` or equivalent settings on the client side and focusing solely on the server.
  • ⚠️**Ignoring network connectivity:** Assuming network paths are clear to DNS servers without explicit verification (e.g., firewalls, routing).
  • ⚠️**Failing to correlate with recent changes:** Not checking deployment logs or configuration history, which often reveals the trigger.
  • ⚠️**Not testing both internal and external resolution:** Focusing only on external domains while missing issues with internal service discovery.
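
To make the client-side configuration check concrete, here is a minimal sketch of a `resolv.conf` inspector. The helper names `parse_resolv_conf` and `find_issues` are hypothetical, and the checks are deliberately simple: a real review would also compare the file against the last known-good version and confirm each listed nameserver is reachable.

```python
def parse_resolv_conf(text):
    """Parse resolv.conf-style text into nameservers, search list, and options."""
    cfg = {"nameservers": [], "search": [], "options": {}}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments ('#' or ';' at the start of a line).
        if not line or line[0] in "#;":
            continue
        parts = line.split()
        keyword, args = parts[0], parts[1:]
        if keyword == "nameserver" and args:
            cfg["nameservers"].append(args[0])
        elif keyword == "search":
            cfg["search"] = args  # the last search line replaces earlier ones
        elif keyword == "options":
            for opt in args:
                # Options look like "ndots:5" or a bare flag such as "rotate".
                key, _, value = opt.partition(":")
                cfg["options"][key] = value
    return cfg


def find_issues(cfg):
    """Return a list of obvious problems; an empty list means none were found."""
    issues = []
    if not cfg["nameservers"]:
        issues.append("no nameserver entries")
    if len(cfg["nameservers"]) > 3:
        # Assumption: a glibc-based client, which uses at most three entries.
        issues.append("more than 3 nameservers (glibc ignores the extras)")
    return issues
```

Feeding it the contents of `/etc/resolv.conf` from an affected pod (for example via `open("/etc/resolv.conf").read()`) shows at a glance which resolvers and search domains the client is actually using, which is worth confirming before concluding the DNS server is at fault.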
Potential Follow-up Questions
  • How would you prevent single points of failure in DNS?
  • What about DNSSEC issues?