Advertisement
Interview Question
Your services suddenly cannot resolve domain names, breaking connectivity to dependencies. Walk me through your triage.
Key Points to Cover
- Test resolution with dig/nslookup from pods/hosts
- Check DNS service health (CoreDNS, Route53, etc.)
- Review config changes in resolv.conf or VPC DNS
- Fallback to alternate resolvers if needed
- Enable DNS cache and set alerts for QPS failures
Evaluation Rubric
Validates DNS resolution directly30% weight
Checks DNS service health/config30% weight
Considers temporary resolver fallback20% weight
Proposes long-term monitoring20% weight
Hints
- 💡DNS outages often stem from upstream providers.
Common Pitfalls to Avoid
- ⚠️**Assuming the scope immediately:** Jumping to conclusions about the cause without first determining the scope (e.g., one pod vs. all pods vs. all hosts).
- ⚠️**Neglecting to check local configuration:** Overlooking `resolv.conf` or equivalent settings on the client side and focusing solely on the server.
- ⚠️**Ignoring network connectivity:** Assuming network paths are clear to DNS servers without explicit verification (e.g., firewalls, routing).
- ⚠️**Failing to correlate with recent changes:** Not checking deployment logs or configuration history, which often reveals the trigger.
- ⚠️**Not testing both internal and external resolution:** Focusing only on external domains while missing issues with internal service discovery.
Potential Follow-up Questions
- ❓How to prevent single points of DNS failure?
- ❓What about DNSSEC issues?
Advertisement
Related Questions
Questions that share similar topics with this one
How DNS Resolution Works
Intermediate📞 Phone Screen•2 min•Phone
DNS A Record vs CNAME
Beginner📞 Phone Screen•2 min•Phone
DNS Propagation Basics
Beginner📞 Phone Screen•2 min•Phone
Design an API Gateway / Edge Layer
Advanced🏗️ System Design•45 min•System-Design
Design a Distributed Caching Layer
Intermediate🏗️ System Design•30 min•System-Design