Technical Deep Dive
In-depth technical questions for experienced candidates
A Kubernetes pod is stuck in a restart loop. Walk me through your systematic approach to diagnose and fix this issue.
Your production Kubernetes cluster shows unusually high CPU usage in multiple pods. Walk me through your investigation and mitigation steps.
Explain your approach for designing secure IAM policies following least-privilege principles. How would you audit and enforce them in production?
Your application spans multiple microservices with separate databases. How would you ensure data consistency while maintaining scalability?
Explain how you would secure a CI/CD pipeline to protect against supply chain attacks and credential leaks.
Describe how you would design a Kubernetes architecture for multi-region high availability and low latency.
In a high-traffic microservices system using a distributed cache, how do you handle cache invalidation without breaking consistency?
Describe your approach for performing a zero-downtime database migration in production.
Your workloads face intermittent connectivity failures across regions. Walk through your diagnostic and remediation approach.
How would you design and implement a cost-optimization strategy for a large-scale multi-cloud setup?
How would you secure container runtimes (e.g., Docker, containerd) in production environments?
Explain how you would design and implement database sharding for a large-scale application.
Discuss the advantages and disadvantages of adopting a service mesh (e.g., Istio, Linkerd) in production.
How would you detect, troubleshoot, and mitigate deadlocks in a relational database system?
What are the key design considerations for building resilient, event-driven systems at scale?
Explain different approaches to service discovery in microservices and their trade-offs.
How do you design and optimize database indexes for query performance without over-indexing?
How do you scale a messaging system such as Kafka or RabbitMQ to handle millions of messages per second?
How would you design a disaster recovery (DR) strategy for a critical cloud-hosted application?
How would you design a CI pipeline to minimize build and test time through parallelization?
Explain the roles of metrics, logs, and traces in observability, and how they complement each other.
How would you design a secure and scalable multi-tenant SaaS application?
What considerations would you make when scaling an API gateway for millions of requests per second?
How would you securely store and access application secrets in a cloud-native environment?
Compare data lakes and data warehouses in terms of architecture, use cases, and trade-offs.
What is chaos engineering, and how would you implement it safely in production?
What are the main challenges of hybrid cloud networking, and how would you address them?
What are the key use cases for edge computing, and what architectural considerations apply?
Compare GraphQL and REST APIs in terms of flexibility, performance, and trade-offs.
Your Java services show p99 latency spikes during peak traffic. How would you analyze and tune JVM garbage collection to reduce pause times?
Read replicas are falling minutes behind the primary. How do you diagnose replication lag and remediate it safely?
Compare blue-green and canary deployments. How would you integrate feature flags to reduce risk during production rollouts?
Design a secure multi-tenant Kubernetes setup. How do you isolate workloads and enforce policy across namespaces?
Explain how you would design application-layer encryption using a cloud KMS and envelope encryption for sensitive data.
Contrast WebSockets and gRPC streaming for real-time communication. How do you scale and secure each?
You need reliable event publication coupled with database writes. Describe how you would implement the outbox pattern and ensure idempotency end to end.
In a streaming system under bursty load, how do you implement backpressure to prevent overload and cascading failures?
How do you design outbound (egress) controls for workloads in private subnets without public IPs while maintaining least privilege?
Describe how you would leverage eBPF for deep observability and runtime security in production Linux systems.
A critical API has intermittent p99 latency spikes without increased error rates. How would you isolate the cause and stabilize tail latency?
Explain how you would design and implement distributed tracing in a microservices environment. How do you ensure minimal performance overhead?
What measures would you take to secure the software supply chain from dependency attacks or compromised packages?
How would you design a scalable ETL pipeline for processing terabytes of data daily with low latency?
Describe how you would implement rate limiting in a large-scale API to protect against abuse while ensuring fairness.
You are tasked with breaking a large monolith into microservices. Walk through your migration strategy.
What does idempotency mean in APIs, and how would you design idempotent operations in REST or gRPC services?
Explain the trade-offs of building applications on a serverless architecture. When is it a good fit, and when is it not?
How would you design data partitioning for a system that must handle billions of records with fast queries?
Explain different caching strategies and their trade-offs for high-performance applications.