Reliability

All interview questions related to Reliability

26 Questions
5 Categories
1 Beginner • 14 Intermediate • 11 Advanced
📞 Phone Screen
K8s Readiness vs Liveness Probes
Intermediate

What is the difference between readiness and liveness probes in Kubernetes?

2 min • Phone
🔬 Technical Deep Dive
PostgreSQL Replication Lag Troubleshooting
Advanced

Read replicas are falling minutes behind the primary. How do you diagnose replication lag and remediate it safely?

5 min • Technical
🔬 Technical Deep Dive
Exactly-Once Effects with the Outbox Pattern
Advanced

You need reliable event publication coupled with database writes. Describe how you'd implement the outbox pattern and ensure idempotency end to end.

5 min • Technical
🔬 Technical Deep Dive
Designing Backpressure in Reactive Systems
Advanced

In a streaming system under bursty load, how do you implement backpressure to prevent overload and cascading failures?

5 min • Technical
🔬 Technical Deep Dive
Designing Idempotent APIs
Advanced

What does idempotency mean in APIs, and how would you design idempotent operations in REST or gRPC services?

5 min • Technical
🏗️ System Design
Design a Multi-Channel Notification Service
Intermediate

Design a service to send notifications via email, SMS, and push at scale with retries, templates, and user preferences.

30 min • System-Design
🏗️ System Design
Design an E-commerce Checkout & Cart
Advanced

Design a highly available checkout/cart system handling flash sales, inventory reservations, payments, and order confirmation.

45 min • System-Design
🏗️ System Design
Design a Payment Processing Gateway
Advanced

Design a payment gateway supporting multiple processors, 3-D Secure, refunds, settlements, and PCI concerns.

45 min • System-Design
🏗️ System Design
Design a Distributed Job Scheduler
Intermediate

Design a reliable, horizontally scalable scheduler (distributed cron) that supports one-off and recurring jobs with retries and idempotency.

30 min • System-Design
🏗️ System Design
Design an API Gateway / Edge Layer
Advanced

Design a multi-tenant API gateway that handles routing, auth, rate limiting, request/response transformations, canarying, and observability across regions.

45 min • System-Design
🏗️ System Design
Design a Distributed Caching Layer
Intermediate

Design a distributed cache that supports eviction policies, consistency across nodes, replication, and client-side failover.

30 min • System-Design
🏗️ System Design
Design a Push Subscription & Pub/Sub System
Intermediate

Design a global publish/subscribe system with millions of subscribers, durable delivery, and filtering.

30 min • System-Design
🔧 Troubleshooting Scenarios
DNS Resolution Failure
Intermediate

Your services suddenly cannot resolve domain names, breaking connectivity to dependencies. Walk me through your triage.

10 min • Scenario
🔧 Troubleshooting Scenarios
Service in CrashLoopBackOff
Intermediate

A Kubernetes service keeps restarting with CrashLoopBackOff. How do you debug and resolve this?

10 min • Scenario
🔧 Troubleshooting Scenarios
Network Partition in Distributed System
Advanced

Half your nodes cannot communicate with the other half due to a suspected network partition. How do you investigate and respond?

15 min • Scenario
🔧 Troubleshooting Scenarios
Cloud Provider Outage
Advanced

Your cloud provider reports an ongoing outage in one region, affecting your services. How do you respond?

15 min • Scenario
🔧 Troubleshooting Scenarios
Circuit Breaker Tripped
Intermediate

A critical dependency is failing and your service's circuit breaker has opened. How do you handle this situation?

10 min • Scenario
🔧 Troubleshooting Scenarios
EMFILE: Too Many Open Files
Intermediate

A service starts failing with EMFILE errors. Describe how you identify the cause and fix it.

10 min • Scenario
🔧 Troubleshooting Scenarios
Cache Stampede / Thundering Herd
Intermediate

A cache eviction triggers a surge of requests to the origin, causing overload. How do you diagnose and prevent cache stampede?

10 min • Scenario
🔧 Troubleshooting Scenarios
Container Image Pulls Throttled by Registry
Intermediate

New pods are failing with ImagePullBackOff and registry logs show rate limiting/throttling. How do you restore service quickly and prevent recurrence?

10 min • Scenario
🔧 Troubleshooting Scenarios
Kernel Panic and Node Reboot Loop
Advanced

A production node repeatedly reboots due to kernel panics under load. Outline your triage and containment steps.

15 min • Scenario
🔧 Troubleshooting Scenarios
Health Check Misconfiguration Causing Flapping
Beginner

Instances are flapping in and out of load balancers due to aggressive health checks. How do you detect and fix this without masking real failures?

5 min • Scenario
🔧 Troubleshooting Scenarios
Cloud API Throttling (429) Causing Failures
Intermediate

Background jobs calling a cloud provider API start failing with 429 Too Many Requests during peak hours. How do you stabilize and prevent it?

10 min • Scenario
🔧 Troubleshooting Scenarios
Flaky Integration Tests Blocking Releases
Intermediate

CI pipelines are blocked by flaky integration tests. How do you triage and stabilize pipelines?

10 min • Scenario
🀝 Behavioral & Leadership
Responding to a Data-Loss Near Miss
Advanced

Tell me about a time you discovered a near-miss that could have caused data loss. How did you handle it and what changed?

6 min • Behavioral
🀝 Behavioral & Leadership
Root Cause Analysis of a Failure
Intermediate

Tell me about a time when you performed a root cause analysis after a failure. How did you conduct it, and what changes did you implement afterward?

5 min • Behavioral