Design a multi-tenant API gateway that handles routing, auth, rate limiting, request/response transformations, canarying, and observability across regions.

45 min•System-Design

🏗️ System Design

Design a Distributed Caching Layer

Intermediate

Caching Consistency Networking

Design a distributed cache that supports eviction policies, consistency across nodes, replication, and client-side failover.

30 min•System-Design

🏗️ System Design

Design a Push Subscription & Pub/Sub System

Intermediate

Messaging Scalability Reliability

Design a global publish/subscribe system with millions of subscribers, durable delivery, and filtering.

30 min•System-Design

🔧 Troubleshooting Scenarios

DNS Resolution Failure

Intermediate

DNS Networking Reliability

Your services suddenly cannot resolve domain names, breaking connectivity to dependencies. Walk me through your triage.

10 min•Scenario

🔧 Troubleshooting Scenarios

Service in CrashLoopBackOff

Intermediate

Kubernetes Containers Reliability

A Kubernetes service keeps restarting with CrashLoopBackOff. How do you debug and resolve this?

10 min•Scenario

🔧 Troubleshooting Scenarios

Network Partition in Distributed System

Advanced

Networking Distributed Systems Reliability

Half your nodes cannot communicate with the other half due to a suspected network partition. How do you investigate and respond?

15 min•Scenario

🔧 Troubleshooting Scenarios

Cloud Provider Outage

Advanced

Cloud Reliability Incident Response

Your cloud provider reports an ongoing outage in one region, affecting your services. How do you respond?

15 min•Scenario

🔧 Troubleshooting Scenarios

Circuit Breaker Tripped

Intermediate

Resilience Reliability Microservices

A critical dependency is failing and your service’s circuit breaker has opened. How do you handle this situation?

10 min•Scenario

🔧 Troubleshooting Scenarios

EMFILE: Too Many Open Files

Intermediate

Linux Limits Reliability

A service starts failing with EMFILE errors. Describe how you identify the cause and fix it.

10 min•Scenario

🔧 Troubleshooting Scenarios

Cache Stampede / Thundering Herd

Intermediate

Caching Performance Reliability

A cache eviction triggers a surge of requests to the origin, causing overload. How do you diagnose and prevent cache stampede?

10 min•Scenario

🔧 Troubleshooting Scenarios

Container Image Pulls Throttled by Registry

Intermediate

Kubernetes Containers Supply Chain

New pods are failing with ImagePullBackOff and registry logs show rate limiting/throttling. How do you restore service quickly and prevent recurrence?

10 min•Scenario

🔧 Troubleshooting Scenarios

Kernel Panic and Node Reboot Loop

Advanced

Linux Kernel Reliability

A production node repeatedly reboots due to kernel panics under load. Outline your triage and containment steps.

15 min•Scenario

🔧 Troubleshooting Scenarios

Health Check Misconfiguration Causing Flapping

Beginner