Advanced | System Design
45 min
Design a URL Shortener (TinyURL)
System Design, Databases, Caching, Networking, Security
Interview Question
Design a globally available URL shortener like TinyURL/Bitly. Cover API design, key generation, storage, redirects, analytics, abuse prevention, and scalability.
Key Points to Cover
- API: create/resolve endpoints, rate limits, auth for premium features
- Key generation: base62 IDs, collision avoidance, Snowflake/KSUID-style IDs, custom aliases
- Storage: KV store (hot) + relational/log for analytics; TTL/archival
- Performance: CDN edge redirects, caching, read-heavy optimization
- Reliability: multi-region replication, eventual consistency trade-offs
- Abuse: spam/phishing detection, domain allow/deny lists, quotas
- Analytics: click tracking, unique visitors, geo/UA aggregation
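The base62 ID strategy named above can be illustrated with a short sketch. It assumes each new link receives a unique non-negative integer ID from some upstream generator (a database sequence or a Snowflake-style service, neither shown here); the function names and the alphabet ordering are illustrative choices, not a fixed standard.

```python
import string

# 62 symbols: 0-9, a-z, A-Z. The ordering is an arbitrary choice for this
# sketch; any fixed ordering works as long as encode and decode agree.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def base62_encode(n: int) -> str:
    """Encode a non-negative integer ID as a base62 short code."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(chars))


def base62_decode(code: str) -> int:
    """Decode a base62 short code back to its integer ID."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)  # raises ValueError on bad chars
    return n
```

Seven base62 characters cover 62^7 (about 3.5 trillion) distinct IDs. Because every unique ID maps to a unique code, collisions are avoided by construction, but codes derived from sequential IDs are guessable, which is worth raising in the interview.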
Evaluation Rubric
- Clear API and key/ID strategy: 25% weight
- Efficient storage and caching plan: 25% weight
- Global scale, replication, and latency: 25% weight
- Abuse prevention and analytics: 25% weight
Hints
- 💡Consider serving the read path (redirects) at the CDN edge and handling the write path (link creation) in regional data centers.
Common Pitfalls to Avoid
- ⚠️**Single Point of Failure in Key Generation/Database:** Relying on a single, non-distributed mechanism for generating unique short codes or storing the core mapping without adequate replication/sharding.
- ⚠️**Synchronous Analytics Processing:** Performing click logging and analytics processing in the critical redirect path, leading to high latency and scalability issues under heavy load.
- ⚠️**Insufficient Abuse Prevention:** Neglecting measures such as URL deny lists, content scanning, and rate limiting, which opens the service to spam, phishing, and malware and erodes trust in the short domain.
- ⚠️**Inadequate Storage for High-Volume Reads:** Using a traditional relational database for the `shortCode` -> `longUrl` mapping without proper caching or a dedicated high-throughput KV store, leading to performance bottlenecks during high redirect volumes.
- ⚠️**Neglecting Global Distribution & Latency:** Failing to design for multi-region deployment, global load balancing, and data replication, resulting in high latency for users geographically distant from the primary data center.
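Two of the pitfalls above (synchronous analytics and an uncached mapping store) can be avoided together on the redirect path. The following is a minimal single-process sketch, not a production design: `RedirectService` is a hypothetical class, the plain dict stands in for a distributed cache such as Redis, `store` stands in for the KV database, and the in-memory queue stands in for a durable log such as Kafka.

```python
import queue
import time


class RedirectService:
    """Cache-aside lookup with click events recorded off the critical path."""

    def __init__(self, store, max_events=10_000):
        self.store = store          # stand-in for the shortCode -> longUrl KV store
        self.cache = {}             # stand-in for a shared cache tier
        self.events = queue.Queue(maxsize=max_events)

    def resolve(self, short_code, user_agent=""):
        # 1. Cache-aside read: try the cache, fall back to the store.
        long_url = self.cache.get(short_code)
        if long_url is None:
            long_url = self.store.get(short_code)
            if long_url is None:
                return None         # caller would answer with HTTP 404
            self.cache[short_code] = long_url

        # 2. Enqueue the click without blocking. If the buffer is full,
        #    drop the event rather than delay the redirect.
        try:
            self.events.put_nowait(
                {"code": short_code, "ts": time.time(), "ua": user_agent}
            )
        except queue.Full:
            pass
        return long_url

    def drain(self, batch_size=100):
        """Called by a background worker to collect events for aggregation."""
        batch = []
        while len(batch) < batch_size:
            try:
                batch.append(self.events.get_nowait())
            except queue.Empty:
                break
        return batch
```

Dropping events when the buffer is full trades analytics completeness for redirect latency. Whether that trade is acceptable depends on how the click data is used (billing-grade counts would need a durable queue instead).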
Potential Follow-up Questions
- ❓How do you handle custom domains?
- ❓How would you prevent enumeration of short codes (for example, walking sequential IDs)?
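One possible answer to the enumeration question is to issue random codes from a cryptographically secure source instead of encoding a sequential counter. The sketch below assumes an in-memory set as a stand-in for a uniqueness check against the real datastore (in practice a conditional write); `random_code` and `create_unique_code` are hypothetical helper names.

```python
import secrets
import string

ALPHABET = string.digits + string.ascii_letters  # 62 symbols


def random_code(length: int = 8) -> str:
    """Return an unpredictable short code drawn from the OS CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def create_unique_code(existing: set, length: int = 8, max_attempts: int = 5) -> str:
    """Pick a random code not already in use, retrying on collision.

    `existing` is a stand-in for the datastore; a real service would use
    an atomic insert-if-absent instead of check-then-add.
    """
    for _ in range(max_attempts):
        code = random_code(length)
        if code not in existing:
            existing.add(code)
            return code
    raise RuntimeError("could not find a free code; consider a longer length")
```

With 8 characters there are 62^8 (roughly 2.2 x 10^14) possible codes, so a guess hits a live link only in proportion to how densely the space is populated. Random codes are usually paired with rate limiting on the resolve endpoint so that bulk guessing stays impractical.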