Advanced | System Design
45 min
Design an ML Model Serving Platform
Tags: ML Serving, Monitoring, Scalability
Interview Question
Design a platform to serve ML models at scale, covering model versioning, monitoring, A/B testing, and efficient GPU utilization.
Key Points to Cover
- Model registry/versioning and deployment API
- Serving infra: CPU vs GPU autoscaling, batching
- Monitoring: latency, drift, canary/A-B testing
- Rollbacks, rollouts, and shadow deployments
- Cost optimization: multi-tenancy, spot GPUs
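A model registry is the backbone of the versioning, rollout, and rollback points above. The sketch below is a minimal in-memory illustration (the class and method names are hypothetical, not a specific product's API); a real registry would back this with a database and store artifacts in blob storage.

```python
import time
from dataclasses import dataclass, field

@dataclass
class ModelVersion:
    name: str
    version: int
    artifact_uri: str  # e.g. a blob-storage path (illustrative)
    created_at: float = field(default_factory=time.time)

class ModelRegistry:
    """Tracks immutable versions per model and which one is live,
    which makes promotion and rollback simple pointer updates."""

    def __init__(self):
        self._versions = {}  # name -> list[ModelVersion]
        self._live = {}      # name -> version number currently serving

    def register(self, name, artifact_uri):
        """Append a new immutable version; returns its version number."""
        versions = self._versions.setdefault(name, [])
        mv = ModelVersion(name, len(versions) + 1, artifact_uri)
        versions.append(mv)
        return mv.version

    def promote(self, name, version):
        """Point live traffic at an already-registered version."""
        if not 1 <= version <= len(self._versions.get(name, [])):
            raise ValueError(f"unknown version {version} for {name}")
        self._live[name] = version

    def rollback(self, name):
        """Revert to the previous version (rollback is just re-pointing)."""
        self._live[name] = max(1, self._live[name] - 1)

    def live(self, name):
        return self._versions[name][self._live[name] - 1]
```

Because versions are immutable and "live" is just a pointer, rollback is instant and never requires re-uploading artifacts.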
Evaluation Rubric
- Clear registry & versioning strategy (25%)
- Efficient GPU/CPU serving design (25%)
- Strong monitoring and rollback plan (25%)
- Cost optimization and tenancy handling (25%)
Hints
- 💡 Batching inference requests amortizes per-request overhead and keeps the GPU busy, trading a small amount of latency for much higher throughput.
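Dynamic (micro-)batching is the standard way to realize that throughput gain behind a per-request API: collect requests until a batch fills or a short deadline expires, then run one batched forward pass. A minimal sketch, assuming a user-supplied `infer_batch` callable standing in for the model:

```python
import queue
import threading
import time

class MicroBatcher:
    """Groups individual requests into batches so one model call
    (e.g. a single GPU forward pass) serves many callers."""

    def __init__(self, infer_batch, max_batch=8, max_wait_s=0.01):
        self._q = queue.Queue()
        self._infer_batch = infer_batch  # callable: list[input] -> list[output]
        self._max_batch = max_batch
        self._max_wait_s = max_wait_s
        threading.Thread(target=self._loop, daemon=True).start()

    def infer(self, x):
        """Blocking single-request API; internally served as part of a batch."""
        slot = {"input": x, "done": threading.Event()}
        self._q.put(slot)
        slot["done"].wait()
        return slot["output"]

    def _loop(self):
        while True:
            batch = [self._q.get()]  # block until at least one request arrives
            deadline = time.monotonic() + self._max_wait_s
            # Fill the batch until it is full or the latency budget expires.
            while len(batch) < self._max_batch:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self._q.get(timeout=remaining))
                except queue.Empty:
                    break
            outputs = self._infer_batch([s["input"] for s in batch])
            for slot, out in zip(batch, outputs):
                slot["output"] = out
                slot["done"].set()
```

The `max_wait_s` deadline bounds the latency cost of batching; tuning it against `max_batch` is exactly the latency/throughput trade-off an interviewer will probe.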
Potential Follow-up Questions
- ❓How do you support online vs offline inference?
- ❓How do you ensure fairness in A/B testing?
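One common answer to the A/B fairness question is deterministic bucketing: hash the user ID together with an experiment key so each user sees a stable variant across requests, and assignments in different experiments are uncorrelated. A small sketch (function name and 50/50 split are illustrative):

```python
import hashlib

def ab_bucket(user_id: str, experiment: str, treatment_pct: int = 50) -> str:
    """Deterministically assign a user to 'control' or 'treatment'.

    Hashing (experiment, user_id) keeps each user's assignment stable
    across requests and independent across experiments, avoiding the
    correlated cohorts you get from reusing one global hash."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # uniform-ish value in [0, 100)
    return "treatment" if bucket < treatment_pct else "control"
```

Stable assignment also matters operationally: a user who flips between model versions mid-session produces noisy metrics and inconsistent behavior.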