Advanced, System-Design
45 min

Design an ML Model Serving Platform

ML, Serving, Monitoring, Scalability
Interview Question

Design a platform to serve ML models at scale with versioning, monitoring, A/B testing, and GPU utilization.

Key Points to Cover
  • Model registry/versioning and deployment API
  • Serving infra: CPU vs GPU autoscaling, batching
  • Monitoring: latency, drift, canary and A/B testing
  • Rollbacks, rollouts, and shadow deployments
  • Cost optimization: multi-tenancy, spot GPUs
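The registry, rollout, and rollback points above can be made concrete with a small sketch. This is only an illustrative in-memory model, not a production design: the class name `ModelRegistry`, its methods, and the `s3://` artifact URIs are all hypothetical. It assumes versions are immutable once registered and that "deploying" just moves a pointer, which is what makes rollback cheap.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry: immutable versions plus a 'live' pointer history."""
    versions: dict = field(default_factory=dict)  # (name, version) -> artifact URI
    live: dict = field(default_factory=dict)      # name -> promoted versions, newest last

    def register(self, name: str, artifact_uri: str) -> int:
        # Versions are monotonically increasing per model name and never overwritten.
        version = 1 + sum(1 for (n, _) in self.versions if n == name)
        self.versions[(name, version)] = artifact_uri
        return version

    def promote(self, name: str, version: int) -> None:
        # Rollout: point live traffic at an already-registered version.
        if (name, version) not in self.versions:
            raise KeyError(f"{name} v{version} is not registered")
        self.live.setdefault(name, []).append(version)

    def rollback(self, name: str) -> int:
        # Rollback: drop the newest promotion and return the version now live.
        history = self.live.get(name, [])
        if len(history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        history.pop()
        return history[-1]

    def resolve(self, name: str) -> str:
        # Serving nodes call this to find which artifact to load.
        return self.versions[(name, self.live[name][-1])]


registry = ModelRegistry()
v1 = registry.register("ranker", "s3://models/ranker/1")  # hypothetical URI
v2 = registry.register("ranker", "s3://models/ranker/2")  # hypothetical URI
registry.promote("ranker", v1)
registry.promote("ranker", v2)
live_after_rollback = registry.rollback("ranker")
```

In an interview answer, the same idea would typically be backed by a database and an object store, with the promotion history doubling as an audit log.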
Evaluation Rubric
Clear registry & versioning strategy: 25% weight
Efficient GPU/CPU serving design: 25% weight
Strong monitoring and rollback plan: 25% weight
Cost optimization and tenancy handling: 25% weight
Hints
  • 💡 Batching inference requests improves throughput, especially on GPUs, at the cost of some added per-request latency.
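A minimal sketch of the batching hint, assuming requests have already been queued: it groups them into chunks of at most `max_batch_size` and calls the model once per chunk instead of once per request. The function name `run_batched` and the toy `double_all` model are hypothetical. A real server would usually also flush a partial batch after a short timeout so that latency stays bounded under light load; that part is omitted here.

```python
from typing import Callable, List, Sequence, Tuple


def run_batched(
    requests: Sequence[float],
    predict_batch: Callable[[Sequence[float]], List[float]],
    max_batch_size: int = 8,
) -> Tuple[List[float], int]:
    """Run queued requests through the model in chunks; return results and call count."""
    results: List[float] = []
    model_calls = 0
    for start in range(0, len(requests), max_batch_size):
        batch = requests[start:start + max_batch_size]
        results.extend(predict_batch(batch))  # one model invocation per batch
        model_calls += 1
    return results, model_calls


def double_all(batch: Sequence[float]) -> List[float]:
    # Stand-in for a real model's batched forward pass (hypothetical).
    return [x * 2 for x in batch]


outputs, calls = run_batched(list(range(20)), double_all, max_batch_size=8)
```

Here 20 requests are served with 3 model calls rather than 20, which is where the throughput gain comes from when per-call overhead dominates.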
Potential Follow-up Questions
  • How do you support online vs offline inference?
  • How do you ensure fairness in A/B testing?
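One ingredient often raised for the fairness follow-up is deterministic, hash-based assignment, so that each user always sees the same variant and the split does not depend on request timing or on which server handled the call. The sketch below is one possible approach under that assumption; the function name `assign_variant`, the experiment name, and the user ID format are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Map a user to 'A' (control) or 'B' (treatment) deterministically."""
    # Salting with the experiment name keeps assignments independent across experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "B" if bucket < treatment_share else "A"


assignments = [assign_variant(f"user-{i}", "ranker-v2-test") for i in range(10_000)]
share_b = assignments.count("B") / len(assignments)
```

With a 0.5 treatment share, `share_b` should come out close to one half, and rerunning the assignment for the same user returns the same variant.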