Advanced, System-Design
45 min
Design an ML Feature Store
ML, Data, Consistency, Streaming
Interview Question
Design an ML feature store that supports offline feature engineering and online low-latency serving with consistency guarantees.
Key Points to Cover
- Feature registry & schema/versioning; lineage and governance
- Offline pipeline (batch) and online pipeline (stream) materialization
- Consistency: point-in-time correctness, training/serving skew reduction
- Storage: offline (data lake) vs online (KV/Redis) with TTL/backfills
- Serving APIs, caching, and multi-tenant quotas/SLA
- Monitoring: feature drift, nulls, and freshness
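One way to make the registry and versioning point concrete is a small in-memory sketch. Everything here is an assumption for illustration: the `FeatureDef` fields, the `FeatureRegistry` class, and the rule that a published version is immutable are hypothetical choices, not the API of any particular feature store.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class FeatureDef:
    """Hypothetical feature definition; field names are illustrative."""
    name: str
    version: int
    dtype: str        # e.g. "int", "float", "string"
    source: str       # upstream table or topic, kept for lineage
    ttl_seconds: int  # how long an online value is considered valid


class FeatureRegistry:
    """Toy in-memory registry keyed by (name, version)."""

    def __init__(self) -> None:
        self._defs: Dict[Tuple[str, int], FeatureDef] = {}

    def register(self, fd: FeatureDef) -> None:
        key = (fd.name, fd.version)
        existing = self._defs.get(key)
        if existing is not None and existing != fd:
            # Assumed policy: a published version is immutable,
            # so any schema change must be registered as a new version.
            raise ValueError(
                f"{fd.name} v{fd.version} is already registered "
                "with a different definition"
            )
        self._defs[key] = fd  # re-registering an identical definition is a no-op

    def get(self, name: str, version: int) -> FeatureDef:
        return self._defs[(name, version)]

    def latest(self, name: str) -> FeatureDef:
        versions = [v for (n, v) in self._defs if n == name]
        if not versions:
            raise KeyError(name)
        return self._defs[(name, max(versions))]
```

Pinning models to an explicit `(name, version)` pair rather than to `latest` is one way to keep a schema change from silently breaking downstream consumers.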
Evaluation Rubric
Clear registry & versioning strategy (25% weight)
Correctness and skew mitigation (25% weight)
Hot/cold storage & materialization (25% weight)
Serving SLAs and monitoring (25% weight)
Hints
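- 💡Point-in-time joins are essential for correctness: each training row should see only the feature values that were known at or before its label timestamp.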
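A point-in-time join can be sketched with a binary search over each entity's feature history. This is a minimal illustration under stated assumptions: the `FeatureLog` layout, the integer timestamps, and the function names are hypothetical, and treating a value stamped exactly at the label time as visible is a design choice that a real system might make differently.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# Assumed layout: per-entity list of (event_time, value), sorted by event_time.
FeatureLog = Dict[str, List[Tuple[int, float]]]
LabelRow = Tuple[str, int, int]  # (entity_id, label_time, label)


def point_in_time_lookup(
    log: FeatureLog, entity_id: str, as_of: int
) -> Optional[float]:
    """Return the latest value with event_time <= as_of, or None if none exists."""
    history = log.get(entity_id, [])
    times = [t for t, _ in history]
    idx = bisect_right(times, as_of)  # number of entries with event_time <= as_of
    if idx == 0:
        return None  # no value was known yet at as_of
    return history[idx - 1][1]


def build_training_rows(
    labels: List[LabelRow], log: FeatureLog
) -> List[Tuple[str, int, Optional[float], int]]:
    """Attach to each label the feature value as of that label's timestamp."""
    return [
        (entity_id, label_time, point_in_time_lookup(log, entity_id, label_time), label)
        for entity_id, label_time, label in labels
    ]
```

A naive join on entity ID alone would attach the most recent value to every row, which is exactly the leakage the hint warns about.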
Common Pitfalls to Avoid
- ⚠️**Data Latency Mismatches:** Delayed or poorly buffered event processing in the streaming pipeline can leave the online store serving stale features while the offline store is already up to date.
- ⚠️**Inconsistent Transformation Logic:** Using different codebases or versions for feature transformations in offline and online pipelines will inevitably lead to training-serving skew.
- ⚠️**Lack of Point-in-Time Correctness in Offline Data:** Failing to reconstruct historical feature state accurately lets models train on values that were not yet available at the label's timestamp, which leaks future information into training.
- ⚠️**Schema Drift without Versioning:** Unmanaged changes to feature schemas can break downstream models and pipelines without proper tracking and notification.
- ⚠️**Performance Bottlenecks in Online Serving:** Overly complex transformations or inefficient data retrieval from the online store can result in unacceptable inference latency.
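The transformation-logic pitfall is commonly addressed by having both pipelines import one shared function. The sketch below assumes a hypothetical `amount` field and made-up derived features; the dictionary standing in for the online store and the `TRANSFORM_VERSION` tag are likewise illustrative.

```python
import math
from typing import Dict, Iterable, List

TRANSFORM_VERSION = "v1"  # hypothetical tag to record alongside materialized values


def transform(raw: Dict[str, float]) -> Dict[str, float]:
    """Single definition of the feature logic, shared by batch and stream paths."""
    amount = max(raw.get("amount", 0.0), 0.0)  # missing or negative treated as 0
    return {
        "log_amount": math.log1p(amount),
        "is_large": 1.0 if amount >= 1000.0 else 0.0,
    }


def batch_materialize(rows: Iterable[Dict[str, float]]) -> List[Dict[str, float]]:
    """Offline path: apply the shared transform to historical rows."""
    return [transform(r) for r in rows]


def on_stream_event(
    event: Dict[str, float],
    online_store: Dict[str, Dict[str, float]],
    key: str,
) -> None:
    """Online path: apply the same transform and write to the online store."""
    online_store[key] = transform(event)
```

Because both paths call `transform`, the same raw input yields identical features offline and online. Keeping `transform` cheap also bears on the serving-latency pitfall, since heavy work on the request path adds directly to inference time.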
Potential Follow-up Questions
- ❓How do you deprecate a feature safely?
- ❓How do you backfill online features?
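For the backfill question, one possible answer is to guard every online write by event time so that replayed historical rows never overwrite fresher values from the stream. The `OnlineStore` class below is a toy in-memory stand-in (a real deployment would more likely use Redis or a similar KV store), and its method names and TTL handling are assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple


class OnlineStore:
    """Toy in-memory KV store mapping key -> (event_time, value)."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, float]] = {}

    def put_if_newer(self, key: str, event_time: int, value: float) -> bool:
        """Write only if the incoming event_time is strictly newer."""
        current = self._data.get(key)
        if current is not None and current[0] >= event_time:
            return False  # keep the existing, fresher (or equal-time) value
        self._data[key] = (event_time, value)
        return True

    def get(self, key: str, now: int, ttl: int) -> Optional[float]:
        """Return the value unless it is missing or older than ttl."""
        entry = self._data.get(key)
        if entry is None or now - entry[0] > ttl:
            return None
        return entry[1]


def backfill(store: OnlineStore, rows: Iterable[Tuple[str, int, float]]) -> int:
    """Replay (key, event_time, value) rows; return how many were written."""
    written = 0
    for key, event_time, value in rows:
        if store.put_if_newer(key, event_time, value):
            written += 1
    return written
```

With this guard the backfill is idempotent and can run while the streaming pipeline is live. Values that are already past their TTL are still written but read back as missing, so a candidate might also discuss skipping them up front.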