Design an ML Feature Store

Interview Question

Design an ML feature store that supports offline feature engineering and online low-latency serving with consistency guarantees.

Key Points to Cover

Evaluation Rubric

Clear registry & versioning strategy (25% weight)

Correctness and skew mitigation (25% weight)

Hot/cold storage & materialization (25% weight)

Serving SLAs and monitoring (25% weight)

Hints

Common Pitfalls to Avoid

⚠️ **Data Latency Mismatches:** Insufficient buffering or delayed event processing in the streaming pipeline can cause the online store to serve stale features while the offline pipeline is already up to date.
⚠️ **Inconsistent Transformation Logic:** Using different codebases or versions for feature transformations in the offline and online pipelines leads to training-serving skew, because the model is served features computed differently from those it was trained on.
⚠️ **Lack of Point-in-Time Correctness in Offline Data:** Failing to reconstruct historical feature states accurately can lead to models being trained on values that were not yet available at the time of the training example, which leaks future information into training.
⚠️ **Schema Drift without Versioning:** Feature schema changes that are not versioned, tracked, and communicated to consumers can break downstream models and pipelines.
⚠️ **Performance Bottlenecks in Online Serving:** Overly complex request-time transformations or inefficient retrieval from the online store can push inference latency beyond what the serving SLA allows.
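
The point-in-time pitfall above is easier to discuss with a concrete sketch. The example below is a minimal illustration, not a reference implementation: the names `point_in_time_lookup`, `build_training_rows`, `feature_history`, and `labels` are hypothetical, and it assumes each entity's feature history is held in memory as a list of `(event_time, value)` pairs sorted by event time. For each training label it selects the latest feature value whose event time is at or before the label timestamp, so no future value can leak into the training row.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_lookup(history, as_of):
    """Return the latest value with event_time <= as_of, or None if none exists.

    Assumption: `history` is a list of (event_time, value) tuples sorted
    ascending by event_time.
    """
    times = [event_time for event_time, _ in history]
    # bisect_right counts how many event times are <= as_of.
    idx = bisect_right(times, as_of)
    return history[idx - 1][1] if idx > 0 else None


def build_training_rows(labels, feature_history):
    """Join each label with the feature value that was known at label time."""
    rows = []
    for entity_id, label_time, label in labels:
        history = feature_history.get(entity_id, [])
        feature_value = point_in_time_lookup(history, label_time)
        rows.append((entity_id, label_time, feature_value, label))
    return rows


# Hypothetical data: a per-user feature that changed on Jan 10.
feature_history = {
    "user_1": [
        (datetime(2024, 1, 1), 3),
        (datetime(2024, 1, 10), 7),
    ],
}

# Hypothetical labels as (entity_id, label_time, label).
labels = [
    ("user_1", datetime(2024, 1, 5), 0),   # should see the Jan 1 value
    ("user_1", datetime(2024, 1, 12), 1),  # should see the Jan 10 value
    ("user_2", datetime(2024, 1, 5), 0),   # no history, so feature is None
]

training_rows = build_training_rows(labels, feature_history)
```

A naive join on entity ID alone would attach the Jan 10 value to the Jan 5 label. In a real offline store the same "as of" semantics would typically be expressed as a time-aware join over a partitioned table rather than an in-memory lookup, but the correctness condition is the same.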

Potential Follow-up Questions