Advanced · System Design
45 min
Design Ad Impression & Click Tracking
Streaming · Fraud · Data · Privacy
Interview Question
Design an ad event pipeline for impressions, clicks, and conversions with deduplication, fraud detection, and near-real-time reporting.
Key Points to Cover
- Event schema, idempotency keys, and late/out-of-order handling
- Streaming ingestion (Kafka) with exactly-once processing
- Attribution windows and join with conversion signals
- Fraud signals: velocity, device/UA, IP reputation, bots
- Privacy compliance and data retention/consent
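The first key point (schema, idempotency keys, late events) can be sketched as follows. This is a minimal sketch, assuming the client SDK generates a UUID (`client_event_id`) once per event and reuses it on every retry. The names `AdEvent`, `idempotency_key`, and `Deduplicator` are hypothetical, and the in-memory dict stands in for the stream processor's fault-tolerant keyed state. Dedupe state is bounded by evicting keys once they fall behind the watermark: any later duplicate of an evicted key is itself older than the watermark, so it is classified "late" and cannot be double-counted.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class AdEvent:
    event_type: str       # "impression" | "click" | "conversion"
    ad_id: str
    device_id: str        # assumed already pseudonymized upstream
    client_event_id: str  # assumed: UUID set once by the SDK, reused on retries
    event_time_ms: int    # when the event happened on the device


def idempotency_key(e: AdEvent) -> str:
    # Stable across client retries because every field is fixed at event creation.
    raw = f"{e.event_type}|{e.ad_id}|{e.device_id}|{e.client_event_id}"
    return hashlib.sha256(raw.encode()).hexdigest()


class Deduplicator:
    """Drops duplicates and separates events that arrive after the allowed lateness."""

    def __init__(self, allowed_lateness_ms: int):
        self.allowed_lateness_ms = allowed_lateness_ms
        self.watermark_ms = 0
        self.seen: dict[str, int] = {}  # idempotency key -> event_time_ms

    def process(self, e: AdEvent) -> str:
        # Watermark = highest event time seen so far minus the allowed lateness.
        self.watermark_ms = max(self.watermark_ms, e.event_time_ms - self.allowed_lateness_ms)
        if e.event_time_ms < self.watermark_ms:
            return "late"  # e.g. route to a side output for batch correction
        key = idempotency_key(e)
        if key in self.seen:
            return "duplicate"
        self.seen[key] = e.event_time_ms
        self._evict()
        return "accepted"

    def _evict(self) -> None:
        # Keys behind the watermark can be forgotten: their duplicates would be "late".
        for k in [k for k, t in self.seen.items() if t < self.watermark_ms]:
            del self.seen[k]
```

Usage: a resent click with the same `client_event_id` returns `"duplicate"`, while an event older than the watermark returns `"late"` instead of silently updating already-published counts.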
Evaluation Rubric
Sound event model & dedupe strategy: 25% weight
Reliable streaming & joins: 25% weight
Effective fraud detection approach: 25% weight
Privacy/regulatory considerations: 25% weight
Hints
- 💡Use the transactional outbox pattern so that events committed to the source database are not lost before they reach the stream.
Common Pitfalls to Avoid
- ⚠️**Underestimating State Management Complexity:** Deduplication and fraud detection both depend on large keyed state (seen event IDs, per-source counters) that must be partitioned, checkpointed, and restored in a distributed stream processor. Poorly designed state management can lead to data inconsistencies or performance bottlenecks.
- ⚠️**Inaccurate Idempotency Keys:** If an idempotency key changes when a client retries, duplicates slip through; if two distinct events can share a key, legitimate events are mistakenly dropped.
- ⚠️**Real-time Fraud Detection Accuracy vs. Latency Trade-offs:** Highly accurate real-time fraud detection often involves computationally intensive models, which can increase latency. Finding the right balance is critical to avoid impacting reporting speed.
- ⚠️**Ignoring Data Privacy and GDPR Compliance:** Ensuring that Personally Identifiable Information (PII) is handled securely and anonymized appropriately throughout the pipeline is crucial, especially when dealing with event data and user behavior.
- ⚠️**Scalability Bottlenecks in Downstream Systems:** While the streaming ingestion might be highly scalable, if the analytical data stores or reporting layers cannot keep up with the processed data volume, near-real-time reporting will be compromised.
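Several of these pitfalls meet in one place: a cheap, low-latency fraud rule needs per-source state, and that state should not hold raw PII. A minimal sketch of a sliding-window click-velocity rule keyed on an HMAC-pseudonymized IP follows. `ClickVelocityRule` and its thresholds are hypothetical and illustrative only, timestamps are assumed to arrive in order per IP, and idle keys are never evicted here (a real job would expire them). Note that keyed hashing is pseudonymization, not anonymization: under GDPR the hashed value is still personal data.

```python
import hashlib
import hmac
from collections import defaultdict, deque


class ClickVelocityRule:
    """Flags a source that sends more than max_clicks clicks within window_ms."""

    def __init__(self, secret_key: bytes, max_clicks: int, window_ms: int):
        self.secret_key = secret_key
        self.max_clicks = max_clicks
        self.window_ms = window_ms
        self.clicks = defaultdict(deque)  # pseudonymized IP -> recent click timestamps

    def pseudonymize(self, ip: str) -> str:
        # Keyed hash: raw IPs never enter fraud state; rotating the key limits linkability.
        return hmac.new(self.secret_key, ip.encode(), hashlib.sha256).hexdigest()

    def is_suspicious(self, ip: str, ts_ms: int) -> bool:
        q = self.clicks[self.pseudonymize(ip)]
        q.append(ts_ms)
        # Drop timestamps that have slid out of the window.
        while q and q[0] <= ts_ms - self.window_ms:
            q.popleft()
        return len(q) > self.max_clicks
```

Rules like this run inline in microseconds, so heavier model-based scoring can run asynchronously without delaying near-real-time reports.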
Potential Follow-up Questions
- ❓How would you evaluate probabilistic attribution?
- ❓How do you combat click injection?
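Attribution windows and the click-injection question connect through timing. Click injection fires a fake click just before an install or conversion so that it wins last-click attribution, and an unusually short click-to-conversion gap is one commonly cited signal. A minimal sketch, assuming last-click attribution on a shared device ID; `attribute`, `window_ms`, and `min_ctit_ms` are hypothetical names and the thresholds are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Click:
    click_id: str
    device_id: str
    ts_ms: int


@dataclass
class Attribution:
    click_id: Optional[str]
    suspected_injection: bool


def attribute(conversion_device: str, conversion_ts_ms: int, clicks: List[Click],
              window_ms: int, min_ctit_ms: int) -> Attribution:
    """Last-click attribution within a lookback window, flagging implausibly short gaps."""
    candidates = [
        c for c in clicks
        if c.device_id == conversion_device
        and conversion_ts_ms - window_ms <= c.ts_ms <= conversion_ts_ms
    ]
    if not candidates:
        return Attribution(None, False)  # organic: no click inside the window
    last = max(candidates, key=lambda c: c.ts_ms)
    # A click landing just before the conversion is a typical click-injection pattern.
    return Attribution(last.click_id, conversion_ts_ms - last.ts_ms < min_ctit_ms)
```

In a streaming job the `clicks` list would be the click state retained for the length of the attribution window, which is what makes the window size a direct driver of state cost.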