Advanced | System Design | 45 min

Design Ad Impression & Click Tracking

Streaming | Fraud | Data Privacy
Interview Question

Design an ad event pipeline for impressions, clicks, and conversions with deduplication, fraud detection, and near-real-time reporting.

Key Points to Cover
  • Event schema, idempotency keys, and late/out-of-order handling
  • Streaming ingestion (Kafka) with exactly-once processing
  • Attribution windows and join with conversion signals
  • Fraud signals: velocity, device/UA, IP reputation, bots
  • Privacy compliance and data retention/consent
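As a rough illustration of the attribution-window point above, the sketch below assigns a conversion to the most recent earlier click by the same user inside a fixed window (last-click attribution). The 7-day window, the `Click` and `Conversion` field names, and the last-click rule are all assumptions made for illustration, not requirements stated in the question. A production pipeline would more likely run this as a keyed stream join with state expiry than over an in-memory list.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed 7-day click attribution window, in seconds.
ATTRIBUTION_WINDOW_S = 7 * 24 * 3600


@dataclass(frozen=True)
class Click:
    click_id: str
    user_id: str
    ad_id: str
    ts: int  # event time, epoch seconds


@dataclass(frozen=True)
class Conversion:
    conversion_id: str
    user_id: str
    ts: int  # event time, epoch seconds


def attribute(
    conversion: Conversion,
    clicks: list[Click],
    window_s: int = ATTRIBUTION_WINDOW_S,
) -> Optional[Click]:
    """Return the latest click by the same user that happened at or before
    the conversion and no more than window_s seconds earlier, else None."""
    eligible = [
        c
        for c in clicks
        if c.user_id == conversion.user_id
        and 0 <= conversion.ts - c.ts <= window_s
    ]
    return max(eligible, key=lambda c: c.ts, default=None)
```

Because the comparison uses event time rather than arrival time, a click that arrives late or out of order is still matched correctly, provided it is present in state when the conversion is evaluated.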
Evaluation Rubric
  • Sound event model & dedupe strategy: 25% weight
  • Reliable streaming & joins: 25% weight
  • Effective fraud detection approach: 25% weight
  • Privacy/regulatory considerations: 25% weight
Hints
  • 💡 Use the outbox pattern so that events recorded at the source are not lost before they reach the stream.
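A minimal sketch of the outbox hint, assuming a relational store at the event source: the business row and an outbox row are written in one transaction, and a separate relay publishes unpublished outbox rows. SQLite stands in for the source database here, and the table names, `record_event`, `relay_once`, and the `publish` callable (a stand-in for a Kafka producer) are all hypothetical.

```python
import json
import sqlite3
from typing import Callable


def init_db(conn: sqlite3.Connection) -> None:
    conn.executescript(
        """
        CREATE TABLE ad_events (
            event_id TEXT PRIMARY KEY,
            kind     TEXT NOT NULL,
            ad_id    TEXT NOT NULL
        );
        CREATE TABLE outbox (
            seq       INTEGER PRIMARY KEY AUTOINCREMENT,
            event_id  TEXT NOT NULL,
            payload   TEXT NOT NULL,
            published INTEGER NOT NULL DEFAULT 0
        );
        """
    )


def record_event(conn: sqlite3.Connection, event_id: str, kind: str, ad_id: str) -> None:
    """Write the event and its outbox row atomically."""
    payload = json.dumps({"event_id": event_id, "kind": kind, "ad_id": ad_id})
    with conn:  # commits both inserts, or rolls both back on error
        conn.execute("INSERT INTO ad_events VALUES (?, ?, ?)", (event_id, kind, ad_id))
        conn.execute(
            "INSERT INTO outbox (event_id, payload) VALUES (?, ?)",
            (event_id, payload),
        )


def relay_once(conn: sqlite3.Connection, publish: Callable[[str], None]) -> int:
    """Publish pending outbox rows in order; return how many were sent."""
    rows = conn.execute(
        "SELECT seq, payload FROM outbox WHERE published = 0 ORDER BY seq"
    ).fetchall()
    sent = 0
    for seq, payload in rows:
        publish(payload)  # if this raises, the row stays pending and is retried
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
        sent += 1
    return sent
```

Note that this gives at-least-once delivery: a crash after `publish` but before the row is marked would cause a re-send, so downstream consumers still need to deduplicate on `event_id`.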
Common Pitfalls to Avoid
  • ⚠️ **Underestimating State Management Complexity:** Managing state for deduplication and fraud detection in a distributed streaming system is complex, especially at scale and under fault-tolerance requirements. Poorly designed state management can lead to data inconsistencies or performance bottlenecks.
  • ⚠️ **Inaccurate Idempotency Keys:** If idempotency keys are not sufficiently unique or robust, duplicates can still slip through, or legitimate events can be mistakenly dropped.
  • ⚠️ **Real-time Fraud Detection Accuracy vs. Latency Trade-offs:** Highly accurate real-time fraud detection often involves computationally intensive models, which can increase latency. Finding the right balance is critical to avoid slowing down reporting.
  • ⚠️ **Ignoring Data Privacy and GDPR Compliance:** Personally Identifiable Information (PII) must be handled securely and anonymized appropriately throughout the pipeline, especially when dealing with event data and user behavior.
  • ⚠️ **Scalability Bottlenecks in Downstream Systems:** Even if streaming ingestion is highly scalable, near-real-time reporting is compromised when the analytical data stores or reporting layers cannot keep up with the processed data volume.
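To make the first two pitfalls concrete, here is a small sketch of deduplication keyed on an idempotency key, with a time-to-live so the state stays bounded. The field names (`event_type`, `ad_id`, `request_id`), the choice of SHA-256, and the `TtlDeduper` class are assumptions for illustration. The sketch also assumes `now` is a non-decreasing processing-time clock; in a real deployment this state would live in a fault-tolerant keyed store rather than in process memory.

```python
import hashlib
import json
from collections import OrderedDict


def idempotency_key(event: dict) -> str:
    """Hash the fields that identify one logical event.

    JSON-encoding the list keeps field boundaries unambiguous, so
    ("a|b", "c") and ("a", "b|c") cannot collide the way a naive
    delimiter join could.
    """
    parts = [event["event_type"], event["ad_id"], event["request_id"]]
    return hashlib.sha256(json.dumps(parts).encode("utf-8")).hexdigest()


class TtlDeduper:
    """Remember each key for ttl_s seconds after it is first seen."""

    def __init__(self, ttl_s: int) -> None:
        self.ttl_s = ttl_s
        # key -> first-seen time; insertion order equals time order
        # because `now` is assumed to be non-decreasing.
        self._seen: OrderedDict[str, int] = OrderedDict()

    def _evict(self, now: int) -> None:
        # Drop the oldest entries until one is still within the TTL.
        while self._seen:
            _, seen_at = next(iter(self._seen.items()))
            if now - seen_at < self.ttl_s:
                break
            self._seen.popitem(last=False)

    def is_new(self, event: dict, now: int) -> bool:
        """Return True the first time an event is seen within the TTL."""
        self._evict(now)
        key = idempotency_key(event)
        if key in self._seen:
            return False
        self._seen[key] = now
        return True
```

The TTL is the main design trade-off: too short and a late retry is counted twice, too long and the state grows with traffic. It would typically be set from the maximum expected retry or lateness interval.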
Potential Follow-up Questions
  • How would you evaluate probabilistic attribution?
  • How do you combat click injection?