Advanced | Scenario
15 min
Clock Skew Breaking Distributed DB Writes
Databases, Distributed Systems, Time Sync
Interview Question
A distributed database starts rejecting writes or showing anomalies due to detected clock skew on some nodes. How do you diagnose and stabilize?
Key Points to Cover
- Verify NTP/PTP status and per-node clock offsets
- Correlate DB logs for max_clock_skew or lease errors
- Remove or isolate skewed nodes; rebalance replicas
- Harden time sync: multiple NTP sources, monitoring, alerts
- Run consistency checks and re-enable traffic gradually
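The first and third points (quantify per-node offsets, then isolate the outliers) can be sketched in a few lines. This is a minimal illustration, not a production tool: the node names, the offset values, and the 250 ms tolerance are all hypothetical, and in practice the offsets would come from each node's time daemon and the tolerance from the database's own configuration.

```python
from statistics import median


def find_skewed_nodes(offsets_ms, max_offset_ms):
    """Return {node: deviation_ms} for nodes whose clock deviates from the
    cluster median by more than max_offset_ms.

    The median is used as the reference so that a single badly skewed node
    does not drag the reference toward itself.
    """
    reference = median(offsets_ms.values())
    return {
        node: offset - reference
        for node, offset in offsets_ms.items()
        if abs(offset - reference) > max_offset_ms
    }


# Hypothetical per-node offsets in milliseconds (assumed input, not real data).
offsets = {"node-a": 1.2, "node-b": -0.8, "node-c": 640.0, "node-d": 0.3}

# 250 ms is an assumed tolerance; use the value your database is configured with.
skewed = find_skewed_nodes(offsets, max_offset_ms=250.0)
for node, deviation in skewed.items():
    print(f"{node}: {deviation:+.2f} ms from cluster median -> isolate and resync")
```

Reporting the deviation alongside the node name gives the interviewer the quantified answer the rubric asks for: which nodes, and by how much.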
Evaluation Rubric
Quantifies skew and affected nodes (30% weight)
Isolates/remediates skewed replicas (30% weight)
Improves time sync resilience (20% weight)
Performs consistency checks before return (20% weight)
Hints
- 💡 Even small skews can break leases and transactions once they exceed the skew bound the database assumes.
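To see why a small skew matters for leases, consider a simplified model, sketched below under stated assumptions. The function name, the timestamps, and the safety margin are hypothetical and do not reflect any particular database's lease protocol. A lease holder whose clock runs behind true time keeps believing its lease is valid after the rest of the cluster considers it expired, unless a safety margin at least as large as the skew is subtracted from the expiry.

```python
def holder_believes_valid(true_now_ms, holder_offset_ms, expiry_ms, safety_margin_ms):
    """Return True if the lease holder, reading its own (possibly skewed)
    clock, still considers the lease usable.

    holder_offset_ms is holder clock minus true time; negative means the
    holder's clock is behind.
    """
    holder_now_ms = true_now_ms + holder_offset_ms
    return holder_now_ms < expiry_ms - safety_margin_ms


EXPIRY_MS = 10_000    # hypothetical lease expiry timestamp
TRUE_NOW_MS = 10_050  # by true time the lease expired 50 ms ago

# No margin, clock 100 ms behind: the holder wrongly keeps serving.
no_margin = holder_believes_valid(TRUE_NOW_MS, -100, EXPIRY_MS, 0)

# A 250 ms margin absorbs a 100 ms skew: the holder correctly stops.
margin_covers_skew = holder_believes_valid(TRUE_NOW_MS, -100, EXPIRY_MS, 250)

# A 400 ms skew exceeds the 250 ms margin: the protection is defeated.
skew_exceeds_margin = holder_believes_valid(TRUE_NOW_MS, -400, EXPIRY_MS, 250)

print(no_margin, margin_covers_skew, skew_exceeds_margin)
```

The third case is the one that matters here: once real skew exceeds the assumed bound, the margin no longer protects correctness, which is why some systems refuse writes or take nodes offline rather than continue.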
Common Pitfalls to Avoid
- ⚠️ Assuming a single time source is at fault without checking the others.
- ⚠️ Restarting database services before addressing the underlying clock skew.
- ⚠️ Neglecting to check database-specific error logs for the exact failure reasons related to clock synchronization.
- ⚠️ Failing to account for data inconsistencies that may have been introduced while the clocks were skewed.
- ⚠️ Overlooking the impact of network latency or firewall rules on NTP/PTP synchronization.
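The last pitfall can be made concrete with the standard NTP offset and delay calculation from four timestamps. The sketch below uses made-up integer millisecond timestamps and an assumed true offset of 500 ms. It shows that when the outbound and return paths have different latency, the computed offset is wrong by half the asymmetry, and that the error is bounded by half the measured round-trip delay.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Standard NTP on-wire calculation.

    t0: client send time (client clock)
    t1: server receive time (server clock)
    t2: server send time (server clock)
    t3: client receive time (client clock)
    Returns (estimated server-minus-client offset, round-trip network delay).
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Assumed scenario: server is truly 500 ms ahead, server processing takes 1 ms.

# Symmetric path, 10 ms each way: the estimate is exact.
sym_offset, sym_delay = ntp_offset_and_delay(100_000, 100_510, 100_511, 100_021)

# Asymmetric path, 30 ms out and 10 ms back: the estimate is off by
# (30 - 10) / 2 = 10 ms, within the bound of delay / 2.
asym_offset, asym_delay = ntp_offset_and_delay(100_000, 100_530, 100_531, 100_041)

print(f"symmetric:  offset={sym_offset} ms, delay={sym_delay} ms")
print(f"asymmetric: offset={asym_offset} ms, delay={asym_delay} ms")
```

A node behind a congested or asymmetric link can therefore report a healthy sync state while still carrying an offset error, so the measured delay is worth checking alongside the offset.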
Potential Follow-up Questions
- ❓ How do you monitor skew continuously?
- ❓ When would you use PTP over NTP?