Interview Questions/Technical Deep Dive/PostgreSQL Replication Lag Troubleshooting
AdvancedTechnical
5 min

PostgreSQL Replication Lag Troubleshooting

Advertisement
Interview Question

Read replicas are falling minutes behind the primary. How do you diagnose replication lag and remediate it safely?

Key Points to Cover
  • Measure lag via pg_stat_replication and WAL positions
  • Check I/O saturation, network bandwidth, and fsync settings
  • Tune wal_level, max_wal_size, checkpoint settings
  • Reduce heavy long-running transactions or VACUUM issues
  • Consider logical replication or scaling read traffic
Evaluation Rubric
Accurately measures and reads lag signals30% weight
Identifies I/O/network/transaction bottlenecks30% weight
Applies WAL/checkpoint tuning steps20% weight
Keeps replicas consistent while remediating20% weight
Hints
  • 💡Watch for vacuum bloat and autovacuum settings.
Common Pitfalls to Avoid
  • ⚠️Focusing solely on the replica without checking the primary's write load.
  • ⚠️Assuming the network is always the bottleneck without proper testing.
  • ⚠️Aggressively disabling `fsync` without understanding the data durability implications.
  • ⚠️Making large, untargeted configuration changes without isolating the impact of each.
  • ⚠️Not considering the possibility of specific query or transaction patterns on the primary causing extreme write loads.
Potential Follow-up Questions
  • When to switch to logical replication?
  • How to failover with minimal data loss?
Advertisement