I’m running a wal-consumer pod in a PostgreSQL environment that streams WAL logs from a replication slot. However, I noticed that the order in which I create the replication slot, run the migrations, and create the publication determines whether the consumer keeps up or lags.
Two Different Setup Orders, Two Different Outcomes:
Setup A (Causes Lag)
- Create the replication slot
- Run database migrations (creating/modifying tables)
- Create the publication
Issue: The replication slot starts lagging, and the wal-consumer does not consume WAL logs properly, even though no inserts, updates, or deletes are happening. The slot's last confirmed flushed LSN stays at the value it had when the slot was created.
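One way to put a number on the lag described above is to compare the slot's confirmed flush LSN with the server's current WAL position. The sketch below assumes the two values were fetched with a query along the lines of `SELECT confirmed_flush_lsn, pg_current_wal_lsn() FROM pg_replication_slots WHERE slot_name = 'wal_consumer_slot'` (PostgreSQL 10 or later). The slot name, the helper names `lsn_to_int` and `slot_lag_bytes`, and the sample LSNs are all made up for illustration.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/16B3748' to a byte position.

    An LSN is printed as two hexadecimal numbers separated by a slash:
    the high 32 bits and the low 32 bits of a 64-bit WAL position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def slot_lag_bytes(current_wal_lsn: str, confirmed_flush_lsn: str) -> int:
    """Bytes of WAL between the server's position and the slot's confirmed position."""
    return lsn_to_int(current_wal_lsn) - lsn_to_int(confirmed_flush_lsn)


# Illustrative values only, not taken from a real server.
lag = slot_lag_bytes("0/3000000", "0/2000000")
print(f"slot lag: {lag} bytes")
```

If this number keeps growing while the confirmed flush LSN stays fixed, that would be consistent with the slot holding on to WAL the consumer has not acknowledged.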
Setup B (Works as Expected)
- Run database migrations
- Create the publication
- Create the replication slot
Outcome: The wal-consumer streams WAL logs as expected with no lag.
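To make the comparison concrete, the two orderings can be written out as the SQL each step would run. This is only a sketch: the slot name `wal_consumer_slot`, the publication name `wal_consumer_pub`, the `pgoutput` plugin, and `FOR ALL TABLES` are assumptions, and the migration step is a placeholder because the actual migrations aren't shown here.

```python
# Hypothetical names and plugin; adjust to the real setup.
CREATE_SLOT = (
    "SELECT pg_create_logical_replication_slot('wal_consumer_slot', 'pgoutput');"
)
CREATE_PUBLICATION = "CREATE PUBLICATION wal_consumer_pub FOR ALL TABLES;"
RUN_MIGRATIONS = "-- run database migrations (placeholder)"

# The same three steps, in the two orders described above.
SETUPS = {
    "A (causes lag)": [CREATE_SLOT, RUN_MIGRATIONS, CREATE_PUBLICATION],
    "B (works as expected)": [RUN_MIGRATIONS, CREATE_PUBLICATION, CREATE_SLOT],
}


def describe(setups):
    """Return a title line plus one numbered line per step for each setup."""
    lines = []
    for name, steps in setups.items():
        lines.append(f"Setup {name}")
        for number, statement in enumerate(steps, start=1):
            lines.append(f"  {number}. {statement}")
    return lines


print("\n".join(describe(SETUPS)))
```

The only difference between the two lists is where the slot creation sits relative to the migrations and the publication.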
Why Does This Happen?
In Setup A, the replication slot appears to accumulate WAL logs even though no data is changing. But when the publication is created before the slot (as in Setup B), the consumer streams with no lag.
Why does the order of creating the replication slot matter?
Does PostgreSQL handle WAL retention differently depending on whether a publication exists at the time the slot is created? Would love to understand what’s happening under the hood!