]> git.ipfire.org Git - thirdparty/postgresql.git/commit
Fix unconditional WAL receiver shutdown during stream-archive transition
authorMichael Paquier <michael@paquier.xyz>
Tue, 4 Nov 2025 01:52:44 +0000 (10:52 +0900)
committerMichael Paquier <michael@paquier.xyz>
Tue, 4 Nov 2025 01:52:44 +0000 (10:52 +0900)
commit25b484080f14bf2a3ab3dcaa5b9304cfbc921f97
tree3d1f6919fdff4d41855b1aa2fad2bae07236f47e
parent438ccd33a37a9c3bf77fbe5424c1c0a1acb769e4
Fix unconditional WAL receiver shutdown during stream-archive transition

Commit b4f584f9d2a1 (affecting v15~, later backpatched down to 13 as of
3635a0a35aaf) introduced an unconditional WAL receiver shutdown when
switching from streaming to archive WAL sources.  This causes problems
during a timeline switch, when a WAL receiver enters WALRCV_WAITING
state but remains alive, waiting for instructions.

The unconditional shutdown can break some monitoring scenarios as the
WAL receiver gets repeatedly terminated and re-spawned, causing
pg_stat_wal_receiver.status to show a "streaming" instead of "waiting"
status, masking the fact that the WAL receiver is waiting for a new TLI
and a new LSN to be able to continue streaming.

This commit changes the WAL receiver behavior so as the shutdown becomes
conditional, with InstallXLogFileSegmentActive being always reset to
prevent the regression fixed by b4f584f9d2a1: only terminate the WAL
receiver when it is actively streaming (WALRCV_STREAMING,
WALRCV_STARTING, or WALRCV_RESTARTING).  When in WALRCV_WAITING state,
just reset InstallXLogFileSegmentActive flag to allow archive
restoration without killing the process.  WALRCV_STOPPED and
WALRCV_STOPPING are not reachable states in this code path.  For the
latter, the startup process is the one in charge of setting
WALRCV_STOPPING via ShutdownWalRcv(), waiting for the WAL receiver to
reach a WALRCV_STOPPED state after switching walRcvState, so
WaitForWALToBecomeAvailable() cannot be reached while a WAL receiver is
in a WALRCV_STOPPING state.

A regression test is added to check that a WAL receiver is not stopped
on timeline jump, that fails when the fix of this commit is reverted.

Reported-by: Ryan Bird <ryanzxg@gmail.com>
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/19093-c4fff49a608f82a0@postgresql.org
Backpatch-through: 13
src/backend/access/transam/xlog.c
src/test/recovery/t/004_timeline_switch.pl