Tomas Vondra [Mon, 26 Jan 2026 17:54:12 +0000 (18:54 +0100)]
Exercise parallel GIN builds in regression tests
Modify two places creating GIN indexes in regression tests, so that the
build is parallel. This provides basic test coverage, even if the
amounts of data are fairly small.
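To force a parallel build even on a small table, settings along these
lines can be used (an illustrative sketch, not necessarily the exact
test changes; table and column names are hypothetical):
SET min_parallel_table_scan_size = 0;
SET max_parallel_maintenance_workers = 2;
CREATE INDEX ON tst USING gin (arr);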
Tomas Vondra [Mon, 26 Jan 2026 17:52:16 +0000 (18:52 +0100)]
Lookup the correct ordering for parallel GIN builds
When building a tuplesort during parallel GIN builds, the function
incorrectly looked up the default B-Tree operator, not the function
associated with the GIN opclass (through GIN_COMPARE_PROC).
Fixed by using the same logic as initGinState(), and the other place
in parallel GIN builds.
This could cause two types of issues. First, a data type might not have
a B-Tree opclass, in which case the PrepareSortSupportFromOrderingOp()
fails with an ERROR. Second, a data type might have both B-Tree and GIN
opclasses, defining order/equality in different ways. This could lead to
logical corruption in the index.
Backpatch to 18, where parallel GIN builds were introduced.
Discussion: https://postgr.es/m/73a28b94-43d5-4f77-b26e-0d642f6de777@iki.fi
Reported-by: Heikki Linnakangas <hlinnaka@iki.fi>
Backpatch-through: 18
Robert Haas [Mon, 26 Jan 2026 17:43:52 +0000 (12:43 -0500)]
Reduce length of TAP test file name.
Buildfarm member fairywren hit the Windows limitation on the length of a
file path. While there may be other things we should also do to prevent
this from happening, it's certainly the case that the length of this
test file name is much longer than others in the same directory, so make
it shorter.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: http://postgr.es/m/274e0a1a-d7d2-4bc8-8b56-dd09f285715e@gmail.com
Backpatch-through: 17
David Rowley [Mon, 26 Jan 2026 10:46:23 +0000 (23:46 +1300)]
Fix possible issue of a WindowFunc being in the wrong WindowClause
ed1a88dda made it so WindowClauses can be merged when all window
functions belonging to the WindowClause can equally well use some
other WindowClause without any behavioral changes. When that
optimization applies, the WindowFunc's "winref" gets adjusted to
reference the new WindowClause.
That commit does not work well with the deduplication logic in
find_window_functions(), which only added the WindowFunc to the list
when there wasn't already an identical WindowFunc in the list. That
deduplication logic meant that the duplicate WindowFunc wouldn't get the
"winref" changed when optimize_window_clauses() was able to swap the
WindowFunc to another WindowClause. This could lead to the following
error in the unlikely event that the deduplication code did something and
the duplicate WindowFunc happened to be moved into another WindowClause.
ERROR: WindowFunc with winref 2 assigned to WindowAgg with winref 1
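The failing queries are roughly of this shape (a hypothetical sketch;
whether the WindowClauses actually merge depends on the functions and
frame options involved):
SELECT count(*) OVER w1,
       row_number() OVER w2,
       row_number() OVER w2  -- duplicate WindowFunc
FROM tbl
WINDOW w1 AS (PARTITION BY a ORDER BY a),
       w2 AS (PARTITION BY a ORDER BY a ROWS UNBOUNDED PRECEDING);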
As it turns out, the deduplication logic in find_window_functions() is
pretty bogus. It might have done something when added, as that code
predates b8d7f053c, which changed how projections work. As it turns
out, at least now we *will* evaluate the duplicate WindowFuncs. All
that the deduplication code seems to do today is assist in
underestimating the WindowAggPath costs due to not counting the
evaluation costs of duplicate WindowFuncs.
Ideally the fix would be to remove the deduplication code, but that
could result in changes to the plan costs, as duplicate WindowFuncs
would then be costed. Instead, let's play it safe and shift the
deduplication code so it runs after the other processing in
optimize_window_clauses().
Backpatch only as far as v16 as there doesn't seem to be any other harm
done by the WindowFunc deduplication code before then. This issue was
fixed in master by 7027dd499.
Dean Rasheed [Sat, 24 Jan 2026 11:30:48 +0000 (11:30 +0000)]
Fix trigger transition table capture for MERGE in CTE queries.
When executing a data-modifying CTE query containing MERGE and some
other DML operation on a table with statement-level AFTER triggers,
the transition tables passed to the triggers would fail to include the
rows affected by the MERGE.
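A query of roughly this shape could be affected (a hypothetical sketch;
assume tgt has a statement-level AFTER trigger with a REFERENCING
clause):
WITH ins AS (
  INSERT INTO tgt VALUES (1, 0) RETURNING id
)
MERGE INTO tgt t USING src s ON t.id = s.id
WHEN MATCHED THEN UPDATE SET val = s.val
WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.val);
The transition table seen by the trigger would miss the rows affected
by the MERGE.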
The reason is that, when initializing a ModifyTable node for MERGE,
MakeTransitionCaptureState() would create a TransitionCaptureState
structure with a single "tcs_private" field pointing to an
AfterTriggersTableData structure with cmdType == CMD_MERGE. Tuples
captured there would then not be included in the sets of tuples
captured when executing INSERT/UPDATE/DELETE ModifyTable nodes in the
same query.
Since there are no MERGE triggers, we should only create
AfterTriggersTableData structures for INSERT/UPDATE/DELETE. Individual
MERGE actions should then use those, thereby sharing the same capture
tuplestores as any other DML commands executed in the same query.
This requires changing the TransitionCaptureState structure, replacing
"tcs_private" with 3 separate pointers to AfterTriggersTableData
structures, one for each of INSERT, UPDATE, and DELETE. Nominally,
this is an ABI break to a public structure in commands/trigger.h.
However, since this is a private field pointing to an opaque data
structure, the only way to create a valid TransitionCaptureState is by
calling MakeTransitionCaptureState(), and no extensions appear to be
doing that anyway, so it seems safe for back-patching.
Backpatch to v15, where MERGE was introduced.
Bug: #19380
Reported-by: Daniel Woelfel <dwwoelfel@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19380-4e293be2b4007248%40postgresql.org
Backpatch-through: 15
Amit Langote [Fri, 23 Jan 2026 01:20:51 +0000 (10:20 +0900)]
Fix bogus ctid requirement for dummy-root partitioned targets
ExecInitModifyTable() unconditionally required a ctid junk column even
when the target was a partitioned table. This led to spurious "could
not find junk ctid column" errors when all children were excluded and
only the dummy root result relation remained.
A partitioned table only appears in the result relations list when all
leaf partitions have been pruned, leaving the dummy root as the sole
entry. Assert this invariant (nrels == 1) and skip the ctid requirement.
Also adjust ExecModifyTable() to tolerate invalid ri_RowIdAttNo for
partitioned tables, which is safe since no rows will be processed in
this case.
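For example, a statement of this shape could fail that way (a
hypothetical sketch; the WHERE clause prunes every leaf partition):
CREATE TABLE parted (a int) PARTITION BY RANGE (a);
CREATE TABLE part1 PARTITION OF parted FOR VALUES FROM (1) TO (10);
UPDATE parted SET a = a + 1 WHERE a = 1000;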
Bug: #19099
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19099-e05dcfa022fe553d%40postgresql.org
Backpatch-through: 14
Tom Lane [Thu, 22 Jan 2026 23:35:31 +0000 (18:35 -0500)]
Remove faulty Assert in partitioned INSERT...ON CONFLICT DO UPDATE.
Commit f16241bef mistakenly supposed that INSERT...ON CONFLICT DO
UPDATE rejects partitioned target tables. (This may have been
accurate when the patch was written, but it was already obsolete
when committed.) Hence, there's an assertion that we can't see
ItemPointerIndicatesMovedPartitions() in that path, but the assertion
is triggerable.
Some other places throw error if they see a moved-across-partitions
tuple, but there seems no need for that here, because if we just retry
then we get the same behavior as in the update-within-partition case,
as demonstrated by the new isolation test. So fix by deleting the
faulty Assert. (The fact that this is the fix doubtless explains
why we've heard no field complaints: the behavior of a non-assert
build is fine.)
The TM_Deleted case contains a cargo-culted copy of the same Assert,
which I also deleted to avoid confusion, although I believe that one
is actually not triggerable.
Per our code coverage report, neither the TM_Updated nor the
TM_Deleted case was reached at all by existing tests, so this
patch adds tests for both.
Reported-by: Dmitry Koval <d.koval@postgrespro.ru>
Author: Joseph Koshakow <koshy44@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/f5fffe4b-11b2-4557-a864-3587ff9b4c36@postgrespro.ru
Backpatch-through: 14
Thomas Munro [Thu, 22 Jan 2026 02:43:13 +0000 (15:43 +1300)]
jit: Add missing inline pass for LLVM >= 17.
With LLVM >= 17, transform passes are provided as a string to
LLVMRunPasses. Only two strings were used: "default<O3>" and
"default<O0>,mem2reg".
With previous LLVM versions, an additional inline pass was added when
JIT inlining was enabled without optimization. With LLVM >= 17, the code
would go through llvm_inline, prepare the functions for inlining, but
the generated bitcode would be the same due to the missing inline pass.
This patch restores the previous behavior by adding an inline pass when
inlining is enabled but no optimization is done.
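The affected configuration corresponds to settings along these lines
(an illustrative sketch: inlining enabled, optimization disabled):
SET jit = on;
SET jit_above_cost = 0;
SET jit_inline_above_cost = 0;
SET jit_optimize_above_cost = -1;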
This fixes an oversight introduced by 76200e5e when support for LLVM 17
was added.
Backpatch-through: 14
Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Pierre Ducroquet <p.psql@pinaraf.info>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/CAO6_XqrNjJnbn15ctPv7o4yEAT9fWa-dK15RSyun6QNw9YDtKg%40mail.gmail.com
Álvaro Herrera [Wed, 21 Jan 2026 17:55:43 +0000 (18:55 +0100)]
amcheck: Fix snapshot usage in bt_index_parent_check
We were using SnapshotAny to do some index checks, but that's wrong and
causes spurious errors when used on indexes created by CREATE INDEX
CONCURRENTLY. Fix it to use an MVCC snapshot, and add a test for it.
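For example (a sketch of the affected usage; index and table names are
hypothetical):
CREATE EXTENSION amcheck;
CREATE INDEX CONCURRENTLY idx ON tab (col);
SELECT bt_index_parent_check('idx');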
Backpatch of 6bd469d26aca to branches 14-16. I previously misidentified
the bug's origin: it came in with commit 7f563c09f890 (pg11-era, not
5ae2087202af as claimed previously), so all live branches are affected.
Also take the opportunity to fix some comments that we failed to update
in the original commits and apply pgperltidy. In branch 14, remove the
unnecessary test plan specification (which would have needed to be
changed anyway; cf. commit 549ec201d613).
Diagnosed-by: Donghang Lin <donghanglin@gmail.com>
Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Backpatch-through: 17
Discussion: https://postgr.es/m/CANtu0ojmVd27fEhfpST7RG2KZvwkX=dMyKUqg0KM87FkOSdz8Q@mail.gmail.com
Bruce Momjian [Tue, 20 Jan 2026 03:59:10 +0000 (22:59 -0500)]
doc: revert "xreflabel" used for PL/Python & libpq chapters
This reverts d8aa21b74ff, which was added for the PG 18 release notes,
and adjusts the PG 18 release notes for this change. This is necessary
since the "xreflabel" affected other references to these chapters.
Michael Paquier [Mon, 19 Jan 2026 23:11:16 +0000 (08:11 +0900)]
pg_stat_statements: Fix crash in list squashing with Vars
When IN/ANY clauses contain both constants and variable expressions, the
optimizer transforms them into separate structures: constants become
an array expression while variables become individual OR conditions.
This transformation was creating an overlap with the token locations,
causing pg_stat_statements query normalization to crash because it
could not calculate the number of bytes remaining to write for the
normalized query.
This commit disables squashing for mixed IN list expressions when
constructing a scalar array op, by setting list_start and list_end
to -1 when both variables and non-variables are present. Some
regression tests are added to PGSS to verify these patterns.
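For example, an IN list mixing constants with column references,
roughly like this sketch, could crash normalization before this fix:
SELECT * FROM t WHERE a IN (1, 2, b);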
Robert Haas [Mon, 19 Jan 2026 17:02:08 +0000 (12:02 -0500)]
Don't set the truncation block length greater than RELSEG_SIZE.
When faced with a relation containing more than 1 physical segment
(i.e. >1GB, with normal settings), the previous code could compute a
truncation block length greater than RELSEG_SIZE, which could lead to
restore failures of this form:
file "%s" has truncation block length %u in excess of segment size %u
The fix is simply to clamp the maximum computed truncation_block_length
to RELSEG_SIZE. I have also added some comments to clarify the logic.
The test case was written by Oleg Tkachenko, but I have rewritten its
comments.
Richard Guo [Mon, 19 Jan 2026 02:13:23 +0000 (11:13 +0900)]
Fix unsafe pushdown of quals referencing grouping Vars
When checking a subquery's output expressions to see if it's safe to
push down an upper-level qual, check_output_expressions() previously
treated grouping Vars as opaque Vars. This implicitly assumed they
were stable and scalar.
However, a grouping Var's underlying expression corresponds to the
grouping clause, which may be volatile or set-returning. If an
upper-level qual references such an output column, pushing it down
into the subquery is unsafe. This can cause strange results due to
multiple evaluation of a volatile function, or introduce SRFs into
the subquery's WHERE/HAVING quals.
This patch teaches check_output_expressions() to look through grouping
Vars to their underlying expressions. This ensures that any
volatility or set-returning properties in the grouping clause are
detected, preventing the unsafe pushdown.
We do not need to recursively examine the Vars contained in these
underlying expressions. Even if they reference outputs from
lower-level subqueries (at any depth), those references are guaranteed
not to expand to volatile or set-returning functions, because
subqueries containing such functions in their targetlists are never
pulled up.
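A subquery shape where the pushdown would be unsafe (a hypothetical
sketch; the grouping expression is volatile):
SELECT * FROM (
  SELECT a, random() AS r FROM t GROUP BY a, random()
) sub
WHERE sub.r < 0.5;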
Backpatch to v18, where this issue was introduced.
Reported-by: Eric Ridge <eebbrr@gmail.com>
Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/7900964C-F99E-481E-BEE5-4338774CEB9F@gmail.com
Backpatch-through: 18
Tom Lane [Sun, 18 Jan 2026 19:54:33 +0000 (14:54 -0500)]
Update time zone data files to tzdata release 2025c.
This is pretty pro-forma for our purposes, as the only change
is a historical correction for pre-1976 DST laws in
Baja California. (Upstream made this release mostly to update
their leap-second data, which we don't use.) But with minor
releases coming up, we should be up-to-date.
Michael Paquier [Sun, 18 Jan 2026 08:24:58 +0000 (17:24 +0900)]
Fix error message related to end TLI in backup manifest
The code that adds WAL information to a backup manifest cross-checks it
with the contents of the timeline history file of the end timeline.
When a check based on the end timeline failed, the error message
reported the value of the start timeline instead. Fix the message to
show the correct timeline number.
This error report would be confusing for users if seen, because it
would provide incorrect information, so backpatch all the way down.
Fix crash in test function on removable_cutoff(NULL)
The function is part of the injection_points test module and only used
in tests. None of the current tests call it with a NULL argument, but
it is supposed to work.
Amit Langote [Fri, 16 Jan 2026 05:53:32 +0000 (14:53 +0900)]
Fix rowmark handling for non-relation RTEs during executor init
Commit cbc127917e introduced tracking of unpruned relids to skip
processing of pruned partitions. PlannedStmt.unprunableRelids is
computed as the difference between PlannerGlobal.allRelids and
prunableRelids, but allRelids only contains RTE_RELATION entries.
This means non-relation RTEs (VALUES, subqueries, CTEs, etc.) are
never included in unprunableRelids, and consequently not in
es_unpruned_relids at runtime.
As a result, rowmarks attached to non-relation RTEs were incorrectly
skipped during executor initialization. This affects any DML statement
that has rowmarks on such RTEs, including MERGE with a VALUES or
subquery source, and UPDATE/DELETE with joins against subqueries or
CTEs. When a concurrent update triggers an EPQ recheck, the missing
rowmark leads to incorrect results.
Fix by restricting the es_unpruned_relids membership check to
RTE_RELATION entries only, since partition pruning only applies to
actual relations. Rowmarks for other RTE kinds are now always
processed.
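For example, a statement of this shape is affected (an illustrative
sketch; table and column names are made up):
MERGE INTO tgt t
USING (VALUES (1, 'x')) AS src (id, val) ON t.id = src.id
WHEN MATCHED THEN UPDATE SET val = src.val;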
Bug: #19355
Reported-by: Bihua Wang <wangbihua.cn@gmail.com>
Diagnosed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Diagnosed-by: Tender Wang <tndrwang@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/19355-57d7d52ea4980dc6@postgresql.org
Backpatch-through: 18
Amit Langote [Fri, 16 Jan 2026 04:01:52 +0000 (13:01 +0900)]
Fix segfault from releasing locks in detached DSM segments
If a FATAL error occurs while holding a lock in a DSM segment (such
as a dshash lock) and the process is not in a transaction, a
segmentation fault can occur during process exit.
The problem sequence is:
1. Process acquires a lock in a DSM segment (e.g., via dshash)
2. FATAL error occurs outside transaction context
3. proc_exit() begins, calling before_shmem_exit callbacks
4. dsm_backend_shutdown() detaches all DSM segments
5. Later, on_shmem_exit callbacks run
6. ProcKill() calls LWLockReleaseAll()
7. Segfault: the lock being released is in unmapped memory
This only manifests outside transaction contexts because
AbortTransaction() calls LWLockReleaseAll() during transaction
abort, releasing locks before DSM cleanup. Background workers and
other non-transactional code paths are vulnerable.
Fix by calling LWLockReleaseAll() unconditionally at the start of
shmem_exit(), before any callbacks run. Releasing locks before
callbacks prevents the segfault - locks must be released before
dsm_backend_shutdown() detaches their memory. This is safe because
after an error, held locks are protecting potentially inconsistent
data anyway, and callbacks can acquire fresh locks if needed.
Also add a comment noting that LWLockReleaseAll() must be safe to
call before LWLock initialization (which it is, since
num_held_lwlocks will be 0), plus an Assert for the post-condition.
This fix aligns with the original design intent from commit 001a573a2, which noted that backends must clean up shared memory
state (including releasing lwlocks) before unmapping dynamic shared
memory segments.
Fix 'unexpected data beyond EOF' on replica restart
On restart, a replica can fail with an error like 'unexpected data
beyond EOF in block 200 of relation T/D/R'. These are the steps to
reproduce it:
- A relation has a size of 400 blocks.
- Blocks 201 to 400 are empty.
- Block 200 has two rows.
- Blocks 100 to 199 are empty.
- A restartpoint is done
- Vacuum truncates the relation to 200 blocks
- A FPW deletes a row in block 200
- A checkpoint is done
- A FPW deletes the last row in block 200
- Vacuum truncates the relation to 100 blocks
- The replica restarts
When the replica restarts:
- The relation on disk starts at 100 blocks, because all the
truncations were applied before restart.
- The first truncate to 200 blocks is replayed. It silently fails, but
it will still (incorrectly!) update the cache size to 200 blocks
- The first FPW on block 200 is applied. XLogReadBufferForRedo relies
on the cached size and incorrectly assumes that the page already
exists in the file, and thus won't extend the relation.
- The online checkpoint record is replayed, calling smgrdestroyall
which causes the cached size to be discarded
- The second FPW on block 200 is applied. This time, the detected size
is 100 blocks, an extend is attempted. However, the block 200 is
already present in the buffer cache due to the first FPW. This
triggers the 'unexpected data beyond EOF'.
To fix, update the cached size in SmgrRelation with the current size
rather than the requested new size, when the requested new size is
greater.
Andres Freund [Thu, 15 Jan 2026 15:17:51 +0000 (10:17 -0500)]
aio: io_uring: Fix danger of completion getting reused before being read
We called io_uring_cqe_seen(..., cqe) before reading cqe->res. That allows the
completion to be reused, which in turn could lead to cqe->res being
overwritten. The window for that is very narrow and the likelihood of it
happening is very low, as we should never actually utilize all CQEs, but the
consequences would be bad.
Add check for invalid offset at multixid truncation
If a multixid with zero offset is left behind after a crash, and that
multixid later becomes the oldest multixid, truncation might try to
look up its offset and read the zero value. In the worst case, we
might incorrectly use the zero offset to truncate valid SLRU segments
that are still needed. I'm not sure if that can happen in practice, or
if there are some other lower-level safeguards or incidental reasons
that prevent the caller from passing an unwritten multixid as the
oldest multi. But better safe than sorry, so let's add an explicit
check for it.
In stable branches, we should perhaps do the same check for
'oldestOffset', i.e. the offset of the old oldest multixid (in master,
'oldestOffset' is gone). But if the old oldest multixid has an invalid
offset, the damage has been done already, and we would never advance
past that point. It's not clear what we should do in that case. The
check that this commit adds will prevent such a multixid with invalid
offset from becoming the oldest multixid in the first place, which
seems enough for now.
Michael Paquier [Wed, 14 Jan 2026 07:02:33 +0000 (16:02 +0900)]
pg_waldump: Relax LSN comparison check in TAP test
The test 002_save_fullpage.pl, which checks --save-fullpage, fails with
wal_consistency_checking enabled, because the block saved in the file
can have the same LSN as the LSN used in the file name. The test
required that the block LSN be strictly lower than the file LSN. This
commit relaxes the check a bit, by allowing the LSNs to match.
While on it, the test name is reworded to include some information about
the file and block LSNs, which is useful for debugging.
Michael Paquier [Tue, 13 Jan 2026 23:44:52 +0000 (08:44 +0900)]
Fix query jumbling with GROUP BY clauses
RangeTblEntry.groupexprs was marked with the node attribute
query_jumble_ignore, causing a list of GROUP BY expressions to be
ignored during the query jumbling. For example, these two queries could
be grouped together within the same query ID:
SELECT count(*) FROM t GROUP BY a;
SELECT count(*) FROM t GROUP BY b;
However, as such queries use different GROUP BY clauses, they should be
split across multiple entries.
This fixes an oversight in 247dea89f761, which introduced an RTE for
GROUP BY clauses. Query IDs are documented as being stable across minor
releases, but as this regression is new to v18 and we are still early
in its support cycle, a backpatch is exceptionally done: this broke a
behavior that has existed since query jumbling was first supported in
core, with its introduction in pg_stat_statements.
The tests of pg_stat_statements are expanded to cover this area, with
patterns involving GROUP BY and GROUPING clauses.
Author: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxEy2W+tCqC7XuJ94r3ivWsM=onKJp94kRFx3hoARjBeFQ@mail.gmail.com
Backpatch-through: 18
Fujii Masao [Tue, 13 Jan 2026 13:54:45 +0000 (22:54 +0900)]
doc: Document DEFAULT option in file_fdw.
Commit 9f8377f7a introduced the DEFAULT option for file_fdw but did not
update the documentation. This commit adds the missing description of
the DEFAULT option to the file_fdw documentation.
Backpatch to v16, where the DEFAULT option was introduced.
Jacob Champion [Fri, 9 Jan 2026 18:02:56 +0000 (10:02 -0800)]
doc: Improve description of publish_via_partition_root
Reword publish_via_partition_root's opening paragraph. Describe its
behavior more clearly, and directly state that its default is false.
Per complaint by Peter Smith; final text of the patch made in
collaboration with Chao Li.
Author: Chao Li <li.evan.chao@gmail.com>
Author: Peter Smith <peter.b.smith@fujitsu.com>
Reported-by: Peter Smith <peter.b.smith@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAHut%2BPu7SpK%2BctOYoqYR3V4w5LKc9sCs6c_qotk9uTQJQ4zp6g%40mail.gmail.com
Backpatch-through: 14
Nathan Bossart [Fri, 9 Jan 2026 16:12:54 +0000 (10:12 -0600)]
pg_dump: Fix gathering of sequence information.
Since commit bd15b7db48, pg_dump uses pg_get_sequence_data() (née
pg_sequence_read_tuple()) to gather all sequence data in a single
query as opposed to a query per sequence. Two related bugs have
been identified:
* If the user lacks appropriate privileges on the sequence, pg_dump
generates a setval() command with garbage values instead of
failing as expected.
* pg_dump can fail due to a concurrently dropped sequence, even if
the dropped sequence's data isn't part of the dump.
This commit fixes the above issues by 1) teaching
pg_get_sequence_data() to return nulls instead of erroring for a
missing sequence and 2) teaching pg_dump to fail if it tries to
dump the data of a sequence for which pg_get_sequence_data()
returned nulls. Note that pg_dump may still fail due to a
concurrently dropped sequence, but it should now only do so when
the sequence data is part of the dump. This matches the behavior
before commit bd15b7db48.
Bug: #19365
Reported-by: Paveł Tyślacki <pavel.tyslacki@gmail.com>
Suggested-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19365-6245240d8b926327%40postgresql.org
Discussion: https://postgr.es/m/2885944.1767029161%40sss.pgh.pa.us
Backpatch-through: 18
David Rowley [Thu, 8 Jan 2026 22:02:59 +0000 (11:02 +1300)]
Fix possible incorrect column reference in ERROR message
When creating a partition for a RANGE partitioned table, errors about
converting the specified range values into constant values for the
partition key's type could report the name of a previous partition key
column when an earlier bound value was specified as MINVALUE or
MAXVALUE.
This was caused by the code not correctly incrementing the index that
tracks which partition key the foreach loop was working on after
processing MINVALUE/MAXVALUE ranges.
Fix by using foreach_current_index() to ensure the index variable is
always set to the List element being worked on.
Peter Geoghegan [Wed, 7 Jan 2026 17:53:05 +0000 (12:53 -0500)]
Fix nbtree skip array transformation comments.
Fix comments that incorrectly described transformations performed by the
"Avoid extra index searches through preprocessing" mechanism introduced
by commit b3f1a13f.
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-By: Chao Li <li.evan.chao@gmail.com>
Reviewed-By: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/20251230190145.c3c88c5eb0f88b136adda92f@sraoss.co.jp
Backpatch-through: 18
John Naylor [Wed, 7 Jan 2026 09:02:19 +0000 (16:02 +0700)]
createuser: Update docs to reflect defaults
Commit c7eab0e97 changed the default password_encryption setting to
'scram-sha-256', so update the example for creating a user with an
assigned password.
In addition, commit 08951a7c9 added new options that in turn pass
default tokens NOBYPASSRLS and NOREPLICATION to the CREATE ROLE
command, so fix this omission as well for v16 and later.
Reported-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/cff1ea60-c67d-4320-9e33-094637c2c4fb%40iki.fi
Backpatch-through: 14
Andres Freund [Wed, 7 Jan 2026 00:51:19 +0000 (19:51 -0500)]
Fix buggy interaction between array subscripts and subplan params
In a7f107df2 I changed subplan param evaluation to happen within the
containing expression. As part of that, ExecInitSubPlanExpr() was changed to
evaluate parameters via a new EEOP_PARAM_SET expression step. These parameters
were temporarily stored into ExprState->resvalue/resnull, with some reasoning
why that would be fine. Unfortunately, that analysis was wrong -
ExecInitSubscriptionRef() evaluates the input array into "resv"/"resnull",
which will often point to ExprState->resvalue/resnull. This means that the
EEOP_PARAM_SET, if inside an array subscript, would overwrite the input array
to array subscript.
The fix is fairly simple - instead of evaluating into
ExprState->resvalue/resnull, store the temporary result of the subplan in the
subplan's return value.
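A query of roughly this shape, with a correlated subplan inside an
array subscript, could be affected (a hypothetical sketch; names are
made up):
SELECT arr[(SELECT i FROM idx_tab WHERE idx_tab.k = data_tab.k)]
FROM data_tab;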
Bug: #19370
Reported-by: Zepeng Zhang <redraiment@gmail.com>
Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us>
Diagnosed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/19370-7fb7a5854b7618f1@postgresql.org
Backpatch-through: 18
Amit Kapila [Tue, 6 Jan 2026 04:48:49 +0000 (04:48 +0000)]
Update comments atop ReplicationSlotCreate.
Since commit 1462aad2e4, which introduced the ability to modify the
two_phase property of a slot, the comments above ReplicationSlotCreate
have become outdated. We have now added a cautionary note in the comments
above ReplicationSlotAlter explaining when it is safe to modify the
two_phase property of a slot.
Author: Daniil Davydov <3danissimo@gmail.com>
Author: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Backpatch-through: 18
Discussion: https://postgr.es/m/CAJDiXggZXQZ7bD0QcTizDt6us9aX6ZKK4dWxzgb5x3+TsVHjqQ@mail.gmail.com
David Rowley [Tue, 6 Jan 2026 04:29:44 +0000 (17:29 +1300)]
Fix issue with EVENT TRIGGERS and ALTER PUBLICATION
When processing the "publish" options of an ALTER PUBLICATION command,
we call SplitIdentifierString() to split the options into a List of
strings. Since SplitIdentifierString() modifies the delimiter
character and puts NULs in their place, this would overwrite the memory
of the AlterPublicationStmt. Later in AlterPublicationOptions(), the
modified AlterPublicationStmt is copied for event triggers, which would
result in the event trigger only seeing the first "publish" option
rather than all options that were specified in the command.
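For example, after a command like the following sketch, an event
trigger could see only publish = 'insert':
ALTER PUBLICATION pub SET (publish = 'insert, update, delete');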
To fix this, make a copy of the string before passing to
SplitIdentifierString().
Here we also adjust a similar case in the pgoutput plugin. There are no
known issues caused by SplitIdentifierString() here, so this is being
done out of paranoia.
Thanks to Henson Choi for putting together an example case showing the
ALTER PUBLICATION issue.
Fujii Masao [Tue, 6 Jan 2026 02:57:12 +0000 (11:57 +0900)]
Add TAP test for GUC settings passed via CONNECTION in logical replication.
Commit d926462d819 restored the behavior of passing GUC settings from
the CONNECTION string to the publisher's walsender, allowing per-connection
configuration.
This commit adds a TAP test to verify that behavior works correctly.
Since commit d926462d819 was recently applied and backpatched to v15,
this follow-up commit is also backpatched accordingly.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Discussion: https://postgr.es/m/CAHGQGwGYV+-abbKwdrM2UHUe-JYOFWmsrs6=QicyJO-j+-Widw@mail.gmail.com
Backpatch-through: 15
Fujii Masao [Tue, 6 Jan 2026 02:52:22 +0000 (11:52 +0900)]
Honor GUC settings specified in CREATE SUBSCRIPTION CONNECTION.
Prior to v15, GUC settings supplied in the CONNECTION clause of
CREATE SUBSCRIPTION were correctly passed through to
the publisher's walsender. For example:
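a representative command of this kind (a sketch; subscription, host,
and publication names are placeholders):
CREATE SUBSCRIPTION mysub
  CONNECTION 'host=pub.example.com dbname=postgres options=''-c wal_sender_timeout=60000'''
  PUBLICATION mypub;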
would cause wal_sender_timeout to take effect on the publisher's walsender.
However, commit f3d4019da5d changed the way logical replication
connections are established, forcing the publisher's relevant
GUC settings (datestyle, intervalstyle, extra_float_digits) to
override those provided in the CONNECTION string. As a result,
from v15 through v18, GUC settings in the CONNECTION string were
always ignored.
This regression prevented per-connection tuning of logical replication.
For example, using a shorter timeout for walsender connecting
to a nearby subscriber and a longer one for walsender connecting
to a remote subscriber.
This commit restores the intended behavior by ensuring that
GUC settings in the CONNECTION string are again passed through
and applied by the walsender, allowing per-connection configuration.
Backpatch to v15, where the regression was introduced.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Discussion: https://postgr.es/m/CAHGQGwGYV+-abbKwdrM2UHUe-JYOFWmsrs6=QicyJO-j+-Widw@mail.gmail.com
Backpatch-through: 15
David Rowley [Tue, 6 Jan 2026 02:16:56 +0000 (15:16 +1300)]
Fix misleading comment for GetOperatorFromCompareType
The comment claimed *strat got set to InvalidStrategy when the function
lookup fails. This isn't true; an ERROR is raised when that happens.
Author: Paul A Jungwirth <pj@illuminatedcomputing.com>
Discussion: https://postgr.es/m/CA+renyXOrjLacP_nhqEQUf2W+ZCoY2q5kpQCfG05vQVYzr8b9w@mail.gmail.com
Backpatch-through: 18
Andres Freund [Mon, 5 Jan 2026 18:09:03 +0000 (13:09 -0500)]
ci: Remove ulimit -p for netbsd/openbsd
Previously the ulimit -p 256 was needed to increase the limit on
openbsd. However, sometimes the limit actually was too low, causing
"could not fork new process for connection: Resource temporarily unavailable"
errors. Most commonly on netbsd, but also on openbsd.
The ulimit on openbsd couldn't trivially be increased with ulimit, because of
hitting the hard limit.
Instead of increasing the limit in the CI script, the CI image generation now
increases the limits: https://github.com/anarazel/pg-vm-images/pull/129
David Rowley [Sun, 4 Jan 2026 07:33:14 +0000 (20:33 +1300)]
Fix selectivity estimation integer overflow in contrib/intarray
This fixes a poorly written integer comparison function which was
performing subtraction in an attempt to return a negative value when
a < b and a positive value when a > b, and 0 when the values were equal.
Unfortunately that didn't always work correctly, due to two's complement
representation putting INT_MIN one further from zero than INT_MAX. This could result
in an overflow and cause the comparison function to return an incorrect
result, which would result in the binary search failing to find the
value being searched for.
This could cause poor selectivity estimates when the statistics stored
the value of INT_MAX (2147483647) and the value being searched for was
large enough to result in the binary search doing a comparison with that
INT_MAX value.
Author: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAEoWx2ng1Ot5LoKbVU-Dh---dFTUZWJRH8wv2chBu29fnNDMaQ@mail.gmail.com
Backpatch-through: 14
Masahiko Sawada [Wed, 31 Dec 2025 19:18:17 +0000 (11:18 -0800)]
Fix macro name for io_uring_queue_init_mem check.
Commit f54af9f2679d added a check for
io_uring_queue_init_mem(). However, it used the macro name
HAVE_LIBURING_QUEUE_INIT_MEM in both meson.build and the C code, while
the Autotools build script defined HAVE_IO_URING_QUEUE_INIT_MEM. As a
result, the optimization was never enabled in builds configured with
Autotools, as the C code checked for the wrong macro name.
This commit changes the macro name to HAVE_IO_URING_QUEUE_INIT_MEM in
meson.build and the C code. This matches the actual function
name (io_uring_queue_init_mem), following the standard HAVE_<FUNCTION>
convention.
Backpatch to 18, where the macro was introduced.
Bug: #19368
Reported-by: Evan Si <evsi@amazon.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19368-016d79a7f3a1c599@postgresql.org
Backpatch-through: 18
Thomas Munro [Wed, 31 Dec 2025 00:24:17 +0000 (13:24 +1300)]
jit: Fix jit_profiling_support when unavailable.
jit_profiling_support=true captures profile data for Linux perf. On
other platforms, LLVMCreatePerfJITEventListener() returns NULL and the
attempt to register the listener would crash.
Fix by ignoring the setting in that case. The documentation already
says that it only has an effect if perf support is present, and we
already did the same for older LLVM versions that lacked support.
No field reports, unsurprisingly for an obscure developer-oriented
setting. Noticed in passing while working on commit 1a28b4b4.
Masahiko Sawada [Tue, 30 Dec 2025 18:56:28 +0000 (10:56 -0800)]
Fix a race condition in updating procArray->replication_slot_xmin.
Previously, ReplicationSlotsComputeRequiredXmin() computed the oldest
xmin across all slots without holding ProcArrayLock (when
already_locked is false), acquiring the lock just before updating the
replication slot xmin.
This could lead to a race condition: if a backend created a new slot
and updates the global replication slot xmin, another backend
concurrently running ReplicationSlotsComputeRequiredXmin() could
overwrite that update with an invalid or stale value. This happens
because the concurrent backend might have computed the aggregate xmin
before the new slot was accounted for, but applied the update after
the new slot had already updated the global value.
In the reported failure, a walsender for an apply worker computed
InvalidTransactionId as the oldest xmin and overwrote a valid
replication slot xmin value computed by a walsender for a tablesync
worker. Consequently, the tablesync worker computed a transaction ID
via GetOldestSafeDecodingTransactionId() effectively without
considering the replication slot xmin. This led to the error "cannot
build an initial slot snapshot as oldest safe xid %u follows
snapshot's xmin %u", which was an assertion failure prior to commit 240e0dbacd3.
To fix this, we acquire ReplicationSlotControlLock in exclusive mode
during slot creation to perform the initial update of the slot
xmin. In ReplicationSlotsComputeRequiredXmin(), we hold
ReplicationSlotControlLock in shared mode until the global slot xmin
is updated in ProcArraySetReplicationSlotXmin(). This prevents
concurrent computations and updates of the global xmin by other
backends during the initial slot xmin update process, while still
permitting concurrent calls to ReplicationSlotsComputeRequiredXmin().
Backpatch to all supported versions.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Pradeep Kumar <spradeepkumar29@gmail.com>
Reviewed-by: Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1L8wYcyTPxNzPGkhuO52WBGoOZbT0A73Le=ZUWYAYmdfw@mail.gmail.com
Backpatch-through: 14
Thomas Munro [Tue, 30 Dec 2025 01:11:37 +0000 (14:11 +1300)]
jit: Remove -Wno-deprecated-declarations in 18+.
REL_18_STABLE and master have commit ee485912, so they always use the
newer LLVM opaque pointer functions. Drop -Wno-deprecated-declarations
(commit a56e7b660) for code under jit/llvm in those branches, to catch
any new deprecation warnings that arrive in future version of LLVM.
Older branches continued to use functions marked deprecated in LLVM 14
and 15 (ie switched to the newer functions only for LLVM 16+), as a
precaution against unforeseen compatibility problems with bitcode
already shipped. In those branches, the comment about warning
suppression is updated to explain that situation better. In theory we
could suppress warnings only for LLVM 14 and 15 specifically, but that
isn't done here.
Backpatch-through: 14
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1407185.1766682319%40sss.pgh.pa.us
Thomas Munro [Mon, 29 Dec 2025 02:22:16 +0000 (15:22 +1300)]
Fix Mkvcbuild.pm builds of test_cloexec.c.
Mkvcbuild.pm scrapes Makefile contents, but couldn't understand the
change made by commit bec2a0aa. Revealed by BF animal hamerkop in
branch REL_16_STABLE.
1. It used += instead of =, which didn't match the pattern that
Mkvcbuild.pm looks for. Drop the +.
2. Mkvcbuild.pm doesn't link PROGRAM executables with libpgport. Apply
a local workaround to REL_16_STABLE only (later branches dropped
Mkvcbuild.pm).
Backpatch-through: 16
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/175163.1766357334%40sss.pgh.pa.us
Richard Guo [Mon, 29 Dec 2025 02:40:45 +0000 (11:40 +0900)]
Ignore PlaceHolderVars when looking up statistics
When looking up statistical data about an expression, we failed to
look through PlaceHolderVar nodes, treating them as opaque. This
could prevent us from matching an expression to base columns, index
expressions, or extended statistics, as examine_variable() relies on
strict structural matching.
As a result, queries involving PlaceHolderVar nodes often fell back to
default selectivity estimates, potentially leading to poor plan
choices.
This patch updates examine_variable() to strip PlaceHolderVars before
analysis. This is safe during estimation because PlaceHolderVars are
transparent for the purpose of statistics lookup: they do not alter
the value distribution of the underlying expression.
To minimize performance overhead on this hot path, a lightweight
walker first checks for the presence of PlaceHolderVars. The more
expensive mutator is invoked only when necessary.
There is one ensuing plan change in the regression tests, which is
expected and demonstrates the fix: the rowcount estimate becomes much
more accurate with this patch.
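A shape where this can matter (a hypothetical sketch; assumes an
expression index or extended statistics on COALESCE(t2.y, 0)):
SELECT *
FROM t1 LEFT JOIN (SELECT x, COALESCE(y, 0) AS y0 FROM t2) s ON t1.id = s.x
WHERE s.y0 = 5;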
Back-patch to v18. Although this issue exists before that, changes in
this version made it common enough to notice. Given the lack of field
reports for older versions, I am not back-patching further.
Reported-by: Haowu Ge <gehaowu@bitmoe.com>
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/62af586c-c270-44f3-9c5e-02c81d537e3d.gehaowu@bitmoe.com
Backpatch-through: 18
Richard Guo [Mon, 29 Dec 2025 02:38:49 +0000 (11:38 +0900)]
Strip PlaceHolderVars from index operands
When pulling up a subquery, we may need to wrap its targetlist items
in PlaceHolderVars to enforce separate identity or as a result of
outer joins. However, this causes any upper-level WHERE clauses
referencing these outputs to contain PlaceHolderVars, which prevents
indxpath.c from recognizing that they could be matched to index
columns or index expressions, potentially affecting the planner's
ability to use indexes.
To fix, explicitly strip PlaceHolderVars from index operands. A
PlaceHolderVar appearing in a relation-scan-level expression is
effectively a no-op. Nevertheless, to play it safe, we strip only
PlaceHolderVars that are not marked nullable.
The stripping is performed recursively to handle cases where
PlaceHolderVars are nested or interleaved with other node types. To
minimize performance impact, we first use a lightweight walker to
check for the presence of strippable PlaceHolderVars. The expensive
mutator is invoked only if a candidate is found, avoiding unnecessary
memory allocation and tree copying in the common case where no
PlaceHolderVars are present.
Back-patch to v18. Although this issue exists before that, changes in
this version made it common enough to notice. Given the lack of field
reports for older versions, I am not back-patching further.
Reported-by: Haowu Ge <gehaowu@bitmoe.com>
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/62af586c-c270-44f3-9c5e-02c81d537e3d.gehaowu@bitmoe.com
Backpatch-through: 18
Add oauth_validator_libraries to variable_is_guc_list_quote
The variable_is_guc_list_quote function needs to know about all
GUC_LIST_QUOTE variables; this adds oauth_validator_libraries, which
was missing. Backpatch to v18, where OAuth was introduced.
Michael Paquier [Sat, 27 Dec 2025 08:23:51 +0000 (17:23 +0900)]
Fix pg_stat_get_backend_activity() to use multi-byte truncated result
pg_stat_get_backend_activity() calls pgstat_clip_activity() to ensure
that the reported query string is correctly truncated when it finishes
with an incomplete multi-byte sequence. However, the result returned by
the function was not what pgstat_clip_activity() generated, but the
non-truncated original contents from PgBackendStatus.st_activity_raw.
Oversight in 54b6cd589ac2, so backpatch all the way down.
Author: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAEoWx2mDzwc48q2EK9tSXS6iJMJ35wvxNQnHX+rXjy5VgLvJQw@mail.gmail.com
Backpatch-through: 14
Richard Guo [Thu, 25 Dec 2025 03:12:52 +0000 (12:12 +0900)]
Fix planner error with SRFs and grouping sets
If there are any SRFs in a PathTarget, we must separate it into
SRF-computing and SRF-free targets. This is because the executor can
only handle SRFs that appear at the top level of the targetlist of a
ProjectSet plan node.
If we find a subexpression that matches an expression already computed
in the previous plan level, we should treat it like a Var and should
not split it again. setrefs.c will later replace the expression with
a Var referencing the subplan output.
However, when processing the grouping target for grouping sets, the
planner can fail to recognize that an expression is already computed
in the scan/join phase. The root cause is a mismatch in the
nullingrels bits. Expressions in the grouping target carry the
grouping nulling bit in their nullingrels to indicate that they can be
nulled by the grouping step. However, the corresponding expressions
in the scan/join target do not have these bits.
As a result, the exact match check in list_member() fails, leading the
planner to incorrectly believe that the expression needs to be
re-evaluated from its arguments, which are often not available in the
subplan. This can lead to planner errors such as "variable not found
in subplan target list".
To fix, ignore the grouping nulling bit when checking whether an
expression from the grouping target is available in the pre-grouping
input target. This aligns with the matching logic in setrefs.c.
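A query of roughly this shape could hit the error (a hypothetical
sketch mixing an SRF with a grouping-set expression):
SELECT a + 1, generate_series(1, 2)
FROM t
GROUP BY GROUPING SETS ((a + 1), ());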
Backpatch to v18, where this issue was introduced.
Bug: #19353
Reported-by: Marian MULLER REBEYROL <marian.muller@serli.com>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/19353-aaa179bba986a19b@postgresql.org
Backpatch-through: 18
Masahiko Sawada [Wed, 24 Dec 2025 21:55:32 +0000 (13:55 -0800)]
psql: Fix tab completion for VACUUM option values.
Commit 8a3e4011 introduced tab completion for the ONLY option of
VACUUM and ANALYZE, along with some code simplification using
MatchAnyN. However, it caused a regression in tab completion for
VACUUM option values. For example, neither ON nor OFF was suggested
after "VACUUM (VERBOSE". In addition, the ONLY keyword was not
suggested immediately after a completed option list.
Fujii Masao [Wed, 24 Dec 2025 15:27:19 +0000 (00:27 +0900)]
doc: Use proper tags in pg_overexplain documentation.
The pg_overexplain documentation previously used the <literal> tag for
some file names, struct names, and commands. Update the markup to
use the more appropriate tags: <filename>, <structname>, and <command>.
Backpatch to v18, where pg_overexplain was introduced.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Shixin Wang <wang-shi-xin@outlook.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwEyYUzz0LjBV_fMcdwU3wgmu0NCoT+JJiozPa8DG6eeog@mail.gmail.com
Backpatch-through: 18
Commit 8e0d32a4a1 fixed an issue by allowing the replication origin to be
created while marking the table sync state as SUBREL_STATE_DATASYNC.
Update the comment in check_old_cluster_subscription_state() to accurately
describe this corrected behavior.
Author: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Backpatch-through: 17, where the code was introduced
Discussion: https://postgr.es/m/CAA4eK1+KaSf5nV_tWy+SDGV6MnFnKMhdt41jJjSDWm6yCyOcTw@mail.gmail.com
Discussion: https://postgr.es/m/aUTekQTg4OYnw-Co@paquier.xyz
Amit Kapila [Wed, 24 Dec 2025 04:21:43 +0000 (04:21 +0000)]
Don't advance origin during apply failure.
The logical replication parallel apply worker could incorrectly advance
the origin progress during an error or failed apply. This behavior risks
transaction loss because such transactions will not be resent by the
server.
Commit 3f28b2fcac addressed a similar issue for both the apply worker and
the table sync worker by registering a before_shmem_exit callback to reset
origin information. This prevents the worker from advancing the origin
during transaction abortion on shutdown. This patch applies the same fix
to the parallel apply worker, ensuring consistent behavior across all
worker types.
As with 3f28b2fcac, we are backpatching through version 16, since parallel
apply mode was introduced there and the issue only occurs when changes are
applied before the transaction end record (COMMIT or ABORT) is received.
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 16
Discussion: https://postgr.es/m/TY4PR01MB169078771FB31B395AB496A6B94B4A@TY4PR01MB16907.jpnprd01.prod.outlook.com
Discussion: https://postgr.es/m/TYAPR01MB5692FAC23BE40C69DA8ED4AFF5B92@TYAPR01MB5692.jpnprd01.prod.outlook.com
Fix bug in following update chain when locking a heap tuple
After waiting for a concurrent updater to finish, heap_lock_tuple()
followed the update chain to lock all tuple versions. However, when
stepping from the initial tuple to the next one, it failed to check
that the next tuple's XMIN matches the initial tuple's XMAX. That's an
important check whenever following an update chain, and the recursive
part that follows the chain did it, but the initial step missed it.
Without the check, if the updating transaction aborts, the updated
tuple is vacuumed away and replaced by an unrelated tuple, the
unrelated tuple might get incorrectly locked.
Michael Paquier [Tue, 23 Dec 2025 05:32:19 +0000 (14:32 +0900)]
Fix orphaned origin in shared memory after DROP SUBSCRIPTION
Since ce0fdbfe9722, a replication slot and an origin are created by each
tablesync worker, whose information is stored in both a catalog and
shared memory (once the origin is set up in the latter case). The
transaction where the origin is created is the same as the one that runs
the initial COPY, with the catalog state of the origin becoming visible
for other sessions only once the COPY transaction has committed. The
catalog state is coupled with a state in shared memory, initialized at
the same time as the origin created in the catalogs. Note that the
transaction doing the initial data sync can take a long time, time that
depends on the amount of data to transfer from a publication node to its
subscriber node.
Now, when a DROP SUBSCRIPTION is executed, all its workers are stopped
with the origins removed. The removal of each origin relies on a
catalog lookup. A worker still running the initial COPY would fail its
transaction, with the catalog state of the origin rolled back while the
shared memory state remains around. The session running the DROP
SUBSCRIPTION should be in charge of cleaning up the catalog and the
shared memory state, but as there is no data in the catalogs the shared
memory state is not removed. This issue would leave orphaned origin
data in shared memory, leading to a confusing state as it would still
show up in pg_replication_origin_status. Note that this shared memory
data is sticky, being flushed on disk in replorigin_checkpoint at
checkpoint. This prevents other origins from reusing a slot position
in the shared memory data.
To address this problem, the commit moves the creation of the origin at
the end of the transaction that precedes the one executing the initial
COPY, making the origin immediately visible in the catalogs for other
sessions, giving DROP SUBSCRIPTION a way to know about it. A different
solution would have been to clean up the shared memory state using an
abort callback within the tablesync worker. The solution of this commit
is more consistent with the apply worker that creates an origin in a
short transaction.
A test is added in the subscription test 004_sync.pl, which was able to
display the problem. The test fails when this commit is reverted.
Thomas Munro [Sun, 21 Dec 2025 02:40:07 +0000 (15:40 +1300)]
Clean up test_cloexec.c and Makefile.
An unused variable caused a compiler warning on BF animal fairywren, an
snprintf() call was redundant, and some buffer sizes were inconsistent.
Per code review from Tom Lane.
The Makefile's test ifeq ($(PORTNAME), win32) never succeeded due to a
circularity, so only Meson builds were actually compiling the new test
code, partially explaining why CI didn't tell us about the warning
sooner (the other problem being that CompilerWarnings only makes
world-bin, a problem for another commit). Simplify.
Backpatch-through: 16, like commit c507ba55
Author: Bryan Green <dbryan.green@gmail.com>
Co-authored-by: Thomas Munro <tmunro@gmail.com>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1086088.1765593851%40sss.pgh.pa.us
John Naylor [Fri, 19 Dec 2025 08:48:18 +0000 (15:48 +0700)]
Update pg_hba.conf example to reflect MD5 deprecation
In the wake of commit db6a4a985, remove most use of 'md5' from the
example configuration file. The only remainder is an example exception
for a client that doesn't support SCRAM.
Author: Mikael Gustavsson <mikael.gustavsson@smhi.se>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Discussion: https://postgr.es/m/176595607507.978865.11597773194269211255@wrigleys.postgresql.org
Discussion: https://postgr.es/m/4ed268473fdb4cf9b0eced6c8019d353@smhi.se
Backpatch-through: 18
Fujii Masao [Fri, 19 Dec 2025 03:05:37 +0000 (12:05 +0900)]
Add guard to prevent recursive memory context logging.
Previously, if memory context logging was triggered repeatedly and
rapidly while a previous request was still being processed, it could
result in recursive calls to ProcessLogMemoryContextInterrupt().
This could lead to infinite recursion and potentially crash the process.
This commit adds a guard to prevent such recursion.
If ProcessLogMemoryContextInterrupt() is already in progress and
logging memory contexts, subsequent calls will exit immediately,
avoiding unintended recursive calls.
While this scenario is unlikely in practice, it's not impossible.
This change adds a safety check to prevent such failures.
Back-patch to v14, where memory context logging was introduced.
Noah Misch [Thu, 18 Dec 2025 18:23:47 +0000 (10:23 -0800)]
Sort DO_SUBSCRIPTION_REL dump objects independent of OIDs.
Commit 0decd5e89db9f5edb9b27351082f0d74aae7a9b6 missed
DO_SUBSCRIPTION_REL, leading to assertion failures. In the unlikely use
case of diffing "pg_dump --binary-upgrade" output, spurious diffs were
possible. As part of fixing that, align the DumpableObject naming and
sort order with DO_PUBLICATION_REL. The overall effect of this commit
is to change sort order from (subname, srsubid) to (rel, subname).
Since DO_SUBSCRIPTION_REL is only for --binary-upgrade, accept that
larger-than-usual dump order change. Back-patch to v17, where commit 9a17be1e244a45a77de25ed2ada246fd34e4557d introduced DO_SUBSCRIPTION_REL.
Reported-by: vignesh C <vignesh21@gmail.com>
Author: vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/CALDaNm2x3rd7C0_HjUpJFbxpAqXgm=QtoKfkEWDVA8h+JFpa_w@mail.gmail.com
Backpatch-through: 17
Operations on unlogged relations should not be WAL-logged. The
brin_initialize_empty_new_buffer() function didn't get the memo.
The function is only called when a concurrent update to a brin page
uses up space that we're just about to insert to, which makes it
pretty hard to hit. If you do manage to hit it, a full-page WAL record
is erroneously emitted for the unlogged index. If you then crash,
crash recovery will fail on that record with an error like this:
FATAL: could not create file "base/5/32819": File exists
Jacob Champion [Wed, 17 Dec 2025 19:55:04 +0000 (11:55 -0800)]
oauth_validator: Avoid races in log_check()
Commit e0f373ee4 fixed up races in Cluster::connect_fails when using
log_like. t/002_client.pl didn't get the memo, though, because it
doesn't use Test::Cluster to perform its custom hook tests. (This is
probably not an issue at the moment, since the log check is only done
after authentication success and not failure, but there's no reason to
wait for someone to hit it.)
Introduce the fix, based on debug2 logging, to its use of log_check() as
well, and move the logic into the test() helper so that any additions
don't need to continually duplicate it.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAOYmi%2BmrGg%2Bn_X2MOLgeWcj3v_M00gR8uz_D7mM8z%3DdX1JYVbg%40mail.gmail.com
Backpatch-through: 18
Jacob Champion [Wed, 17 Dec 2025 19:54:56 +0000 (11:54 -0800)]
libpq-oauth: use correct c_args in meson.build
Copy-paste bug from b0635bfda: libpq-oauth.so was being built with
libpq_so_c_args, rather than libpq_oauth_so_c_args. (At the moment, the
two lists are identical, but that won't be true forever.)
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAOYmi%2BmrGg%2Bn_X2MOLgeWcj3v_M00gR8uz_D7mM8z%3DdX1JYVbg%40mail.gmail.com
Backpatch-through: 18
Jacob Champion [Wed, 17 Dec 2025 19:54:47 +0000 (11:54 -0800)]
libpq-fe.h: Don't claim SOCKTYPE in the global namespace
The definition of PGoauthBearerRequest uses a temporary SOCKTYPE macro
to hide the difference between Windows and Berkeley socket handles,
since we don't surface pgsocket in our public API. This macro doesn't
need to escape the header, because implementers will choose the correct
socket type based on their platform, so I #undef'd it immediately after
use.
I didn't namespace that helper, though, so if anyone else needs a
SOCKTYPE macro, libpq-fe.h will now unhelpfully get rid of it. This
doesn't seem too far-fetched, given its proximity to existing POSIX
macro names.
Add a PQ_ prefix to avoid collisions, update and improve the surrounding
documentation, and backpatch.
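The pattern, sketched (struct body abridged; see libpq-fe.h for the real layout):

    /* Scoped helper macro: defined, used, and removed within the header. */
    #ifdef _WIN32
    #define PQ_SOCKTYPE uintptr_t   /* Windows SOCKET */
    #else
    #define PQ_SOCKTYPE int         /* Berkeley socket descriptor */
    #endif

    typedef struct PGoauthBearerRequest
    {
        /* ... other fields elided ... */
        PQ_SOCKTYPE altsock;        /* illustrative member */
    } PGoauthBearerRequest;

    #undef PQ_SOCKTYPE              /* don't leak into the user's namespace */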
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAOYmi%2BmrGg%2Bn_X2MOLgeWcj3v_M00gR8uz_D7mM8z%3DdX1JYVbg%40mail.gmail.com
Backpatch-through: 18
The test is very sensitive to how backends start and exit, because it
tests dead-end backends which occur when all the connection slots are
in use. The test failed occasionally in the CI, when the backend that
was launched for the raw_connect_works() check lingered for a while,
and exited only later during the test. When it exited, it released a
connection slot, when the test expected all the slots to be in use at
that time.
The 002_connection_limits.pl test had a similar issue: if the backend
launched for safe_psql() in the test initialization lingers around, it
uses up a connection slot during the test, messing up the test's
connection counting. I haven't seen that in the CI, but when I added a
"sleep(1);" to proc_exit(), the test failed.
To make the tests more robust, restart the server to ensure that
lingering backends don't interfere with the later test steps.
In passing, fix a bogus test name.
Report and analysis by Jelte Fennema-Nio, Andres Freund, Thomas Munro.
Jeff Davis [Tue, 16 Dec 2025 19:13:17 +0000 (11:13 -0800)]
ltree: fix case-insensitive matching.
Previously, ltree_prefix_eq_ci() used lowercasing with the default
collation, while ltree_crc32_sz() used tolower() directly. These were
equivalent only if the default collation provider was libc and the
encoding was single-byte.
Change both to use casefolding with the default collation.
Backpatch through 18, where the casefolding APIs were introduced. The
bug exists in earlier versions, but would require some adaptation.
A REINDEX is required for ltree indexes where the database default
collation is not libc.
Jeff Davis [Tue, 16 Dec 2025 18:35:40 +0000 (10:35 -0800)]
Fix multibyte issue in ltree_strncasecmp().
Previously, ltree_strncasecmp() took two inputs but only one length
(that of the shorter input). It truncated the longer input to that
length, which could break a multibyte sequence.
Change the API to be a check for prefix equality (possibly
case-insensitive) instead, which is all that's needed by the
callers. Also, provide the lengths of both inputs.
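An illustrative sketch of the new shape (helper name hypothetical): with both lengths known, neither input is ever truncated mid-sequence, and the case-insensitive variant can casefold both whole labels before comparing.

    static bool
    label_prefix_eq(const char *pfx, size_t pfx_len,
                    const char *str, size_t str_len)
    {
        if (pfx_len > str_len)
            return false;       /* a prefix cannot be longer than the string */
        return memcmp(pfx, str, pfx_len) == 0;
    }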
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/5f65b85740197ba6249ea507cddf609f84a6188b.camel%40j-davis.com
Backpatch-through: 14
[C] 'function void CacheInvalidateHeapTupleInplace(Relation, HeapTuple, HeapTuple)' has some sub-type changes:
parameter 3 of type 'typedef HeapTuple' was removed
Discussion: https://postgr.es/m/CA+renyU+LGLvCqS0=fHit-N1J-2=2_mPK97AQxvcfKm+F-DxJA@mail.gmail.com
Backpatch-through: 18 only
Robert Haas [Tue, 16 Dec 2025 15:40:53 +0000 (10:40 -0500)]
Switch memory contexts in ReinitializeParallelDSM.
We already do this in CreateParallelContext, InitializeParallelDSM, and
LaunchParallelWorkers. I suspect the reason why the matching logic was
omitted from ReinitializeParallelDSM is that I failed to realize that
any memory allocation was happening here -- but shm_mq_attach does
allocate, which could result in a shm_mq_handle being allocated in a
shorter-lived context than the ParallelContext which points to it.
That could result in a crash if the shorter-lived context is freed
before the parallel context is destroyed. As far as I am currently
aware, there is no way to reach a crash using only code that is
present in core PostgreSQL, but extensions could potentially trip
over this. Fixing this in the back-branches appears low-risk, so
back-patch to all supported versions.
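A sketch of the fix pattern, mirroring what CreateParallelContext() and friends already do (details approximate):

    MemoryContext oldcontext;

    /* Allocations below must live as long as the ParallelContext. */
    oldcontext = MemoryContextSwitchTo(TopTransactionContext);
    /* ... shm_mq_attach() allocates the shm_mq_handle here ... */
    MemoryContextSwitchTo(oldcontext);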
Michael Paquier [Tue, 16 Dec 2025 04:29:36 +0000 (13:29 +0900)]
Fail recovery when missing redo checkpoint record without backup_label
This commit adds an extra check at the beginning of recovery to ensure
that the redo record of a checkpoint exists before attempting WAL
replay, logging a PANIC if the redo record referenced by the checkpoint
record could not be found. This is the same level of failure as when a
checkpoint record is missing. This check is added when a cluster is
started without a backup_label, after retrieving its checkpoint record.
The redo LSN used for the check is retrieved from the checkpoint record
successfully read.
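In sketch form (function names and signatures approximate; not the committed code):

    /* Started without a backup_label: the checkpoint record has been
     * read; now make sure its redo record can be read as well. */
    if (checkPoint.redo < CheckPointLoc)
    {
        XLogBeginRead(xlogreader, checkPoint.redo);
        if (ReadRecord(xlogreader, LOG, false) == NULL)
            ereport(PANIC,
                    (errmsg("could not find redo location referenced by checkpoint record")));
    }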
In the case where a backup_label exists, the startup process already
fails if the redo record cannot be found after reading a checkpoint
record at the beginning of recovery.
Previously, the presence of the redo record was not checked. If the
redo and checkpoint records were located on different WAL segments, it
was possible to silently skip an entire range of WAL records that
should have been replayed. The consequences of a missing redo record
depend on the version involved, and become worse the older the
version:
- On HEAD, v18 and v17, recovery fails with a pointer dereference at the
beginning of the redo loop, as the redo record is expected but cannot be
found. These versions at least detect the failure before doing
anything, even if it surfaces as a misleading segmentation fault that
gives no hint that the redo record is missing.
- In v16 and v15, problems show up at the end of recovery within
FinishWalRecovery(), with the startup process using a bogus LSN to
decide from where to start writing WAL. The cluster gets corrupted, but
at least it is noisy about it.
- v14 and older versions are worse: the cluster gets corrupted entirely
silently. The missing redo record causes the startup process to skip
recovery altogether, because a missing record looks the same as no redo
being required at all. This leads to data loss, as everything between
the redo record and the checkpoint record is lost.
Note that I have tested that down to 9.4, reproducing the issue with a
version of the author's reproducer slightly modified. The code is wrong
since at least 9.2, but I did not look at the exact point of origin.
This problem was found while debugging a v15 cluster where the WAL
segment containing the redo record was missing due to an operator
error, leading to a crash.
Requesting archive recovery with the creation of a recovery.signal or
a standby.signal even without a backup_label would mitigate the issue:
if the record cannot be found in pg_wal/, the missing segment can be
retrieved with a restore_command when checking that the redo record
exists. This was already the case without this commit, where recovery
would re-fetch the WAL segment that includes the redo record. The check
introduced by this commit simply causes the segment to be retrieved
earlier, to make sure that the redo record can be found.
On HEAD, the code will be slightly changed in a follow-up commit to not
rely on a PANIC, to include a test able to emulate the original problem.
This is a minimal backpatchable fix, kept separated for clarity.
Reported-by: Andres Freund <andres@anarazel.de>
Analyzed-by: Andres Freund <andres@anarazel.de>
Author: Nitin Jadhav <nitinjadhavpostgres@gmail.com>
Discussion: https://postgr.es/m/20231023232145.cmqe73stvivsmlhs@awork3.anarazel.de
Discussion: https://postgr.es/m/CAMm1aWaaJi2w49c0RiaDBfhdCL6ztbr9m=daGqiOuVdizYWYaA@mail.gmail.com
Backpatch-through: 14
Jacob Champion [Mon, 15 Dec 2025 21:30:48 +0000 (13:30 -0800)]
libpq: Align oauth_json_set_error() with other NLS patterns
Now that the prior commits have fixed missing OAuth translations, pull
the bespoke usage of libpq_gettext() for OAUTHBEARER parsing into
oauth_json_set_error() itself, and make that a gettext trigger as well,
to better match what the other sites are doing. Add an _internal()
variant to handle the existing untranslated case.
Jacob Champion [Mon, 15 Dec 2025 21:30:44 +0000 (13:30 -0800)]
libpq-oauth: Don't translate internal errors
Some error messages are generated when OAuth multiplexer operations fail
unexpectedly in the client. Álvaro pointed out that these are both
difficult to translate idiomatically (as they use internal terminology
heavily) and of dubious translation value to end users (since they're
going to need to get developer help anyway). The response parsing engine
has a similar issue.
Remove these from the translation files by introducing internal variants
of actx_error() and oauth_parse_set_error().
Jacob Champion [Mon, 15 Dec 2025 21:30:31 +0000 (13:30 -0800)]
libpq: Add missing OAuth translations
Several strings that should have been translated as they passed through
libpq_gettext were not actually being pulled into the translation files,
because I hadn't directly wrapped them in one of the GETTEXT_TRIGGERS.
Move the responsibility for calling libpq_gettext() to the code that
sets actx->errctx. Doing the same in report_type_mismatch() would result
in double-translation, so mark those strings with gettext_noop()
instead. And wrap two ternary operands with gettext_noop(), even though
they're already in one of the triggers, since xgettext sees only the
first.
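For illustration (the reporting helper here is a stand-in): xgettext extracts only the first operand of a conditional expression, so each arm needs its own trigger, with the actual translation deferred to the point of use:

    /* Mark both candidate strings for extraction; translate when used. */
    const char *msg = is_array ? gettext_noop("element must be an array")
                               : gettext_noop("element must be a scalar");

    report_error(actx, libpq_gettext(msg));     /* stand-in reporting call */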
Finally, fe-auth-oauth.c was missing from nls.mk, so none of that file
was being translated at all. Add it now.
Original patch by Zhijie Hou, plus suggested tweaks by Álvaro Herrera
and small additions by me.
Noah Misch [Mon, 15 Dec 2025 20:19:49 +0000 (12:19 -0800)]
Revisit cosmetics of "For inplace update, send nontransactional invalidations."
This removes a never-used CacheInvalidateHeapTupleInplace() parameter.
It adds README content about inplace update visibility in logical
decoding. It rewrites other comments.
Back-patch to v18, where commit 243e9b40f1b2dd09d6e5bf91ebf6e822a2cd3704
first appeared. Since this removes a CacheInvalidateHeapTupleInplace()
parameter, expect a v18 ".abi-compliance-history" edit to follow. PGXN
contains no calls to that function.
Reported-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reported-by: Ilyasov Ian <ianilyasov@outlook.com>
Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Surya Poondla <s_poondla@apple.com>
Discussion: https://postgr.es/m/CA+renyU+LGLvCqS0=fHit-N1J-2=2_mPK97AQxvcfKm+F-DxJA@mail.gmail.com
Backpatch-through: 18
Clarify comment on multixid offset wraparound check
Coverity complained that offset cannot be 0 here because there's an
explicit check for "offset == 0" earlier in the function, but it
didn't see the possibility that offset could've wrapped around to 0.
The code is correct, but clarify the comment about it.
The same code exists in back branches in the server
GetMultiXactIdMembers() function and on 'master' in the pg_upgrade
GetOldMultiXactIdSingleMember() function. In back branches Coverity
didn't complain about it because the check was merely an assertion,
but change the comment in all supported branches for consistency.
Michael Paquier [Thu, 11 Dec 2025 05:11:25 +0000 (14:11 +0900)]
pg_buffercache: Fix memory allocation formula
The code over-allocated the memory required for os_page_status, relying
on uint64 for its element size instead of an int, hence doubling what
was required. This could mean quite a lot of wasted memory when dealing
with many NUMA pages.
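The bug in sketch form (variable names approximate):

    int        *os_page_status;

    /* before: element size taken from the wrong type, doubling the request */
    os_page_status = palloc(sizeof(uint64) * os_page_count);
    /* after: the array holds int entries */
    os_page_status = palloc(sizeof(int) * os_page_count);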
Michael Paquier [Thu, 11 Dec 2025 01:25:44 +0000 (10:25 +0900)]
Fix allocation formula in llvmjit_expr.c
An array of LLVMBasicBlockRef was allocated using
"LLVMBasicBlockRef *" rather than "LLVMBasicBlockRef" as the element
size. LLVMBasicBlockRef is itself a pointer type, so both have the same
size and this caused no actual problem, but it is still incorrect.
This issue has been spotted while reviewing a different patch, and
exists since 2a0faed9d702, so backpatch all the way down.
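In sketch form: the two sizes agree only because LLVMBasicBlockRef is itself a pointer typedef (struct LLVMOpaqueBasicBlock *).

    LLVMBasicBlockRef *opblocks;

    /* before: sizeof the pointer-to-element type */
    opblocks = palloc(sizeof(LLVMBasicBlockRef *) * nblocks);
    /* after: sizeof the element type itself */
    opblocks = palloc(sizeof(LLVMBasicBlockRef) * nblocks);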
The test seemed to incorrectly think that query_safe() takes an
argument that describes what the query does, similar to e.g.
command_ok(). Until commit bd8d9c9bdf the extra arguments were
harmless and were just ignored, but when commit bd8d9c9bdf introduced
a new optional argument to query_safe(), the extra arguments started
clashing with that, causing the test to fail.
Backpatch to v17, that's the oldest branch where the test exists. The
extra arguments didn't cause any trouble on the older branches, but
they were clearly bogus anyway.
Fix some near-bugs related to ResourceOwner function arguments
These functions took a ResourceOwner argument, but only checked if it
was NULL, and then used CurrentResourceOwner for the actual work.
Surely the intention was to use the passed-in resource owner. All
current callers passed CurrentResourceOwner or NULL, so this has no
consequences at the moment, but it's an accident waiting to happen for
future callers and extensions.
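The near-bug, sketched with hypothetical names:

    static void
    remember_thing(ResourceOwner owner, Datum value)
    {
        if (owner == NULL)
            return;
        /* BUG: validates 'owner', then ignores it */
        ResourceOwnerRemember(CurrentResourceOwner, value, &thing_desc);
        /* FIX: ResourceOwnerRemember(owner, value, &thing_desc); */
    }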
Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAEze2Whnfv8VuRZaohE-Af+GxBA1SNfD_rXfm84Jv-958UCcJA@mail.gmail.com
Backpatch-through: 17
Michael Paquier [Wed, 10 Dec 2025 03:47:20 +0000 (12:47 +0900)]
Fix failures with cross-version pg_upgrade tests
Buildfarm members skimmer and crake have reported that pg_upgrade
running from v18 fails due to the changes of d52c24b0f808, the
expectation being that objects removed from the test module
injection_points should still be present post-upgrade, but the test
module no longer has them.
The origin of the issue is that the following test modules depend on
injection_points, but they do not drop the extension once the tests
finish, leaving its traces in the dumps used for the upgrades:
- gin, down to v17
- typcache, down to v18
- nbtree, HEAD-only
Test modules have no upgrade requirements, as they are used only for
tests, so there is no point in keeping them around.
An alternative solution would be to drop the databases created by these
modules in AdjustUpgrade.pm, but the solution of this commit to drop the
extension is simpler. Note that there would be a catch if using a
solution based on AdjustUpgrade.pm as the database name used for the
test runs differs between configure and meson:
- configure relies on USE_MODULE_DB for database name uniqueness,
building a database name based on the *first* entry of REGRESS, which
lists all the SQL tests.
- meson relies on a "name" field.
For example, for the test module "gin", the regression database is named
"regression_gin" under meson, while under configure it is the more
complex "contrib_regression_gin_incomplete_splits". So an
AdjustUpgrade.pm-based solution would need a set of DROP DATABASE IF
EXISTS commands to cope with each build system.
The failure has been caused by d52c24b0f808, and the problem can happen
with upgrade dumps from v17 and v18 to HEAD. This problem is not
currently reachable in the back-branches, but a future change to
injection_points in the stable branches could invalidate that
assumption, so this commit is applied down to v17 in the test modules
that matter.
Per discussion with Tom Lane and Heikki Linnakangas.
Thomas Munro [Tue, 9 Dec 2025 20:01:35 +0000 (09:01 +1300)]
Fix O_CLOEXEC flag handling in Windows port.
PostgreSQL's src/port/open.c has always set bInheritHandle = TRUE
when opening files on Windows, making all file descriptors inheritable
by child processes. This meant the O_CLOEXEC flag, added to many call
sites by commit 1da569ca1f (v16), was silently ignored.
The original commit included a comment suggesting that our open()
replacement doesn't create inheritable handles, but that was a
misunderstanding of the code path. In practice, the code created
inheritable handles in all cases.
This hasn't caused widespread problems because most child processes
(archive_command, COPY PROGRAM, etc.) operate on file paths passed as
arguments rather than inherited file descriptors. Even if a child
wanted to use an inherited handle, it would need to learn the numeric
handle value, which isn't passed through our IPC mechanisms.
Nonetheless, the current behavior is wrong. It violates documented
O_CLOEXEC semantics, contradicts our own code comments, and makes
PostgreSQL behave differently on Windows than on Unix. It also creates
potential issues with future code or security auditing tools.
To fix, define O_CLOEXEC as _O_NOINHERIT in master (the value
previously used by O_DSYNC); the back branches use different values to
preserve existing flag values. In pgwin32_open_handle(), set
bInheritHandle according to whether O_CLOEXEC is specified, giving the
same atomic semantics as POSIX in multi-threaded programs that create
processes.
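Sketched (simplified; not the committed hunk):

    SECURITY_ATTRIBUTES sa;

    sa.nLength = sizeof(sa);
    sa.lpSecurityDescriptor = NULL;
    /* Inheritability now follows O_CLOEXEC, atomically at creation. */
    sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE;

    h = CreateFile(fileName, desiredAccess, shareMode, &sa,
                   creationDisposition, flagsAndAttributes, NULL);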
Backpatch-through: 16
Author: Bryan Green <dbryan.green@gmail.com>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com> (minor adjustments)
Discussion: https://postgr.es/m/e2b16375-7430-4053-bda3-5d2194ff1880%40gmail.com
Nathan Bossart [Tue, 9 Dec 2025 17:01:38 +0000 (11:01 -0600)]
doc: Fix titles of some pg_buffercache functions.
As in commit 59d6c03956, use <function> rather than <structname> in
the <title> to be consistent with how other functions in this
module are documented.
Dean Rasheed [Tue, 9 Dec 2025 10:49:16 +0000 (10:49 +0000)]
doc: Fix statement about ON CONFLICT and deferrable constraints.
The description of deferrable constraints in create_table.sgml states
that deferrable constraints cannot be used as conflict arbitrators in
an INSERT with an ON CONFLICT DO UPDATE clause, but in fact this
restriction applies to all ON CONFLICT clauses, not just those with DO
UPDATE. Fix this, and while at it, change the word "arbitrators" to
"arbiters", to match the terminology used elsewhere.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAEZATCWsybvZP3ce8rGcVNx-QHuDOJZDz8y=p1SzqHwjRXyV4Q@mail.gmail.com
Backpatch-through: 14
Amit Kapila [Tue, 9 Dec 2025 07:12:37 +0000 (07:12 +0000)]
Fix LOCK_TIMEOUT handling in slotsync worker.
Previously, the slotsync worker relied on SIGINT for graceful shutdown
during promotion. However, SIGINT is also used by the LOCK_TIMEOUT handler
to cancel queries. Since the slotsync worker can lock catalog tables while
parsing libpq tuples, this overlap caused it to ignore LOCK_TIMEOUT
signals and potentially wait indefinitely on locks.
This patch replaces the slotsync worker's SIGINT handler with
StatementCancelHandler to correctly process query-cancel interrupts.
Additionally, the startup process now uses SIGUSR1 to signal the slotsync
worker to stop during promotion. The worker exits after detecting that the
shared memory flag stopSignaled is set.
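In sketch form, the handler setup (registration only; approximate):

    /* SIGINT now means "cancel the current query", as in regular backends. */
    pqsignal(SIGINT, StatementCancelHandler);
    /* Promotion-time shutdown arrives via SIGUSR1 plus the shared
     * stopSignaled flag checked in the worker's main loop. */
    pqsignal(SIGUSR1, procsignal_sigusr1_handler);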
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 17, where it was introduced
Discussion: https://postgr.es/m/TY4PR01MB169078F33846E9568412D878C94A2A@TY4PR01MB16907.jpnprd01.prod.outlook.com
Amit Kapila [Mon, 8 Dec 2025 05:33:14 +0000 (05:33 +0000)]
Prevent invalidation of newly created replication slots.
A race condition could cause a newly created replication slot to become
invalidated between WAL reservation and a checkpoint.
Previously, if the required WAL was removed, we retried the reservation
process. However, the slot could still be invalidated before the retry if
the WAL was not yet removed but the checkpoint advanced the redo pointer
beyond the slot's intended restart LSN and computed the minimum LSN that
needs to be preserved for the slots.
The fix is to acquire an exclusive lock on ReplicationSlotAllocationLock
during WAL reservation to serialize WAL reservation and checkpoint's
minimum restart_lsn computation. This ensures that, if WAL reservation
occurs first, the checkpoint waits until restart_lsn is updated before
removing WAL. If the checkpoint runs first, subsequent WAL reservations
pick a position at or after the latest checkpoint's redo pointer.
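The serialization, sketched (locking scope and field access simplified):

    LWLockAcquire(ReplicationSlotAllocationLock, LW_EXCLUSIVE);
    /* Reserve WAL and publish restart_lsn under the lock, so that a
     * concurrent checkpoint cannot compute its minimum preserved LSN
     * in between the two steps. */
    slot->data.restart_lsn = GetXLogInsertRecPtr();
    LWLockRelease(ReplicationSlotAllocationLock);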
We can't use the same fix for branch 17 and prior because commit
2090edc6f3 changed the minimum restart_lsn among slots to be computed
at the beginning of a checkpoint (or restart point). The fix for 17 and
prior branches is under discussion and will be committed separately.
Tom Lane [Sat, 6 Dec 2025 01:10:33 +0000 (20:10 -0500)]
Fix text substring search for non-deterministic collations.
Due to an off-by-one error, the code failed to find matches at the
end of the haystack. Fix by rewriting the loop.
While at it, fix a comment that claimed that the function could find
a zero-length match. Such a match could send a caller into an endless
loop. However, zero-length matches only make sense with an empty
search string, and that case is explicitly excluded by all callers.
To make sure it stays that way, add an Assert and a comment.
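The shape of the off-by-one, illustrated with a simplified fixed loop (the real code must handle variable-length matches under non-deterministic collations; match_at() is a hypothetical helper):

    /* Start positions must run through hlen - nlen inclusive; using '<'
     * here misses a match at the very end of the haystack. */
    for (int start = 0; start <= hlen - nlen; start++)
    {
        if (match_at(haystack + start, needle, nlen))
            return start;
    }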
Bug: #19341
Reported-by: Adam Warland <adam.warland@infor.com>
Author: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19341-1d9a22915edfec58@postgresql.org
Backpatch-through: 18
Fix setting next multixid's offset at offset wraparound
In commit 789d65364c, we started updating the next multixid's offset
too when recording a multixid, so that it can always be used to
calculate the number of members. I got it wrong at offset wraparound:
we need to skip over offset 0. Fix that.
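In sketch form (names approximate): offset 0 is reserved to mean "invalid", so the stored next offset must step over it when the 32-bit counter wraps.

    uint32      nextOffset = offset + nmembers; /* may wrap around */

    if (nextOffset == 0)
        nextOffset = 1;         /* skip over the reserved offset 0 */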