Amit Langote [Fri, 23 Jan 2026 01:21:25 +0000 (10:21 +0900)]
Fix bogus ctid requirement for dummy-root partitioned targets
ExecInitModifyTable() unconditionally required a ctid junk column even
when the target was a partitioned table. This led to spurious "could
not find junk ctid column" errors when all children were excluded and
only the dummy root result relation remained.
A partitioned table only appears in the result relations list when all
leaf partitions have been pruned, leaving the dummy root as the sole
entry. Assert this invariant (nrels == 1) and skip the ctid requirement.
Also adjust ExecModifyTable() to tolerate invalid ri_RowIdAttNo for
partitioned tables, which is safe since no rows will be processed in
this case.
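In code terms, the fix amounts to something like the following sketch
(simplified; the surrounding variables follow ExecInitModifyTable(),
but the exact shape is an assumption):

    if (relkind == RELKIND_PARTITIONED_TABLE)
    {
        /* only reachable when every leaf partition has been pruned */
        Assert(nrels == 1);
        /* leave ri_RowIdAttNo invalid; no rows will be fetched */
    }
    else
    {
        resultRelInfo->ri_RowIdAttNo =
            ExecFindJunkAttributeInTlist(subplan->targetlist, "ctid");
        if (!AttributeNumberIsValid(resultRelInfo->ri_RowIdAttNo))
            elog(ERROR, "could not find junk ctid column");
    }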
Bug: #19099
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19099-e05dcfa022fe553d%40postgresql.org
Backpatch-through: 14
Tom Lane [Thu, 22 Jan 2026 23:35:31 +0000 (18:35 -0500)]
Remove faulty Assert in partitioned INSERT...ON CONFLICT DO UPDATE.
Commit f16241bef mistakenly supposed that INSERT...ON CONFLICT DO
UPDATE rejects partitioned target tables. (This may have been
accurate when the patch was written, but it was already obsolete
when committed.) Hence, there's an assertion that we can't see
ItemPointerIndicatesMovedPartitions() in that path, but the assertion
is triggerable.
Some other places throw error if they see a moved-across-partitions
tuple, but there seems no need for that here, because if we just retry
then we get the same behavior as in the update-within-partition case,
as demonstrated by the new isolation test. So fix by deleting the
faulty Assert. (The fact that this is the fix doubtless explains
why we've heard no field complaints: the behavior of a non-assert
build is fine.)
The TM_Deleted case contains a cargo-culted copy of the same Assert,
which I also deleted to avoid confusion, although I believe that one
is actually not triggerable.
Per our code coverage report, neither the TM_Updated nor the
TM_Deleted case was reached at all by existing tests, so this
patch adds tests for both.
Reported-by: Dmitry Koval <d.koval@postgrespro.ru>
Author: Joseph Koshakow <koshy44@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/f5fffe4b-11b2-4557-a864-3587ff9b4c36@postgrespro.ru
Backpatch-through: 14
Thomas Munro [Thu, 22 Jan 2026 02:43:13 +0000 (15:43 +1300)]
jit: Add missing inline pass for LLVM >= 17.
With LLVM >= 17, transform passes are provided as a string to
LLVMRunPasses. Only two strings were used: "default<O3>" and
"default<O0>,mem2reg".
With previous LLVM versions, an additional inline pass was added when
JIT inlining was enabled without optimization. With LLVM >= 17, the code
would go through llvm_inline, prepare the functions for inlining, but
the generated bitcode would be the same due to the missing inline pass.
This patch restores the previous behavior by adding an inline pass when
inlining is enabled but no optimization is done.
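As a rough sketch of the LLVM >= 17 path (the exact pass name used for
the inliner here is an assumption):

    const char *passes;

    if (context->base.flags & PGJIT_OPT3)
        passes = "default<O3>";
    else if (context->base.flags & PGJIT_INLINE)
        passes = "always-inline,default<O0>,mem2reg"; /* restored pass */
    else
        passes = "default<O0>,mem2reg";

    err = LLVMRunPasses(module, passes, NULL, options);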
This fixes an oversight introduced by 76200e5e when support for LLVM 17
was added.
Backpatch-through: 14
Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Pierre Ducroquet <p.psql@pinaraf.info>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/CAO6_XqrNjJnbn15ctPv7o4yEAT9fWa-dK15RSyun6QNw9YDtKg%40mail.gmail.com
Álvaro Herrera [Wed, 21 Jan 2026 17:55:43 +0000 (18:55 +0100)]
amcheck: Fix snapshot usage in bt_index_parent_check
We were using SnapshotAny to do some index checks, but that's wrong and
causes spurious errors when used on indexes created by CREATE INDEX
CONCURRENTLY. Fix it to use an MVCC snapshot, and add a test for it.
Backpatch of 6bd469d26aca to branches 14-16. I previously misidentified
the bug's origin: it came in with commit 7f563c09f890 (pg11-era, not 5ae2087202af as claimed previously), so all live branches are affected.
Also take the opportunity to fix some comments that we failed to update
in the original commits and apply pgperltidy. In branch 14, remove the
unnecessary test plan specification (which would have needed to be
changed anyway; c.f. commit 549ec201d613.)
Diagnosed-by: Donghang Lin <donghanglin@gmail.com>
Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Backpatch-through: 14
Discussion: https://postgr.es/m/CANtu0ojmVd27fEhfpST7RG2KZvwkX=dMyKUqg0KM87FkOSdz8Q@mail.gmail.com
Tom Lane [Sun, 18 Jan 2026 19:54:33 +0000 (14:54 -0500)]
Update time zone data files to tzdata release 2025c.
This is pretty pro-forma for our purposes, as the only change
is a historical correction for pre-1976 DST laws in
Baja California. (Upstream made this release mostly to update
their leap-second data, which we don't use.) But with minor
releases coming up, we should be up-to-date.
Michael Paquier [Sun, 18 Jan 2026 08:25:04 +0000 (17:25 +0900)]
Fix error message related to end TLI in backup manifest
The WAL information included in a backup manifest is cross-checked
with the contents of the timeline history file of the end
timeline. When a check based on the end timeline failed, the error
message reported the value of the start timeline instead. Fix the
message to show the correct timeline number.
This error report would be confusing for users if seen, because it
would provide incorrect information, so backpatch all the way down.
Amit Langote [Fri, 16 Jan 2026 04:01:52 +0000 (13:01 +0900)]
Fix segfault from releasing locks in detached DSM segments
If a FATAL error occurs while holding a lock in a DSM segment (such
as a dshash lock) and the process is not in a transaction, a
segmentation fault can occur during process exit.
The problem sequence is:
1. Process acquires a lock in a DSM segment (e.g., via dshash)
2. FATAL error occurs outside transaction context
3. proc_exit() begins, calling before_shmem_exit callbacks
4. dsm_backend_shutdown() detaches all DSM segments
5. Later, on_shmem_exit callbacks run
6. ProcKill() calls LWLockReleaseAll()
7. Segfault: the lock being released is in unmapped memory
This only manifests outside transaction contexts because
AbortTransaction() calls LWLockReleaseAll() during transaction
abort, releasing locks before DSM cleanup. Background workers and
other non-transactional code paths are vulnerable.
Fix by calling LWLockReleaseAll() unconditionally at the start of
shmem_exit(), before any callbacks run. Releasing locks before
callbacks prevents the segfault: locks must be released before
dsm_backend_shutdown() detaches their memory. This is safe because
after an error, held locks are protecting potentially inconsistent
data anyway, and callbacks can acquire fresh locks if needed.
Also add a comment noting that LWLockReleaseAll() must be safe to
call before LWLock initialization (which it is, since
num_held_lwlocks will be 0), plus an Assert for the post-condition.
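A minimal sketch of the resulting ordering (not the full function):

    void
    shmem_exit(int code)
    {
        /*
         * Release lwlocks first: some may live in DSM segments that the
         * before_shmem_exit callbacks (via dsm_backend_shutdown()) are
         * about to unmap.  Safe even before LWLock initialization, since
         * num_held_lwlocks is 0 at that point.
         */
        LWLockReleaseAll();

        /* ... run before_shmem_exit callbacks, dsm_backend_shutdown(), ... */
    }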
This fix aligns with the original design intent from commit 001a573a2, which noted that backends must clean up shared memory
state (including releasing lwlocks) before unmapping dynamic shared
memory segments.
Fix 'unexpected data beyond EOF' on replica restart
On restart, a replica can fail with an error like 'unexpected data
beyond EOF in block 200 of relation T/D/R'. These are the steps to
reproduce it:
- A relation has a size of 400 blocks.
- Blocks 201 to 400 are empty.
- Block 200 has two rows.
- Blocks 100 to 199 are empty.
- A restartpoint is done
- Vacuum truncates the relation to 200 blocks
- A FPW deletes a row in block 200
- A checkpoint is done
- A FPW deletes the last row in block 200
- Vacuum truncates the relation to 100 blocks
- The replica restarts
When the replica restarts:
- The relation on disk starts at 100 blocks, because all the
truncations were applied before restart.
- The first truncate to 200 blocks is replayed. It silently fails, but
it will still (incorrectly!) update the cached size to 200 blocks
- The first FPW on block 200 is applied. XLogReadBufferForRedo relies
on the cached size and incorrectly assumes that the page already
exists in the file, and thus won't extend the relation.
- The online checkpoint record is replayed, calling smgrdestroyall
which causes the cached size to be discarded
- The second FPW on block 200 is applied. This time, the detected size
is 100 blocks, an extend is attempted. However, the block 200 is
already present in the buffer cache due to the first FPW. This
triggers the 'unexpected data beyond EOF'.
To fix, when the requested new size is greater than the current size,
update the cached size in the SMgrRelation with the current size
rather than the requested new size.
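A sketch of the corrected caching, assuming the change lands in
smgrtruncate() (variable names assumed; smgr_cached_nblocks is the
real field):

    if (nblocks >= curnblk)
    {
        /* nothing to truncate on disk; remember the *actual* size */
        reln->smgr_cached_nblocks[forknum] = curnblk;
    }
    else
    {
        /* ... physical truncation happens here ... */
        reln->smgr_cached_nblocks[forknum] = nblocks;
    }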
Add check for invalid offset at multixid truncation
If a multixid with zero offset is left behind after a crash, and that
multixid later becomes the oldest multixid, truncation might try to
look up its offset and read the zero value. In the worst case, we
might incorrectly use the zero offset to truncate valid SLRU segments
that are still needed. I'm not sure if that can happen in practice, or
if there are some other lower-level safeguards or incidental reasons
that prevent the caller from passing an unwritten multixid as the
oldest multi. But better safe than sorry, so let's add an explicit
check for it.
In stable branches, we should perhaps do the same check for
'oldestOffset', i.e. the offset of the old oldest multixid (in master,
'oldestOffset' is gone). But if the old oldest multixid has an invalid
offset, the damage has been done already, and we would never advance
past that point. It's not clear what we should do in that case. The
check that this commit adds will prevent such a multixid with invalid
offset from becoming the oldest multixid in the first place, which
seems enough for now.
Jacob Champion [Fri, 9 Jan 2026 18:02:27 +0000 (10:02 -0800)]
doc: Improve description of publish_via_partition_root
Reword publish_via_partition_root's opening paragraph. Describe its
behavior more clearly, and directly state that its default is false.
Per complaint by Peter Smith; final text of the patch made in
collaboration with Chao Li.
Author: Chao Li <li.evan.chao@gmail.com>
Author: Peter Smith <peter.b.smith@fujitsu.com>
Reported-by: Peter Smith <peter.b.smith@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAHut%2BPu7SpK%2BctOYoqYR3V4w5LKc9sCs6c_qotk9uTQJQ4zp6g%40mail.gmail.com
Backpatch-through: 14
David Rowley [Thu, 8 Jan 2026 22:04:39 +0000 (11:04 +1300)]
Fix possible incorrect column reference in ERROR message
When creating a partition for a RANGE partitioned table, the reporting
of errors relating to converting the specified range values into
constant values for the partition key's type could display the name of a
previous partition key column when an earlier range was specified as
MINVALUE or MAXVALUE.
This was caused by the code not correctly incrementing the index that
tracks which partition key the foreach loop was working on after
processing MINVALUE/MAXVALUE ranges.
Fix by using foreach_current_index() to ensure the index variable is
always set to the List element being worked on.
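A sketch of the corrected loop (simplified; details such as the
surrounding transformPartitionRangeBounds-style code are assumptions):

    foreach(cell, spec->lowerdatums)
    {
        PartitionRangeDatum *datum = lfirst_node(PartitionRangeDatum, cell);
        int         i = foreach_current_index(cell);    /* never drifts */

        if (datum->kind != PARTITION_RANGE_DATUM_VALUE)
            continue;           /* MINVALUE/MAXVALUE: nothing to coerce */

        /* ... coerce the expression, reporting errors against key column i ... */
    }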
Amit Kapila [Thu, 8 Jan 2026 06:44:28 +0000 (06:44 +0000)]
Prevent invalidation of newly created replication slots.
A race condition could cause a newly created replication slot to become
invalidated between WAL reservation and a checkpoint.
Previously, if the required WAL was removed, we retried the reservation
process. However, the slot could still be invalidated before the retry if
the WAL was not yet removed but the checkpoint advanced the redo pointer
beyond the slot's intended restart LSN and computed the minimum LSN that
needs to be preserved for the slots.
The fix is to acquire an exclusive lock on ReplicationSlotAllocationLock
during WAL reservation, and a shared lock during the minimum LSN
calculation at checkpoints to serialize the process. This ensures that, if
WAL reservation occurs first, the checkpoint waits until restart_lsn is
updated before calculating the minimum LSN. If the checkpoint runs first,
subsequent WAL reservations pick a position at or after the latest
checkpoint's redo pointer.
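A simplified sketch of the serialization (the surrounding details
differ between branches):

    /* WAL reservation side, e.g. in ReplicationSlotReserveWal() */
    LWLockAcquire(ReplicationSlotAllocationLock, LW_EXCLUSIVE);
    slot->data.restart_lsn = restart_lsn;   /* at/after latest redo pointer */
    ReplicationSlotsComputeRequiredLSN();
    LWLockRelease(ReplicationSlotAllocationLock);

    /* checkpoint side, while computing the minimum LSN to preserve */
    LWLockAcquire(ReplicationSlotAllocationLock, LW_SHARED);
    keep_lsn = XLogGetReplicationSlotMinimumLSN();
    LWLockRelease(ReplicationSlotAllocationLock);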
We used a similar fix in HEAD (via commit 006dd4b2e5) and 18. The
difference is that in 17 and prior branches we need to additionally handle
the race condition with slot's minimum LSN computation during checkpoints.
John Naylor [Wed, 7 Jan 2026 09:02:19 +0000 (16:02 +0700)]
createuser: Update docs to reflect defaults
Commit c7eab0e97 changed the default password_encryption setting to
'scram-sha-256', so update the example for creating a user with an
assigned password.
In addition, commit 08951a7c9 added new options that in turn pass
default tokens NOBYPASSRLS and NOREPLICATION to the CREATE ROLE
command, so fix this omission as well for v16 and later.
Reported-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/cff1ea60-c67d-4320-9e33-094637c2c4fb%40iki.fi
Backpatch-through: 14
David Rowley [Tue, 6 Jan 2026 04:31:20 +0000 (17:31 +1300)]
Fix issue with EVENT TRIGGERS and ALTER PUBLICATION
When processing the "publish" options of an ALTER PUBLICATION command,
we call SplitIdentifierString() to split the options into a List of
strings. Since SplitIdentifierString() modifies the delimiter
character and puts NULs in their place, this would overwrite the memory
of the AlterPublicationStmt. Later in AlterPublicationOptions(), the
modified AlterPublicationStmt is copied for event triggers, which would
result in the event trigger only seeing the first "publish" option
rather than all options that were specified in the command.
To fix this, make a copy of the string before passing to
SplitIdentifierString().
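In sketch form (variable names follow the style of the surrounding
code but are assumptions):

    char   *publish = pstrdup(defGetString(defel));
    List   *publish_list;

    /* SplitIdentifierString() writes NULs into its argument, so pass a copy */
    if (!SplitIdentifierString(publish, ',', &publish_list))
        ereport(ERROR,
                (errcode(ERRCODE_SYNTAX_ERROR),
                 errmsg("invalid list syntax in parameter \"%s\"", "publish")));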
Here we also adjust a similar case in the pgoutput plugin. There are no
known issues caused by SplitIdentifierString() here, so this is being
done out of paranoia.
Thanks to Henson Choi for putting together an example case showing the
ALTER PUBLICATION issue.
David Rowley [Sun, 4 Jan 2026 07:34:45 +0000 (20:34 +1300)]
Fix selectivity estimation integer overflow in contrib/intarray
This fixes a poorly written integer comparison function which was
performing subtraction in an attempt to return a negative value when
a < b and a positive value when a > b, and 0 when the values were equal.
Unfortunately that didn't always work correctly due to two's complement
having INT_MIN one further from zero than INT_MAX. This could result
in an overflow and cause the comparison function to return an incorrect
result, which would result in the binary search failing to find the
value being searched for.
This could cause poor selectivity estimates when the statistics stored
the value of INT_MAX (2147483647) and the value being searched for was
large enough to result in the binary search doing a comparison with that
INT_MAX value.
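The general shape of the fix is the standard overflow-safe comparator
pattern (the function name here is illustrative):

    static int
    int_cmp(const void *a, const void *b)
    {
        int32   av = *(const int32 *) a;
        int32   bv = *(const int32 *) b;

        /* "return av - bv" overflows for e.g. av = -2, bv = INT_MAX */
        if (av < bv)
            return -1;
        if (av > bv)
            return 1;
        return 0;
    }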
Author: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAEoWx2ng1Ot5LoKbVU-Dh---dFTUZWJRH8wv2chBu29fnNDMaQ@mail.gmail.com
Backpatch-through: 14
Thomas Munro [Wed, 31 Dec 2025 00:24:17 +0000 (13:24 +1300)]
jit: Fix jit_profiling_support when unavailable.
jit_profiling_support=true captures profile data for Linux perf. On
other platforms, LLVMCreatePerfJITEventListener() returns NULL and the
attempt to register the listener would crash.
Fix by ignoring the setting in that case. The documentation already
says that it only has an effect if perf support is present, and we
already did the same for older LLVM versions that lacked support.
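Roughly (a sketch; the layer variable is an assumption):

    if (jit_profiling_support)
    {
        LLVMJITEventListenerRef perf = LLVMCreatePerfJITEventListener();

        /* returns NULL on platforms without perf support; skip if so */
        if (perf != NULL)
            LLVMOrcRTDyldObjectLinkingLayerRegisterJITEventListener(layer, perf);
    }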
No field reports, unsurprisingly for an obscure developer-oriented
setting. Noticed in passing while working on commit 1a28b4b4.
Masahiko Sawada [Tue, 30 Dec 2025 18:56:18 +0000 (10:56 -0800)]
Fix a race condition in updating procArray->replication_slot_xmin.
Previously, ReplicationSlotsComputeRequiredXmin() computed the oldest
xmin across all slots without holding ProcArrayLock (when
already_locked is false), acquiring the lock just before updating the
replication slot xmin.
This could lead to a race condition: if a backend created a new slot
and updated the global replication slot xmin, another backend
concurrently running ReplicationSlotsComputeRequiredXmin() could
overwrite that update with an invalid or stale value. This happens
because the concurrent backend might have computed the aggregate xmin
before the new slot was accounted for, but applied the update after
the new slot had already updated the global value.
In the reported failure, a walsender for an apply worker computed
InvalidTransactionId as the oldest xmin and overwrote a valid
replication slot xmin value computed by a walsender for a tablesync
worker. Consequently, the tablesync worker computed a transaction ID
via GetOldestSafeDecodingTransactionId() effectively without
considering the replication slot xmin. This led to the error "cannot
build an initial slot snapshot as oldest safe xid %u follows
snapshot's xmin %u", which was an assertion failure prior to commit 240e0dbacd3.
To fix this, we acquire ReplicationSlotControlLock in exclusive mode
during slot creation to perform the initial update of the slot
xmin. In ReplicationSlotsComputeRequiredXmin(), we hold
ReplicationSlotControlLock in shared mode until the global slot xmin
is updated in ProcArraySetReplicationSlotXmin(). This prevents
concurrent computations and updates of the global xmin by other
backends during the initial slot xmin update process, while still
permitting concurrent calls to ReplicationSlotsComputeRequiredXmin().
Backpatch to all supported versions.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Pradeep Kumar <spradeepkumar29@gmail.com>
Reviewed-by: Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1L8wYcyTPxNzPGkhuO52WBGoOZbT0A73Le=ZUWYAYmdfw@mail.gmail.com
Backpatch-through: 14
Thomas Munro [Tue, 30 Dec 2025 01:11:37 +0000 (14:11 +1300)]
jit: Remove -Wno-deprecated-declarations in 18+.
REL_18_STABLE and master have commit ee485912, so they always use the
newer LLVM opaque pointer functions. Drop -Wno-deprecated-declarations
(commit a56e7b660) for code under jit/llvm in those branches, to catch
any new deprecation warnings that arrive in future versions of LLVM.
Older branches continued to use functions marked deprecated in LLVM 14
and 15 (ie switched to the newer functions only for LLVM 16+), as a
precaution against unforeseen compatibility problems with bitcode
already shipped. In those branches, the comment about warning
suppression is updated to explain that situation better. In theory we
could suppress warnings only for LLVM 14 and 15 specifically, but that
isn't done here.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 14
Discussion: https://postgr.es/m/1407185.1766682319%40sss.pgh.pa.us
Michael Paquier [Sat, 27 Dec 2025 08:23:58 +0000 (17:23 +0900)]
Fix pg_stat_get_backend_activity() to use multi-byte truncated result
pg_stat_get_backend_activity() calls pgstat_clip_activity() to ensure
that the reported query string is correctly truncated when it finishes
with an incomplete multi-byte sequence. However, the result returned by
the function was not what pgstat_clip_activity() generated, but the
non-truncated, original, contents from PgBackendStatus.st_activity_raw.
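The fix is essentially (a sketch; variable names assumed):

    clipped_activity = pgstat_clip_activity(activity);
    ret = cstring_to_text(clipped_activity);    /* not the raw buffer */
    pfree(clipped_activity);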
Oversight in 54b6cd589ac2, so backpatch all the way down.
Author: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAEoWx2mDzwc48q2EK9tSXS6iJMJ35wvxNQnHX+rXjy5VgLvJQw@mail.gmail.com
Backpatch-through: 14
Fix bug in following update chain when locking a heap tuple
After waiting for a concurrent updater to finish, heap_lock_tuple()
followed the update chain to lock all tuple versions. However, when
stepping from the initial tuple to the next one, it failed to check
that the next tuple's XMIN matches the initial tuple's XMAX. That's an
important check whenever following an update chain, and the recursive
part that follows the chain did it, but the initial step missed it.
Without the check, if the updating transaction aborts, the updated
tuple is vacuumed away and replaced by an unrelated tuple, the
unrelated tuple might get incorrectly locked.
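The missing check looks roughly like this (a sketch; the real code in
heap_lock_tuple() differs in detail):

    /* before following t_ctid to the next tuple version */
    if (!TransactionIdEquals(HeapTupleHeaderGetXmin(next->t_data),
                             HeapTupleHeaderGetUpdateXid(tuple->t_data)))
        break;  /* chain is broken: updater aborted, slot was reused */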
Michael Paquier [Tue, 23 Dec 2025 05:32:25 +0000 (14:32 +0900)]
Fix orphaned origin in shared memory after DROP SUBSCRIPTION
Since ce0fdbfe9722, a replication slot and an origin are created by each
tablesync worker, whose information is stored in both a catalog and
shared memory (once the origin is set up in the latter case). The
transaction where the origin is created is the same as the one that runs
the initial COPY, with the catalog state of the origin becoming visible
for other sessions only once the COPY transaction has committed. The
catalog state is coupled with a state in shared memory, initialized at
the same time as the origin created in the catalogs. Note that the
transaction doing the initial data sync can take a long time, time that
depends on the amount of data to transfer from a publication node to its
subscriber node.
Now, when a DROP SUBSCRIPTION is executed, all its workers are stopped
with the origins removed. The removal of each origin relies on a
catalog lookup. A worker still running the initial COPY would fail its
transaction, with the catalog state of the origin rolled back while the
shared memory state remains around. The session running the DROP
SUBSCRIPTION should be in charge of cleaning up the catalog and the
shared memory state, but as there is no data in the catalogs the shared
memory state is not removed. This issue would leave orphaned origin
data in shared memory, leading to a confusing state as it would still
show up in pg_replication_origin_status. Note that this shared memory
data is sticky, being flushed to disk in replorigin_checkpoint at
checkpoint. This prevents other origins from reusing a slot position
in the shared memory data.
To address this problem, the commit moves the creation of the origin at
the end of the transaction that precedes the one executing the initial
COPY, making the origin immediately visible in the catalogs for other
sessions, giving DROP SUBSCRIPTION a way to know about it. A different
solution would have been to clean up the shared memory state using an
abort callback within the tablesync worker. The solution of this commit
is more consistent with the apply worker that creates an origin in a
short transaction.
A test is added in the subscription test 004_sync.pl, which was able to
display the problem. The test fails when this commit is reverted.
Thomas Munro [Thu, 5 Dec 2024 23:34:33 +0000 (12:34 +1300)]
Fix printf format string warning on MinGW.
This is a back-patch of 1319997d to branches 14-17 to fix an old warning
about a printf type mismatch on MinGW, in anticipation of a potential
expansion of the scope of CI's CompilerWarnings checks. Though CI began
in 15, BF animal fairywren also shows the warning in 14, so we might as
well fix that too.
Original commit message (except for new "Backpatch-through" tag):
Commit 517bf2d91 changed a printf format string to placate MinGW, which
at the time warned about "%lld". Current MinGW is now warning about the
replacement "%I64d". Reverting the change clears the warning on the
MinGW CI task, and hopefully it will clear it on build farm animal
fairywren too.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>
Backpatch-through: 14-17
Discussion: https://postgr.es/m/TYAPR01MB5866A71B744BE01B3BF71791F5AEA%40TYAPR01MB5866.jpnprd01.prod.outlook.com
Fujii Masao [Fri, 19 Dec 2025 03:08:20 +0000 (12:08 +0900)]
Add guard to prevent recursive memory context logging.
Previously, if memory context logging was triggered repeatedly and
rapidly while a previous request was still being processed, it could
result in recursive calls to ProcessLogMemoryContextInterrupt().
This could lead to infinite recursion and potentially crash the process.
This commit adds a guard to prevent such recursion.
If ProcessLogMemoryContextInterrupt() is already in progress and
logging memory contexts, subsequent calls will exit immediately,
avoiding unintended recursive calls.
While this scenario is unlikely in practice, it's not impossible.
This change adds a safety check to prevent such failures.
Back-patch to v14, where memory context logging was introduced.
Operations on unlogged relations should not be WAL-logged. The
brin_initialize_empty_new_buffer() function didn't get the memo.
The function is only called when a concurrent update to a brin page
uses up space that we're just about to insert to, which makes it
pretty hard to hit. If you do manage to hit it, a full-page WAL record
is erroneously emitted for the unlogged index. If you then crash,
crash recovery will fail on that record with an error like this:
FATAL: could not create file "base/5/32819": File exists
Noah Misch [Wed, 17 Dec 2025 00:13:54 +0000 (16:13 -0800)]
Assert lack of hazardous buffer locks before possible catalog read.
Commit 0bada39c83a150079567a6e97b1a25a198f30ea3 fixed a bug of this kind,
which existed in all branches for six days before detection. While the
probability of reaching the trouble was low, the disruption was extreme. No
new backends could start, and service restoration needed an immediate
shutdown. Hence, add this to catch the next bug like it.
The new check in RelationIdGetRelation() suffices to make autovacuum detect
the bug in commit 243e9b40f1b2dd09d6e5bf91ebf6e822a2cd3704 that led to
commit 0bada39. This commit also adds checks in a number of similar
places, replacing each Assert(IsTransactionState()) that pertained to
a conditional catalog read.
Back-patch to v14 - v17. This is a back-patch of commit
f4ece891fc2f3f96f0571720a1ae30db8030681b (from before v18 branched) to
all supported branches, to accompany the back-patch of commits 243e9b4
and 0bada39. For catalog indexes, the bttextcmp() behavior that
motivated IsCatalogTextUniqueIndexOid() was v18-specific. Hence, this
back-patch doesn't need that or its correction from commit 4a4ee0c2c1e53401924101945ac3d517c0a8a559.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/20250410191830.0e.nmisch@google.com
Discussion: https://postgr.es/m/10ec0bc3-5933-1189-6bb8-5dec4114558e@gmail.com
Backpatch-through: 14-17
Noah Misch [Wed, 17 Dec 2025 00:13:54 +0000 (16:13 -0800)]
WAL-log inplace update before revealing it to other sessions.
A buffer lock won't stop a reader having already checked tuple
visibility. If a vac_update_datfrozenid() and then a crash happened
during inplace update of a relfrozenxid value, datfrozenxid could
overtake relfrozenxid. That could lead to "could not access status of
transaction" errors.
Back-patch to v14 - v17. This is a back-patch of commits:
Noah Misch [Wed, 17 Dec 2025 00:13:54 +0000 (16:13 -0800)]
For inplace update, send nontransactional invalidations.
The inplace update survives ROLLBACK. The inval didn't, so another
backend's DDL could then update the row without incorporating the
inplace update. In the test this fixes, a mix of CREATE INDEX and ALTER
TABLE resulted in a table with an index, yet relhasindex=f. That is a
source of index corruption.
Back-patch to v14 - v17. This is a back-patch of commits:
This back-patch omits the non-comment heap_decode() changes. I find
those changes removed harmless code that was last necessary in v13. See
discussion thread for details. The back branches aren't the place to
remove such code.
Like the original back-patch, this doesn't change WAL, because these
branches use end-of-recovery SIResetAll(). All branches change the ABI
of extern function PrepareToInvalidateCacheTuple(). No PGXN extension
calls that, and there's no apparent use case in extensions. Expect
".abi-compliance-history" edits to follow.
Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Surya Poondla <s_poondla@apple.com>
Reviewed-by: Ilyasov Ian <ianilyasov@outlook.com>
Reviewed-by: Nitin Motiani <nitinmotiani@google.com> (in earlier versions)
Reviewed-by: Andres Freund <andres@anarazel.de> (in earlier versions)
Discussion: https://postgr.es/m/20240523000548.58.nmisch@google.com
Backpatch-through: 14-17
Jeff Davis [Tue, 16 Dec 2025 18:35:40 +0000 (10:35 -0800)]
Fix multibyte issue in ltree_strncasecmp().
Previously, the API for ltree_strncasecmp() took two inputs but only
one length (that of the smaller input). It truncated the larger input
to that length, but that could break a multibyte sequence.
Change the API to be a check for prefix equality (possibly
case-insensitive) instead, which is all that's needed by the
callers. Also, provide the lengths of both inputs.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/5f65b85740197ba6249ea507cddf609f84a6188b.camel%40j-davis.com
Backpatch-through: 14
Robert Haas [Tue, 16 Dec 2025 15:40:53 +0000 (10:40 -0500)]
Switch memory contexts in ReinitializeParallelDSM.
We already do this in CreateParallelContext, InitializeParallelDSM, and
LaunchParallelWorkers. I suspect the reason why the matching logic was
omitted from ReinitializeParallelDSM is that I failed to realize that
any memory allocation was happening here -- but shm_mq_attach does
allocate, which could result in a shm_mq_handle being allocated in a
shorter-lived context than the ParallelContext which points to it.
That could result in a crash if the shorter-lived context is freed
before the parallel context is destroyed. As far as I am currently
aware, there is no way to reach a crash using only code that is
present in core PostgreSQL, but extensions could potentially trip
over this. Fixing this in the back-branches appears low-risk, so
back-patch to all supported versions.
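A sketch of the fix, matching what the sibling functions already do:

    void
    ReinitializeParallelDSM(ParallelContext *pcxt)
    {
        MemoryContext oldcontext;

        /* shm_mq_attach() allocates; keep that in the long-lived context */
        oldcontext = MemoryContextSwitchTo(TopTransactionContext);

        /* ... reinitialization work, including shm_mq_attach() ... */

        MemoryContextSwitchTo(oldcontext);
    }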
Michael Paquier [Tue, 16 Dec 2025 04:29:44 +0000 (13:29 +0900)]
Fail recovery when missing redo checkpoint record without backup_label
This commit adds an extra check at the beginning of recovery to ensure
that the redo record of a checkpoint exists before attempting WAL
replay, logging a PANIC if the redo record referenced by the checkpoint
record could not be found. This is the same level of failure as when a
checkpoint record is missing. This check is added when a cluster is
started without a backup_label, after retrieving its checkpoint record.
The redo LSN used for the check is retrieved from the checkpoint record
successfully read.
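Conceptually, the added check looks like this (a sketch only;
read_redo_record() is a hypothetical stand-in for the branch-specific
record-reading call):

    if (checkPoint.redo != CheckPointLoc)
    {
        if (read_redo_record(xlogreader, checkPoint.redo) == NULL)
            ereport(PANIC,
                    (errmsg("could not find redo location referenced by checkpoint record")));
    }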
In the case where a backup_label exists, the startup process already
fails if the redo record cannot be found after reading a checkpoint
record at the beginning of recovery.
Previously, the presence of the redo record was not checked. If the
redo and checkpoint records were located on different WAL segments, it
would be possible to miss an entire range of WAL records that should
have been replayed but were just ignored. The consequences of missing
the redo record depend on the version, becoming worse the older the
version used:
- On HEAD, v18 and v17, recovery fails with a pointer dereference at the
beginning of the redo loop, as the redo record is expected but cannot be
found. These versions are good students, because we detect a failure
before doing anything, even if the failure is a misleading
segmentation fault that gives no hint that the redo record is missing.
- In v16 and v15, problems show at the end of recovery within
FinishWalRecovery(), the startup process using a buggy LSN to decide
from where to start writing WAL. The cluster gets corrupted, but at
least it is noisy about it.
- v14 and older versions are worse: a cluster gets corrupted but it is
entirely silent about the matter. The missing redo record causes the
startup process to skip recovery entirely, because a missing record is
the same as no redo being required at all. This leads to data loss, as
everything is missed between the redo record and the checkpoint record.
Note that I have tested that down to 9.4, reproducing the issue with a
version of the author's reproducer slightly modified. The code is wrong
since at least 9.2, but I did not look at the exact point of origin.
This problem has been found by debugging a cluster where the WAL segment
including the redo segment was missing due to an operator error, leading
to a crash, based on an investigation in v15.
Requesting archive recovery with the creation of a recovery.signal or
a standby.signal even without a backup_label would mitigate the issue:
if the record cannot be found in pg_wal/, the missing segment can be
retrieved with a restore_command when checking that the redo record
exists. This was already the case without this commit, where recovery
would re-fetch the WAL segment that includes the redo record. The
check introduced by this commit causes the segment to be retrieved
earlier, to make sure that the redo record can be found.
On HEAD, the code will be slightly changed in a follow-up commit to not
rely on a PANIC, to include a test able to emulate the original problem.
This is a minimal backpatchable fix, kept separated for clarity.
Reported-by: Andres Freund <andres@anarazel.de>
Analyzed-by: Andres Freund <andres@anarazel.de>
Author: Nitin Jadhav <nitinjadhavpostgres@gmail.com>
Discussion: https://postgr.es/m/20231023232145.cmqe73stvivsmlhs@awork3.anarazel.de
Discussion: https://postgr.es/m/CAMm1aWaaJi2w49c0RiaDBfhdCL6ztbr9m=daGqiOuVdizYWYaA@mail.gmail.com
Backpatch-through: 14
Clarify comment on multixid offset wraparound check
Coverity complained that offset cannot be 0 here because there's an
explicit check for "offset == 0" earlier in the function, but it
didn't see the possibility that offset could've wrapped around to 0.
The code is correct, but clarify the comment about it.
The same code exists in backbranches in the server
GetMultiXactIdMembers() function and in 'master' in the pg_upgrade
GetOldMultiXactIdSingleMember function. In backbranches Coverity
didn't complain about it because the check was merely an assertion,
but change the comment in all supported branches for consistency.
Andres Freund [Wed, 5 Oct 2022 17:43:13 +0000 (10:43 -0700)]
tests: Rename conflicting role names
These cause problems when running installcheck-world USE_MODULE_DB=1 with -j.
Note: This patch has been originally applied to v16 and newer branches
as of 6a20b04f0408. Sami Imseih has reported this issue as being
possible to reach for other branches still supported: REL_15_STABLE and
REL_14_STABLE.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20221003234111.4ob7yph6r4g4ywhu@awork3.anarazel.de
Discussion: https://postgr.es/m/CAA5RZ0soPHUvwXLZ+mf5Q=uJRL5ggnRg1uB0qKVG1bbdA9up6w@mail.gmail.com
Backpatch-through: 14
Michael Paquier [Thu, 11 Dec 2025 01:25:51 +0000 (10:25 +0900)]
Fix allocation formula in llvmjit_expr.c
An array of LLVMBasicBlockRef was allocated using "LLVMBasicBlockRef *"
rather than "LLVMBasicBlockRef" as the element size. LLVMBasicBlockRef
is itself a pointer type, so this did not cause a problem in practice
because both have the same size, but it is still incorrect.
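Concretely (variable name assumed):

    /* before: element size taken from the wrong type */
    blocks = palloc(sizeof(LLVMBasicBlockRef *) * nblocks);
    /* after: size of the actual array element type */
    blocks = palloc(sizeof(LLVMBasicBlockRef) * nblocks);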
This issue has been spotted while reviewing a different patch, and
exists since 2a0faed9d702, so backpatch all the way down.
Dean Rasheed [Tue, 9 Dec 2025 10:49:19 +0000 (10:49 +0000)]
doc: Fix statement about ON CONFLICT and deferrable constraints.
The description of deferrable constraints in create_table.sgml states
that deferrable constraints cannot be used as conflict arbitrators in
an INSERT with an ON CONFLICT DO UPDATE clause, but in fact this
restriction applies to all ON CONFLICT clauses, not just those with DO
UPDATE. Fix this, and while at it, change the word "arbitrators" to
"arbiters", to match the terminology used elsewhere.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAEZATCWsybvZP3ce8rGcVNx-QHuDOJZDz8y=p1SzqHwjRXyV4Q@mail.gmail.com
Backpatch-through: 14
Fix setting next multixid's offset at offset wraparound
In commit 789d65364c, we started updating the next multixid's offset
too when recording a multixid, so that it can always be used to
calculate the number of members. I got it wrong at offset wraparound:
we need to skip over offset 0. Fix that.
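The skip looks roughly like this (a sketch; offset 0 doubles as the
"not written yet" marker):

    nextOffset = offset + nmembers;
    if (nextOffset == 0)    /* uint32 wraparound hit the reserved value */
        nextOffset = 1;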
Set next multixid's offset when creating a new multixid
With this commit, the next multixid's offset will always be set on the
offsets page, by the time that a backend might try to read it, so we
no longer need the waiting mechanism with the condition variable. In
other words, this eliminates "corner case 2" mentioned in the
comments.
The waiting mechanism was broken in a few scenarios:
- When nextMulti was advanced without WAL-logging the next
multixid. For example, if a later multixid was already assigned and
WAL-logged before the previous one was WAL-logged, and then the
server crashed. In that case the next offset would never be set in
the offsets SLRU, and a query trying to read it would get stuck
waiting for it. Same thing could happen if pg_resetwal was used to
forcibly advance nextMulti.
- In hot standby mode, a deadlock could happen where one backend waits
for the next multixid assignment record, but WAL replay is not
advancing because of a recovery conflict with the waiting backend.
The old TAP test used carefully placed injection points to exercise
the old waiting code, but now that the waiting code is gone, much of
the old test is no longer relevant. Rewrite the test to reproduce the
IPC/MultixactCreation hang after crash recovery instead, and to verify
that previously recorded multixids stay readable.
Backpatch to all supported versions. In back-branches, we still need
to be able to read WAL that was generated before this fix, so in the
back-branches this includes a hack to initialize the next offsets page
when replaying XLOG_MULTIXACT_CREATE_ID for the last multixid on a
page. On 'master', bump XLOG_PAGE_MAGIC instead to indicate that the
WAL is not compatible.
Fix amcheck's handling of incomplete root splits in B-tree
When the root page is being split, it's normal that the page the
metapage points to as the root is not yet marked BTP_ROOT. Fix the
bogus error that amcheck raised about that case.
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/abd65090-5336-42cc-b768-2bdd66738404@iki.fi
Backpatch-through: 14
Dean Rasheed [Sat, 29 Nov 2025 12:34:45 +0000 (12:34 +0000)]
Avoid rewriting data-modifying CTEs more than once.
Formerly, when updating an auto-updatable view, or a relation with
rules, if the original query had any data-modifying CTEs, the rewriter
would rewrite those CTEs multiple times as RewriteQuery() recursed
into the product queries. In most cases that was harmless, because
RewriteQuery() is mostly idempotent. However, if the CTE involved
updating an always-generated column, it would trigger an error because
any subsequent rewrite would appear to be attempting to assign a
non-default value to the always-generated column.
This could perhaps be fixed by attempting to make RewriteQuery() fully
idempotent, but that looks quite tricky to achieve, and would probably
be quite fragile, given that more generated-column-type features might
be added in the future.
Instead, fix by arranging for RewriteQuery() to rewrite each CTE
exactly once (by tracking the number of CTEs already rewritten as it
recurses). This has the advantage of being simpler and more efficient,
but it does make RewriteQuery() dependent on the order in which
rewriteRuleAction() joins the CTE lists from the original query and
the rule action, so care must be taken if that is ever changed.
Reported-by: Bernice Southey <bernice.southey@gmail.com>
Author: Bernice Southey <bernice.southey@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/CAEDh4nyD6MSH9bROhsOsuTqGAv_QceU_GDvN9WcHLtZTCYM1kA@mail.gmail.com
Backpatch-through: 14
Tom Lane [Thu, 27 Nov 2025 18:09:59 +0000 (13:09 -0500)]
Allow indexscans on partial hash indexes with implied quals.
Normally, if a WHERE clause is implied by the predicate of a partial
index, we drop that clause from the set of quals used with the index,
since it's redundant to test it if we're scanning that index.
However, if it's a hash index (or any !amoptionalkey index), this
could result in dropping all available quals for the index's first
key, preventing us from generating an indexscan.
It's fair to question the practical usefulness of this case. Since
hash only supports equality quals, the situation could only arise
if the index's predicate is "WHERE indexkey = constant", implying
that the index contains only one hash value, which would make hash
a really poor choice of index type. However, perhaps there are
other !amoptionalkey index AMs out there with which such cases are
more plausible.
To fix, just don't filter the candidate indexquals this way if
the index is !amoptionalkey. That's a bit hokey because it may
result in testing quals we didn't need to test, but to do it
more accurately we'd have to redundantly identify which candidate
quals are actually usable with the index, something we don't know
at this early stage of planning. Doesn't seem worth the effort.
Reported-by: Sergei Glukhov <s.glukhov@postgrespro.ru>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/e200bf38-6b45-446a-83fd-48617211feff@postgrespro.ru
Backpatch-through: 14
doc: Clarify passphrase command reloading on Windows
When running on Windows (or EXEC_BACKEND) the SSL configuration will
be reloaded on each backend start, so the passphrase command will be
reloaded along with it. This implies that passphrase command reload
must be enabled on Windows for connections to work at all. Document
this since it wasn't mentioned explicitly, and while there, add markup
for the parameter value to match the rest of the docs.
Backpatch to all supported versions.
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/5F301096-921A-427D-8EC1-EBAEC2A35082@yesql.se
Backpatch-through: 14
Andres Freund [Mon, 24 Nov 2025 22:37:09 +0000 (17:37 -0500)]
lwlock: Fix, currently harmless, bug in LWLockWakeup()
Accidentally the code in LWLockWakeup() checked the list of to-be-woken up
processes to see if LW_FLAG_HAS_WAITERS should be unset. That means that
HAS_WAITERS would not get unset immediately, but only during the next,
unnecessary, call to LWLockWakeup().
Luckily, as the code stands, this is just a small efficiency issue.
However, if there were (as in a patch of mine) a case in which LWLockWakeup()
would not find any backend to wake, despite the wait list not being empty,
we'd wrongly unset LW_FLAG_HAS_WAITERS, leading to potentially hanging.
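The fix is essentially a one-line change of which list gets inspected
(a sketch):

    /* decide based on the waiters that remain queued, not the ones woken */
    if (proclist_is_empty(&lock->waiters))
        desired_state &= ~LW_FLAG_HAS_WAITERS;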
While the consequences in the backbranches are limited, the code as-is
is confusing, and it is possible that there are workloads where the
additional wait list lock acquisitions hurt, therefore backpatch.
David Rowley [Mon, 24 Nov 2025 04:02:24 +0000 (17:02 +1300)]
Fix incorrect IndexOptInfo header comment
The comment incorrectly indicated that indexcollations[] stored
collations for both key columns and INCLUDE columns, but in reality it
only has elements for the key columns. canreturn[] didn't get a mention,
so add that while we're here.
Thomas Munro [Sat, 22 Nov 2025 07:51:16 +0000 (20:51 +1300)]
jit: Adjust AArch64-only code for LLVM 21.
LLVM 21 changed the arguments of RTDyldObjectLinkingLayer's
constructor, breaking compilation with the backported
SectionMemoryManager from commit 9044fc1d.
Print new OldestXID value in pg_resetwal when it's being changed
Commit 74cf7d46a91d added the --oldest-transaction-id option to
pg_resetwal, but forgot to update the code that prints all the new
values that are being set. Fix that.
Tom Lane [Tue, 18 Nov 2025 17:56:55 +0000 (12:56 -0500)]
Don't allow CTEs to determine semantic levels of aggregates.
The fix for bug #19055 (commit b0cc0a71e) allowed CTE references in
sub-selects within aggregate functions to affect the semantic levels
assigned to such aggregates. It turns out this broke some related
cases, leading to assertion failures or strange planner errors such
as "unexpected outer reference in CTE query". After experimenting
with some alternative rules for assigning the semantic level in
such cases, we've come to the conclusion that changing the level
is more likely to break things than be helpful.
Therefore, this patch undoes what b0cc0a71e changed, and instead
installs logic to throw an error if there is any reference to a
CTE that's below the semantic level that standard SQL rules would
assign to the aggregate based on its contained Var and Aggref nodes.
(The SQL standard disallows sub-selects within aggregate functions,
so it can't reach the troublesome case and hence has no rule for
what to do.)
Perhaps someone will come along with a legitimate query that this
logic rejects, and if so probably the example will help us craft
a level-adjustment rule that works better than what b0cc0a71e did.
I'm not holding my breath for that though, because the previous
logic had been there for a very long time before bug #19055 without
complaints, and that bug report sure looks to have originated from
fuzzing not from real usage.
Like b0cc0a71e, back-patch to all supported branches, though
sadly that no longer includes v13.
Bug: #19106
Reported-by: Kamil Monicz <kamil@monicz.dev>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19106-9dd3668a0734cd72@postgresql.org
Backpatch-through: 14
* Remove the ancient test for __hurd__, which was intended to activate
PS_USE_CHANGE_ARGV in these branches, but that macro turns out not to
be defined on modern systems.
* Add a new test for __GNU__ to activate PS_USE_CLOBBER_ARGV, like the
newer branches.
Author: Michael Banck <mbanck@debian.org>
Discussion: https://postgr.es/m/CA%2BhUKGJMNGUAqf27WbckYFrM-Mavy0RKJvocfJU%3DJ2XcAZyv%2Bw%40mail.gmail.com
Backpatch-through: 14-15
Nathan Bossart [Mon, 17 Nov 2025 20:14:41 +0000 (14:14 -0600)]
Update .abi-compliance-history for change to CreateStatistics().
As noted in the commit message for 5e4fcbe531, the addition of a
second parameter to CreateStatistics() breaks ABI compatibility,
but we are unaware of any impacted third-party code. This commit
updates .abi-compliance-history accordingly.
Nathan Bossart [Fri, 14 Nov 2025 19:20:09 +0000 (13:20 -0600)]
Add note about CreateStatistics()'s selective use of check_rights.
Commit 5e4fcbe531 added a check_rights parameter to this function
for use by ALTER TABLE commands that re-create statistics objects.
However, we intentionally ignore check_rights when verifying
relation ownership because this function's lookup could return a
different answer than the caller's. This commit adds a note to
this effect so that we remember it down the road.
Dean Rasheed [Thu, 13 Nov 2025 12:05:17 +0000 (12:05 +0000)]
doc: Improve description of RLS policies applied by command type.
On the CREATE POLICY page, the "Policies Applied by Command Type"
table was missing MERGE ... THEN DELETE and some of the policies
applied during INSERT ... ON CONFLICT and MERGE. Fix that, and try to
improve readability by listing the various MERGE cases separately,
rather than together with INSERT/UPDATE/DELETE. Mention COPY ... TO
along with SELECT, since it behaves in the same way. In addition,
document which policy violations cause errors to be thrown, and which
just cause rows to be silently ignored.
Also, a paragraph above the table states that INSERT ... ON CONFLICT
DO UPDATE only checks the WITH CHECK expressions of INSERT policies
for rows appended to the relation by the INSERT path, which is
incorrect -- all rows proposed for insertion are checked, regardless
of whether they end up being inserted. Fix that, and also mention that
the same applies to INSERT ... ON CONFLICT DO NOTHING.
In addition, in various other places on that page, clarify how the
different types of policy are applied to different commands, and
whether or not errors are thrown when policy checks do not pass.
Backpatch to all supported versions. Prior to v17, MERGE did not
support RETURNING, and so MERGE ... THEN INSERT would never check new
rows against SELECT policies. Prior to v15, MERGE was not supported at
all.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Viktor Holmberg <v@viktorh.net>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CAEZATCWqnfeChjK=n1V_dYZT4rt4mnq+ybf9c0qXDYTVMsy8pg@mail.gmail.com
Backpatch-through: 14
Clear 'xid' in dummy async notify entries written to fill up pages
Before we started to freeze async notify entries (commit 8eeb4a0f7c),
no one looked at the 'xid' on an entry with invalid 'dboid'. But now
we might actually need to freeze it later. Initialize them with
InvalidTransactionId to begin with, to avoid that work later.
Álvaro pointed this out in review of commit 8eeb4a0f7c, but I forgot
to include this change there.
Fix remaining race condition with CLOG truncation and LISTEN/NOTIFY
Previous commit fixed a bug where VACUUM would truncate the CLOG
that's still needed to check the commit status of XIDs in the async
notify queue, but as mentioned in the commit message, it wasn't a full
fix. If a backend is executing asyncQueueReadAllNotifications() and
has just made a local copy of an async SLRU page which contains old
XIDs, vacuum can concurrently truncate the CLOG covering those XIDs,
and the backend still gets an error when it calls
TransactionIdDidCommit() on those XIDs in the local copy. This commit
fixes that race condition.
To fix, hold the SLRU bank lock across the TransactionIdDidCommit()
calls in NOTIFY processing.
Per Tom Lane's idea. Backpatch to all supported versions.
Fix bug where we truncated CLOG that was still needed by LISTEN/NOTIFY
The async notification queue contains the XID of the sender, and when
processing notifications we call TransactionIdDidCommit() on the
XID. But we had no safeguards to prevent the CLOG segments containing
those XIDs from being truncated away. As a result, if a backend didn't
for some reason process its notifications for a long time, or when a
new backend issued LISTEN, you could get an error like:
test=# listen c21;
ERROR: 58P01: could not access status of transaction 14279685
DETAIL: Could not open file "pg_xact/000D": No such file or directory.
LOCATION: SlruReportIOError, slru.c:1087
To fix, make VACUUM "freeze" the XIDs in the async notification queue
before truncating the CLOG. Old XIDs are replaced with
FrozenTransactionId or InvalidTransactionId.
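The freezing step looks roughly like this, per queue entry (a sketch;
qe and cutoff_xid are assumed names):

    if (TransactionIdIsNormal(qe->xid) &&
        TransactionIdPrecedes(qe->xid, cutoff_xid))
    {
        if (TransactionIdDidCommit(qe->xid))
            qe->xid = FrozenTransactionId;
        else
            qe->xid = InvalidTransactionId;
    }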
Note: This commit is not a full fix. A race condition remains, where a
backend is executing asyncQueueReadAllNotifications() and has just
made a local copy of an async SLRU page which contains old XIDs, while
vacuum concurrently truncates the CLOG covering those XIDs. When the
backend then calls TransactionIdDidCommit() on those XIDs from the
local copy, you still get the error. The next commit will fix that
remaining race condition.
This was first reported by Sergey Zhuravlev in 2021, with many other
people hitting the same issue later. Thanks to:
- Alexandra Wang, Daniil Davydov, Andrei Varashen and Jacques Combrink
for investigating and providing reproducible test cases,
- Matheus Alcantara and Arseniy Mukhin for review and earlier proposed
patches to fix this,
- Álvaro Herrera and Masahiko Sawada for reviews,
- Yura Sokolov aka funny-falcon for the idea of marking transactions
as committed in the notification queue, and
- Joel Jacobson for the final patch version. I hope I didn't forget
anyone.
Backpatch to all supported versions. I believe the bug goes back all
the way to commit d1e027221d, which introduced the SLRU-based async
notification queue.
Escalate ERRORs during async notify processing to FATAL
Previously, if async notify processing encountered an error, we would
report the error to the client and advance our read position past the
offending entry to prevent trying to process it over and over
again. Trying to continue after an error has a few problems however:
- We have no way of telling the client that a notification was
lost. They get an ERROR, but that doesn't tell you much. As such,
it's not clear if keeping the connection alive after losing a
notification is a good thing. Depending on the application logic,
missing a notification could cause the application to get stuck
waiting, for example.
- If the connection is idle, PqCommReadingMsg is set and any ERROR is
turned into FATAL anyway.
- We bailed out of the notification processing loop on first error
without processing any subsequent notifications. The subsequent
notifications would not be processed until another notify interrupt
arrives. For example, if there were two notifications pending, and
processing the first one caused an ERROR, the second notification
would not be processed until someone sent a new NOTIFY.
This commit changes the behavior so that any ERROR while processing
async notifications is turned into FATAL, causing the client
connection to be terminated. That makes the behavior more consistent
as that's what happened in idle state already, and terminating the
connection is a clear signal to the application that it might've
missed some notifications.
The reason to do this now is that the next commits will change the
notification processing code in a way that would make it harder to
skip over just the offending notification entry on error.
doc: Document effects of ownership change on privileges
Explicitly document that privileges are transferred along with the
ownership. Backpatch to all supported versions since this behavior
has always been present.
Author: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Josef Šimánek <josef.simanek@gmail.com>
Reported-by: Gilles Parc <gparc@free.fr>
Discussion: https://postgr.es/m/2023185982.281851219.1646733038464.JavaMail.root@zimbra15-e2.priv.proxad.net
Backpatch-through: 14
The range for commit_siblings was incorrectly listed as starting at 1
instead of 0 in the sample configuration file. Backpatch down to all
supported branches.
Author: Man Zeng <zengman@halodbtech.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/tencent_53B70BA72303AE9C6889E78E@qq.com
Backpatch-through: 14
Nathan Bossart [Mon, 10 Nov 2025 15:00:00 +0000 (09:00 -0600)]
Check for CREATE privilege on the schema in CREATE STATISTICS.
This omission allowed table owners to create statistics in any
schema, potentially leading to unexpected naming conflicts. For
ALTER TABLE commands that require re-creating statistics objects,
skip this check in case the user has since lost CREATE on the
schema. The addition of a second parameter to CreateStatistics()
breaks ABI compatibility, but we are unaware of any impacted
third-party code.
Jacob Champion [Mon, 10 Nov 2025 14:03:05 +0000 (06:03 -0800)]
libpq: Prevent some overflows of int/size_t
Several functions could overflow their size calculations, when presented
with very large inputs from remote and/or untrusted locations, and then
allocate buffers that were too small to hold the intended contents.
Switch from int to size_t where appropriate, and check for overflow
conditions when the inputs could have plausibly originated outside of
the libpq trust boundary. (Overflows from within the trust boundary are
still possible, but these will be fixed separately.) A version of
add_size() is ported from the backend to assist with code that performs
more complicated concatenation.
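A sketch of the ported helper (libpq cannot ereport(), so the failure
convention shown here is an assumption):

    static size_t
    pqAddSize(size_t s1, size_t s2)
    {
        size_t  result = s1 + s2;

        /* on overflow, return a sentinel the caller checks for */
        if (result < s1)
            return SIZE_MAX;
        return result;
    }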
Thomas Munro [Fri, 7 Nov 2025 23:25:45 +0000 (12:25 +1300)]
Fix generic read and write barriers for Clang.
generic-gcc.h maps our read and write barriers to C11 acquire and
release fences using compiler builtins, for platforms where we don't
have our own hand-rolled assembler. This is apparently enough for GCC,
but the C11 memory model is only defined in terms of atomic accesses,
and our barriers for non-atomic, non-volatile accesses were not always
respected under Clang's stricter interpretation of the standard.
This explains the occasional breakage observed on the new RISC-V +
Clang buildfarm animal greenfly in lock-free PgAioHandle manipulation
code containing a repeating pattern of loads and read barriers. The
problem can also be observed in code generated for MIPS and LoongArch,
though we aren't currently testing those with Clang, and on x86,
though we use our own assembler there. The scariest aspect is that we
use the generic version on very common ARM systems, but it doesn't
seem to reorder the relevant code there (or we'd have debugged this
long ago).
Fix by inserting an explicit compiler barrier. It expands to an empty
assembler block declared to have memory side-effects, so registers are
flushed and reordering is prevented. In those respects this is like the
architecture-specific assembler versions, but the compiler is still in
charge of generating the appropriate fence instruction. Done for write
barriers on principle, though concrete problems have only been observed
with read barriers.
Reported-by: Alexander Lakhin <exclusion@gmail.com> Tested-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/d79691be-22bd-457d-9d90-18033b78c40a%40gmail.com
Backpatch-through: 13
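In outline, the fix pairs the C11 fence with an explicit compiler
barrier; a simplified sketch of the generic-gcc.h approach (not the
verbatim header):

    /* Empty asm with a memory clobber: flushes registers and stops the
     * compiler from reordering across it, even for non-atomic accesses. */
    #define pg_compiler_barrier_impl()  __asm__ __volatile__("" ::: "memory")

    #define pg_read_barrier_impl() \
        do { __atomic_thread_fence(__ATOMIC_ACQUIRE); pg_compiler_barrier_impl(); } while (0)

    #define pg_write_barrier_impl() \
        do { __atomic_thread_fence(__ATOMIC_RELEASE); pg_compiler_barrier_impl(); } while (0)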
Fujii Masao [Fri, 7 Nov 2025 05:58:53 +0000 (14:58 +0900)]
doc: Fix descriptions of some PGC_POSTMASTER parameters.
The following parameters can only be set at server start because
their context is PGC_POSTMASTER, but this information was missing
or incorrectly documented. This commit adds or corrects
that information for the following parameters:
Álvaro Herrera [Thu, 6 Nov 2025 18:08:29 +0000 (19:08 +0100)]
Introduce XLogRecPtrIsValid()
XLogRecPtrIsInvalid() is inconsistent with the affirmative form of
macros used for other datatypes, and leads to awkward double negatives
in a few places. This commit introduces XLogRecPtrIsValid(), which
allows code to be written more naturally.
This patch only adds the new macro. XLogRecPtrIsInvalid() is left in
place, and all existing callers remain untouched. This means all
supported branches can accept hypothetical bug fixes that use the new
macro, and at the same time any code that compiled with the original
formulation will continue to silently compile just fine.
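The new macro is simply the affirmative counterpart of the existing
one, along these lines (the call in the usage example is illustrative):

    #define XLogRecPtrIsValid(r)    ((r) != InvalidXLogRecPtr)

    /* allowing, e.g. */
    if (XLogRecPtrIsValid(replayPtr))   /* rather than !XLogRecPtrIsInvalid(replayPtr) */
        ApplyOneRecord(replayPtr);      /* illustrative call */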
Etsuro Fujita [Thu, 6 Nov 2025 03:25:04 +0000 (12:25 +0900)]
Update obsolete comment in ExecScanReScan().
Commit 27cc7cd2b removed the epqScanDone flag from the EState struct,
and instead added an equivalent flag named relsubs_done to the EPQState
struct; but it failed to update this comment.
Etsuro Fujita [Thu, 6 Nov 2025 03:15:05 +0000 (12:15 +0900)]
postgres_fdw: Add more test coverage for EvalPlanQual testing.
postgres_fdw supports EvalPlanQual testing by using the infrastructure
provided by the core with the RecheckForeignScan callback routine (cf.
commits 5fc4c26db and 385f337c9), but there has been no test coverage
for that, except that recent commit 12609fbac, which fixed an issue in
commit 385f337c9, added a test case to exercise only a code path added
by that commit to the core infrastructure. So let's add test cases to
exercise the other code paths as well.
Like commit 12609fbac, back-patch to all supported branches.
Michael Paquier [Wed, 5 Nov 2025 07:48:29 +0000 (16:48 +0900)]
Fix timing-dependent failure in recovery test 004_timeline_switch
The test introduced by 17b2d5ec759c verifies that a WAL receiver
survives across a timeline jump by searching the server logs for
termination messages. However, it called restart() before the timeline
switch, which kills the WAL receiver and may log the exact message being
checked, hence failing the test. As TAP tests reuse the same log file
across restarts, a rotate_logfile() is done before the restart so that
the log matching check is not affected by log entries generated by a
previous shutdown.
Recent changes to file handle inheritance altered I/O timing enough to
make this fail consistently while testing another patch.
While at it, this adds an extra check based on a PID comparison. This
test may lead to false positives, as it could be possible that the WAL
receiver has processed a timeline jump before the initial PID is
grabbed, but it should be good enough in most cases.
Andres Freund [Tue, 4 Nov 2025 23:36:18 +0000 (18:36 -0500)]
jit: Fix accidentally-harmless type confusion
In 2a0faed9d702, which added JIT compilation support for expressions, I
accidentally used sizeof(LLVMBasicBlockRef *) instead of
sizeof(LLVMBasicBlockRef) as part of computing the size of an allocation. That
turns out to have no real negative consequences due to LLVMBasicBlockRef being
a pointer itself (and thus having the same size). It still is wrong and
confusing, so fix it.
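The fix is a one-word change; in outline (surrounding context
illustrative):

    /* Allocate one LLVMBasicBlockRef per expression step; the element size
     * was previously written as sizeof(LLVMBasicBlockRef *), which happens
     * to be equal since LLVMBasicBlockRef is itself a pointer. */
    opblocks = palloc(sizeof(LLVMBasicBlockRef) * state->steps_len);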
Álvaro Herrera [Tue, 4 Nov 2025 19:31:43 +0000 (20:31 +0100)]
Fix snapshot handling bug in recent BRIN fix
Commit a95e3d84c0e0 added an ActiveSnapshot push+pop when processing
work-items (BRIN autosummarization), but forgot to handle the case of
a transaction failing during the run, which pops the snapshot
prematurely. Fix by making the pop conditional on an element actually
being there.
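The fix boils down to a guarded pop; a sketch:

    /* A failed transaction may already have popped the snapshot, so only
     * pop if the active-snapshot stack is non-empty. */
    if (ActiveSnapshotSet())
        PopActiveSnapshot();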
Andres Freund [Tue, 4 Nov 2025 18:24:58 +0000 (13:24 -0500)]
Backpatch: Fix warnings about declaration of environ on MinGW
Backpatch commit 7bc9a8bdd2d to 13-17. The motivation for backpatching is that
we want to update CI to Debian Trixie. Trixie contains a newer mingw
installation, which would trigger the warning addressed by 7bc9a8bdd2d. The
risk of backpatching seems fairly low, given that it did not cause
issues in the branches where the commit is already present.
While CI is not present in 13-14, it seems better to be consistent across
branches.
Author: Thomas Munro <tmunro@postgresql.org>
Discussion: https://postgr.es/m/o5yadhhmyjo53svzwvaocww6zkrp63i4f32cw3treuh46pxtza@hyqio5b2tkt6
Backpatch-through: 13
Tighten check for generated column in partition key expression
A generated column may end up being part of the partition key
expression if it's specified as an expression, e.g. "(<generated
column name>)", or if the partition key expression contains a
whole-row reference, even though we do not allow a generated column to
be part of the partition key expression. Fix this hole.
Álvaro Herrera [Tue, 4 Nov 2025 12:23:26 +0000 (13:23 +0100)]
BRIN autosummarization may need a snapshot
It's possible to define BRIN indexes on functions that require a
snapshot to run, but the autosummarization feature introduced by commit 7526e10224f0 fails to provide one. This causes autovacuum to leave a
BRIN placeholder tuple behind after a failed work-item execution, making
such indexes less efficient. Repair by obtaining a snapshot prior to
running the task, and add a test to verify this behavior.
Author: Álvaro Herrera <alvherre@kurilemu.de> Reported-by: Giovanni Fabris <giovanni.fabris@icon.it> Reported-by: Arthur Nascimento <tureba@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/202511031106.h4fwyuyui6fz@alvherre.pgsql
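The repair is to bracket the work-item execution with an active
snapshot; a minimal sketch (the work-item call name is illustrative):

    PushActiveSnapshot(GetTransactionSnapshot());
    perform_autosummarize_workitem(workitem);   /* illustrative name */
    PopActiveSnapshot();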
Michael Paquier [Tue, 4 Nov 2025 01:52:42 +0000 (10:52 +0900)]
Fix unconditional WAL receiver shutdown during stream-archive transition
Commit b4f584f9d2a1 (affecting v15~, later backpatched down to 13 as of 3635a0a35aaf) introduced an unconditional WAL receiver shutdown when
switching from streaming to archive WAL sources. This causes problems
during a timeline switch, when a WAL receiver enters WALRCV_WAITING
state but remains alive, waiting for instructions.
The unconditional shutdown can break some monitoring scenarios as the
WAL receiver gets repeatedly terminated and re-spawned, causing
pg_stat_wal_receiver.status to show "streaming" instead of "waiting",
masking the fact that the WAL receiver is waiting for a new TLI and a
new LSN before it can continue streaming.
This commit makes the shutdown conditional, while still always
resetting InstallXLogFileSegmentActive to prevent the regression fixed
by b4f584f9d2a1: only terminate the WAL receiver when it is actively
streaming (WALRCV_STREAMING, WALRCV_STARTING, or WALRCV_RESTARTING).
When in WALRCV_WAITING state, just reset the
InstallXLogFileSegmentActive flag to allow archive restoration without
killing the process. WALRCV_STOPPED and WALRCV_STOPPING are not
reachable states in this code path: for the latter, the startup
process is the one in charge of setting WALRCV_STOPPING via
ShutdownWalRcv(), and it waits for the WAL receiver to reach
WALRCV_STOPPED after switching walRcvState, so
WaitForWALToBecomeAvailable() cannot be reached while a WAL receiver
is in WALRCV_STOPPING state.
A regression test is added to check that a WAL receiver is not
stopped on a timeline jump; the test fails when the fix in this commit
is reverted.
Reported-by: Ryan Bird <ryanzxg@gmail.com>
Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/19093-c4fff49a608f82a0@postgresql.org
Backpatch-through: 13
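Schematically, the conditional shutdown looks like this (state names
are from walreceiver.h; the flag-reset helper's name is illustrative):

    switch (walrcv->walRcvState)
    {
        case WALRCV_STREAMING:
        case WALRCV_STARTING:
        case WALRCV_RESTARTING:
            XLogShutdownWalRcv();   /* actively streaming: terminate */
            break;
        case WALRCV_WAITING:
            /* keep the waiting process alive; only reset the flag below */
            break;
        default:
            /* WALRCV_STOPPED/WALRCV_STOPPING are unreachable here */
            break;
    }
    SetInstallXLogFileSegmentActive(false);     /* helper name illustrative */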
Tom Lane [Sun, 2 Nov 2025 17:30:44 +0000 (12:30 -0500)]
Avoid mixing void and integer in a conditional expression.
The C standard says that the second and third operands of a
conditional operator shall be both void type or both not-void
type. The Windows version of INTERRUPTS_PENDING_CONDITION()
got this wrong. It's pretty harmless because the result of
the operator is ignored anyway, but apparently recent versions
of MSVC have started issuing a warning about it. Silence the
warning by casting the dummy zero to void.
Reported-by: Christian Ullrich <chris@chrullrich.net>
Author: Bryan Green <dbryan.green@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/cc4ef8db-f8dc-4347-8a22-e7ebf44c0308@chrullrich.net
Backpatch-through: 13
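The rule in miniature (pgwin32_dispatch_queued_signals() returns void;
the macro context is omitted):

    extern void pgwin32_dispatch_queued_signals(void);
    bool        cond;

    cond ? pgwin32_dispatch_queued_signals() : (void) 0;  /* OK: both arms void */
    cond ? pgwin32_dispatch_queued_signals() : 0;         /* mismatched: void vs int */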
Bruce Momjian [Thu, 30 Oct 2025 23:11:52 +0000 (19:11 -0400)]
doc: rewrite random_page_cost description
This removes some of the specifics of how the default was set, and adds
a mention of latency as a reason the value is lower than the storage
hardware might suggest. It still mentions caching.
David Rowley [Thu, 30 Oct 2025 01:50:46 +0000 (14:50 +1300)]
Fix bogus use of "long" in AllocSetCheck()
Because long is 32-bit on 64-bit Windows, it isn't a good datatype
for storing the difference between two pointers. The under-sized type
overflow and lead to scary warnings in MEMORY_CONTEXT_CHECKING builds,
such as:
WARNING: problem in alloc set ExecutorState: bad single-chunk %p in block %p
However, the problem lies only in the code running the check, not in
an actual memory accounting bug.
Fix by using "Size" instead of "long". This means using an unsigned
type rather than the previous signed type. If the block's freeptr was
corrupted, we'd still catch that if the unsigned type wrapped. Unsigned
allows us to avoid further needless complexities around comparing signed
and unsigned types.
Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 13
Discussion: https://postgr.es/m/CAApHDvo-RmiT4s33J=aC9C_-wPZjOXQ232V-EZFgKftSsNRi4w@mail.gmail.com
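The shape of the change, sketched with illustrative field names:

    /* Pointer differences can exceed 32 bits on Win64, where long is only
     * 32 bits wide; Size is pointer-width and unsigned, so a corrupted
     * freeptr still shows up as an out-of-range (wrapped) value. */
    Size        blk_used = (char *) block->freeptr - (char *) block;    /* was: long */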
The project Git server hasn't supported cloning with the Git protocol
in a very long time, but the documentation never got the memo. Remove
the mention of using the Git protocol, and while there, wrap a
mention of Git in <productname> tags.
Backpatch down to all supported versions.
Author: Daniel Gustafsson <daniel@yesql.se> Reported-by: Gurjeet Singh <gurjeet@singh.im> Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Gurjeet Singh <gurjeet@singh.im> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CABwTF4WMiMb-KT2NRcib5W0C8TQF6URMb+HK9a_=rnZnY8Q42w@mail.gmail.com
Backpatch-through: 13
Tom Lane [Thu, 23 Oct 2025 16:32:06 +0000 (12:32 -0400)]
Fix off-by-one Asserts in FreePageBtreeInsertInternal/Leaf.
These two functions expect there to be room to insert another item
in the FreePageBtree's array, but their assertions were too weak
to guarantee that. This has little practical effect, given that the
callers are not buggy, but it seems to be misleading late-model
Coverity into complaining about possible array overruns.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/799984.1761150474@sss.pgh.pa.us
Backpatch-through: 13
Tom Lane [Thu, 23 Oct 2025 15:47:46 +0000 (11:47 -0400)]
Fix resource leaks in PL/Python error reporting, redux.
Commit c6f7f11d8 intended to prevent leaking any PyObject reference
counts in edge cases (such as out-of-memory during string
construction), but actually it introduced a leak in the normal case.
Repeating an error-trapping operation often enough would lead to
session-lifespan memory bloat. The problem is that I failed to
think about the fact that PyObject_GetAttrString() increments the
refcount of the returned PyObject, so that simply walking down the
list of error frame objects causes all but the first one to have
their refcount incremented.
I experimented with several more-or-less-complex ways around that,
and eventually concluded that the right fix is simply to drop the
newly-obtained refcount as soon as we walk to the next frame
object in PLy_traceback. This sounds unsafe, but it's perfectly
okay because the caller holds a refcount on the first frame object
and each frame object holds a refcount on the next one; so the
current frame object can't disappear underneath us.
By the same token, we can simplify the caller's cleanup back to
simply dropping its refcount on the first object. Cleanup of
each frame object will lead in turn to the refcount of the next
one going to zero.
I also added a couple of comments explaining why PLy_elog_impl()
doesn't try to free the strings acquired from PLy_get_spi_error_data()
or PLy_get_error_data(). That's because I got here by looking at a
Coverity complaint about how those strings might get leaked. They
are not leaked, but in testing that I discovered this other leak.
Back-patch, as c6f7f11d8 was. It's a bit nervous-making to be
putting such a fix into v13, which is only a couple weeks from its
final release; but I can't see that leaving a recently-introduced
leak in place is a better idea.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1203918.1761184159@sss.pgh.pa.us
Backpatch-through: 13
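The resulting traversal pattern, sketched (simplified from
PLy_traceback; "tb_next" is the standard traceback attribute, and tb
points at the first frame object, on which the caller holds a
reference):

    while (tb != NULL && tb != Py_None)
    {
        /* PyObject_GetAttrString() returns a new reference. */
        PyObject   *next = PyObject_GetAttrString(tb, "tb_next");

        /* ... format and report this frame ... */

        /* Drop our extra reference immediately: the caller pins the first
         * frame and each frame pins the next, so 'next' stays alive. */
        Py_XDECREF(next);
        tb = next;
    }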
Fujii Masao [Thu, 23 Oct 2025 04:24:56 +0000 (13:24 +0900)]
Add comments explaining overflow entries in the replication lag tracker.
Commit 883a95646a8 introduced overflow entries in the replication lag tracker
to fix an issue where lag columns in pg_stat_replication could stall when
the replay LSN stopped advancing.
This commit adds comments clarifying the purpose and behavior of overflow
entries to improve code readability and understanding.
Since commit 883a95646a8 was recently applied and backpatched to all
supported branches, this follow-up commit is also backpatched accordingly.
David Rowley [Thu, 23 Oct 2025 00:14:08 +0000 (13:14 +1300)]
Fix incorrect zero extension of Datum in JIT tuple deform code
When JIT deformed tuples (controlled via the jit_tuple_deforming GUC),
types narrower than sizeof(Datum) would be zero-extended up to Datum
width. This wasn't the same as what fetch_att() does in the standard
tuple deforming code. Logically the values are the same when fetched
via the DatumGet*() macros, but negative numbers are not the same in
binary form.
In the report, the problem was manifesting itself with:
ERROR: could not find memoization table entry
in a query which had a "Cache Mode: binary" Memoize node. However, it's
currently unclear what else is affected. Anything that uses
datum_image_eq() or datum_image_hash() on a Datum from a tuple deformed by
JIT could be affected, but it may not be limited to that.
The fix for this is simple: use signed extension instead of zero
extension.
Many thanks to Emmanuel Touzery for reporting this issue and providing
steps and backup which allowed the problem to easily be recreated.
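The difference is easy to see with a narrow negative value (plain C
illustration):

    #include <stdint.h>

    int16_t     v = -1;                           /* bit pattern 0xFFFF */
    uint64_t    zext = (uint64_t) (uint16_t) v;   /* 0x000000000000FFFF: zero-extended */
    uint64_t    sext = (uint64_t) (int64_t) v;    /* 0xFFFFFFFFFFFFFFFF: sign-extended */

    /* DatumGetInt16() recovers -1 from either form, but a binary comparison
     * such as datum_image_eq() sees two different Datums. */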
Fujii Masao [Wed, 22 Oct 2025 11:13:15 +0000 (20:13 +0900)]
Make invalid primary_slot_name follow standard GUC error reporting.
Previously, if primary_slot_name was set to an invalid slot name and
the configuration file was reloaded, both the postmaster and all other
backend processes reported a WARNING. With many processes running,
this could produce a flood of duplicate messages. The problem was that
the GUC check hook for primary_slot_name reported errors at WARNING
level via ereport().
This commit changes the check hook to use GUC_check_errdetail() and
GUC_check_errhint() for error reporting. As with other GUC parameters,
this causes non-postmaster processes to log the message at DEBUG3,
so by default, only the postmaster's message appears in the log file.
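The reporting pattern for a GUC check hook, sketched (the validation
helper here is hypothetical):

    static bool
    check_primary_slot_name(char **newval, void **extra, GucSource source)
    {
        if (*newval && **newval != '\0' && !slot_name_is_valid(*newval))
        {
            /* Attached to the GUC error report; non-postmaster processes
             * log this at DEBUG3 instead of emitting their own WARNING. */
            GUC_check_errdetail("\"%s\" is not a valid replication slot name.",
                                *newval);
            return false;
        }
        return true;
    }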
Fujii Masao [Wed, 22 Oct 2025 02:27:15 +0000 (11:27 +0900)]
Fix stalled lag columns in pg_stat_replication when replay LSN stops advancing.
Previously, when the replay LSN reported in feedback messages from a standby
stopped advancing, for example, due to a recovery conflict, the write_lag and
flush_lag columns in pg_stat_replication would initially update but then stop
progressing. This prevented users from correctly monitoring replication lag.
The problem occurred because when any LSN stopped updating, the lag tracker's
cyclic buffer became full (the write head reached the slowest read head).
In that state, the lag tracker could no longer compute round-trip lag values
correctly.
This commit fixes the issue by handling the slowest read entry (the one
causing the buffer to fill) as a separate overflow entry and freeing space
so the write and other read heads can continue advancing in the buffer.
As a result, write_lag and flush_lag now continue updating even if the reported
replay LSN remains stalled.
Nathan Bossart [Tue, 21 Oct 2025 21:37:29 +0000 (16:37 -0500)]
Add .abi-compliance-history to back-branches.
This file was previously added to v18 by commits a72f7d97be and 93fb76ca4e. Unlike the v18 version of the file, the back-branch
versions set the original baseline point to the most recent ABI
break documented in the git commit history. While we'd ordinarily
set it to something just before the .0 release, we're unlikely to
act upon ABI breaks in released minor versions, so it doesn't seem
worth the trouble to construct a comprehensive history.
David Rowley [Tue, 21 Oct 2025 07:48:34 +0000 (20:48 +1300)]
Fix BRIN 32-bit counter wrap issue with huge tables
A BlockNumber (32-bit) might not be large enough to hold the result
of adding bo_pagesPerRange when the table contains close to 2^32
pages. At worst, this could result in a cancellable infinite loop
during a BRIN index scan with a power-of-2 pagesPerRange, and in slow
(inefficient) BRIN index scans that visit unneeded heap blocks with a
non-power-of-2 pagesPerRange.
Backpatch to all supported versions.
Author: sunil s <sunilfeb26@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAOG6S4-tGksTQhVzJM19NzLYAHusXsK2HmADPZzGQcfZABsvpA@mail.gmail.com
Backpatch-through: 13
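A wrap-safe advance does the addition in 64 bits; a sketch of the
range-scan step (variable names illustrative):

    /* Compute the next range's start block without letting the 32-bit
     * BlockNumber wrap around. */
    uint64      nextBlk = (uint64) heapBlk + pagesPerRange;

    if (nextBlk > (uint64) MaxBlockNumber)
        break;                      /* ran off the end of the table */
    heapBlk = (BlockNumber) nextBlk;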
Michael Paquier [Mon, 20 Oct 2025 23:08:42 +0000 (08:08 +0900)]
Fix POSIX compliance in pgwin32_unsetenv() for "name" argument
pgwin32_unsetenv() (compatibility routine of unsetenv() on Windows)
lacks the input validation that its sibling pgwin32_setenv() has.
Without these checks, calling unsetenv() with incorrect names crashes
on WIN32, whereas invalid names should instead be rejected with
EINVAL. This commit adds the same checks as setenv(), failing with
EINVAL for a "name" that is NULL, an empty string, or contains '=',
per POSIX requirements.
Like 7ca37fb0406b, backpatch down to v14. pgwin32_unsetenv() is defined
on REL_13_STABLE, but with the branch going EOL soon and the lack of
setenv() there for WIN32, nothing is done for v13.
Author: Bryan Green <dbryan.green@gmail.com>
Discussion: https://postgr.es/m/b6a1e52b-d808-4df7-87f7-2ff48d15003e@gmail.com
Backpatch-through: 14
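The added validation mirrors pgwin32_setenv(); in outline:

    #include <errno.h>
    #include <string.h>

    int
    pgwin32_unsetenv(const char *name)
    {
        /* POSIX: reject NULL, empty, or '='-containing names with EINVAL. */
        if (name == NULL || name[0] == '\0' || strchr(name, '=') != NULL)
        {
            errno = EINVAL;
            return -1;
        }
        /* ... existing removal logic unchanged ... */
        return 0;
    }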
Tom Lane [Sun, 19 Oct 2025 18:36:58 +0000 (14:36 -0400)]
Don't rely on zlib's gzgetc() macro.
It emerges that zlib's configuration logic is not robust enough
to guarantee that the macro will have the same ideas about struct
field layout as the library itself does, leading to corruption of
zlib's state struct followed by unintelligible failure messages.
This hazard has existed for a long time, but we'd not noticed
for several reasons:
(1) We only use gzgetc() when trying to read a manually-compressed
TOC file within a directory-format dump, which is a rarely-used
scenario that we weren't even testing before 20ec99589.
(2) No corruption actually occurs unless sizeof(long) is different
from sizeof(off_t) and the platform is big-endian.
(3) Some platforms have already fixed the configuration instability,
at least sufficiently for their environments.
Despite (3), it seems foolish to assume that the problem isn't
going to be present in some environments for a long time to come.
Hence, avoid relying on this macro. We can just #undef it and
fall back on the underlying function of the same name.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2122679.1760846783@sss.pgh.pa.us
Backpatch-through: 13
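The workaround is mechanical; a sketch:

    #include <zlib.h>

    /* The gzgetc() macro peeks directly into the gzFile struct, whose layout
     * can disagree with the library's own build; use the function instead. */
    #ifdef gzgetc
    #undef gzgetc
    #endif

    /* Calls to gzgetc(gzf) below now resolve to zlib's exported function. */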
Álvaro Herrera [Sat, 18 Oct 2025 15:50:10 +0000 (17:50 +0200)]
Fix pg_dump sorting of foreign key constraints
Apparently, commit 04bc2c42f765 failed to notice that DO_FK_CONSTRAINT
objects require the same handling as DO_CONSTRAINT ones, which causes
some pg_upgrade tests in debug builds to fail spuriously. Add that.