git.ipfire.org Git - thirdparty/postgresql.git/log

Use term "referenced" rather than "dependent" in dependency locking

Reported-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://www.postgresql.org/message-id/20260528.114608.488039299811669368.horikyota.ntt@gmail.com
Backpatch-through: 14

Make stack depth check work with asan's use-after-return

With address sanitizer's stack-use-after-return check, stack variables are
moved to heap allocations, to allow to detect references to the memory at a
later time. That broke our stack-depth check, which is why we had to disable
detect_stack_use_after_return in CI. Luckily __builtin_frame_address() works
correctly, even under asan, so use that.

We started using __builtin_frame_address() with de447bb8e6fb, however as of
that commit we just used it for the stack base address, not for the value to
compare to the base address. Now we use it for both.

When building without __builtin_frame_address() support, we continue to use
stack variables for the stack depth determination.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2kk4z4odvuyrg7qlwjd7ft4eron4cle4btb33v4qatgsdkayir@gj6e62rgsel4
Backpatch-through: 14

Fix race between ProcSignalInit() and EmitProcSignalBarrier().

Previously, ProcSignalInit() read the global barrier generation before
publishing its PID into pss_pid. This created a race condition: a
process could initialize its local generation with an older global
value, while a concurrent EmitProcSignalBarrier() might skip that
process because its pss_pid was still zero. This resulted in
WaitForProcSignalBarrier() hanging indefinitely.

Fix this by publishing pss_pid before reading psh_barrierGeneration
with a memory barrier so that the store to pss_pid is ordered before
the load. A concurrent EmitProcSignalBarrier() then either observes
the published PID and signals this slot, or completes its generation
increment before we load it.

While this race has become more visible due to recent features using
signal barriers in more places (such as online wal_level changes), the
issue is theoretically present since signal barriers were introduced
to release smgr caches (e.g., in DROP DATABASE). v14 has the
procsiangl barrier infrastricutre but no in-tree caller that actually
emits a barrier, so the case is unreachable there.

This issue was also reported by buildfarm member flaviventris.

Reported-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAEze2WgAJmWReDN7Chtba8Er2YBvKCoa0KVN25-1evnTrHsLyA@mail.gmail.com
Backpatch-through: 15

Avoid orphaned objects dependencies

Concurrent DDL can leave behind objects referencing other objects that
no longer exist. This can happen if an object is dropped, while a new
object that depends on it is created concurrently. For example:

session 1: BEGIN; CREATE FUNCTION myschema.myfunc() ...;
session 2: DROP SCHEMA myschema;
session 1: COMMIT;

DROP SCHEMA does check that there are no objects dependending on the
schema being dropped, but it does not see objects being concurrently
created by other sessions. Even if it did, this scenario would still
fail:

session 1: BEGIN: DROP SCHEMA myschema;
session 2: CREATE FUNCTION myschema.myfunc() ...;
session 1: COMMIT;

When the DROP SCHEMA runs, the schema was empty, but the new function
is created in it before the dropping transaction completes. The CREATE
FUNCTION does not see that the schema is concurrently being dropped.

In both of these scenarios, the function is left behind in the schema
that no longer exists.

To fix, acquire AccessShareLock on all referenced objects when
recording dependencies. This conflicts with the AccessExclusiveLock
taken by DROP, preventing the race. After acquiring the lock, verify
that the object still exists, and if it was dropped concurrently,
report an error. We already had such a mechanism for shared
dependencies, but for some reason we didn't do it for in-database
dependendies.

Ideally the locks would be acquired much earlier when creating a new
object, but that will require modifying a lot of callers. This check
while recording the dependency is a nice wholesale protection, and
even if we change all the CREATE commands to acquire locks earlier,
it's still good to have this as a backstop to catch any cases where we
forgot to do so.

The patch adds a few tests for some cases that left behind orphaned
objects before this. It also adds a test for roles, which already had
such protection, although that test is partially disabled because the
error message includes an OID which is not predictable.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Discussion: https://postgr.es/m/ZiYjn0eVc7pxVY45@ip-10-97-1-34.eu-west-3.compute.internal
Backpatch-through: 14

Don't try to record dependency on a dropped column's datatype

When creating a relation with a dropped column, we called
recordDependencyOn() also on the datatype of the dropped column, which
is always InvalidOid. In versions 15 and above, that was harmless
because recordDependencyOn() considers InvalidOid as a pinned object,
and skips over it. On version 14, isPinnedObject() does not consider
InvalidOid as pinned, so we created a bogus pg_depend entry with
refobjectid == 0.

As far as I can tell, the only case when AddNewAttributeTuples() is
called with dropped columns is when performing a table-rewriting ALTER
TABLE command. That temporarily creates a new relation with the same
columns, including dropped ones, then swaps the relations, and drops
the newly created table again. So even on version 14, the bogus
pg_depend entry was only on the transient relation that was dropped at
the end of the ALTER TABLE command, which was harmless.

Even though this is harmless, let's be tidy, similar to commit
713bce9484. The reason I noticed this now and why I backported this,
is because the next commit will add code to acquire locks on the
referenced objects, and we don't want to acquire a lock on InvalidOid.

Discussion: https://postgr.es/m/ZiYjn0eVc7pxVY45@ip-10-97-1-34.eu-west-3.compute.internal
Backpatch-through: 14

Fix self-deadlock when replaying WAL generated by older minor version

Commit 77dff5d937 introduced a SimpleLruWriteAll() call when replaying
multixact WAL records generated by older minor versions. However,
SimpleLruWriteAll() acquires the SLRU lock and on v16 and below, it's
called while already holding the lock, leading to self-deadlock.
Version 17 and 18 did not have that problem, because in those versions
the lock is acquired later in the function.

To fix, acquire MultiXactOffsetSLRULock later in RecordNewMultiXact(),
at the same place where it's acquired on version 17 and 18.

Author: Andrey Borodin <x4mmm@yandex-team.ru>
Reported-by: Radim Marek <radim@boringsql.com>
Discussion: https://www.postgresql.org/message-id/19490-9c59c6a583513b99@postgresql.org
Backpatch-through: 14-16

Fix procLatch ownership race in ProcKill()

DisownLatch() was executed after the PGPROC entry of the process
terminated is pushed back into a freelist. A newly-forked backend that
recycles the slot could call OwnLatch() and PANIC with a "latch already
owned by PID", taking down the server.

There were two scenarios related to lock groups where this issue could
be reached:
* A follower pushes the leader's PGPROC back to the freelist while the
leader has not yet called DisownLatch() in its own ProcKill().
* A leader outliving all its followers pushes its own PGPROC onto the
freelist before reaching DisownLatch(), which would be the most common
scenario.

This issue is fixed by calling SwitchBackToLocalLatch() and
DisownLatch() at an earlier phase of ProcKill(), before any freelist
manipulation happens, so that the slot of the backend terminated is
never exposed as owning a latch.

Note that pgstat_reset_wait_event_storage() is kept at a later stage.
An upcoming commit will take advantage of that by introducing a test
able to check the original PANIC scenario.

Author: Vlad Lesin <vladlesin@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/d2983796-2603-41b7-a66e-fc8489ddb954@gmail.com
Backpatch-through: 14

Fix race conditions in ProcKill()'s lock-group freelist handling

This commit fixes two bugs in ProcKill()'s lock-group teardown freelist
publication:
* a double push of the leader's PGPROC that corrupts the freelist.
* a leak of the last follower's PGPROC slot.

ProcKill()'s lock-group teardown had two PGPROC freelist updates
scattered through the function, done under two separate freeProcsLock
acquisitions:
* A follower's push of the leader's PGPROC, done when a follower is the
last group member exiting.
* Every backend's self-push at the bottom of the function.

The two freelist updates were coordinated only by inspecting
proc->lockGroupLeader, which a follower could clear as a side effect of
pushing the leader. This coordination was broken. For example, with
two concurrent backends:
* The follower clears leader->lockGroupLeader and pushes the leader's
PGPROC under leader_lwlock.
* The follower does not clear its own proc->lockGroupLeader, being
skipped.
* When the leader reaches the bottom of ProcKill(), it sees a NULL
proc->lockGroupLeader (the follower cleared it) and pushes itself,
causing a second dlist_push_tail() of the same node onto the same
freelist.
* The follower at the bottom sees its own proc->lockGroupLeader being
not NULL (never cleared) and skips its own push, causing its own slot
to leak.

This commit refactors the freelist manipulation to be done in two
distinct phases, each step using its own lock acquisition to ensure that
each freelist operation happens in an isolated manner for each backend
(follower or leader):
- First, under a single leader_lwlock acquisition, check the state of
the lock-group. Depending on if we are dealing with a follower and/or a
leader, and if the leader has exited before a follower, then set some
state booleans that define which actions should be taken with the
freelist.
- Second, under a single freeProcsLock acquisition, perform the cleanup
actions, self-push of a backend and/or push of the leader back to the
freelist.

This is an old issue, dating back to 9.6 where parallel workers and lock
grouping has been added.

Author: Vlad Lesin <vladlesin@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/d2983796-2603-41b7-a66e-fc8489ddb954@gmail.com
Backpatch-through: 14

test_slru: Fix LWLock initialization for EXEC_BACKEND builds

The LWLock used by this test module was defined as a process-local
variable, which was broken under -DEXEC_BACKEND, each backend getting
its own copy of the lock state.  The shmem_startup_hook unconditionally
called LWLockRegisterTranche() and LWLockInitialize(), which means that
every backend would allocate a new tranche ID (which is still OK for
this module) but reset the lock's atomic state (which was bad).

This commit moves the LWLock to shared memory, so as it is initialized
only once, similarly to pg_prewarm/autoprewarm.c.

This change is only for REL_16_STABLE, per a report from buildfarm
member gokiburi (the system has been upgraded recently, so perhaps it
began failing due to some ALSR changes?).  I have been able to reproduce
the problem on the same host with -DEXEC_BACKEND, and checked that this
commit addresses the issue.  In v17 and v18, the test module wastes
tranche IDs, which only impacts the visibility of the locks like in
pg_stat_activity.  The use of the SLRU bank locks ensures that the
LWLock state is safe.  On HEAD, the logic of the module is safer thanks
to 38b602b0289f.

Discussion: https://postgr.es/m/agr6-cIQ4EUA86Cs@paquier.xyz
Backpatch-through: 16

Fix missed ReleaseVariableStats() in intarray's _int_matchsel().

Given a WHERE clause like "int[] @@ query_int" or "query_int ~~ int[]"
where the query_int side is a table column having statistics,
_int_matchsel() exited without remembering to free the statistics
tuple. This would typically lead to warnings about cache refcount
leakage, like
WARNING: resource was not closed: cache pg_statistic (73), tuple 42/12 has count 1
It's been wrong since this code was added, in commit c6fbe6d6f.

Bug: #19492
Reported-by: Man Zeng <zengman@halodbtech.com>
Author: Man Zeng <zengman@halodbtech.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19492-ddcd0e22399ef85a@postgresql.org
Backpatch-through: 14

Fix size check in statext_dependencies_deserialize()

The check for the minimum expected bytea size of a MVDependencies object
was using SizeOfItem() for its calculation. This macro uses the number
of attributes in a single dependency.

This minimum size calculation should be based on MinSizeOfItems(), that
computes the minimum expected size as the header plus the
minimally-sized number of dependency items.

Oversight in d08c44f7a4ec.

Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
Discussion: https://postgr.es/m/4b8d299d-2505-4c30-bf80-0f697410db35@tantorlabs.com
Backpatch-through: 14

Avoid exposing WAL receiver raw conninfo during timeline jumps

When reusing an existing WAL receiver after it has reached
WALRCV_WAITING for new instructions, RequestXLogStreaming() copied
PrimaryConnInfo into WalRcv->conninfo before switching the state to
WALRCV_RESTARTING.  At that point ready_to_display could still be true,
so pg_stat_wal_receiver could expose the raw connection string,
including sensitive fields, but it should only show the user-displayable
version of the connection string.

WALRCV_RESTARTING does not establish a new connection.  The waiting WAL
receiver reuses its existing connection and only needs a new startpoint
and timeline, so there is no need to copy the raw connection string into
shared memory again.  Let's only copy conninfo when launching a new WAL
receiver after WALRCV_STOPPED, not while waiting for instructions.

This commit adds coverage for the case fixed by this commit to the
timeline-switch test by verifying that the WAL receiver conninfo remains
consistent across the jump.

Backpatch all the way down, as this issue is possible since
pg_stat_wal_receiver has been introduced.

Author: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/EF91FF76-1E2B-4F3B-9162-290B4DC517FF@gmail.com
Backpatch-through: 14

pg_recvlogical: Honor source cluster file permissions for output files

Commit c37b3d08ca6 attempted to preserve group permissions on pg_recvlogical
output files when group access was enabled on the source cluster. However,
the output files were still created with a fixed S_IRUSR | S_IWUSR mode,
preventing group-read permissions from being applied.

This commit fixes the issue by creating output files with pg_file_create_mode
instead of a hard-coded mode. This allows pg_recvlogical to correctly preserve
group permissions from the source cluster.

Backpatch to all supported branches.

Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwHhpizYzMo3nFP4GkNMueSNMY3QfC-gBN1VTXtuiANDvw@mail.gmail.com
Backpatch-through: 14

Use ereport(ERROR), not Assert(), for publisher tuples missing columns.

Three locations use Assert() to guard against a mismatch between the
number of columns advertised in the RELATION message and the number
actually received in the subsequent INSERT/UPDATE tuple message. Since
these values originate from the publisher, the check must survive into
production builds.

A malicious or buggy publisher can send a RELATION claiming N columns
and an INSERT claiming M < N columns. The subscriber's apply worker
indexes into colvalues[]/colstatus[] using column indices from the
RELATION message's attribute map, causing a heap out-of-bounds read when
the tuple's column array is smaller than expected. We've looked, without
success, for a scenario in which the publisher holds sufficient control
over these out-of-bounds bytes to exploit this or even to reach a
SIGSEGV. Despite not finding one, the code has been fragile. Back-patch
to v14 (all supported versions).

Reported-by: Varik Matevosyan <varikmatevosyan@gmail.com>
Author: Varik Matevosyan <varikmatevosyan@gmail.com>
Discussion: https://postgr.es/m/CA+bBoog3cCogktzfLb9bppUByu-10B3CFp8u=iKXG_OvtAguCw@mail.gmail.com
Backpatch-through: 14

Doc: fix release-note typo.

This mention of memcpy() should of course have said memcmp().

Reported-by: chris@chrullrich.net
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/177883653690.764749.14038057906859461991@wrigleys.postgresql.org
Backpatch-through: 14

Re-add regression tests for ltree and intarray

These tests have been removed by 906ea101d0d5, due to some of them being
unstable in the buildfarm with low max_stack_depth values. They are now
reworked so as they should be more portable.

The tests to cover the findoprnd() overflows use a balanced tree to
avoid using too much stack, per a suggestion and an investigation by Tom
Lane.

Note: This is initially applied only on HEAD; a backpatch will follow
should the buildfarm be fine with the situation.

Discussion: https://postgr.es/m/agZc6XecyE7E7fep@paquier.xyz
Backpatch-through: 14

refint: Fix segfault in check_foreign_key().

When an UPDATE statement triggers check_foreign_key() with the
action set to "cascade", it generates more UPDATE statements to
modify the key values in referencing relations.  If a new key value
is NULL, SPI_getvalue() returns a NULL pointer, which is
subsequently passed to quote_literal_cstr(), causing a segfault.
To fix, skip quoting when a new key value is NULL and insert an
unquoted NULL keyword instead.

Oversight in commit 260e97733b.  While the refint documentation
recommends marking primary key columns NOT NULL, the aforementioned
scenario accidentally worked on platforms where snprintf()
substitutes "(null)" for NULL pointers.  Note that for
character-type columns, the old code quoted "(null)" as a string
literal, so this didn't always produce correct results.  But it
still seems better to fix this than to reject cases that previously
worked.

Reported-by: Nikita Kalinin <n.kalinin@postgrespro.ru>
Author: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: Pierre Forstmann <pierre.forstmann@gmail.com>
Discussion: https://postgr.es/m/19476-bd04ea6241345303%40postgresql.org
Backpatch-through: 14

pgbench: fix verbose error message corruption with multiple threads

When pgbench runs with multiple threads and verbose error reporting is
enabled (--verbose-errors), multiple clients can build verbose error
messages concurrently. Previously, a function-local static
PQExpBuffer was used for these messages, causing the buffer to be
shared across threads. This was not thread-safe and could result in
corrupted or incorrect log output.

Fix this by using a local PQExpBufferData instead of a static buffer.
This keeps verbose error messages correct during concurrent execution.

Backpatch to v15, where this issue was introduced.

Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Alex Guo <guo.alex.hengchen@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwER1AjGXpkKB9t9820NBhMQ_Ghv7=HsKeodUr3=SZsF4g@mail.gmail.com
Backpatch-through: 15

Add more tests for corrupted data with pglz_decompress()

Two cases fixed by 2b5ba2a0a141 were not covered, to emulate the
handling of corrupted data, for:
- set control bit with a valid 2-byte match tag where offset is 0.
- set control bit with a valid 2-byte match tag where offset exceeds
output written.

Oversight in 67d318e70402.

Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/agF4xkIdRcrCIprs@paquier.xyz
Backpatch-through: 14

Fix stale COPY progress during logical replication table sync

Previously, pg_stat_progress_copy in the subscriber could continue to show
the initial COPY operation for logical replication table synchronization as
active even after the data copy had finished. The stale progress entry
remained visible until synchronization caught up with the publisher.

This happened because the table synchronization code called BeginCopyFrom()
and CopyFrom(), but failed to call EndCopyFrom() afterward.

This commit fixes the issue by adding the missing EndCopyFrom() call so that
the COPY progress state in the subscriber is cleared as soon as the initial
data copy completes.

Backpatch to all supported branches.

Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: ChangAo Chen <cca5507@qq.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAOzEurQKuy3RiPkd=25PEwEzaqHuGvEOf=X7vaVzhgNjaukYzA@mail.gmail.com
Backpatch-through: 14

Add missing include in Cluster.pm

The postmaster test 004_negotiate.pl could fail due to IO::Socket::INET
gone missing, in environments that cannot use Unix sockets.

Oversight in the backport done in 6dffaeb8e54c, so like the other commit
this is applied across the v14~17 range. Per buildfarm member drongo.

Security: CVE-2026-6479
Backpatch-through: 14

Stamp 16.14.

Last-minute updates for release notes.

Security: CVE-2026-6472, CVE-2026-6473, CVE-2026-6474, CVE-2026-6475, CVE-2026-6476, CVE-2026-6477, CVE-2026-6478, CVE-2026-6479, CVE-2026-6575, CVE-2026-6637, CVE-2026-6638

Use palloc_array() in a few more places to avoid overflow

These could overflow on 32-bit systems.

Backpatch-through: 14
Security: CVE-2026-6473

Remove test cases for field overflows in intarray and ltree.

These checks are failing in the buildfarm, reporting stack overflows
rather than the expected errors, though seemingly only on ppc64 and
s390x platforms. Perhaps there is something off about our tests
for stack depth on those architectures? But there's no time to
debug that right now, and surely these tests aren't too essential.
Revert for now and plan to revisit after the release dust settles.

Backpatch-through: 14
Security: CVE-2026-6473

refint: Fix SQL injection and buffer overruns.

Maliciously crafted key value updates could achieve SQL injection
within check_foreign_key(). To fix, ensure new key values are
properly quoted and escaped in the internally generated SQL
statements. While at it, avoid potential buffer overruns by
replacing the stack buffers for internally generated SQL statements
with StringInfo.

Reported-by: Nikolay Samokhvalov <nik@postgres.ai>
Author: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Security: CVE-2026-6637
Backpatch-through: 14

Mark PQfn() unsafe and fix overrun in frontend LO interface.

When result_is_int is set to 0, PQfn() cannot validate that the
result fits in result_buf, so it will write data beyond the end of
the buffer when the server returns more data than requested. Since
this function is insecurable and obsolete, add a warning to the top
of the pertinent documentation advising against its use.

The only in-tree caller of PQfn() is the frontend large object
interface. To fix that, add a buf_size parameter to
pqFunctionCall3() that is used to protect against overruns, and use
it in a private version of PQfn() that also accepts a buf_size
parameter.

Reported-by: Yu Kunpeng <yu443940816@live.com>
Reported-by: Martin Heistermann <martin.heistermann@unibe.ch>
Author: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Security: CVE-2026-6477
Backpatch-through: 14

Fix integer overflow in array_agg(), when the array grows too large

If you accumulate many arrays full of NULLs, you could overflow
'nitems', before reaching the MaxAllocSize limit on the allocations.
Add an explicit check that the number of items doesn't grow too large.
With more than MaxArraySize items, getting the final result with
makeArrayResultArr() would fail anyway, so better to error out early.

Reported-by: Xint Code
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 14
Security: CVE-2026-6473

Fix integer-overflow and alignment hazards in locale-related code.

pg_locale_icu.c was full of places where a very long input string
could cause integer overflow while calculating a buffer size,
leading to buffer overruns.

It also was cavalier about using char-type local arrays as buffers
holding arrays of UChar. The alignment of a char[] variable isn't
guaranteed, so that this risked failure on alignment-picky platforms.
The lack of complaints suggests that such platforms are very rare
nowadays; but it's likely that we are paying a performance price on
rather more platforms. Declare those arrays as UChar[] instead,
keeping their physical size the same.

pg_locale_libc.c's strncoll_libc_win32_utf8() also had the
disease of assuming it could double or quadruple the input
string length without concern for overflow.

Reported-by: Xint Code
Reported-by: Pavel Kohout <pavel.kohout@aisle.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 14
Security: CVE-2026-6473

Prevent path traversal in pg_basebackup and pg_rewind

pg_rewind and pg_basebackup could be fed paths from rogue endpoints that
could overwrite the contents of the client when received, achieving path
traversal.

There were two areas in the tree that were sensitive to this problem:
- pg_basebackup, through the astreamer code, where no validation was
performed before building an output path when streaming tar data.  This
is an issue in v15 and newer versions.
- pg_rewind file operations for paths received through libpq, for all
the stable branches supported.

In order to address this problem, this commit adds a helper function in
path.c, that reuses path_is_relative_and_below_cwd() after applying
canonicalize_path().  This can be used to validate the paths received
from a connection point.  A path is considered invalid if any of the two
following conditions is satisfied:
- The path is absolute.
- The path includes a direct parent-directory reference.

Reported-by: XlabAI Team of Tencent Xuanwu Lab
Reported-by: Valery Gubanov <valerygubanov95@gmail.com>
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 14
Security: CVE-2026-6475

Avoid overflow in size calculations in formatting.c.

A few functions in this file were incautious about multiplying a
possibly large integer by a factor more than 1 and then using it as
an allocation size. This is harmless on 64-bit systems where we'd
compute a size exceeding MaxAllocSize and then fail, but on 32-bit
systems we could overflow size_t, leading to an undersized
allocation and buffer overrun. To fix, use palloc_array() or
mul_size() instead of handwritten multiplication.

Reported-by: Sven Klemm <sven@tigerdata.com>
Reported-by: Xint Code
Author: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Tatsuo Ishii <ishii@postgresql.org>
Security: CVE-2026-6473
Backpatch-through: 14

Check CREATE privilege on multirange type schema in CREATE TYPE.

This omission allowed roles to create multirange types in any
schema, potentially leading to privilege escalations. Note that
when a multirange type name is not specified in CREATE TYPE, it is
automatically placed in the range type's schema, which is checked
at the beginning of DefineRange().

Reported-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Security: CVE-2026-6472
Backpatch-through: 14

Guard against unsafe conditions in usage of pg_strftime().

Although pg_strftime() has defined error conditions, no callers bother
to check for errors.  This is problematic because the output string is
very likely not null-terminated if an error occurs, so that blindly
using it is unsafe.  Rather than trusting that we can find and fix all
the callers, let's alter the function's API spec slightly: make it
guarantee a null-terminated result so long as maxsize > 0.

Furthermore, if we do get an error, let's make that null-terminated
result be an empty string.  We could instead truncate at the buffer
length, but that risks producing mis-encoded output if the tz_name
string contains multibyte characters.  It doesn't seem reasonable for
src/timezone/ to make use of our encoding-aware truncation logic.
Also, the only really likely source of a failure is a user-supplied
timezone name that is intentionally trying to overrun our buffers.
I don't feel a need to be particularly friendly about that case.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Backpatch-through: 14
Security: CVE-2026-6474

Avoid passing unintended format codes to snprintf().

timeofday() assumed that the output of pg_strftime() could not contain
% signs, other than the one it explicitly asks for with %%. However,
we don't have that guarantee with respect to the time zone name (%Z).
A crafted time zone setting could abuse the subsequent snprintf()
call, resulting in crashes or disclosure of server memory.

To fix, split the pg_strftime() call into two and then treat the
outputs as literal strings, not a snprintf format string. The
extra pg_strftime() call doesn't really cost anything, since the
bulk of the conversion work was done by pg_localtime().

Also, adjust buffer widths so that we're not risking string truncation
during the snprintf() step, as that would create a hazard of producing
mis-encoded output.

This also fixes a latent portability issue: the format string expects
an int, but tp.tv_usec is long int on many platforms.

Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Backpatch-through: 14
Security: CVE-2026-6474

Fix SQL injection in logical replication origin checks.

ALTER SUBSCRIPTION ... REFRESH PUBLICATION interpolates schema and
relation names into SQL without quoting them.  A crafted subscriber
relation name can inject arbitrary SQL on the publisher.  Test such a
name.  Back-patch to v16, where commit
875693019053b8897ec3983e292acbb439b088c3 first appeared.

Reported-by: Pavel Kohout <pavel.kohout@aisle.com>
Author: Pavel Kohout <pavel.kohout@aisle.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Backpatch-through: 16
Security: CVE-2026-6638

Apply timingsafe_bcmp() in authentication paths

This commit applies timingsafe_bcmp() to authentication paths that
handle attributes or data previously compared with memcpy() or strcmp(),
which are sensitive to timing attacks.

The following data is concerned by this change, some being in the
backend and some in the frontend:
- For a SCRAM or MD5 password, the computed key or the MD5 hash compared
with a password during a plain authentication.
- For a SCRAM exchange, the stored key, the client's final nonce and the
server nonce.
- RADIUS (up to v18), the encrypted password.
- For MD5 authentication, the MD5(MD5()) hash.

Reported-by: Joe Conway <mail@joeconway.com>
Security: CVE-2026-6478
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Backpatch-through: 14

Add timingsafe_bcmp(), for constant-time memory comparison

timingsafe_bcmp() should be used instead of memcmp() or a naive
for-loop, when comparing passwords or secret tokens, to avoid leaking
information about the secret token by timing. This commit just
introduces the function but does not change any existing code to use
it yet.

This has been initially applied as of 09be39112654 in v18 and newer
versions, and will be used in all the stable branches for an upcoming
fix.

Co-authored-by: Jelte Fennema-Nio <github-tech@jeltef.nl>
Discussion: https://www.postgresql.org/message-id/7b86da3b-9356-4e50-aa1b-56570825e234@iki.fi
Security: CVE-2026-6478
Backpatch-through: 14

Guard against overflow in "left" fields of query_int and ltxtquery.

contrib/intarray's query_int type uses an int16 field to hold the
offset from a binary operator node to its left operand.  However, it
allows the number of nodes to be as much as will fit in MaxAllocSize,
so there is a risk of overflowing int16 depending on the precise shape
of the tree.  Simple right-associative cases like "a | b | c | ..."
work fine, so we should not solve this by restricting the overall
number of nodes.  Instead add a direct test of whether each individual
offset is too large.

contrib/ltree's ltxtquery type uses essentially the same logic and
has the same 16-bit restriction.

(The core backend's tsquery.c has a variant of this logic too, but
in that case the target field is 32 bits, so it is okay so long
as varlena datums are restricted to 1GB.)

In v16 and up, these types support soft error reporting, so we have
to complicate the recursive findoprnd function's API a bit to allow
the complaint to be reported softly.  v14/v15 don't need that.

Undocumented and overcomplicated code like this makes my head hurt,
so add some comments and simplify while at it.

Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Backpatch-through: 14
Security: CVE-2026-6473

Unify src/common/'s definitions of MaxAllocSize.

Define MaxAllocSize in src/include/common/fe_memutils.h rather
than having several copies of it in different src/common/*.c files.
This also provides an opportunity to document it better.

Back-patch of commit 11b7de4a7, needed now because assorted security
fixes are adding additional references to MaxAllocSize in frontend
code.

Backpatch-through: 14-17
Security: CVE-2026-6473

Fix unbounded recursive handling of SSL/GSS in ProcessStartupPacket()

The handling of SSL and GSS negotiation messages in
ProcessStartupPacket() could cause a recursion of the backend,
ultimately crashing the server as the negotiation attempts were not
tracked across multiple calls processing startup packets.

A malicious client could therefore alternate rejected SSL and GSS
requests indefinitely, each adding a stack frame, until the backend
crashed with a stack overflow, taking down a server.

This commit addresses this issue by modifying ProcessStartupPacket() so
as processed negotiation attempts are tracked, preventing infinite
recursive attempts. A TAP test is added to check this problem, where
multiple SSL and GSS negotiated attempts are stacked.

Reported-by: Calif.io in collaboration with Claude and Anthropic
Research
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Security: CVE-2026-6479
Backpatch-through: 14

Add raw_connect and raw_connect_works to Cluster.pm

These two routines will be used in a test of an upcoming fix. This
commit affects the v14~v17 range. v18 and newer versions already
include them, thanks to 85ec945b7880.

Security: CVE-2026-6479
Backpatch-through: 14

Fix assorted places that need to use palloc_array().

multirange_recv and BlockRefTableReaderNextRelation were incautious
about multiplying a possibly-large integer by a factor more than 1
and then using it as an allocation size.  This is harmless on 64-bit
systems where we'd compute a size exceeding MaxAllocSize and then
fail, but on 32-bit systems we could overflow size_t leading to an
undersized allocation and buffer overrun.

Fix these places by using palloc_array() instead of a handwritten
multiplication.  (In HEAD, some of them were fixed already, but
none of that work got back-patched at the time.)

In addition, BlockRefTableReaderNextRelation passes the same value
to BlockRefTableRead's "int length" parameter.  If built for
64-bit frontend code, palloc_array() allows a larger array size
than it otherwise would, potentially allowing that parameter to
overflow.  Add an explicit check to forestall that and keep the
behavior the same cross-platform.

Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 14
Security: CVE-2026-6473

Prevent buffer overrun in unicode_normalize().

Some UTF8 characters decompose to more than a dozen codepoints.
It is possible for an input string that fits into well under
1GB to produce more than 4G decomposed codepoints, causing
unicode_normalize()'s decomp_size variable to wrap around to a
small positive value.  This results in a small output buffer
allocation and subsequent buffer overrun.

To fix, test after each addition to see if we've overrun MaxAllocSize,
and break out of the loop early if so.  In frontend code we want to
just return NULL for this failure (treating it like OOM).  In the
backend, we can rely on the following palloc() call to throw error.

I also tightened things up in the calling functions in varlena.c,
using size_t rather than int and allocating the input workspace
with palloc_array().  These changes are probably unnecessary
given the knowledge that the original input and the normalized
output_chars array must fit into 1GB, but it's a lot easier to
believe the code is safe with these changes.

Reported-by: Xint Code
Reported-by: Bruce Dang <bruce@calif.io>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Co-authored-by: Heikki Linnakangas <hlinnaka@iki.fi>
Backpatch-through: 14
Security: CVE-2026-6473

Harden our regex engine against integer overflow in size calculations.

The number of NFA states, number of NFA arcs, and number of colors
are all bounded to reasonably small values.  However, there are
places where we try to allocate arrays sized by products of those
quantities, and those calculations could overflow, enabling
buffer-overrun attacks.  In practice there's no problem on 64-bit
machines, but there are some live scenarios on 32-bit machines.

A related problem is that citerdissect() and creviterdissect()
allocate arrays based on the length of the input string, which
potentially could overflow.

To fix, invent MALLOC_ARRAY and REALLOC_ARRAY macros that rely on
palloc_array_extended and repalloc_array_extended with the NO_OOM
option, similarly to the existing MALLOC and REALLOC macros.
(Like those, they'll throw an error not return a NULL result for
oversize requests.  This doesn't really fit into the regex code's
view of error handling, but it'll do for now.  We can consider
whether to change that behavior in a non-security follow-up patch.)

I installed similar defenses in the colormap construction code.
It's not entirely clear whether integer overflow is possible
there, but analyzing the behavior in detail seems not worth
the trouble, as the risky spots are not in hot code paths.

I left a bunch of calls as-is after verifying that they can't
overflow given reasonable limits on nstates and narcs.  Those
limits were enforced already via REG_MAX_COMPILE_SPACE, but
add commentary to document the interactions.

In passing, also fix a related edge case, which is that the
special color numbers used in LACON carcs could overflow the
"color" data type, if ncolors is close to MAX_COLOR.

In v14 and v15, the regex engine calls malloc() directly instead
of using palloc(), so MALLOC_ARRAY and REALLOC_ARRAY do likewise.

Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Backpatch-through: 14
Security: CVE-2026-6473

Make palloc_array() and friends safe against integer overflow.

Sufficiently large "count" arguments could result in undetected
overflow, causing the allocated memory chunk to be much smaller
than what the caller will subsequently write into it.  This is
unlikely to be a hazard with 64-bit size_t but can sometimes
happen on 32-bit builds, primarily where a function allocates
workspace that's significantly larger than its input data.
Rather than trying to patch the at-risk callers piecemeal,
let's just redefine these macros so that they always check.

To do that, move the longstanding add_size() and mul_size() functions
into palloc.h and mcxt.c, and adjust them to not be specific to
shared-memory allocation.  Then invent palloc_mul(), palloc0_mul(),
palloc_mul_extended() to use these functions.  Actually, the latter
use inlined copies to save one function call.  repalloc_array() gets
similar treatment.  I didn't bother trying to inline the calls for
repalloc0_array() though.

In v14 and v15, this also adds repalloc_extended(), which previously
was only available in v16 and up.

We need copies of all this in fe_memutils.[hc] as well, since that
module also provides palloc_array() etc.

Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Backpatch-through: 14
Security: CVE-2026-6473

Add pg_add_size_overflow() and friends

Commit 600086f47 added (several bespoke copies of) size_t addition with
overflow checks to libpq. Move this to common/int.h, along with
its subtraction and multiplication counterparts.

pg_neg_size_overflow() is intentionally omitted; I'm not sure we should
add SSIZE_MAX to win32_port.h for the sake of a function with no
callers.

Back-patch of commit 8934f2136, done now because pg_add_size_overflow()
and friends are needed more widely for security fixes.

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAOYmi%2B%3D%2BpqUd2MUitvgW1pAJuXgG_TKCVc3_Ek7pe8z9nkf%2BAg%40mail.gmail.com
Backpatch-through: 14-18
Security: CVE-2026-6473

Fix overflows with ts_headline()

The options "StartSel", "StopSel" and "FragmentDelimiter" given by a
caller of the SQL function ts_headline() have their lengths stored as
int16. When providing values larger than PG_INT16_MAX, it was possible
to overflow the length values stored, leading to incorrect behaviors in
generateHeadline(), in most cases translating to a crash.

Attempting to use values for these options larger than PG_INT16_MAX is
now blocked. Some test cases are added to cover our tracks.

Reported-by: Xint Code
Author: Michael Paquier <michael@paquier.xyz>
Backpatch-through: 14
Security: CVE-2026-6473

ltree: Fix overflows with lquery parsing

The lquery parser in contrib/ltree/ had two overflow problems:
- A single lquery level with many OR-separated variants (e.g.,
'label1|label2|...'), could cause an overflow of totallen, this being
stored as a uint16, meaning a maximum value of UINT16_MAX or 65k.  Each
variant contributes MAXALIGN(LVAR_HDRSIZE + len) bytes.  With enough
long variants, the value would wraparound.  This would corrupt the data
written by LQL_NEXT(), leading to a stack corruption, most likely
translating into a crash, but it would allow incorrect memory access.
- numvar, labelled as a uint16, counts the number of OR-variants in a
single level, and it is incremented without bounds checking.  With more
than PG_UINT16_MAX (65k) variants in a single level, and a minimum of
131kB of input data, it would wrap to 0.  When a (wildcard) '*' is
used, this would change the query results silently.

For both issues, a set of overflows checks are added to guard against
these problematic patterns.

The first issue has been reported by the three people listed below,
affecting v16 and newer versions due to b1665bf01e5f.  Its coding was
still unsafe in v14 and v15.  The second issue affects all the stable
branches; I have bumped into while reviewing the code of the module.

Reported-by: Vergissmeinnicht <vergissmeinnichtzh@gmail.com>
Reported-by: A1ex <alex000young@gmail.com>
Reported-by: Jihe Wang <wangjihe.mail@gmail.com>
Author: Michael Paquier <michael@paquier.xyz>
Security: CVE-2026-6473
Backpatch-through: 14

Translation updates

Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: dea17fe860f80e6bdb49f8c2549c877dd759d7cd

Release notes for 18.4, 17.10, 16.14, 15.18, 14.23.

postgres_fdw: Fix handling of abort-cleanup-failed connections.

As connections that failed abort cleanup can't safely be further used,
if a remote query tries to get such a connection, we reject it.
Previously, this rejection involved dropping the connection if it was
open, without accounting for the possibility of open cursors using it,
causing a server crash when such an open cursor tried to use an
already-dropped connection, as a cursor-handling function
(create_cursor, fetch_more_data, or close_cursor) was called on a freed
PGconn. To fix, delay dropping failed connections until abort cleanup
of the main transaction, to ensure open cursors using such a connection
can safely refer to the PGconn for it.

Oversight in commit 8bf58c0d9.

Reported-by: Zhibai Song <songzhibai1234@gmail.com>
Diagnosed-by: Zhibai Song <songzhibai1234@gmail.com>
Author: Etsuro Fujita <etsuro.fujita@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/CAPmGK176y6JP017-Cn%2BhS9CEJx_6iVhRoYbAqzuLU4d8-XPPNg%40mail.gmail.com
Backpatch-through: 14

Consider collation when proving subquery uniqueness

rel_is_distinct_for()'s RTE_SUBQUERY branch passed only the equality
operator from each join clause to query_is_distinct_for(), discarding
the operator's input collation. query_is_distinct_for() then verified
opfamily compatibility but never checked collations, so a DISTINCT /
GROUP BY / set-op operating under one collation was trusted to prove
uniqueness for a comparison performed under an unrelated collation.
As with the recent fix in relation_has_unique_index_for(), this is
unsound for nondeterministic collations and yields wrong query results
in any optimization that consumes the proof.

Fix by carrying each clause's operator input collation into
query_is_distinct_for() and validating it at every check-site against
the subquery target expression's collation.

Back-patch to all supported branches. query_is_distinct_for() is
declared in an installed header, so on stable branches the existing
two-list signature is retained as a thin wrapper that forwards to a
new collation-aware entry point; external callers continue to receive
the historical collation-blind answer.

Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAMbWs4_XUUSTyzCaRjUeeahWNqi=8ZOA5Q4coi8zUVEDSBkM6A@mail.gmail.com
Backpatch-through: 14

Consider collation when proving uniqueness from unique indexes

relation_has_unique_index_for() has long had an XXX noting that it
doesn't check collations when matching a unique index's columns
against equality clauses.  This was benign as long as all collations
in play reduced to the same notion of equality, but has been incorrect
since nondeterministic collations were introduced in PG 12: a unique
index under a deterministic collation does not prove uniqueness under
a nondeterministic collation, nor vice versa.

The consequence is wrong query results for any planner optimization
that consumes the faulty proof, including inner-unique join execution
(which stops the inner search after the first match per outer row),
useless-left-join removal, semijoin-to-innerjoin reduction, and
self-join elimination.

Fix by requiring the index's collation to agree on equality with the
clause's input collation.  Two collations agree on equality if either
is InvalidOid (denoting a non-collation-sensitive operation, which
cannot conflict with the other side), if they have the same OID, or if
both are deterministic: by definition a deterministic collation treats
two strings as equal iff they are byte-wise equal (see CREATE
COLLATION), so any two deterministic collations share the same
equality relation and the uniqueness proof carries over.  Any mismatch
involving a nondeterministic collation is rejected.

Back-patch to all supported branches; the bug has existed since
nondeterministic collations were introduced in PG 12.

Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAMbWs4_XUUSTyzCaRjUeeahWNqi=8ZOA5Q4coi8zUVEDSBkM6A@mail.gmail.com
Backpatch-through: 14

Mark modified the FSM buffer as dirty during recovery

The XLogRecordPageWithFreeSpace function updates the freespace map (FSM) data
while replaying data-level WAL records during the recovery. If the FSM block
is updated, it needs to be marked as modified. Currently, this is done with
the MarkBufferDirtyHint call (as in all other cases for modifying FSM data).
However, in the recovery context, this function will actually do nothing if
checksums are enabled. It's assumed that the page should not be dirtied
during recovery while modifying hints to protect against torn pages, since no
new WAL data can be generated at this point to store FPI.

Such logic does not seem fully aligned with the FSM case, as its blocks could
be simply zeroed if a checksum mismatch is detected. Currently, changes to an
FSM block could be lost if each change to that block occurs infrequently
enough to allow it to be evicted from the cache. To persist the change, the
modification needs to be performed while the FSM block is still kept in
buffers and marked as dirty after receiving its FPI. If the block has already
been cleaned, the change won't be persisted, so stored FSM blocks may remain
in an obsolete state.

If a large number of discrepancies between the data in leaf FSM blocks and the
actual data blocks accumulate on the replica server, this could cause
significant delays in insert operations after switchover. Such an insert
operation may need to visit many data blocks marked as having sufficient
space in the FSM, only to discover that the information is incorrect and the
FSM records need to be corrected. In a heavily trafficked insert-only table
with many concurrent clients performing inserts, this has been observed to
cause several-second stalls, causing visible application malfunction. The
desire to avoid such cases was the reason behind the commit ab7dbd681, which
introduced an update of FSM data during the heap_xlog_visible invocation.
However, an update to the FSM data on the standby side could be lost due to a
missing 'dirty' flag, so there is still a possibility that a large number of
FSM records will contain incorrect data. Note that having a zeroed FSM page
in such a case (due to a checksum mismatch) is preferable, as a zero value
will be interpreted as an indication of full data blocks, and the inserter
will be routed to the next FSM block or to the end of the table.

Given that FSM is ready to handle torn page writes and
XLogRecordPageWithFreeSpace is called only during the recovery, there seems
to be no reason to use MarkBufferDirtyHint here instead of a regular
MarkBufferDirty call.

Discussion: https://postgr.es/m/596c4f1c-f966-4512-b9c9-dd8fbcaf0928%40postgrespro.ru
Author: Alexey Makhmutov <a.makhmutov@postgrespro.ru>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>

Add missing connection validation in ECPG

ECPGdeallocate_all(), ECPGprepared_statement(), ECPGget_desc(), and
ecpg_freeStmtCacheEntry() could crash with a SIGSEGV when called
without an established connection (for example, when EXEC SQL CONNECT
was forgotten or a non-existent connection name was used), because
they dereferenced the result of ecpg_get_connection() without first
checking it for NULL.

Each site is fixed in the style of the surrounding code.

New tests are added for these conditions.

Author: Shruthi Gowda <gowdashru@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Mahendra Singh Thalor <mahi6run@gmail.com>
Reviewed-by: Nishant Sharma <nishant.sharma@enterprisedb.com>
Discussion: https://postgr.es/m/3007317.1765210195@sss.pgh.pa.us
Backpatch-through: 14

doc: Mention validation attempt during ALTER INDEX .. ATTACH PARTITION

Since 9d3e094f12, the command tries to validate the parent index of the
named index, if invalid. The documentation did not mention this
behavior, which could be confusing.

Author: Mohamed ALi <moali.pg@gmail.com>
Discussion: https://postgr.es/m/CAGnOmWpHu25_LpT=zv7KtetQhqV1QEZzFYLd_TDyOLu1Od9fpw@mail.gmail.com
Backpatch-through: 14

Fix attnum remapping in generateClonedExtStatsStmt()

When cloning extended statistics via CREATE TABLE ... LIKE ... INCLUDING
STATISTICS, stxkeys holds attribute numbers from the source (parent)
table, but get_attname() was being called with the child relation's
OID. If the parent has dropped columns, the child's attribute numbers
are renumbered sequentially and no longer match, so the lookup either
returns the wrong column name (silent corruption) or errors out when
the attnum does not exist in the child.

Fix it by remapping the parent attnum through attmap before the lookup,
consistent with how expression statistics are already handled a few
lines below.

Add a regression test covering both manifestations: a 3-column parent
where the stale attnum refers to no child column (cache-lookup error),
and a 4-column parent where the stale attnum silently refers to the
wrong child column.

Author: Julien Tachoires <julmon@gmail.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Discussion: https://postgr.es/m/20260415105718.tomuncfbmlt67oel@poseidon.home.virt
Backpatch-through: 14

Fix errno check based on EINTR in pg_flush_data()

Upon a failure of sync_file_range(), EINTR was checked based on the
returned result of the routine rather than its errno. sync_file_range()
returns -1 on failure, making the check a no-op, invalidating the retry
attempt in this case.

Oversight in 0d369ac65004.

Author: DaeMyung Kang <charsyam@gmail.com>
Discussion: https://postgr.es/m/20260429151811.1810874-1-charsyam@gmail.com
Backpatch-through: 16

Suppress "has no symbols" linker warnings on macOS.

After a recent macOS update, building Postgres produces warnings
that look like this:

ranlib: warning: 'libpgport_shlib.a(pg_cpu_x86.c.o)' has no symbols
ranlib: warning: 'libpgport_shlib.a(pg_popcount_x86.c.o)' has no symbols

To fix, add a dummy symbol to files that may otherwise have none.
Per project policy, this is a candidate for back-patching into
out-of-support branches: it suppresses annoying compiler warnings
but changes no behavior.

Reported-by: Zhang Mingli <zmlpostgres@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/229aaaf3-f529-44ed-8e50-00cb6909af21%40Spark
Backpatch-through: 13

doc: Fix grammar in some logical replication pages

Author: Peter Smith <smithpb2250@gmail.com>
Discussion: https://postgr.es/m/CAHut+PuvY_wYLPJ4DTs7NE9Lu2ty4d-OgZAOJC-NvCM=2wwcQQ@mail.gmail.com
Backpatch-through: 14

Update time zone data files to tzdata release 2026b.

British Columbia (America/Vancouver) moved to permanent UTC-07 on
2026-03-09, which will affect their clocks beginning on 2026-11-01.
For lack of any clarity on the point, assume their TZ abbreviation
will be MST from that time forward.

Moldova (Europe/Chisinau) has followed EU DST transition times since
2022.

Backpatch-through: 14

Fix incorrect logic for hashed IN / NOT IN with non-strict operators

ExecEvalHashedScalarArrayOp(), when using a strict equality function,
performs a short-circuit when looking up NULL values.  When the function
is non-strict, the code incorrectly looked up the hash table for a
zero-valued Datum, which could have resulted in an accidental true
return if the hash table contained zero valued Datum, or could result
in a crash for non-byval types.

Here we fix this by adding an extra step when we build the hash table to
check what the result of a NULL lookup would be.  This requires looping
over the array and checking what the non-hashed version of the code
would do.  We cache the results of that in the expression so that we can
reuse the result any time we're asked to search for a NULL value.

It's important to note that non-strict equality functions are free to
treat any NULL value as equal to any non-NULL value.  For example,
someone may wish to design a type that treats an empty string and NULL
as equal.

All built-in types have strict equality functions, so this could affect
custom / user-defined types.

Author: Chengpeng Yan <chengpeng_yan@outlook.com>
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: ChangAo Chen <cca5507@qq.com>
Discussion: https://postgr.es/m/A16187AE-2359-4265-9F5E-71D015EC2B2D@outlook.com
Backpatch-through: 14

pg_test_timing: fix unit in backward-clock warning

pg_test_timing reports timing differences in nanoseconds in master, and
in microseconds in v14 through v18, but previously the backward-clock
warning incorrectly labeled the value as milliseconds.

This commit fixes the warning message to use "ns" in master and
"us" in v14 through v18, matching the actual unit being reported.

Backpatch to all supported versions.

Author: Chao Li <lic@highgo.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Xiaopeng Wang <wxp_728@163.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/F780CEEB-A237-4302-9F55-60E9D8B6533D@gmail.com
Backpatch-through: 14

Don't call CheckAttributeType() with InvalidOid on dropped cols

If CheckAttributeType() is called with InvalidOid, it performs a bunch
of pointless, futile syscache lookups with InvalidOid, but ultimately
tolerates it and has no effect. We were calling it with InvalidOid on
dropped columns, but it seems accidental that it works, so let's stop
doing it.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/93ce56cd-02a6-4db1-8224-c8999372facc@iki.fi
Backpatch-through: 14

Don't allow composite type to be member of itself via multirange

CheckAttributeType() checks that a composite type is not made a member
of itself with ALTER TABLE ADD COLUMN or ALTER TYPE ADD ATTRIBUTE,
even indirectly via a domain, array, another composite type or a range
type. But it missed checking for multiranges. That was a simple
oversight when multiranges were added.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/93ce56cd-02a6-4db1-8224-c8999372facc@iki.fi
Backpatch-through: 14

Guard against overly-long numeric formatting symbols from locale.

to_char() allocates its output buffer with 8 bytes per formatting
code in the pattern. If the locale's currency symbol, thousands
separator, or decimal or sign symbol is more than 8 bytes long,
in principle we could overrun the output buffer. No such locales
exist in the real world, so it seems sufficient to truncate the
symbol if we do see it's too long.

Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/638232.1776790821@sss.pgh.pa.us
Backpatch-through: 14

Prevent some buffer overruns in spell.c's parsing of affix files.

parse_affentry() and addCompoundAffixFlagValue() each collect fields
from an affix file into working buffers of size BUFSIZ.  They failed
to defend against overlength fields, so that a malicious affix file
could cause a stack smash.  BUFSIZ (typically 8K) is certainly way
longer than any reasonable affix field, but let's fix this while
we're closing holes in this area.

I chose to do this by silently truncating the input before it can
overrun the buffer, using logic comparable to the existing logic in
get_nextfield().  Certainly there's at least as good an argument for
raising an error, but for now let's follow the existing precedent.

Reported-by: Igor Stepansky <igor.stepansky@orca.security>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/864123.1776810909@sss.pgh.pa.us
Backpatch-through: 14

Prevent buffer overrun in spell.c's CheckAffix().

This function writes into a caller-supplied buffer of length
2 * MAXNORMLEN, which should be plenty in real-world cases.
However a malicious affix file could supply an affix long
enough to overrun that. Defend by just rejecting the match
if it would overrun the buffer. I also inserted a check of
the input word length against Affix->replen, just to be sure
we won't index off the buffer, though it would be caller error
for that not to be true.

Also make the actual copying steps a bit more readable, and remove
an unnecessary requirement for the whole input word to fit into the
output buffer (even though it always will with the current caller).

The lack of documentation in this code makes my head hurt, so
I also reverse-engineered a basic header comment for CheckAffix.

Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/641711.1776792744@sss.pgh.pa.us
Backpatch-through: 14

Allow ALTER INDEX .. ATTACH PARTITION to validate a parent index

This commit tweaks ALTER INDEX .. ATTACH PARTITION to attempt a
validation of a parent index in the case where an index is already
attached but the parent is not yet valid.  This occurs in cases where a
parent index was created invalid such as with CREATE INDEX ONLY, but was
left invalid after an invalid child index was attached (partitioned
indexes set indisvalid to false if at least one partition is
!indisvalid, indisvalid is true in a partitioned table iff all
partitions are indisvalid).  This could leave a partition tree in a
situation where a user could not bring the parent index back to valid
after fixing the child index, as there is no built-in mechanism to do
so.  This commit relies on the fact that repeated ATTACH PARTITION
commands on the same index silently succeed.

An invalid parent index is more than just a passive issue.  It causes
for example ON CONFLICT on a partitioned table if the invalid parent
index is used to enforce a unique constraint.

Some test cases are added to track some of problematic patterns, using a
set of partition trees with combinations of invalid indexes and ATTACH
PARTITION.

Reported-by: Mohamed Ali <moali.pg@gmail.com>
Author: Sami Imseih <sanmimseih@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Discussion: http://postgr.es/m/CAGnOmWqi1D9ycBgUeOGf6mOCd2Dcf=6sKhbf4sHLs5xAcKVCMQ@mail.gmail.com
Backpatch-through: 14

Make plpgsql_trap test more robust and less resource-intensive.

We were using "select count(*) into x from generate_series(1,
1_000_000_000_000)" to waste one second waiting for a statement
timeout trap. Aside from consuming CPU to little purpose, this could
easily eat several hundred MB of temporary file space, which has been
observed to cause out-of-disk-space errors in the buildfarm.
Let's just use "pg_sleep(10)", which is far less resource-intensive.

Also update the "when others" exception handler so that if it does
ever again trap an error, it will tell us what error. The cause of
these intermittent buildfarm failures had been obscure for awhile.

Discussion: https://postgr.es/m/557992.1776779694@sss.pgh.pa.us
Backpatch-through: 14

Fix incorrect NEW references to generated columns in rule rewriting

When a rule action or rule qualification references NEW.col where col
is a generated column (stored or virtual), the rewriter produces
incorrect results.

rewriteTargetListIU removes generated columns from the query's target
list, since stored generated columns are recomputed by the executor
and virtual ones store nothing.  However, ReplaceVarsFromTargetList
then cannot find these columns when resolving NEW references during
rule rewriting.  For UPDATE, the REPLACEVARS_CHANGE_VARNO fallback
redirects NEW.col to the original target relation, making it read the
pre-update value (same as OLD.col).  For INSERT,
REPLACEVARS_SUBSTITUTE_NULL replaces it with NULL.  Both are wrong
when the generated column depends on columns being modified.

Fix by building target list entries for generated columns from their
generation expressions, pre-resolving the NEW.attribute references
within those expressions against the query's targetlist, and passing
them together with the query's targetlist to ReplaceVarsFromTargetList.

Back-patch to all supported branches.  Virtual generated columns were
added in v18, so the back-patches in pre-v18 branches only handle
stored generated columns.

Reported-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDexGTmCZzx=73gXkY2ZADS6LRhpnU+-8Y_QmrdTS6yUhA@mail.gmail.com
Backpatch-through: 14

Fix orphaned processes when startup process fails during PM_STARTUP

When the startup process exists with a FATAL error during PM_STARTUP,
the postmaster called ExitPostmaster() directly, assuming that no other
processes are running at this stage.  Since 7ff23c6d277d, this
assumption is not true, as the checkpointer, the background writer, the
IO workers and bgworkers kicking in early would be around.

This commit removes the startup-specific shortcut happening in
process_pm_child_exit() for a failing startup process during PM_STARTUP,
falling down to the existing exit() flow to signal all the started
children with SIGQUIT, so as we have no risk of creating orphaned
processes.

This required an extra change in HandleFatalError() for v18 and newer
versions, as an assertion could be triggered for PM_STARTUP.  It is now
incorrect.  In v17 and older versions, HandleChildCrash() needs to be
changed to handle PM_STARTUP so as children can be waited on.

While on it, fix a comment at the top of postmaster.c.  It was claiming
that the checkpointer and the background writer were started after
PM_RECOVERY.  That is not the case.

Author: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/CAJTYsWVoD3V9yhhqSae1_wqcnTdpFY-hDT7dPm5005ZFsL_bpA@mail.gmail.com
Backpatch-through: 15

doc: Correct context description for some JIT support GUCs

The documentation for jit_debugging_support and jit_profiling_support
previously stated that these parameters can only be set at server start.

However, both parameters use the PGC_SU_BACKEND context, meaning they
can be set at session start by superusers or users granted the appropriate
SET privilege, but cannot be changed within an active session.

This commit updates the documentation to reflect the actual behavior.

Backpatch to all supported versions.

Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAHGQGwEpMDpB-K8SSUVRRHg6L6z3pLAkekd9aviOS=ns0EC=+Q@mail.gmail.com
Backpatch-through: 14

Fix relid-set clobber during join removal.

Commit cfcd57111 et al fell over under Valgrind testing.
(It seems to be enough to #define USE_VALGRIND, you don't actually
need to run it under Valgrind to see failures.)  The cause is that
remove_rel_from_eclass updates each EquivalenceMember's em_relids,
and those can be aliases of the left_relids or right_relids of some
RestrictInfo in ec_sources.  If the update made em_relids empty then
bms_del_member will have pfree'd the relid set, so that the subsequent
attempt to clean up ec_sources accesses already-freed memory.

We missed seeing ill effects before cfcd57111 because (a) if the
pfree happens then we will remove the EquivalenceMember altogether,
making the source RestrictInfo no longer of use, and (b) the
cleanup of ec_sources didn't touch left/right_relids before that.

I'm unclear though on how cfcd57111 managed to pass non-USE_VALGRIND
testing.  Apparently we managed to store another Bitmapset into the
freed space before trying to access it, but you'd not think that would
happen 100% of the time.  I think what USE_VALGRIND changes is that it
makes list.c much more memory-hungry, so that the freed space gets
claimed by some List node before a Bitmapset can be put there.

This failure can be seen in v16, v17, and master, but oddly enough not
v18.  That's because the SJE patch replaced the simple bms_del_members
calls used here with adjust_relid_set, which is careful not to
scribble on its input.  But commit 20efbdffe just recently put back
the old coding and thus resurrected the problem.

Discussion: https://postgr.es/m/458729.1776724816@sss.pgh.pa.us
Backpatch-through: 16, 17, master

Clean up all relid fields of RestrictInfos during join removal.

The original implementation of remove_rel_from_restrictinfo()
thought it could skate by with removing no-longer-valid relid
bits from only the clause_relids and required_relids fields.
This is quite bogus, although somehow we had not run across a
counterexample before now.  At minimum, the left_relids and
right_relids fields need to be fixed because they will be
examined later by clause_sides_match_join().  But it seems
pretty foolish not to fix all the relid fields, so do that.

This needs to be back-patched as far as v16, because the
bug report shows a planner failure that does not occur
before v16.  I'm a little nervous about back-patching,
because this could cause unexpected plan changes due to
opening up join possibilities that were rejected before.
But it's hard to argue that this isn't a regression.  Also,
the fact that this changes no existing regression test results
suggests that the scope of changes may be fairly narrow.
I'll refrain from back-patching further though, since no
adverse effects have been demonstrated in older branches.

Bug: #19460
Reported-by: François Jehl <francois.jehl@pigment.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/19460-5625143cef66012f@postgresql.org
Backpatch-through: 16

Flush statistics during idle periods in parallel apply worker.

Parallel apply workers previously failed to report statistics while
waiting for new work in the main loop. This resulted in the stats from the
most recent transaction remaining unbuffered, leading to arbitrary
reporting delays—particularly when streamed transactions were infrequent.

This commit ensures that statistics are explicitly flushed when the worker
is idle, providing timely visibility into accumulated worker activity.

Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 16, where it was introduced
Discussion: https://postgr.es/m/TYRPR01MB1419579F217CC4332B615589594202@TYRPR01MB14195.jpnprd01.prod.outlook.com

doc: Improve description of pg_ctl -l log file permissions

The documentation stated only that the log file created by pg_ctl -l is
inaccessible to other users by default. However, since commit c37b3d0,
the actual behavior is that only the cluster owner has access by default,
but users in the same group as the cluster owner may also read the file
if group access is enabled in the cluster.

This commit updates the documentation to describe this behavior
more clearly.

Backpatch to all supported versions.

Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Xiaopeng Wang <wxp_728@163.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/OS9PR01MB1214959BE987B4839E3046050F54BA@OS9PR01MB12149.jpnprd01.prod.outlook.com
Backpatch-through: 14

Fix comments for Korean encodings in encnames.c

  * JOHAB: replace the incorrect "simplified Chinese" description with
    a correct one that identifies it as the Korean combining (Johab)
    encoding standardized in KS X 1001 annex 3.

  * EUC_KR: drop a stray space before the comma in the existing
    comment, and note that the encoding covers the KS X 1001
    precomposed (Wansung) form.

  * UHC: spell out "Unified Hangul Code", clarify that it is
    Microsoft Windows CodePage 949, and describe its relationship to
    EUC-KR (superset covering all 11,172 precomposed Hangul syllables).

Backpatch-through: 14
Author: Henson Choi <assam258@gmail.com>
Discussion: https://postgr.es/m/CAAAe_zAFz1v-3b7Je4L%2B%3DwZM3UGAczXV47YVZfZi9wbJxspxeA%40mail.gmail.com

Check for unterminated strings when calling uloc_getLanguage().

Missed by commit 1671f990dd66.

Author: Andreas Karlsson <andreas@proxel.se>
Discussion: https://postgr.es/m/118ca69e-47eb-42e1-83e9-72ccf40dd6fd@proxel.se
Backpatch-through: 16

Add tests for low-level PGLZ [de]compression routines

The goal of this module is to provide an entry point for the coverage of
the low-level compression and decompression PGLZ routines. The new test
is moved to a new parallel group, with all the existing
compression-related tests added to it.

This includes tests for the cases detected by fuzzing that emulate
corrupted compressed data, as fixed by 2b5ba2a0a141:
- Set control bit with read of a match tag, where no data follows.
- Set control bit with read of a match tag, where 1 byte follows.
- Set control bit with match tag where length nibble is 3 bytes
(extended case).

While on it, some tests are added for compress/decompress roundtrips,
and for check_complete=false/true. Like 2b5ba2a0a141, backpatch to all
the stable branches.

Discussion: https://postgr.es/m/adw647wuGjh1oU6p@paquier.xyz
Backpatch-through: 14

Honor passed-in database OIDs in pgstat_database.c

Three routines in pgstat_database.c incorrectly ignore the database OID
provided by their caller, using MyDatabaseId instead:
- pgstat_report_connect()
- pgstat_report_disconnect()
- pgstat_reset_database_timestamp()

The first two functions, for connection and disconnection, each have a
single caller that already passes MyDatabaseId.  This was harmless,
still incorrect.

The timestamp reset function also has a single caller, but in this case
the issue has a real impact: it fails to reset the timestamp for the
shared-database entry (datid=0) when operating on shared objects.  This
situation can occur, for example, when resetting counters for shared
relations via pg_stat_reset_single_table_counters().

There is currently one test in the tree that checks the reset of a
shared relation, for pg_shdescription, we rely on it to check what is
stored in pg_stat_database.  As stats_reset may be NULL, two resets are
done to provide a baseline for comparison.

Author: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Dapeng Wang <wangdp20191008@gmail.com>
Discussion: https://postgr.es/m/ABBD5026-506F-4006-A569-28F72C188693@gmail.com
Backpatch-through: 15

Fix heap-buffer-overflow in pglz_decompress() on corrupt input.

When decoding a match tag, pglz_decompress() reads 2 bytes (or 3
for extended-length matches) from the source buffer before checking
whether enough data remains. The existing bounds check (sp > srcend)
occurs after the reads, so truncated compressed data that ends
mid-tag causes a read past the allocated buffer.

Fix by validating that sufficient source bytes are available before
reading each part of the match tag. The post-read sp > srcend
check is no longer needed and is removed.

Found by fuzz testing with libFuzzer and AddressSanitizer.

Backpatch-through: 14

Fix integer overflow in nodeWindowAgg.c

In nodeWindowAgg.c, the calculations for frame start and end positions
in ROWS and GROUPS modes were performed using simple integer addition.
If a user-supplied offset was sufficiently large (close to INT64_MAX),
adding it to the current row or group index could cause a signed
integer overflow, wrapping the result to a negative number.

This led to incorrect behavior where frame boundaries that should have
extended indefinitely (or beyond the partition end) were treated as
falling at the first row, or where valid rows were incorrectly marked
as out-of-frame.  Depending on the specific query and data, these
overflows can result in incorrect query results, execution errors, or
assertion failures.

To fix, use overflow-aware integer addition (ie, pg_add_s64_overflow)
to check for overflows during these additions.  If an overflow is
detected, the boundary is now clamped to INT64_MAX.  This ensures the
logic correctly treats the boundary as extending to the end of the
partition.

Bug: #19405
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/19405-1ecf025dda171555@postgresql.org
Backpatch-through: 14

Avoid unsafe access to negative index in a TupleDesc.

Commit aa606b931 installed a test that would reference a nonexistent
TupleDesc array entry if a system column is used in COPY FROM WHERE.
Typically this would be harmless, but with bad luck it could result
in a phony "generated columns are not supported in COPY FROM WHERE
conditions" error, and at least in principle it could cause SIGSEGV.
(Compare 570e2fcc0 which fixed the identical problem in another
place.) Also, since c98ad086a it throws an Assert instead.

In the back branches, just guard the test to make it a safe no-op for
system columns. Commit 21c69dc73 installed a more aggressive answer
in master.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/6f435023-8ab6-47c2-ba07-035d0c4212f9@gmail.com
Backpatch-through: 14-18

Fix null-bitmap combining in array_agg_array_combine().

This code missed the need to update the combined state's
nullbitmap if state1 already had a bitmap but state2 didn't.
We need to extend the existing bitmap with 1's but didn't.
This could result in wrong output from a parallelized
array_agg(anyarray) calculation, if the input has a mix of
null and non-null elements. The errors depended on timing
of the parallel workers, and therefore would vary from one
run to another.

Also install guards against integer overflow when calculating
the combined object's sizes, and make some trivial cosmetic
improvements.

Author: Dmytro Astapov <dastapov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAFQUnFj2pQ1HbGp69+w2fKqARSfGhAi9UOb+JjyExp7kx3gsqA@mail.gmail.com
Backpatch-through: 16

jit: No backport::SectionMemoryManager for LLVM 22.

LLVM 22 has the fix that we copied into our tree in commit 9044fc1d and
a new function to reach it[1][2], so we only need to use our copy for
Aarch64 + LLVM < 22.  The only change to the final version that our copy
didn't get is a new LLVM_ABI macro, but that isn't appropriate for us.
Our copy is hopefully now frozen and would only need maintenance if bugs
are found in the upstream code.

Non-Aarch64 systems now also use the new API with LLVM 22.  It allocates
all sections with one contiguous mmap() instead of one per
section.  We could have done that earlier, but commit 9044fc1d wanted to
limit the blast radius to the affected systems.  We might as well
benefit from that small improvement everywhere now that it is available
out of the box.

We can't delete our copy until LLVM 22 is our minimum supported version,
or we switch to the newer JITLink API for at least Aarch64.

[1] https://github.com/llvm/llvm-project/pull/71968
[2] https://github.com/llvm/llvm-project/pull/174307

Backpatch-through: 14
Discussion: https://postgr.es/m/CA%2BhUKGJTumad75o8Zao-LFseEbt%3DenbUFCM7LZVV%3Dc8yg2i7dg%40mail.gmail.com

jit: Stop emitting lifetime.end for LLVM 22.

The lifetime.end intrinsic can now only be used for stack memory
allocated with alloca[1][2][3]. We use it to tell LLVM about the
lifetime of function arguments/isnull values that we keep in palloc'd
memory, so that it can avoid spilling registers to memory.

We might need to rearrange things and put them on the stack, but that'll
take some research. In the meantime, unbreak the build on LLVM 22.

[1] https://github.com/llvm/llvm-project/pull/149310
[2] https://llvm.org/docs/LangRef.html#llvm-lifetime-end-intrinsic
[3] https://llvm.org/docs/LangRef.html#i-alloca

Backpatch-through: 14
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> (earlier attempt)
Reviewed-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> (earlier attempt)
Reviewed-by: Andres Freund <andres@anarazel.de> (earlier attempt)
Discussion: https://postgr.es/m/CA%2BhUKGJTumad75o8Zao-LFseEbt%3DenbUFCM7LZVV%3Dc8yg2i7dg%40mail.gmail.com

doc: Add missing description for DROP SUBSCRIPTION IF EXISTS.

Oversight in commit 665d1fad99.

Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHut%2BPv72haFerrCdYdmF6hu6o2jKcGzkXehom%2BsP-JBBmOVDg%40mail.gmail.com
Backpatch-through: 14

Be more careful to preserve consistency of a tuplestore.

Several places in tuplestore.c would leave the tuplestore data
structure effectively corrupt if some subroutine were to throw
an error.  Notably, if WRITETUP() failed after some number of
successful calls within dumptuples(), the tuplestore would
contain some memtuples pointers that were apparently live
entries but in fact pointed to pfree'd chunks.

In most cases this sort of thing is fine because transaction
abort cleanup is not too picky about the contents of memory that
it's going to throw away anyway.  There's at least one exception
though: if a Portal has a holdStore, we're going to call
tuplestore_end() on that, even during transaction abort.
So it's not cool if that tuplestore is corrupt, and that means
tuplestore.c has to be more careful.

This oversight demonstrably leads to crashes in v15 and before,
if a holdable cursor fails to persist its data due to an undersized
temp_file_limit setting.  Very possibly the same thing can happen in
v16 and v17 as well, though the specific test case submitted failed
to fail there (cf. 095555daf).  The failure is accidentally dodged
as of v18 because 590b045c3 got rid of tuplestore_end's retail tuple
deletion loop.  Still, it seems unwise to permit tuplestores to become
internally inconsistent in any branch, so I've applied the same fix
across the board.

Since the known test case for this is rather expensive and doesn't
fail in recent branches, I've omitted it.

Bug: #19438
Reported-by: Dmitriy Kuzmin <kuzmin.db4@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/19438-9d37b179c56d43aa@postgresql.org
Backpatch-through: 14

Detect pfree or repalloc of a previously-freed memory chunk.

Before the major rewrite in commit c6e0fe1f2, AllocSetFree() would
typically crash when asked to free an already-free chunk.  That was
an ugly but serviceable way of detecting coding errors that led to
double pfrees.  But since that rewrite, double pfrees went through
just fine, because the "hdrmask" of a freed chunk isn't changed at all
when putting it on the freelist.  We'd end with a corrupt freelist
that circularly links back to the doubly-freed chunk, which would
usually result in trouble later, far removed from the actual bug.

This situation is no good at all for debugging purposes.  Fortunately,
we can fix it at low cost in MEMORY_CONTEXT_CHECKING builds by making
AllocSetFree() check for chunk->requested_size == InvalidAllocSize,
relying on the pre-existing code that sets it that way just below.

I investigated the alternative of changing a freed chunk's methodid
field, which would allow detection in non-MEMORY_CONTEXT_CHECKING
builds too.  But that adds measurable overhead.  Seeing that we didn't
notice this oversight for more than three years, it's hard to argue
that detecting this type of bug is worth any extra overhead in
production builds.

Likewise fix AllocSetRealloc() to detect repalloc() on a freed chunk,
and apply similar changes in generation.c and slab.c.  (generation.c
would hit an Assert failure anyway, but it seems best to make it act
like aset.c.)  bump.c doesn't need changes since it doesn't support
pfree in the first place.  Ideally alignedalloc.c would receive
similar changes, but in debugging builds it's impossible to reach
AlignedAllocFree() or AlignedAllocRealloc() on a pfreed chunk, because
the underlying context's pfree would have wiped the chunk header of
the aligned chunk.  But that means we should get an error of some
sort, so let's be content with that.

Per investigation of why the test case for bug #19438 didn't appear to
fail in v16 and up, even though the underlying bug was still present.
(This doesn't fix the underlying double-free bug, just cause it to
get detected.)

Bug: #19438
Reported-by: Dmitriy Kuzmin <kuzmin.db4@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/19438-9d37b179c56d43aa@postgresql.org
Backpatch-through: 16

Fix datum_image_*()'s inability to detect sign-extension variations

Functions such as hash_numeric() are not careful to use the correct
PG_RETURN_*() macro according to the return type of that function as
defined in pg_proc.  Because that function is meant to return int32,
when the hashed value exceeds 2^31, the 64-bit Datum value won't wrap to
a negative number, which means the Datum won't have the same value as it
would have had it been cast to int32 on a two's complement machine.  This
isn't harmless as both datum_image_eq() and datum_image_hash() may receive
a Datum that's been formed and deformed from a tuple in some cases, and
not in other cases.  When formed into a tuple, the Datum value will be
coerced into an integer according to the attlen as specified by the
TupleDesc.  This can result in two Datums that should be equal being
classed as not equal, which could result in (but not limited to) an error
such as:

ERROR:  could not find memoization table entry

Here we fix this by ensuring we cast the Datum value to a signed integer
according to the typLen specified in the datum_image_eq/datum_image_hash
function call before comparing or hashing.

Author: David Rowley <dgrowleyml@gmail.com>
Reported-by: Tender Wang <tndrwang@gmail.com>
Backpatch-through: 14
Discussion: https://postgr.es/m/CAHewXNmcXVFdB9_WwA8Ez0P+m_TQy_KzYk5Ri5dvg+fuwjD_yw@mail.gmail.com

Fix multiple bugs in astreamer pipeline code.

astreamer_tar_parser_content() sent the wrong data pointer when
forwarding MEMBER_TRAILER padding to the next streamer.  After
astreamer_buffer_until() buffers the padding bytes, the 'data'
pointer has been advanced past them, but the code passed 'data'
instead of bbs_buffer.data.  This caused the downstream consumer
to receive bytes from after the padding rather than the padding
itself, and could read past the end of the input buffer.

astreamer_gzip_decompressor_content() only checked for
Z_STREAM_ERROR from inflate(), silently ignoring Z_DATA_ERROR
(corrupted data) and Z_MEM_ERROR (out of memory).  Fix by
treating any return other than Z_OK, Z_STREAM_END, and
Z_BUF_ERROR as fatal.

astreamer_gzip_decompressor_free() missed calling inflateEnd() to
release zlib's internal decompression state.

astreamer_tar_parser_free() neglected to pfree() the streamer
struct itself, leaking it.

astreamer_extractor_content() did not check the return value of
fclose() when closing an extracted file.  A deferred write error
(e.g., disk full on buffered I/O) would be silently lost.

Discussion: https://postgr.es/m/results/98c6b630-acbb-44a7-97fa-1692ce2b827c@dunslane.net

Reviewed-By: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 15

Avoid memory leak on error while parsing pg_stat_statements dump file

By using palloc() instead of raw malloc().

Reported-by: Gaurav Singh <gaurav.singh@yugabyte.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAEcQ1bYR9s4eQLFDjzzJHU8fj-MTbmRpW-9J-r2gsCn+HEsynw@mail.gmail.com
Backpatch-through: 14

Fix premature NULL lag reporting in pg_stat_replication

pg_stat_replication is documented to keep the last measured lag values for
a short time after the standby catches up, and then set them to NULL when
there is no WAL activity. However, previously lag values could become NULL
prematurely even while WAL activity was ongoing, especially in logical
replication.

This happened because the code cleared lag when two consecutive reply messages
indicated that the apply location had caught up with the send location.
It did not verify that the reported positions were unchanged, so lag could be
cleared even when positions had advanced between messages. In logical
replication, where the apply location often quickly catches up, this issue was
more likely to occur.

This commit fixes the issue by clearing lag only when the standby reports that
it has fully replayed WAL (i.e., both flush and apply locations have caught up
with the send location) and the write/flush/apply positions remain unchanged
across two consecutive reply messages.

The second message with unchanged positions typically results from
wal_receiver_status_interval, so lag values are cleared after that interval
when there is no activity. This avoids showing stale lag data while preventing
premature NULL values.

Even with this fix, lag may rarely become NULL during activity if identical
position reports are sent repeatedly. Eliminating such duplicate messages
would address this fully, but that change is considered too invasive for stable
branches and will be handled in master only later.

Backpatch to all supported branches.

Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAOzEurTzcUrEzrH97DD7+Yz=HGPU81kzWQonKZvqBwYhx2G9_A@mail.gmail.com
Backpatch-through: 14

Fix copy-paste error in test_ginpostinglist

The check for a mismatch on the second decoded item pointer
was an exact copy of the first item pointer check, comparing
orig_itemptrs[0] with decoded_itemptrs[0] instead of orig_itemptrs[1]
with decoded_itemptrs[1]. The error message also reported (0, 1) as
the expected value instead of (blk, off). As a result, any decoding
error in the second item pointer (where the varbyte delta encoding
is exercised) would go undetected.

This has been wrong since commit bde7493d1, so backpatch to all
supported versions.

Author: Jianghua Yang <yjhjstz@gmail.com>
Discussion: https://postgr.es/m/CAAZLFmSOD8R7tZjRLZsmpKtJLoqjgawAaM-Pne1j8B_Q2aQK8w@mail.gmail.com
Backpatch-through: 14

Fix multixact backwards-compatibility with CHECKPOINT race condition

If a CHECKPOINT record with nextMulti N is written to the WAL before
the CREATE_ID record for N, and N happens to be the first multixid on
an offset page, the backwards compatibility logic to tolerate WAL
generated by older minor versions (before commit 789d65364c) failed to
compensate for the missing XLOG_MULTIXACT_ZERO_OFF_PAGE record. In
that case, the latest_page_number was initialized at the start of WAL
replay to the page for nextMulti from the CHECKPOINT record, even if
we had not seen the CREATE_ID record for that multixid yet, which
fooled the backwards compatibility logic to think that the page was
already initialized.

To fix, track the last XLOG_MULTIXACT_ZERO_OFF_PAGE that we've seen
separately from latest_page_number. If we haven't seen any
XLOG_MULTIXACT_ZERO_OFF_PAGE records yet, use
SimpleLruDoesPhysicalPageExist() to check if the page needs to be
initialized.

Reported-by: duankunren.dkr <duankunren.dkr@alibaba-inc.com>
Analyzed-by: duankunren.dkr <duankunren.dkr@alibaba-inc.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://www.postgresql.org/message-id/c4ef1737-8cba-458e-b6fd-4e2d6011e985.duankunren.dkr@alibaba-inc.com
Backpatch-through: 14-18

Fix finalization of decompressor astreamers.

Send the correct amount of data to the next astreamer, not the
whole allocated buffer size. This bug escaped detection because
in present uses the next astreamer is always a tar-file parser
which is insensitive to trailing garbage. But that may not
be true in future uses.

Author: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2178517.1774064942@sss.pgh.pa.us
Backpatch-through: 15

Fix dependency on FDW handler.

ALTER FOREIGN DATA WRAPPER could drop the dependency on the handler
function if it wasn't explicitly specified.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/35c44a4b7fb76d35418c4d66b775a88f4ce60c86.camel@j-davis.com
Backpatch-through: 14

Fix WAL flush LSN used by logical walsender during shutdown

Commit 6eedb2a5fd8 made the logical walsender call
XLogFlush(GetXLogInsertRecPtr()) to ensure that all pending WAL is flushed,
fixing a publisher shutdown hang. However, if the last WAL record ends at
a page boundary, GetXLogInsertRecPtr() can return an LSN pointing past
the page header, which can cause XLogFlush() to report an error.

A similar issue previously existed in the GiST code. Commit b1f14c96720
introduced GetXLogInsertEndRecPtr(), which returns a safe WAL insertion end
location (returning the start of the page when the last record ends at a page
boundary), and updated the GiST code to use it with XLogFlush().

This commit fixes the issue by making the logical walsender use
XLogFlush(GetXLogInsertEndRecPtr()) when flushing pending WAL during shutdown.

Backpatch to all supported versions.

Reported-by: Andres Freund <andres@anarazel.de>
Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/vzguaguldbcyfbyuq76qj7hx5qdr5kmh67gqkncyb2yhsygrdt@dfhcpteqifux
Backpatch-through: 14

Tighten asserts on ParallelWorkerNumber

The comment about ParallelWorkerNumbr in parallel.c says:

  In parallel workers, it will be set to a value >= 0 and < the number
  of workers before any user code is invoked; each parallel worker will
  get a different parallel worker number.

However asserts in various places collecting instrumentation allowed
(ParallelWorkerNumber == num_workers). That would be a bug, as the value
is used as index into an array with num_workers entries.

Fixed by adjusting the asserts accordingly. Backpatch to all supported
versions.

Discussion: https://postgr.es/m/5db067a1-2cdf-4afb-a577-a04f30b69167@vondra.me
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Backpatch-through: 14