Tom Lane [Sun, 22 Mar 2026 22:24:42 +0000 (18:24 -0400)]
Fix assorted bugs in archive_waldump.c.
1. archive_waldump.c called astreamer_finalize() nowhere. This meant
that any data retained in decompression buffers at the moment we
detect archive EOF would never reach astreamer_waldump_content(),
resulting in surprising failures if we actually need the last few
bytes of the archive file.
To fix that, make read_archive_file() do the finalize once it detects
EOF. Change its API to return a boolean "yes there's more data"
rather than the entirely-misleading raw count of bytes read.
2. init_archive_reader() relied on privateInfo->cur_file to track
which WAL segment was being read, but cur_file can become NULL if a
member trailer is processed during a read_archive_file() call. This
could cause unreproducible "could not find WAL in archive" failures,
particularly with compressed archives where all the WAL data fits in
a small number of compressed bytes.
Fix by scanning the hash table after each read to find any cached
WAL segment with sufficient data, instead of depending on cur_file.
Also reduce the minimum data requirement from XLOG_BLCKSZ to
sizeof(XLogLongPageHeaderData), since we only need the long page
header to extract the segment size.
We likewise need to fix init_archive_reader() to scan the whole
hash table for irrelevant entries, since we might have already
loaded more than one entry when the data is compressible enough.
3. get_archive_wal_entry() relied on tracking cur_file to identify
WAL hash table entries that need to be spilled to disk. However,
this can't work for entries that are read completely within a
single read_archive_file call: the caller will never see cur_file
pointing at such an entry. Instead, scan the WAL hash table to
find entries we should spill. This also fixes a buglet that any
hash table entries completely loaded during init_archive_reader
were never considered for spilling.
Also, simplify the logic tremendously by not attempting to spill
entries that haven't been read fully. I am not convinced that the old
logic handled that correctly in every path, and it's really not worth
the complication and risk of bugs to try to spill entries on the fly.
We can just write them in a single go once they are no longer the
cur_file.
4. Fix a rather critical performance problem: the code thought that
resetStringInfo() will reclaim storage, but it doesn't. So by the
end of the run we'd have consumed storage space equal to the total
amount of WAL read, negating all the effort of the spill logic.
Also document the contract that cur_file can change (or become NULL)
during a single read_archive_file() call, since the decompression
pipeline may produce enough output to trigger multiple astreamer
callbacks.
Author: Tom Lane <tgl@sss.pgh.pa.us> Co-authored-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/2178517.1774064942@sss.pgh.pa.us
Tom Lane [Sun, 22 Mar 2026 22:13:41 +0000 (18:13 -0400)]
Remove nonfunctional tar file trailer size check.
The ASTREAMER_ARCHIVE_TRAILER case in astreamer_tar_parser_content()
intended to reject tar files whose trailer exceeded 2 blocks. However,
the check compared 'len' after astreamer_buffer_bytes() had already
consumed all the data and set len to 0, so the pg_fatal() could never
fire.
Moreover, per the POSIX specification for the ustar format, the last
physical block of a tar archive is always full-sized, and "logical
records after the two zero logical records may contain undefined data."
GNU tar, for example, zero-pads its output to a 10kB boundary by
default. So rejecting extra data after the two zero blocks would be
wrong even if the check worked. (But if the check had worked, it
would have alerted us to the bug just fixed in 9aa1fcc54.)
Remove the dead check and update the comment to explain why trailing
data is expected and harmless.
Per report from Tom Lane.
Author: Andrew Dunstan <andrew@dunslane.net> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2178517.1774064942@sss.pgh.pa.us
Tom Lane [Sun, 22 Mar 2026 22:06:48 +0000 (18:06 -0400)]
Fix finalization of decompressor astreamers.
Send the correct amount of data to the next astreamer, not the
whole allocated buffer size. This bug escaped detection because
in present uses the next astreamer is always a tar-file parser
which is insensitive to trailing garbage. But that may not
be true in future uses.
Author: Andrew Dunstan <andrew@dunslane.net> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2178517.1774064942@sss.pgh.pa.us
Backpatch-through: 15
Peter Geoghegan [Sun, 22 Mar 2026 21:31:43 +0000 (17:31 -0400)]
Add fake LSN support to hash index AM.
Use fake LSNs in all hash AM critical sections that write a WAL record.
This gives us a reliable way (a way that works during scans of both
logged and unlogged relations) to detect when an index page was
concurrently modified during the window between when the page is
initially read (by _hash_readpage) and when the page has any known-dead
items LP_DEAD-marked (by _hash_kill_items).
Preparation for an upcoming patch that makes the hash index AM use the
amgetbatch interface, enabling I/O prefetching during hash index scans.
The amgetbatch design imposes certain rules on index AMs with respect to
how they hold on to index page buffer pins (at least in the case of pins
held as an interlock against unsafe concurrent TID recycling by VACUUM).
These rules have consequences for routines that set LP_DEAD bits on
index tuples from an amgetbatch index AM: such routines have an inherent
need to reason about concurrent TID recycling by VACUUM, but can no
longer rely on their amgettuple routine holding on to a buffer pin
(during the aforementioned window) as an interlock against such
recycling. Instead, they have to follow a new, standardized approach.
The new approach taken by amgetbatch index AMs when setting LP_DEAD bits
is heavily based on the current nbtree dropPin design, which was added
by commit 2ed5b87f. It also works by checking if the page's LSN
advanced during the window where unsafe concurrent TID recycling might
have taken place.
This commit is similar to commit 8a879119, which taught nbtree to use
fake LSNs to improve its dropPin behavior. However, unlike that commit,
this is not an independently useful enhancement, since hash doesn't
implement anything like nbtree's dropPin behavior (not yet).
Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-WzkehuhxyuA8quc7rRN3EtNXpiKsjPfO8mhb+0Dr2K0Dtg@mail.gmail.com
Melanie Plageman [Sun, 22 Mar 2026 19:46:50 +0000 (15:46 -0400)]
Add pruning fast path for all-visible and all-frozen pages
Because of the SKIP_PAGES_THRESHOLD optimization or a stale prune XID,
heap_page_prune_and_freeze() can be invoked for pages with no pruning or
freezing work to do. To avoid this, if a page is already all-frozen or
it is all-visible and no freezing will be attempted, exit early. We
can't exit early if vacuum passed DISABLE_PAGE_SKIPPING, though.
Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
Peter Geoghegan [Sun, 22 Mar 2026 17:20:29 +0000 (13:20 -0400)]
Make IndexScanInstrumentation a pointer in executor scan nodes.
Change the IndexScanInstrumentation fields in IndexScanState,
IndexOnlyScanState, and BitmapIndexScanState from inline structs to
pointers. This avoids additional space overhead whenever new fields are
added to IndexScanInstrumentation in the future, at least in the common
case where the instrumentation isn't used (i.e. when the executor node
isn't being run through an EXPLAIN ANALYZE).
Preparation for an upcoming patch series that will add index
prefetching. The new slot-based interface that will enable index
prefetching necessitates that we add at least one more field to
IndexScanInstrumentation (to count heap fetches during index-only
scans).
Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-Wz=g=JTSyDB4UtB5su2ZcvsS7VbP+ZMvvaG6ABoCb+s8Lw@mail.gmail.com
Melanie Plageman [Sun, 22 Mar 2026 15:52:40 +0000 (11:52 -0400)]
Detect and fix visibility map corruption in more cases
Move VM corruption detection and repair into heap page pruning. This
allows VM repair during on-access pruning, not only during vacuum.
Also, expand corruption detection to cover pages marked all-visible that
contain dead tuples and tuples inserted or deleted by in-progress
transactions, rather than only all-visible pages with LP_DEAD items.
Pinning the correct VM page before on-access pruning is cheap when
compared to the cost of actually pruning. The vmbuffer is saved in the
scan descriptor, so a query should only need to pin each VM page once,
and a single VM page covers a large number of heap pages.
Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
Don't reset 'latest_page_number' when replaying multixid truncation
'latest_page_number' is set to the correct value, according to
nextOffset, early at system startup. Contrary to the comment, it hence
should be set up correctly by the time we get to WAL replay.
This was committed to back-branches earlier already (commit 817f74600d), to fix a bug in a backwards-compatibility codepath. We
don't have that bug on 'master', but the change nevertheless makes
sense on 'master' too.
Michael Paquier [Sun, 22 Mar 2026 06:24:33 +0000 (15:24 +0900)]
Add test for single-page VACUUM of hash index on INSERT
_hash_vacuum_one_page() in hashinsert.c is a routine related to hash
indexes that can perform a single-page VACUUM when dead tuples are
detected during index insertion. This routine previously had no test
coverage, and this commit adds a test case for that purpose.
To safely create dead tuples in a way that works with parallel tests,
this uses a technique based on a rollbacked INSERT, following a
suggestion by Heikki Linnakangas.
Author: Alexander Kuzmenkov <akuzmenkov@tigerdata.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CALzhyqxrc1ZHYmf5V8NE+yMboqVg7xZrQM7K2c7VS0p1v8z42w@mail.gmail.com
Michael Paquier [Sat, 21 Mar 2026 05:34:47 +0000 (14:34 +0900)]
Move declarations related to locktags from lock.h to new locktag.h
This commit moves all the declarations related to locktags from lock.h
to a new header called locktag.h. This header is useful so as code
paths that care about locktags but not the lock hashtable can know about
these without having to include lock.h and all its set of dependencies.
This move includes the basic locktag structures and the set of macros to
fill in the locktag fields before attempting to acquire a lock.
Based on a suggestion from me, suggestion done while discussing a
different feature.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/abufUya2oK-_PJ3E@paquier.xyz
Tom Lane [Fri, 20 Mar 2026 22:23:45 +0000 (18:23 -0400)]
plpgsql: optimize "SELECT simple-expression INTO var".
Previously, we always fed SELECT ... INTO to the SPI machinery.
While that works for all cases, it's a great deal slower than
the otherwise-equivalent "var := expression" if the expression
is "simple" and the INTO target is a single variable. Users
coming from MSSQL or T_SQL are likely to be surprised by this;
they are used to writing SELECT ... INTO since there is no
"var := expression" syntax in those dialects. Hence, check for
a simple expression and use the faster code path if possible.
(Here, "simple" means whatever exec_is_simple_query accepts,
which basically means "SELECT scalar-expression" without any
input tables, aggregates, qual clauses, etc.)
This optimization is not entirely transparent. Notably, one of
the reasons it's faster is that the hooks that pg_stat_statements
uses aren't called in this path, so that the evaluated expression
no longer appears in pg_stat_statements output as it did before.
There may be some other minor behavioral changes too, although
I tried hard to make error reporting look the same. Hopefully,
none of them are significant enough to not be acceptable as
routine changes in a PG major version.
Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://postgr.es/m/CAFj8pRDieSQOPDHD_svvR75875uRejS9cN87FoAC3iXMXS1saQ@mail.gmail.com
Andrew Dunstan [Fri, 20 Mar 2026 19:31:35 +0000 (15:31 -0400)]
pg_verifybackup: Enable WAL parsing for tar-format backups
Now that pg_waldump supports reading WAL from tar archives, remove the
restriction that forced --no-parse-wal for tar-format backups.
pg_verifybackup now automatically locates the WAL archive: it looks for
a separate pg_wal.tar first, then falls back to the main base.tar. A
new --wal-path option (replacing the old --wal-directory, which is kept
as a silent alias) accepts either a directory or a tar archive path.
The default WAL directory preparation is deferred until the backup
format is known, since tar-format backups resolve the WAL path
differently from plain-format ones.
Author: Amul Sul <sulamul@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Euler Taveira <euler@eulerto.com> Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com
Andrew Dunstan [Fri, 20 Mar 2026 19:31:35 +0000 (15:31 -0400)]
pg_waldump: Add support for reading WAL from tar archives
pg_waldump can now accept the path to a tar archive (optionally
compressed with gzip, lz4, or zstd) containing WAL files and decode
them. This was added primarily for pg_verifybackup, which previously
had to skip WAL parsing for tar-format backups.
The implementation uses the existing archive streamer infrastructure
with a hash table to track WAL segments read from the archive. If WAL
files within the archive are not in sequential order, out-of-order
segments are written to a temporary directory (created via mkdtemp under
$TMPDIR or the archive's directory) and read back when needed. An
atexit callback ensures the temporary directory is cleaned up.
The --follow option is not supported when reading from a tar archive.
Author: Amul Sul <sulamul@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Euler Taveira <euler@eulerto.com> Reviewed-by: Andrew Dunstan <andrew@dunslane.net> Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com
Andrew Dunstan [Fri, 20 Mar 2026 19:31:35 +0000 (15:31 -0400)]
Move tar detection and compression logic to common.
Consolidate tar archive identification and compression-type detection
logic into a shared location. Currently used by pg_basebackup and
pg_verifybackup, this functionality is also required for upcoming
pg_waldump enhancements.
This change promotes code reuse and simplifies maintenance across
frontend tools.
Author: Amul Sul <sulamul@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Euler Taveira <euler@eulerto.com> Reviewed-by: Andrew Dunstan <andrew@dunslane.net> Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com
Nathan Bossart [Fri, 20 Mar 2026 19:15:33 +0000 (14:15 -0500)]
Bump transaction/multixact ID warning limits to 100M.
These warning limits were last changed to 40M by commit cd5e82256d.
For the benefit of workloads that rapidly consume transactions or
multixacts, this commit bumps the limits to 100M. This will
hopefully give users enough time to react.
Nathan Bossart [Fri, 20 Mar 2026 19:15:33 +0000 (14:15 -0500)]
Add percentage of available IDs to wraparound warnings.
This commit adds DETAIL messages to the existing wraparound
WARNINGs that include the percentage of transaction/multixact IDs
that remain available for use. The hope is that this more clearly
expresses the urgency of the situation.
Tom Lane [Fri, 20 Mar 2026 18:50:53 +0000 (14:50 -0400)]
Discount the metapage when estimating number of index pages visited.
genericcostestimate() estimates the number of index leaf pages to
be visited as a pro-rata fraction of the total number of leaf pages.
Or at least that was the intention. What it actually used in the
calculation was the total number of index pages, so that non-leaf
pages were also counted. In a decent-sized index the error is
probably small, since we expect upper page fanout to be high.
But in a small index that's not true; in the worst case with one
data-bearing page plus a metapage, we had 100% relative error.
This led to surprising planning choices such as not using a small
partial index.
To fix, ask genericcostestimate's caller to supply an estimate of
the number of non-leaf pages, and subtract that. For the built-in
index AMs, it seems sufficient to count the index metapage (if the
AM uses one) as non-leaf. Per the above argument, counting upper
index pages shouldn't change the estimate much, and in most cases
we don't have any easy way of estimating the number of upper pages.
This might be an area for further research in future.
Any external genericcostestimate callers that do not set the new field
GenericCosts.numNonLeafPages will see the same behavior as before,
assuming they followed the advice to zero out that whole struct.
Unsurprisingly, this change affects a number of plans seen in the
core regression tests. I hacked up the existing tests to keep the
tests' plans the same, since in each case it appeared that the
test's intent was to test exactly that plan. Also add one new
test case demonstrating that a better index choice is now made.
Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Henson Choi <assam258@gmail.com>
Discussion: https://postgr.es/m/870521.1745860752@sss.pgh.pa.us
Fix self-join removal to update bare Var references in join clauses
Self-join removal failed to update Var nodes when the join clause was a
bare Var (e.g., ON t1.bool_col) rather than an expression containing
Vars. ChangeVarNodesWalkExpression() used expression_tree_walker(),
which descends into child nodes but does not process the top-level node
itself. When a bare Var referencing the removed relation appeared as
the clause, its varno was left unchanged, leading to "no relation entry
for relid N" errors.
Fix by calling ChangeVarNodes_walker() directly instead of
expression_tree_walker(), so the top-level node is also processed.
Bug: #19435 Reported-by: Hang Ammmkilo <ammmkilo@163.com>
Author: Andrei Lepikhov <lepihov@gmail.com> Co-authored-by: Tender Wang <tndrwang@gmail.com> Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/19435-3cc1a87f291129f1%40postgresql.org
Backpatch-through: 18
Robert Haas [Fri, 20 Mar 2026 12:39:55 +0000 (08:39 -0400)]
test_plan_advice: Set TAP test priority 50 in meson.build.
Since this runs the main regression tests, it can take some time to
complete. Therefore, it's better to start it earlier, as we also do
for the main regression test suite.
Andrew Dunstan [Mon, 16 Mar 2026 20:51:12 +0000 (16:51 -0400)]
Add option force_array for COPY JSON FORMAT
This adds the force_array option, which is available exclusively
when using COPY TO with the JSON format.
When enabled, this option wraps the output in a top-level JSON array
(enclosed in square brackets with comma-separated elements), making the
entire result a valid single JSON value. Without this option, the
default behavior is to output a stream of independent JSON objects.
Attempting to use this option with COPY FROM or with formats other than
JSON will raise an error.
Author: Joe Conway <mail@joeconway.com>
Author: jian he <jian.universality@gmail.com> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Florents Tselai <florents.tselai@gmail.com> Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
Discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
Andrew Dunstan [Mon, 16 Mar 2026 20:50:24 +0000 (16:50 -0400)]
json format for COPY TO
This introduces the JSON format option for the COPY TO command, allowing
users to export query results or table data directly as a stream of JSON
objects (one per line, NDJSON style).
The JSON format is currently supported only for COPY TO operations; it
is not available for COPY FROM.
JSON format is incompatible with some standard text/CSV formatting
options, including HEADER, DEFAULT, NULL, DELIMITER, FORCE QUOTE,
FORCE NOT NULL, and FORCE NULL.
Column list support is included: when a column list is specified, only
the named columns are emitted in each JSON object.
Regression tests covering valid JSON exports and error handling for
incompatible options have been added to src/test/regress/sql/copy.sql.
Author: Joe Conway <mail@joeconway.com>
Author: jian he <jian.universality@gmail.com> Co-Authored-By: Andrew Dunstan <andrew@dunslane.net> Reviewed-by: Andrey M. Borodin <x4mmm@yandex-team.ru> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Reviewed-by: Daniel Verite <daniel@manitou-mail.org> Reviewed-by: Davin Shearer <davin@apache.org> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
Discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
Andrew Dunstan [Mon, 16 Mar 2026 20:49:01 +0000 (16:49 -0400)]
introduce CopyFormat, refactor CopyFormatOptions
Currently, the COPY command format is determined by two boolean fields
(binary, csv_mode) in CopyFormatOptions. This approach, while
functional, isn't ideal for implementing other formats in the future.
To simplify adding new formats, introduce a CopyFormat enum. This makes
the code cleaner and more maintainable, allowing for easier integration
of additional formats down the line.
Author: Joel Jacobson <joel@compiler.org>
Author: jian he <jian.universality@gmail.com> Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
Discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
Peter Eisentraut [Fri, 20 Mar 2026 09:54:35 +0000 (10:54 +0100)]
Cleanup users and roles in graph_table_rls test
This test leaves behind the roles and users it creates.
002_pg_upgrade test dumps and restore the regression when
PG_TEST_EXTRA contains regress_dump_restore. The global objects such
as users and roles are not dumped by pg_dump. But it still dumps the
policies associated with users, and commands to set the ownership.
Restoring these policies and the ownerships fails since the users and
roles do not exist. To fix this failure we could use --no-owner, but
it does not exclude the policy objects associated with users. Hence
drop the users, roles and policies that depend upon them at the end of
the test.
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reported-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/a855795d-e697-4fa5-8698-d20122126567@eisentraut.org
Peter Eisentraut [Fri, 20 Mar 2026 07:42:50 +0000 (08:42 +0100)]
Update Unicode data to Unicode 17.0.0
Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Alexander Borisov <lex.borisov@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/2a668979-ed92-49a3-abf9-a3ec2d460ec2%40eisentraut.org
Amit Kapila [Fri, 20 Mar 2026 06:06:09 +0000 (11:36 +0530)]
Add support for EXCEPT TABLE in ALTER PUBLICATION.
Following commit fd366065e0, which added EXCEPT TABLE support to
CREATE PUBLICATION, this commit extends ALTER PUBLICATION to allow
modifying the exclusion list.
New Syntax:
ALTER PUBLICATION name SET publication_all_object [, ... ]
where publication_all_object is one of:
ALL TABLES [ EXCEPT TABLE ( except_table_object [, ... ] ) ]
ALL SEQUENCES
If the EXCEPT clause is provided, the existing exclusion list in
pg_publication_rel is replaced with the specified relations. If the
EXCEPT clause is omitted, any existing exclusions for the publication
are cleared. Similarly, SET ALL SEQUENCES updates
Note that because this is a SET command, specifying only one object
type (e.g., SET ALL SEQUENCES) will reset the other unspecified flags
(e.g., setting puballtables to false).
Consistent with CREATE PUBLICATION, only root partitioned tables or
standard tables can be specified in the EXCEPT list. Specifying a
partition child will result in an error.
Author: vignesh C <vignesh21@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Discussion: https://postgr.es/m/CALDaNm3=JrucjhiiwsYQw5-PGtBHFONa6F7hhWCXMsGvh=tamA@mail.gmail.com
David Rowley [Fri, 20 Mar 2026 01:16:06 +0000 (14:16 +1300)]
Fix new tuple deforming code so it can support cstrings again
In c456e3911, I mistakenly thought that the deformer code would never
see cstrings and that I could use pg_assume() to have the compiler omit
producing code for attlen == -2 attributes. That saves bloating the
deforming code a bit with the extra check and strlen() call. While this
is ok to do for tuples from the heap, it's not ok to do for
MinimalTuples as those *can* contain cstrings and
tts_minimal_getsomeattrs() implements deforming by inlining the
(slightly misleadingly named) slot_deform_heap_tuple() code.
To fix, add a new parameter to the slot_deform_heap_tuple() and have the
callers define which code to inline. Because this new parameter is
passed as a const, the compiler can choose to emit or not emit the
cstring-related code based on the parameter's value.
Author: David Rowley <dgrowleyml@gmail.com> Reported-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/CAHewXNmSK+gKziAt_WvQoMVWt3_LRVMmRYY9dAbMPMcpPV0QmA@mail.gmail.com
Masahiko Sawada [Thu, 19 Mar 2026 22:01:47 +0000 (15:01 -0700)]
Add parallel vacuum worker usage to VACUUM (VERBOSE) and autovacuum logs.
This commit adds both the number of parallel workers planned and the
number of parallel workers actually launched to the output of
VACUUM (VERBOSE) and autovacuum logs.
Previously, this information was only reported as an INFO message
during VACUUM (VERBOSE), which meant it was not included in autovacuum
logs in practice. Although autovacuum does not yet support parallel
vacuum, a subsequent patch will enable it and utilize these logs in
its regression tests. This change also improves observability by
making it easier to verify if parallel vacuum is utilizing the
expected number of workers.
Masahiko Sawada [Thu, 19 Mar 2026 20:51:50 +0000 (13:51 -0700)]
Allow explicit casting between bytea and uuid.
This enables the use of functions such as encode() and decode() with
UUID values, allowing them to be converted to and from alternative
formats like base64 or hex.
The cast maps the 16-byte internal representation of a UUID directly
to a bytea datum. This is more efficient than going through a text
forepresentation.
Tom Lane [Thu, 19 Mar 2026 19:21:36 +0000 (15:21 -0400)]
Improve hash join's handling of tuples with null join keys.
In a plain join, we can just summarily discard an input tuple
with null join key(s), since it cannot match anything from
the other side of the join (assuming a strict join operator).
However, if the tuple comes from the outer side of an outer join
then we have to emit it with null-extension of the other side.
Up to now, hash joins did that by inserting the tuple into the hash
table as though it were a normal tuple. This is unnecessarily
inefficient though, since the required processing is far simpler than
for a potentially-matchable tuple. Worse, if there are a lot of such
tuples they will bloat the hash bucket they go into, possibly causing
useless repeated attempts to split that bucket or increase the number
of batches. We have a report of a large join vainly creating many
thousands of batches when faced with such input.
This patch improves the situation by keeping such tuples out of the
hash table altogether, instead pushing them into a separate tuplestore
from which we return them later. (One might consider trying to return
them immediately; but that would require substantial refactoring, and
it doesn't work anyway for cases where we rescan an unmodified hash
table.) This works even in parallel hash joins, because whichever
worker reads a null-keyed tuple can just return it; there's no need
for consultation with other workers. Thus the tuplestores are local
storage even in a parallel join.
A pre-existing buglet that I noticed while analyzing the code's
behavior is that ExecHashRemoveNextSkewBucket fails to decrement
hashtable->skewTuples for tuples moved into the main hash table
from the skew hash table. This invalidates ExecHashTableInsert's
calculation of the number of main-hash-table tuples, though probably
not by a lot since we expect the skew table to be small relative
to the main one. Nonetheless, let's fix that too while we're here.
Bug: #18909 Reported-by: Sergey Koposov <Sergey.Koposov@ed.ac.uk>
Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/3061845.1746486714@sss.pgh.pa.us
which is problematic for two reasons. First, if dup() succeeds but
fdopen() fails, we'd leak the duplicated FD. That's not important
at present since the program will just exit immediately after failure
anyway; but perhaps someday we'll try to continue, making the resource
leak potentially significant. Second, if dup() fails then fdopen()
will overwrite the useful errno (perhaps EMFILE) with a misleading
value EBADF, making it difficult to understand what went wrong.
Fix both issues by testing for dup() failure before proceeding to
the next call.
These failures are sufficiently unlikely, and the consequences minor
enough, that this doesn't seem worth the effort to back-patch.
But let's fix it in HEAD.
Author: Jianghua Yang <yjhjstz@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/62bbe34d-2315-4b42-b768-56d901aa83e1@gmail.com
Nathan Bossart [Thu, 19 Mar 2026 16:41:39 +0000 (11:41 -0500)]
Allow choosing specific grantors via GRANT/REVOKE ... GRANTED BY.
Except for GRANT and REVOKE on roles, the GRANTED BY clause
currently only accepts the current role to match the SQL standard.
And even if an acceptable grantor (i.e., the current role) is
specified, Postgres ignores it and chooses the "best" grantor for
the command. Allowing the user to select a specific grantor would
allow better control over the precise behavior of GRANT/REVOKE
statements. This commit adds that ability. For consistency with
select_best_grantor(), we only permit choosing grantor roles for
which the current role inherits privileges.
Author: Nathan Bossart <nathandbossart@gmail.com> Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/aRYLkTpazxKhnS_w%40nathan
Robert Haas [Thu, 19 Mar 2026 15:51:17 +0000 (11:51 -0400)]
dshash: Make it possible to suppress out of memory errors
Introduce dshash_find_or_insert_extended, which is just like
dshash_find_or_insert except that it takes a flags argument.
Currently, the only supported flag is DSHASH_INSERT_NO_OOM, but
I have chosen to use an integer rather than a boolean in case we
end up with more flags in the future.
Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoaJwUukUZGu7_yL74oMTQQz2=zqucMhF9+9xBmSC5us1w@mail.gmail.com
Tom Lane [Thu, 19 Mar 2026 15:37:14 +0000 (11:37 -0400)]
Fix transient memory leakage in jsonpath evaluation.
This patch reimplements JsonValueList to be more space-efficient
and arranges for temporary JsonValueLists created during jsonpath
evaluation to be freed when no longer needed, rather than being
leaked till the end of the function evaluation cycle as before.
The motivation is to prevent indefinite memory bloat while
evaluating jsonpath expressions that traverse a lot of data.
As an example, this query
SELECT
jsonb_path_query((SELECT jsonb_agg(i) FROM generate_series(1,10000) i),
'$[*] ? (@ < $)');
formerly required about 6GB to execute, with the space required
growing quadratically with the length of the input array.
With this patch the memory consumption stays static. (The time
required is still quadratic, but we can't do much about that: this
path expression asks to compare each array element to each other one.)
The bloat happens because we construct a JsonValueList containing all
the array elements to represent the second occurrence of "$", and then
just leak it after evaluating the filter expression for any one value
generated from "$[*]". If I were implementing this functionality from
scratch I'd probably try to avoid materializing that representation at
all, but changing that now looks like more trouble than it's worth.
This patch takes the more conservative approach of just making sure
we free the list after we're done with it.
The existing representation of JsonValueList is neither especially
compact nor especially easy to free: it's a List containing pointers
to separately-palloc'd JsonbValue structs. We could theoretically
use list_free_deep, but it's not 100% clear that all the JsonbValues
are always safe for us to free. In any case we are talking about a
lot of palloc/pfree traffic if we keep it like this. This patch
replaces that with what's essentially an expansible array of
JsonbValues, so that even a long list requires relatively few
palloc requests. Also, for the very common case that only one or
two elements appear in the list, this representation uses *zero*
pallocs: the elements can be kept in the on-the-stack base struct.
Note that we are only interested in freeing the JsonbValue structs
themselves. While many types of JsonbValue include pointers to
external data such as strings or numerics, we expect that that data
is part of the original jsonb input Datum(s) and need not (indeed
cannot) be freed here.
In this reimplementation, JsonValueListAppend() always copies the
supplied JsonbValue struct into the JsonValueList data. This allows
simplifying and regularizing many call sites that sometimes palloc'd
JsonbValues and sometimes passed a local-variable JsonbValue. Always
doing the latter is simpler, faster, and less bug-prone.
I also removed JsonValueListLength() in favor of constant-time tests
for whether the list has zero, one, or more than one member, which is
what the callers really need to know. JsonValueListLength() was not
a hot code path, so this aspect of the patch won't move the needle in
the least performance-wise. But it seems neater.
I've not done any wide-ranging performance testing, but this should
be faster than the old code thanks to reduction of palloc overhead.
On the specific example shown above, it's about twice as fast as
before on not-very-large inputs; and of course it wins big if you
consider an input large enough to drive the old code into swapping.
Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/569394.1773783211@sss.pgh.pa.us
Peter Eisentraut [Thu, 19 Mar 2026 05:34:27 +0000 (06:34 +0100)]
Add some const qualifiers enabled by typeof_unqual change on copyObject
The recent commit to change copyObject() to use typeof_unqual allows
cleaning up some APIs to take advantage of this improved qualifier
handling. EventTriggerCollectSimpleCommand() is a good example: It
takes a node tree and makes a copy that it keeps around for its
internal purposes, but it can't communicate via its function signature
that it promises not scribble on the passed node tree. That is now
fixed.
Reviewed-by: David Geier <geidav.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/92f9750f-c7f6-42d8-9a4a-85a3cbe808f3%40eisentraut.org
David Rowley [Thu, 19 Mar 2026 04:16:36 +0000 (17:16 +1300)]
Short-circuit row estimation in NOT IN containing NULL consts
ScalarArrayOpExpr used for either NOT IN or <>/= ALL, when the array
contains a NULL constant, will never evaluate to true. Here we add an
explicit short-circuit in scalararraysel() to account for this and return
0.0 rows when we see that a NULL exists. When the array is a constant,
we can very quickly see if there are any NULL values and return early
before going to much effort in scalararraysel(). For non-const arrays,
we short-circuit after finding the first NULL and forego selectivity
estimations of any remaining elements.
In the future, it might be better to do something for this case in
constant folding. We would need to be careful to only do this for
strict operators on expressions located in places that don't care about
distinguishing false from NULL returns. i.e. EXPRKIND_QUAL expressions.
Doing that requires a bit more thought and effort, so here we just fix
some needlessly slow selectivity estimations for ScalarArrayOpExpr
containing many array elements and at least one NULL.
Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> Reviewed-by: David Geier <geidav.pg@gmail.com> Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/eaa2598c-5356-4e1e-9ec3-5fd6eb1cd704@tantorlabs.com
Michael Paquier [Thu, 19 Mar 2026 04:03:30 +0000 (13:03 +0900)]
test_saslprep: Test module for SASLprep()
This module includes two functions:
- test_saslprep(), that performs pg_saslprep on a bytea.
- test_saslprep_ranges(), able to check for all valid ranges of UTF-8
codepoints pg_saslprep() handles each one of them.
This provides a detailed coverage of our implementation of SASLprep()
used for SCRAM, with:
- ASCII characters.
- Incomplete UTF-8 sequences, for 390b3cbbb2af (later backpatched).
- A more advanced check for all the valid UTF-8 ranges of codepoints, to
check for cases where these generate an empty password, based on an
original suggestion from Heikki Linnakangas. This part consumes
resources and time, so it is implemented as a TAP test under a
new PG_TEST_EXTRA value.
A different patch is still under discussion to tweak our internal
SASLprep() implementation, and this module can be used to track any
changes in behavior.
Author: Michael Paquier <michael@paquier.xyz> Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/aaEJ-El2seZHeFcG@paquier.xyz
Michael Paquier [Thu, 19 Mar 2026 02:39:31 +0000 (11:39 +0900)]
Add more debugging information for bgworker termination tests of worker_spi
widowbird has failed again after af8837a10bc7, with the same symptoms of
a backend still lying around when attempting a database rename with a
bgworker connected to the database being renamed.
We are still not sure yet how the failure can be reached, if this is a
timing issue in the test or an actual bug in the logic used for
interruptible bgworkers. This commit adds more debugging information in
the backend to help with the analysis as a temporary measure.
Another thing I have noticed is that the queries launching the dynamic
bgworkers or checking pg_stat_activity would connect to the database
renamed. These are switched to use 'postgres'. That will hopefully
remove some of the friction of the test, but I doubt that this is the
end of the story.
Robert Haas [Wed, 18 Mar 2026 22:15:36 +0000 (18:15 -0400)]
pg_plan_advice: Fix multiple copy-and-paste-errors in test case.
The second half of this file is meant to test feedback, not
generated advice, and is meant to use the statements that it
prepares, not leftover prepared statements from earlier in the
file.
These mistakes resulted in failures under debug_discard_caches = 1,
because re-executing pt2 instead of executing pt4 for the first
time resulted in different output depending on whether the query
was replanned.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us> (per BF member avocet)
ssl: Skip passphrase reload tests in EXEC_BACKEND builds
SSL password command reloading must be enabled on Windows and in
EXEC_BACKEND builds due to them always reloading the context. The
new tests in commit 4f433025 skipped under Windows but missed the
EXEC_BACKEND check. Reported by buildfarm member culicidae.
Author: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAOYmi+kXmCCgBWffzmSjaNhME5rD=gjyc_OP1FeWQTw2MmSNjg@mail.gmail.com
Tom Lane [Wed, 18 Mar 2026 19:25:12 +0000 (15:25 -0400)]
Fix -Wstrict-prototypes warning in ecpg_init_sqlca() declaration.
When headerscheck compiles ecpglib_extern.h, POSTGRES_ECPG_INTERNAL is
not defined, causing sqlca.h to expand "sqlca" as a macro
(*ECPGget_sqlca()). This causes the ecpg_init_sqlca() declaration to
trigger a -Wstrict-prototypes warning.
Fix by renaming the parameter from "sqlca" to "sqlca_p" in both the
declaration and definition, avoiding the macro expansion.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reported-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAN55FZ1VDwJ-ZD092ChYf%2B%2BhuP%2B-S3Cg45tJ8jNH5wx2c4BHAg%40mail.gmail.com
Nathan Bossart [Wed, 18 Mar 2026 19:22:15 +0000 (14:22 -0500)]
pg_restore: Remove unnecessary strlen() calls in options parsing.
Unlike pg_dump and pg_dumpall, pg_restore first checks whether the
argument passed to --format, --host, and --port is empty before
setting the corresponding variable. Consequently, pg_restore does
not error if given an empty format name, whereas pg_dump and
pg_dumpall do. Empty arguments for --host and --port are ignored
by all three applications, so this commit produces no functionality
changes there. This behavior should perhaps be reconsidered, but
that is left as a future exercise. As with other recent changes to
option handling for these applications (commits b2898baaf7, 7c8280eeb5, and be0d0b457c), no back-patch.
Tom Lane [Wed, 18 Mar 2026 17:10:14 +0000 (13:10 -0400)]
Exclude contrib/pg_plan_advice/pgpa_parser.h from headerscheck.
Like other Bison-written headers, it's not worth the trouble to
make this compilable standalone. (We might revisit this someday,
if we ever move up our minimum required Bison version.)
Peter Eisentraut [Wed, 18 Mar 2026 13:31:50 +0000 (14:31 +0100)]
Enable -Wstrict-prototypes and -Wold-style-definition by default
Those are available in all gcc and clang versions that support C11 and as C11
is required as of f5e0186f865c, then we can add them without capability test.
Peter Eisentraut [Wed, 18 Mar 2026 12:36:44 +0000 (13:36 +0100)]
Implement unaccent Unicode data update in meson
The meson/ninja update-unicode target did not cover the required
updates in contrib/unaccent/. This is fixed now.
Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Alexander Borisov <lex.borisov@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/2a668979-ed92-49a3-abf9-a3ec2d460ec2%40eisentraut.org
Support for SNI was added to clientside libpq in 5c55dc8b4733 with the
sslsni parameter, but there was no support for utilizing it serverside.
This adds support for serverside SNI such that certificate/key handling
is available per host. A new config file, $datadir/pg_hosts.conf, is
used for configuring which certificate and key should be used for which
hostname. In order to use SNI the ssl_sni GUC must be set to on, when
it is off the ssl configuration works just like before. If ssl_sni is
enabled and pg_hosts.conf is non-empty it will take precedence over
the regular SSL GUCs, if it is empty or missing the regular GUCs will
be used just as before this commit with no hostname specific handling.
The TLS init hook is not compatible with ssl_sni since it operates on
a single TLS configuration and SNI break that assumption. If the init
hook and ssl_sni are both enabled, a WARNING will be issued.
Host configuration can either be for a literal hostname to match, non-
SNI connections using the no_sni keyword or a default fallback matching
all connections. By omitting no_sni and the fallback a strict mode
can be achieved where only connections using sslsni=1 and a specified
hostname are allowed.
CRL file(s) are applied from postgresql.conf to all configured hostnames.
Serverside SNI requires OpenSSL, currently LibreSSL does not support
the required infrastructure to update the SSL context during the TLS
handshake.
Author: Daniel Gustafsson <daniel@yesql.se> Co-authored-by: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Dewei Dai <daidewei1970@163.com> Reviewed-by: Cary Huang <cary.huang@highgo.ca> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/1C81CD0D-407E-44F9-833A-DD0331C202E5@yesql.se
These tests were originally written to test the SSL SNI patchset
but they have merit on their own since we lack coverage for these
scenarios in the non SNI case as well.
Author: Jacob Champion <jacob.champion@enterprisedb.com> Co-authored-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/1C81CD0D-407E-44F9-833A-DD0331C202E5@yesql.se
Peter Eisentraut [Wed, 18 Mar 2026 10:09:14 +0000 (11:09 +0100)]
meson: Add {perl|python}_includespec to generated Makefile.global
This is meant to help enable headerscheck under meson, but can also be
useful in general, for example for third-party extension that might
use these values.
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAMSWrt-PoQt4sHryWrB1ViuGBJF_PpbjoSGrWR2Ry47bHNLDqg%40mail.gmail.com
Peter Eisentraut [Wed, 18 Mar 2026 07:54:35 +0000 (08:54 +0100)]
Allow setting the collation strength in ICU tailoring rules
There was a bug that if you created an ICU collation with tailoring
rules, any strength specification inside the rules was ignored. This
was because we called ucol_openRules() with UCOL_DEFAULT_STRENGTH for
the strength argument, which overrides the strength. This was because
of faulty guidance in the ICU documentation, which has since been
fixed. The correct invocation is to use UCOL_DEFAULT for the strength
argument.
This fixes bug #18771 and bug #19425.
Author: Daniel Verite <daniel@manitou-mail.org> Reported-by: Ruben Ruiz <ruben.ruizcuadrado@gmail.com> Reported-by: dorian.752@live.fr Reported-by: Todd Lang <Todd.Lang@D2L.com>
Discussion: https://www.postgresql.org/message-id/flat/YT2PPF959236618377A072745A280E278F4BE1DA@YT2PPF959236618.CANPRD01.PROD.OUTLOOK.COM
Discussion: https://www.postgresql.org/message-id/flat/18771-98bb23e455b0f367@postgresql.org
Discussion: https://www.postgresql.org/message-id/flat/19425-58915e19dacd4f40%40postgresql.org
David Rowley [Wed, 18 Mar 2026 04:22:05 +0000 (17:22 +1300)]
Move planner row-estimation tests to new planner_est.sql
Move explain_mask_costs() and the associated planner row-estimation
tests from misc_functions.sql to a new regression test file,
planner_est.sql.
Previously, there wasn't an ideal home for such tests, likely as there
were very few such tests due to width and selectivity estimations being
too dependent on statistics and hardware. That's not always the case, as
we have SupportRequestRows support functions. More such tests are
possibly on the way, so let's create a better home so that we don't have
to create the explain_mask_costs() function in each file we might have
added such tests to.
Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvphShGABn-3AoE36dTvGHW7gUpFSw0_ZZnH84wGCW3hHw@mail.gmail.com
Andrew Dunstan [Fri, 13 Mar 2026 03:51:26 +0000 (11:51 +0800)]
Allow IS JSON predicate to work with domain types
The IS JSON predicate only accepted the base types text, json, jsonb, and
bytea. Extend it to also accept domain types over those base types by
resolving through getBaseType() during parse analysis.
The base type OID is stored in the JsonIsPredicate node (as exprBaseType)
so the executor can dispatch to the correct validation path without
repeating the domain lookup at runtime.
When a non-supported type (or domain over a non-supported type) is used,
the error message displays the original type name as written by the user,
rather than the resolved base type.
Author: jian he <jian.universality@gmail.com> Reviewed-by: Andrew Dunstan <andrew@dunslane.net> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/CACJufxEk34DnJFG72CRsPPT4tsJL9arobX0tNPsn7yH28J=zQg@mail.gmail.com
Andres Freund [Tue, 17 Mar 2026 18:54:41 +0000 (14:54 -0400)]
Fix use of wrong variable in _hash_kill_items()
In 82467f627bd I somehow ended up using 'so->currPos.buf' instead of the 'buf'
variable, which is incorrect when the buffer is not already pinned. At the
very least this can lead to assertion failures
Unfortunately this shows that this code path was not covered. Expand
src/test/modules/index/specs/killtuples.spec to test it. Until now the
'result' step always reported either a 0 or 1 buffer accesses, but when
exercising hash overflows, more buffers are accessed. To avoid depending on
the precise number of accesses, change the result step to return whether there
were any heap accesses. That makes the change a lot more verbose, but still
seems worth it.
Reported-by: Alexander Kuzmenkov <akuzmenkov@tigerdata.com> Reported-by: Alexander Lakhin <exclusion@gmail.com> Reported-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/vjtmvwvbxt7w5uyacxpzibpj65ewcb7uqaqbhd4arvnjbp5jqz%405ksdh6fsyqve
Discussion: https://postgr.es/m/b9de8d05-3b02-4a27-9b0b-03972fa4bfd3@iki.fi
Robert Haas [Tue, 17 Mar 2026 18:25:43 +0000 (14:25 -0400)]
pg_plan_advice: Avoid a crash under GEQO.
The previous code could allocate pgpa_sj_unique_rel objects in a context
that had too short a lifespan. Fix by allocating them (and any
associated List-related allocations) in the same context as the
pgpa_planner_state to which they are attached. We also need to copy
uniquerel->relids, because the associated RelOptInfo may also be
allocated within a short-lived context.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: http://postgr.es/m/a6e6d603-e847-44dc-acd5-879fb4570062@gmail.com
Robert Haas [Tue, 17 Mar 2026 18:00:45 +0000 (14:00 -0400)]
Test pg_plan_advice using a new test_plan_advice module.
The TAP test included in this new module runs the regression tests
with pg_plan_advice loaded. It arranges for each query to be planned
twice. The first time, we generate plan advice. The second time, we
replan the query using the resulting advice string. If the tests
fail, that means that using pg_plan_advice to tell the planner to
do what it was going to do anyway breaks something, which indicates
a problem either with pg_plan_advice or with the planner.
The test also enables pg_plan_advice.feedback_warnings, so that if the
plan advice fails to apply cleanly when the query is replanned, a
failure will occur.
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com> Reviewed-by: Lukas Fittl <lukas@fittl.com>
Discussion: http://postgr.es/m/CA%2BTgmoZzM2i%2Bp-Rxdphs4qx7sshn-kzxF91ASQ5duOo0dFRXLQ%40mail.gmail.com
Robert Haas [Tue, 17 Mar 2026 16:36:57 +0000 (12:36 -0400)]
pg_plan_advice: Always install pg_plan_advice.h, and in the right place
The Makefile failed to set HEADERS_pg_plan_advice, so the header wasn't
installed. Fixing that reveals another problem: since this is just a
loadable module, not an extension, the header file is installed into
$(includedir_server)/contrib rather than $(includedir_server)/extension.
While we have no existing cases of installing header files there, it
appears to be the intent of pgxs.mk. However, this is inconsistent with
meson.build, which was using dir_include_extension. Changing that to
dir_include_server / 'contrib' makes the install locations consistent
across the two builds.
Nathan Bossart [Tue, 17 Mar 2026 16:32:40 +0000 (11:32 -0500)]
pg_dump: Simplify query for retrieving attribute statistics.
This query fetches information from pg_stats, which did not return
table OIDs until recent commit 3b88e50d6c. Because of this, we had
to cart around arrays of schema and table names, and we needed an
extra filter clause to hopefully convince the planner to use the
correct index. With the introduction of pg_stats.tableid, we can
instead just use an array of OIDs, and we no longer need the extra
filter clause hack.
Author: Corey Huinker <corey.huinker@gmail.com> Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CADkLM%3DcoCVy92QkVUUTLdo5eO2bMDtwMrzRn_8miAhX%2BuPaqXg%40mail.gmail.com
Peter Eisentraut [Tue, 17 Mar 2026 15:44:43 +0000 (16:44 +0100)]
Hardcode typeof_unqual to __typeof_unqual__ for clang
A new attempt was made in 63275ce84d2 to make typeof_unqual work on all
configurations of CC and CLANG. This re-introduced an old problem
though, where CLANG would only support __typeof_unqual__ but the
configure check for CC detected support for typeof_unqual.
This fixes that by always defining typeof_unqual as __typeof_unqual__
under clang.
Author: Jelte Fennema-Nio <postgres@jeltef.nl> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/92f9750f-c7f6-42d8-9a4a-85a3cbe808f3%40eisentraut.org
Andrew Dunstan [Tue, 10 Mar 2026 06:25:29 +0000 (14:25 +0800)]
make immutability tests in to_json and to_jsonb complete
Complete the TODOs in to_json_is_immutable() and to_jsonb_is_immutable()
by recursing into container types (arrays, composites, ranges, multiranges,
domains) to check element/sub-type mutability, rather than conservatively
returning "mutable" for all arrays and composites.
The shared logic is factored into a single json_check_mutability() function
in jsonfuncs.c, with the existing exported functions as thin wrappers.
Composite type inspection uses lookup_rowtype_tupdesc() (typcache) instead
of relation_open() to avoid unnecessary lock acquisition in the optimizer.
Range and multirange types are now also checked recursively: if the
subtype's conversion is immutable, the range is considered immutable
for JSON purposes, even though range_out is generically marked STABLE.
This is a behavioral change: range types with immutable subtypes (e.g.,
int4range) can now appear in expression indexes via JSON_ARRAY/JSON_OBJECT,
whereas previously they were conservatively rejected.
Add regression tests for JSON_ARRAY and JSON_OBJECT mutability with
expression indexes and generated columns, covering arrays, composites,
domains, ranges, multiranges and combinations thereof.
Author: Jian He <jian.universality@gmail.com> Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/CACJufxFz=OsXQdsMJ-cqoqspD9aJrwntsQP-U2A-UaV_M+-S9g@mail.gmail.com
Commitfest: https://commitfest.postgresql.org/patch/5759
Nathan Bossart [Tue, 17 Mar 2026 14:26:27 +0000 (09:26 -0500)]
Add more columns to pg_stats, pg_stats_ext, and pg_stats_ext_exprs.
This commit adds table OID and attribute number columns to
pg_stats, and it adds table OID and statistics object OID columns
to pg_stats_ext and pg_stats_ext_exprs. A proposed follow-up
commit would use pg_stats.tableid to simplify a query in pg_dump.
The others have no immediate purpose but may be useful later.
Bumps catversion.
Author: Corey Huinker <corey.huinker@gmail.com> Reviewed-by: Sami Imseih <samimseih@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CADkLM%3DcoCVy92QkVUUTLdo5eO2bMDtwMrzRn_8miAhX%2BuPaqXg%40mail.gmail.com
Peter Eisentraut [Tue, 17 Mar 2026 13:06:53 +0000 (14:06 +0100)]
Dump labels in reproducible order
In pg_get_propgraphdef(), sort the labels before writing out, for a
consistent dump order. Also, since we now have a list, we can get rid
of the separate table scan to get the count.
Don't leave behind files in src dir in 007_multixact_conversion.pl
pg_upgrade test 007_multixact_conversion.pl was leaving files like
delete_old_cluster.sh in the source directory for VPATH and meson
builds. To fix, change the tmp_check directory before running the
test, like in the other pg_upgrade tests.
Michael Paquier [Tue, 17 Mar 2026 08:38:55 +0000 (17:38 +0900)]
gen_guc_tables.pl: Improve detection of inconsistent data
This commit adds two improvements to gen_guc_tables.pl:
1) When finding two entries with the same name, the script complained
about these being not in alphabetical order, which was confusing.
Duplicated entries are now reported as their own error.
2) While the presence of the required fields is checked for all the
parameters, the script did not perform any checks on the non-required
fields. A check is added to check that any field defined matches with
what can be accepted. Previously, a typo in the name of a required
field would cause the field to be reported as missing. Non-mandatory
fields would be silently ignored, which was problematic as we could lose
some information.
Author: Zsolt Parragi <zsolt.parragi@percona.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAN4CZFP=3xUoXb9jpn5OWwicg+rbyrca8-tVmgJsQAa4+OExkw@mail.gmail.com
Michael Paquier [Tue, 17 Mar 2026 05:34:29 +0000 (14:34 +0900)]
Refactor some code around ALTER TABLE [NO] INHERIT
[NO] INHERIT is not supported for partitioned tables, but this portion
of tablecmds.c did not apply the same rules as the other sub-commands,
checking the relkind in the execution phase, not the preparation phase.
This commit refactors the code to centralize the relkind and other
checks in the preparation phase for both command patterns, getting rid
of one translatable string on the way. ATT_PARTITIONED_TABLE is
removed from ATSimplePermissions(), and the child relation is checked
the same way for both sub-commands. The ALTER TABLE patterns that now
fail at preparation failed already at execution, hence there should be
no changes from the user perspective except more consistent error
messages generated.
Some comments at the top of ATPrepAddInherit() were incorrect,
CreateInheritance() being the routine checking the columns and
constraints between the parent and its to-be-child.
Author: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/CAEoWx2kggo1N2kDH6OSfXHL_5gKg3DqQ0PdNuL4LH4XSTKJ3-g@mail.gmail.com
Michael Paquier [Tue, 17 Mar 2026 03:56:46 +0000 (12:56 +0900)]
Tweak TAP test for worker terminations in worker_spi
The test has been reported as having a race condition for the case of a
worker that should be terminated after a database rename. Based on the
report received from buildfarm member jay, the database renamed is
accessed by a different session, preventing the ALTER DATABASE to
complete, ultimately failing the test.
Honestly, I am not completely sure what is the origin of this
disturbance, but two possibilities are an autovacuum or parallel worker
(due to debug_parallel_query being used by the host). In order to
(hopefully) stabilize the test, autovacuum and debug_parallel_query are
now disabled in the configuration of the node used in the test.
The failure is hard to reproduce, so it will take a few weeks to make
sure that the test has become stable. Let's see where it goes.
Reported-by: Aya Iwata <iwata.aya@fujitsu.com>
Discussion: https://postgr.es/m/OS3PR01MB8889505E2F3E443CCA4BD72EEA45A@OS3PR01MB8889.jpnprd01.prod.outlook.com
David Rowley [Tue, 17 Mar 2026 02:06:31 +0000 (15:06 +1300)]
Reduce size of CompactAttribute struct to 8 bytes
Previously, this was 16 bytes. With the use of some bitflags and by
reducing the attcacheoff field size to a 16-bit type, we can halve the
size of the struct.
It's unlikely that caching the offsets for offsets larger than what will
fit in a 16-bit int will help much as the tuple is very likely to have
some non-fixed-width types anyway, the offsets of which we cannot cache.
Shrinking this down to 8 bytes helps by accessing fewer cachelines when
performing tuple deformation. The fields used there are all fully
fledged fields, which don't require any bitmasking to extract the value
of. It also helps to more efficiently calculate the address of a
compact_attrs[] element in TupleDesc as the x86 LEA instruction can work
with 8 byte offsets, which allows the element address to be calculated
from the TupleDesc's address in a single instruction using LEA's
concurrent shift and add.
Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CAApHDvodSVBj3ypOYbYUCJX%2BNWL%3DVZs63RNBQ_FxB_F%2B6QXF-A%40mail.gmail.com
Fujii Masao [Mon, 16 Mar 2026 23:10:20 +0000 (08:10 +0900)]
Fix WAL flush LSN used by logical walsender during shutdown
Commit 6eedb2a5fd8 made the logical walsender call
XLogFlush(GetXLogInsertRecPtr()) to ensure that all pending WAL is flushed,
fixing a publisher shutdown hang. However, if the last WAL record ends at
a page boundary, GetXLogInsertRecPtr() can return an LSN pointing past
the page header, which can cause XLogFlush() to report an error.
A similar issue previously existed in the GiST code. Commit b1f14c96720
introduced GetXLogInsertEndRecPtr(), which returns a safe WAL insertion end
location (returning the start of the page when the last record ends at a page
boundary), and updated the GiST code to use it with XLogFlush().
This commit fixes the issue by making the logical walsender use
XLogFlush(GetXLogInsertEndRecPtr()) when flushing pending WAL during shutdown.
Jeff Davis [Mon, 16 Mar 2026 20:42:55 +0000 (13:42 -0700)]
Clean up postgres_fdw/t/010_subscription.pl.
The test was based on test/subscription/002_rep_changes.pl, but had
some leftover copy+paste problems that were useless and/or
distracting.
Discussion: https://postgr.es/m/CAA4eK1+=V_UFNHwcoMFqzy0F4AtS9_GyXhQDUzizgieQPWr=0A@mail.gmail.com Reported-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
David Rowley [Mon, 16 Mar 2026 20:00:39 +0000 (09:00 +1300)]
Fix thinko in nocachegetattr() and nocache_index_getattr()
This code was recently adjusted by c456e3911, but that commit didn't get
the logic correct when finding the attnum to start walking the tuple in.
If there is a NULL, we need to start walking the tuple before it.
Author: David Rowley <dgrowleyml@gmail.com> Reported-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/CAHewXNnb-s_=VdVUZ9h7dPA0u3hxV8x2aU3obZytnqQZ_MiROA@mail.gmail.com
Robert Haas [Mon, 16 Mar 2026 18:37:12 +0000 (14:37 -0400)]
pg_plan_advice: Fix failures to accept identifier keywords.
TOK_IDENT allows only non-keywords; identifier should be used
any place where either keywords or non-keywords should be accepted.
Hence, without this commit, any string that happens to be a keyword
can't be used as a partition schema, partition name, or plan name,
which is incorrect.
Peter Eisentraut [Mon, 16 Mar 2026 17:56:30 +0000 (18:56 +0100)]
Hardcode override of typeof_unqual for clang-for-bitcode
The fundamental problem is that when we call clang to generate
bitcode, we might be using configure results from a different
compiler, which might not be fully compatible with the clang we are
using. In practice, clang supports most things other compilers
support, so this has apparently not been a problem in practice.
But commits 4cfce4e62c8, 0af05b5dbb4, and 59292f7aac7 have been
struggling to make typeof_unqual work in this situation. Clang added
support in version 19, GCC in version 14, so if you are using, say,
GCC 14 and Clang 16, the compilation with the latter will fail. Such
combinations are not very likely in practice, because GCC 14 and Clang
19 were released within a few months of each other, and so Linux
distributions are likely to have suitable combinations. But some
buildfarm members and some Fedora versions are affected, so this tries
to fix it.
The fully correct solution would be to run a separate set of configure
tests for that clang-for-bitcode, but that would be very difficult to
implement, and probably of limited use in practice. So the workaround
here is that we hardcodedly override the configure result under clang
based on the version number. As long as we only have a few of these
cases, this should be manageable.
Also swap the order of the tests of typeof_unqual: Commit 59292f7aac7
tested the underscore variant first, but the reasons for that are now
gone.
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/92f9750f-c7f6-42d8-9a4a-85a3cbe808f3%40eisentraut.org
Nathan Bossart [Mon, 16 Mar 2026 16:01:20 +0000 (11:01 -0500)]
pg_dumpall: Fix handling of incompatible options.
This commit teaches pg_dumpall to fail when both --clean and
--data-only are specified. Previously, it passed the options
through to pg_dump, which would fail after pg_dumpall had already
started producing output. Like recent commits b2898baaf7 and 7c8280eeb5, no back-patch.
Álvaro Herrera [Mon, 16 Mar 2026 13:34:57 +0000 (14:34 +0100)]
Reduce header inclusions via execnodes.h
Remove a bunch of #include lines from execnodes.h. Most of these
requier suitable typedefs to be added, so that it still compiles
standalone. In one case, the fix is to move a struct definition to the
one .c file where it is needed.
Also some light clean up in plannodes.h and genam.h, though not as
extensive as in execnodes.h.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/202603131240.ihwqdxnj7w2o@alvherre.pgsql
Fujii Masao [Mon, 16 Mar 2026 12:05:13 +0000 (21:05 +0900)]
Remove unstable test for pg_statio_all_sequences stats reset
Commit 8fe315f18d4 added the stats_reset column to pg_statio_all_sequences and
included a regression test to verify that statistics in this view are reset
correctly. However, this test caused buildfarm member crake to report
a pg_upgradeCheck failure.
The failing test assumed that the blks_read and blks_hit counters
in pg_statio_all_sequences would be zero after calling
pg_stat_reset_single_table_counters(). On crake, however, either blks_read or
blks_hit sometimes appeared as 1 during the pg_upgradeCheck test, even right
after the reset.
Since these counters may change due to concurrent activity and the test is
unstable, this commit removes the checks for blks_read and blks_hit in
pg_statio_all_sequences from the regression test.
Peter Eisentraut [Mon, 16 Mar 2026 10:52:02 +0000 (11:52 +0100)]
Fix pg_upgrade failure when extension_control_path is used
When an extension is located via extension_control_path and it has a
hardcoded $libdir/ path, this is stripped by the
extension_control_path mechanism. But when pg_upgrade verifies the
extension using LOAD, this stripping does not happen, and so
pg_upgrade will fail because it cannot load the extension. To work
around that, change pg_upgrade to itself strip the prefix when it runs
its checks. A test case is also added.
Author: Jonathan Gonzalez V. <jonathan.abdiel@gmail.com> Reviewed-by: Niccolò Fei <niccolo.fei@enterprisedb.com> Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/43b3691c673a8b9158f5a09f06eacc3c63e2c02d.camel%40gmail.com
Peter Eisentraut [Mon, 16 Mar 2026 09:53:24 +0000 (10:53 +0100)]
Prevent -Wstrict-prototypes and -Wold-style-definition warnings
A following commit will enable -Wstrict-prototypes and -Wold-style-definition
by default. This commit fixes the warnings that those new flags will generate
before actually adding the new flags.
Peter Eisentraut [Mon, 16 Mar 2026 09:14:18 +0000 (10:14 +0100)]
SQL Property Graph Queries (SQL/PGQ)
Implementation of SQL property graph queries, according to SQL/PGQ
standard (ISO/IEC 9075-16:2023).
This adds:
- GRAPH_TABLE table function for graph pattern matching
- DDL commands CREATE/ALTER/DROP PROPERTY GRAPH
- several new system catalogs and information schema views
- psql \dG command
- pg_get_propgraphdef() function for pg_dump and psql
A property graph is a relation with a new relkind RELKIND_PROPGRAPH.
It acts like a view in many ways. It is rewritten to a standard
relational query in the rewriter. Access privileges act similar to a
security invoker view. (The security definer variant is not currently
implemented.)
Starting documentation can be found in doc/src/sgml/ddl.sgml and
doc/src/sgml/queries.sgml.
Fujii Masao [Mon, 16 Mar 2026 09:10:57 +0000 (18:10 +0900)]
Ensure "still waiting on lock" message is logged only once per wait.
When log_lock_waits is enabled, the "still waiting on lock" message is normally
emitted only once while a session continues waiting. However, if the wait is
interrupted, for example by wakeups from client_connection_check_interval,
SIGHUP for configuration reloads, or similar events, the message could be
emitted again each time the wait resumes.
For example, with very small client_connection_check_interval values
(e.g., 100 ms), this behavior could flood the logs with repeated messages,
making them difficult to use.
To prevent this, this commit guards the "still waiting on lock" message so
it is reported at most once during a lock wait, even if the wait is interrupted.
This preserves the intended behavior when no interrupts occur.
Author: Fujii Masao <masao.fujii@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Hüseyin Demir <huseyin.d3r@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwHZUmg+r4kMcPYt_Z-txxVX+CJJhfra+qemxKXvAxYbpw@mail.gmail.com
Michael Paquier [Mon, 16 Mar 2026 08:48:39 +0000 (17:48 +0900)]
Reject ALTER TABLE .. CLUSTER earlier for partitioned tables
ALTER TABLE .. CLUSTER ON and SET WITHOUT CLUSTER are not supported for
partitioned tables and already fail with a check happening when the
sub-command is executed, not when it is prepared.
This commit moves the relkind check for partitioned tables to happen
when the sub-command is prepared in ATSimplePermissions(). This matches
with the practice of the other sub-commands of ALTER TABLE, shaving one
translatable string.
mark_index_clustered() can be a bit simplified, switching one
elog(ERROR) to an assertion. Note that mark_index_clustered() can also
be called through a CLUSTER command, but it cannot be reached for a
partitioned table, per the assertion based on the relkind in
cluster_rel(), and there is only one caller of rebuild_relation().
Author: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/CAEoWx2kggo1N2kDH6OSfXHL_5gKg3DqQ0PdNuL4LH4XSTKJ3-g@mail.gmail.com
Fujii Masao [Mon, 16 Mar 2026 08:24:08 +0000 (17:24 +0900)]
Add stats_reset column to pg_statio_all_sequences
pg_statio_all_sequences lacked a stats_reset column, unlike the other
pg_statio_* views that already expose it. This commit adds the column so
users can see when the statistics in this view were last reset.
Also this commit updates the documentation for
pg_stat_reset_single_table_counters() to clarify that it can reset statistics
for sequences and materialized views as well.
Amit Kapila [Mon, 16 Mar 2026 04:44:22 +0000 (10:14 +0530)]
Remove obsolete speculative insert cleanup in ReorderBuffer.
Commit 4daa140a2f introduced proper decoding for speculative aborts. As a
result, the internal state is guaranteed to be clean when a new
speculative insert is encountered. This patch removes the defensive
cleanup code that is no longer reachable.
Fujii Masao [Mon, 16 Mar 2026 03:13:11 +0000 (12:13 +0900)]
file_fdw: Add regression test for file_fdw with ON_ERROR='set_null'
Commit 2a525cc97e1 introduced the ON_ERROR = 'set_null' option for COPY,
allowing it to be used with foreign tables backed by file_fdw. However,
unlike ON_ERROR = 'ignore', no regression test was added to verify
this behavior for file_fdw.
This commit adds a regression test to ensure that foreign tables using
file_fdw work correctly with ON_ERROR = 'set_null', improving test coverage.
Author: Fujii Masao <masao.fujii@gmail.com> Reviewed-by: Yi Ding <dingyi_yale@163.com>
Discussion: https://postgr.es/m/CAHGQGwGmPc6aHpA5=WxKreiDePiOEitfOFsW2dSo5m81xWXgRA@mail.gmail.com
Michael Paquier [Mon, 16 Mar 2026 00:22:09 +0000 (09:22 +0900)]
Optimize hash index bulk-deletion with streaming read
This commit refactors hashbulkdelete() to use streaming reads, improving
the efficiency of the operation by prefetching upcoming buckets while
processing a current bucket. There are some specific changes required
to make sure that the cleanup work happens in accordance to the data
pushed to the stream read callback. When the cached metadata page is
refreshed to be able to process the next set of buckets, the stream is
reset and the data fed to the stream read callback has to be updated.
The reset needs to happen in two code paths, when _hash_getcachedmetap()
is called.
The author has seen better performance numbers than myself on this one
(with tweaks similar to 6c228755add8). The numbers are good enough for
both of us that this change is worth doing, in terms of IO and runtime.
Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CABPTF7VrqfbcDXqGrdLQ2xaQ=K0RzExNuw6U_GGqzSJu32wfdQ@mail.gmail.com