git.ipfire.org Git - thirdparty/postgresql.git/log

Make smgr access for a BufferManagerRelation safer in relcache inval

Currently there's no bug, because we have no code path where we
invalidate relcache entries where it'd cause a problem. But it's more
robust to do it this way in case we introduce such a path later, as some
Postgres forks reportedly already have.

Author: Daniil Davydov <3danissimo@gmail.com>
Reviewed-by: Stepan Neretin <slpmcf@gmail.com>
Discussion: https://postgr.es/m/CAJDiXgj3FNzAhV+jjPqxMs3jz=OgPohsoXFj_fh-L+nS+13CKQ@mail.gmail.com

Fix BRIN 32-bit counter wrap issue with huge tables

A BlockNumber (32-bit) might not be large enough to add bo_pagesPerRange
to when the table contains close to 2^32 pages. At worst, this could
result in a cancellable infinite loop during the BRIN index scan with
power-of-2 pagesPerRange, and slow (inefficient) BRIN index scans and
scanning of unneeded heap blocks for non power-of-2 pagesPerRange.

Backpatch to all supported versions.

Author: sunil s <sunilfeb26@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAOG6S4-tGksTQhVzJM19NzLYAHusXsK2HmADPZzGQcfZABsvpA@mail.gmail.com
Backpatch-through: 13

Fix comment in pg_get_shmem_allocations_numa()

The comment fixed in this commit described the function as dealing with
database blocks, but in reality it processes shared memory allocations.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aH4DDhdiG9Gi0rG7@ip-10-97-1-34.eu-west-3.compute.internal
Backpatch-through: 18

Fix pushdown of degenerate HAVING clauses

67a54b9e8 taught the planner to push down HAVING clauses even when
grouping sets are present, as long as the clause does not reference
any columns that are nullable by the grouping sets.  However, there
was an oversight: if any empty grouping sets are present, the
aggregation node can produce a row that did not come from the input,
and pushing down a HAVING clause in this case may cause us to fail to
filter out that row.

Currently, non-degenerate HAVING clauses are not pushed down when
empty grouping sets are present, since the empty grouping sets would
nullify the vars they reference.  However, degenerate (variable-free)
HAVING clauses are not subject to this restriction and may be
incorrectly pushed down.

To fix, explicitly check for the presence of empty grouping sets and
retain degenerate clauses in HAVING when they are present.  This
ensures that we don't emit a bogus aggregated row.  A copy of each
such clause is also put in WHERE so that query_planner() can use it in
a gating Result node.

To facilitate this check, this patch expands the groupingSets tree of
the query to a flat list of grouping sets before applying the HAVING
pushdown optimization.  This does not add any additional planning
overhead, since we need to do this expansion anyway.

In passing, make a small tweak to preprocess_grouping_sets() by
reordering its initial operations a bit.

Backpatch to v18, where this issue was introduced.

Reported-by: Yuhang Qiu <iamqyh@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/0879D9C9-7FE2-4A20-9593-B23F7A0B5290@gmail.com
Backpatch-through: 18

Fix POSIX compliance in pgwin32_unsetenv() for "name" argument

pgwin32_unsetenv() (compatibility routine of unsetenv() on Windows)
lacks the input validation that its sibling pgwin32_setenv() has.
Without these checks, calling unsetenv() with incorrect names crashes on
WIN32. However, invalid names should be handled, failing on EINVAL.

This commit adds the same checks as setenv() to fail with EINVAL for a
"name" set to NULL, an empty string, or if '=' is included in the value,
per POSIX requirements.

Like 7ca37fb0406b, backpatch down to v14. pgwin32_unsetenv() is defined
on REL_13_STABLE, but with the branch going EOL soon and the lack of
setenv() there for WIN32, nothing is done for v13.

Author: Bryan Green <dbryan.green@gmail.com>
Discussion: https://postgr.es/m/b6a1e52b-d808-4df7-87f7-2ff48d15003e@gmail.com
Backpatch-through: 14

Support COPY TO for partitioned tables.

Previously, COPY TO command didn't support directly specifying
partitioned tables so users had to use COPY (SELECT ...) TO variant.

This commit adds direct COPY TO support for partitioned
tables, improving both usability and performance. Performance tests
show it's faster than the COPY (SELECT ...) TO variant as it avoids
the overheads of query processing and sending results to the COPY TO
command.

When used with partitioned tables, COPY TO copies the same rows as
SELECT * FROM table. Row-level security policies of the partitioned
table are applied in the same way as when executing COPY TO on a plain
table.

Author: jian he <jian.universality@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Melih Mutlu <m.melihmutlu@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CACJufxEZt%2BG19Ors3bQUq-42-61__C%3Dy5k2wk%3DsHEFRusu7%3DiQ%40mail.gmail.com

Fix thinko in commit 7d129ba54.

The revised logic in 001_ssltests.pl would fail if openssl
doesn't work or if Perl is a 32-bit build, because it had
already overwritten $serialno with something inappropriate
to use in the eventual match. We could go back to the
previous code layout, but it seems best to introduce a
separate variable for the output of openssl.

Per failure on buildfarm member mamba, which has a 32-bit Perl.

pg_dump: Remove unnecessary code for security labels on extensions.

Commit d9572c4e3b4 added extension support and made pg_dump attempt to
dump security labels on extensions. However, security labels on extensions
are not actually supported, so this code was unnecessary.

This commit removes it.

Suggested-by: Jian He <jian.universality@gmail.com>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxF8=z0v=888NKKEoTHQ+Jc4EXutFi91BF0fFjgFsZT6JQ@mail.gmail.com

pg_checksums: Use new routine to retrieve data of PG_VERSION

Previously, attempting to use pg_checksums on a cluster with a control
file whose version does not match with what thetool is able to support
would lead to the following error:
pg_checksums: error: pg_control CRC value is incorrect

This is confusing, because it would look like the control file is
corrupted. However, the contents of the control file are correct,
pg_checksums not being able to understand how the past control file is
shaped.

This commit adds a check based on PG_VERSION, using the facility added
by cd0be131ba6f, using the same error message as some of the other
frontend tools. A note is added in the documentation about the major
version requirement.

Author: Michael Banck <mbanck@gmx.net>
Discussion: https://postgr.es/m/68f1ff21.170a0220.2c9b5f.4df5@mx.google.com

Add static assertion that RELSEG_SIZE fits in an int.

Our configure script intended to ensure this, but it supposed that
expr(1) would report an error for integer overflow.  Maybe that
was true when the code was written (commit 3c6248a82 of 2008-05-02),
but all the modern expr's I tried will deliver bigger-than-int32
results without complaint.  Moreover, if you use --with-segsize-blocks
then there's no check at all.

Ideally we'd add a test in configure itself to check that the value
fits in int, but to do that we'd need to suppose that test(1) handles
bigger-than-int32 numbers correctly.  Probably modern ones do, but
that's an assumption I could do without; and I'm not too trusting
about meson either.  Instead, let's install a static assertion, so
that even people who ignore all the compiler warnings you get from
such values will be forced to confront the fact that it won't work.

This has been hazardous for awhile, but given that we hadn't heard
a complaint about it till now, I don't feel a need to back-patch.

Reported-by: Casey Shobe <casey.allen.shobe@icloud.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/C5DC82D6-C76D-4E8F-BC2E-DF03EFC4FA24@icloud.com

Don't rely on zlib's gzgetc() macro.

It emerges that zlib's configuration logic is not robust enough
to guarantee that the macro will have the same ideas about struct
field layout as the library itself does, leading to corruption of
zlib's state struct followed by unintelligible failure messages.
This hazard has existed for a long time, but we'd not noticed
for several reasons:

(1) We only use gzgetc() when trying to read a manually-compressed
TOC file within a directory-format dump, which is a rarely-used
scenario that we weren't even testing before 20ec99589.

(2) No corruption actually occurs unless sizeof(long) is different
from sizeof(off_t) and the platform is big-endian.

(3) Some platforms have already fixed the configuration instability,
at least sufficiently for their environments.

Despite (3), it seems foolish to assume that the problem isn't
going to be present in some environments for a long time to come.
Hence, avoid relying on this macro. We can just #undef it and
fall back on the underlying function of the same name.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2122679.1760846783@sss.pgh.pa.us
Backpatch-through: 13

Fix Coverity issue reported in commit 2273fa32bce.

Coverity complains that the return value from gettuple_eval_partition
(stored in variable "datum") in a do..while loop in
WinGetFuncArgInPartition is overwritten when exiting the while
loop. This commit tries to fix the issue by changing the
gettuple_eval_partition call to:

(void) gettuple_eval_partition()

explicitly stating that we discard the return value. We are just
interested in whether we are inside or outside of partition, NULL or
NOT NULL here.

Also enhance some comments for easier code reading.

Reported-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aPCOabSE4VfJLaky%40paquier.xyz

Add pg_database_locale() to retrieve database default locale.

Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/0151ad01239e2cc7b3139644358cf8f7b9622ff7.camel@j-davis.com

Add pg_iswxdigit(), useful for tsearch.

Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/0151ad01239e2cc7b3139644358cf8f7b9622ff7.camel@j-davis.com

Allow role created by new test to log in on Windows.

We must tell init about each role name we plan to connect as,
else SSPI auth fails. Similar to previous patches such as
14793f471, 973542866.

Oversight in 208927e65, per buildfarm member drongo.
(Although that was back-patched to v13, the test script
only exists in v16 and up.)

Tidyup truncate_useless_pathkeys() function

This removes a few static functions and replaces them with 2 functions
which aim to be more reusable.  The upper planner's pathkey requirements
can be simplified down to operations which require pathkeys in the same
order as the pathkeys for the given operation, and operations which can
make use of a Path's pathkeys in any order.

Here we also add some short-circuiting to truncate_useless_pathkeys().  At
any point we discover that all pathkeys are useful to a single operation,
we can stop checking the remaining operations as we're not going to be
able to find any further useful pathkeys - they're all possibly useful
already.  Adjusting this seems to warrant trying to put the checks
roughly in order of least-expensive-first so that the short-circuits
have the most chance of skipping the more expensive checks.

In passing clean up has_useful_pathkeys() as it seems to have grown a
redundant check for group_pathkeys.  This isn't needed as
standard_qp_callback will set query_pathkeys if there's any requirement
to have group_pathkeys.  All this code does is waste run-time effort and
take up needless space.

Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAApHDvpbsEoTksvW5901MMoZo-hHf78E5up3uDOfkJnxDe_WAw@mail.gmail.com

Fix determination of not-null constraint "locality" for inherited columns

It is possible to have a non-inherited not-null constraint on an
inherited column, but we were failing to preserve such constraints
during pg_upgrade where the source is 17 or older, because of a bug in
the pg_dump query for it. Oversight in commit 14e87ffa5c54. Fix that
query. In passing, touch-up a bogus nearby comment introduced by the
same commit.

In version 17, make the regression tests leave a table in this
situation, so that this scenario is tested in the cross-version upgrade
tests of 18 and up.

Author: Dilip Kumar <dilipbalaut@gmail.com>
Reported-by: Andrew Bille <andrewbille@gmail.com>
Bug: #19074
Backpatch-through: 18
Discussion: https://postgr.es/m/19074-ae2548458cf0195c@postgresql.org

Fix pg_dump sorting of foreign key constraints

Apparently, commit 04bc2c42f765 failed to notice that DO_FK_CONSTRAINT
objects require identical handling as DO_CONSTRAINT ones, which causes
some pg_upgrade tests in debug builds to fail spuriously. Add that.

Author: Álvaro Herrera <alvherre@kurilemu.de>
Backpatch-through: 13
Discussion: https://postgr.es/m/202510181201.k6y75v2tpf5r@alvherre.pgsql

Fix reset of incorrect hash iterator in GROUPING SETS queries

This fixes an unlikely issue when fetching GROUPING SET results from
their internally stored hash tables. It was possible in rare cases that
the hash iterator would be set up incorrectly which could result in a
crash.

This was introduced in 4d143509c, so backpatch to v18.

Many thanks to Yuri Zamyatin for reporting and helping to debug this
issue.

Bug: #19078
Reported-by: Yuri Zamyatin <yuri@yrz.am>
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/19078-dfd62f840a2c0766@postgresql.org
Backpatch-through: 18

Englishify comment wording

Switch to using the English word here rather than using a verbified
function name.

The full word still fits within a single comment line, so it's probably
better just to use that instead of trying to shorten it, which might
cause confusion.

Author: Rafia Sabih <rafia.pghackers@gmail.com>
Discussion: https://postgr.es/m/CA+FpmFe7LnRF2NA_QfARjkSWme4mNt+Udwbh2Yb=zZm35Ji31w@mail.gmail.com

Fix hashjoin memory balancing logic

Commit a1b4f289beec improved the hashjoin sizing to also consider the
memory used by BufFiles for batches. The code however had multiple
issues, making it ineffective or not working as expected in some cases.

* The amount of memory needed by buffers was calculated using uint32,
  so it would overflow for nbatch >= 262144. If this happened the loop
  would exit prematurely and the memory usage would not be reduced.

  The nbatch overflow is fixed by reworking the condition to not use a
  multiplication at all, so there's no risk of overflow. An explicit
  cast was added to a similar calculation in ExecHashIncreaseBatchSize.

* The loop adjusting the nbatch value used hash_table_bytes to calculate
  the old/new size, but then updated only space_allowed. The consequence
  is the total memory usage was not reduced, but all the memory saved by
  reducing the number of batches was used for the internal hash table.

  This was fixed by using only space_allowed. This is also more correct,
  because hash_table_bytes does not account for skew buckets.

* The code was also doubling multiple parameters (e.g. the number of
  buckets for hash table), but was missing overflow protections.

  The loop now checks for overflow, and terminates if needed. It'd be
  possible to cap the value and continue the loop, but it's not worth
  the complexity. And the overflow implies the in-memory hash table is
  already very large anyway.

While at it, rework the comment explaining how the memory balancing
works, to make it more concise and easier to understand.

The initial nbatch overflow issue was reported by Vaibhav Jain. The
other issues were noticed by me and Melanie Plageman. Fix by me, with a
lot of review and feedback by Melanie.

Backpatch to 18, where the hashjoin memory balancing was introduced.

Reported-by: Vaibhav Jain <jainva@google.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Backpatch-through: 18
Discussion: https://postgr.es/m/CABa-Az174YvfFq7rLS+VNKaQyg7inA2exvPWmPWqnEn6Ditr_Q@mail.gmail.com

Remove unused data_bufsz from DecodedBkpBlock struct.

Author: Mikhail Gribkov <youzhick@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CAMEv5_sxuaiAfSy1ZyN%3D7UGbHg3C10cwHhEk8nXEjiCsBVs4vQ%40mail.gmail.com

Fix privilege checks for pg_prewarm() on indexes.

pg_prewarm() currently checks for SELECT privileges on the target
relation.  However, indexes do not have access rights of their own,
so a role may be denied permission to prewarm an index despite
having the SELECT privilege on its parent table.  This commit fixes
this by locking the parent table before the index (to avoid
deadlocks) and checking for SELECT on the parent table.  Note that
the code is largely borrowed from
amcheck_lock_relation_and_check().

An obvious downside of this change is the extra AccessShareLock on
the parent table during prewarming, but that isn't expected to
cause too much trouble in practice.

Author: Ayush Vatsa <ayushvatsa1810@gmail.com>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/CACX%2BKaMz2ZoOojh0nQ6QNBYx8Ak1Dkoko%3DD4FSb80BYW%2Bo8CHQ%40mail.gmail.com
Backpatch-through: 13

Improve TAP tests by replacing ok() with better Test::More functions

Transpose the changes made by commit fabb33b35 in 002_pg_dump.pl
into its recently-created clone 006_pg_dump_compress.pl.

Avoid warnings in tests when openssl binary isn't available

The SSL tests for pg_stat_ssl tries to exactly match the serial
from the certificate by extracting it with the openssl binary.
If that fails due to the binary not being available, a fallback
match is used, but the attempt to execute a missing binary adds
a warning to the output which can confuse readers for a failure
in the test. Fix by only attempting if the openssl binary was
found by autoconf/meson.

Backpatch down to v16 where commit c8e4030d1bdd made the test
use the OPENSSL variable from autoconf/meson instead of a hard-
coded value.

Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/aNPSp1-RIAs3skZm@msg.df7cb.de
Backpatch-through: 16

Change config_generic.vartype to be initialized at compile time

Previously, this was initialized at run time so that it did not have
to be maintained by hand in guc_tables.c. But since that table is now
generated anyway, we might as well generate this bit as well.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/8fdfb91e-60fb-44fa-8df6-f5dea47353c9@eisentraut.org

Use designated initializers for guc_tables

This makes the generating script simpler and the output easier to
read. In the future, it will make it easier to reorder and rearrange
the underlying C structures.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/8fdfb91e-60fb-44fa-8df6-f5dea47353c9@eisentraut.org

ecpg: check return value of replace_variables()

The function returns false if it fails to allocate memory, so
make sure to check the return value in callsites.

Author: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAJ7c6TNPrU8ZxgdfN3PyGY1tzo0bgszx+KkqW0Z7zt3heyC1GQ@mail.gmail.com

Replace defunct URL with stable archive.org URL in rbtree.c

The URL for "Sorting and Searching Algorithms: A Cookbook"
by Thomas Niemann has started returning 404, and since we
refer to the page for license terms this replaces the now
defunct link with one to the copy on archive.org.

Author: Chao Li <lic@highgo.com>
Discussion: https://postgr.es/m/6DED3DEF-875E-4D1D-8F8F-7353D5AF7B79@gmail.com

Improve TAP tests by replacing ok() with better Test::More functions

The TAP tests whose ok() calls are changed in this commit were relying
on perl operators, rather than equivalents available in Test::More. For
example, rather than the following:
ok($data =~ qr/expr/m, "expr matching");
ok($data !~ qr/expr/m, "expr not matching");
The new test code uses this equivalent:
like($data, qr/expr/m, "expr matching");
unlike($data, qr/expr/m, "expr not matching");

A huge benefit of the new formulation is that it is possible to know
about the values we are checking if a failure happens, making debugging
easier, should the test runs happen in the buildfarm, in the CI or
locally.

This change leads to more test code overall as perltidy likes to make
the code pretty the way it is in this commit.

Author: Sadhuprasad Patro <b.sadhu@gmail.com>
Discussion: https://postgr.es/m/CAFF0-CHhwNx_Cv2uy7tKjODUbeOgPrJpW4Rpf1jqB16_1bU2sg@mail.gmail.com

doc: Clarify when backend_xmin in pg_stat_replication can be NULL.

Improve the documentation of pg_stat_replication to explain when
the backend_xmin column becomes NULL. This happens when
a replication slot is used (the xmin is then shown in pg_replication_slots)
or when hot_standby_feedback is disabled.

Author: Renzo Dani <arons7@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CA+XOKQAMXzskpdUmj2sg03_5fmiXc2Gs0r3TX1_rmcFcqh+=xQ@mail.gmail.com

Fix matching check in recovery test 042_low_level_backup

042_low_level_backup compared the result of a query two times with a
comparison operator based on an integer, while the result should be
compared with a string.

The outcome of the tests is currently not impacted by this change.
However, it could be possible that the tests fail to detect future
issues if the query results become different, for some reason.

Oversight in 99b4a63bef94.

Author: Sadhuprasad Patro <b.sadhu@gmail.com>
Discussion: https://postgr.es/m/CAFF0-CHhwNx_Cv2uy7tKjODUbeOgPrJpW4Rpf1jqB16_1bU2sg@mail.gmail.com
Backpatch-through: 17

pg_createsubscriber: Fix matching check in TAP test

040_pg_createsubscriber has been calling safe_psql(), that returns the
result of a SQL query, with ok() without checking the result generated
(in this case 't', for a number of publications).

The outcome of the tests is currently not impacted by this change.
However, it could be possible that the test fails to detect future
issues if the query results become different.

The test is rewritten so as the number of publications is checked. This
is not the fix suggested originally by the author, but this is more
reliable in the long run.

Oversight in e5aeed4b8020.

Author: Sadhuprasad Patro <b.sadhu@gmail.com>
Discussion: https://postgr.es/m/CAFF0-CHhwNx_Cv2uy7tKjODUbeOgPrJpW4Rpf1jqB16_1bU2sg@mail.gmail.com
Backpatch-through: 18

Fix update-po for the PGXS case

The original formulation failed to take into account the fact that for
the PGXS case, the source dir is not $(top_srcdir), so it ended up not
doing anything. Handle it explicitly.

Author: Ryo Matsumura <matsumura.ryo@fujitsu.com>
Reviewed-by: Bryan Green <dbryan.green@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/TYCPR01MB113164770FB0B0BE6ED21E68EE8DCA@TYCPR01MB11316.jpnprd01.prod.outlook.com

Add more TAP test coverage for pg_dump.

Add a test case to cover pg_dump with --compress=none.  This
brings the coverage of compress_none.c up from about 64% to 90%,
in particular covering the new code added in a previous patch.

Include compression of toc.dat in manually-compressed test cases.
We would have found the bug fixed in commit a239c4a0c much sooner
if we'd done this.  As far as I can tell, this doesn't reduce test
coverage at all, since there are other tests of directory format
that still use an uncompressed toc.dat.

Widen the wide row used to verify correct (de) compression.
Commit 1a05c1d25 advises us (not without reason) to ensure that
this test case fully fills DEFAULT_IO_BUFFER_SIZE, so that loops
within the compression logic will iterate completely.  To follow
that advice with the proposed DEFAULT_IO_BUFFER_SIZE of 128K,
we need something close to this.  This does indeed increase the
reported code coverage by a few lines.

While here, fix a glitch that I noticed in testing: the
$glob_patterns tests were incapable of failing, because glob()
will return 'foo' as 'foo' whether there is a matching file or
not.  (Indeed, the stanza just above that one relies on that.)

In my testing, this patch adds approximately as much runtime
as was saved by the previous patch, so that it's about a wash
compared to the old code.  However, we get better test coverage.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us

Split 002_pg_dump.pl into two test files.

Add a new test script 006_pg_dump_compress.pl, containing just
the pg_dump tests specifically concerned with compression, and
remove those tests from 002_pg_dump.pl. We can also drop some
infrastructure in 002_pg_dump.pl that was used only for these tests.

The point of this is to avoid the cost of running these test
cases over and over in all the scenarios (runs) that 002_pg_dump.pl
exercises. We don't learn anything more about the behavior of the
compression code that way, and we expend significant amounts of
time, since one of these test cases is quite large and due to get
larger.

The intent of this specific patch is to provide exactly the same
coverage as before, except that I went back to using --no-sync
in all the test runs moved over to 006_pg_dump_compress.pl.
I think that avoiding that had basically been cargo-culted into
these test cases as a result of modeling them on the
defaults_custom_format test case; again, doing that over and over
isn't going to teach us anything new.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us

Align the data block sizes of pg_dump's various compression modes.

After commit fe8192a95, compress_zstd.c tends to produce data block
sizes around 128K, and we don't really have any control over that
unless we want to overrule ZSTD_CStreamOutSize().  Which seems like
a bad idea.  But let's try to align the other compression modes to
produce block sizes roughly comparable to that, so that pg_restore's
skip-data performance isn't enormously different for different modes.

gzip compression can be brought in line simply by setting
DEFAULT_IO_BUFFER_SIZE = 128K, which this patch does.  That
increases some unrelated buffer sizes, but none of them seem
problematic for modern platforms.

lz4's idea of appropriate block size is highly nonlinear:
if we just increase DEFAULT_IO_BUFFER_SIZE then the output
blocks end up around 200K.  I found that adjusting the slop
factor in LZ4State_compression_init was a not-too-ugly way
of bringing that number roughly into line.

With compress = none you get data blocks the same sizes as the
table rows, which seems potentially problematic for narrow tables.
Introduce a layer of buffering to make that case match the others.

Comments in compress_io.h and 002_pg_dump.pl suggest that if
we increase DEFAULT_IO_BUFFER_SIZE then we need to increase the
amount of data fed through the tests in order to improve coverage.
I've not done that here, leaving it for a separate patch.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us

Remove partColsUpdated.

This information appears to have been unused since commit
c5b7ba4e67. We could not find any references in third-party code,
either.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aO_CyFRpbVMtgJWM%40nathan

Refactor logical worker synchronization code into a separate file.

To support the upcoming addition of a sequence synchronization worker,
this patch extracts common synchronization logic shared by table sync
workers and the new sequence sync worker into a dedicated file. This
modularization improves code reuse, maintainability, and clarity in the
logical workers framework.

Author: vignesh C <vignesh21@gmail.com>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com

Fix EPQ crash from missing partition directory in EState

EvalPlanQualStart() failed to propagate es_partition_directory into
the child EState used for EPQ rechecks. When execution time partition
pruning ran during the EPQ scan, executor code dereferenced a NULL
partition directory and crashed.

Previously, propagating es_partition_directory into the EPQ EState was
unnecessary because CreatePartitionPruneState(), which sets it on
demand, also initialized the exec-pruning context.  After commit
d47cbf474, CreatePartitionPruneState() now initializes only the init-
time pruning context, leaving exec-pruning context initialization to
ExecInitNode(). Since EvalPlanQualStart() runs only ExecInitNode() and
not CreatePartitionPruneState(), it can encounter a NULL
es_partition_directory.  Other executor fields initialized during
CreatePartitionPruneState() are already copied into the child EState
thanks to commit 8741e48e5d, but es_partition_directory was missed.

Fix by borrowing the parent estate's  es_partition_directory in
EvalPlanQualStart(), and by clearing that field in EvalPlanQualEnd()
so the parent remains responsible for freeing the directory.

Add an isolation test permutation that triggers EPQ with execution-
time partition pruning, the case that reproduces this crash.

Bug: #19078
Reported-by: Yuri Zamyatin <yuri@yrz.am>
Diagnosed-by: David Rowley <dgrowleyml@gmail.com>
Author: David Rowley <dgrowleyml@gmail.com>
Co-authored-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/19078-dfd62f840a2c0766@postgresql.org
Backpatch-through: 18

Override log_error_verbosity to "default" in test 009_log_temp_files

Per report from buildfarm member prion.  The CI does not use this
parameter, and this buildfarm member sets log_error_verbosity to
"verbose".  This would generate extra LOCATION entries in the logs,
causing the regexps of the test to fail.

Trying to support log_error_verbosity=verbose in the test would mean to
tweak all the regexps used in the test to detect an optional set of
LOCATION lines, at least.  This would not improve the coverage, and
forcing the GUC value is simpler.

Oversight in 76bba033128a.

Discussion: https://postgr.es/m/aPBaNNGiYT3xMBN1@paquier.xyz

Add tests for logging of temporary file removal and statement

Temporary file usage is sometimes attributed to the wrong query in the logs
output. One identified reason is that unnamed portal cleanup (and
consequently temp file logging) happens during the next BIND message as
a, after debug_query_string has already been updated to the new query.

Dropping an unnamed portal in the next BIND message is a rather old
protocol behavior (fe19e56c572f, also mentioned in the docs).
log_temp_files is a bit newer than that, as of be8a4318815.

This commit adds tests to track which query is displayed next to the
temporary file(s) removed when a portal is dropped, and in some cases if
a query is displayed or not. We have not concluded how to improve the
situation yet; these tests will at least help in checking what changes
in the logs depending on the proposal discussed and how it affects the
scenarios tracked by this new test.

Author: Sami Imseih <samimseih@gmail.com>
Author: Frédéric Yhuel <frederic.yhuel@dalibo.com>
Reviewed-by: Mircea Cadariu <cadariu.mircea@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/3d07ee43-8855-42db-97e0-bad5db82d972@dalibo.com

Fix lookup code for REINDEX INDEX.

This commit adjusts RangeVarCallbackForReindexIndex() to handle an
extremely unlikely race condition involving concurrent OID reuse.
In short, if REINDEX INDEX is executed at the same time that the
index is re-created with the same name and OID but a different
parent table OID, we might lock the wrong parent table.  To fix,
simply detect when this happens and emit an ERROR.  Unfortunately,
we can't gracefully handle this situation because we will have
already locked the index, and we must lock the parent table before
the index to avoid deadlocks.

While at it, I've replaced all but one early return in this
callback function with ERRORs that should be unreachable.  While I
haven't verified the presence of a live bug, the checks in question
appear to be unnecessary, and the early returns seem prone to
breaking the parent table locking code in subtle ways.  If nothing
else, this simplifies the code a bit.

This is a bug fix and could be back-patched, but given the presumed
rarity of the race condition and the lack of reports, I'm not going
to bother.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/Z8zwVmGzXyDdkAXj%40nathan

Add pg_iswalpha() and related functions.

Per-character pg_locale_t APIs. Useful for tsearch parsing and
potentially other places.

Significant overlap with the regc_wc_isalpha() and related functions
in regc_pg_locale.c, but this change leaves those intact for
now.

Discussion: https://postgr.es/m/0151ad01239e2cc7b3139644358cf8f7b9622ff7.camel@j-davis.com

Fix lookups in pg_{clear,restore}_{attribute,relation}_stats().

Presently, these functions look up the relation's OID, lock it, and
then check privileges.  Not only does this approach provide no
guarantee that the locked relation matches the arguments of the
lookup, but it also allows users to briefly lock relations for
which they do not have privileges, which might enable
denial-of-service attacks.  This commit adjusts these functions to
use RangeVarGetRelidExtended(), which is purpose-built to avoid
both of these issues.  The new RangeVarGetRelidCallback function is
somewhat complicated because it must handle both tables and
indexes, and for indexes, we must check privileges on the parent
table and lock it first.  Also, it needs to handle a couple of
extremely unlikely race conditions involving concurrent OID reuse.

A downside of this change is that the coding doesn't allow for
locking indexes in AccessShare mode anymore; everything is locked
in ShareUpdateExclusive mode.  Per discussion, the original choice
of lock levels was intended for a now defunct implementation that
used in-place updates, so we believe this change is okay.

Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/Z8zwVmGzXyDdkAXj%40nathan
Backpatch-through: 18

Change reset_extra into a config_generic common field

This is not specific to the GUC parameter type, so it can be part of
the generic struct rather than the type-specific struct (like the
related "extra" field). This allows for some code simplifications.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/8fdfb91e-60fb-44fa-8df6-f5dea47353c9@eisentraut.org

Add log_autoanalyze_min_duration

The log output functionality of log_autovacuum_min_duration applies to
both VACUUM and ANALYZE, so it is not possible to separate the VACUUM
and ANALYZE log output thresholds. Logs are likely to be output only for
VACUUM and not for ANALYZE.

Therefore, we decided to separate the threshold for log output of VACUUM
by autovacuum (log_autovacuum_min_duration) and the threshold for log
output of ANALYZE by autovacuum (log_autoanalyze_min_duration).

Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Kasahara Tatsuhito <kasaharatt@oss.nttdata.com>
Discussion: https://www.postgresql.org/message-id/flat/CAOzEurQtfV4MxJiWT-XDnimEeZAY+rgzVSLe8YsyEKhZcajzSA@mail.gmail.com

Fix EvalPlanQual handling of foreign/custom joins in ExecScanFetch.

If inside an EPQ recheck, ExecScanFetch would run the recheck method
function for foreign/custom joins even if they aren't descendant nodes
in the EPQ recheck plan tree, which is problematic at least in the
foreign-join case, because such a foreign join isn't guaranteed to have
an alternative local-join plan required for running the recheck method
function; in the postgres_fdw case this could lead to a segmentation
fault or an assert failure in an assert-enabled build when running the
recheck method function.

Even if inside an EPQ recheck, any scan nodes that aren't descendant
ones in the EPQ recheck plan tree should be normally processed by using
the access method function; fix by modifying ExecScanFetch so that if
inside an EPQ recheck, it runs the recheck method function for
foreign/custom joins that are descendant nodes in the EPQ recheck plan
tree as before and runs the access method function for foreign/custom
joins that aren't.

This fix also adds to postgres_fdw an isolation test for an EPQ recheck
that caused issues stated above.

Oversight in commit 385f337c9.

Reported-by: Kristian Lejao <kristianlejao@gmail.com>
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Co-authored-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Discussion: https://postgr.es/m/CAD21AoBpo6Gx55FBOW+9s5X=nUw3Xpq64v35fpDEKsTERnc4TQ@mail.gmail.com
Backpatch-through: 13

Add some const qualifiers

in guc-related source files, in anticipation of some further
restructuring.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/8fdfb91e-60fb-44fa-8df6-f5dea47353c9@eisentraut.org

Modernize some for loops

in guc-related source files, in anticipation of some further
restructuring.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/8fdfb91e-60fb-44fa-8df6-f5dea47353c9@eisentraut.org

plpython: Remove support for major version conflict detection

This essentially reverts commit 866566a690b, which installed
safeguards against loading plpython2 and plpython3 into the same
process. We don't support plpython2 anymore, so this is obsolete.

The Python and PL/Python initialization now happens again in
_PG_init() rather than the first time a PL/Python call handler is
invoked. (Often, these will be very close together.)

I kept the separate PLy_initialize() function introduced by
866566a690b to keep _PG_init() a bit modular.

Reviewed-by: Mario González Troncoso <gonzalemario@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/9eb9feb6-1df3-4f0c-a0dc-9bcf35273111%40eisentraut.org

Standardize use of REFRESH PUBLICATION in code and messages.

This patch replaces ALTER SUBSCRIPTION REFRESH with
ALTER SUBSCRIPTION REFRESH PUBLICATION in comments and error messages to
improve clarity and support future extensibility. The change aligns with
upcoming addition REFRESH SEQUENCES for sequence synchronization.

Author: vignesh C <vignesh21@gmail.com>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com

pg_createsubscriber: Use new routine to retrieve data of PG_VERSION

pg_createsubscriber is documented as requiring the same major version as
the target clusters.  Attempting to use this tool on a cluster where the
control file version read does not match with the version compiled with
would lead to the following error message:
pg_createsubscriber: error: control file appears to be corrupt

This is confusing as the control file is correct: only the version
expected does not match.  This commit integrates pg_createsubscriber
with the facility added by cd0be131ba6f, where the contents of
PG_VERSION are read and compared with the value of PG_MAJORVERSION_NUM
expected by the tool.  This puts pg_createsubscriber in line with the
documentation, with a better error message when the version does not
match.

Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/aONDWig0bIGilixs@paquier.xyz

pg_resetwal: Use new routine to retrieve data of PG_VERSION

pg_resetwal's custom logic to retrieve the version number of a data
folder's PG_VERSION can be replaced by the facility introduced in
cd0be131ba6f. This removes some code.

One thing specific to pg_resetwal is that the first line of PG_VERSION
is read and reported in the error report generated when the major
version read does not match with the version pg_resetwal has been
compiled with. The new logic preserves this property, without changes
to neither the error message nor the data used in the error report.

Note that as a chdir() is done within the data folder before checking the
data of PG_VERSION, get_pg_version() needs to be tweaked to look for
PG_VERSION in the current folder.

Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/aOiirvWJzwdVCXph@paquier.xyz

pg_combinebackup: Use new routine to retrieve data of PG_VERSION

pg_combinebackup's custom logic to retrieve the version number of a data
folder's PG_VERSION can be replaced by the facility introduced in
cd0be131ba6f. This removes some code.

One thing specific to this tool is that backend versions older than v10
are not supported. The new code does the same checks as the previous
code.

Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/aOiirvWJzwdVCXph@paquier.xyz

Revert "pg_createsubscriber: Add log message when no publications exist to drop."

This reverts commit 74ac377d75135e02064fc4427bec401277b4f60c.

The previous change contained a misconception about how publications
are cleaned up on the subscriber. The newly added log message could
confuse users, particularly when running pg_createsubscriber with
--dry-run - users would see a "dropping publication" message
immediately followed by a "no publications found" message.

Discussion: https://postgr.es/m/CAHut+Pu7xz1LqNvyQyvSHrV0Sw6D=e6T-Jm=gh1MRJrkuWGyBQ@mail.gmail.com

Make heap_page_is_all_visible independent of LVRelState

This function only requires a few fields from LVRelState, so pass them
in individually.

This change allows calling heap_page_is_all_visible() from code such as
pruneheap.c, which does not have access to an LVRelState.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/2wk7jo4m4qwh5sn33pfgerdjfujebbccsmmlownybddbh6nawl%40mdyyqpqzxjek

Inline TransactionIdFollows/Precedes[OrEquals]()

These functions appeared prominently in a profile of a patch that sets
the visibility map on-access. Inline them to remove call overhead and
make them cheaper to use in hot paths.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/2wk7jo4m4qwh5sn33pfgerdjfujebbccsmmlownybddbh6nawl%40mdyyqpqzxjek

Add helper for freeze determination to heap_page_prune_and_freeze

After scanning the line pointers on a heap page during the first phase
of vacuum, we use the information collected to decide whether to use
the assembled freeze plans.

Move this decision logic into a helper function to improve readability.

While here, rename a PruneState member and disambiguate some local
variables in heap_page_prune_and_freeze().

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/2wk7jo4m4qwh5sn33pfgerdjfujebbccsmmlownybddbh6nawl%40mdyyqpqzxjek

pg_createsubscriber: Add log message when no publications exist to drop.

When specifying --clean=publication to pg_createsubscriber, it drops
all existing publications with a log message "dropping all existing
publications in database "testdb"". Add a new log message "no
publications found" when there are no publications to drop, making the
progress more transparent to users.

Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAHut+Ptm+WJwbbYXhC0s6FP_98KzZCR=5CPu8F8N5uV8P7BpqA@mail.gmail.com

pg_regc_locale.c: rename some static functions.

Use the more specific prefix "regc_" rather than the generic prefix
"pg_".

A subsequent commit will create generic versions of some of these
functions that can be called from other modules.

Discussion: https://postgr.es/m/0151ad01239e2cc7b3139644358cf8f7b9622ff7.camel@j-davis.com

dblink: Avoid locking relation before privilege check.

The present coding of dblink's get_rel_from_relname() predates the
introduction of RangeVarGetRelidExtended(), which provides a way to
check permissions before locking the relation. This commit adjusts
get_rel_from_relname() to use that function.

Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/aOgmi6avE6qMw_6t%40nathan

Bump XLOG_PAGE_MAGIC after xl_heap_prune change

add323da40a6 altered xl_heap_prune, changing the WAL format, but
neglected to bump XLOG_PAGE_MAGIC. Do so now.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reported-by: Kirill Reshke <reshkekirill@gmail.com>
Reported-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aO3Gw6hCAZFUd5ab%40paquier.xyz

Use ereport rather than elog in WinCheckAndInitializeNullTreatment.

Previously WinCheckAndInitializeNullTreatment() used elog() to emit an
error message. ereport() should be used instead because it's a
user-facing error. Also use existing get_func_name() to get a
function's name, rather than own implementation.

Moreover add an assertion to validate winobj parameter, just like
other window function API.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Tatsuo Ishii <ishii@postgresql.org>
Reviewed-by: Chao Li <lic@highgo.com>
Discussion: https://postgr.es/m/2952409.1760023154%40sss.pgh.pa.us

Rename apply_at to apply_agg_at for clarity

The field name "apply_at" in RelAggInfo was a bit ambiguous. Rename
it to "apply_agg_at" to improve clarity and make its purpose clearer.

Per complaint from David Rowley, Robert Haas.

Suggested-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA+TgmoZ0KR2_XCWHy17=HHcQ3p2Mamc9c6Dnnhf1J6wPYFD9ng@mail.gmail.com

pg_upgrade: Use new routine to retrieve data of PG_VERSION

Unsurprisingly, this shaves code. get_major_server_version() can be
replaced by the new routine added by cd0be131ba6f, with the contents of
PG_VERSION stored in an allocated buffer instead of a fixed-sized one.

Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/aOiirvWJzwdVCXph@paquier.xyz

Introduce frontend API able to retrieve the contents of PG_VERSION

get_pg_version() is able to return a version number, that can be used
for comparisons based on PG_VERSION_NUM. A macro is added to convert
the result to a major version number, to work with PG_MAJORVERSION_NUM.

It is possible to pass to the routine an optional argument, where the
contents retrieved from PG_VERSION are saved. This requirement matters
for some of the frontend code (one example: pg_upgrade wants that for
tablespace paths with a version number strictly older than v10).

This will be used by a set of follow-up patches, to be consumed in
various frontend tools that duplicate a logic similar to do what this
new routine does, like:
- pg_resetwal
- pg_combinebackup
- pg_createsubscriber
- pg_upgrade

This routine supports both the post-v10 version number and the older
flavor (aka 9.6), as required at least by pg_upgrade.

Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/aOiirvWJzwdVCXph@paquier.xyz

Fix version number calculation for data folder flush in pg_combinebackup

The version number calculated by read_pg_version_file() is multiplied
once by 10000, to be able to do comparisons based on PG_VERSION_NUM or
equivalents with a minor version included.  However, the version number
given sync_pgdata() was multiplied by 10000 a second time, leading to an
overestimated number.

This issue was harmless (still incorrect) as pg_combinebackup does not
support versions of Postgres older than v10, and sync_pgdata() only
includes a version check due to the rename of pg_xlog/ to pg_wal/.  This
folder rename happened in the development cycle of v10.  This would
become a problem if in the future  sync_pgdata() is changed to have more
version-specific checks.

Oversight in dc212340058b, so backpatch down to v17.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aOil5d0y87ZM_wsZ@paquier.xyz
Backpatch-through: 17

Eliminate XLOG_HEAP2_VISIBLE from vacuum phase III

Instead of emitting a separate XLOG_HEAP2_VISIBLE WAL record for each
page that becomes all-visible in vacuum's third phase, specify the VM
changes in the already emitted XLOG_HEAP2_PRUNE_VACUUM_CLEANUP record.

Visibility checks are now performed before marking dead items unused.
This is safe because the heap page is held under exclusive lock for the
entire operation.

This reduces the number of WAL records generated by VACUUM phase III by
up to 50%.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com

Fix incorrect message-printing in win32security.c.

log_error() would probably fail completely if used, and would
certainly print garbage for anything that needed to be interpolated
into the message, because it was failing to use the correct printing
subroutine for a va_list argument.

This bug likely went undetected because the error cases this code
is used for are rarely exercised - they only occur when Windows
security API calls fail catastrophically (out of memory, security
subsystem corruption, etc).

The FRONTEND variant can be fixed just by calling vfprintf()
instead of fprintf(). However, there was no va_list variant
of write_stderr(), so create one by refactoring that function.
Following the usual naming convention for such things, call
it vwrite_stderr().

Author: Bryan Green <dbryan.green@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAF+pBj8goe4fRmZ0V3Cs6eyWzYLvK+HvFLYEYWG=TzaM+tWPnw@mail.gmail.com
Backpatch-through: 13

Doc: clarify n_distinct_inherited setting

There was some confusion around how to adjust the n_distinct estimates
for partitioned tables. Here we try and clarify that
n_distinct_inherited needs to be adjusted rather than n_distinct.

Also fix some slightly misleading text which was talking about table
size rather than table rows, fix a grammatical error, and adjust some
text which indicated that ANALYZE was performing calculations based on
the n_distinct settings. Really it's the query planner that does this
and ANALYZE only stores the overridden n_distinct estimate value in
pg_statistic.

Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/CAApHDvrL7a-ZytM1SP8Uk9nEw9bR2CPzVb+uP+bcNj=_q-ZmVw@mail.gmail.com

Fix serious performance problems in LZ4Stream_read_internal.

I was distressed to find that reading an LZ4-compressed toc.dat
file was hundreds of times slower than it ought to be.  On
investigation, the blame mostly affixes to LZ4Stream_read_overflow's
habit of memmove'ing all the remaining buffered data after each read
operation.  Since reading a TOC file tends to involve a lot of small
(even one-byte) decompression calls, that amounts to an O(N^2) cost.

This could have been fixed with a minimal patch, but to my
eyes LZ4Stream_read_internal and LZ4Stream_read_overflow are
badly-written spaghetti code; in particular the eol_flag logic
is inefficient and duplicative.  I chose to throw the code
away and rewrite from scratch.  This version is about sixty
lines shorter as well as not having the performance issue.

Fortunately, AFAICT the only way to get to this problem is to
manually LZ4-compress the toc.dat and/or blobs.toc files within a
directory-style archive; in the main data files, we read blocks
that are large enough that the O(N^2) behavior doesn't manifest.
Few people do that, which likely explains the lack of field
complaints.  Otherwise this performance bug might be considered
bad enough to warrant back-patching.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us

Fix poor buffering logic in pg_dump's lz4 and zstd compression code.

Both of these modules dumped each bit of output that they got from
the underlying compression library as a separate "data block" in
the emitted archive file.  In the case of zstd this'd frequently
result in block sizes well under 100 bytes; lz4 is a little better
but still produces blocks around 300 bytes, at least in the test
case I tried.  This bloats the archive file a little bit compared
to larger block sizes, but the real problem is that when pg_restore
has to skip each data block rather than seeking directly to some
target data, tiny block sizes are enormously inefficient.

Fix both modules so that they fill their allocated buffer reasonably
well before dumping a data block.  In the case of lz4, also delete
some redundant logic that caused the lz4 frame header to be emitted
as a separate data block.  (That saves little, but I see no reason
to expend extra code to get worse results.)

I fixed the "stream API" code too.  In those cases, feeding small
amounts of data to fwrite() probably doesn't have any meaningful
performance consequences.  But it seems like a bad idea to leave
the two sets of code doing the same thing in two different ways.

In passing, remove unnecessary "extra paranoia" check in
_ZstdWriteCommon.  _CustomWriteFunc (the only possible referent
of cs->writeF) already protects itself against zero-length writes,
and it's really a modularity violation for _ZstdWriteCommon to know
that the custom format disallows empty data blocks.  Also, fix
Zstd_read_internal to do less work when passed size == 0.

Reported-by: Dimitrios Apostolou <jimis@gmx.net>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us

Fix issue with reading zero bytes in Gzip_read.

pg_dump expects a read request of zero bytes to be a no-op; see for
example ReadStr().  Gzip_read got this wrong and falsely supposed
that the resulting gzret == 0 indicated an error.  We could complicate
that error-checking logic some more, but it seems best to just fall
out immediately when passed size == 0.

This bug breaks the nominally-supported case of manually gzip'ing
the toc.dat file within a directory-style dump, so back-patch to v16
where this code came in.  (Prior branches already have a short-circuit
for size == 0 before their only gzread call.)

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us
Backpatch-through: 16

docs: Fix protocol version 3.2 message format of CancelRequest

Since protocol version 3.2 the CancelRequest does not have a fixed size
length anymore. The protocol docs still listed the length field to be a
constant number though. This fixes that.

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reported-by: Dmitry Igrishin <dmitigr@gmail.com>
Backpatch-through: 18

Remove extra semicolon in example

Reported-By: Pavel Luzanov <p.luzanov@postgrespro.ru>
Discussion: https://postgr.es/m/175976566145.768.4645962241073007347@wrigleys.postgresql.org
Backpatch-through: 18

Remove unused nbtree array advancement variable.

Remove a variable that is no longer in use following commit 9a2e2a28.
It's not immediately clear why there were no compiler warnings about
this oversight.

Author: Peter Geoghegan <pg@bowt.ie>
Backpatch-through: 18

Restore test coverage of LZ4Stream_gets().

In commit a45c78e32 I removed the only regression test case that
reaches this function, because it turns out that we only use it
if reading an LZ4-compressed blobs.toc file in a directory dump,
and that is a state that has to be created manually. That seems
like a bad thing to not test, not so much for LZ4Stream_gets()
itself as because it means the squirrely eol_flag logic in
LZ4Stream_read_internal() is not tested.

The reason for the change was that I thought the lz4 program did not
have any way to perform compression without explicit specification
of the output file name. However, it turns out that the syntax
synopsis in its man page is a lie, and if you read enough of the
man page you find out that with "-m" it will do what's needful.
So restore the manual compression step in that test case.

Noted while testing some proposed changes in pg_dump's compression
logic.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us
Backpatch-through: 17

Stop creating constraints during DETACH CONCURRENTLY

Commit 71f4c8c6f74b (which implemented DETACH CONCURRENTLY) added code
to create a separate table constraint when a table is detached
concurrently, identical to the partition constraint, on the theory that
such a constraint was needed in case the optimizer had constructed any
query plans that depended on the constraint being there.  However, that
theory was apparently bogus because any such plans would be invalidated.

For hash partitioning, those constraints are problematic, because their
expressions reference the OID of the parent partitioned table, to which
the detached table is no longer related; this causes all sorts of
problems (such as inability of restoring a pg_dump of that table, and
the table no longer working properly if the partitioned table is later
dropped).

We'd like to get rid of all those constraints.  In fact, for branch
master, do that -- no longer create any substitute constraints.
However, out of fear that some users might somehow depend on these
constraints for other partitioning strategies, for stable branches
(back to 14, which added DETACH CONCURRENTLY), only do it for hash
partitioning.

(If you repeatedly DETACH CONCURRENTLY and then ATTACH a partition, then
with this constraint addition you don't need to scan the table in the
ATTACH step, which presumably is good.  But if users really valued this
feature, they would have requested that it worked for non-concurrent
DETACH also.)

Author: Haiyang Li <mohen.lhy@alibaba-inc.com>
Reported-by: Fei Changhong <feichanghong@qq.com>
Reported-by: Haiyang Li <mohen.lhy@alibaba-inc.com>
Backpatch-through: 14
Discussion: https://postgr.es/m/18371-7fef49f63de13f02@postgresql.org
Discussion: https://postgr.es/m/19070-781326347ade7c57@postgresql.org

dbase_redo: Fix Valgrind-reported memory leak

Introduced by my (Álvaro's) commit 9e4f914b5eba, which was itself
backpatched to pg10, though only pg15 and up contain the problem
because of commit 9c08aea6a309.

This isn't a particularly significant leak, but given the fix is
trivial, we might as well backpatch to all branches where it applies, so
do that.

Author: Nathan Bossart <nathandbossart@gmail.com>
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/x4odfdlrwvsjawscnqsqjpofvauxslw7b4oyvxgt5owoyf4ysn@heafjusodrz7

Remove overzealous _bt_killitems assertion.

An assertion in _bt_killitems expected the scan's currPos state to
contain a valid LSN, saved from when currPos's page was initially read.
The assertion failed to account for the fact that even logged relations
can have leaf pages with an invalid LSN when built with wal_level set to
"minimal". Remove the faulty assertion.

Oversight in commit e6eed40e (though note that the assertion was
backpatched to stable branches before 18 by commit 7c319f54).

Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Matthijs van der Vleuten <postgresql@zr40.nl>
Bug: #19082
Discussion: https://postgr.es/m/19082-628e62160dbbc1c1@postgresql.org
Backpatch-through: 13

Fix two typos in xlogstats.h and xlogstats.c

Issue found while browsing this area of the code, introduced and
copy-pasted around by 2258e76f90bf.

Backpatch-through: 15

Remove state.tmp when failing to save a replication slot

An error happening while a slot data is saved on disk in
SaveSlotToPath() could cause a state.tmp file (temporary file holding
the slot state data, renamed to its permanent name at the end of the
function) to remain around after it has been created.  This temporary
file is created with O_EXCL, meaning that if an existing state.tmp is
found, its creation would fail.  This would prevent the slot data to be
saved, requiring a manual intervention to remove state.tmp before being
able to save again a slot.  Possible scenarios where this temporary file
could remain on disk is for example a ENOSPC case (no disk space) while
writing, syncing or renaming it.  The bug reports point to a write
failure as the principal cause of the problems.

Using O_TRUNC has been argued back in 2019 as a potential solution to
discard any temporary file that could exist.  This solution was rejected
as O_EXCL can also act as a safety measure when saving the slot state,
crash recovery offering cleanup guarantees post-crash.  This commit uses
the alternative approach that has been suggested by Andres Freund back
in 2019.  When the temporary state file cannot be written, synced,
closed or renamed (note: not when created!), an unlink() is used to
remove the temporary state file while holding the in-progress I/O
LWLock, so as any follow-up attempts to save a slot's data would not
choke on an existing file that remained around because of a previous
failure.

This problem has been reported a few times across the years, going back
to 2019, but for some reason I have never come back to do something
about it and it has been forgotten.  A recent report has reminded me
that this was still a problem.

Reported-by: Kevin K Biju <kevinkbiju@gmail.com>
Reported-by: Sergei Kornilov <sk@zsrv.org>
Reported-by: Grigory Smolkin <g.smolkin@postgrespro.ru>
Discussion: https://postgr.es/m/CAM45KeHa32soKL_G8Vk38CWvTBeOOXcsxAPAs7Jt7yPRf2mbVA@mail.gmail.com
Discussion: https://postgr.es/m/3559061693910326@qy4q4a6esb2lebnz.sas.yp-c.yandex.net
Discussion: https://postgr.es/m/08bbfab1-a61d-3750-fc18-4ab2c1aa7f09@postgrespro.ru
Backpatch-through: 13

bufmgr: Fix valgrind checking for buffers pinned in StrategyGetBuffer()

In 5e899859287 I made StrategyGetBuffer() pin buffers with a single CAS,
instead of using PinBuffer_Locked(). Unfortunately I missed that
PinBuffer_Locked() marked the page as defined for valgrind.

Fix this oversight by centralizing the valgrind initialization into
TrackNewBufferPin(), which also allows us to reduce the number of places doing
VALGRIND_MAKE_MEM_DEFINED.

Per buildfarm animal skink and Amit Langote.

Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
Discussion: https://postgr.es/m/CA+HiwqGKJ6nEXEPQW7EpykVsEtzxp5-up_xhtcUAkWFtATVQvQ@mail.gmail.com

test_bitmapset: Improve random function

test_random_operations() did not check the result returned by
bms_is_member() in its last phase, when checking that the contents of
the bitmap match with what is expected. This was impacting the
reliability of the function and the coverage it could provide.

This commit improves the whole function, adding more checks based on
bms_is_member(), using a bitmap and a secondary array that tracks the
members added by random additions and deletions.

While on it, more comments are added to document the internals of the
function.

Reported-by: Ranier Vilela <ranier.vf@gmail.com>
Author: Greg Burd <greg@burd.me>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAEudQAq_zOSA2NUQSWePTGV_=90Uw0WcXxGOWnN-vwF046OOqA@mail.gmail.com

Eliminate COPY FREEZE use of XLOG_HEAP2_VISIBLE

Instead of emitting a separate WAL XLOG_HEAP2_VISIBLE record for setting
bits in the VM, specify the VM block changes in the
XLOG_HEAP2_MULTI_INSERT record.

This halves the number of WAL records emitted by COPY FREEZE.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com

Cleanup VACUUM option processing error messages

The processing of the PARALLEL option for VACUUM was not quite
following what the DefElem code had intended. defGetInt32() already has
code to handle missing parameters and returns a perfectly good error
message for when that happens.

Here we get rid of the ExecVacuum() error:

ERROR: parallel option requires a value between 0 and N

and leave defGetInt32() handle it, which will give:

ERROR: parallel requires an integer value

defGetInt32() was already handling the non-integer parameter case, so it
may as well handle the missing parameter case too.

Additionally, parameterize the option name to make translator work easier,
and also use errhint_internal() rather than errhint() for the
BUFFER_USAGE_LIMIT option since there isn't any work for a translator to
do for "%s".

Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAApHDvovH14tNWB+WvP6TSbfi7-=TysQ9h5tQ5AgavwyWRWKHA@mail.gmail.com

Clean up memory leakage that occurs in context callback functions.

An error context callback function might leak some memory into
ErrorContext, since those functions are run with ErrorContext as
current context.  In the case where the elevel is ERROR, this is
no problem since the code level that catches the error should do
FlushErrorState to clean up, and that will reset ErrorContext.
However, if the elevel is less than ERROR then no such cleanup occurs.
In principle, repeated leaks while emitting log messages or client
notices could accumulate arbitrarily much leaked data, if no ERROR
occurs in the session.

To fix, let errfinish() perform an ErrorContext reset if it is
at the outermost error nesting level.  (If it isn't, we'll delay
cleanup until the outermost nesting level is exited.)

The only actual leakage of this sort that I've been able to observe
within our regression tests was recently introduced by commit
f727b63e8.  While it seems plausible that there are other such
leaks not reached in the regression tests, the lack of field
reports suggests that they're not a big problem.  Accordingly,
I won't take the risk of back-patching this now.  We can always
back-patch later if we get field reports of leaks.

Reported-by: Andres Freund <andres@anarazel.de>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/jngsjonyfscoont4tnwi2qoikatpd5hifsg373vmmjvugwiu6g@m6opxh7uisgd

Fix access-to-already-freed-memory issue in pgoutput.

While pgoutput caches relation synchronization information in
RelationSyncCache that resides in CacheMemoryContext, each entry's
information (such as row filter expressions and column lists) is
stored in the entry's private memory context (entry_cxt in
RelationSyncEntry), which is a descendant memory context of the
decoding context. If a logical decoding invoked via SQL functions like
pg_logical_slot_get_binary_changes fails with an error, subsequent
logical decoding executions could access already-freed memory of the
entry's cache, resulting in a crash.

With this change, it's ensured that RelationSyncCache is cleaned up
even in error cases by using a memory context reset callback function.

Backpatch to 15, where entry_cxt was introduced for column filtering
and row filtering.

While the backbranches v13 and v14 have a similar issue where
RelationSyncCache persists even after an error when pgoutput is used
via SQL API, we decided not to backport this fix. This decision was
made because v13 is approaching its final minor release, and we won't
have an chance to fix any new issues that might arise. Additionally,
since using pgoutput via SQL API is not a common use case, the risk
outwights the benefit. If we receive bug reports, we can consider
backporting the fixes then.

Author: vignesh C <vignesh21@gmail.com>
Co-authored-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/CALDaNm0x-aCehgt8Bevs2cm=uhmwS28MvbYq1=s2Ekf0aDPkOA@mail.gmail.com
Backpatch-through: 15

Avoid uninitialized-variable warnings from older compilers.

Some of the buildfarm is still unhappy with WinGetFuncArgInPartition
even after 2273fa32b. While it seems to be just very old compilers,
we can suppress the warnings and arguably make the code more readable
by not initializing these variables till closer to where they are
used. While at it, make a couple of cosmetic comment improvements.

Fix comment in eager_aggregate.sql

The comment stated that eager aggregation is disabled by default,
which is no longer true. This patch removes that comment as well as
the related GUC set statement.

Reported-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvr4YWpiMR3RsgYwJWv-u8xoRqTAKRiYy9zUszjZOqG4Ug@mail.gmail.com

Remove unnecessary include of "utils/fmgroids.h"

In initsplan.c, no macros for built-in function OIDs are used, so this
include is unnecessary and can be removed. This was my oversight in
commit 8e1185910.

Discussion: https://postgr.es/m/CAMbWs4_-sag-cAKrLJ+X+5njL1=oudk=+KfLmsLZ5a2jckn=kg@mail.gmail.com

Remove duplicated log related to slot creation in pg_createsubscriber

The creation of a replication slot done in a specific database on a
publisher was logged twice, with the second log not mentioning the
database where the slot creation happened. This commit removes the
information logged after a slot has been successfully created, moving
the information about the publisher from the second to the first log.
Note that failing a slot creation is also logged, so there is no loss of
information.

Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHut+Pv7qDvLbDgc9PQGhULT3rPXTxdu_=w+iW-kMs+zPADR+w@mail.gmail.com

Add "ALL SEQUENCES" support to publications.

This patch adds support for the ALL SEQUENCES clause in publications,
enabling synchronization/replication of all sequences that is useful for
upgrades.

Publications can now include all sequences via FOR ALL SEQUENCES.
psql enhancements:
\d shows publications for a given sequence.
\dRp indicates if a publication includes all sequences.

ALL SEQUENCES can be combined with ALL TABLES, but not with other options
like TABLE or TABLES IN SCHEMA. We can extend support for more granular
clauses in future.

The view pg_publication_sequences provides information about the mapping
between publications and sequences.

This patch enables publishing of sequences; subscriber-side support will
be added in upcoming patches.

Author: vignesh C <vignesh21@gmail.com>
Author: Tomas Vondra <tomas@vondra.me>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com

Fix internal error from CollateExpr in SQL/JSON DEFAULT expressions

SQL/JSON functions such as JSON_VALUE could fail with "unrecognized
node type" errors when a DEFAULT clause contained an explicit COLLATE
expression. That happened because assign_collations_walker() could
invoke exprSetCollation() on a JsonBehavior expression whose DEFAULT
still contained a CollateExpr, which exprSetCollation() does not
handle.

For example:

SELECT JSON_VALUE('{"a":1}', '$.c' RETURNING text
DEFAULT 'A' COLLATE "C" ON EMPTY);

Fix by validating in transformJsonBehavior() that the DEFAULT
expression's collation matches the enclosing JSON expression’s
collation. In exprSetCollation(), replace the recursive call on the
JsonBehavior expression with an assertion that its collation already
matches the target, since the parser now enforces that condition.

Reported-by: Jian He <jian.universality@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CACJufxHVwYYSyiVQ6o+PsRX6zQ7rAFinh_fv1kCfTsT1xG4Zeg@mail.gmail.com
Backpatch-through: 17

Make truncate_useless_pathkeys() consider WindowFuncs

truncate_useless_pathkeys() seems to have neglected to account for
PathKeys that might be useful for WindowClause evaluation.  Modify it so
that it properly accounts for that.

Making this work required adjusting two things:

1. Change from checking query_pathkeys to check sort_pathkeys instead.
2. Add explicit check for window_pathkeys

For #1, query_pathkeys gets set in standard_qp_callback() according to the
sort order requirements for the first operation to be applied after the
join planner is finished, so this changes depending on which upper
planner operations a particular query needs.  If the query has window
functions and no GROUP BY, then query_pathkeys gets set to
window_pathkeys.  Before this change, this meant PathKeys useful for the
ORDER BY were not accounted for in queries with window functions.

Because of #1, #2 is now required so that we explicitly check to ensure
we don't truncate away PathKeys useful for window functions.

Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvrj3HTKmXoLMbUjTO=_MNMxM=cnuCSyBKidAVibmYPnrg@mail.gmail.com

bufmgr: Don't lock buffer header in StrategyGetBuffer()

Previously StrategyGetBuffer() acquired the buffer header spinlock for every
buffer, whether it was reusable or not. If reusable, it'd be returned, with
the lock held, to GetVictimBuffer(), which then would pin the buffer with
PinBuffer_Locked(). That's somewhat violating the spirit of the guidelines for
holding spinlocks (i.e. that they are only held for a few lines of consecutive
code) and necessitates using PinBuffer_Locked(), which scales worse than
PinBuffer() due to holding the spinlock. This alone makes it worth changing
the code.

However, the main reason to change this is that a future commit will make
PinBuffer_Locked() slower (due to making UnlockBufHdr() slower), to gain
scalability for the much more common case of pinning a pre-existing buffer. By
pinning the buffer with a single atomic operation, iff the buffer is reusable,
we avoid any potential regression for miss-heavy workloads. There strictly are
fewer atomic operations for each potential buffer after this change.

The price for this improvement is that freelist.c needs two CAS loops and
needs to be able to set up the resource accounting for pinned buffers. The
latter is achieved by exposing a new function for that purpose from bufmgr.c,
that seems better than exposing the entire private refcount infrastructure.
The improvement seems worth the complexity.

Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff

bufmgr: fewer calls to BufferDescriptorGetContentLock

We're planning to merge buffer content locks into BufferDesc.state. To reduce
the size of that patch, centralize calls to BufferDescriptorGetContentLock().

The biggest part of the change is in assertions, by introducing
BufferIsLockedByMe[InMode]() (and removing BufferIsExclusiveLocked()). This
seems like an improvement even without aforementioned plans.

Additionally replace some direct calls to LWLockAcquire() with calls to
LockBuffer().

Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff

bufmgr: Fix signedness of mask variable in BufferSync()

BM_PERMANENT is defined as 1U<<31, which is a negative number when interpreted
as a signed integer. Unfortunately the mask variable in BufferSync() was
signed. This has been wrong for a long time, but failed to fail, due to
integer conversion rules.

However, in an upcoming patch the width of the state variable will be
increased, with the wrong signedness leading to never flushing permanent
buffers - luckily caught in a test.

It seems better to fix this separately, instead of doing so as part of a
large, otherwise mechanical, patch.

Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff

bufmgr: Introduce FlushUnlockedBuffer

There were several copies of code locking a buffer, flushing its contents, and
unlocking the buffer. It seems worth centralizing that into a helper function.

Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff