Tom Lane [Tue, 31 Jan 2023 19:32:24 +0000 (14:32 -0500)]
Doc: clarify use of NULL to drop comments and security labels.
This was only mentioned in the description of the text/label, which
are marked as being in quotes in the synopsis, which can cause
confusion (as witnessed on IRC).
Also separate the literal and NULL cases in the parameter list, per
suggestion from Tom Lane.
Michael Paquier [Tue, 31 Jan 2023 03:47:20 +0000 (12:47 +0900)]
Remove recovery test 011_crash_recovery.pl
This test has been added as of 857ee8e that has introduced the SQL
function txid_status(), with the purpose of checking that a transaction
ID still in-progress during a crash is correctly marked as aborted after
recovery finishes.
This test is unstable, and some configuration scenarios may that easier
to reproduce (wal_level=minimal, wal_compression=on) because the WAL
holding the information about the in-progress transaction ID may not
have made it to disk yet, hence a post-crash recovery may cause the same
XID to be reused, triggering a test failure.
We have discussed a few approaches, like making this function force a
WAL flush to make it reliable across crashes, but we don't want to pay a
performance penalty in some scenarios, as well. The test could have
been tweaked to enforce a checkpoint but that actually breaks the
promise of the test to rely on a stable result of txid_status() after
a crash.
This issue has been reported a few times across the past years, with an
original report from Kyotaro Horiguchi. The buildfarm machines tanager,
hachi and gokiburi enable wal_compression, and fail on this test
periodically.
Thomas Munro [Thu, 26 Jan 2023 01:50:07 +0000 (14:50 +1300)]
Fix rare sharedtuplestore.c corruption.
If the final chunk of an oversized tuple being written out to disk was
exactly 32760 bytes, it would be corrupted due to a fencepost bug.
Bug #17619. Back-patch to 11 where the code arrived.
While testing that (see test module in archives), I (tmunro) noticed
that the per-participant page counter was not initialized to zero as it
should have been; that wasn't a live bug when it was written since DSM
memory was originally always zeroed, but since 14
min_dynamic_shared_memory might be configured and it supplies non-zeroed
memory, so that is also fixed here.
Andres Freund [Tue, 24 Jan 2023 02:04:02 +0000 (18:04 -0800)]
Fix error handling in libpqrcv_connect()
When libpqrcv_connect (also known as walrcv_connect()) failed, it leaked the
libpq connection. In most paths that's fairly harmless, as the calling process
will exit soon after. But e.g. CREATE SUBSCRIPTION could lead to a somewhat
longer lived leak.
Fix by releasing resources, including the libpq connection, on error.
Add a test exercising the error code path. To make it reliable and safe, the
test tries to connect to port=-1, which happens to fail during connection
establishment, rather than during connection string parsing.
Tom Lane [Sat, 21 Jan 2023 18:10:30 +0000 (13:10 -0500)]
Allow REPLICA IDENTITY to be set on an index that's not (yet) valid.
The motivation for this change is that when pg_dump dumps a
partitioned index that's marked REPLICA IDENTITY, it generates a
command sequence that applies REPLICA IDENTITY before the partitioned
index has been marked valid, causing restore to fail. We could
perhaps change pg_dump to not do it like that, but that would be
difficult and would not fix existing dump files with the problem.
There seems to be very little reason for the backend to disallow
this anyway --- the code ignores indisreplident when the index
isn't valid --- so instead let's fix it by allowing the case.
Commit 9511fb37a previously expressed a concern that allowing
indisreplident to be set on invalid indexes might allow us to
wind up in a situation where a table could have indisreplident
set on multiple indexes. I'm not sure I follow that concern
exactly, but in any case the only way that could happen is because
relation_mark_replica_identity is too trusting about the existing set
of markings being valid. Let's just rip out its early-exit code path
(which sure looks like premature optimization anyway; what are we
doing expending code to make redundant ALTER TABLE ... REPLICA
IDENTITY commands marginally faster and not-redundant ones marginally
slower?) and fix it to positively guarantee that no more than one
index is marked indisreplident.
The pg_dump failure can be demonstrated in all supported branches,
so back-patch all the way. I chose to back-patch 9511fb37a as well,
just to keep indisreplident handling the same in all branches.
Noah Misch [Sat, 21 Jan 2023 14:08:00 +0000 (06:08 -0800)]
Reject CancelRequestPacket having unexpected length.
When the length was too short, the server read outside the allocation.
That yielded the same log noise as sending the correct length with
(backendPID,cancelAuthCode) matching nothing. Change to a message about
the unexpected length. Given the attacker's lack of control over the
memory layout and the general lack of diversity in memory layouts at the
code in question, we doubt a would-be attacker could cause a segfault.
Hence, while the report arrived via security@postgresql.org, this is not
a vulnerability. Back-patch to v11 (all supported versions).
Andrey Borodin, reviewed by Tom Lane. Reported by Andrey Borodin.
Tom Lane [Fri, 20 Jan 2023 16:58:12 +0000 (11:58 -0500)]
Make our back branches build under -fkeep-inline-functions.
Add "#ifndef FRONTEND" where necessary to make pg_waldump build
on compilers that don't elide unused static-inline functions.
This back-patches relevant parts of commit 3e9ca5260, fixing build
breakage from dc7420c2c and back-patching of f10f0ae42.
Per recently-resurrected buildfarm member castoroides. We aren't
expecting castoroides to build anything newer than v11, but we
might as well clean up the intermediate branches while at it.
Tom Lane [Thu, 19 Jan 2023 17:23:20 +0000 (12:23 -0500)]
Log the correct ending timestamp in recovery_target_xid mode.
When ending recovery based on recovery_target_xid matching with
recovery_target_inclusive = off, we printed an incorrect timestamp
(always 2000-01-01) in the "recovery stopping before ... transaction"
log message. This is a consequence of sloppy refactoring in c945af80c: the code to fetch recordXtime out of the commit/abort
record used to be executed unconditionally, but it was changed
to get called only in the RECOVERY_TARGET_TIME case. We need only
flip the order of operations to restore the intended behavior.
Per report from Torsten Förtsch. Back-patch to all supported
branches.
Michael Paquier [Thu, 19 Jan 2023 04:13:34 +0000 (13:13 +0900)]
Add missing assign hook for GUC checkpoint_completion_target
This is wrong since 88e9823, that has switched the WAL sizing
configuration from checkpoint_segments to min_wal_size and
max_wal_size. This missed the recalculation of the internal value of
the internal "CheckPointSegments", that works as a mapping of the old
GUC checkpoint_segments, on reload, for example, and it controls the
timing of checkpoints depending on the volume of WAL generated.
Most users tend to leave checkpoint_completion_target at 0.9 to smooth
the I/O workload, which is why I guess this has gone unnoticed for so
long, still it can be useful to tweak and reload the value dynamically
in some cases to control the timing of checkpoints.
Tom Lane [Tue, 17 Jan 2023 21:00:39 +0000 (16:00 -0500)]
AdjustUpgrade.pm should zap test_ext_cine, too.
test_extensions' test_ext_cine extension has the same upgrade hazard
as test_ext7: the regression test leaves it in an updated state
from which no downgrade path to default is provided. This causes
the update_extensions.sql script helpfully provided by pg_upgrade
to fail. So drop it in cross-version-upgrade testing.
Not entirely sure how come I didn't hit this in testing yesterday;
possibly I'd built the upgrade reference databases with
testmodules-install-check disabled.
Backpatch to v10 where this module was introduced.
Tom Lane [Tue, 17 Jan 2023 01:35:53 +0000 (20:35 -0500)]
Create common infrastructure for cross-version upgrade testing.
To test pg_upgrade across major PG versions, we have to be able to
modify or drop any old objects with no-longer-supported properties,
and we have to be able to deal with cosmetic changes in pg_dump output.
Up to now, the buildfarm and pg_upgrade's own test infrastructure had
separate implementations of the former, and we had nothing but very
ad-hoc rules for the latter (including an arbitrary threshold on how
many lines of unchecked diff were okay!). This patch creates a Perl
module that can be shared by both those use-cases, and adds logic
that deals with pg_dump output diffs in a much more tightly defined
fashion.
This largely supersedes previous efforts in commits 0df9641d3, 9814ff550, and 62be9e4cd, which developed a SQL-script-based solution
for the task of dropping old objects. There was nothing fundamentally
wrong with that work in itself, but it had no basis for solving the
output-formatting problem. The most plausible way to deal with
formatting is to build a Perl module that can perform editing on the
dump files; and once we commit to that, it makes more sense for the
same module to also embed the knowledge of what has to be done for
dropping old objects.
Back-patch versions of the helper module as far as 9.2, to
support buildfarm animals that still test that far back.
It's also necessary to back-patch PostgreSQL/Version.pm,
because the new code depends on that. I fixed up pg_upgrade's
002_pg_upgrade.pl in v15, but did not look into back-patching
it further than that.
Thomas Munro [Thu, 12 Jan 2023 21:40:52 +0000 (10:40 +1300)]
Fix WaitEventSetWait() buffer overrun.
The WAIT_USE_EPOLL and WAIT_USE_KQUEUE implementations of
WaitEventSetWaitBlock() confused the size of their internal buffer with
the size of the caller's output buffer, and could ask the kernel for too
many events. In fact the set of events retrieved from the kernel needs
to be able to fit in both buffers, so take the smaller of the two.
The WAIT_USE_POLL and WAIT_USE WIN32 implementations didn't have this
confusion.
This probably didn't come up before because we always used the same
number in both places, but commit 7389aad6 calculates a dynamic size at
construction time, while using MAXLISTEN for its output event buffer on
the stack. That seems like a reasonable thing to want to do, so
consider this to be a pre-existing bug worth fixing.
As discovered by valgrind on skink.
Back-patch to all supported releases for epoll, and to release 13 for
the kqueue part, which copied the incorrect epoll code.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/901504.1673504836%40sss.pgh.pa.us
Dean Rasheed [Fri, 6 Jan 2023 11:09:56 +0000 (11:09 +0000)]
Fix tab completion of ALTER FUNCTION/PROCEDURE/ROUTINE ... SET SCHEMA.
The ALTER DATABASE|FUNCTION|PROCEDURE|ROLE|ROUTINE|USER ... SET <name>
case in psql tab completion failed to exclude <name> = "SCHEMA", which
caused ALTER FUNCTION|PROCEDURE|ROUTINE ... SET SCHEMA to complete
with "FROM CURRENT" and "TO", which won't work.
Fix that, so that those cases now complete with the list of schemas,
like other ALTER ... SET SCHEMA commands.
Noticed while testing the recent patch to improve tab completion for
ALTER FUNCTION/PROCEDURE/ROUTINE, but this is not directly related to
that patch. Rather, this is a long-standing bug, so back-patch to all
supported branches.
Robert Haas [Tue, 3 Jan 2023 19:50:40 +0000 (14:50 -0500)]
Improve documentation of the CREATEROLE attibute.
In user-manag.sgml, document precisely what privileges are conveyed
by CREATEROLE. Make particular note of the fact that it allows
changing passwords and granting access to high-privilege roles.
Also remove the suggestion of using a user with CREATEROLE and
CREATEDB instead of a superuser, as there is no real security
advantage to this approach.
Elsewhere in the documentation, adjust text that suggests that
<literal>CREATEROLE</literal> only allows for role creation, and
refer to the documentation in user-manag.sgml as appropriate.
Michael Paquier [Tue, 3 Jan 2023 07:26:38 +0000 (16:26 +0900)]
Fix typos in comments, code and documentation
While on it, newlines are removed from the end of two elog() strings.
The others are simple grammar mistakes. One comment in pg_upgrade
referred incorrectly to sequences since a7e5457.
Andres Freund [Thu, 29 Dec 2022 20:47:29 +0000 (12:47 -0800)]
perl: Hide warnings inside perl.h when using gcc compatible compiler
New versions of perl trigger warnings within perl.h with our compiler
flags. At least -Wdeclaration-after-statement, -Wshadow=compatible-local are
known to be problematic.
To avoid these warnings, conditionally use #pragma GCC system_header before
including plperl.h.
Alternatively, we could add the include paths for problematic headers with
-isystem, but that is a larger hammer and is harder to search for.
A more granular alternative would be to use #pragma GCC diagnostic
push/ignored/pop, but gcc warns about unknown warnings being ignored, so every
to-be-ignored-temporarily compiler warning would require its own pg_config.h
symbol and #ifdef.
As the warnings are voluminous, it makes sense to backpatch this change. But
don't do so yet, we first want gather buildfarm coverage - it's e.g. possible
that some compiler claiming to be gcc compatible has issues with the pragma.
Author: Andres Freund <andres@anarazel.de> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: Discussion: https://postgr.es/m/20221228182455.hfdwd22zztvkojy2@awork3.anarazel.de
Tom Lane [Mon, 2 Jan 2023 21:17:00 +0000 (16:17 -0500)]
Avoid reference to nonexistent array element in ExecInitAgg().
When considering an empty grouping set, we fetched
phasedata->eqfunctions[-1]. Because the eqfunctions array is
palloc'd, that would always be an aset pointer in released versions,
and thus the code accidentally failed to malfunction (since it would
do nothing unless it found a null pointer). Nonetheless this seems
like trouble waiting to happen, so add a check for length == 0.
It's depressing that our valgrind testing did not catch this.
Maybe we should reconsider the choice to not mark that word NOACCESS?
Michael Paquier [Fri, 23 Dec 2022 01:04:37 +0000 (10:04 +0900)]
Fix come incorrect elog() messages in aclchk.c
Three error strings used with cache lookup failures were referring to
incorrect object types for ACL checks:
- Schemas
- Types
- Foreign Servers
There errors should never be triggered, but if they do incorrect
information would be reported.
Tom Lane [Thu, 22 Dec 2022 15:35:03 +0000 (10:35 -0500)]
Add some recursion and looping defenses in prepjointree.c.
Andrey Lepikhov demonstrated a case where we spend an unreasonable
amount of time in pull_up_subqueries(). Not only is that recursing
with no explicit check for stack overrun, but the code seems not
interruptable by control-C. Let's stick a CHECK_FOR_INTERRUPTS
there, along with sprinkling some stack depth checks.
An actual fix for the excessive time consumption seems a bit
risky to back-patch; but this isn't, so let's do so.
Tom Lane [Wed, 21 Dec 2022 22:51:50 +0000 (17:51 -0500)]
Fix contrib/seg to be more wary of long input numbers.
seg stores the number of significant digits in an input number
in a "char" field. If char is signed, and the input is more than
127 digits long, the count can read out as negative causing
seg_out() to print garbage (or, if you're really unlucky,
even crash).
To fix, clamp the digit count to be not more than FLT_DIG.
(In theory this loses some information about what the original
input was, but it doesn't seem like useful information; it would
not survive dump/restore in any case.)
Also, in case there are stored values of the seg type containing
bad data, add a clamp in seg_out's restore() subroutine.
Per bug #17725 from Robins Tharakan. It's been like this
forever, so back-patch to all supported branches.
Tom Lane [Tue, 13 Dec 2022 19:23:59 +0000 (14:23 -0500)]
Rethink handling of [Prevent|Is]InTransactionBlock in pipeline mode.
Commits f92944137 et al. made IsInTransactionBlock() set the
XACT_FLAGS_NEEDIMMEDIATECOMMIT flag before returning "false",
on the grounds that that kept its API promises equivalent to those of
PreventInTransactionBlock(). This turns out to be a bad idea though,
because it allows an ANALYZE in a pipelined series of commands to
cause an immediate commit, which is unexpected.
Furthermore, if we return "false" then we have another issue,
which is that ANALYZE will decide it's allowed to do internal
commit-and-start-transaction sequences, thus possibly unexpectedly
committing the effects of previous commands in the pipeline.
To fix the latter situation, invent another transaction state flag
XACT_FLAGS_PIPELINING, which explicitly records the fact that we
have executed some extended-protocol command and not yet seen a
commit for it. Then, require that flag to not be set before allowing
InTransactionBlock() to return "false".
Having done that, we can remove its setting of NEEDIMMEDIATECOMMIT
without fear of causing problems. This means that the API guarantees
of IsInTransactionBlock now diverge from PreventInTransactionBlock,
which is mildly annoying, but it seems OK given the very limited usage
of IsInTransactionBlock. (In any case, a caller preferring the old
behavior could always set NEEDIMMEDIATECOMMIT for itself.)
For consistency also require XACT_FLAGS_PIPELINING to not be set
in PreventInTransactionBlock. This too is meant to prevent commands
such as CREATE DATABASE from silently committing previous commands
in a pipeline.
Per report from Peter Eisentraut. As before, back-patch to all
supported branches (which sadly no longer includes v10).
Tom Lane [Sun, 4 Dec 2022 18:17:18 +0000 (13:17 -0500)]
Fix generate_partitionwise_join_paths() to tolerate failure.
We might fail to generate a partitionwise join, because
reparameterize_path_by_child() does not support all path types.
This should not be a hard failure condition: we should just fall back
to a non-partitioned join. However, generate_partitionwise_join_paths
did not consider this possibility and would emit the (misleading)
error "could not devise a query plan for the given query" if we'd
failed to make any paths for a child join. Fix it to give up on
partitionwise joining if so. (The accepted technique for giving up
appears to be to set rel->nparts = 0, which I find pretty bizarre,
but there you have it.)
I have not added a test case because there'd be little point:
any omissions of this sort that we identify would soon get fixed
by extending reparameterize_path_by_child(), so the test would stop
proving anything. However, right now there is a known test case based
on failure to cover MaterialPath, and with that I've found that this
is broken in all supported versions. Hence, patch all the way back.
Original report and patch by me; thanks to Richard Guo for
identifying a test case that works against committed versions.
Dean Rasheed [Sat, 3 Dec 2022 12:20:02 +0000 (12:20 +0000)]
Fix DEFAULT handling for multi-row INSERT rules.
When updating a relation with a rule whose action performed an INSERT
from a multi-row VALUES list, the rewriter might skip processing the
VALUES list, and therefore fail to replace any DEFAULTs in it. This
would lead to an "unrecognized node type" error.
The reason was that RewriteQuery() assumed that a query doing an
INSERT from a multi-row VALUES list would necessarily only have one
item in its fromlist, pointing to the VALUES RTE to read from. That
assumption is correct for the original query, but not for product
queries produced for rule actions. In such cases, there may be
multiple items in the fromlist, possibly including multiple VALUES
RTEs.
What is required instead is for RewriteQuery() to skip any RTEs from
the product query's originating query, which might include one or more
already-processed VALUES RTEs. What's left should then include at most
one VALUES RTE (from the rule action) to be processed.
Andres Freund [Sat, 3 Dec 2022 01:50:26 +0000 (17:50 -0800)]
Prevent pgstats from getting confused when relkind of a relation changes
When the relkind of a relache entry changes, because a table is converted into
a view, pgstats can get confused in 15+, leading to crashes or assertion
failures.
For HEAD, Tom fixed this in b23cd185fd5, by removing support for converting a
table to a view, removing the source of the inconsistency. This commit just
adds an assertion that a relcache entry's relkind does not change, just in
case we end up with another case of that in the future. As there's no cases of
changing relkind anymore, we can't add a test that that's handled correctly.
For 15, fix the problem by not maintaining the association with the old pgstat
entry when the relkind changes during a relcache invalidation processing. In
that case the pgstat entry needs to be unlinked first, to avoid
PgStat_TableStatus->relation getting out of sync. Also add a test reproducing
the issues.
No known problem exists in 11-14, so just add the test there.
Reported-by: vignesh C <vignesh21@gmail.com>
Author: Andres Freund <andres@anarazel.de> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CALDaNm2yXz+zOtv7y5zBd5WKT8O0Ld3YxikuU3dcyCvxF7gypA@mail.gmail.com
Discussion: https://postgr.es/m/CALDaNm3oZA-8Wbps2Jd1g5_Gjrr-x3YWrJPek-mF5Asrrvz2Dg@mail.gmail.com
Backpatch: 15-
Bruce Momjian [Thu, 1 Dec 2022 15:45:07 +0000 (10:45 -0500)]
revert: add transaction processing chapter with internals info
This doc patch (master hash 66bc9d2d3e) was decided to be too
significant for backpatching, so reverted in all but master. Also fix
SGML file header comment in master.
Reported-by: Peter Eisentraut
Discussion: https://postgr.es/m/c6304b19-6ff7-f3af-0148-cf7aa7e2fbfd@enterprisedb.com
Tom Lane [Wed, 30 Nov 2022 18:01:41 +0000 (13:01 -0500)]
Reject missing database name in pg_regress and cohorts.
Writing "pg_regress --dbname= ..." led to a crash, because
we weren't expecting there to be no database name supplied.
It doesn't seem like a great idea to run regression tests
in whatever is the user's default database; so rather than
supporting this case let's explicitly reject it.
Per report from Xing Guo. Back-patch to all supported
branches.
Bruce Momjian [Wed, 30 Nov 2022 01:49:51 +0000 (20:49 -0500)]
doc: add transaction processing chapter with internals info
This also adds references to this new chapter at relevant sections of
our documentation. Previously much of these internal details were
exposed to users, but not explained. This also updates RELEASE
SAVEPOINT.
Michael Paquier [Tue, 29 Nov 2022 23:38:40 +0000 (08:38 +0900)]
Fix comment in fe-auth-scram.c
The frontend-side routine in charge of building a SCRAM verifier
mentioned that the restrictions applying to SASLprep on the password
with the encoding are described at the top of fe-auth-scram.c, but this
information is in auth-scram.c.
This is wrong since 8f8b9be, so backpatch all the way down as this is an
important documentation bit.
Tom Lane [Tue, 29 Nov 2022 20:43:17 +0000 (15:43 -0500)]
Improve heuristics for compressing the KnownAssignedXids array.
Previously, we'd compress only when the active range of array entries
reached Max(4 * PROCARRAY_MAXPROCS, 2 * pArray->numKnownAssignedXids).
If max_connections is large, the first term could result in not
compressing for a long time, resulting in much wastage of cycles in
hot-standby backends scanning the array to take snapshots. Get rid
of that term, and just bound it to 2 * pArray->numKnownAssignedXids.
That however creates the opposite risk, that we might spend too much
effort compressing. Hence, consider compressing only once every 128
commit records. (This frequency was chosen by benchmarking. While
we only tried one benchmark scenario, the results seem stable over
a fairly wide range of frequencies.)
Also, force compression when processing RecoveryInfo WAL records
(which should be infrequent); the old code could perform compression
then, but would do so only after the same array-range check as for
the transaction-commit path.
Also, opportunistically run compression if the startup process is about
to wait for WAL, though not oftener than once a second. This should
prevent cases where we waste lots of time by leaving the array
not-compressed for long intervals due to low WAL traffic.
Lastly, add a simple check to keep us from uselessly compressing
when the array storage is already compact.
Back-patch, as the performance problem is worse in pre-v14 branches
than in HEAD.
Simon Riggs and Michail Nikolaev, with help from Tom Lane and
Andres Freund.
Andrew Dunstan [Sun, 27 Nov 2022 14:03:22 +0000 (09:03 -0500)]
Fix binary mismatch for MSVC plperl vs gcc built perl libs
When loading plperl built against Strawberry perl or the msys2 ucrt perl
that have been built with gcc, a binary mismatch has been encountered
which looks like this:
Andrew Dunstan [Sat, 26 Nov 2022 12:44:23 +0000 (07:44 -0500)]
Add portlock directory to .gitignore
Commit 9b4eafcaf4 added creattion of a directory to reserve TAP test
ports at the top of the build tree. In a non-vpath build this means at
the top of the source tree, so it needs to be added to .gitignore.
Amit Kapila [Fri, 25 Nov 2022 03:26:54 +0000 (08:56 +0530)]
Fix uninitialized access to InitialRunningXacts during decoding.
In commit 272248a0c, we introduced an InitialRunningXacts array to
remember transactions and subtransactions that were running when the
xl_running_xacts record that we decoded was written. This array was
allocated in the snapshot builder memory context after we restore
serialized snapshot but we forgot to reset the array while freeing the
builder memory context. So, the next time when we start decoding in the
same session where we don't restore any serialized snapshot, we ended up
using the uninitialized array and that can lead to unpredictable behavior.
This problem doesn't exist in HEAD as instead of using
InitialRunningXacts, we added the list of transaction IDs and
sub-transaction IDs, that have modified catalogs and are running during
snapshot serialization, to the serialized snapshot (see commit 7f13ac8123).
Alvaro Herrera [Thu, 24 Nov 2022 09:45:10 +0000 (10:45 +0100)]
Make multixact error message more explicit
There are recent reports involving a very old error message that we have
no history of hitting -- perhaps a recently introduced bug. Improve the
error message in an attempt to improve our chances of investigating the
bug.
Tom Lane [Tue, 22 Nov 2022 19:40:20 +0000 (14:40 -0500)]
YA attempt at taming worst-case behavior of get_actual_variable_range.
We've made multiple attempts at preventing get_actual_variable_range
from taking an unreasonable amount of time (3ca930fc3, fccebe421).
But there's still an issue for the very first planning attempt after
deletion of a large number of extremal-valued tuples. While that
planning attempt will set "killed" bits on the tuples it visits and
thereby reduce effort for next time, there's still a lot of work it
has to do to visit the heap and then set those bits. It's (usually?)
not worth it to do that much work at plan time to have a slightly
better estimate, especially in a context like this where the table
contents are known to be mutating rapidly.
Therefore, let's bound the amount of work to be done by giving up
after we've visited 100 heap pages. Giving up just means we'll
fall back on the extremal value recorded in pg_statistic, so it
shouldn't mean that planner estimates suddenly become worthless.
Note that this means we'll still gradually whittle down the problem
by setting a few more index "killed" bits in each planning attempt;
so eventually we'll reach a good state (barring further deletions),
even in the absence of VACUUM.
Simon Riggs, per a complaint from Jakub Wartak (with cosmetic
adjustments by me). Back-patch to all supported branches.
Andrew Dunstan [Tue, 22 Nov 2022 15:35:04 +0000 (10:35 -0500)]
Prevent port collisions between concurrent TAP tests
Currently there is a race condition where if concurrent TAP tests both
test that they can open a port they will assume that it is free and use
it, causing one of them to fail. To prevent this we record a reservation
using an exclusive lock, and any TAP test that discovers a reservation
checks to see if the reserving process is still alive, and looks for
another free port if it is.
Ports are reserved in a directory set by the environment setting
PG_TEST_PORT_DIR, or if that doesn't exist a subdirectory of the top
build directory as set by Makefile.global, or its own
tmp_check directory.
The prove_check recipe in Makefile.global.in is extended to export
top_builddir to the TAP tests. This was already exported by the
prove_installcheck recipes.
Replace link to Hunspell with the current homepage
The Hunspell project moved from Sourceforge to Github sometime
in 2016, so update our links to match the new URL. Backpatch
the doc changes to all supported versions.
Tom Lane [Mon, 21 Nov 2022 22:07:07 +0000 (17:07 -0500)]
Add comments and a missing CHECK_FOR_INTERRUPTS in ts_headline.
I just spent an annoying amount of time reverse-engineering the
100%-undocumented API between ts_headline and the text search
parser's prsheadline function. Add some commentary about that
while it's fresh in mind. Also remove some unused macros in
wparser_def.c.
While at it, I noticed that when commit 78e73e875 added a
CHECK_FOR_INTERRUPTS call in TS_execute_recurse, it missed
doing so in the parallel function TS_phrase_execute, which
surely needs one just as much.
Back-patch because of the missing CHECK_FOR_INTERRUPTS.
Might as well back-patch the rest of this too.
Andres Freund [Thu, 17 Nov 2022 04:00:59 +0000 (20:00 -0800)]
Fix mislabeling of PROC_QUEUE->links as PGPROC, fixing UBSan on 32bit
ProcSleep() used a PGPROC* variable to point to PROC_QUEUE->links.next,
because that does "the right thing" with SHMQueueInsertBefore(). While that
largely works, it's certainly not correct and unnecessary - we can just use
SHM_QUEUE* to point to the insertion point.
Noticed when testing a 32bit of postgres with undefined behavior
sanitizer. UBSan noticed that sometimes the supposed PGPROC wasn't
sufficiently aligned (required since 46d6e5f5679, ensured indirectly, via
ShmemAllocRaw() guaranteeing cacheline alignment).
For now fix this by using a SHM_QUEUE* for the insertion point. Subsequently
we should replace all the use of PROC_QUEUE and SHM_QUEUE with ilist.h, but
that's a larger change that we don't want to backpatch.
Backpatch to all supported versions - it's useful to be able to run postgres
under UBSan.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/20221117014230.op5kmgypdv2dtqsf@awork3.anarazel.de
Backpatch: 11-
Tom Lane [Sat, 19 Nov 2022 18:09:14 +0000 (13:09 -0500)]
Doc: sync src/tutorial/basics.source with SGML documentation.
basics.source is supposed to be pretty closely in step with
the examples in chapter 2 of the tutorial, but I forgot to
update it in commit f05a5e000. Fix that, and adjust a couple
of other discrepancies that had crept in over time.
(I notice that advanced.source is nowhere near being in sync
with chapter 3, but I lack the ambition to do something
about that right now.)
Tom Lane [Sat, 19 Nov 2022 17:00:27 +0000 (12:00 -0500)]
pg_dump: avoid unsafe function calls in getPolicies().
getPolicies() had the same disease I fixed in other places in
commit e3fcbbd62, i.e., it was calling pg_get_expr() for
expressions on tables that we don't necessarily have lock on.
To fix, restrict the query to only collect interesting rows,
rather than doing the filtering on the client side.
Back-patch of commit 3e6e86abc. That's been in v15/HEAD long enough
to have some confidence about it, so now let's fix the problem in
older branches.
Tom Lane [Sat, 19 Nov 2022 16:40:30 +0000 (11:40 -0500)]
Postpone calls of unsafe server-side functions in pg_dump.
Avoid calling pg_get_partkeydef(), pg_get_expr(relpartbound),
and regtypeout until we have lock on the relevant tables.
The existing coding is at serious risk of failure if there
are any concurrent DROP TABLE commands going on --- including
drops of other sessions' temp tables.
Back-patch of commit e3fcbbd62. That's been in v15/HEAD long enough
to have some confidence about it, so now let's fix the problem in
older branches.
Original patch by me; thanks to Gilles Darold for back-patching
legwork.
Tom Lane [Thu, 17 Nov 2022 21:54:31 +0000 (16:54 -0500)]
Replace RelationOpenSmgr() with RelationGetSmgr().
This is a back-patch of the v15-era commit f10f0ae42 into older
supported branches. The idea is to design out bugs in which an
ill-timed relcache flush clears rel->rd_smgr partway through
some code sequence that wasn't expecting that. We had another
report today of a corner case that reliably crashes v14 under
debug_discard_caches (nee CLOBBER_CACHE_ALWAYS), and therefore
would crash once in a blue moon in the field. We're unlikely
to get rid of all such code paths unless we adopt the more
rigorous coding rules instituted by f10f0ae42. Therefore,
even though this is a bit invasive, it's time to back-patch.
Some comfort can be taken in the fact that f10f0ae42 has been
in v15 for 16 months without problems.
I left the RelationOpenSmgr macro present in the back branches,
even though no core code should use it anymore, in order to not break
third-party extensions in minor releases. Such extensions might opt
to start using RelationGetSmgr instead, to reduce their code
differential between v15 and earlier branches. This carries a hazard
of failing to compile against headers from existing minor releases.
However, once compiled the extension should work fine even with such
releases, because RelationGetSmgr is a "static inline" function so
it creates no link-time dependency. So depending on distribution
practices, that might be an OK tradeoff.
Per report from Spyridon Dimitrios Agathos. Original patch
by Amul Sul.
Amit Kapila [Mon, 14 Nov 2022 04:22:06 +0000 (09:52 +0530)]
Fix cleanup lock acquisition in SPLIT_ALLOCATE_PAGE replay.
During XLOG_HASH_SPLIT_ALLOCATE_PAGE replay, we were checking for a
cleanup lock on the new bucket page after acquiring an exclusive lock on
it and raising a PANIC error on failure. However, it is quite possible
that checkpointer can acquire the pin on the same page before acquiring a
lock on it, and then the replay will lead to an error. So instead, directly
acquire the cleanup lock on the new bucket page during
XLOG_HASH_SPLIT_ALLOCATE_PAGE replay operation.
Reported-by: Andres Freund
Author: Robert Haas Reviewed-By: Amit Kapila, Andres Freund, Vignesh C
Backpatch-through: 11
Discussion: https://postgr.es/m/20220810022617.fvjkjiauaykwrbse@awork3.anarazel.de
Jeff Davis [Thu, 10 Nov 2022 22:46:30 +0000 (14:46 -0800)]
Fix theoretical torn page hazard.
The original report was concerned with a possible inconsistency
between the heap and the visibility map, which I was unable to
confirm. The concern has been retracted.
However, there did seem to be a torn page hazard when using
checksums. By not setting the heap page LSN during redo, the
protections of minRecoveryPoint were bypassed. Fixed, along with a
misleading comment.
It may have been impossible to hit this problem in practice, because
it would require a page tear between the checksum and the flags, so I
am marking this as a theoretical risk. But, as discussed, it did
violate expectations about the page LSN, so it may have other
consequences.
Backpatch to all supported versions.
Reported-by: Konstantin Knizhnik Reviewed-by: Konstantin Knizhnik
Discussion: https://postgr.es/m/fed17dac-8cb8-4f5b-d462-1bb4908c029e@garret.ru
Backpatch-through: 11
Tom Lane [Thu, 10 Nov 2022 22:24:26 +0000 (17:24 -0500)]
Fix alter_table.sql test case to test what it claims to.
The stanza "SET STORAGE may need to add a TOAST table" does not
test what it's supposed to, and hasn't done so since we added
the ability to store constant column default values as metadata.
We need to use a non-constant default to get the expected table
rewrite to actually happen.
Fix that, and add the missing checks that would have exposed the
problem to begin with.
Noted while reviewing a patch that made changes in this test case.
Back-patch to v11 where the problem came in.
Tom Lane [Wed, 9 Nov 2022 16:08:52 +0000 (11:08 -0500)]
Doc: add comments about PreventInTransactionBlock/IsInTransactionBlock.
Add a little to the header comments for these functions to make it
clearer what guarantees about commit behavior are provided to callers.
(See commit f92944137 for context.)
Although this is only a comment change, it's really documentation
aimed at authors of extensions, so it seems appropriate to back-patch.
Yugo Nagata and Tom Lane, per further discussion of bug #17434.
Michael Paquier [Wed, 9 Nov 2022 00:39:57 +0000 (09:39 +0900)]
Fix compilation warnings with libselinux 3.1 in contrib/sepgsql/
Upstream SELinux has recently marked security_context_t as officially
deprecated, causing warnings with -Wdeprecated-declarations. This is
considered as legacy code for some time now by upstream as
security_context_t got removed from most of the code tree during the
development of 2.3 back in 2014.
This removes all the references to security_context_t in sepgsql/ to be
consistent with SELinux, fixing the warnings. Note that this does not
impact the minimum version of libselinux supported.
This has been applied first as 1f32136 for 14~, but no other branches
got the call. This is in line with the recent project policy to have no
warnings in branches where builds should still be supported (9.2~ as of
today). Per discussion with Tom Lane and Álvaro Herrera.
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/20200813012735.GC11663@paquier.xyz
Discussion: https://postgr.es/m/20221103181028.raqta27jcuypor4l@alvherre.pgsql
Backpatch-through: 9.2
Tom Lane [Tue, 8 Nov 2022 23:25:03 +0000 (18:25 -0500)]
Doc: improve tutorial section about grouped aggregates.
Commit fede15417 introduced FILTER by jamming it into the existing
example introducing HAVING, which seems pedagogically poor to me;
and it added no information about what the keyword actually does.
Not to mention that the claimed output didn't match the sample
data being used in this running example.
Revert that and instead make an independent example using FILTER.
To help drive home the point that it's a per-aggregate filter,
we need to use two aggregates not just one; for consistency
expand all the examples in this segment to do that.
Also adjust the example using WHERE ... LIKE so that it'd produce
nonempty output with this sample data, and show that output.
Back-patch, as the previous patch was. (Sadly, v10 is now out
of scope.)
Etsuro Fujita [Fri, 4 Nov 2022 10:15:08 +0000 (19:15 +0900)]
Correct error message for row-level triggers with transition tables on partitioned tables.
"Triggers on partitioned tables cannot have transition tables." is
incorrect as we allow statement-level triggers on partitioned tables to
have transition tables.
This has been wrong since commit 86f575948; back-patch to v11 where that
commit came in.
Tom Lane [Thu, 3 Nov 2022 16:01:57 +0000 (12:01 -0400)]
Avoid crash after function syntax error in a replication worker.
If a syntax error occurred in a SQL-language or PL/pgSQL-language
CREATE FUNCTION or DO command executed in a logical replication worker,
we'd suffer a null pointer dereference or assertion failure. That
seems like a rather contrived case, but nonetheless worth fixing.
The cause is that function_parse_error_transpose assumes it must be
executing within the context of a Portal, but logical/worker.c
doesn't create a Portal since it's not running the standard executor.
We can just back off the hard Assert check and make it fail gracefully
if there's not an ActivePortal. (I have a feeling that the aggressive
check here was my fault originally, probably because I wasn't sure if
the case would always hold and wanted to find out. Well, now we know.)
The hazard seems to exist in all branches that have logical replication,
so back-patch to v10.
Maxim Orlov, Anton Melnikov, Masahiko Sawada, Tom Lane
Tom Lane [Wed, 2 Nov 2022 21:37:26 +0000 (17:37 -0400)]
Allow use of __sync_lock_test_and_set for spinlocks on any machine.
If we have no special-case code in s_lock.h for the current platform,
but the compiler has __sync_lock_test_and_set, use that instead of
failing. It's unlikely that anybody's __sync_lock_test_and_set
would be so awful as to be worse than our semaphore-based fallback,
but if it is, they can (continue to) use --disable-spinlocks.
This allows removal of the RISC-V special case installed by commit c32fcac56, which generated exactly the same code but only on that
platform. Usefully, the RISC-V buildfarm animals should now test
at least the int variant of this patch.
I've manually tested both variants on ARM by dint of removing the
ARM-specific stanza. We don't want to drop that, because it already
has some special knowledge and is likely to grow more over time.
Likewise, this is not meant to preclude installing special cases
for other arches if that proves worthwhile.
Per discussion of a request to install the same code for loongarch64.
Like the previous patch, we might as well back-patch to supported
branches.
Tom Lane [Tue, 1 Nov 2022 21:08:28 +0000 (17:08 -0400)]
Update time zone data files to tzdata release 2022f.
DST law changes in Chile, Fiji, Iran, Jordan, Mexico, Palestine,
and Syria. Historical corrections for Chile, Crimea, Iran, and
Mexico.
Also, the Europe/Kiev zone has been renamed to Europe/Kyiv
(retaining the old name as a link).
The following zones have been merged into nearby, more-populous zones
whose clocks have agreed since 1970: Antarctica/Vostok, Asia/Brunei,
Asia/Kuala_Lumpur, Atlantic/Reykjavik, Europe/Amsterdam,
Europe/Copenhagen, Europe/Luxembourg, Europe/Monaco, Europe/Oslo,
Europe/Stockholm, Indian/Christmas, Indian/Cocos, Indian/Kerguelen,
Indian/Mahe, Indian/Reunion, Pacific/Chuuk, Pacific/Funafuti,
Pacific/Majuro, Pacific/Pohnpei, Pacific/Wake and Pacific/Wallis.
(This indirectly affects zones that were already links to one of
these: Arctic/Longyearbyen, Atlantic/Jan_Mayen, Iceland,
Pacific/Ponape, Pacific/Truk, and Pacific/Yap.) America/Nipigon,
America/Rainy_River, America/Thunder_Bay, Europe/Uzhgorod, and
Europe/Zaporozhye were also merged into nearby zones after discovering
that their claimed post-1970 differences from those zones seem to have
been errors.
While the IANA crew have been working on merging zones that have no
post-1970 differences for some time, this batch of changes affects
some zones that are significantly more populous than those merged
in the past, notably parts of Europe. The loss of pre-1970 timezone
history for those zones may be troublesome for applications
expecting consistency of timestamptz display. As an example, the
stored value '1944-06-01 12:00 UTC' would previously display as
'1944-06-01 13:00:00+01' if the Europe/Stockholm zone is selected,
but now it will read out as '1944-06-01 14:00:00+02'.
There exists a "packrat" option that will build the timezone data
files with this old data preserved, but the problem is that it also
resurrects a bunch of other, far less well-attested data; so much so
that actually more zones' contents change from 2022a with that option
than without it. I have chosen not to do that here, for that reason
and because it appears that no major OS distributions are using the
"packrat" option, so that doing so would cause Postgres' behavior
to diverge significantly depending on whether it was built with
--with-system-tzdata. However, for anyone for whom these changes pose
significant problems, there is a solution: build a set of timezone
files with the "packrat" option and use those with Postgres.
Tom Lane [Tue, 1 Nov 2022 16:48:01 +0000 (12:48 -0400)]
pg_stat_statements: fetch stmt location/length before it disappears.
When executing a utility statement, we must fetch everything
we need out of the PlannedStmt data structure before calling
standard_ProcessUtility. In certain cases (possibly only ROLLBACK
in extended query protocol), that data structure will get freed
during command execution. The situation is probably often harmless
in production builds, but in debug builds we intentionally overwrite
the freed memory with garbage, leading to picking up garbage values
of statement location and length, typically causing an assertion
failure later in pg_stat_statements. In non-debug builds, if
something did go wrong it would likely lead to storing garbage
for the query string.
Report and fix by zhaoqigui (with cosmetic adjustments by me).
It's an old problem, so back-patch to all supported versions.
Michael Paquier [Wed, 26 Oct 2022 00:41:28 +0000 (09:41 +0900)]
Fix ordering issue with WAL operations in GIN fast insert path
Contrary to what is documented in src/backend/access/transam/README,
ginHeapTupleFastInsert() had a few ordering issues with the way it does
its WAL operations when inserting items in its fast path.
First, when using a separate list, XLogBeginInsert() was being always
called before START_CRIT_SECTION(), and in this case a second thing was
wrong when merging lists, as an exclusive lock was taken on the tail
page *before* calling XLogBeginInsert(). Finally, when inserting items
into a tail page, the order of XLogBeginInsert() and
START_CRIT_SECTION() was reversed. This commit addresses all these
issues by moving the calls of XLogBeginInsert() after all the pages
logged are locked and pinned, within a critical section.
This has been applied first only on HEAD as of 56b6625, but as per
discussion with Tom Lane and Álvaro Herrera, a backpatch is preferred to
keep all the branches consistent and to respect the transam's README
where we can.
Author: Matthias van de Meent, Zhang Mingli
Discussion: https://postgr.es/m/CAEze2WhL8uLMqynnnCu1LAPwxD5RKEo0nHV+eXGg_N6ELU88HQ@mail.gmail.com
Backpatch-through: 10
Specifically, when pg_basebackup is invoked with -Tx=y, don't error
out if x could plausibly be an absolute path either on Windows or on
non-Windows systems. We don't know whether the remote system is
running the same OS as the local system, so it's not appropriate to
assume that our local rule about absolute pathnames is the same as
the rule on the remote system.
Patch by me, reviewed by Tom Lane, Andrew Dunstan, and
Davinder Singh.
Amit Kapila [Fri, 21 Oct 2022 06:38:14 +0000 (12:08 +0530)]
Add CHECK_FOR_INTERRUPTS while restoring changes during decoding.
Previously in commit 42681dffaf, we added CFI during decoding changes but
missed another similar case that can happen while restoring changes
spilled to disk back into memory in a loop.
Reported-by: Robert Haas
Author: Amit Kapila
Backpatch-through: 10
Discussion: https://postgr.es/m/CA+TgmoaLObg0QbstbC8ykDwOdD1bDkr4AbPpB=0DPgA2JW0mFg@mail.gmail.com
Amit Kapila [Fri, 21 Oct 2022 03:52:20 +0000 (09:22 +0530)]
Fix executing invalidation messages generated by subtransactions during decoding.
This problem has been introduced by commit 272248a0c1 where we started
assigning the subtransactions to the top-level transaction when we mark
both the top-level transaction and its subtransactions as containing
catalog changes. After we assign subtransactions to the top-level
transaction, we were not allowed to execute any invalidations associated
with it when we decide to skip the transaction.
The reason to assign the subtransactions to the top-level transaction was
to avoid the assertion failure in AssertTXNLsnOrder() as they have the
same LSN when we sometimes start accumulating transaction changes for
partial transactions after the restart. Now that with commit 64ff0fe4e8,
we skip this assertion check until we reach the LSN at which we start
decoding the contents of the transaction, so, there is no reason for such
an assignment anymore.
The assignment change was introduced in 15 and prior versions but this bug
doesn't exist in branches prior to 14 since we don't add invalidation
messages to subtransactions. We decided to backpatch through 11 for
consistency but not for 10 since its final release is near.
Amit Kapila [Thu, 20 Oct 2022 03:37:04 +0000 (09:07 +0530)]
Fix assertion failures while processing NEW_CID record in logical decoding.
When the logical decoding restarts from NEW_CID, since there is no
association between the top transaction and its subtransaction, both are
created as top transactions and have the same LSN. This caused the
assertion failure in AssertTXNLsnOrder().
This patch skips the assertion check until we reach the LSN at which we
start decoding the contents of the transaction, specifically
start_decoding_at LSN in SnapBuild. This is okay because we don't
guarantee to make the association between top transaction and
subtransaction until we try to decode the actual contents of transaction.
The ordering of the records prior to the start_decoding_at LSN should have
been checked before the restart.
The other assertion failure is due to the reason that we forgot to track
that we have considered top-level transaction id in the list of catalog
changing transactions that were committed when one of its subtransactions
is marked as containing catalog change.
Thomas Munro [Wed, 19 Oct 2022 09:38:58 +0000 (22:38 +1300)]
Track LLVM 15 changes.
Per https://llvm.org/docs/OpaquePointers.html, support for non-opaque
pointers still exists and we can request that on our context. We have
until LLVM 16 to move to opaque pointers, a much larger change.
Back-patch to 11, where LLVM support arrived.
Author: Thomas Munro <thomas.munro@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAMHz58Sf_xncdyqsekoVsNeKcruKootLtVH6cYXVhhUR1oKPCg%40mail.gmail.com
Tom Lane [Mon, 17 Oct 2022 16:14:39 +0000 (12:14 -0400)]
Reject non-ON-SELECT rules that are named "_RETURN".
DefineQueryRewrite() has long required that ON SELECT rules be named
"_RETURN". But we overlooked the converse case: we should forbid
non-ON-SELECT rules that are named "_RETURN". In particular this
prevents using CREATE OR REPLACE RULE to overwrite a view's _RETURN
rule with some other kind of rule, thereby breaking the view.
Per bug #17646 from Kui Liu. Back-patch to all supported branches.
Tom Lane [Sun, 16 Oct 2022 19:27:04 +0000 (15:27 -0400)]
Rename parser token REF to REF_P to avoid a symbol conflict.
In the latest version of Apple's macOS SDK, <sys/socket.h>
fails to compile if "REF" is #define'd as something.
Apple may or may not agree that this is a bug, and even if
they do accept the bug report I filed, they probably won't
fix it very quickly. In the meantime, our back branches will all
fail to compile gram.y. v15 and HEAD currently escape the problem
thanks to the refactoring done in 98e93a1fc, but that's purely
accidental. Moreover, since that patch removed a widely-visible
inclusion of <netdb.h>, back-patching it seems too likely to break
third-party code.
Instead, change the token's code name to REF_P, following our usual
convention for naming parser tokens that are likely to have symbol
conflicts. The effects of that should be localized to the grammar
and immediately surrounding files, so it seems like a safer answer.
Per project policy that we want to keep recently-out-of-support
branches buildable on modern systems, back-patch all the way to 9.2.
Alvaro Herrera [Thu, 13 Oct 2022 11:36:14 +0000 (13:36 +0200)]
Fix typo in CREATE PUBLICATION reference page
While at it, simplify wording a bit.
Author: Takamichi Osumi <osumi.takamichi@fujitsu.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Discussion: https://postgr.es/m/TYCPR01MB8373F93F5D094A2BE648990DED259@TYCPR01MB8373.jpnprd01.prod.outlook.com
Tom Lane [Wed, 12 Oct 2022 14:51:11 +0000 (10:51 -0400)]
Doc: improve recommended systemd unit file.
Add
After=network-online.target
Wants=network-online.target
to the suggested unit file for starting a Postgres server.
This delays startup until the network interfaces have been
configured; without that, any attempt to bind to a specific
IP address will fail.
If listen_addresses is set to "localhost" or "*", it might be
possible to get away with the less restrictive "network.target",
but I don't think we need to get into such detail here.
Tom Lane [Tue, 11 Oct 2022 22:54:31 +0000 (18:54 -0400)]
Harden pmsignal.c against clobbered shared memory.
The postmaster is not supposed to do anything that depends
fundamentally on shared memory contents, because that creates
the risk that a backend crash that trashes shared memory will
take the postmaster down with it, preventing automatic recovery.
In commit 969d7cd43 I lost sight of this principle and coded
AssignPostmasterChildSlot() in such a way that it could fail
or even crash if the shared PMSignalState structure became
corrupted. Remarkably, we've not seen field reports of such
crashes; but I managed to induce one while testing the recent
changes around palloc chunk headers.
To fix, make a semi-duplicative state array inside the postmaster
so that we need consult only local state while choosing a "child
slot" for a new backend. Ensure that other postmaster-executed
routines in pmsignal.c don't have critical dependencies on the
shared state, either. Corruption of PMSignalState might now
lead ReleasePostmasterChildSlot() to conclude that backend X
failed, when actually backend Y was the one that trashed things.
But that doesn't matter, because we'll force a cluster-wide reset
regardless.
Back-patch to all supported branches, since this is an old bug.
Tom Lane [Tue, 11 Oct 2022 22:24:15 +0000 (18:24 -0400)]
Yet further fixes for multi-row VALUES lists for updatable views.
DEFAULT markers appearing in an INSERT on an updatable view
could be mis-processed if they were in a multi-row VALUES clause.
This would lead to strange errors such as "cache lookup failed
for type NNNN", or in older branches even to crashes.
The cause is that commit 41531e42d tried to re-use rewriteValuesRTE()
to remove any SetToDefault nodes (that hadn't previously been replaced
by the view's own default values) appearing in "product" queries,
that is DO ALSO queries. That's fundamentally wrong because the
DO ALSO queries might not even be INSERTs; and even if they are,
their targetlists don't necessarily match the view's column list,
so that almost all the logic in rewriteValuesRTE() is inapplicable.
What we want is a narrow focus on replacing any such nodes with NULL
constants. (That is, in this context we are interpreting the defaults
as being strictly those of the view itself; and we already replaced
any that aren't NULL.) We could add still more !force_nulls tests
to further lobotomize rewriteValuesRTE(); but it seems cleaner to
split out this case to a new function, restoring rewriteValuesRTE()
to the charter it had before.
Per bug #17633 from jiye_sw. Patch by me, but thanks to
Richard Guo and Japin Li for initial investigation.
Back-patch to all supported branches, as the previous fix was.
Alvaro Herrera [Tue, 11 Oct 2022 07:56:13 +0000 (09:56 +0200)]
Ensure all perl test modules are installed
PostgreSQL::Test::Cluster and ::Utils were not being installed. This is
very hard to notice, as it only seems to affect external modules that
want to run tests from 15 back in earlier versions. Oversight in b235d41d9646.
This applies only to branches 14 and back, because 15 had already been
made correct in commit b3b4d8e68ae8.
The compression parameter to PQsslAttribute has never returned the
compression method used, it has always returned "on" or "off since
it was added in commit 91fa7b4719ac. Backpatch through v10.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/B9EC60EC-F665-47E8-A221-398C76E382C9@yesql.se
Backpatch-through: v10
This prevents marking the argument string for translation for gettext,
and it also prevents the given string (which is already translated) from
being translated at runtime.
Also, mark the strings used as arguments to check_rolespec_name for
translation.
Backpatch all the way back as appropriate. None of this is caught by
any tests (necessarily so), so I verified it manually.
Tom Lane [Wed, 21 Sep 2022 17:52:38 +0000 (13:52 -0400)]
Suppress more variable-set-but-not-used warnings from clang 15.
Mop up assorted set-but-not-used warnings in the back branches.
This includes back-patching relevant fixes from commit 152c9f7b8
the rest of the way, but there are also several cases that did not
appear in HEAD. Some of those we'd fixed in a retail way but not
back-patched, and others I think just got rewritten out of existence
during nearby refactoring.
While here, also back-patch b1980f6d0 (PL/Tcl: Fix compiler warnings
with Tcl 8.6) into 9.2, so that that branch compiles warning-free
with modern Tcl.
Per project policy, this is a candidate for back-patching into
out-of-support branches: it suppresses annoying compiler warnings
but changes no behavior. Hence, back-patch all the way to 9.2.
Tom Lane [Tue, 20 Sep 2022 22:59:53 +0000 (18:59 -0400)]
Disable -Wdeprecated-non-prototype in the back branches.
There doesn't seem to be any good ABI-preserving way to silence
clang 15's -Wdeprecated-non-prototype warnings about our tree-walk
APIs. While we've fixed it properly in HEAD, the only way to not
see hundreds of these in the back branches is to disable the
warnings. We're not going to do anything about them, so we might
as well disable them.
I noticed that we also get some of these warnings about fmgr.c's
support for V0 function call convention, in branches before v10
where we removed that. That's another area we aren't going to
change, so turning off the warning seems fine for that too.
Per project policy, this is a candidate for back-patching into
out-of-support branches: it suppresses annoying compiler warnings
but changes no behavior. Hence, back-patch all the way to 9.2.
Tom Lane [Tue, 20 Sep 2022 16:04:37 +0000 (12:04 -0400)]
Suppress variable-set-but-not-used warnings from clang 15.
clang 15+ will issue a set-but-not-used warning when the only
use of a variable is in autoincrements (e.g., "foo++;").
That's perfectly sensible, but it detects a few more cases that
we'd not noticed before. Silence the warnings with our usual
methods, such as PG_USED_FOR_ASSERTS_ONLY, or in one case by
actually removing a useless variable.
One thing that we can't nicely get rid of is that with %pure-parser,
Bison emits "yynerrs" as a local variable that falls foul of this
warning. To silence those, I inserted "(void) yynerrs;" in the
top-level productions of affected grammars.
Per recently-established project policy, this is a candidate
for back-patching into out-of-support branches: it suppresses
annoying compiler warnings but changes no behavior. Hence,
back-patch to 9.5, which is as far as these patches go without
issues. (A preliminary check shows that the prior branches
need some other set-but-not-used cleanups too, so I'll leave
them for another day.)
Tom Lane [Mon, 19 Sep 2022 16:16:02 +0000 (12:16 -0400)]
Future-proof the recursion inside ExecShutdownNode().
The API contract for planstate_tree_walker() callbacks is that they
take a PlanState pointer and a context pointer. Somebody figured
they could save a couple lines of code by ignoring that, and passing
ExecShutdownNode itself as the walker even though it has but one
argument. Somewhat remarkably, we've gotten away with that so far.
However, it seems clear that the upcoming C2x standard means to
forbid such cases, and compilers that actively break such code
likely won't be far behind. So spend the extra few lines of code
to do it honestly with a separate walker function.
In HEAD, we might as well go further and remove ExecShutdownNode's
useless return value. I left that as-is in back branches though,
to forestall complaints about ABI breakage.
Back-patch, with the thought that this might become of practical
importance before our stable branches are all out of service.
It doesn't seem to be fixing any live bug on any currently known
platform, however.
Peter Geoghegan [Sat, 17 Sep 2022 23:54:08 +0000 (16:54 -0700)]
Make check_usermap() parameter names consistent.
The function has a bool argument named "case_insensitive", but that was
spelled "case_sensitive" in the declaration. Make them consistent now
to avoid confusion in the future.
Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Michael Paquiër <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com
Backpatch: 10-
Tom Lane [Fri, 16 Sep 2022 17:23:01 +0000 (13:23 -0400)]
Improve plpgsql's ability to handle arguments declared as RECORD.
Treat arguments declared as RECORD as if that were a polymorphic type
(which it is, sort of), in that we substitute the actual argument type
while forming the function cache lookup key. This allows the specific
composite type to be known in some cases where it was not before,
at the cost of making a separate function cache entry for each named
composite type that's passed to the function during a session. The
particular symptom discussed in bug #17610 could be solved in other
more-efficient ways, but only at the cost of considerable development
work, and there are other cases where we'd still fail without this.
Per bug #17610 from Martin Jurča. Back-patch to v11 where we first
allowed plpgsql functions to be declared as taking type RECORD.
Tom Lane [Thu, 15 Sep 2022 19:21:35 +0000 (15:21 -0400)]
In back branches, fix conditions for pullup of FROM-less subqueries.
In branches before commit 4be058fe9, we have to prevent flattening
of subqueries with empty jointrees if the subqueries' output
columns might need to be wrapped in PlaceHolderVars. That's
because the empty jointree would result in empty phrels for the
PlaceHolderVars, meaning we'd be unable to figure out where to
evaluate them. However, we've failed to keep is_simple_subquery's
check for this hazard in sync with what pull_up_simple_subquery
actually does. The former is checking "lowest_outer_join != NULL",
whereas the conditions pull_up_simple_subquery actually uses are
if (lowest_nulling_outer_join != NULL)
if (containing_appendrel != NULL)
if (parse->groupingSets)
So the outer-join test is overly conservative, while we missed out
checking for appendrels and groupingSets. The appendrel omission
is harmless, because in that case we also check is_safe_append_member
which will also reject such subqueries. The groupingSets omission
is a bug though, leading to assertion failures or planner errors
such as "variable not found in subplan target lists".
is_simple_subquery has access to none of the three variables used
in the correct tests, but its callers do, so I chose to have them
pass down a bool corresponding to the OR of these conditions.
(The need for duplicative conditions would be a maintenance
hazard in actively-developed code, but I'm not too concerned
about it in branches that have only ~ 1 year to live.)
Per bug #17614 from Wei Wei. Patch v10 and v11 only, since we have a
better answer to this in v12 and later (indeed, the faulty check in
is_simple_subquery is gone entirely).
postgres_fdw: Avoid 'variable not found in subplan target list' error.
The tlist of the EvalPlanQual outer plan for a ForeignScan node is
adjusted to produce a tuple whose descriptor matches the scan tuple slot
for the ForeignScan node. But in the case where the outer plan contains
an extra Sort node, if the new tlist contained columns required only for
evaluating PlaceHolderVars or columns required only for evaluating local
conditions, this would cause setrefs.c to fail with the error.
The cause of this is that when creating the outer plan by injecting the
Sort node into an alternative local join plan that could emit such extra
columns as well, we fail to arrange for the outer plan to propagate them
up through the Sort node, causing setrefs.c to fail to match up them in
the new tlist to what is available from the outer plan. Repair.
Per report from Alexander Pyhalov.
Richard Guo and Etsuro Fujita, reviewed by Alexander Pyhalov and Tom Lane.
Backpatch to all supported versions.
Michael Paquier [Wed, 14 Sep 2022 05:52:35 +0000 (14:52 +0900)]
Fix incorrect value for "strategy" with deflateParams() in walmethods.c
The zlib documentation mentions the values supported for the compression
strategy, but this code has been using a hardcoded value of 0 rather
than Z_DEFAULT_STRATEGY. This commit adjusts the code to use
Z_DEFAULT_STRATEGY.
Backpatch down to where this code has been added to ease the backport of
any future patch touching this area.
Reported-by: Tom Lane
Discussion: https://postgr.es/m/1400032.1662217889@sss.pgh.pa.us
Backpatch-through: 10
Peter Eisentraut [Wed, 14 Sep 2022 04:04:24 +0000 (06:04 +0200)]
Expand palloc/pg_malloc API for more type safety
This adds additional variants of palloc, pg_malloc, etc. that
encapsulate common usage patterns and provide more type safety.
Specifically, this adds palloc_object(), palloc_array(), and
repalloc_array(), which take the type name of the object to be
allocated as its first argument and cast the return as a pointer to
that type. There are also palloc0_object() and palloc0_array()
variants for initializing with zero, and pg_malloc_*() variants of all
of the above.
Inspired by the talloc library.
This is backpatched from master so that future backpatchable code can
make use of these APIs. This patch by itself does not contain any
users of these APIs.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/bb755632-2a43-d523-36f8-a1e7a389a907@enterprisedb.com