Amit Kapila [Thu, 11 Jun 2020 08:40:43 +0000 (14:10 +0530)]
Fix typos.
Reported-by: John Naylor
Author: John Naylor
Backpatch-through: 9.5
Discussion: https://postgr.es/m/CACPNZCtRuvs6G+EYqejhVJgBq2AKeZdXRVJsbX4syhO9gn5SNQ@mail.gmail.com
Andres Freund [Tue, 9 Jun 2020 02:52:19 +0000 (19:52 -0700)]
Avoid need for valgrind suppressions for pg_atomic_init_u64 on some platforms.
Previously we used pg_atomic_write_64_impl inside
pg_atomic_init_u64. That works correctly, but on platforms without
64bit single copy atomicity it could trigger spurious valgrind errors
about uninitialized memory, because we use compare_and_swap for atomic
writes on such platforms.
I previously suppressed one instance of this problem (6c878edc1df),
but as Tom reports that wasn't enough. As the atomic variable cannot
yet be concurrently accessible during initialization, it seems better
to have pg_atomic_init_64_impl set the value directly.
Change pg_atomic_init_u32_impl for symmetry.
Reported-By: Tom Lane
Author: Andres Freund
Discussion: https://postgr.es/m/1714601.1591503815@sss.pgh.pa.us
Backpatch: 9.5-
Thomas Munro [Mon, 8 Jun 2020 01:57:24 +0000 (13:57 +1200)]
Fix locking bugs that could corrupt pg_control.
The redo routines for XLOG_CHECKPOINT_{ONLINE,SHUTDOWN} must acquire
ControlFileLock before modifying ControlFile->checkPointCopy, or the
checkpointer could write out a control file with a bad checksum.
Likewise, XLogReportParameters() must acquire ControlFileLock before
modifying ControlFile and calling UpdateControlFile().
Noah Misch [Sun, 7 Jun 2020 23:27:13 +0000 (16:27 -0700)]
MSVC: Avoid warning when testing a TAP suite without PROVE_FLAGS.
Commit 7be5d8df1f74b78620167d3abf32ee607e728919 surfaced the logic
error, which had no functional implications, by adding "use warnings".
The buildfarm always customizes PROVE_FLAGS, so the warning did not
appear there. Back-patch to 9.5 (all supported versions).
Michael Paquier [Mon, 1 Jun 2020 05:41:46 +0000 (14:41 +0900)]
Fix use-after-release mistake in currtid() and currtid2() for views
This issue has been present since the introduction of this code as of a3519a2 from 2002, and has been found by buildfarm member prion that
uses RELCACHE_FORCE_RELEASE via the tests introduced recently in e786be5.
Joe Conway [Thu, 28 May 2020 17:45:15 +0000 (13:45 -0400)]
Initialize dblink remoteConn struct in all cases
Two of the members of rconn were left uninitialized. When
dblink_open() is called without an outer transaction it
handles the initialization for us, but with an outer
transaction it does not. Arrange for initialization
in all cases. Backpatch to all supported versions.
Joe Conway [Thu, 28 May 2020 17:17:28 +0000 (13:17 -0400)]
Add CHECK_FOR_INTERRUPTS() to the repeat() function
The repeat() function loops for potentially a long time without
ever checking for interrupts. This prevents, for example, a query
cancel from interrupting until the work is all done. Fix by
inserting a CHECK_FOR_INTERRUPTS() into the loop.
Noah Misch [Mon, 25 May 2020 23:21:04 +0000 (16:21 -0700)]
Add a temp-install prerequisite to top-level "check-tests".
The target failed, tested $PATH binaries, or tested a stale temporary
installation. Commit c66b438db62748000700c9b90b585e756dd54141 missed
this. Back-patch to 9.5 (all supported versions).
Michael Paquier [Thu, 21 May 2020 05:41:43 +0000 (14:41 +0900)]
Fix MSVC installations with multiple "configure" files detected
When installing binaries and libraries using the MSVC installation
routines, the operation gets done after moving to the root folder, whose
location is detected by checking if "configure" exists two times in a
row. So, calling the installation script from src/tools/msvc/ with an
extra "configure" file four levels up the root path of the code tree
causes the execution to go further up, leading to a failure in finding
the builds. This commit fixes the issue by moving to the root folder of
the code tree only once, when necessary.
Author: Arnold Müller Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/16343-f638f67e7e52b86c@postgresql.org
Backpatch-through: 9.5
Tom Lane [Fri, 15 May 2020 23:05:40 +0000 (19:05 -0400)]
Fix bogus initialization of replication origin shared memory state.
The previous coding zeroed out offsetof(ReplicationStateCtl, states)
more bytes than it was entitled to, as a consequence of starting the
zeroing from the wrong pointer (or, if you prefer, using the wrong
calculation of how much to zero).
It's unsurprising that this has not caused any reported problems,
since it can be expected that the newly-allocated block is at the end
of what we've used in shared memory, and we always make the shmem
block substantially bigger than minimally necessary. Nonetheless,
this is wrong and it could bite us someday; plus it's a dangerous
model for somebody to copy.
This dates back to the introduction of this code (commit 5aa235042),
so back-patch to all supported branches.
Alvaro Herrera [Fri, 15 May 2020 20:50:34 +0000 (16:50 -0400)]
Avoid killing btree items that are already dead
_bt_killitems marks btree items dead when a scan leaves the page where
they live, but it does so with only share lock (to improve concurrency).
This was historicall okay, since killing a dead item has no
consequences. However, with the advent of data checksums and
wal_log_hints, this action incurs a WAL full-page-image record of the
page. Multiple concurrent processes would write the same page several
times, leading to WAL bloat. The probability of this happening can be
reduced by only killing items if they're not already dead, so change the
code to do that.
The problem could eliminated completely by having _bt_killitems upgrade
to exclusive lock upon seeing a killable item, but that would reduce
concurrency so it's considered a cure worse than the disease.
Backpatch all the way back to 9.5, since wal_log_hints was introduced in
9.4.
Amit Kapila [Thu, 14 May 2020 04:25:04 +0000 (09:55 +0530)]
Fix the MSVC build for versions 2015 and later.
Visual Studio 2015 and later versions should still be able to do the same
as Visual Studio 2012, but the declaration of locale_name is missing in
_locale_t, causing the code compilation to fail, hence this falls back
instead on to enumerating all system locales by using EnumSystemLocalesEx
to find the required locale name. If the input argument is in Unix-style
then we can get ISO Locale name directly by using GetLocaleInfoEx() with
LCType as LOCALE_SNAME.
In passing, change the documentation references of the now obsolete links.
Note that this problem occurs only with NLS enabled builds.
Author: Juan José Santamaría Flecha, Davinder Singh and Amit Kapila Reviewed-by: Ranier Vilela and Amit Kapila
Backpatch-through: 9.5
Discussion: https://postgr.es/m/CAHzhFSFoJEWezR96um4-rg5W6m2Rj9Ud2CNZvV4NWc9tXV7aXQ@mail.gmail.com
Noah Misch [Thu, 14 May 2020 03:42:09 +0000 (20:42 -0700)]
Fix pg_recvlogical avoidance of superfluous Standby Status Update.
The defect suppressed a Standby Status Update message when bytes flushed
to disk had changed but bytes received had not changed. If
pg_recvlogical then exited with no intervening Standby Status Update,
the next pg_recvlogical repeated already-flushed records. The defect
could also cause superfluous messages, which are functionally harmless.
Back-patch to 9.5 (all supported versions).
Prevent archive recovery from scanning non-existent WAL files.
Previously when there were multiple timelines listed in the history file
of the recovery target timeline, archive recovery searched all of them,
starting from the newest timeline to the oldest one, to find the segment
to read. That is, archive recovery had to continuously fail scanning
the segment until it reached the timeline that the segment belonged to.
These scans for non-existent segment could be harmful on the recovery
performance especially when archival area was located on the remote
storage and each scan could take a long time.
To address the issue, this commit changes archive recovery so that
it skips scanning the timeline that the segment to read doesn't belong to.
Per discussion, back-patch to all supported versions.
Author: Kyotaro Horiguchi, tweaked a bit by Fujii Masao Reviewed-by: David Steele, Pavel Suderevsky, Grigory Smolkin
Discussion: https://postgr.es/m/16159-f5a34a3a04dc67e0@postgresql.org
Discussion: https://postgr.es/m/20200129.120222.1476610231001551715.horikyota.ntt@gmail.com
Alvaro Herrera [Wed, 6 May 2020 16:29:41 +0000 (12:29 -0400)]
Heed lock protocol in DROP OWNED BY
We were acquiring object locks then deleting objects one by one, instead
of acquiring all object locks first, ignoring those that did not exist,
and then deleting all objects together. The latter is the correct
protocol to use, and what this commits changes to code to do. Failing
to follow that leads to "cache lookup failed for relation XYZ" error
reports when DROP OWNED runs concurrently with other DDL -- for example,
a session termination that removes some temp tables.
Author: Álvaro Herrera Reported-by: Mithun Chicklore Yogendra (Mithun CY) Reviewed-by: Ahsan Hadi, Tom Lane
Discussion: https://postgr.es/m/CADq3xVZTbzK4ZLKq+dn_vB4QafXXbmMgDP3trY-GuLnib2Ai1w@mail.gmail.com
Michael Paquier [Wed, 6 May 2020 12:08:42 +0000 (21:08 +0900)]
Handle spaces for Python install location in MSVC scripts
Attempting to use an installation path of Python that includes spaces
caused the MSVC builds to fail. This fixes the issue by using the same
quoting method as ad7595b for OpenSSL.
Author: Victor Wagner
Discussion: https://postgr.es/m/20200430150608.6dc6b8c4@antares.wagner.home
Backpatch-through: 9.5
Tom Lane [Fri, 1 May 2020 21:28:01 +0000 (17:28 -0400)]
Get rid of trailing semicolons in C macro definitions.
Writing a trailing semicolon in a macro is almost never the right thing,
because you almost always want to write a semicolon after each macro
call instead. (Even if there was some reason to prefer not to, pgindent
would probably make a hash of code formatted that way; so within PG the
rule should basically be "don't do it".) Thus, if we have a semi inside
the macro, the compiler sees "something;;". Much of the time the extra
empty statement is harmless, but it could lead to mysterious syntax
errors at call sites. In perhaps an overabundance of neatnik-ism, let's
run around and get rid of the excess semicolons whereever possible.
The only thing worse than a mysterious syntax error is a mysterious
syntax error that only happens in the back branches; therefore,
backpatch these changes where relevant, which is most of them because
most of these mistakes are old. (The lack of reported problems shows
that this is largely a hypothetical issue, but still, it could bite
us in some future patch.)
Andrew Gierth [Sat, 25 Apr 2020 04:10:05 +0000 (05:10 +0100)]
Fix error case for CREATE ROLE ... IN ROLE.
CreateRole() was passing a Value node, not a RoleSpec node, for the
newly-created role name when adding the role as a member of existing
roles for the IN ROLE syntax.
This mistake went unnoticed because the node in question is used only
for error messages and is not accessed on non-error paths.
In older pg versions (such as 9.5 where this was found), this results
in an "unexpected node type" error in place of the real error. That
node type check was removed at some point, after which the code would
accidentally fail to fail on 64-bit platforms (on which accessing the
Value node as if it were a RoleSpec would be mostly harmless) or give
an "unexpected role type" error on 32-bit platforms.
Fix the code to pass the correct node type, and add an lfirst_node
assertion just in case.
Per report on irc from user m1chelangelo.
Backpatch all the way, because this error has been around for a long
time.
Tom Lane [Fri, 24 Apr 2020 21:21:44 +0000 (17:21 -0400)]
Improve placement of "display name" comment in win32_tzmap[] entries.
Sticking this comment at the end of the last line was a bad idea: it's
not particularly readable, and it tempts pgindent to mess with line
breaks within the comment, which in turn reveals that win32tzlist.pl's
clean_displayname() does the wrong thing to clean up such line breaks.
While that's not hard to fix, there's basically no excuse for this
arrangement to begin with, especially since it makes the table layout
needlessly vary across back branches with different pgindent rules.
Let's just put the comment inside the braces, instead.
This commit just moves and reformats the comments, and updates
win32tzlist.pl to match; there's no actual data change.
Per odd-looking results from Juan José Santamaría Flecha.
Back-patch, since the point is to make win32_tzmap[] look the
same in all supported branches again.
Michael Paquier [Fri, 24 Apr 2020 02:34:16 +0000 (11:34 +0900)]
Remove some unstable parts from new TAP test for archive status check
The test is proving to have timing issues when looking at archive status
files on standbys after crash recovery, while other parts of the test
rely on pg_stat_archiver as a wait point to make sure that a given state
of the archiving is reached. The coverage is not heavily impacted by
the removal those extra tests.
Per reports from several buildfarm animals, like crake, piculet,
culicidae and francolin.
Michael Paquier [Thu, 23 Apr 2020 23:48:55 +0000 (08:48 +0900)]
Fix handling of WAL segments ready to be archived during crash recovery
78ea8b5 has fixed an issue related to the recycling of WAL segments on
standbys depending on archive_mode. However, it has introduced a
regression with the handling of WAL segments ready to be archived during
crash recovery, causing those files to be recycled without getting
archived.
This commit fixes the regression by tracking in shared memory if a live
cluster is either in crash recovery or archive recovery as the handling
of WAL segments ready to be archived is different in both cases (those
WAL segments should not be removed during crash recovery), and by using
this new shared memory state to decide if a segment can be recycled or
not. Previously, it was not possible to know if a cluster was in crash
recovery or archive recovery as the shared state was able to track only
if recovery was happening or not, leading to the problem.
A set of TAP tests is added to close the gap here, making sure that WAL
segments ready to be archived are correctly handled when a cluster is in
archive or crash recovery with archive_mode set to "on" or "always", for
both standby and primary.
Reported-by: Benoît Lobréau
Author: Jehan-Guillaume de Rorthais Reviewed-by: Kyotaro Horiguchi, Fujii Masao, Michael Paquier
Discussion: https://postgr.es/m/20200331172229.40ee00dc@firost
Backpatch-through: 9.5
Michael Paquier [Tue, 21 Apr 2020 22:28:04 +0000 (07:28 +0900)]
Fix memory leak in libpq when using sslmode=verify-full
Checking if Subject Alternative Names (SANs) from a certificate match
with the hostname connected to leaked memory after each lookup done.
This is broken since acd08d7 that added support for SANs in SSL
certificates, so backpatch down to 9.5.
Author: Roman Peshkurov Reviewed-by: Hamid Akhtar, Michael Paquier, David Steele
Discussion: https://postgr.es/m/CALLDf-pZ-E3mjxd5=bnHsDu9zHEOnpgPgdnO84E2RuwMCjjyPw@mail.gmail.com
Backpatch-through: 9.5
Tom Lane [Tue, 21 Apr 2020 19:58:43 +0000 (15:58 -0400)]
Fix possible crash during FATAL exit from reindexing.
index.c supposed that it could just use a PG_TRY block to clean up the
state associated with an active REINDEX operation. However, that code
doesn't run if we do a FATAL exit --- for example, due to a SIGTERM
shutdown signal --- while the REINDEX is happening. And that state does
get consulted during catalog accesses, which makes it problematic if we
do any catalog accesses during shutdown --- for example, to clean up any
temp tables created in the session.
If this combination of circumstances occurred, we could find ourselves
trying to access already-freed memory. In debug builds that'd fairly
reliably cause an assertion failure. In production we might often
get away with it, but with some bad luck it could cause a core dump.
Another possible bad outcome is an erroneous conclusion that an
index-to-be-accessed is being reindexed; but it looks like that would
be unlikely to have any consequences worse than failing to drop temp
tables right away. (They'd still get dropped by the next session that
uses that temp schema.)
To fix, get rid of the use of PG_TRY here, and instead hook into
the transaction abort mechanisms to clean up reindex state.
Per bug #16378 from Alexander Lakhin. This has been wrong for a
very long time, so back-patch to all supported branches.
Andrew Dunstan [Fri, 17 Apr 2020 18:55:55 +0000 (14:55 -0400)]
Use a slightly more liberal regex to detect Visual Studio version
Apparently in some language versions of Visual Studio nmake outputs some
material after the version number and before the end of the line. This
has been seen in Chinese versions. Therefore, we no longer demand that
the version string comes at the end of a line.
Tom Lane [Thu, 16 Apr 2020 18:45:54 +0000 (14:45 -0400)]
Fix cache reference leak in contrib/sepgsql.
fixup_whole_row_references() did the wrong thing with a dropped column,
resulting in a commit-time warning about a cache reference leak.
I (tgl) added a test case exercising this, but back-patched the test
only as far as v10; the patch didn't apply cleanly to 9.6 and it
didn't seem worth the trouble to adapt it. The bug is pretty old
though, so apply the code change all the way back.
Tom Lane [Sat, 11 Apr 2020 16:29:06 +0000 (12:29 -0400)]
Clear dangling pointer to avoid bogus EXPLAIN printout in a corner case.
ExecReScanHashJoin will destroy the join's hash table if it expects
that the inner relation will produce different rows on rescan.
Up to now it's not bothered to clear the additional pointer to that
hash table that exists in the child HashState node. However, it's
possible for the query to terminate without building a fresh hash
table (this happens if the outer relation is found to be empty
during the final rescan). So we can end with a dangling pointer
to a deleted hash table. That was harmless originally, but since
9.0 EXPLAIN ANALYZE has used that pointer to print hash table
statistics. In debug builds this reproducibly results in garbage
statistics. In non-debug builds there's frequently no ill effects,
but in principle one could get wrong EXPLAIN ANALYZE output, or
perhaps even a crash if free() has released the hashtable memory
back to the OS.
To fix, just make sure we clear the additional pointer when destroying
the hash table. In problematic cases, EXPLAIN ANALYZE will then print
no hashtable statistics (reverting to its pre-9.0 behavior). This isn't
ideal, but since the problem manifests only in unusual corner cases,
it's hard to justify taking any risks to do better in the back
branches. A follow-on patch will improve matters in HEAD.
Konstantin Knizhnik and Tom Lane, per diagnosis by Thomas Munro
of a trouble report from Alvaro Herrera.
Tom Lane [Fri, 10 Apr 2020 17:12:58 +0000 (13:12 -0400)]
Doc: clarify locking requirements for ALTER TABLE ADD FOREIGN KEY.
The docs explained that a SHARE ROW EXCLUSIVE lock is needed on the
referenced table, but failed to say the same about the table being
altered. Since the page says that ACCESS EXCLUSIVE lock is taken
unless otherwise stated, this left readers with the wrong conclusion.
Tom Lane [Fri, 10 Apr 2020 14:44:10 +0000 (10:44 -0400)]
Doc: sync CREATE GROUP syntax synopsis with CREATE ROLE.
CREATE GROUP is an exact alias for CREATE ROLE, and CREATE USER is
almost an exact alias, as can easily be confirmed by checking the
code. So the man page syntax descriptions ought to match up. The
last few additions of role options seem to have forgotten to update
create_group.sgml, though. Fix that, and add a naggy reminder to
create_role.sgml in hopes of not forgetting again.
Tom Lane [Wed, 8 Apr 2020 15:23:40 +0000 (11:23 -0400)]
Fix pg_dump/pg_restore to restore event trigger comments later.
Repair an oversight in commit 8728b2c70: if we're postponing restore
of event triggers to the end, we must also postpone restoring any
comments on them, since of course we cannot create the comments first.
(This opens yet another opportunity for an event trigger to bollix
the restore, but there's no help for that.)
Per bug #16346 from Alexander Lakhin.
Like the previous commit, back-patch to all supported branches.
Tom Lane [Wed, 8 Apr 2020 00:50:02 +0000 (20:50 -0400)]
Fix circle_in to accept "(x,y),r" as it's advertised to do.
Our documentation describes four allowed input syntaxes for circles,
but the regression tests tried only three ... with predictable
consequences. Remarkably, this has been wrong since the circle
datatype was added in 1997, but nobody noticed till now.
Tom Lane [Tue, 7 Apr 2020 20:30:55 +0000 (16:30 -0400)]
Adjust bytea get_bit/set_bit to cope with bytea strings > 256MB.
Since the existing bit number argument can't exceed INT32_MAX, it's
not possible for these functions to manipulate bits beyond the first
256MB of a bytea value. However, it'd be good if they could do at
least that much, and not fall over entirely for longer bytea values.
Adjust the comparisons to be done in int64 arithmetic so that works.
Also tweak the error reports to show sane values in case of overflow.
Also add some test cases to improve the miserable code coverage
of these functions.
Apply patch to back branches only; HEAD has a better solution
as of commit 26a944cf2.
Tom Lane [Mon, 6 Apr 2020 16:29:54 +0000 (12:29 -0400)]
Stabilize new GIN test case in 9.5 branch.
In 9.6 and up, gin_test_tbl has autovacuum_enabled = off thanks to
commit f8a1c1d5a. 9.5 lacked that, which allowed autovacuum to
bollix the results of the test case added by commit 8150f7813.
We could fool with disabling seqscan around that test, but making
this branch look more like the later ones seems a better answer.
Per buildfarm member protosciurus. (I'm not very sure why
protosciurus is the only animal to report this so far; but it'd
clearly be a timing-related failure, so it's not astonishing that
only some machines would show it.)
Michael Paquier [Mon, 6 Apr 2020 02:06:07 +0000 (11:06 +0900)]
Preserve clustered index after rewrites with ALTER TABLE
A table rewritten by ALTER TABLE would lose tracking of an index usable
for CLUSTER. This setting is tracked by pg_index.indisclustered and is
controlled by ALTER TABLE, so some extra work was needed to restore it
properly. Note that ALTER TABLE only marks the index that can be used
for clustering, and does not do the actual operation.
Author: Amit Langote, Justin Pryzby Reviewed-by: Ibrar Ahmed, Michael Paquier
Discussion: https://postgr.es/m/20200202161718.GI13621@telsasoft.com
Backpatch-through: 9.5
Andres Freund [Mon, 6 Apr 2020 00:47:30 +0000 (17:47 -0700)]
Use TransactionXmin instead of RecentGlobalXmin in heap_abort_speculative().
There's a very low risk that RecentGlobalXmin could be far enough in
the past to be older than relfrozenxid, or even wrapped
around. Luckily the consequences of that having happened wouldn't be
too bad - the page wouldn't be pruned for a while.
Avoid that risk by using TransactionXmin instead. As that's announced
via MyPgXact->xmin, it is protected against wrapping around (see code
comments for details around relfrozenxid).
Author: Andres Freund
Discussion: https://postgr.es/m/20200328213023.s4eyijhdosuc4vcj@alap3.anarazel.de
Backpatch: 9.5-
Tom Lane [Fri, 3 Apr 2020 17:15:30 +0000 (13:15 -0400)]
Fix bugs in gin_fuzzy_search_limit processing.
entryGetItem()'s three code paths each contained bugs associated
with filtering the entries for gin_fuzzy_search_limit.
The posting-tree path failed to advance "advancePast" after having
decided to filter an item. If we ran out of items on the current
page and needed to advance to the next, what would actually happen
is that entryLoadMoreItems() would re-load the same page. Eventually,
the random dropItem() test would accept one of the same items it'd
previously rejected, and we'd move on --- but it could take awhile
with small gin_fuzzy_search_limit. To add insult to injury, this
case would inevitably cause entryLoadMoreItems() to decide it needed
to re-descend from the root, making things even slower.
The posting-list path failed to implement gin_fuzzy_search_limit
filtering at all, so that all entries in the posting list would
be returned.
The bitmap-result path used a "gotitem" variable that it failed to
update in the one place where it'd actually make a difference, ie
at the one "continue" statement. I think this was unreachable in
practice, because if we'd looped around then it shouldn't be the
case that the entries on the new page are before advancePast.
Still, the "gotitem" variable was contributing nothing to either
clarity or correctness, so get rid of it.
Refactor all three loops so that the termination conditions are
more alike and less unreadable.
The code coverage report showed that we had no coverage at all for
the re-descend-from-root code path in entryLoadMoreItems(), which
seems like a very bad thing, so add a test case that exercises it.
We also had exactly no coverage for gin_fuzzy_search_limit, so add a
simplistic test case that at least hits those code paths a little bit.
Tom Lane [Fri, 3 Apr 2020 15:24:56 +0000 (11:24 -0400)]
Fix bogus CALLED_AS_TRIGGER() defenses.
contrib/lo's lo_manage() thought it could use
trigdata->tg_trigger->tgname in its error message about
not being called as a trigger. That naturally led to a core dump.
unique_key_recheck() figured it could Assert that fcinfo->context
is a TriggerData node in advance of having checked that it's
being called as a trigger. That's harmless in production builds,
and perhaps not that easy to reach in any case, but it's logically
wrong.
The first of these per bug #16340 from William Crowell;
the second from manual inspection of other CALLED_AS_TRIGGER
call sites.
Back-patch the lo.c change to all supported branches, the
other to v10 where the thinko crept in.
Bruce Momjian [Thu, 2 Apr 2020 21:42:09 +0000 (17:42 -0400)]
doc: remove unnecessary INNER keyword
A join that was added in commit 9b2009c4cf that did not use the INNER
keyword but the existing query used it. It was cleaner to remove the
existing INNER keyword.
Reported-by: Peter Eisentraut
Discussion: https://postgr.es/m/a1ffbfda-59d2-5732-e5fb-3df8582b6434@2ndquadrant.com
Tom Lane [Tue, 31 Mar 2020 15:37:44 +0000 (11:37 -0400)]
Back-patch addition of stack overflow and interrupt checks for lquery.
Experimentation shows that it's not hard at all to drive the
old implementation of "ltree ~ lquery" match to stack overflow,
so throw in a check_stack_depth() call, as I just did in HEAD.
I wasn't able to make it take a long time, because all the
pathological cases I tried hit stack overflow first; but
I bet there are some others that do take a long time, so add
CHECK_FOR_INTERRUPTS() too.
Tom Lane [Mon, 30 Mar 2020 15:14:58 +0000 (11:14 -0400)]
Be more careful about extracting encoding from locale strings on Windows.
GetLocaleInfoEx() can fail on strings that setlocale() was perfectly
happy with. A common way for that to happen is if the locale string
is actually a Unix-style string, say "et_EE.UTF-8". In that case,
what's after the dot is an encoding name, not a Windows codepage number;
blindly treating it as a codepage number led to failure, with a fairly
silly error message. Hence, check to see if what's after the dot is
all digits, and if not, treat it as a literal encoding name rather than
a codepage number. This will do the right thing with many Unix-style
locale strings, and produce a more sensible error message otherwise.
Somewhat independently of that, treat a zero (CP_ACP) result from
GetLocaleInfoEx() as meaning that we must use UTF-8 encoding.
Tom Lane [Sun, 29 Mar 2020 22:54:19 +0000 (18:54 -0400)]
Doc: correct misstatement about ltree label maximum length.
The documentation says that the max length is 255 bytes, but
code inspection says it's actually 255 characters; and relevant
lengths are stored as uint16 so that that works.
Tom Lane [Sat, 28 Mar 2020 21:09:52 +0000 (17:09 -0400)]
Protect against overflow of ltree.numlevel and lquery.numlevel.
These uint16 fields could be overflowed by excessively long input,
producing strange results. Complain for invalid input.
Likewise check for out-of-range values of the repeat counts in lquery.
(We don't try too hard on that one, notably not bothering to detect
if atoi's result has overflowed.)
Also detect length overflow in ltree_concat.
In passing, be more consistent about whether "syntax error" messages
include the type name. Also, clarify the documentation about what
the size limit is.
This has been broken for a long time, so back-patch to all supported
branches.
Nikita Glukhov, reviewed by Benjie Gillam and Tomas Vondra
Andres Freund [Sat, 28 Mar 2020 18:52:11 +0000 (11:52 -0700)]
Ensure snapshot is registered within ScanPgRelation().
In 9.4 I added support to use a historical snapshot in
ScanPgRelation(), while adding logical decoding. Unfortunately a
conflict with the concurrent removal of SnapshotNow was incorrectly
resolved, leading to an unregistered snapshot being used.
It is not correct to use an unregistered (or non-active) snapshot for
anything non-trivial, because catalog invalidations can cause the
snapshot to be invalidated.
Luckily it seems unlikely to actively cause problems in practice, as
ScanPgRelation() requires that we already have a lock on the relation,
we only look for a single row, and we don't appear to rely on the
result's tid to be correct. It however is clearly wrong and potential
negative consequences would likely be hard to find. So it seems worth
backpatching the fix, even without a concrete hazard.
Tom Lane [Thu, 26 Mar 2020 22:06:56 +0000 (18:06 -0400)]
Ensure that plpgsql cleans up cleanly during parallel-worker exit.
plpgsql_xact_cb ought to treat events XACT_EVENT_PARALLEL_COMMIT and
XACT_EVENT_PARALLEL_ABORT like XACT_EVENT_COMMIT and XACT_EVENT_ABORT
respectively, since its goal is to do process-local cleanup. This
oversight caused plpgsql's end-of-transaction cleanup to not get done
in parallel workers. Since a parallel worker will exit just after the
transaction cleanup, the effects of this are limited. I couldn't find
any case in the core code with user-visible effects, but perhaps there
are some in extensions. In any case it's wrong, so let's fix it before
it bites us not after.
In passing, add some comments around the handling of expression
evaluation resources in DO blocks. There's no live bug there, but it's
quite unobvious what's happening; at least I thought so. This isn't
related to the other issue, except that I found both things while poking
at expression-evaluation performance.
Back-patch the plpgsql_xact_cb fix to 9.5 where those event types
were introduced, and the DO-block commentary to v11 where DO blocks
gained the ability to issue COMMIT/ROLLBACK.
Peter Eisentraut [Thu, 26 Mar 2020 10:51:39 +0000 (11:51 +0100)]
Drop slot's LWLock before returning from SaveSlotToPath()
When SaveSlotToPath() is called with elevel=LOG, the early exits didn't
release the slot's io_in_progress_lock.
This could result in a walsender being stuck on the lock forever. A
possible way to get into this situation is if the offending code paths
are triggered in a low disk space situation.
Tom Lane [Mon, 23 Mar 2020 16:42:16 +0000 (12:42 -0400)]
Doc: explain that LIKE et al can be used in ANY (sub-select) etc.
This wasn't stated anywhere, and it's perhaps not that obvious,
since we get questions about it from time to time. Also undocumented
was that the parser actually translates these into operators.
Tom Lane [Mon, 23 Mar 2020 15:58:01 +0000 (11:58 -0400)]
Fix our getopt_long's behavior for a command line argument of just "-".
src/port/getopt_long.c failed on such an argument, always seeing it
as an unrecognized switch. This is unhelpful; better is to treat such
an item as a non-switch argument. That behavior is what we find in
GNU's getopt_long(); it's what src/port/getopt.c does; and it is
required by POSIX for getopt(), which getopt_long() ought to be
generally a superset of. Moreover, it's expected by ecpg, which
intends an argument of "-" to mean "read from stdin". So fix it.
Also add some documentation about ecpg's behavior in this area, since
that was miserably underdocumented. I had to reverse-engineer it
from the code.
Per bug #16304 from James Gray. Back-patch to all supported branches,
since this has been broken forever.
Michael Paquier [Mon, 23 Mar 2020 04:38:35 +0000 (13:38 +0900)]
Doc: Fix type of some storage parameters in CREATE TABLE page
autovacuum_vacuum_scale_factor and autovacuum_analyze_scale_factor have
been documented as "float4", but "floating type" is used in this case
for GUCs and relation options in the documentation.
Noah Misch [Sun, 22 Mar 2020 16:24:09 +0000 (09:24 -0700)]
Revert "Skip WAL for new relfilenodes, under wal_level=minimal."
This reverts commit cb2fd7eac285b1b0a24eeb2b8ed4456b66c5a09f. Per
numerous buildfarm members, it was incompatible with parallel query, and
a test case assumed LP64. Back-patch to 9.5 (all supported versions).
Noah Misch [Sat, 21 Mar 2020 16:38:26 +0000 (09:38 -0700)]
Skip WAL for new relfilenodes, under wal_level=minimal.
Until now, only selected bulk operations (e.g. COPY) did this. If a
given relfilenode received both a WAL-skipping COPY and a WAL-logged
operation (e.g. INSERT), recovery could lose tuples from the COPY. See
src/backend/access/transam/README section "Skipping WAL for New
RelFileNode" for the new coding rules. Maintainers of table access
methods should examine that section.
To maintain data durability, just before commit, we choose between an
fsync of the relfilenode and copying its contents to WAL. A new GUC,
wal_skip_threshold, guides that choice. If this change slows a workload
that creates small, permanent relfilenodes under wal_level=minimal, try
adjusting wal_skip_threshold. Users setting a timeout on COMMIT may
need to adjust that timeout, and log_min_duration_statement analysis
will reflect time consumption moving to COMMIT from commands like COPY.
Internally, this requires a reliable determination of whether
RollbackAndReleaseCurrentSubTransaction() would unlink a relation's
current relfilenode. Introduce rd_firstRelfilenodeSubid. Amend the
specification of rd_createSubid such that the field is zero when a new
rel has an old rd_node. Make relcache.c retain entries for certain
dropped relations until end of transaction.
Back-patch to 9.5 (all supported versions). This introduces a new WAL
record type, XLOG_GIST_ASSIGN_LSN, without bumping XLOG_PAGE_MAGIC. As
always, update standby systems before master systems. This changes
sizeof(RelationData) and sizeof(IndexStmt), breaking binary
compatibility for affected extensions. (The most recent commit to
affect the same class of extensions was 089e4d405d0f3b94c74a2c6a54357a84a681754b.)
Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert
Haas. Heikki Linnakangas and Michael Paquier implemented earlier
designs that materially clarified the problem. Reviewed, in earlier
designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane,
Fujii Masao, and Simon Riggs. Reported by Martijn van Oosterhout.
Noah Misch [Sat, 21 Mar 2020 16:38:33 +0000 (09:38 -0700)]
Back-patch log_newpage_range().
Back-patch a subset of commit 9155580fd5fc2a0cbb23376dfca7cd21f59c2c7b
to v11, v10, 9.6, and 9.5. Include the latest repairs to this function.
Use a new XLOG_FPI_MULTI value instead of reusing XLOG_FPI. That way,
if an older server reads WAL from this function, that server will PANIC
instead of applying just one page of the record. The next commit adds a
call to this function.
Noah Misch [Sat, 21 Mar 2020 16:38:26 +0000 (09:38 -0700)]
During heap rebuild, lock any TOAST index until end of transaction.
swap_relation_files() calls toast_get_valid_index() to find and lock
this index, just before swapping with the rebuilt TOAST index. The
latter function releases the lock before returning. Potential for
mischief is low; a concurrent session can issue ALTER INDEX ... SET
(fillfactor = ...), which is not alarming. Nonetheless, changing
pg_class.relfilenode without a lock is unconventional. Back-patch to
9.5 (all supported versions), because another fix needs this.
Noah Misch [Sat, 21 Mar 2020 16:38:26 +0000 (09:38 -0700)]
Fix cosmetic blemishes involving rd_createSubid.
Remove an obsolete comment from AtEOXact_cleanup(). Restore formatting
of a comment in struct RelationData, mangled by the pgindent run in
commit 9af4159fce6654aa0e081b00d02bca40b978745c. Back-patch to 9.5 (all
supported versions), because another fix stacks on this.
Bruce Momjian [Thu, 19 Mar 2020 19:20:55 +0000 (15:20 -0400)]
pg_upgrade: make get_major_server_version() err msg consistent
This patch fixes the error message in get_major_server_version() to be
"could not parse version file", and uses the full file path name, rather
than just the data directory path.
Also, commit 4109bb5de4 added the cause of the failure to the "could
not open" error message, and improved quoting. This patch backpatches
the "could not open" cause to PG 12, where it was first widely used, and
backpatches the quoting fix in that patch to all supported releases.
Reported-by: Tom Lane
Discussion: https://postgr.es/m/87pne2w98h.fsf@wibble.ilmari.org
Noah Misch [Thu, 19 Mar 2020 16:39:26 +0000 (09:39 -0700)]
Back-patch src/test/recovery and PostgresNode from 9.6 to 9.5.
This omits 007_sync_rep.pl, which tests a feature new in 9.6. The only
other change is to substitute "hot_standby" for "replica". A planned
back-patch will use this suite to test its recovery behavior changes.
Identified by Kyotaro Horiguchi, though I did not use his patch.
Tom Lane [Wed, 18 Mar 2020 15:04:48 +0000 (11:04 -0400)]
Doc: remove reference to nonexisting GUC from 9.5 release notes.
idle_in_transaction_session_timeout doesn't exist before 9.6, so
mentioning it in older branches' release notes is confusing.
Lazy copy-and-paste on my (tgl's) part. Too late to do anything
about 9.4, but we can usefully fix this in 9.5.
Tom Lane [Tue, 17 Mar 2020 19:05:17 +0000 (15:05 -0400)]
Doc: clarify behavior of "anyrange" pseudo-type.
I noticed that we completely failed to document the restriction
that an "anyrange" result type has to be inferred from an "anyrange"
input. The docs also were less clear than they could be about the
relationship between "anyrange" and "anyarray".
Tom Lane [Tue, 17 Mar 2020 01:05:29 +0000 (21:05 -0400)]
Avoid holding a directory FD open across assorted SRF calls.
This extends the fixes made in commit 085b6b667 to other SRFs with the
same bug, namely pg_logdir_ls(), pgrowlocks(), pg_timezone_names(),
pg_ls_dir(), and pg_tablespace_databases().
Also adjust various comments and documentation to warn against
expecting to clean up resources during a ValuePerCall SRF's final
call.
Back-patch to all supported branches, since these functions were
all born broken.
Tom Lane [Sat, 14 Mar 2020 18:42:22 +0000 (14:42 -0400)]
Restructure polymorphic-type resolution in funcapi.c.
resolve_polymorphic_tupdesc() and resolve_polymorphic_argtypes() failed to
cover the case of having to resolve anyarray given only an anyrange input.
The bug was masked if anyelement was also used (as either input or
output), which probably helps account for our not having noticed.
While looking at this I noticed that resolve_generic_type() would produce
the wrong answer if asked to make that same resolution. ISTM that
resolve_generic_type() is confusingly defined and overly complex, so
rather than fix it, let's just make funcapi.c do the specific lookups
it requires for itself.
With this change, resolve_generic_type() is not used anywhere, so remove
it in HEAD. In the back branches, leave it alone (complete with bug)
just in case any external code is using it.
While we're here, make some other refactoring adjustments in funcapi.c
with an eye to upcoming future expansion of the set of polymorphic types:
* Simplify quick-exit tests by adding an overall have_polymorphic_result
flag. This is about a wash now but will be a win when there are more
flags.
* Reduce duplication of code between resolve_polymorphic_tupdesc() and
resolve_polymorphic_argtypes().
* Don't bother to validate correct matching of anynonarray or anyenum;
the parser should have done that, and even if it didn't, just doing
"return false" here would lead to a very confusing, off-point error
message. (Really, "return false" in these two functions should only
occur if the call_expr isn't supplied or we can't obtain data type
info from it.)
* For the same reason, throw an elog rather than "return false" if
we fail to resolve a polymorphic type.
The bug's been there since we added anyrange, so back-patch to
all supported branches.
Peter Eisentraut [Fri, 13 Mar 2020 10:28:11 +0000 (11:28 +0100)]
Preserve replica identity index across ALTER TABLE rewrite
If an index was explicitly set as replica identity index, this setting
was lost when a table was rewritten by ALTER TABLE. Because this
setting is part of pg_index but actually controlled by ALTER
TABLE (not part of CREATE INDEX, say), we have to do some extra work
to restore it.
Based-on-patch-by: Quan Zongliang <quanzongliang@gmail.com> Reviewed-by: Euler Taveira <euler.taveira@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/c70fcab2-4866-0d9f-1d01-e75e189db342@gmail.com
Thomas Munro [Thu, 12 Mar 2020 05:06:54 +0000 (18:06 +1300)]
Fix nextXid tracking bug on standbys (9.5-11 only).
RecordKnownAssignedTransactionIds() should never move
nextXid backwards. Before this commit, that could happen
if some other code path had advanced it without advancing
latestObservedXid.
One consequence is that a well timed XLOG_CHECKPOINT_ONLINE
could cause hot standby feedback messages to get confused
and report an xmin from a future epoch, potentially allowing
vacuum to run too soon on the primary.
Repair, by making sure RecordKnownAssignedTransactionIds()
can only move nextXid forwards.
In release 12 and master, this was already done by commit 2fc7af5e, which consolidated similar code and straightened
out this bug. Back-patch to supported releases before that.
Tom Lane [Mon, 9 Mar 2020 18:58:11 +0000 (14:58 -0400)]
Fix pg_dump/pg_restore to restore event triggers later.
Previously, event triggers were restored just after regular triggers
(and FK constraints, which are basically triggers). This is risky
since an event trigger, once installed, could interfere with subsequent
restore commands. Worse, because event triggers don't have any
particular dependencies on any post-data objects, a parallel restore
would consider them eligible to be restored the moment the post-data
phase starts, allowing them to also interfere with restoration of a
whole bunch of objects that would have been restored before them in
a serial restore. There's no way to completely remove the risk of a
misguided event trigger breaking the restore, since if nothing else
it could break other event triggers. But we can certainly push them
to later in the process to minimize the hazard.
To fix, tweak the RestorePass mechanism introduced by commit 3eb9a5e7c
so that event triggers are handled as part of the post-ACL processing
pass (renaming the "REFRESH" pass to "POST_ACL" to reflect its more
general use). This will cause them to restore after everything except
matview refreshes, which seems OK since matview refreshes really ought
to run in the post-restore state of the database. In a parallel
restore, event triggers and matview refreshes might be intermixed,
but that seems all right as well.
Also update the code and comments in pg_dump_sort.c so that its idea
of how things are sorted agrees with what actually happens due to
the RestorePass mechanism. This is mostly cosmetic: it'll affect the
order of objects in a dump's TOC, but not the actual restore order.
But not changing that would be quite confusing to somebody reading
the code.
Fujii Masao [Mon, 9 Mar 2020 15:14:43 +0000 (00:14 +0900)]
Fix bug that causes to report waiting in PS display twice, in hot standby.
Previously "waiting" could appear twice via PS in case of lock conflict
in hot standby mode. Specifically this issue happend when the delay
in WAL application determined by max_standby_archive_delay and
max_standby_streaming_delay had passed but it took more than 500 msec
to cancel all the conflicting transactions. Especially we can observe this
easily by setting those delay parameters to -1.
The cause of this issue was that WaitOnLock() and
ResolveRecoveryConflictWithVirtualXIDs() added "waiting" to
the process title in that case. This commit prevents
ResolveRecoveryConflictWithVirtualXIDs() from reporting waiting
in case of lock conflict, to fix the bug.
Fujii Masao [Mon, 9 Mar 2020 06:31:31 +0000 (15:31 +0900)]
Avoid assertion failure with targeted recovery in standby mode.
At the end of recovery, standby mode is turned off to re-fetch the last
valid record from archive or pg_wal. Previously, if recovery target was
reached and standby mode was turned off while the current WAL source
was stream, recovery could try to retrieve WAL file containing the last
valid record unexpectedly from stream even though not in standby mode.
This caused an assertion failure. That is, the assertion test confirms that
WAL file should not be retrieved from stream if standby mode is not true.
This commit moves back the current WAL source to archive if it's stream
even though not in standby mode, to avoid that assertion failure.
This issue doesn't cause the server to crash when built with assertion
disabled. In this case, the attempt to retrieve WAL file from stream not
in standby mode just fails. And then recovery tries to retrieve WAL file
from archive or pg_wal.
Michael Paquier [Thu, 27 Feb 2020 02:21:28 +0000 (11:21 +0900)]
createdb: Fix quoting of --encoding, --lc-ctype and --lc-collate
The original coding failed to properly quote those arguments, leading to
failures when using quotes in the values used. As the quoting can be
encoding-sensitive, the connection to the backend needs to be taken
before applying the correct quoting.
Author: Michael Paquier Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/20200214041004.GB1998@paquier.xyz
Backpatch-through: 9.5
Tom Lane [Wed, 19 Feb 2020 19:44:58 +0000 (14:44 -0500)]
Fix confusion about event trigger vs. plain function in plpgsql.
The function hash table keys made by compute_function_hashkey() failed
to distinguish event-trigger call context from regular call context.
This meant that once we'd successfully made a hash entry for an event
trigger (either by validation, or by normal use as an event trigger),
an attempt to call the trigger function as a plain function would
find this hash entry and thereby bypass the you-can't-do-that check in
do_compile(). Thus we'd attempt to execute the function, leading to
strange errors or even crashes, depending on function contents and
server version.
To fix, add an isEventTrigger field to PLpgSQL_func_hashkey,
paralleling the longstanding infrastructure for regular triggers.
This fits into what had been pad space, so there's no risk of an ABI
break, even assuming that any third-party code is looking at these
hash keys. (I considered replacing isTrigger with a PLpgSQL_trigtype
enum field, but felt that that carried some API/ABI risk. Maybe we
should change it in HEAD though.)
Per bug #16266 from Alexander Lakhin. This has been broken since
event triggers were invented, so back-patch to all supported branches.
Fujii Masao [Wed, 19 Feb 2020 11:37:26 +0000 (20:37 +0900)]
Fix mesurement of elapsed time during truncating heap in VACUUM.
VACUUM may truncate heap in several batches. The activity report
is logged for each batch, and contains the number of pages in the table
before and after the truncation, and also the elapsed time during
the truncation. Previously the elapsed time reported in each batch was
the total elapsed time since starting the truncation until finishing
each batch. For example, if the truncation was processed dividing into
three batches, the second batch reported the accumulated time elapsed
during both first and second batches. This is strange and confusing
because the number of pages in the table reported together is not
total. Instead, each batch should report the time elapsed during
only that batch.
The cause of this issue was that the resource usage snapshot was
initialized only at the beginning of the truncation and was never
reset later. This commit fixes the issue by changing VACUUM so that
the resource usage snapshot is reset at each batch.
Amit Kapila [Wed, 19 Feb 2020 03:29:18 +0000 (08:59 +0530)]
Stop demanding that top xact must be seen before subxact in decoding.
Manifested as
ERROR: subtransaction logged without previous top-level txn record
this check forbids legit behaviours like
- First xl_xact_assignment record is beyond reading, i.e. earlier
restart_lsn.
- After restart_lsn there is some change of a subxact.
- After that, there is second xl_xact_assignment (for another subxact)
revealing the relationship between top and first subxact.
Such a transaction won't be streamed anyway because we hadn't seen it in
full. Saying for sure whether xact of some record encountered after
the snapshot was deserialized can be streamed or not requires to know
whether it wrote something before deserialization point --if yes, it
hasn't been seen in full and can't be decoded. Snapshot doesn't have such
info, so there is no easy way to relax the check.
Reported-by: Hsu, John Diagnosed-by: Arseny Sher
Author: Arseny Sher, Amit Kapila Reviewed-by: Amit Kapila, Dilip Kumar
Backpatch-through: 9.5
Discussion: https://postgr.es/m/AB5978B2-1772-4FEE-A245-74C91704ECB0@amazon.com
Tom Lane [Mon, 17 Feb 2020 23:40:02 +0000 (18:40 -0500)]
Teach pg_dump to dump comments on RLS policy objects.
This was unaccountably omitted in the original RLS patch.
The SQL syntax is basically the same as for comments on triggers,
so crib code from dumpTrigger().
Per report from Marc Munro. Back-patch to all supported branches.