PostgreSQL always sends the BackendKeyData message at connection
startup, but there are some third party backend implementations out
there that don't support cancellation, and don't send the message
[1]. While the protocol docs left it up for interpretation if that is
valid behavior, libpq in PostgreSQL 17 and below accepted it. It does
not seem like the libpq behavior was intentional though, since it did
so by sending CancelRequest messages with all zeros to such servers
(instead of returning an error or making the cancel a no-op).
In version 18 the behavior was changed to return an error when trying
to create the cancel object with PGgetCancel() or PGcancelCreate().
This was done without any discussion, as part of supporting different
lengths of cancel packets for the new 3.2 version of the protocol.
This commit changes the behavior of PGgetCancel() / PGcancel() once
more to only return an error when the cancel object is actually used
to send a cancellation, instead of when merely creating the object.
The reason to do so is that some clients [2] create a cancel object as
part of their connection creation logic (thus having the cancel object
ready for later when they need it), so if creating the cancel object
returns an error, the whole connection attempt fails. By delaying the
error, such clients will still be able to connect to the third party
backend implementations in question, but when actually trying to
cancel a query, the user will be notified that that is not possible
for the server that they are connected to.
This commit only changes the behavior of the older PGgetCancel() /
PQcancel() functions, not the more modern PQcancelCreate() family of
functions. I.e. PQcancelCreate() returns a failed connection object
(CONNECTION_BAD) if the server didn't send a cancellation key. Unlike
the old PQgetCancel() function, we're not aware of any clients in the
field that use PQcancelCreate() during connection startup in a way
that would prevent connecting to such servers.
[1] AWS RDS Proxy is definitely one of them, and CockroachDB might be
another.
[2] psycopg2 (but not psycopg3).
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Backpatch-through: 18
Discussion: https://www.postgresql.org/message-id/
20250617.101056.
1437027795118961504.ishii%40postgresql.org
The frontend should not respond to this message, but should
continue listening for a ReadyForQuery message.
</para>
+ <para>
+ The <productname>PostgreSQL</productname> server will always send this
+ message, but some third party backend implementations of the protocol
+ that don't support query cancellation are known not to.
+ </para>
</listitem>
</varlistentry>
/* Check that we have received a cancellation key */
if (conn->be_cancel_key_len == 0)
- return NULL;
+ {
+ /*
+ * In case there is no cancel key, return an all-zero PGcancel object.
+ * Actually calling PQcancel on this will fail, but we allow creating
+ * the PGcancel object anyway. Arguably it would be better return NULL
+ * to indicate that cancellation is not possible, but there'd be no
+ * way for the caller to distinguish "out of memory" from "server did
+ * not send a cancel key". Also, this is how PGgetCancel() has always
+ * behaved, and if we changed it, some clients would stop working
+ * altogether with servers that don't support cancellation. (The
+ * modern PQcancelCreate() function returns a failed connection object
+ * instead.)
+ *
+ * The returned dummy object has cancel_pkt_len == 0; we check for
+ * that in PQcancel() to identify it as a dummy.
+ */
+ return calloc(1, sizeof(PGcancel));
+ }
cancel_req_len = offsetof(CancelRequestPacket, cancelAuthCode) + conn->be_cancel_key_len;
cancel = malloc(offsetof(PGcancel, cancel_req) + cancel_req_len);
return false;
}
+ if (cancel->cancel_pkt_len == 0)
+ {
+ /* This is a dummy PGcancel object, see PQgetCancel */
+ strlcpy(errbuf, "PQcancel() -- no cancellation key received", errbufsize);
+ /* strlcpy probably doesn't change errno, but be paranoid */
+ SOCK_ERRNO_SET(save_errno);
+ return false;
+ }
+
/*
* We need to open a temporary connection to the postmaster. Do this with
* only kernel calls.