From 42b1480eb2c6132d2b0ff76a767cccddc28ab20f Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Fri, 1 Aug 2025 18:24:19 +0300 Subject: [PATCH] libpq: Complain about missing BackendKeyData later with PGgetCancel() PostgreSQL always sends the BackendKeyData message at connection startup, but there are some third party backend implementations out there that don't support cancellation, and don't send the message [1]. While the protocol docs left it up for interpretation if that is valid behavior, libpq in PostgreSQL 17 and below accepted it. It does not seem like the libpq behavior was intentional though, since it did so by sending CancelRequest messages with all zeros to such servers (instead of returning an error or making the cancel a no-op). In version 18 the behavior was changed to return an error when trying to create the cancel object with PGgetCancel() or PGcancelCreate(). This was done without any discussion, as part of supporting different lengths of cancel packets for the new 3.2 version of the protocol. This commit changes the behavior of PGgetCancel() / PGcancel() once more to only return an error when the cancel object is actually used to send a cancellation, instead of when merely creating the object. The reason to do so is that some clients [2] create a cancel object as part of their connection creation logic (thus having the cancel object ready for later when they need it), so if creating the cancel object returns an error, the whole connection attempt fails. By delaying the error, such clients will still be able to connect to the third party backend implementations in question, but when actually trying to cancel a query, the user will be notified that that is not possible for the server that they are connected to. This commit only changes the behavior of the older PGgetCancel() / PQcancel() functions, not the more modern PQcancelCreate() family of functions. I.e. PQcancelCreate() returns a failed connection object (CONNECTION_BAD) if the server didn't send a cancellation key. Unlike the old PQgetCancel() function, we're not aware of any clients in the field that use PQcancelCreate() during connection startup in a way that would prevent connecting to such servers. [1] AWS RDS Proxy is definitely one of them, and CockroachDB might be another. [2] psycopg2 (but not psycopg3). Author: Jelte Fennema-Nio Reviewed-by: Jacob Champion Backpatch-through: 18 Discussion: https://www.postgresql.org/message-id/20250617.101056.1437027795118961504.ishii%40postgresql.org --- doc/src/sgml/protocol.sgml | 5 +++++ src/interfaces/libpq/fe-cancel.c | 28 +++++++++++++++++++++++++++- 2 files changed, 32 insertions(+), 1 deletion(-) diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index 6ca16523da1..4b2bd0d165a 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -537,6 +537,11 @@ The frontend should not respond to this message, but should continue listening for a ReadyForQuery message. + + The PostgreSQL server will always send this + message, but some third party backend implementations of the protocol + that don't support query cancellation are known not to. + diff --git a/src/interfaces/libpq/fe-cancel.c b/src/interfaces/libpq/fe-cancel.c index 65517c5703b..c872a0267f0 100644 --- a/src/interfaces/libpq/fe-cancel.c +++ b/src/interfaces/libpq/fe-cancel.c @@ -379,7 +379,24 @@ PQgetCancel(PGconn *conn) /* Check that we have received a cancellation key */ if (conn->be_cancel_key_len == 0) - return NULL; + { + /* + * In case there is no cancel key, return an all-zero PGcancel object. + * Actually calling PQcancel on this will fail, but we allow creating + * the PGcancel object anyway. Arguably it would be better return NULL + * to indicate that cancellation is not possible, but there'd be no + * way for the caller to distinguish "out of memory" from "server did + * not send a cancel key". Also, this is how PGgetCancel() has always + * behaved, and if we changed it, some clients would stop working + * altogether with servers that don't support cancellation. (The + * modern PQcancelCreate() function returns a failed connection object + * instead.) + * + * The returned dummy object has cancel_pkt_len == 0; we check for + * that in PQcancel() to identify it as a dummy. + */ + return calloc(1, sizeof(PGcancel)); + } cancel_req_len = offsetof(CancelRequestPacket, cancelAuthCode) + conn->be_cancel_key_len; cancel = malloc(offsetof(PGcancel, cancel_req) + cancel_req_len); @@ -544,6 +561,15 @@ PQcancel(PGcancel *cancel, char *errbuf, int errbufsize) return false; } + if (cancel->cancel_pkt_len == 0) + { + /* This is a dummy PGcancel object, see PQgetCancel */ + strlcpy(errbuf, "PQcancel() -- no cancellation key received", errbufsize); + /* strlcpy probably doesn't change errno, but be paranoid */ + SOCK_ERRNO_SET(save_errno); + return false; + } + /* * We need to open a temporary connection to the postmaster. Do this with * only kernel calls. -- 2.47.2