fix: produce consistent error messages on date overflow
- State whether the date is too small or too large, not just "not supported".
- Use similar messages in text and binary format.
- Avoid an overflow error with the infinity date in Python.
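A minimal sketch of the intended behaviour (the DSN is an assumption;
the exact message wording follows this commit's intent)::

    import psycopg

    with psycopg.connect("dbname=test") as conn:
        try:
            # Beyond datetime.date.max: not representable in Python.
            conn.execute("select '10000-01-01'::date").fetchone()
        except psycopg.DataError as e:
            print(e)  # should say "too large", not just "not supported"
        # Loading 'infinity'::date should fail cleanly too, without
        # raising OverflowError.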
Daniele Varrazzo [Thu, 19 May 2022 21:41:26 +0000 (23:41 +0200)]
Test: fix mypy 0.941 run
This seems to be a mypy shortcoming, fixed in 0.950, as the CI didn't
complain. The row factory definition was exotic but arguably correct.
Not worth bumping the minimum mypy version for it anyway.
Denis Laxalde [Mon, 16 May 2022 17:16:53 +0000 (19:16 +0200)]
fix: only use covariant Row type variable
The real variance of the Row type variable is covariant, per the
definition of RowMaker (RowMaker[B] is a subtype of RowMaker[A] if B is
a subtype of A; similar to the Box type in [mypy documentation][]).
By dropping the Row type variable from the Cursor argument in
RowFactory, we are now able to only use the covariant Row variable
(previously Row_co, now Row). Indeed, RowFactory does not actually
depend on Cursor being parametrized on Row; rather, it's the other way
around (Cursor's Row type variable comes from its row factory).
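A simplified sketch of the variance argument (the classes below are
stand-ins for the real definitions in psycopg.rows)::

    from typing import Any, Protocol, Sequence, TypeVar

    Row = TypeVar("Row", covariant=True)

    class RowMaker(Protocol[Row]):
        # Row appears only in return position, so covariance is sound.
        def __call__(self, values: Sequence[Any]) -> Row: ...

    class RowFactory(Protocol[Row]):
        # The cursor argument is no longer parametrized on Row, so the
        # covariant variable can be used here too.
        def __call__(self, cursor: Any) -> RowMaker[Row]: ...

    def tuple_maker(values: Sequence[Any]) -> tuple:
        return tuple(values)

    # RowMaker[tuple] is accepted where RowMaker[object] is expected.
    maker: RowMaker[object] = tuple_maker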
Daniele Varrazzo [Sat, 14 May 2022 08:22:44 +0000 (10:22 +0200)]
fix: don't barf on errors with blank sqlstate
Such messages are not entirely valid (the sqlstate is documented to
always be present); however, we receive them after a SHOW HELP in the
PgBouncer admin database.
The SHOW HELP actually does generate a sqlstate `00000`, but the
message is somehow parsed incorrectly by the libpq, which goes on to
report the error:

    message contents do not agree with length in message type "N"
See #303. PgBouncer issue reported upstream in
https://github.com/pgbouncer/pgbouncer/issues/718
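A sketch of the tolerant lookup (the fallback class is an assumption;
errors.lookup() is psycopg's real sqlstate-to-class mapping)::

    from typing import Optional, Type

    from psycopg import errors

    def exception_for(sqlstate: Optional[str]) -> Type[errors.Error]:
        # A blank or missing sqlstate must not crash the error builder:
        # fall back to a generic class instead.
        if sqlstate:
            try:
                return errors.lookup(sqlstate)
            except KeyError:
                pass
        return errors.DatabaseError

    print(exception_for(None))     # <class 'psycopg.errors.DatabaseError'>
    print(exception_for("42P01"))  # <class 'psycopg.errors.UndefinedTable'>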
Daniele Varrazzo [Sat, 14 May 2022 22:24:17 +0000 (00:24 +0200)]
test: skip testing random multirange arrays with empty last elements
Previously we were skipping the ones with an empty first element,
because of a known shortcoming in finding the right OID. Now that we
scan the whole array to find all the elements' classes, it's the last
entry that might break dumping.
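An illustration of the shape now skipped (whether this exact value
fails is an assumption based on the description above)::

    from psycopg.types.multirange import Multirange
    from psycopg.types.range import Range

    # The dumper infers the element OID by scanning the contained
    # ranges; an empty multirange in the last position leaves nothing
    # to inspect there.
    fine = [Multirange([Range(1, 2)]), Multirange([Range(3, 4)])]
    fragile = [Multirange([Range(1, 2)]), Multirange()]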
Daniele Varrazzo [Fri, 13 May 2022 14:54:54 +0000 (16:54 +0200)]
fix: raise DataError if IntDumper tries to dump another type
Allow dumping int subclasses, but not other numeric types, which can
lead to truncation. As seen in #301, the Python implementation
truncates; the C implementation fails with a TypeError, but on Python
3.8 it raises a deprecation warning instead.
The condition mostly happens when dumping arrays of mixed types.
However, we should probably test arrays of mixed classes in a more
generic way.
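A minimal sketch of the guard (simplified: the real dumper lives in
psycopg.types.numeric and writes the wire format directly)::

    from psycopg.errors import DataError

    def dump_int(obj: object) -> bytes:
        # int subclasses (e.g. bool, IntEnum members) are fine; other
        # numeric types would risk silent truncation, so refuse them.
        if not isinstance(obj, int):
            raise DataError(f"integer expected, got {type(obj).__name__!r}")
        return str(int(obj)).encode()

    dump_int(1)       # b'1'
    # dump_int(10.5)  # raises DataError instead of truncating to 10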
test: add test to verify the wrong array oid with numbers
The array binary dumper does the right thing; the text one picks
numeric[] unconditionally. It was clearly done on purpose, but #293
shows that it's a bad idea.
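One way to observe the difference (the DSN is an assumption;
pg_typeof() reports the type the dumper advertised)::

    import psycopg

    with psycopg.connect("dbname=test") as conn:
        cur = conn.cursor()
        cur.execute("select pg_typeof(%s)", ([1, 2, 3],))  # text params
        print(cur.fetchone())  # ('numeric[]',) picked unconditionally
        cur.execute("select pg_typeof(%b)", ([1, 2, 3],))  # binary params
        print(cur.fetchone())  # the type inferred from the elements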
perf: avoid unnecessary recvfrom() in cursor.stream()
Call PQisBusy() before PQconsumeInput() when fetching results. If not
busy, don't call PQconsumeInput() at all, but just go on to fetching
results and notifications.
This is especially useful in single-row mode, because most of the time
the libpq can produce several results after a single network fetch.
Previously we were calling PQconsumeInput() even when results were
already on the client and there was nothing new to read, which forced
the libpq to run a select() to tell apart a lack of data from an EOF,
see `the grumble`_, and caused the overhead reported in #286.
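A simplified sketch of the fetching generator (yielding "R" stands for
"wait until the connection socket is readable")::

    from typing import Generator, List

    from psycopg import pq

    def fetch_many(
        pgconn: pq.abc.PGconn,
    ) -> Generator[str, None, List[pq.abc.PGresult]]:
        results: List[pq.abc.PGresult] = []
        while True:
            # Check PQisBusy() first: if results are already buffered
            # client-side, skip PQconsumeInput() and its recvfrom().
            if pgconn.is_busy():
                yield "R"
                pgconn.consume_input()
                continue
            res = pgconn.get_result()
            if res is None:
                break
            results.append(res)
        return results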
Daniele Varrazzo [Fri, 25 Mar 2022 15:57:28 +0000 (16:57 +0100)]
fix(copy): chunk large buffers before queuing, not after
This is conceptually a better place to do it, because the queue has the
function of applying backpressure on the data generator. Splitting large
buffers later would flood the libpq without effectively slowing down the
producer.
Also, reduce the size of the chunks appended to the libpq from 1MB to
128KB. This makes an *immense* difference: the larger chunk probably
triggers some quadratic behaviour in the libpq. The test script found
in #255, piped into `ts -s`, shows that pushing a block of data of
about 1GB (which will fail in Postgres anyway) takes about 9s with the
smaller chunk size. With the larger size, it takes about 4m 10s to get
to waiting for PQputCopyEnd, and almost another 6 minutes to receive
the error message from the server.
00:00:47 putting 1048576 (or less) bytes in queue size 1023
00:00:47 writing copy end
00:00:47 got 1048576 bytes from queue size 1023
...
00:01:25 got 1048576 bytes from queue size 640
...
00:01:54 got 1048576 bytes from queue size 512
...
00:03:00 got 1048576 bytes from queue size 256
...
00:04:12 got 0 bytes from queue size 0
00:04:12 wait for copy end
00:09:59 Traceback (most recent call last):
...
Adding a few prints (see #255 for details) also shows that the time
spent in PQputCopyData increases: processing goes from ~15 entries/sec,
when the writer has just finished pushing data into the queue, down to
~4 entries/sec towards the end.
Considering that a 10-20% reduction of the input size causes a
decrease of about 50% in processing time, there is definitely something
quadratic going on there. It might be possible to improve the libpq,
but for the moment it's better to try to coexist nicely with the
current state of things.
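A minimal sketch of the producer side (the queue size matches the log
above; the names are illustrative, not psycopg's internals)::

    import queue
    from typing import Iterator

    MAX_BUFFER_SIZE = 128 * 1024  # 128KB keeps PQputCopyData fast

    def split_buffer(data: bytes) -> Iterator[bytes]:
        # Chunk *before* queuing, so the bounded queue applies
        # backpressure on the producer instead of letting huge buffers
        # through to flood the libpq.
        for i in range(0, len(data), MAX_BUFFER_SIZE):
            yield data[i : i + MAX_BUFFER_SIZE]

    q: "queue.Queue[bytes]" = queue.Queue(maxsize=1024)

    def writer(data: bytes) -> None:
        for chunk in split_buffer(data):
            q.put(chunk)  # blocks while the queue is full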
Daniele Varrazzo [Sun, 20 Mar 2022 00:32:25 +0000 (01:32 +0100)]
fix: fix loading of text arrays with dimension information
The dimension information is a prefix such as ``[0:2]=`` in front of
the array literal. We just discard it when loading to lists, because
Python lists are always 0-based.
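A sketch of the stripping step (the regex is an illustration, not
psycopg's actual parser)::

    import re

    DIMS_PREFIX = re.compile(rb"^(\[-?\d+:-?\d+\])+=")

    def strip_dimensions(data: bytes) -> bytes:
        # e.g. b"[0:2]={a,b,c}" -> b"{a,b,c}"; multi-dimensional
        # prefixes such as b"[1:2][3:4]=" are dropped the same way.
        m = DIMS_PREFIX.match(data)
        return data[m.end():] if m else data

    assert strip_dimensions(b"[0:2]={a,b,c}") == b"{a,b,c}"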
fix: don't raise error accessing Cursor.description after COPY_OUT
A COPY_OUT result advertises the number of columns, but not their
names (or types). Use surrogate names for the description, which is
more useful than returning `None`, because at least it tells how many
columns were emitted.
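A sketch of the new behaviour (the DSN is an assumption; the surrogate
column names are whatever the implementation chooses)::

    import psycopg

    with psycopg.connect("dbname=test") as conn:
        cur = conn.cursor()
        with cur.copy("copy (values (1, 'a'), (2, 'b')) to stdout") as copy:
            for row in copy.rows():
                pass
        # Previously this access could raise; now it reports one
        # surrogate column per emitted field.
        print(len(cur.description))  # 2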