test: add test to verify the wrong array oid with numbers
The array binary dumper does the right thing; the text one picks
numeric[] unconditionally. This was clearly done on purpose, but #293
shows that it's a bad idea.
perf: avoid unnecessary recvfrom() in cursor.stream()
Call PQisBusy() before PQconsumeInput() when fetching results. If the
connection is not busy, don't call PQconsumeInput() at all: go straight
to fetching results and notifications.
This is especially useful in single-row mode, because most of the time
the libpq can produce several results after a single network fetch.
Previously we called PQconsumeInput() even when results were already on
the client and there was nothing new to read, which forced the libpq to
run a select() to tell a lack of data apart from an EOF (see
`the grumble`_) and caused the overhead reported in #286.
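A minimal sketch of the resulting loop, using the psycopg.pq.PGconn
method names; wait_for_read() and handle() are illustrative callbacks,
and the real code is a generator driven by the I/O layer:

    def fetch_results(pgconn, wait_for_read, handle):
        # pgconn: a psycopg.pq.PGconn with a query already sent.
        while True:
            if pgconn.is_busy():          # PQisBusy(): a result is not ready yet
                wait_for_read(pgconn)     # block until the socket is readable
                pgconn.consume_input()    # PQconsumeInput(): there is surely data to read
                continue
            res = pgconn.get_result()     # PQgetResult(): no network read needed
            if res is None:
                break                     # no more results for this query
            handle(res)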
Daniele Varrazzo [Fri, 25 Mar 2022 15:57:28 +0000 (16:57 +0100)]
fix(copy): chunk large buffers before queuing, not after
This is conceptually a better place to do it, because the queue has the
function of applying backpressure on the data generator. Splitting large
buffers later would flood the libpq without effectively slowing down the
producer.
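A minimal sketch of the idea (the names and the queue type are
illustrative, not psycopg's internals): split oversized buffers before
putting them on the bounded queue, so that the queue's maxsize keeps
applying backpressure on the producer:

    import queue

    MAX_BUFFER_SIZE = 128 * 1024  # size of the chunks handed to the libpq

    def enqueue_chunked(q: "queue.Queue[bytes]", data: bytes) -> None:
        # Split *before* queuing: put() blocks when the queue is full,
        # which is what actually slows the producer down.
        for i in range(0, len(data), MAX_BUFFER_SIZE):
            q.put(data[i : i + MAX_BUFFER_SIZE])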
Also, reduce the size of the chunks appended to the libpq from 1 MB to
128 kB. This makes an *immense* difference: the overly large chunks
probably trigger some quadratic behaviour in the libpq. The test script
found in #255, piped into `ts -s`, shows that pushing a block of data of
about 1 GB (which will fail in Postgres anyway) takes about 9s with the
smaller chunk size. With the larger size, it takes 4.10m just to get to
waiting for PQputCopyEnd, and almost another 6 minutes to receive the
error message from the server.
00:00:47 putting 1048576 (or less) bytes in queue size 1023
00:00:47 writing copy end
00:00:47 got 1048576 bytes from queue size 1023
...
00:01:25 got 1048576 bytes from queue size 640
...
00:01:54 got 1048576 bytes from queue size 512
...
00:03:00 got 1048576 bytes from queue size 256
...
00:04:12 got 0 bytes from queue size 0
00:04:12 wait for copy end
00:09:59 Traceback (most recent call last):
...
Adding a few prints (see #255 for details) also shows that the time
spent in PQputCopyData increases: the processing rate goes from ~15
items/sec when the writer has just finished pushing data into the queue,
down to ~4 items/sec towards the end.
Considering that reducing the input size by 10-20% decreases the
processing time by about 50%, there is definitely something quadratic
going on there. It might be possible to improve the libpq, but for the
moment it's better to try and coexist nicely with its current behaviour.
Daniele Varrazzo [Sun, 20 Mar 2022 00:32:25 +0000 (01:32 +0100)]
fix: fix loading of text arrays with dimension information
The dimension information is a prefix such as ``[0:2]=`` in front of the
array literal. We just discard it when loading to lists, because Python
lists are always 0-based.
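A rough sketch of the idea (the regexp and function are illustrative,
not the actual psycopg loader):

    import re

    # One or more "[lower:upper]" bounds followed by "=",
    # e.g. "[0:2]=" or "[-2:-1][1:3]=".
    _DIMS_RE = re.compile(rb"^(\[-?\d+:-?\d+\])+=")

    def strip_dims(data: bytes) -> bytes:
        m = _DIMS_RE.match(data)
        return data[m.end():] if m else data

    strip_dims(b"[0:2]={1,2,3}")  # -> b"{1,2,3}"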
fix: don't raise error accessing Cursor.description after COPY_OUT
A COPY_OUT result advertises the number of columns, but not their names
(or types). Use a surrogate name in the description (which is more
useful than returning `None`, because at least it tells how many columns
were emitted).
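Roughly the idea (hedged: the surrogate name format is illustrative, and
a plain tuple stands in for psycopg's Column objects):

    def surrogate_description(ncolumns: int):
        # One placeholder entry per column: a made-up name, no type information.
        return [(f"column_{i + 1}", None, None, None, None, None, None)
                for i in range(ncolumns)]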
Daniele Varrazzo [Tue, 22 Feb 2022 03:02:13 +0000 (04:02 +0100)]
fix: cancel query on Ctrl-C
On KeyboardInterrupt, send a cancel request to the server and keep
waiting for the result of the cancelled query, which is expected to
raise a QueryCanceled, then re-raise KeyboardInterrupt.
Before this change, the connection was left in ACTIVE state, so it
couldn't be rolled back.
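A simplified sketch of the pattern in the sync connection's wait loop
(gen and wait() stand in for the internal generator and waiting
function; they are illustrative, not the exact internals):

    from psycopg.errors import QueryCanceled

    def wait_interruptible(conn, gen, wait):
        # conn: a sync psycopg Connection executing a query.
        try:
            return wait(gen)            # wait for the query result
        except KeyboardInterrupt:
            conn.cancel()               # ask the server to cancel the running query
            try:
                wait(gen)               # keep waiting: the cancelled query's result
            except QueryCanceled:
                pass                    # expected outcome of the cancel
            raise                       # re-raise KeyboardInterrupt to the caller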
Only fixed on sync connections. Left a failing test for async
connections; the test fails with an output from the script such as:
error ignored in rollback on <psycopg.AsyncConnection [ACTIVE] ...>:
sending query failed: another command is already in progress
Traceback (most recent call last):
File "<string>", line 27, in <module>
File "/usr/lib/python3.8/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.8/asyncio/base_events.py", line 603, in run_until_complete
self.run_forever()
File "/usr/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
self._run_once()
File "/usr/lib/python3.8/asyncio/base_events.py", line 1823, in _run_once
event_list = self._selector.select(timeout)
File "/usr/lib/python3.8/selectors.py", line 468, in select
fd_event_list = self._selector.poll(timeout, max_ev)
KeyboardInterrupt
And the except branch in `AsyncConnection.wait()` is not reached.
After getting more information about the matter and having convinced
myself that there is no need, except FOMO, to extend the copyright year
on source code, change all our entries to leave only the start year.
git ls-tree -r HEAD --name-only \
| xargs sed -i '/Copyright.*\(Varrazzo\|Psycopg\)/ s/-20..//'
Don't auto-skip pool tests if the import fails. This would hide serious
problems leading to the pool not being importable. If someone wants to
skip the pool tests they can use `-m 'not pool'` now.
Don't look up other modules' objects in __del__ methods
The modules might have already been cleaned up during interpreter
shutdown. See <https://bugs.python.org/issue46256#msg409847> for an
explanation.
The stdlib guards against the same problem too. However, it keeps a
reference bound at function definition time, which is stronger than what
we do. Doing so on our strictly typed codebase is a tedious chore, so,
if this is enough (it should be, according to the OP), I'm happy this
way.
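For reference, the stdlib pattern mentioned above looks roughly like
this (illustrative class; the trick is that the reference is captured
when the function is defined, so it keeps working even after module
cleanup at interpreter shutdown):

    import warnings

    class Closable:
        def __del__(self, _warn=warnings.warn):
            # _warn was bound when the function was defined, so it does not
            # need to look up the warnings module at shutdown time.
            _warn(f"unclosed {self!r}", ResourceWarning)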
Add ConnectionTimeout subclass of OperationalError
To be used in the connection pool to detect a timeout on connection.
Backported to Psycopg 3.0.8 to allow the pool 3.1 to work with it, at
least with diminished capacity (NullPool.connection() would time out
only for clients in the queue, not in case of a new connection timeout).
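A small usage illustration (the connection string is made up; it assumes
the new exception is raised when connect_timeout expires):

    import psycopg
    from psycopg.errors import ConnectionTimeout

    try:
        conn = psycopg.connect("host=db.example.com connect_timeout=2")
    except ConnectionTimeout:
        ...  # the server did not answer within connect_timeout
    except psycopg.OperationalError:
        ...  # any other connection failure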