Daniele Varrazzo [Fri, 25 Mar 2022 15:57:28 +0000 (16:57 +0100)]
fix(copy): chunk large buffers before queuing, not after
This is conceptually a better place to do it, because the queue has the
function of applying backpressure on the data generator. Splitting large
buffers later would flood the libpq without effectively slowing down the
producer.
Also, reduce the size of the chunks appended to the libpq from 1MB to
128KB. This makes an *immense* difference: the too-large chunk probably
triggers some quadratic behaviour in the libpq. The test script found in
#255, piped into `ts -s`, shows that, with the smaller size, pushing a
block of data of about 1GB (which will fail in Postgres anyway) takes
about 9s. With the larger size, it takes about 4m10s to get to waiting
for PQputCopyEnd, and almost another 6 minutes to receive the error
message from the server.
00:00:47 putting 1048576 (or less) bytes in queue size 1023
00:00:47 writing copy end
00:00:47 got 1048576 bytes from queue size 1023
...
00:01:25 got 1048576 bytes from queue size 640
...
00:01:54 got 1048576 bytes from queue size 512
...
00:03:00 got 1048576 bytes from queue size 256
...
00:04:12 got 0 bytes from queue size 0
00:04:12 wait for copy end
00:09:59 Traceback (most recent call last):
...
Adding a few prints (see #255 for details) also shows that the time
spent in PQputCopyData increases: throughput goes from ~15 items/sec
processed when the writer has just finished pushing data into the queue
down to ~4 items/sec towards the end.
Considering that reducing the input size by 10-20% cuts the processing
time by about 50%, there is definitely something quadratic going on
there. It might be possible to improve the libpq, but for the moment
it's better to try and coexist nicely with its current behaviour.
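A rough sketch of the approach (illustrative names and sizes, not the
actual psycopg internals): splitting before the blocking `put()` is what
lets the bounded queue slow the producer down.

    # Split a buffer into bounded chunks *before* putting them on the queue,
    # so that the bounded queue can exert backpressure on the producer.
    from queue import Queue

    MAX_BUFFER_SIZE = 128 * 1024  # 128KB chunks handed to the libpq
    worker_queue: "Queue[bytes]" = Queue(maxsize=1024)

    def queue_data(data: bytes) -> None:
        # Queue.put() blocks when the queue is full, which is what slows the
        # producer down; splitting after the queue would bypass this limit.
        for i in range(0, len(data), MAX_BUFFER_SIZE):
            worker_queue.put(data[i : i + MAX_BUFFER_SIZE])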
Daniele Varrazzo [Sun, 20 Mar 2022 00:32:25 +0000 (01:32 +0100)]
fix: fix loading of text arrays with dimension information
The dimension information is a prefix such as ``[0:2]=`` in front of the
array. We just discard it when loading to lists, because Python lists
are always 0-based.
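A minimal sketch of the idea (not the actual psycopg parser): strip a
leading dimension specification before parsing the braces.

    # Drop a leading dimension prefix such as "[0:2]=" or "[1:2][3:4]="
    # before parsing the array body (illustrative, not the psycopg code).
    import re

    _dims_re = re.compile(r"^(\[-?\d+:-?\d+\])+=")

    def strip_dimensions(array_text: str) -> str:
        return _dims_re.sub("", array_text, count=1)

    strip_dimensions("[0:2]={a,b,c}")  # -> "{a,b,c}"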
fix: don't raise error accessing Cursor.description after COPY_OUT
A COPY_OUT result advertises the number of columns but not their names
(or types). Use a surrogate name in the description (which is more
useful than returning `None`, because at least it tells how many columns
were emitted).
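A hedged usage sketch of the new behaviour (connection parameters are
assumed to come from the environment; the surrogate naming scheme is not
spelled out here):

    import psycopg

    with psycopg.connect() as conn:
        with conn.cursor() as cur:
            with cur.copy("COPY (SELECT 1, 2) TO STDOUT") as copy:
                for _data in copy:  # drain the COPY_OUT data
                    pass
            # description now has one surrogate-named entry per emitted
            # column; previously accessing it here raised an error.
            assert cur.description is not None
            print(len(cur.description))  # -> 2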
Daniele Varrazzo [Tue, 22 Feb 2022 03:02:13 +0000 (04:02 +0100)]
fix: Cancel query on Ctrl-C
On KeyboardInterrupt, send a cancel request to the server and keep
waiting for the result of the canceled query, which is expected to raise
a QueryCanceled, then re-raise KeyboardInterrupt.
Before this change, the connection was left in ACTIVE state, so it
couldn't be rolled back.
Only fixed on sync connections. Left a failing test for async
connections; the test fails with an output from the script such as:
error ignored in rollback on <psycopg.AsyncConnection [ACTIVE] ...>:
sending query failed: another command is already in progress
Traceback (most recent call last):
File "<string>", line 27, in <module>
File "/usr/lib/python3.8/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.8/asyncio/base_events.py", line 603, in run_until_complete
self.run_forever()
File "/usr/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
self._run_once()
File "/usr/lib/python3.8/asyncio/base_events.py", line 1823, in _run_once
event_list = self._selector.select(timeout)
File "/usr/lib/python3.8/selectors.py", line 468, in select
fd_event_list = self._selector.poll(timeout, max_ev)
KeyboardInterrupt
And the except branch in `AsyncConnection.wait()` is not reached.
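For the sync case, a hedged sketch of the user-visible effect
(connection parameters are assumed to come from the environment):

    import psycopg

    with psycopg.connect() as conn:
        try:
            conn.execute("SELECT pg_sleep(3600)")
        except KeyboardInterrupt:
            # The query has already been canceled by psycopg and the
            # connection is no longer ACTIVE, so rollback works again.
            conn.rollback()
            print("query canceled, connection still usable")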
Daniele Varrazzo [Sat, 19 Feb 2022 17:19:31 +0000 (18:19 +0100)]
refactor(pool): update some debug logging
Drop the debug logging in `MaintenanceTask.__init__()`, which is not
particularly useful. Add debug entries on the tasks' `run()` early
bailout, which is a more interesting condition to follow (and would have
made it easier to spot #230).
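An illustrative sketch of the kind of entry added (names and structure
are assumptions, not the actual psycopg_pool code):

    import logging

    logger = logging.getLogger("pool_sketch")  # illustrative logger name

    class MaintenanceTask:
        def __init__(self, pool):
            self.pool = pool  # no debug logging on creation any more

        def run(self) -> None:
            if self.pool is None or self.pool.closed:
                # The early bailout is the interesting condition to log.
                logger.debug("task %r discarded: the pool is closed", self)
                return
            self._run(self.pool)

        def _run(self, pool) -> None:
            pass  # placeholder for the actual maintenance work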
Daniele Varrazzo [Sat, 19 Feb 2022 16:34:37 +0000 (17:34 +0100)]
fix(pool): set the open flag before starting the maintenance tasks
Failing to do so, we might trigger the check in `MaintenanceTask.run()`,
find the pool closed, and discard the operation. It usually doesn't
happen, but with certain combinations of I/O operations it does: see
https://github.com/psycopg/psycopg/issues/230 for details.
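An illustrative sketch of the ordering (not the actual psycopg_pool
code):

    class Pool:
        def __init__(self) -> None:
            self.closed = True

        def open(self) -> None:
            # Mark the pool as open *before* scheduling the tasks, otherwise
            # a task that runs immediately may see the pool as closed and
            # discard its operation.
            self.closed = False
            self._start_maintenance_tasks()

        def _start_maintenance_tasks(self) -> None:
            pass  # placeholder for scheduling the maintenance tasks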
Daniele Varrazzo [Sat, 19 Feb 2022 14:39:37 +0000 (15:39 +0100)]
test(pool): add test to show deadlock on logging
This deadlock seems reproducible at least on Python 3.8 and 3.10 on
Linux. It is caused by the logging statement in
``MaintenanceTask.__init__``: even just a ``print()`` there triggers the
deadlock.
See https://github.com/psycopg/psycopg/issues/230 for more details.
After getting more information about the matter and having convinced
myself that there is no need, except FOMO, to extend the copyright year
on source code, change all our entries to leave only the start year.
git ls-tree -r HEAD --name-only \
| xargs sed -i '/Copyright.*\(Varrazzo\|Psycopg\)/ s/-20..//'
This can help configure connections to use PgBouncer. The attribute can
be set in the pool kwargs, for instance, instead of using the more
complex configure callback.
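A hedged example of the two styles; `prepare_threshold` is an assumption
here (a typical knob to tweak for PgBouncer, where `None` disables
prepared statements) and the conninfo is just a placeholder:

    from psycopg_pool import ConnectionPool

    # Set the attribute through the pool kwargs, passed to each new connection.
    pool = ConnectionPool(
        "host=pgbouncer dbname=app",
        kwargs={"prepare_threshold": None},  # assumed attribute name
    )

    # The more verbose alternative: a configure callback.
    def configure(conn):
        conn.prepare_threshold = None

    pool2 = ConnectionPool("host=pgbouncer dbname=app", configure=configure)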
Don't auto-skip pool tests if the import fails: this would miss serious
problems leading to the pool not being importable. If someone wants to
skip the pool tests they can use `-m 'not pool'` now.
Don't look up other modules' objects in __del__ methods
The modules might have already been cleaned up during interpreter
shutdown. See <https://bugs.python.org/issue46256#msg409847> for an
explanation.
The stdlib guards against the same problem too; however, it takes a
reference in the function closure, which is a stronger guarantee than
what we do. Doing the same on our strictly typed codebase would be a
tedious chore, so, if this is enough (it should be, according to the
OP), I'm happy with this approach.
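A minimal sketch of the two patterns (illustrative, not the actual
psycopg code): a defensive lookup versus the stdlib-style
default-argument reference.

    import warnings

    class GuardedResource:
        def __del__(self):
            # The approach described above: look the module object up
            # defensively, since the module may have been torn down during
            # interpreter shutdown.
            if warnings is not None:
                warnings.warn(f"{self!r} deleted while still open")

    class ClosureResource:
        # Stdlib-style: bind the needed object as a default argument, so
        # __del__ keeps its own reference even during shutdown.
        def __del__(self, _warn=warnings.warn):
            _warn(f"{self!r} deleted while still open")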
Add ConnectionTimeout subclass of OperationalError
To be used in the connection pool to detect a timeout while connecting.
Backported to Psycopg 3.0.8 to allow pool 3.1 to work with it, at least
with diminished capacity (NullPool.connection() would time out only for
clients in the queue, not when a new connection times out).
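A hedged usage sketch: catching the new exception on a connection
attempt (host and timeout values are arbitrary):

    import psycopg
    from psycopg.errors import ConnectionTimeout

    try:
        # Unroutable address chosen only to make connect_timeout expire.
        conn = psycopg.connect("host=10.255.255.1 connect_timeout=2")
    except ConnectionTimeout as ex:
        print(f"connection timed out: {ex}")
    except psycopg.OperationalError as ex:
        print(f"other connection failure: {ex}")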