fix(copy): chunk large buffers before queuing, not after
This is conceptually a better place to do it, because the queue serves
to apply backpressure to the data generator. Splitting large buffers
after they are dequeued would flood the libpq without actually slowing
down the producer.
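As an illustration only (a minimal sketch with hypothetical names, not
the actual psycopg internals): the producer splits an oversized buffer
before putting it on the bounded queue, so that the queue's maxsize
throttles the data generator instead of letting a single huge item
through.

    from queue import Queue

    BUFFER_SIZE = 128 * 1024   # chunk size handed to the libpq
    QUEUE_SIZE = 1024          # bounded queue applies backpressure

    queue: "Queue[bytes]" = Queue(maxsize=QUEUE_SIZE)

    def enqueue(data: bytes) -> None:
        # Split *before* queuing: a large buffer becomes many small
        # items, so queue.put() blocks the producer once the queue is
        # full, instead of handing the writer one enormous chunk.
        for i in range(0, len(data), BUFFER_SIZE):
            queue.put(data[i : i + BUFFER_SIZE])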
Also, reduce the size of the chunks appended to the libpq from 1MB to
128KB. This makes an *immense* difference: the overly large chunks
probably trigger some quadratic behaviour in the libpq. The test script
found in #255, piped into `ts -s`, shows that pushing a block of data of
about 1GB (which will fail in Postgres anyway) takes about 9s with the
smaller chunk size. With the larger size, it takes about 4m10s to get to
waiting for PQputCopyEnd, and almost another 6 minutes to receive the
error message from the server.
    00:00:47 putting 1048576 (or less) bytes in queue size 1023
    00:00:47 writing copy end
    00:00:47 got 1048576 bytes from queue size 1023
    ...
    00:01:25 got 1048576 bytes from queue size 640
    ...
    00:01:54 got 1048576 bytes from queue size 512
    ...
    00:03:00 got 1048576 bytes from queue size 256
    ...
    00:04:12 got 0 bytes from queue size 0
    00:04:12 wait for copy end
    00:09:59 Traceback (most recent call last):
    ...
Adding a few prints (see #255 for details) also shows that the time
spent in PQputCopyData keeps increasing: the writer goes from processing
~15 chunks/sec right after the producer has finished pushing data into
the queue down to ~4 chunks/sec towards the end.
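For reference, a sketch of the kind of instrumentation meant above (the
actual prints are in #255; `put_copy_data` stands for whatever thin
wrapper around PQputCopyData the writer uses, and the loop itself is
only illustrative):

    import time

    def copy_writer(pgconn, queue):
        # Drain the queue and feed the libpq, logging roughly how many
        # chunks per second PQputCopyData accepts, to watch the rate
        # degrade as the copy progresses.
        count = 0
        t0 = time.monotonic()
        while True:
            chunk = queue.get()
            if not chunk:                  # empty item signals end of data
                break
            pgconn.put_copy_data(chunk)    # wraps PQputCopyData
            count += 1
            elapsed = time.monotonic() - t0
            if elapsed >= 1.0:
                print(f"{count / elapsed:.1f} chunks/sec")
                count, t0 = 0, time.monotonic()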
Considering that reducing the input size by 10-20% cuts the processing
time by about 50%, there is definitely something quadratic going on
there. It might be possible to improve the libpq, but for the moment
it's better to try and coexist nicely with its current behaviour.