fix(copy): chunk large buffers before queuing, not after
This is conceptually a better place to do it, because the queue serves
to apply backpressure to the data generator. Splitting large buffers
after they are dequeued would flood the libpq without actually slowing
down the producer.
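As an illustration only (a minimal sketch with hypothetical names, not
the actual psycopg internals): the producer splits an oversized buffer
before putting it on the bounded queue, so that the queue's maxsize
throttles the data generator instead of letting a single huge item
through.

    from queue import Queue

    BUFFER_SIZE = 128 * 1024   # chunk size handed to the libpq
    QUEUE_SIZE = 1024          # bounded queue applies backpressure

    queue: "Queue[bytes]" = Queue(maxsize=QUEUE_SIZE)

    def enqueue(data: bytes) -> None:
        # Split *before* queuing: a large buffer becomes many small
        # items, so queue.put() blocks the producer once the queue is
        # full, instead of handing the writer one enormous chunk.
        for i in range(0, len(data), BUFFER_SIZE):
            queue.put(data[i : i + BUFFER_SIZE])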
Also, reduce the size of the chunks appended to the libpq from 1MB to
128KB. This makes an *immense* difference: the overly large chunks
probably trigger some quadratic behaviour in the libpq. The test script
found in #255, piped into `ts -s`, shows that pushing a block of data of
about 1GB (which will fail in Postgres anyway) takes about 9s with the
smaller chunk size. With the larger size, it takes about 4m10s to get to
waiting for PQputCopyEnd, and almost another 6 minutes to receive the
error message from the server.
    00:00:47 putting 1048576 (or less) bytes in queue size 1023
    00:00:47 writing copy end
    00:00:47 got 1048576 bytes from queue size 1023
    ...
    00:01:25 got 1048576 bytes from queue size 640
    ...
    00:01:54 got 1048576 bytes from queue size 512
    ...
    00:03:00 got 1048576 bytes from queue size 256
    ...
    00:04:12 got 0 bytes from queue size 0
    00:04:12 wait for copy end
    00:09:59 Traceback (most recent call last):
    ...
Adding a few prints (see #255 for details) also shows that the time
spent in PQputCopyData keeps increasing: the writer goes from processing
~15 chunks/sec right after the producer has finished pushing data into
the queue down to ~4 chunks/sec towards the end.
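For reference, a sketch of the kind of instrumentation meant above (the
actual prints are in #255; `put_copy_data` stands for whatever thin
wrapper around PQputCopyData the writer uses, and the loop itself is
only illustrative):

    import time

    def copy_writer(pgconn, queue):
        # Drain the queue and feed the libpq, logging roughly how many
        # chunks per second PQputCopyData accepts, to watch the rate
        # degrade as the copy progresses.
        count = 0
        t0 = time.monotonic()
        while True:
            chunk = queue.get()
            if not chunk:                  # empty item signals end of data
                break
            pgconn.put_copy_data(chunk)    # wraps PQputCopyData
            count += 1
            elapsed = time.monotonic() - t0
            if elapsed >= 1.0:
                print(f"{count / elapsed:.1f} chunks/sec")
                count, t0 = 0, time.monotonic()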
Considering that reducing the input size by 10-20% cuts the processing
time by about 50%, there is definitely something quadratic going on
there. It might be possible to improve the libpq, but for the moment
it's better to try and coexist nicely with its current behaviour.