Before this fix, Squid sometimes logged the following error:
BUG: Worker I/O pop queue for ... overflow: ...
The bug could result in truncated hit responses, a reduced hit ratio, and,
when combined with the buggy lost I/O handling code (GitHub PR #352), even
cache corruption.
The bug could be triggered by the following sequence of events:
* Disker dequeues one I/O request from the worker push queue.
* Worker pushes more I/O requests to that disker, reaching 1024 requests
in its push queue (QueueCapacity or just "N" below). No overflow here!
* Worker process is suspended (or is just too busy to pop I/O results).
* Disker satisfies all 1+N requests, adding each result to the worker pop
queue, and overflows that queue when adding the last processed request.
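The sequence above can be replayed with a minimal C++ sketch. It assumes a
simple in-process bounded FIFO (the hypothetical BoundedQueue class) in place
of Squid's shared-memory queues, and it throws std::overflow_error where Squid
logs the BUG message. None of these names come from Squid.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <stdexcept>

// Hypothetical fixed-capacity FIFO; a stand-in for Squid's shared-memory
// queues, not their actual API.
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

    bool full() const { return items_.size() >= capacity_; }
    bool empty() const { return items_.empty(); }

    void push(int item) {
        if (full())
            throw std::overflow_error("queue overflow"); // the logged BUG case
        items_.push_back(item);
    }

    int pop() {
        assert(!empty());
        const int item = items_.front();
        items_.pop_front();
        return item;
    }

private:
    std::size_t capacity_;
    std::deque<int> items_;
};

// Replays the event sequence with queue capacity n (1024 in the message).
// Returns how many results reached the pop queue before it overflowed,
// or -1 if it never overflowed.
int replayOverflowScenario(std::size_t n) {
    BoundedQueue pushQueue(n); // worker -> disker requests
    BoundedQueue popQueue(n);  // disker -> worker results
    int nextId = 0;

    pushQueue.push(nextId++);             // worker queues a request...
    const int atDisker = pushQueue.pop(); // ...and the disker dequeues it

    while (!pushQueue.full())             // worker fills its push queue with N
        pushQueue.push(nextId++);         // requests; no overflow here

    // The worker is now suspended, so nobody drains popQueue.
    int results = 0;
    try {
        popQueue.push(atDisker);          // result of the dequeued request
        ++results;
        while (!pushQueue.empty()) {      // disker satisfies the other N requests
            popQueue.push(pushQueue.pop());
            ++results;
        }
    } catch (const std::overflow_error &) {
        return results;                   // result number N+1 did not fit
    }
    return -1;
}
```

For example, replayOverflowScenario(1024) returns 1024: the first 1024
results fit, and result number 1025 overflows the pop queue.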
This fix limits worker pushes so that the total number of pending requests
never exceeds the (pop) queue capacity. This approach will continue to work
even if diskers are enhanced to dequeue multiple requests for seek
optimization and/or priority-based scheduling.
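One way to express this limit is a per-disker counter of pending requests
kept by the worker. The sketch below assumes such a counter (the hypothetical
PendingLimiter class) and replays the failing sequence with it; it
illustrates the rule and is not the actual Squid change.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-disker counter kept by a worker. A request stays pending
// from the moment the worker pushes it until the worker pops its result,
// whether it waits in the push queue, is being served by the disker, or
// waits in the pop queue.
class PendingLimiter {
public:
    explicit PendingLimiter(std::size_t popQueueCapacity) : capacity_(popQueueCapacity) {}

    // The fix: one more push must not make pending requests exceed capacity.
    bool canPush() const { return pending_ < capacity_; }

    void onPush() { assert(canPush()); ++pending_; }
    void onPop() { assert(pending_ > 0); --pending_; }
    std::size_t pending() const { return pending_; }

private:
    std::size_t capacity_;
    std::size_t pending_ = 0;
};

// Replays the failing sequence with the limit in place and returns the number
// of results the disker queues while the worker is suspended.
std::size_t replayWithLimit(std::size_t n) {
    PendingLimiter limiter(n);
    std::size_t pushQueue = 0; // requests waiting for the disker
    std::size_t atDisker = 0;  // requests dequeued by the disker

    limiter.onPush(); ++pushQueue; // worker queues a request...
    --pushQueue; ++atDisker;       // ...the disker dequeues it; still pending

    while (limiter.canPush()) {    // stops at N-1 queued requests, not N,
        limiter.onPush();          // because one request is at the disker
        ++pushQueue;
    }

    // Worker is suspended; the disker satisfies everything it was given.
    return atDisker + pushQueue;   // never more than n, so no overflow
}
```

With n = 1024, the worker stops after queuing 1023 requests, so the disker
produces exactly 1024 results, which fit into a pop queue of capacity 1024.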
The pop and push queues can still accommodate N requests each. The fix
appears to reduce the supported disker "concurrency" level from 2N down to
N pending I/O requests, reducing queue memory utilization. However, the
actual reduction is from N+1 to N: since a worker pops all its satisfied
requests before queuing a new one, there could never be more than N+1
pending requests (N in the push queue and 1 being worked on by the disker).
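The N+1 bound can be checked by brute force on a small model. The following
sketch uses hypothetical State and Outcome types rather than Squid code. It
assumes a disker serves one request at a time and a worker pops all results
before pushing, then explores every interleaving of worker and disker steps
under both the old and the fixed push rules.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <tuple>
#include <vector>

// Hypothetical model of one worker-disker pair (not Squid code).
struct State {
    std::size_t pushQ;    // requests waiting in the push queue
    std::size_t atDisker; // 0 or 1 request being served by the disker
    std::size_t popQ;     // results waiting in the pop queue
    bool operator<(const State &o) const {
        return std::tie(pushQ, atDisker, popQ) < std::tie(o.pushQ, o.atDisker, o.popQ);
    }
};

struct Outcome {
    std::size_t maxPending = 0; // largest pushQ + atDisker + popQ seen
    bool overflow = false;      // a result found the pop queue full
};

// Explores every interleaving of worker and disker steps for capacity n.
// limitPending = false: old rule, push whenever the push queue has room.
// limitPending = true:  fixed rule, push only while pending < n.
Outcome explore(std::size_t n, bool limitPending) {
    assert(n > 0);
    Outcome out;
    std::set<State> seen;
    std::vector<State> todo{State{0, 0, 0}};
    while (!todo.empty()) {
        const State s = todo.back();
        todo.pop_back();
        if (!seen.insert(s).second)
            continue;
        out.maxPending = std::max(out.maxPending, s.pushQ + s.atDisker + s.popQ);

        // Worker step: pop all satisfied requests, then optionally queue one.
        const State popped{s.pushQ, s.atDisker, 0};
        todo.push_back(popped);
        const bool mayPush = limitPending
            ? popped.pushQ + popped.atDisker < n // fixed rule
            : popped.pushQ < n;                  // old rule
        if (mayPush)
            todo.push_back(State{popped.pushQ + 1, popped.atDisker, 0});

        // Disker step: dequeue a request when idle.
        if (s.atDisker == 0 && s.pushQ > 0)
            todo.push_back(State{s.pushQ - 1, 1, s.popQ});

        // Disker step: finish the current request and queue its result.
        if (s.atDisker == 1) {
            if (s.popQ == n)
                out.overflow = true; // the pop queue overflow BUG case
            else
                todo.push_back(State{s.pushQ, 0, s.popQ + 1});
        }
    }
    return out;
}
```

For n = 4, explore(4, false) reports maxPending == 5 (N+1) and an overflow,
while explore(4, true) reports maxPending == 4 (N) and no overflow.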
We left the BUG reporting and handling intact. There are no known bugs
in that code now. If the bug never surfaces again, it can be replaced
with code that translates the low-level queue overflow exception into a
user-friendly TextException.