From: Eduard Bagdasaryan Date: Sat, 12 Jan 2019 10:05:30 +0000 (+0000) Subject: Fixed disker-to-worker queue overflows (#353) X-Git-Tag: M-staged-PR362~9 X-Git-Url: http://git.ipfire.org/gitweb/gitweb.cgi?a=commitdiff_plain;h=4ce035593f5257ee2deb6ca85c7b9a0c4b850f70;p=thirdparty%2Fsquid.git Fixed disker-to-worker queue overflows (#353) Before this fix, Squid sometimes logged the following error: BUG: Worker I/O pop queue for ... overflow: ... The bug could result in truncated hit responses, reduced hit ratio, and, combined with buggy lost I/O handling code (GitHub PR #352), even cache corruption. The bug could be triggered by the following sequence of events: * Disker dequeues one I/O request from the worker push queue. * Worker pushes more I/O requests to that disker, reaching 1024 requests in its push queue (QueueCapacity or just "N" below). No overflow here! * Worker process is suspended (or is just too busy to pop I/O results). * Disker satisfies all 1+N requests, adding each to the worker pop queue and overflows that queue when adding the last processed request. This fix limits worker push so that the sum of all pending requests never exceeds (pop) queue capacity. This approach will continue to work even if diskers are enhanced to dequeue multiple requests for seek optimization and/or priority-based scheduling. Pop queue and push queue can still accommodate N requests each. The fix appears to reduce supported disker "concurrency" levels from 2N down to N pending I/O requests, reducing queue memory utilization. However, the actual reduction is from N+1 to N: Since a worker pops all its satisfied requests before queuing a new one, there could never be more than N+1 pending requests (N in the push queue and 1 worked on by the disker). We left the BUG reporting and handling intact. There are no known bugs in that code now. If the bug never surfaces again, it can be replaced with code that translates low-level queue overflow exception into a user-friendly TextException. --- diff --git a/src/DiskIO/IpcIo/IpcIoFile.cc b/src/DiskIO/IpcIo/IpcIoFile.cc index 735cf0f09f..95d6d8b77f 100644 --- a/src/DiskIO/IpcIo/IpcIoFile.cc +++ b/src/DiskIO/IpcIo/IpcIoFile.cc @@ -364,6 +364,10 @@ IpcIoFile::push(IpcIoPendingRequest *const pending) debugs(47, 7, HERE << "pushing " << SipcIo(KidIdentifier, ipcIo, diskId)); + // protect DiskerHandleRequest() from pop queue overflow + if (pendingRequests() >= QueueCapacity) + throw Ipc::OneToOneUniQueue::Full(); + if (queue->push(diskId, ipcIo)) Notify(diskId); // must notify disker trackPendingRequest(ipcIo.requestId, pending); @@ -879,10 +883,8 @@ IpcIoFile::DiskerHandleRequest(const int workerId, IpcIoMsg &ipcIo) if (queue->push(workerId, ipcIo)) Notify(workerId); // must notify worker } catch (const Queue::Full &) { - // The worker queue should not overflow because the worker should pop() - // before push()ing and because if disker pops N requests at a time, - // we should make sure the worker pop() queue length is the worker - // push queue length plus N+1. XXX: implement the N+1 difference. + // The worker pop queue should not overflow because the worker can + // push only if pendingRequests() is less than QueueCapacity. debugs(47, DBG_IMPORTANT, "BUG: Worker I/O pop queue for " << DbName << " overflow: " << SipcIo(workerId, ipcIo, KidIdentifier)); // TODO: report queue len diff --git a/src/DiskIO/IpcIo/IpcIoFile.h b/src/DiskIO/IpcIo/IpcIoFile.h index f89197ed82..53a3749e3d 100644 --- a/src/DiskIO/IpcIo/IpcIoFile.h +++ b/src/DiskIO/IpcIo/IpcIoFile.h @@ -55,6 +55,8 @@ public: class IpcIoPendingRequest; +/// In a worker process, represents a single (remote) cache_dir disker file. +/// In a disker process, used as a bunch of static methods handling that file. class IpcIoFile: public DiskFile { CBDATA_CLASS(IpcIoFile); @@ -98,6 +100,10 @@ private: void push(IpcIoPendingRequest *const pending); IpcIoPendingRequest *dequeueRequest(const unsigned int requestId); + /// the total number of I/O requests in push queue and pop queue + /// (but no, the implementation does not add push and pop queue sizes) + size_t pendingRequests() const { return olderRequests->size() + newerRequests->size(); } + static void Notify(const int peerId); static void OpenTimeout(void *const param);