git.ipfire.org Git - thirdparty/squid.git/commit

Do not duplicate free disk slots on diskers restart (#731)

When a disker process starts, it scans the on-disk storage to populate
shared-memory indexes of cached entries and unused/free slots. This
process may take more than ten minutes for large caches. Squid workers
use these indexes as they are being populated by diskers - workers do
not wait for the slow index rebuild process to finish. Cached entries
can be retrieved and misses can be cached almost immediately.

The disker does not "lock" the free slots to itself because the disker
does not want to preclude workers from caching new entries while the
disker is scanning the rock storage to build a complete index of old
cached entries (and free slots). The disker knows that it shares the
disk slot index with workers and is careful to populate the indexes
without confusing workers.

However, if the disker process is restarted for any reason (e.g., a
crash or kid registration timeout), the disker starts scanning its
on-disk storage from the beginning, adding to the indexes that already
contain some entries (added by the first disker incarnation and adjusted
by workers). An attempt to index the same cached object twice may remove
that object. Such a removal would be wasteful but not dangerous.
Indexing a free/unused slot twice can be disastrous:

* If Squid is lucky, the disker quickly hits an assertion (or a fatal
  exception) when trying to add the already free slot to the free slot
  collection, as long as no worker starts using the free slot between
  additions (detailed in the next bullet).

* Unfortunately, there is also a good chance that a worker starts using
  the free slot before the (restarted) disker adds it the second time.
  In this case, the "double free" event cannot be detected. Both free
  slot copies (pointing to the same disk location) will eventually be
  used by a worker to cache new objects. In the worst case, this double
  use may lead to completely wrong cached response content being served to an
  unsuspecting user. The risk is partially mitigated by the fact that
  disker crashes/restarts are rare.
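
The two outcomes above can be illustrated with a small, self-contained model. This is only a sketch: `FreeSlotsModel`, `simulateRestart()`, and `Outcome` are hypothetical names invented for the illustration, and a `false` return from `add()` stands in for the assertion or fatal exception. It is not Squid's actual free slot collection.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical model of a shared free-slot collection (not Squid's class).
using SlotId = std::int32_t;

class FreeSlotsModel {
public:
    // Returns false when the slot is already listed as free, standing in
    // for the assertion or fatal exception described in the first bullet.
    bool add(const SlotId id) {
        if (!listed_.insert(id).second)
            return false; // duplicate detected
        order_.push_back(id);
        return true;
    }

    // A worker takes some free slot; returns false when none are left.
    bool take(SlotId &id) {
        if (order_.empty())
            return false;
        id = order_.back();
        order_.pop_back();
        listed_.erase(id);
        return true;
    }

private:
    std::vector<SlotId> order_; // free slots in the order they were added
    std::set<SlotId> listed_;   // the same slots, for duplicate checks
};

struct Outcome {
    bool duplicateDetected; // did the second add() notice the double free?
    int owners;             // how many times the same disk slot was handed out
};

// Models one disker restart for a single free slot.
Outcome simulateRestart(const bool workerTakesSlotBetweenAdds) {
    FreeSlotsModel freeSlots;
    const SlotId slot = 7;
    SlotId taken = -1;
    int owners = 0;

    freeSlots.add(slot); // first disker incarnation indexes the free slot
    if (workerTakesSlotBetweenAdds && freeSlots.take(taken))
        ++owners; // a worker starts using the slot before the restart

    const bool detected = !freeSlots.add(slot); // restarted disker rescans it

    // Workers eventually consume whatever is left in the collection.
    while (freeSlots.take(taken)) {
        if (taken == slot)
            ++owners;
    }
    return Outcome{detected, owners};
}
```

In this model, the duplicate is caught only when no worker takes the slot between the two additions. Otherwise, the same disk location ends up with two owners and nothing flags it.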

Now, if a disker did not finish indexing before being restarted, it
resumes from the next db slot, thus avoiding indexing the same slot
twice. In other words, the disker forgets/ignores all the slots scanned
prior to the restart. Squid logs "Resuming indexing cache_dir..."
instead of the usual "Loading cache_dir..." to mark these (hopefully
rare) occurrences.
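
A minimal sketch of the resume logic, assuming the scan position is kept in a record that outlives the disker process (for example, in shared memory). `SharedProgress`, `runDisker()`, and `StartKind` are hypothetical names, not Squid identifiers. Advancing the position before indexing a slot is also an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record that survives disker restarts.
struct SharedProgress {
    std::int64_t nextSlot = 0;  // first db slot that has not been scanned yet
    std::int64_t slotLimit = 0; // total number of db slots in the cache_dir
    bool completed() const { return nextSlot >= slotLimit; }
};

// What the disker would report when it starts:
// Loading  ~ "Loading cache_dir..."
// Resuming ~ "Resuming indexing cache_dir..."
// Skipped  ~ indexing already finished; nothing is reindexed
enum class StartKind { Loading, Resuming, Skipped };

// Scans at most maxSlotsThisRun slots (a small limit models a crash) and
// records each scanned slot number in `indexed`.
StartKind runDisker(SharedProgress &progress,
                    const std::int64_t maxSlotsThisRun,
                    std::vector<std::int64_t> &indexed) {
    if (progress.completed())
        return StartKind::Skipped;

    const StartKind kind =
        progress.nextSlot > 0 ? StartKind::Resuming : StartKind::Loading;

    for (std::int64_t done = 0;
         done < maxSlotsThisRun && !progress.completed(); ++done) {
        // Advance the shared position first, so that a crash while indexing
        // this slot cannot cause the slot to be indexed again after restart.
        const std::int64_t slot = progress.nextSlot++;
        indexed.push_back(slot);
    }
    return kind;
}
```

Calling `runDisker()` twice on the same `SharedProgress`, with the first call cut short, visits every slot exactly once across both calls.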

Also simplified the code that delays post-indexing revalidation of cache
entries (i.e., store_dirs_rebuilding hacks). We touched that code because
the updated rock code will now refuse to reindex an already indexed
cache_dir. That decision relies on shared memory info and should not be
made where the old code was fiddling with store_dirs_rebuilding level.
After several attempts resulted in subtle bugs, we decided to simplify
that hack to reduce the risks of mismanaging store_dirs_rebuilding.

Adjusted old level-1 "Store rebuilding is ... complete" messages to
report more details (especially useful when rebuilding kid crashes). The
code now also reports some of the "unknown rebuild goal" UFS cases
better, but more work is needed in that area.

Also updated several rebuild-related counters to use int64_t instead of
int. Those changes stemmed from the need to add a new counter
(StoreRebuildData::validations), and we did not want to add an int
counter that will sooner or later overflow, especially when counting db
slots across all cache_dirs rather than just cache entries from one
cache_dir. That new counter interacted with several others, so we
had to update them as well. Long-term, all old StoreRebuildData counters
and the cache_dir code feeding them should be updated/revised.
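
The overflow concern can be sketched as follows. `totalSlots()` and `fitsInInt()` are hypothetical helpers, and the per-directory slot counts in the usage note are made-up numbers chosen only to exceed the range of a 32-bit int.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Sums per-cache_dir slot counts using a 64-bit counter.
std::int64_t totalSlots(const std::vector<std::int64_t> &slotsPerDir) {
    std::int64_t sum = 0;
    for (const std::int64_t n : slotsPerDir)
        sum += n;
    return sum;
}

// Tells whether a counter value could still be stored in a plain int.
bool fitsInInt(const std::int64_t value) {
    return value >= std::numeric_limits<int>::min() &&
           value <= std::numeric_limits<int>::max();
}
```

For example, on a platform with 32-bit int, two cache_dirs with 1500000000 slots each fit in int individually, but their total of 3000000000 does not.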

author	Eduard Bagdasaryan <eduard.bagdasaryan@measurement-factory.com>
	Fri, 9 Oct 2020 16:34:24 +0000 (16:34 +0000)
committer	Squid Anubis <squid-anubis@squid-cache.org>
	Tue, 13 Oct 2020 21:24:21 +0000 (21:24 +0000)
commit	8ecbe78dd0b13a6a23bfaa066a120ec17342813d
tree	fab96db97fc9f8870e514991703b6b59c619e336
parent	6ef748c6a2094b67c2a3114f96c660068afbb1d1

src/fs/rock/RockRebuild.cc
src/fs/rock/RockRebuild.h
src/fs/rock/RockSwapDir.cc
src/fs/rock/RockSwapDir.h
src/fs/ufs/RebuildState.cc
src/fs/ufs/UFSSwapDir.cc
src/ipc/StoreMap.h
src/ipc/mem/Pointer.h
src/ipc/mem/Segment.cc
src/ipc/mem/Segment.h
src/store/Disks.cc
src/store_rebuild.cc
src/store_rebuild.h
src/tests/stub_store_rebuild.cc