From: Alex Rousskov
Date: Sat, 26 Apr 2014 17:30:33 +0000 (-0600)
Subject: Document counter-intuitive round-robin cache_dir selection bias; decrease it.
X-Git-Tag: SQUID_3_5_0_1~170^2~5
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=29a238a3a24c7776da9a6904824f86980a40c2de;p=thirdparty%2Fsquid.git

Document counter-intuitive round-robin cache_dir selection bias; decrease it.

Many squid.confs have at least two groups of cache_dir lines. For example,
rare "large" objects go to larger/slower HDDs while popular "small" objects
go to smaller/faster SSDs:

    # HDDs
    cache_dir rock /hdd1 ... min-size=large
    cache_dir rock /hdd2 ... min-size=large
    cache_dir rock /hdd3 ... min-size=large
    # SSDs
    cache_dir rock /ssd1 ... max-size=large-1
    cache_dir rock /ssd2 ... max-size=large-1
    cache_dir rock /ssd3 ... max-size=large-1
    # rock store does not support least-load yet
    store_dir_select_algorithm round-robin

Since round-robin selects the first suitable disk during a sequential scan,
the probability of /hdd1 (/ssd1) selection is higher than that of the other
HDDs (SSDs). Consider a large object that needs an HDD: /hdd1 is selected
whenever the scan starts at /ssd1, /ssd2, /ssd3, or /hdd1, while /hdd2 is
selected only when the scan starts at /hdd2 itself!

The documentation now warns against the above cache_dir configuration
approach and suggests interleaving cache_dirs instead:

    cache_dir rock /hdd1 ... min-size=large
    cache_dir rock /ssd1 ... max-size=large-1
    cache_dir rock /hdd2 ... min-size=large
    cache_dir rock /ssd2 ... max-size=large-1
    cache_dir rock /hdd3 ... min-size=large
    cache_dir rock /ssd3 ... max-size=large-1
    store_dir_select_algorithm round-robin

Squid's historic round-robin implementation made this natural bias even
worse because it used the last selected dir as the starting point of the
sequential scan.
In the first configuration example above, the historic implementation boosted
the probability that the scan would start at one of the SSDs: smaller objects
are more popular and, hence, their dirs are selected more often. With the
starting point usually at an SSD, even more _large_ objects were sent to
/hdd1 compared to the other HDDs! The code change avoids this artificial
boost (but the cache_dir lines should still be interleaved to avoid the
natural round-robin bias discussed earlier).
---

diff --git a/src/cf.data.pre b/src/cf.data.pre
index 194ee1696a..07851c60e7 100644
--- a/src/cf.data.pre
+++ b/src/cf.data.pre
@@ -3583,6 +3583,19 @@ DOC_START
 	disks. This algorithm does not spread objects by size, so any
 	I/O loading per-disk may appear very unbalanced and volatile.
 
+	If several cache_dirs use similar min-size, max-size, or other
+	limits to reject certain responses, then do not group such
+	cache_dir lines together, to avoid round-robin selection bias
+	towards the first cache_dir after the group. Instead, interleave
+	cache_dir lines from different groups. For example:
+
+		store_dir_select_algorithm round-robin
+		cache_dir rock /hdd1 ... min-size=100000
+		cache_dir rock /ssd1 ... max-size=99999
+		cache_dir rock /hdd2 ... min-size=100000
+		cache_dir rock /ssd2 ... max-size=99999
+		cache_dir rock /hdd3 ... min-size=100000
+		cache_dir rock /ssd3 ... max-size=99999
 DOC_END
 
 NAME: max_open_disk_fds

diff --git a/src/store_dir.cc b/src/store_dir.cc
index 8d124736b8..28d1c32a1f 100644
--- a/src/store_dir.cc
+++ b/src/store_dir.cc
@@ -207,22 +207,22 @@ SwapDir::objectSizeIsAcceptable(int64_t objsize) const
 static int
 storeDirSelectSwapDirRoundRobin(const StoreEntry * e)
 {
-    static int dirn = 0;
-    int i;
-    int load;
-    RefCount<SwapDir> sd;
-
     // e->objectLen() is negative at this point when we are still STORE_PENDING
     ssize_t objsize = e->mem_obj->expectedReplySize();
     if (objsize != -1)
         objsize += e->mem_obj->swap_hdr_sz;
 
-    for (i = 0; i < Config.cacheSwap.n_configured; ++i) {
-        if (++dirn >= Config.cacheSwap.n_configured)
-            dirn = 0;
+    // Increment the first candidate once per selection (not once per
+    // iteration) to reduce bias when some disk(s) attract more entries.
+    static int firstCandidate = 0;
+    if (++firstCandidate >= Config.cacheSwap.n_configured)
+        firstCandidate = 0;
 
-        sd = dynamic_cast<SwapDir *>(INDEXSD(dirn));
+    for (int i = 0; i < Config.cacheSwap.n_configured; ++i) {
+        const int dirn = (firstCandidate + i) % Config.cacheSwap.n_configured;
+        const SwapDir *sd = dynamic_cast<SwapDir *>(INDEXSD(dirn));
+        int load = 0;
         if (!sd->canStore(*e, objsize, load))
             continue;