From: Vicky Risk
Date: Wed, 22 Aug 2018 23:33:33 +0000 (-0400)
Subject: Delete 03-cache-algorithm
X-Git-Tag: gitlab20_base~14
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=81c1d157da3be8c3b7aad493e7bfbe909f78f852;p=thirdparty%2Fkea.git

Delete 03-cache-algorithm
---

diff --git a/doc/design/resolver/03-cache-algorithm b/doc/design/resolver/03-cache-algorithm
deleted file mode 100644
index 261d824a84..0000000000
--- a/doc/design/resolver/03-cache-algorithm
+++ /dev/null
@@ -1,256 +0,0 @@
03-cache-algorithm

Introduction
------------
Cache performance may be important for the resolver, but it might not
be critical. We need to research this.

One key question is: given a specific cache hit rate, how much of an
impact does cache performance have?

For example, if we have a 90% cache hit rate, will we still be
spending most of our time in system calls, or in looking things up in
our cache?

There are several ways we can consider figuring this out, including
measuring this in existing resolvers (BIND 9, Unbound) or modeling
with specific values.

Once we know how critical cache performance is, we can consider which
algorithm is best for it. If it is very critical, then a custom
algorithm designed for DNS caching makes sense. If it is not, then we
can consider using an STL-based data structure.

Effectiveness of Cache
----------------------

First, I'll try to answer the introductory questions.

In a simplified model, we can express A, the fraction of the total
running time spent answering queries directly from the cache (the
rest being spent on recursive resolution due to cache misses), as
follows:

A = r*Q2 / (r*Q2 + (1-r)*Q1)

where
A: fraction of time spent answering queries from the cache per unit
   time (such as a second; 0<=A<=1)
r: cache hit rate (0<=r<=1)
Q1: max qps of the server with 100% cache hit
Q2: max qps of the server with 0% cache hit

This follows from the per-query costs: a cache hit takes 1/Q1 units
of time and a cache miss takes 1/Q2, so the time share of hits is
(r/Q1) / (r/Q1 + (1-r)/Q2), which reduces to the expression above.

Q1 can be measured easily for a given data set; measuring Q2 is
tricky in general (it requires many external queries with unreliable
results), but we can still get some not-so-unrealistic numbers
through controlled simulation.

As a data point for these values, see previous experimental results
of mine:
https://lists.isc.org/pipermail/bind10-dev/2012-July/003628.html

Looking at the "ideal" server implementation (no protocol overhead)
with setups of 90% and 85% cache hit rates with 1 recursion per cache
miss, and with the maximum possible total throughput, we can deduce
Q1 and Q2: 170591 qps and 60138 qps respectively.

This means that with a 90% cache hit rate (r = 0.9), the server would
spend 76% of its run time receiving queries and answering them
directly from the cache: 0.9*60138 / (0.9*60138 + 0.1*170591) = 0.76.

I also ran more realistic experiments: using BIND 9.9.2 and unbound
1.4.19 in "forward only" mode with crafted query data and a forwarded
server to emulate the situations of 100% and 0% cache hit rates. I
then measured the max response throughput using a queryperf-like
tool. In both cases Q2 is about 28% of Q1 (I'm not showing specific
numbers to avoid unnecessary discussion about the specific
performance of existing servers; that's out of scope for this memo).
Using Q2 = 0.28*Q1, the above equation with a 90% cache hit rate
gives: A = 0.9*0.28 / (0.9*0.28 + 0.1) = 0.716. So the server would
spend about 72% of its running time answering queries directly from
the cache.
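To make the model concrete, here is a minimal C++ sketch of the
formula (the function and variable names are mine, not from the
original experiments) that reproduces the two numbers above:

  #include <cstdio>

  // Fraction of run time spent answering from the cache, per the
  // model: a hit costs 1/q1 units of time and a miss costs 1/q2, so
  // the hit share of the total time is r*q2 / (r*q2 + (1-r)*q1).
  double cacheTimeShare(double r, double q1, double q2) {
      return (r * q2) / (r * q2 + (1.0 - r) * q1);
  }

  int main() {
      // "Ideal" server numbers from the bind10-dev post cited above.
      std::printf("measured: A = %.2f\n",
                  cacheTimeShare(0.9, 170591, 60138));
      // Q2 = 0.28*Q1, as in the BIND 9 / unbound forwarding test.
      std::printf("modeled:  A = %.3f\n",
                  cacheTimeShare(0.9, 1.0, 0.28));
      return 0;
  }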
Of course, these experimental results are too simplified. First, in
these experiments we assumed only one external query is needed on a
cache miss. In general it can be more; however, the assumption may
not actually be too optimistic either: in another research result of
mine:
http://bind10.isc.org/wiki/ResolverPerformanceResearch
a more detailed analysis using a real query sample, tracing what an
actual resolver would do, suggested we'd need about 1.44 to 1.63
external queries per cache miss on average.

Still, of course, real world cases are not that simple: in reality
we'd need to deal with timeouts, slower remote servers, unexpected
intermediate results, etc. DNSSEC validating resolvers will clearly
need to do more work.

So, in real world deployments Q2 should be much smaller than Q1.
Here are some specific cases of the relationship between Q1 and Q2
for a given A (assuming r = 0.9):

A = 70%: Q2 = 0.26 * Q1
A = 60%: Q2 = 0.17 * Q1
A = 50%: Q2 = 0.11 * Q1

So, even if "recursive resolution is 10 times heavier" than the
cache-only case, we can assume the server spends half of its run time
answering queries directly from the cache at a cache hit rate of 90%.
I think this is a reasonably safe assumption.

Now, assuming a figure of 50% or more, does this suggest we should
highly optimize the cache? Opinions may vary on this point, but I
personally think the answer is yes. I've written an experimental
cache-only implementation that employs the idea of fully rendered
cached data. On one test machine (2.20GHz AMD64, using a single
core), a queryperf-like benchmark shows it can handle over 180Kqps,
while BIND 9.9.2 can handle just 41Kqps. The experimental
implementation skips some features necessary for a production server,
and cache management itself is an inevitable bottleneck, so the
production version wouldn't be that fast, but it still suggests it
may not be very difficult to reach over 100Kqps in a production
environment, including recursive resolution overhead.

Cache Types
-----------

1. Record cache

Conceptually, any recursive resolver (with cache) implementation
would have a cache for RRs (or RRsets, in the modern version of the
protocol) given in responses to its external queries. In BIND 9,
it's called the "cached DB", using an in-memory rbt-like tree.
unbound calls it the "rrset cache", which is implemented as a hash
table.

2. Delegation cache

Recursive server implementations would also have a cache to determine
the deepest zone cut for a given query name in the recursion process.
Neither BIND 9 nor unbound has a separate cache for this purpose;
basically, they try to find an NS RRset in the "record cache" whose
owner name best matches the given query name.

3. Remote server cache

In addition, a recursive server implementation may maintain a cache
of information about remote authoritative servers. Both BIND 9 and
unbound conceptually have this type of cache, although there are some
non-negligible differences in the details. BIND 9's implementation
of this cache is called ADB. It's a hash table whose key is a domain
name, and each entry stores the corresponding IPv6/v4 addresses;
another data structure for each address stores the averaged RTT for
the address, lameness information, EDNS availability, etc. unbound's
implementation is called the "infrastructure cache". It's a hash
table keyed with IP addresses whose entries store similar information
to that in BIND 9's per-address ADB entries. In unbound a remote
server's address must be determined by looking up the record cache
(rrset cache in unbound terminology); unlike BIND 9's ADB, there's no
direct shortcut from a server's domain name to its IP addresses.
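As a rough illustration only (this is not BIND 9's or unbound's
actual data layout; all the names here are hypothetical), a remote
server cache along these lines might combine an ADB-style
name-to-address map with per-address metadata:

  #include <chrono>
  #include <string>
  #include <unordered_map>
  #include <vector>

  // Per-address metadata, similar in spirit to a BIND 9 per-address
  // ADB entry or an unbound infrastructure cache entry.
  struct ServerAddressInfo {
      std::chrono::microseconds smoothed_rtt{0};  // averaged RTT
      bool edns_supported = true;
      bool lame = false;  // known lame for some zone(s)
  };

  // ADB-style shortcut from a server's domain name to its addresses
  // (unbound instead resolves the name via the rrset cache).
  struct RemoteServerCache {
      std::unordered_map<std::string,
                         std::vector<std::string>> addresses;
      std::unordered_map<std::string,
                         ServerAddressInfo> info;  // keyed by IP
  };

  int main() {
      RemoteServerCache cache;
      cache.addresses["ns1.example.com"] = {"192.0.2.1", "2001:db8::1"};
      cache.info["192.0.2.1"].smoothed_rtt =
          std::chrono::microseconds(35000);
      return 0;
  }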
4. Full response cache

unbound has an additional cache layer, called the "message cache".
It's a hash table whose hash key is the query parameters (essentially
qname and type) and whose entry is a sequence of record (rrset) cache
entries. This sequence constructs a complete response to the
corresponding query, so it helps optimize building a response message
by skipping the record cache lookup for each section
(answer/authority/additional) of the response message. PowerDNS
Recursor has (seemingly) the same concept, called the "packet cache"
(but I don't know its implementation details very well).

BIND 9 doesn't have this type of cache; it always looks into the
record cache to build a complete response to a given query.

Miscellaneous General Requirements
----------------------------------

- Minimize contention between threads (if threaded)
- Cache purge policy: normally only a very small part of cached DNS
  information will be reused, and what is reused is reused very
  heavily. So an LRU-like algorithm should generally work well, but
  we'll also need to honor DNS TTLs.

Random Ideas for BIND 10
------------------------

Below are specific random ideas for BIND 10. Some are based on
experimental results with reasonably realistic data; some others are
mostly a guess.

1. Fully rendered response cache

Some real world query samples show that a very small portion of all
queries are very popular and are issued many times, while the rest
are rarely reused, if ever. Two different data sets show that the
top 10,000 queries would cover around 80% of total queries,
regardless of the total number of queries. This suggests the idea of
having a small, highly optimized full response cache.

I tried this idea in the jinmei-l1cache branch. It's a hash table
keyed with a tuple of query name and type whose entry stores the
fully rendered, wire-format response image (answer section only,
assuming the "minimal-responses" option). It also maintains offsets
to each RR, so it can easily update TTLs when necessary or rotate RRs
if optionally requested. If neither TTL adjustment nor RR rotation
is required, query handling is just a hash table lookup and a copy of
the pre-rendered data. An experimental benchmark showed it ran very
fast: more than 4 times faster than BIND 9, and even much faster than
other implementations that have a full response cache (although, as
usual, the comparison is not entirely fair).

Also, the cache size is quite small; the run time memory footprint of
this server process was only about 5MB. So, I think it's reasonable
for each process/thread to have its own copy of this cache to
completely eliminate contention. Also, if we can keep the cache size
this small, it would be easier to dump it to a file on shutdown and
reuse it on restart. This would be quite effective (if the downtime
is reasonably short) because the cached data are expected to be
highly popular.

2. Record cache

For the normal record cache, I don't have a particular idea beyond
something obvious, like a hash table mapping query parameters to the
corresponding RRset (or negative information). But I guess this
cache should be shared by multiple threads. That would help
reconstruct the full response cache data on TTL expiration more
efficiently. And, if shared, the data structure should be chosen so
that contention overhead can be minimized. In general, I guess
something like a hash table is more suitable than a tree-like
structure in that sense.
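One common way to get "a hash table chosen to minimize contention" is
lock striping: splitting the table into independently locked shards
so threads only contend when they hit the same shard. A minimal
sketch of that technique (the class name, shard count, and the use of
strings as stand-ins for cached RRsets are all hypothetical):

  #include <functional>
  #include <mutex>
  #include <optional>
  #include <string>
  #include <unordered_map>

  class ShardedRecordCache {
  public:
      std::optional<std::string> find(const std::string& key) {
          Shard& s = shardFor(key);
          std::lock_guard<std::mutex> guard(s.lock);
          auto it = s.map.find(key);
          if (it == s.map.end()) return std::nullopt;
          return it->second;
      }
      void insert(const std::string& key, const std::string& rrset) {
          Shard& s = shardFor(key);
          std::lock_guard<std::mutex> guard(s.lock);
          s.map[key] = rrset;
      }
  private:
      static constexpr size_t kShards = 64;
      struct Shard {
          std::mutex lock;
          std::unordered_map<std::string, std::string> map;
      };
      // Pick the shard by hashing the key; only that shard is locked.
      Shard& shardFor(const std::string& key) {
          return shards_[std::hash<std::string>{}(key) % kShards];
      }
      Shard shards_[kShards];
  };

  int main() {
      ShardedRecordCache cache;
      cache.insert("www.example.com/A", "192.0.2.1 (TTL 300)");
      auto hit = cache.find("www.example.com/A");
      return hit ? 0 : 1;
  }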
There are other points to discuss for this cache related to the other
types of cache (see below).

3. Separate delegation cache

One thing I'm guessing is that it may make sense to have a separate
cache structure for delegation data. It's conceptually a set of NS
RRs, so we can identify the best (longest) matching one for a given
query name.

Analysis of some sets of query data showed that the vast majority of
end clients' queries are for A and AAAA (not surprisingly). So, even
if we separate this cache from the record cache, the additional
overhead (both for memory and fetch) will probably (hopefully) be
marginal. Separating the caches will also help reduce contention
between threads. It *might* also help improve lookup performance
because this cache can be optimized for longest match search.

4. Remote server cache without involving the record cache

Likewise, it may make sense to maintain the remote server cache
separately from the record cache. I guess these AAAA and A records
are rarely queried by end clients, so, as in the case of the
delegation cache, it's possible that the data sets are mostly
disjoint. Also, for this purpose the RRsets don't have to have a
higher trust rank (per RFC 2181 5.4.1): glue or additional data are
okay, and, by separating these from the record cache, we can avoid
accidental promotion of these data to trustworthy answers and
returning them to clients (BIND 9 had this type of bug before).

Custom vs Existing Library (STL etc)
------------------------------------

It may have to be discussed, but I guess in many cases we'll end up
introducing custom implementations, because these caches should be
highly performance sensitive, are directly related to our core
business, and also have to be memory efficient. But in some
sub-components we may be able to benefit from existing generic
libraries.
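For the sub-components where a generic library could suffice, the
classic STL-only building block is an LRU map with TTL expiration:
an std::list keeps recency order while an std::unordered_map gives
O(1) lookup, and entries are also dropped once their DNS TTL passes.
This is a sketch of the general technique under those assumptions,
not a proposed design; all names are hypothetical:

  #include <chrono>
  #include <list>
  #include <optional>
  #include <string>
  #include <unordered_map>

  class LruTtlCache {
      using Clock = std::chrono::steady_clock;
      struct Entry {
          std::string key;
          std::string value;  // stand-in for cached DNS data
          Clock::time_point expire;  // absolute expiry from DNS TTL
      };
  public:
      explicit LruTtlCache(size_t capacity) : capacity_(capacity) {}

      void insert(const std::string& key, const std::string& value,
                  std::chrono::seconds ttl) {
          erase(key);
          lru_.push_front({key, value, Clock::now() + ttl});
          index_[key] = lru_.begin();
          if (lru_.size() > capacity_) {  // evict least recently used
              index_.erase(lru_.back().key);
              lru_.pop_back();
          }
      }

      std::optional<std::string> find(const std::string& key) {
          auto it = index_.find(key);
          if (it == index_.end()) return std::nullopt;
          if (Clock::now() >= it->second->expire) {  // honor DNS TTL
              erase(key);
              return std::nullopt;
          }
          // Move the entry to the front to mark it as recently used.
          lru_.splice(lru_.begin(), lru_, it->second);
          return it->second->value;
      }

  private:
      void erase(const std::string& key) {
          auto it = index_.find(key);
          if (it == index_.end()) return;
          lru_.erase(it->second);
          index_.erase(it);
      }
      size_t capacity_;
      std::list<Entry> lru_;  // most recently used first
      std::unordered_map<std::string,
                         std::list<Entry>::iterator> index_;
  };

  int main() {
      LruTtlCache cache(10000);
      cache.insert("www.example.com/A", "192.0.2.1",
                   std::chrono::seconds(300));
      auto v = cache.find("www.example.com/A");  // hit until TTL passes
      return v ? 0 : 1;
  }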