From 99d263d4c5b2f541dfacb5391e22e8c91ea982a6 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sat, 13 Sep 2014 11:30:10 -0700
Subject: vfs: fix bad hashing of dentries

From: Linus Torvalds <torvalds@linux-foundation.org>

commit 99d263d4c5b2f541dfacb5391e22e8c91ea982a6 upstream.

Josef Bacik found a performance regression between 3.2 and 3.10 and
narrowed it down to commit bfcfaa77bdf0 ("vfs: use 'unsigned long'
accesses for dcache name comparison and hashing"). He reports:

 "The test case is essentially

      for (i = 0; i < 1000000; i++)
              mkdir("a$i");

  On xfs on a fio card this goes at about 20k dir/sec with 3.2, and 12k
  dir/sec with 3.10. This is because we spend waaaaay more time in
  __d_lookup on 3.10 than in 3.2.

  The new hashing function for strings is suboptimal for <
  sizeof(unsigned long) string names (and hell even > sizeof(unsigned
  long) string names that I've tested). I broke out the old hashing
  function and the new one into a userspace helper to get real numbers
  and this is what I'm getting:

    Old hash table had 1000000 entries, 0 dupes, 0 max dupes
    New hash table had 12628 entries, 987372 dupes, 900 max dupes
    We had 11400 buckets with a p50 of 30 dupes, p90 of 240 dupes, p99 of 567 dupes for the new hash

  My test does the hash, and then does the d_hash into a integer pointer
  array the same size as the dentry hash table on my system, and then
  just increments the value at the address we got to see how many
  entries we overlap with.

  As you can see the old hash function ended up with all 1 million
  entries in their own bucket, whereas the new one they are only
  distributed among ~12.5k buckets, which is why we're using so much
  more CPU in __d_lookup".

The reason for this hash regression is two-fold:

 - On 64-bit architectures the down-mixing of the original 64-bit
   word-at-a-time hash into the final 32-bit hash value is very
   simplistic and suboptimal, and just adds the two 32-bit parts
   together.

   In particular, because there is no bit shuffling and the mixing
   boundary is also a byte boundary, similar character patterns in the
   low and high word easily end up just canceling each other out.

 - the old byte-at-a-time hash mixed each byte into the final hash as it
   hashed the path component name, resulting in the low bits of the hash
   generally being a good source of hash data. That is not true for the
   word-at-a-time case, and the hash data is distributed among all the
   bits.

The fix is the same in both cases: do a better job of mixing the bits up
and using as much of the hash data as possible. We already have the
"hash_32|64()" functions to do that.

Reported-by: Josef Bacik <jbacik@fb.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/dcache.c |    3 +--
 fs/namei.c  |    4 ++--
 2 files changed, 3 insertions(+), 4 deletions(-)

--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -106,8 +106,7 @@ static inline struct hlist_bl_head *d_ha
 					unsigned int hash)
 {
 	hash += (unsigned long) parent / L1_CACHE_BYTES;
-	hash = hash + (hash >> d_hash_shift);
-	return dentry_hashtable + (hash & d_hash_mask);
+	return dentry_hashtable + hash_32(hash, d_hash_shift);
 }
 
 /* Statistics gathering. */
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -34,6 +34,7 @@
 #include <linux/device_cgroup.h>
 #include <linux/fs_struct.h>
 #include <linux/posix_acl.h>
+#include <linux/hash.h>
 #include <asm/uaccess.h>
 
 #include "internal.h"
@@ -1629,8 +1630,7 @@ static inline int nested_symlink(struct
 
 static inline unsigned int fold_hash(unsigned long hash)
 {
-	hash += hash >> (8*sizeof(int));
-	return hash;
+	return hash_64(hash, 32);
 }
 
 #else /* 32-bit case */