--- /dev/null
+From 99d263d4c5b2f541dfacb5391e22e8c91ea982a6 Mon Sep 17 00:00:00 2001
+From: Linus Torvalds <torvalds@linux-foundation.org>
+Date: Sat, 13 Sep 2014 11:30:10 -0700
+Subject: vfs: fix bad hashing of dentries
+
+From: Linus Torvalds <torvalds@linux-foundation.org>
+
+commit 99d263d4c5b2f541dfacb5391e22e8c91ea982a6 upstream.
+
+Josef Bacik found a performance regression between 3.2 and 3.10 and
+narrowed it down to commit bfcfaa77bdf0 ("vfs: use 'unsigned long'
+accesses for dcache name comparison and hashing"). He reports:
+
+ "The test case is essentially
+
+ for (i = 0; i < 1000000; i++)
+ mkdir("a$i");
+
+ On xfs on a fio card this goes at about 20k dir/sec with 3.2, and 12k
+ dir/sec with 3.10. This is because we spend waaaaay more time in
+ __d_lookup on 3.10 than in 3.2.
+
+ The new hashing function for strings is suboptimal for <
+ sizeof(unsigned long) string names (and hell even > sizeof(unsigned
+ long) string names that I've tested). I broke out the old hashing
+ function and the new one into a userspace helper to get real numbers
+ and this is what I'm getting:
+
+ Old hash table had 1000000 entries, 0 dupes, 0 max dupes
+ New hash table had 12628 entries, 987372 dupes, 900 max dupes
+ We had 11400 buckets with a p50 of 30 dupes, p90 of 240 dupes, p99 of 567 dupes for the new hash
+
+ My test does the hash, and then does the d_hash into an integer pointer
+ array the same size as the dentry hash table on my system, and then
+ just increments the value at the address we got to see how many
+ entries we overlap with.
+
+ As you can see, the old hash function ended up with all 1 million
+ entries in their own buckets, whereas with the new one they are only
+ distributed among ~12.5k buckets, which is why we're using so much
+ more CPU in __d_lookup".
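+
+A minimal userspace sketch of the kind of collision counter Josef
+describes above could look roughly like this (an illustration only, not
+his actual helper; the bucket count and the stand-in string hash are
+assumptions):
+
+    #include <stdio.h>
+    #include <stdlib.h>
+
+    #define NR_NAMES  1000000
+    #define HASH_BITS 20                    /* assumed dentry table size */
+    #define HASH_SIZE (1UL << HASH_BITS)
+
+    /* stand-in for whichever string hash is being measured */
+    static unsigned int hash_name(const char *name)
+    {
+        unsigned int hash = 0;
+
+        while (*name)
+            hash = hash * 31 + (unsigned char)*name++;
+        return hash;
+    }
+
+    int main(void)
+    {
+        unsigned int *buckets = calloc(HASH_SIZE, sizeof(*buckets));
+        unsigned long dupes = 0, max_dupes = 0;
+        char name[32];
+        long i;
+
+        for (i = 0; i < NR_NAMES; i++) {
+            unsigned int b;
+
+            snprintf(name, sizeof(name), "a%ld", i);
+            b = hash_name(name) & (HASH_SIZE - 1);
+            if (buckets[b]++)
+                dupes++;
+            if (buckets[b] > max_dupes)
+                max_dupes = buckets[b];
+        }
+        printf("%lu dupes, %lu max dupes\n", dupes, max_dupes);
+        free(buckets);
+        return 0;
+    }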
+
+The reason for this hash regression is two-fold:
+
+ - On 64-bit architectures the down-mixing of the original 64-bit
+ word-at-a-time hash into the final 32-bit hash value is very
+ simplistic and suboptimal, and just adds the two 32-bit parts
+ together.
+
+ In particular, because there is no bit shuffling and the mixing
+ boundary is also a byte boundary, similar character patterns in the
+ low and high word easily end up just canceling each other out.
+
+ - The old byte-at-a-time hash mixed each byte into the final hash as it
+ hashed the path component name, resulting in the low bits of the hash
+ generally being a good source of hash data. That is not true for the
+ word-at-a-time case, and the hash data is distributed among all the
+ bits.
+
+The fix is the same in both cases: do a better job of mixing the bits up
+and using as much of the hash data as possible. We already have the
+"hash_32|64()" functions to do that.
+
+Reported-by: Josef Bacik <jbacik@fb.com>
+Cc: Al Viro <viro@zeniv.linux.org.uk>
+Cc: Christoph Hellwig <hch@infradead.org>
+Cc: Chris Mason <clm@fb.com>
+Cc: linux-fsdevel@vger.kernel.org
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ fs/dcache.c | 3 +--
+ fs/namei.c | 4 ++--
+ 2 files changed, 3 insertions(+), 4 deletions(-)
+
+--- a/fs/dcache.c
++++ b/fs/dcache.c
+@@ -108,8 +108,7 @@ static inline struct hlist_bl_head *d_ha
+ unsigned int hash)
+ {
+ hash += (unsigned long) parent / L1_CACHE_BYTES;
+- hash = hash + (hash >> D_HASHBITS);
+- return dentry_hashtable + (hash & D_HASHMASK);
++ return dentry_hashtable + hash_32(hash, D_HASHBITS);
+ }
+
+ /* Statistics gathering. */
+--- a/fs/namei.c
++++ b/fs/namei.c
+@@ -34,6 +34,7 @@
+ #include <linux/device_cgroup.h>
+ #include <linux/fs_struct.h>
+ #include <linux/posix_acl.h>
++#include <linux/hash.h>
+ #include <asm/uaccess.h>
+
+ #include "internal.h"
+@@ -1647,8 +1648,7 @@ static inline int can_lookup(struct inod
+
+ static inline unsigned int fold_hash(unsigned long hash)
+ {
+- hash += hash >> (8*sizeof(int));
+- return hash;
++ return hash_64(hash, 32);
+ }
+
+ #else /* 32-bit case */