]>
Commit | Line | Data |
---|---|---|
a1cbf3d1 GKH |
1 | From 3f3295709edea6268ff1609855f498035286af73 Mon Sep 17 00:00:00 2001 |
2 | From: Peter Zijlstra <peterz@infradead.org> | |
3 | Date: Fri, 17 Nov 2017 15:28:04 -0800 | |
4 | Subject: lib/int_sqrt: optimize small argument | |
5 | ||
6 | From: Peter Zijlstra <peterz@infradead.org> | |
7 | ||
8 | commit 3f3295709edea6268ff1609855f498035286af73 upstream. | |
9 | ||
10 | The current int_sqrt() computation is sub-optimal for the case of small | |
11 | @x. Which is the interesting case when we're going to do cumulative | |
12 | distribution functions on idle times, which we assume to be a random | |
13 | variable, where the target residency of the deepest idle state gives an | |
14 | upper bound on the variable (5e6ns on recent Intel chips). | |
15 | ||
16 | In the case of small @x, the compute loop: | |
17 | ||
18 | while (m != 0) { | |
19 | b = y + m; | |
20 | y >>= 1; | |
21 | ||
22 | if (x >= b) { | |
23 | x -= b; | |
24 | y += m; | |
25 | } | |
26 | m >>= 2; | |
27 | } | |
28 | ||
29 | can be reduced to: | |
30 | ||
31 | while (m > x) | |
32 | m >>= 2; | |
33 | ||
34 | Because y==0, b==m and until x>=m y will remain 0. | |
35 | ||
36 | And while this is computationally equivalent, it runs much faster | |
37 | because there's less code, in particular less branches. | |
38 | ||
39 | cycles: branches: branch-misses: | |
40 | ||
41 | OLD: | |
42 | ||
43 | hot: 45.109444 +- 0.044117 44.333392 +- 0.002254 0.018723 +- 0.000593 | |
44 | cold: 187.737379 +- 0.156678 44.333407 +- 0.002254 6.272844 +- 0.004305 | |
45 | ||
46 | PRE: | |
47 | ||
48 | hot: 67.937492 +- 0.064124 66.999535 +- 0.000488 0.066720 +- 0.001113 | |
49 | cold: 232.004379 +- 0.332811 66.999527 +- 0.000488 6.914634 +- 0.006568 | |
50 | ||
51 | POST: | |
52 | ||
53 | hot: 43.633557 +- 0.034373 45.333132 +- 0.002277 0.023529 +- 0.000681 | |
54 | cold: 207.438411 +- 0.125840 45.333132 +- 0.002277 6.976486 +- 0.004219 | |
55 | ||
56 | Averages computed over all values <128k using a LFSR to generate order. | |
57 | Cold numbers have a LFSR based branch trace buffer 'confuser' ran between | |
58 | each int_sqrt() invocation. | |
59 | ||
60 | Link: http://lkml.kernel.org/r/20171020164644.876503355@infradead.org | |
61 | Fixes: 30493cc9dddb ("lib/int_sqrt.c: optimize square root algorithm") | |
62 | Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> | |
63 | Suggested-by: Anshul Garg <aksgarg1989@gmail.com> | |
64 | Acked-by: Linus Torvalds <torvalds@linux-foundation.org> | |
65 | Cc: Davidlohr Bueso <dave@stgolabs.net> | |
66 | Cc: Thomas Gleixner <tglx@linutronix.de> | |
67 | Cc: Ingo Molnar <mingo@kernel.org> | |
68 | Cc: Will Deacon <will.deacon@arm.com> | |
69 | Cc: Joe Perches <joe@perches.com> | |
70 | Cc: David Miller <davem@davemloft.net> | |
71 | Cc: Matthew Wilcox <mawilcox@microsoft.com> | |
72 | Cc: Kees Cook <keescook@chromium.org> | |
73 | Cc: Michael Davidson <md@google.com> | |
74 | Signed-off-by: Andrew Morton <akpm@linux-foundation.org> | |
75 | Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> | |
76 | Signed-off-by: Arnd Bergmann <arnd@arndb.de> | |
77 | Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> | |
78 | ||
79 | --- | |
80 | lib/int_sqrt.c | 3 +++ | |
81 | 1 file changed, 3 insertions(+) | |
82 | ||
83 | --- a/lib/int_sqrt.c | |
84 | +++ b/lib/int_sqrt.c | |
85 | @@ -22,6 +22,9 @@ unsigned long int_sqrt(unsigned long x) | |
86 | return x; | |
87 | ||
88 | m = 1UL << (BITS_PER_LONG - 2); | |
89 | + while (m > x) | |
90 | + m >>= 2; | |
91 | + | |
92 | while (m != 0) { | |
93 | b = y + m; | |
94 | y >>= 1; |