]>
Commit | Line | Data |
---|---|---|
9c6fd700 GKH |
1 | From e658a6f14d7c0243205f035979d0ecf6c12a036f Mon Sep 17 00:00:00 2001 |
2 | From: Chris Metcalf <cmetcalf@mellanox.com> | |
3 | Date: Wed, 16 Nov 2016 11:18:05 -0500 | |
4 | Subject: tile: avoid using clocksource_cyc2ns with absolute cycle count | |
5 | ||
6 | From: Chris Metcalf <cmetcalf@mellanox.com> | |
7 | ||
8 | commit e658a6f14d7c0243205f035979d0ecf6c12a036f upstream. | |
9 | ||
10 | For large values of "mult" and long uptimes, the intermediate | |
11 | result of "cycles * mult" can overflow 64 bits. For example, | |
12 | the tile platform calls clocksource_cyc2ns with a 1.2 GHz clock; | |
13 | we have mult = 853, and after 208.5 days, we overflow 64 bits. | |
14 | ||
15 | Since clocksource_cyc2ns() is intended to be used for relative | |
16 | cycle counts, not absolute cycle counts, performance is more | |
17 | importance than accepting a wider range of cycle values. So, | |
18 | just use mult_frac() directly in tile's sched_clock(). | |
19 | ||
20 | Commit 4cecf6d401a0 ("sched, x86: Avoid unnecessary overflow | |
21 | in sched_clock") by Salman Qazi results in essentially the same | |
22 | generated code for x86 as this change does for tile. In fact, | |
23 | a follow-on change by Salman introduced mult_frac() and switched | |
24 | to using it, so the C code was largely identical at that point too. | |
25 | ||
26 | Peter Zijlstra then added mul_u64_u32_shr() and switched x86 | |
27 | to use it. This is, in principle, better; by optimizing the | |
28 | 64x64->64 multiplies to be 32x32->64 multiplies we can potentially | |
29 | save some time. However, the compiler piplines the 64x64->64 | |
30 | multiplies pretty well, and the conditional branch in the generic | |
31 | mul_u64_u32_shr() causes some bubbles in execution, with the | |
32 | result that it's pretty much a wash. If tilegx provided its own | |
33 | implementation of mul_u64_u32_shr() without the conditional branch, | |
34 | we could potentially save 3 cycles, but that seems like small gain | |
35 | for a fair amount of additional build scaffolding; no other platform | |
36 | currently provides a mul_u64_u32_shr() override, and tile doesn't | |
37 | currently have an <asm/div64.h> header to put the override in. | |
38 | ||
39 | Additionally, gcc currently has an optimization bug that prevents | |
40 | it from recognizing the opportunity to use a 32x32->64 multiply, | |
41 | and so the result would be no better than the existing mult_frac() | |
42 | until such time as the compiler is fixed. | |
43 | ||
44 | For now, just using mult_frac() seems like the right answer. | |
45 | ||
46 | Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com> | |
47 | Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> | |
48 | ||
49 | --- | |
50 | arch/tile/kernel/time.c | 4 ++-- | |
51 | 1 file changed, 2 insertions(+), 2 deletions(-) | |
52 | ||
53 | --- a/arch/tile/kernel/time.c | |
54 | +++ b/arch/tile/kernel/time.c | |
55 | @@ -218,8 +218,8 @@ void do_timer_interrupt(struct pt_regs * | |
56 | */ | |
57 | unsigned long long sched_clock(void) | |
58 | { | |
59 | - return clocksource_cyc2ns(get_cycles(), | |
60 | - sched_clock_mult, SCHED_CLOCK_SHIFT); | |
61 | + return mult_frac(get_cycles(), | |
62 | + sched_clock_mult, 1ULL << SCHED_CLOCK_SHIFT); | |
63 | } | |
64 | ||
65 | int setup_profiling_timer(unsigned int multiplier) |