git.ipfire.org Git - thirdparty/kernel/stable-queue.git/blame - releases/4.4.36/tile-avoid-using-clocksource_cyc2ns-with-absolute-cycle-count.patch
From e658a6f14d7c0243205f035979d0ecf6c12a036f Mon Sep 17 00:00:00 2001
From: Chris Metcalf <cmetcalf@mellanox.com>
Date: Wed, 16 Nov 2016 11:18:05 -0500
Subject: tile: avoid using clocksource_cyc2ns with absolute cycle count

From: Chris Metcalf <cmetcalf@mellanox.com>

commit e658a6f14d7c0243205f035979d0ecf6c12a036f upstream.

For large values of "mult" and long uptimes, the intermediate
result of "cycles * mult" can overflow 64 bits.  For example,
the tile platform calls clocksource_cyc2ns with a 1.2 GHz clock;
we have mult = 853, and after 208.5 days, we overflow 64 bits.
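
The 208.5-day figure can be checked with a few lines of integer
arithmetic.  This is only a sketch: the helper names are hypothetical,
and the constants are the 1.2 GHz clock and mult = 853 quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Values quoted in the commit message; the macro names are illustrative. */
#define CLOCK_HZ   1200000000ULL	/* 1.2 GHz cycle counter */
#define SCHED_MULT 853ULL		/* cycles-to-ns multiplier */

/* Largest cycle count for which cycles * mult still fits in 64 bits. */
static uint64_t max_safe_cycles(uint64_t mult)
{
	assert(mult != 0);
	return UINT64_MAX / mult;
}

/* Whole days of uptime before cycles * mult no longer fits in 64 bits. */
static uint64_t overflow_days(uint64_t mult, uint64_t hz)
{
	assert(hz != 0);
	return max_safe_cycles(mult) / hz / 86400ULL;
}
```

With these inputs overflow_days(SCHED_MULT, CLOCK_HZ) evaluates to 208
whole days, consistent with the figure above.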

Since clocksource_cyc2ns() is intended to be used for relative
cycle counts, not absolute cycle counts, performance is more
important than accepting a wider range of cycle values.  So,
just use mult_frac() directly in tile's sched_clock().
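
As a rough userspace illustration of why mult_frac() avoids the
overflow, the sketch below splits the cycle count into quotient and
remainder before multiplying, in the same way the kernel macro does.
The function names are hypothetical; cyc2ns_naive() only models the
"(cycles * mult) >> shift" form for comparison.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Function form of the mult_frac() idea: compute x * numer / denom
 * without ever forming the full x * numer product.
 */
static uint64_t mult_frac_u64(uint64_t x, uint64_t numer, uint64_t denom)
{
	assert(denom != 0);
	uint64_t quot = x / denom;
	uint64_t rem  = x % denom;

	return quot * numer + (rem * numer) / denom;
}

/* Multiply-then-shift form: wraps once cycles * mult exceeds 64 bits. */
static uint64_t cyc2ns_naive(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}
```

For one second's worth of cycles (1200000000) both forms return the
same value; for a cycle count of 1ULL << 55 the naive product wraps
while mult_frac_u64() still returns 853ULL << 45.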

Commit 4cecf6d401a0 ("sched, x86: Avoid unnecessary overflow
in sched_clock") by Salman Qazi results in essentially the same
generated code for x86 as this change does for tile.  In fact,
a follow-on change by Salman introduced mult_frac() and switched
to using it, so the C code was largely identical at that point too.

Peter Zijlstra then added mul_u64_u32_shr() and switched x86
to use it.  This is, in principle, better; by optimizing the
64x64->64 multiplies to be 32x32->64 multiplies we can potentially
save some time.  However, the compiler pipelines the 64x64->64
multiplies pretty well, and the conditional branch in the generic
mul_u64_u32_shr() causes some bubbles in execution, with the
result that it's pretty much a wash.  If tilegx provided its own
implementation of mul_u64_u32_shr() without the conditional branch,
we could potentially save 3 cycles, but that seems like a small gain
for a fair amount of additional build scaffolding; no other platform
currently provides a mul_u64_u32_shr() override, and tile doesn't
currently have an <asm/div64.h> header to put the override in.
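
For reference, the split into two 32x32->64 multiplies with a
conditional branch on the high word looks roughly like the sketch
below.  It is modelled on the generic helper but is not a copy of the
kernel source; the function name and the assert are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute (a * mul) >> shift using two 32x32->64 multiplies.
 * The high-word multiply is skipped when the upper 32 bits of 'a'
 * are zero; that test is the conditional branch discussed above.
 */
static uint64_t mul_u64_u32_shr_sketch(uint64_t a, uint32_t mul,
				       unsigned int shift)
{
	assert(shift <= 32);	/* the split below only holds for shift <= 32 */
	uint32_t al = (uint32_t)a;
	uint32_t ah = (uint32_t)(a >> 32);
	uint64_t ret = ((uint64_t)al * mul) >> shift;

	if (ah)
		ret += ((uint64_t)ah * mul) << (32 - shift);

	return ret;
}
```

Called with a = 1ULL << 55, mul = 853 and shift = 10, the low word is
zero and the whole result comes from the high-word branch.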

Additionally, gcc currently has an optimization bug that prevents
it from recognizing the opportunity to use a 32x32->64 multiply,
and so the result would be no better than the existing mult_frac()
until such time as the compiler is fixed.

For now, just using mult_frac() seems like the right answer.

Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/tile/kernel/time.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/tile/kernel/time.c
+++ b/arch/tile/kernel/time.c
@@ -218,8 +218,8 @@ void do_timer_interrupt(struct pt_regs *
  */
 unsigned long long sched_clock(void)
 {
-	return clocksource_cyc2ns(get_cycles(),
-				  sched_clock_mult, SCHED_CLOCK_SHIFT);
+	return mult_frac(get_cycles(),
+			 sched_clock_mult, 1ULL << SCHED_CLOCK_SHIFT);
 }

 int setup_profiling_timer(unsigned int multiplier)