]> git.ipfire.org Git - thirdparty/linux.git/log
thirdparty/linux.git
7 weeks agoMerge tag 'timers-v6.9-rc1' of https://git.linaro.org/people/daniel.lezcano/linux...
Thomas Gleixner [Mon, 18 Mar 2024 10:01:13 +0000 (11:01 +0100)] 
Merge tag 'timers-v6.9-rc1' of https://git.linaro.org/people/daniel.lezcano/linux into timers/core

Pull clocksource/event driver updates from Daniel Lezcano:

  - Fix -Wunused-but-set-variable warning for the iMX GPT timer (Daniel
    Lezcano)

  - Add Pixel6 compatible string for Exynos 4210 MCT timer (Peter Griffin)

  - Fix all kernel-doc warnings and misuse of comment format (Randy
    Dunlap)

  - Document in the DT bindings the interrupt used for input capture
    interrupt and udpate the example to match the reality (Geert
    Uytterhoeven)

  - Document RZ/Five SoC DT bindings (Lad Prabhakar)

  - Add DT bindings support for the i.MX95, reorganize the driver to
    move globale variables to a timer private structure and introduce
    the i.MX95 timer support (Peng Fan)

  - Fix prescalar value to conform to the ARM global timer
    documentation. Fix data types and comparison, guard the divide by
    zero code section and use the available macros for bit manipulation
    (Martin Blumenstingl)

  - Add Ralink SoCs system tick counter (Sergio Paracuellos)

  - Add support for cadence TTC PWM (Mubin Sayyed)

  - Clear timer interrupt on timer initialization to prevent the
    interrupt to fire during setup (Ley Foon Tan)

Link: https://lore.kernel.org/r/5552010a-1ce2-46a1-a740-a69f2e9a2cf2@linaro.org
8 weeks agoclocksource/drivers/timer-riscv: Clear timer interrupt on timer initialization
Ley Foon Tan [Wed, 6 Mar 2024 17:23:30 +0000 (01:23 +0800)] 
clocksource/drivers/timer-riscv: Clear timer interrupt on timer initialization

In the RISC-V specification, the stimecmp register doesn't have a default
value. To prevent the timer interrupt from being triggered during timer
initialization, clear the timer interrupt by writing stimecmp with a
maximum value.

Fixes: 9f7a8ff6391f ("RISC-V: Prefer sstc extension if available")
Cc: <stable@vger.kernel.org>
Signed-off-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Tested-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20240306172330.255844-1-leyfoon.tan@starfivetech.com
2 months agotimer/migration: Fix quick check reporting late expiry
Frederic Weisbecker [Tue, 5 Mar 2024 00:28:22 +0000 (01:28 +0100)] 
timer/migration: Fix quick check reporting late expiry

When a CPU is the last active in the hierarchy and it tries to enter
into idle, the quick check looking up the next event towards cpuidle
heuristics may report a too late expiry, such as in the following
scenario:

                        [GRP1:0]
                     migrator = NONE
                     active   = NONE
                     nextevt  = T0:0, T0:1
                     /              \
          [GRP0:0]                  [GRP0:1]
       migrator = NONE           migrator = NONE
       active   = NONE           active   = NONE
       nextevt  = T0, T1         nextevt  = T2
       /         \                /         \
      0           1              2           3
    idle       idle           idle         idle

0) The whole system is idle, and CPU 0 was the last migrator. CPU 0 has
a timer (T0), CPU 1 has a timer (T1) and CPU 2 has a timer (T2). The
expire order is T0 < T1 < T2.

                        [GRP1:0]
                     migrator = GRP0:0
                     active   = GRP0:0
                     nextevt  = T0:0(i), T0:1
                   /              \
          [GRP0:0]                  [GRP0:1]
       migrator = CPU0           migrator = NONE
       active   = CPU0           active   = NONE
       nextevt  = T0(i), T1      nextevt  = T2
       /         \                /         \
      0           1              2           3
    active       idle           idle         idle

1) CPU 0 becomes active. The (i) means a now ignored timer.

                        [GRP1:0]
                     migrator = GRP0:0
                     active   = GRP0:0
                     nextevt  = T0:1
                     /              \
          [GRP0:0]                  [GRP0:1]
       migrator = CPU0           migrator = NONE
       active   = CPU0           active   = NONE
       nextevt  = T1             nextevt  = T2
       /         \                /         \
      0           1              2           3
    active       idle           idle         idle

2) CPU 0 handles remote. No timer actually expired but ignored timers
   have been cleaned out and their sibling's timers haven't been
   propagated. As a result the top level's next event is T2 and not T1.

3) CPU 0 tries to enter idle without any global timer enqueued and calls
   tmigr_quick_check(). The expiry of T2 is returned instead of the
   expiry of T1.

When the quick check returns an expiry that is too late, the cpuidle
governor may pick up a C-state that is too deep. This may be result into
undesired CPU wake up latency if the next timer is actually close enough.

Fix this with assuming that expiries aren't sorted top-down while
performing the quick check. Pick up instead the earliest encountered one
while walking up the hierarchy.

7ee988770326 ("timers: Implement the hierarchical pull model")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240305002822.18130-1-frederic@kernel.org
2 months agotick/sched: Fix build failure for CONFIG_NO_HZ_COMMON=n
Arnd Bergmann [Wed, 28 Feb 2024 12:38:41 +0000 (13:38 +0100)] 
tick/sched: Fix build failure for CONFIG_NO_HZ_COMMON=n

In configurations with CONFIG_TICK_ONESHOT but no CONFIG_NO_HZ or
CONFIG_HIGH_RES_TIMERS, tick_sched_timer_dying() is stubbed out,
but still defined as a global function as well:

kernel/time/tick-sched.c:1599:6: error: redefinition of 'tick_sched_timer_dying'
 1599 | void tick_sched_timer_dying(int cpu)
      |      ^
kernel/time/tick-sched.h:111:20: note: previous definition is here
  111 | static inline void tick_sched_timer_dying(int cpu) { }
      |                    ^

This configuration only appears with ARM CONFIG_ARCH_BCM_MOBILE,
which should not actually select CONFIG_TICK_ONESHOT.

Adjust the #ifdef for the stub to match the condition for building the
tick-sched.c file for consistency with the definition and to avoid
the build regression.

Fixes: 3aedb7fcd88a ("tick/sched: Remove useless oneshot ifdeffery")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240228123850.3499024-1-arnd@kernel.org
2 months agovdso/datapage: Quick fix - use asm/page-def.h for ARM64
Anna-Maria Behnsen [Mon, 26 Feb 2024 17:50:23 +0000 (18:50 +0100)] 
vdso/datapage: Quick fix - use asm/page-def.h for ARM64

The vdso rework for the generic union vdso_data_store broke compat VDSO on
arm64:

In file included from arch/arm64/include/asm/lse.h:5,
 from arch/arm64/include/asm/cmpxchg.h:14,
 from arch/arm64/include/asm/atomic.h:16,
 from include/linux/atomic.h:7,
 from include/asm-generic/bitops/atomic.h:5,
 from arch/arm64/include/asm/bitops.h:25,
 from include/linux/bitops.h:68,
 from arch/arm64/include/asm/memory.h:209,
 from arch/arm64/include/asm/page.h:46,
 from include/vdso/datapage.h:22,
 from lib/vdso/gettimeofday.c:5,
 from <command-line>:
arch/arm64/include/asm/atomic_ll_sc.h:298:9: error: unknown type name 'u128'
  298 |         u128 full;
      |         ^~~~
arch/arm64/include/asm/atomic_ll_sc.h:305:24: error: unknown type name 'u128'
  305 | static __always_inline u128
 \
      |

The reason is the include of asm/page.h which in turn includes headers
which are outside the scope of compat VDSO. The only reason for the
asm/page.h include is the required definition of PAGE_SIZE. But as arm64
defines PAGE_SIZE in asm/page-def.h without extra header includes, this
could be used instead.

Caution: this is a quick fix only! The final fix is an upcoming cleanup of
Arnd which consolidates PAGE_SIZE definition. After the cleanup, the
include of asm/page.h to access PAGE_SIZE is no longer required.

Fixes: a0d2fcd62ac2 ("vdso/ARM: Make union vdso_data_store available for all architectures")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240226175023.56679-1-anna-maria@linutronix.de
Link: https://lore.kernel.org/lkml/CA+G9fYtrXXm_KO9fNPz3XaRxHV7UD_yQp-TEuPQrNRHU+_0W_Q@mail.gmail.com/
2 months agodt-bindings: timer: Add support for cadence TTC PWM
Mubin Sayyed [Mon, 26 Feb 2024 09:33:33 +0000 (15:03 +0530)] 
dt-bindings: timer: Add support for cadence TTC PWM

Cadence TTC can act as PWM device, it will be supported through
separate PWM framework based driver. Decision to configure
specific TTC device as PWM or clocksource/clockevent would
be done based on presence of "#pwm-cells" property.

Also, interrupt property is not required for TTC PWM driver.
Update bindings to support TTC PWM configuration.

Signed-off-by: Mubin Sayyed <mubin.sayyed@amd.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20240226093333.2581092-1-mubin.sayyed@amd.com
2 months agotimers: Assert no next dyntick timer look-up while CPU is offline
Frederic Weisbecker [Sun, 25 Feb 2024 22:55:08 +0000 (23:55 +0100)] 
timers: Assert no next dyntick timer look-up while CPU is offline

The next timer (re-)evaluation, with the purpose of entering/updating
the dyntick mode, can happen from 3 sites and none of them are relevant
while the CPU is offline:

1) The idle loop:
a) From the quick check helping the cpuidle governor to heuristically
   predict the best C-state.
b) While stopping the tick.

   But if the CPU is offline, the tick has been cancelled and there is
   consequently no need to further stop the tick.

2) Remote expiry: when a CPU remotely expires global timers on behalf of
   another CPU, the latter target's next timer is re-evaluated
   afterwards. However remote expĂ®ry doesn't happen on offline CPUs.

3) IRQ exit: on nohz_full mode, the tick is (re-)evaluated on IRQ exit.
   But full dynticks is disabled on offline CPUs.

Therefore it is safe to assume that no next dyntick timer lookup can
be performed on offline CPUs.

Assert this expectation to report any surprise.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-17-frederic@kernel.org
2 months agotick: Assume timekeeping is correctly handed over upon last offline idle call
Frederic Weisbecker [Sun, 25 Feb 2024 22:55:07 +0000 (23:55 +0100)] 
tick: Assume timekeeping is correctly handed over upon last offline idle call

The timekeeping duty is handed over from the outgoing CPU on stop
machine, then the oneshot tick is stopped right after.  Therefore it's
guaranteed that the current CPU isn't the timekeeper upon its last call
to idle.

Besides, calling tick_nohz_idle_stop_tick() while the dying CPU goes
into idle suggests that the tick is going to be stopped while it is
actually stopped already from the appropriate CPU hotplug state.

Remove the confusing call and the obsolete case handling and convert it
to a sanity check that verifies the above assumption.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-16-frederic@kernel.org
2 months agotick: Shut down low-res tick from dying CPU
Frederic Weisbecker [Sun, 25 Feb 2024 22:55:06 +0000 (23:55 +0100)] 
tick: Shut down low-res tick from dying CPU

The timekeeping duty is handed over from the outgoing CPU within stop
machine. This works well if CONFIG_NO_HZ_COMMON=n or the tick is in
high-res mode. However in low-res dynticks mode, the tick isn't
cancelled until the clockevent is shut down, which can happen later. The
tick may therefore fire again once IRQs are re-enabled on stop machine
and until IRQs are disabled for good upon the last call to idle.

That's so many opportunities for a timekeeper to go idle and the
outgoing CPU to take over that duty. This is why
tick_nohz_idle_stop_tick() is called one last time on idle if the CPU
is seen offline: so that the timekeeping duty is handed over again in
case the CPU has re-taken the duty.

This means there are two timekeeping handovers on CPU down hotplug with
different undocumented constraints and purposes:

1) A handover on stop machine for !dynticks || highres. All online CPUs
   are guaranteed to be non-idle and the timekeeping duty can be safely
   handed-over. The hrtimer tick is cancelled so it is guaranteed that in
   dynticks mode the outgoing CPU won't take again the duty.

2) A handover on last idle call for dynticks && lowres.  Setting the
   duty to TICK_DO_TIMER_NONE makes sure that a CPU will take over the
   timekeeping.

Prepare for consolidating the handover to a single place (the first one)
with shutting down the low-res tick as well from
tick_cancel_sched_timer() as well. This will simplify the handover and
unify the tick cancellation between high-res and low-res.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-15-frederic@kernel.org
2 months agotick: Split nohz and highres features from nohz_mode
Frederic Weisbecker [Sun, 25 Feb 2024 22:55:05 +0000 (23:55 +0100)] 
tick: Split nohz and highres features from nohz_mode

The nohz mode field tells about low resolution nohz mode or high
resolution nohz mode but it doesn't tell about high resolution non-nohz
mode.

In order to retrieve the latter state, tick_cancel_sched_timer() must
fiddle with struct hrtimer's internals to guess if the tick has been
initialized in high resolution.

Move instead the nohz mode field information into the tick flags and
provide two new bits: one to know if the tick is in nohz mode and
another one to know if the tick is in high resolution. The combination
of those two flags provides all the needed informations to determine
which of the three tick modes is running.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-14-frederic@kernel.org
2 months agotick: Move individual bit features to debuggable mask accesses
Frederic Weisbecker [Sun, 25 Feb 2024 22:55:04 +0000 (23:55 +0100)] 
tick: Move individual bit features to debuggable mask accesses

The individual bitfields of struct tick_sched must be modified from
IRQs disabled places, otherwise local modifications can race due to them
sharing the same memory storage.

The recent move of the "got_idle_tick" bitfield to its own storage shows
that the use of these bitfields, as pretty as they look, can be as much
error prone.

In order to avoid future issues of the like and make sure that those
bitfields are safely accessed, move those flags to an explicit mask
along with a mutator function performing the basic IRQs disabled sanity
check.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-13-frederic@kernel.org
2 months agotick: Move got_idle_tick away from common flags
Frederic Weisbecker [Sun, 25 Feb 2024 22:55:03 +0000 (23:55 +0100)] 
tick: Move got_idle_tick away from common flags

tick_nohz_idle_got_tick() is called by cpuidle_reflect() within the idle
loop with interrupts enabled. This function modifies the struct
tick_sched's bitfield "got_idle_tick". However this bitfield is stored
within the same mask as other bitfields that can be modified from
interrupts.

Fortunately so far it looks like the only race that can happen is while
writing ->got_idle_tick to 0, an interrupt fires and writes the
->idle_active field to 0. It's then possible that the interrupted write
to ->got_idle_tick writes back the old value of ->idle_active back to 1.

However if that happens, the worst possible outcome is that the time
spent between that interrupt and the upcoming call to
tick_nohz_idle_exit() is accounted as idle, which is negligible quantity.

Still all the bitfield writes within this struct tick_sched's shadow
mask should be IRQ-safe. Therefore move this bitfield out to its own
storage to avoid further suprises.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-12-frederic@kernel.org
2 months agotick: Assume the tick can't be stopped in NOHZ_MODE_INACTIVE mode
Frederic Weisbecker [Sun, 25 Feb 2024 22:55:02 +0000 (23:55 +0100)] 
tick: Assume the tick can't be stopped in NOHZ_MODE_INACTIVE mode

The full-nohz update function checks if the nohz mode is active before
proceeding. It considers one exception though: if the tick is already
stopped even though the nohz mode is inactive, it still moves on in
order to update/restart the tick if needed.

However in order for the tick to be stopped, the nohz_mode has to be
either NOHZ_MODE_LOWRES or NOHZ_MODE_HIGHRES. Therefore it doesn't make
sense to test if the tick is stopped before verifying NOHZ_MODE_INACTIVE
mode.

Remove the needless related condition.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-11-frederic@kernel.org
2 months agotick: Move broadcast cancellation up to CPUHP_AP_TICK_DYING
Frederic Weisbecker [Sun, 25 Feb 2024 22:55:01 +0000 (23:55 +0100)] 
tick: Move broadcast cancellation up to CPUHP_AP_TICK_DYING

The broadcast shutdown code is executed through a random explicit call
within stop machine from the outgoing CPU.

However the tick broadcast is a midware between the tick callback and
the clocksource, therefore it makes more sense to shut it down after the
tick callback and before the clocksource drivers.

Move it instead to the common tick shutdown CPU hotplug state where
related operations can be ordered from highest to lowest level.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-10-frederic@kernel.org
2 months agotick: Move tick cancellation up to CPUHP_AP_TICK_DYING
Frederic Weisbecker [Sun, 25 Feb 2024 22:55:00 +0000 (23:55 +0100)] 
tick: Move tick cancellation up to CPUHP_AP_TICK_DYING

The tick hrtimer is cancelled right before hrtimers are migrated. This
is done from the hrtimer subsystem even though it shouldn't know about
its actual users.

Move instead the tick hrtimer cancellation to the relevant CPU hotplug
state that aims at centralizing high level tick shutdown operations so
that the related flow is easy to follow.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-9-frederic@kernel.org
2 months agotick: Start centralizing tick related CPU hotplug operations
Frederic Weisbecker [Sun, 25 Feb 2024 22:54:59 +0000 (23:54 +0100)] 
tick: Start centralizing tick related CPU hotplug operations

During the CPU offlining process, the various timer tick features are
shut down from scattered places, sometimes from teardown callbacks on
stop machine, sometimes through explicit calls, sometimes from the
control CPU after the CPU died. The reason why these shutdown operations
are spread around is not always clear and it makes the tick lifecycle
hard to follow.

The tick should be shut down in order from highest to lowest level:

On stop machine from the dying CPU (high-level):

 1) Hand-over the timekeeping duty (tick_handover_do_timer())
 2) Cancel the tick implementation called by the clockevent callback
    (tick_cancel_sched_timer())
 3) Shutdown broadcasting (tick_offline_cpu() / tick_broadcast_offline())

On stop machine from the dying CPU (low-level):

 4) Shutdown clockevents drivers (CPUHP_AP_*_TIMER_STARTING states)

From the control CPU after the CPU died (low-level):

 5) Shutdown/unregister/cleanup clockevents for the dead CPU
    (tick_cleanup_dead_cpu())

Instead the current order is 2, 4 (both from CPU hotplug states), then
1 and 3 through direct calls. This layout and order don't make much
sense. The operations 1, 2, 3 should be gathered together and in order.

Sort this situation with creating a new TICK shut-down CPU hotplug state
and start with introducing the timekeeping duty hand-over there. The
state must precede hrtimers migration because the tick hrtimer will be
stopped from it in a further patch.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-8-frederic@kernel.org
2 months agotick/sched: Don't clear ts::next_tick again in can_stop_idle_tick()
Frederic Weisbecker [Sun, 25 Feb 2024 22:54:58 +0000 (23:54 +0100)] 
tick/sched: Don't clear ts::next_tick again in can_stop_idle_tick()

The tick sched structure is already cleared from tick_cancel_sched_timer(),
so there is no need to clear that field again.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-7-frederic@kernel.org
2 months agotick/sched: Rename tick_nohz_stop_sched_tick() to tick_nohz_full_stop_tick()
Frederic Weisbecker [Sun, 25 Feb 2024 22:54:57 +0000 (23:54 +0100)] 
tick/sched: Rename tick_nohz_stop_sched_tick() to tick_nohz_full_stop_tick()

tick_nohz_stop_sched_tick() is only about NOHZ_full and not about
dynticks-idle. Reflect that in the function name to avoid confusion.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-6-frederic@kernel.org
2 months agotick: Use IS_ENABLED() whenever possible
Frederic Weisbecker [Sun, 25 Feb 2024 22:54:56 +0000 (23:54 +0100)] 
tick: Use IS_ENABLED() whenever possible

Avoid ifdeferry if it can be converted to IS_ENABLED() whenever possible

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-5-frederic@kernel.org
2 months agotick/sched: Remove useless oneshot ifdeffery
Frederic Weisbecker [Sun, 25 Feb 2024 22:54:55 +0000 (23:54 +0100)] 
tick/sched: Remove useless oneshot ifdeffery

tick-sched.c is only built when CONFIG_TICK_ONESHOT=y, which is selected
only if CONFIG_NO_HZ_COMMON=y or CONFIG_HIGH_RES_TIMERS=y. Therefore
the related ifdeferry in this file is needless and can be removed.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-4-frederic@kernel.org
2 months agotick/nohz: Remove duplicate between lowres and highres handlers
Peng Liu [Sun, 25 Feb 2024 22:54:54 +0000 (23:54 +0100)] 
tick/nohz: Remove duplicate between lowres and highres handlers

tick_nohz_lowres_handler() does the same work as
tick_nohz_highres_handler() plus the clockevent device reprogramming, so
make the former reuse the latter and rename it accordingly.

Signed-off-by: Peng Liu <liupeng17@lenovo.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-3-frederic@kernel.org
2 months agotick/nohz: Remove duplicate between tick_nohz_switch_to_nohz() and tick_setup_sched_t...
Peng Liu [Sun, 25 Feb 2024 22:54:53 +0000 (23:54 +0100)] 
tick/nohz: Remove duplicate between tick_nohz_switch_to_nohz() and tick_setup_sched_timer()

The ts->sched_timer initialization work of tick_nohz_switch_to_nohz()
is almost the same as that of tick_setup_sched_timer(), so adjust the
latter to get it reused by tick_nohz_switch_to_nohz().

This also makes the low resolution mode sched_timer benefit from the tick
skew boot option.

Signed-off-by: Peng Liu <liupeng17@lenovo.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-2-frederic@kernel.org
2 months agoclocksource/drivers/arm_global_timer: Simplify prescaler register access
Martin Blumenstingl [Sun, 25 Feb 2024 15:13:36 +0000 (16:13 +0100)] 
clocksource/drivers/arm_global_timer: Simplify prescaler register access

Use GENMASK() to define the prescaler mask and make the whole driver use
the mask (together with helpers such as FIELD_{GET,PREP,FIT}) instead of
needing an additional shift and max value constant.

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20240225151336.2728533-4-martin.blumenstingl@googlemail.com
2 months agoclocksource/drivers/arm_global_timer: Guard against division by zero
Martin Blumenstingl [Sun, 25 Feb 2024 15:13:35 +0000 (16:13 +0100)] 
clocksource/drivers/arm_global_timer: Guard against division by zero

The result of the division of new_rate by gt_target_rate can be zero (if
new_rate is smaller than gt_target_rate). Using that result as divisor
without checking can result in a division by zero error. Guard against
this by checking for a zero value earlier.
While here, also change the psv variable to an unsigned long to make
sure we don't overflow the datatype as all other types involved are also
unsiged long.

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20240225151336.2728533-3-martin.blumenstingl@googlemail.com
2 months agoclocksource/drivers/arm_global_timer: Make gt_target_rate unsigned long
Martin Blumenstingl [Sun, 25 Feb 2024 15:13:34 +0000 (16:13 +0100)] 
clocksource/drivers/arm_global_timer: Make gt_target_rate unsigned long

Change the data type of gt_target_rate to unsigned long as this is what
we get back from clk_get_rate().

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20240225151336.2728533-2-martin.blumenstingl@googlemail.com
2 months agodt-bindings: timer: add Ralink SoCs system tick counter
Sergio Paracuellos [Tue, 12 Dec 2023 09:34:43 +0000 (10:34 +0100)] 
dt-bindings: timer: add Ralink SoCs system tick counter

Add YAML doc for the system tick counter which is present on Ralink SoCs.

cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20231212093443.1898591-1-sergio.paracuellos@gmail.com
2 months agohrtimer: Select housekeeping CPU during migration
Costa Shulyupin [Thu, 22 Feb 2024 20:08:56 +0000 (22:08 +0200)] 
hrtimer: Select housekeeping CPU during migration

During CPU-down hotplug, hrtimers may migrate to isolated CPUs,
compromising CPU isolation.

Address this issue by masking valid CPUs for hrtimers using
housekeeping_cpumask(HK_TYPE_TIMER).

Suggested-by: Waiman Long <longman@redhat.com>
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Waiman Long <longman@redhat.com>
Link: https://lore.kernel.org/r/20240222200856.569036-1-costa.shul@redhat.com
2 months agotimers: Always queue timers on the local CPU
Anna-Maria Behnsen [Wed, 21 Feb 2024 09:05:48 +0000 (10:05 +0100)] 
timers: Always queue timers on the local CPU

The timer pull model is in place so we can remove the heuristics which try
to guess the best target CPU at enqueue/modification time.

All non pinned timers are queued on the local CPU in the separate storage
and eventually pulled at expiry time to a remote CPU.

Originally-by: Richard Cochran (linutronix GmbH) <richardcochran@gmail.com>
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-21-anna-maria@linutronix.de
2 months agotimer_migration: Add tracepoints
Anna-Maria Behnsen [Thu, 22 Feb 2024 10:34:03 +0000 (11:34 +0100)] 
timer_migration: Add tracepoints

The timer pull logic needs proper debugging aids. Add tracepoints so the
hierarchical idle machinery can be diagnosed.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240222103403.31923-1-anna-maria@linutronix.de
2 months agotimers: Implement the hierarchical pull model
Anna-Maria Behnsen [Thu, 22 Feb 2024 10:37:10 +0000 (11:37 +0100)] 
timers: Implement the hierarchical pull model

Placing timers at enqueue time on a target CPU based on dubious heuristics
does not make any sense:

 1) Most timer wheel timers are canceled or rearmed before they expire.

 2) The heuristics to predict which CPU will be busy when the timer expires
    are wrong by definition.

So placing the timers at enqueue wastes precious cycles.

The proper solution to this problem is to always queue the timers on the
local CPU and allow the non pinned timers to be pulled onto a busy CPU at
expiry time.

Therefore split the timer storage into local pinned and global timers:
Local pinned timers are always expired on the CPU on which they have been
queued. Global timers can be expired on any CPU.

As long as a CPU is busy it expires both local and global timers. When a
CPU goes idle it arms for the first expiring local timer. If the first
expiring pinned (local) timer is before the first expiring movable timer,
then no action is required because the CPU will wake up before the first
movable timer expires. If the first expiring movable timer is before the
first expiring pinned (local) timer, then this timer is queued into an idle
timerqueue and eventually expired by another active CPU.

To avoid global locking the timerqueues are implemented as a hierarchy. The
lowest level of the hierarchy holds the CPUs. The CPUs are associated to
groups of 8, which are separated per node. If more than one CPU group
exist, then a second level in the hierarchy collects the groups. Depending
on the size of the system more than 2 levels are required. Each group has a
"migrator" which checks the timerqueue during the tick for remote expirable
timers.

If the last CPU in a group goes idle it reports the first expiring event in
the group up to the next group(s) in the hierarchy. If the last CPU goes
idle it arms its timer for the first system wide expiring timer to ensure
that no timer event is missed.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240222103710.32582-1-anna-maria@linutronix.de
2 months agotimers: Introduce function to check timer base is_idle flag
Anna-Maria Behnsen [Wed, 21 Feb 2024 09:05:45 +0000 (10:05 +0100)] 
timers: Introduce function to check timer base is_idle flag

To prepare for the conversion of the NOHZ timer placement to a pull at
expiry time model it's required to have a function that returns the value
of the is_idle flag of the timer base to keep the hierarchy states during
online in sync with timer base state.

No functional change.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-18-anna-maria@linutronix.de
2 months agotick/sched: Split out jiffies update helper function
Richard Cochran (linutronix GmbH) [Wed, 21 Feb 2024 09:05:44 +0000 (10:05 +0100)] 
tick/sched: Split out jiffies update helper function

The logic to get the time of the last jiffies update will be needed by
the timer pull model as well.

Move the code into a global function in anticipation of the new caller.

No functional change.

Signed-off-by: Richard Cochran (linutronix GmbH) <richardcochran@gmail.com>
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-17-anna-maria@linutronix.de
2 months agotimers: Check if timers base is handled already
Anna-Maria Behnsen [Wed, 21 Feb 2024 09:05:43 +0000 (10:05 +0100)] 
timers: Check if timers base is handled already

Due to the conversion of the NOHZ timer placement to a pull at expiry
time model, the per CPU timer bases with non pinned timers are no
longer handled only by the local CPU. In case a remote CPU already
expires the non pinned timers base of the local CPU, nothing more
needs to be done by the local CPU. A check at the begin of the expire
timers routine is required, because timer base lock is dropped before
executing the timer callback function.

This is a preparatory work, but has no functional impact right now.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-16-anna-maria@linutronix.de
2 months agotimers: Restructure internal locking
Richard Cochran (linutronix GmbH) [Wed, 21 Feb 2024 09:05:42 +0000 (10:05 +0100)] 
timers: Restructure internal locking

Move the locking out from __run_timers() to the call sites, so the
protected section can be extended at the call site. Preparatory work for
changing the NOHZ timer placement to a pull at expiry time model.

No functional change.

Signed-off-by: Richard Cochran (linutronix GmbH) <richardcochran@gmail.com>
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-15-anna-maria@linutronix.de
2 months agotimers: Add get next timer interrupt functionality for remote CPUs
Anna-Maria Behnsen [Wed, 21 Feb 2024 09:05:41 +0000 (10:05 +0100)] 
timers: Add get next timer interrupt functionality for remote CPUs

To prepare for the conversion of the NOHZ timer placement to a pull at
expiry time model it's required to have functionality available getting the
next timer interrupt on a remote CPU.

Locking of the timer bases and getting the information for the next timer
interrupt functionality is split into separate functions. This is required
to be compliant with lock ordering when the new model is in place.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-14-anna-maria@linutronix.de
2 months agotimers: Split out "get next timer interrupt" functionality
Anna-Maria Behnsen [Wed, 21 Feb 2024 09:05:40 +0000 (10:05 +0100)] 
timers: Split out "get next timer interrupt" functionality

The functionality for getting the next timer interrupt in
get_next_timer_interrupt() is split into a separate function
fetch_next_timer_interrupt() to be usable by other call sites.

This is preparatory work for the conversion of the NOHZ timer
placement to a pull at expiry time model. No functional change.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-13-anna-maria@linutronix.de
2 months agotimers: Retrieve next expiry of pinned/non-pinned timers separately
Anna-Maria Behnsen [Wed, 21 Feb 2024 09:05:39 +0000 (10:05 +0100)] 
timers: Retrieve next expiry of pinned/non-pinned timers separately

For the conversion of the NOHZ timer placement to a pull at expiry time
model it's required to have separate expiry times for the pinned and the
non-pinned (movable) timers. Therefore struct timer_events is introduced.

No functional change

Originally-by: Richard Cochran (linutronix GmbH) <richardcochran@gmail.com>
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-12-anna-maria@linutronix.de
2 months agotimers: Keep the pinned timers separate from the others
Anna-Maria Behnsen [Wed, 21 Feb 2024 09:05:38 +0000 (10:05 +0100)] 
timers: Keep the pinned timers separate from the others

Separate the storage space for pinned timers. Deferrable timers (doesn't
matter if pinned or non pinned) are still enqueued into their own base.

This is preparatory work for changing the NOHZ timer placement from a push
at enqueue time to a pull at expiry time model.

Originally-by: Richard Cochran (linutronix GmbH) <richardcochran@gmail.com>
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-11-anna-maria@linutronix.de
2 months agotimers: Split next timer interrupt logic
Anna-Maria Behnsen [Wed, 21 Feb 2024 09:05:37 +0000 (10:05 +0100)] 
timers: Split next timer interrupt logic

Split the logic for getting next timer interrupt (no matter of recalculated
or already stored in base->next_expiry) into a separate function named
next_timer_interrupt(). Make it available to local call sites only.

No functional change.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-10-anna-maria@linutronix.de
2 months agotimers: Simplify code in run_local_timers()
Anna-Maria Behnsen [Wed, 21 Feb 2024 09:05:36 +0000 (10:05 +0100)] 
timers: Simplify code in run_local_timers()

The logic for raising a softirq the way it is implemented right now, is
readable for two timer bases. When increasing the number of timer bases,
code gets harder to read. With the introduction of the timer migration
hierarchy, there will be three timer bases.

Therefore restructure the code to use a loop. No functional change.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-9-anna-maria@linutronix.de
2 months agotimers: Make sure TIMER_PINNED flag is set in add_timer_on()
Anna-Maria Behnsen [Wed, 21 Feb 2024 09:05:35 +0000 (10:05 +0100)] 
timers: Make sure TIMER_PINNED flag is set in add_timer_on()

When adding a timer to the timer wheel using add_timer_on(), it is an
implicitly pinned timer. With the timer pull at expiry time model in place,
the TIMER_PINNED flag is required to make sure timers end up in proper
base.

Set the TIMER_PINNED flag unconditionally when add_timer_on() is executed.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-8-anna-maria@linutronix.de
2 months agoworkqueue: Use global variant for add_timer()
Anna-Maria Behnsen [Wed, 21 Feb 2024 09:05:34 +0000 (10:05 +0100)] 
workqueue: Use global variant for add_timer()

The implementation of the NOHZ pull at expiry model will change the timer
bases per CPU. Timers, that have to expire on a specific CPU, require the
TIMER_PINNED flag. If the CPU doesn't matter, the TIMER_PINNED flag must be
dropped. This is required for call sites which use the timer alternately as
pinned and not pinned timer like workqueues do.

Therefore use add_timer_global() in __queue_delayed_work() for non-bound
delayed work to make sure the TIMER_PINNED flag is dropped.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-7-anna-maria@linutronix.de
2 months agotimers: Introduce add_timer() variants which modify timer flags
Anna-Maria Behnsen [Wed, 21 Feb 2024 09:05:33 +0000 (10:05 +0100)] 
timers: Introduce add_timer() variants which modify timer flags

A timer might be used as a pinned timer (using add_timer_on()) and later on
as non-pinned timer using add_timer(). When the "NOHZ timer pull at expiry
model" is in place, the TIMER_PINNED flag is required to be used whenever a
timer needs to expire on a dedicated CPU. Otherwise the flag must not be
set if expiration on a dedicated CPU is not required.

add_timer_on()'s behavior will be changed during the preparation patches
for the "NOHZ timer pull at expiry model" to unconditionally set the
TIMER_PINNED flag. To be able to clear/ set the flag when queueing a
timer, two variants of add_timer() are introduced.

This is a preparatory step and has no functional change.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-6-anna-maria@linutronix.de
2 months agotimers: Optimization for timer_base_try_to_set_idle()
Anna-Maria Behnsen [Wed, 21 Feb 2024 09:05:32 +0000 (10:05 +0100)] 
timers: Optimization for timer_base_try_to_set_idle()

When tick is stopped also the timer base is_idle flag is set. When
reentering timer_base_try_to_set_idle() with the tick stopped, there is no
need to check whether the timer base needs to be set idle again. When a
timer was enqueued in the meantime, this is already handled by the
tick_nohz_next_event() call which was executed before
tick_nohz_stop_tick().

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-5-anna-maria@linutronix.de
2 months agotimers: Move marking timer bases idle into tick_nohz_stop_tick()
Anna-Maria Behnsen [Wed, 21 Feb 2024 09:05:31 +0000 (10:05 +0100)] 
timers: Move marking timer bases idle into tick_nohz_stop_tick()

The timer base is marked idle when get_next_timer_interrupt() is
executed. But the decision whether the tick will be stopped and whether the
system is able to go idle is done later. When the timer bases is marked
idle and a new first timer is enqueued remote an IPI is raised. Even if it
is not required because the tick is not stopped and the timer base is
evaluated again at the next tick.

To prevent this, the timer base is marked idle in tick_nohz_stop_tick() and
get_next_timer_interrupt() is streamlined by only looking for the next timer
interrupt. All other work is postponed to timer_base_try_to_set_idle() which is
called by tick_nohz_stop_tick(). timer_base_try_to_set_idle() never resets
timer_base::is_idle state. This is done when the tick is restarted via
tick_nohz_restart_sched_tick().

With this, tick_sched::tick_stopped and timer_base::is_idle are always in
sync. So there is no longer the need to execute timer_clear_idle() in
tick_nohz_idle_retain_tick(). This was required before, as
tick_nohz_next_event() set timer_base::is_idle even if the tick would not be
stopped. So timer_clear_idle() is only executed, when timer base is idle. So the
check whether timer base is idle, is now no longer required as well.

While at it fix some nearby whitespace damage as well.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-4-anna-maria@linutronix.de
2 months agotimers: Split out get next timer interrupt
Anna-Maria Behnsen [Wed, 21 Feb 2024 09:05:30 +0000 (10:05 +0100)] 
timers: Split out get next timer interrupt

Split out get_next_timer_interrupt() to be able to extend it and make it
reusable for other call sites.

No functional change.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-3-anna-maria@linutronix.de
2 months agotimers: Restructure get_next_timer_interrupt()
Anna-Maria Behnsen [Wed, 21 Feb 2024 09:05:29 +0000 (10:05 +0100)] 
timers: Restructure get_next_timer_interrupt()

get_next_timer_interrupt() contains two parts for the next timer interrupt
calculation. Those two parts are separated by forwarding the base
clock. But the second part does not depend on the forwarded base
clock.

Therefore restructure get_next_timer_interrupt() to keep things together
which belong together.

No functional change.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-2-anna-maria@linutronix.de
2 months agoclocksource: Scale the watchdog read retries automatically
Feng Tang [Wed, 21 Feb 2024 06:08:59 +0000 (14:08 +0800)] 
clocksource: Scale the watchdog read retries automatically

On a 8-socket server the TSC is wrongly marked as 'unstable' and disabled
during boot time on about one out of 120 boot attempts:

    clocksource: timekeeping watchdog on CPU227: wd-tsc-wd excessive read-back delay of 153560ns vs. limit of 125000ns,
    wd-wd read-back delay only 11440ns, attempt 3, marking tsc unstable
    tsc: Marking TSC unstable due to clocksource watchdog
    TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
    sched_clock: Marking unstable (119294969739159204297)<-(125446229205, -5992055152)
    clocksource: Checking clocksource tsc synchronization from CPU 319 to CPUs 0,99,136,180,210,542,601,896.
    clocksource: Switched to clocksource hpet

The reason is that for platform with a large number of CPUs, there are
sporadic big or huge read latencies while reading the watchog/clocksource
during boot or when system is under stress work load, and the frequency and
maximum value of the latency goes up with the number of online CPUs.

The cCurrent code already has logic to detect and filter such high latency
case by reading the watchdog twice and checking the two deltas. Due to the
randomness of the latency, there is a low probabilty that the first delta
(latency) is big, but the second delta is small and looks valid. The
watchdog code retries the readouts by default twice, which is not
necessarily sufficient for systems with a large number of CPUs.

There is a command line parameter 'max_cswd_read_retries' which allows to
increase the number of retries, but that's not user friendly as it needs to
be tweaked per system. As the number of required retries is proportional to
the number of online CPUs, this parameter can be calculated at runtime.

Scale and enlarge the number of retries according to the number of online
CPUs and remove the command line parameter completely.

[ tglx: Massaged change log and comments ]

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jin Wang <jin1.wang@intel.com>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Waiman Long <longman@redhat.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20240221060859.1027450-1-feng.tang@intel.com
2 months agotime/kunit: Use correct format specifier
David Gow [Wed, 21 Feb 2024 09:27:17 +0000 (17:27 +0800)] 
time/kunit: Use correct format specifier

'days' is a s64 (from div_s64), and so should use a %lld specifier.

This was found by extending KUnit's assertion macros to use gcc's
__printf attribute.

Fixes: 276010551664 ("time: Improve performance of time64_to_tm()")
Signed-off-by: David Gow <davidgow@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240221092728.1281499-5-davidgow@google.com
2 months agoclocksource: arm_global_timer: fix non-kernel-doc comment
Randy Dunlap [Mon, 15 Jan 2024 05:36:41 +0000 (21:36 -0800)] 
clocksource: arm_global_timer: fix non-kernel-doc comment

Use a common C comment "/*" instead of a kernel-doc marker "/**"
to prevent kernel-doc warnings:

arm_global_timer.c:92: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * To ensure that updates to comparator value register do not set the
arm_global_timer.c:92: warning: missing initial short description on line:
 * To ensure that updates to comparator value register do not set the

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Patrice Chotard <patrice.chotard@foss.st.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20240115053641.29129-1-rdunlap@infradead.org
2 months agocsky/vdso: Use generic union vdso_data_store
Anna-Maria Behnsen [Mon, 19 Feb 2024 15:39:39 +0000 (16:39 +0100)] 
csky/vdso: Use generic union vdso_data_store

There is already a generic union definition for vdso_data_store in the vdso
datapage header.

Use this definition to prevent code duplication.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20240219153939.75719-11-anna-maria@linutronix.de
2 months agoMIPS: vdso: Use generic union vdso_data_store
Anna-Maria Behnsen [Mon, 19 Feb 2024 15:39:38 +0000 (16:39 +0100)] 
MIPS: vdso: Use generic union vdso_data_store

There is already a generic union definition for vdso_data_store in the vdso
datapage header.

Use this definition to prevent code duplication.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240219153939.75719-10-anna-maria@linutronix.de
2 months agoLoongArch: vdso: Use generic union vdso_data_store
Anna-Maria Behnsen [Mon, 19 Feb 2024 15:39:37 +0000 (16:39 +0100)] 
LoongArch: vdso: Use generic union vdso_data_store

There is already a generic union definition for vdso_data_store in vdso
datapage header.

Use this definition to prevent code duplication.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240219153939.75719-9-anna-maria@linutronix.de
2 months agos390/vdso: Use generic union vdso_data_store
Anna-Maria Behnsen [Mon, 19 Feb 2024 15:39:36 +0000 (16:39 +0100)] 
s390/vdso: Use generic union vdso_data_store

There is already a generic union definition for vdso_data_store in the vdso
datapage header.

Use this definition to prevent code duplication.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20240219153939.75719-8-anna-maria@linutronix.de
2 months agoriscv: vdso: Use generic union vdso_data_store
Anna-Maria Behnsen [Tue, 20 Feb 2024 08:52:12 +0000 (09:52 +0100)] 
riscv: vdso: Use generic union vdso_data_store

There is already a generic union definition for vdso_data_store in the vdso
datapage header.

Use this definition to prevent code duplication.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240220085212.6547-1-anna-maria@linutronix.de
2 months agoarm64: vdso: Use generic union vdso_data_store
Anna-Maria Behnsen [Mon, 19 Feb 2024 15:39:34 +0000 (16:39 +0100)] 
arm64: vdso: Use generic union vdso_data_store

There is already a generic union definition for vdso_data_store in vdso
datapage header.

Use this definition to prevent code duplication.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240219153939.75719-6-anna-maria@linutronix.de
2 months agovdso/ARM: Make union vdso_data_store available for all architectures
Anna-Maria Behnsen [Mon, 19 Feb 2024 15:39:33 +0000 (16:39 +0100)] 
vdso/ARM: Make union vdso_data_store available for all architectures

The vDSO data page "union vdso_data_store" is defined in an ARM specific
header file and also defined in several other places.

Move the definition from the ARM header file into the generic vdso datapage
header to make it also usable for others and to prevent code duplication.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240219153939.75719-5-anna-maria@linutronix.de
2 months agocsky/vdso: Remove superfluous ifdeffery
Anna-Maria Behnsen [Mon, 19 Feb 2024 15:39:32 +0000 (16:39 +0100)] 
csky/vdso: Remove superfluous ifdeffery

CSKY selects GENERIC_TIME_VSYSCALL. GENERIC_TIME_VSYSCALL dependent
ifdeffery is superfluous. Clean it up.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20240219153939.75719-4-anna-maria@linutronix.de
2 months agos390/vdso/data: Drop unnecessary header include
Anna-Maria Behnsen [Mon, 19 Feb 2024 15:39:31 +0000 (16:39 +0100)] 
s390/vdso/data: Drop unnecessary header include

vdso/datapage.h includes the architecture specific vdso/data.h header
file. So there is no need to do it also the other way round and including
the generic vdso/datapage.h header file inside the architecture specific
data.h header file.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20240219153939.75719-3-anna-maria@linutronix.de
2 months agovdso/helpers: Fix grammar in comments
Anna-Maria Behnsen [Mon, 19 Feb 2024 15:39:30 +0000 (16:39 +0100)] 
vdso/helpers: Fix grammar in comments

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240219153939.75719-2-anna-maria@linutronix.de
2 months agoMerge tag 'v6.8-rc5' into timers/core, to resolve conflict
Ingo Molnar [Mon, 19 Feb 2024 20:13:31 +0000 (21:13 +0100)] 
Merge tag 'v6.8-rc5' into timers/core, to resolve conflict

There's a conflict between this recent upstream fix:

  dad6a09f3148 ("hrtimer: Report offline hrtimer enqueue")

and a pending commit in the timers tree:

  1a4729ecafc2 ("hrtimers: Move hrtimer base related definitions into hrtimer_defs.h")

Resolve it by applying the upstream fix to the new <linux/hrtimer_defs.h> header.

 Conflict:
include/linux/hrtimer.h
 Semantic conflict:
include/linux/hrtimer_defs.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2 months agotimekeeping: Fix cross-timestamp interpolation for non-x86
Peter Hilber [Mon, 18 Dec 2023 07:38:41 +0000 (08:38 +0100)] 
timekeeping: Fix cross-timestamp interpolation for non-x86

So far, get_device_system_crosststamp() unconditionally passes
system_counterval.cycles to timekeeping_cycles_to_ns(). But when
interpolating system time (do_interp == true), system_counterval.cycles is
before tkr_mono.cycle_last, contrary to the timekeeping_cycles_to_ns()
expectations.

On x86, CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE will mitigate on
interpolating, setting delta to 0. With delta == 0, xtstamp->sys_monoraw
and xtstamp->sys_realtime are then set to the last update time, as
implicitly expected by adjust_historical_crosststamp(). On other
architectures, the resulting nonsense xtstamp->sys_monoraw and
xtstamp->sys_realtime corrupt the xtstamp (ts) adjustment in
adjust_historical_crosststamp().

Fix this by deriving xtstamp->sys_monoraw and xtstamp->sys_realtime from
the last update time when interpolating, by using the local variable
"cycles". The local variable already has the right value when
interpolating, unlike system_counterval.cycles.

Fixes: 2c756feb18d9 ("time: Add history to cross timestamp interface supporting slower devices")
Signed-off-by: Peter Hilber <peter.hilber@opensynergy.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <jstultz@google.com>
Link: https://lore.kernel.org/r/20231218073849.35294-4-peter.hilber@opensynergy.com
2 months agotimekeeping: Fix cross-timestamp interpolation corner case decision
Peter Hilber [Mon, 18 Dec 2023 07:38:40 +0000 (08:38 +0100)] 
timekeeping: Fix cross-timestamp interpolation corner case decision

The cycle_between() helper checks if parameter test is in the open interval
(before, after). Colloquially speaking, this also applies to the counter
wrap-around special case before > after. get_device_system_crosststamp()
currently uses cycle_between() at the first call site to decide whether to
interpolate for older counter readings.

get_device_system_crosststamp() has the following problem with
cycle_between() testing against an open interval: Assume that, by chance,
cycles == tk->tkr_mono.cycle_last (in the following, "cycle_last" for
brevity). Then, cycle_between() at the first call site, with effective
argument values cycle_between(cycle_last, cycles, now), returns false,
enabling interpolation. During interpolation,
get_device_system_crosststamp() will then call cycle_between() at the
second call site (if a history_begin was supplied). The effective argument
values are cycle_between(history_begin->cycles, cycles, cycles), since
system_counterval.cycles == interval_start == cycles, per the assumption.
Due to the test against the open interval, cycle_between() returns false
again. This causes get_device_system_crosststamp() to return -EINVAL.

This failure should be avoided, since get_device_system_crosststamp() works
both when cycles follows cycle_last (no interpolation), and when cycles
precedes cycle_last (interpolation). For the case cycles == cycle_last,
interpolation is actually unneeded.

Fix this by changing cycle_between() into timestamp_in_interval(), which
now checks against the closed interval, rather than the open interval.

This changes the get_device_system_crosststamp() behavior for three corner
cases:

1. Bypass interpolation in the case cycles == tk->tkr_mono.cycle_last,
   fixing the problem described above.

2. At the first timestamp_in_interval() call site, cycles == now no longer
   causes failure.

3. At the second timestamp_in_interval() call site, history_begin->cycles
   == system_counterval.cycles no longer causes failure.
   adjust_historical_crosststamp() also works for this corner case,
   where partial_history_cycles == total_history_cycles.

These behavioral changes should not cause any problems.

Fixes: 2c756feb18d9 ("time: Add history to cross timestamp interface supporting slower devices")
Signed-off-by: Peter Hilber <peter.hilber@opensynergy.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20231218073849.35294-3-peter.hilber@opensynergy.com
2 months agotimekeeping: Fix cross-timestamp interpolation on counter wrap
Peter Hilber [Mon, 18 Dec 2023 07:38:39 +0000 (08:38 +0100)] 
timekeeping: Fix cross-timestamp interpolation on counter wrap

cycle_between() decides whether get_device_system_crosststamp() will
interpolate for older counter readings.

cycle_between() yields wrong results for a counter wrap-around where after
< before < test, and for the case after < test < before.

Fix the comparison logic.

Fixes: 2c756feb18d9 ("time: Add history to cross timestamp interface supporting slower devices")
Signed-off-by: Peter Hilber <peter.hilber@opensynergy.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <jstultz@google.com>
Link: https://lore.kernel.org/r/20231218073849.35294-2-peter.hilber@opensynergy.com
2 months agojiffies: Transform comment about time_* functions into DOC block
Anna-Maria Behnsen [Tue, 23 Jan 2024 16:46:59 +0000 (17:46 +0100)] 
jiffies: Transform comment about time_* functions into DOC block

This general note about time_* functions is also useful to be available in
kernel documentation. Therefore transform it into a kernel-doc DOC block
with proper formatting.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240123164702.55612-6-anna-maria@linutronix.de
2 months agotimers: Add struct member description for timer_base
Anna-Maria Behnsen [Tue, 23 Jan 2024 16:46:58 +0000 (17:46 +0100)] 
timers: Add struct member description for timer_base

timer_base struct lacks description of struct members. Important struct
member information is sprinkled in comments or in code all over the place.

Collect information and write struct description to keep track of most
important information in a single place.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240123164702.55612-5-anna-maria@linutronix.de
2 months agotick/sched: Add function description for tick_nohz_next_event()
Anna-Maria Behnsen [Tue, 23 Jan 2024 16:46:57 +0000 (17:46 +0100)] 
tick/sched: Add function description for tick_nohz_next_event()

The return value of tick_nohz_next_event() is not obvious at the first
glance. Add a kernel-doc compatible function description which also covers
return values.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240123164702.55612-4-anna-maria@linutronix.de
2 months agohrtimers: Update formatting of documentation
Anna-Maria Behnsen [Tue, 23 Jan 2024 16:46:56 +0000 (17:46 +0100)] 
hrtimers: Update formatting of documentation

Documentation of functions lacks the annotations which are used by
kernel-doc and *.rst to make appearance in rendered documents more
user-friendly.

Use those annotations to improve user-friendliness. While at it prevent
duplication of comments and use a reference instead.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240123164702.55612-3-anna-maria@linutronix.de
2 months agohrtimers: Move hrtimer base related definitions into hrtimer_defs.h
Anna-Maria Behnsen [Tue, 23 Jan 2024 16:46:55 +0000 (17:46 +0100)] 
hrtimers: Move hrtimer base related definitions into hrtimer_defs.h

hrtimer base related struct definitions are part of hrtimers.h as it is
required there. With this, also the struct documentation which is for core
code internal use, is exposed into the general api.

To prevent this, move all core internal definitions and the related
includes into hrtimer_defs.h.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240123164702.55612-2-anna-maria@linutronix.de
2 months agoclocksource/drivers/arm_global_timer: Remove stray tab
Martin Blumenstingl [Sun, 18 Feb 2024 17:41:38 +0000 (18:41 +0100)] 
clocksource/drivers/arm_global_timer: Remove stray tab

Remove a stray tab in global_timer_of_register() which is different from
the coding style in the rest of the driver.

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20240218174138.1942418-3-martin.blumenstingl@googlemail.com
2 months agoclocksource/drivers/arm_global_timer: Fix maximum prescaler value
Martin Blumenstingl [Sun, 18 Feb 2024 17:41:37 +0000 (18:41 +0100)] 
clocksource/drivers/arm_global_timer: Fix maximum prescaler value

The prescaler in the "Global Timer Control Register bit assignments" is
documented to use bits [15:8], which means that the maximum prescaler
register value is 0xff.

Fixes: 171b45a4a70e ("clocksource/drivers/arm_global_timer: Implement rate compensation whenever source clock changes")
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20240218174138.1942418-2-martin.blumenstingl@googlemail.com
2 months agoLinux 6.8-rc5 v6.8-rc5
Linus Torvalds [Sun, 18 Feb 2024 20:56:25 +0000 (12:56 -0800)] 
Linux 6.8-rc5

2 months agoMerge tag 'kbuild-fixes-v6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 18 Feb 2024 18:09:25 +0000 (10:09 -0800)] 
Merge tag 'kbuild-fixes-v6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - Reformat nested if-conditionals in Makefiles with 4 spaces

 - Fix CONFIG_DEBUG_INFO_BTF builds for big endian

 - Fix modpost for module srcversion

 - Fix an escape sequence warning in gen_compile_commands.py

 - Fix kallsyms to ignore ARMv4 thunk symbols

* tag 'kbuild-fixes-v6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kallsyms: ignore ARMv4 thunks along with others
  modpost: trim leading spaces when processing source files list
  gen_compile_commands: fix invalid escape sequence warning
  kbuild: Fix changing ELF file type for output of gen_btf for big endian
  docs: kconfig: Fix grammar and formatting
  kbuild: use 4-space indentation when followed by conditionals

2 months agoMerge tag 'x86_urgent_for_v6.8_rc5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 18 Feb 2024 17:22:48 +0000 (09:22 -0800)] 
Merge tag 'x86_urgent_for_v6.8_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Borislav Petkov:

 - Use a GB page for identity mapping only when memory of this size is
   requested so that mapping of reserved regions is prevented which
   would otherwise lead to system crashes on UV machines

* tag 'x86_urgent_for_v6.8_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm/ident_map: Use gbpages only where full GB page should be mapped.

2 months agoMerge tag 'irq_urgent_for_v6.8_rc5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 18 Feb 2024 17:14:12 +0000 (09:14 -0800)] 
Merge tag 'irq_urgent_for_v6.8_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Borislav Petkov:

 - Fix GICv4.1 affinity update

 - Restore a quirk for ACPI-based GICv4 systems

 - Handle non-coherent GICv4 redistributors properly

 - Prevent spurious interrupts on Broadcom devices using GIC v3
   architecture

 - Other minor fixes

* tag 'irq_urgent_for_v6.8_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gic-v3-its: Fix GICv4.1 VPE affinity update
  irqchip/gic-v3-its: Restore quirk probing for ACPI-based systems
  irqchip/gic-v3-its: Handle non-coherent GICv4 redistributors
  irqchip/qcom-mpm: Fix IS_ERR() vs NULL check in qcom_mpm_init()
  irqchip/loongson-eiointc: Use correct struct type in eiointc_domain_alloc()
  irqchip/irq-brcmstb-l2: Add write memory barrier before exit

2 months agoMerge tag 'i2c-for-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Sun, 18 Feb 2024 17:08:57 +0000 (09:08 -0800)] 
Merge tag 'i2c-for-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "Two fixes for i801 and qcom-geni devices. Meanwhile, a fix from Arnd
  addresses a compilation error encountered during compile test on
  powerpc"

* tag 'i2c-for-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: i801: Fix block process call transactions
  i2c: pasemi: split driver into two separate modules
  i2c: qcom-geni: Correct I2C TRE sequence

2 months agoclocksource/drivers/imx-sysctr: Add i.MX95 support
Peng Fan [Mon, 5 Feb 2024 03:17:59 +0000 (11:17 +0800)] 
clocksource/drivers/imx-sysctr: Add i.MX95 support

To i.MX95 System counter module, we use Read register space to get
the counter, not the Control register space to get the counter, because
System Manager firmware not allow Linux to read Control register space,
so add a new TIMER_OF_DECLARE entry for i.MX95.

Signed-off-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20240205-imx-sysctr-v4-3-ca5a6e1552e7@nxp.com
2 months agoclocksource/drivers/imx-sysctr: Drop use global variables
Peng Fan [Mon, 5 Feb 2024 03:17:58 +0000 (11:17 +0800)] 
clocksource/drivers/imx-sysctr: Drop use global variables

Clean up code to not use global variables and introduce sysctr_private
structure to prepare the support for i.MX95.

Signed-off-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20240205-imx-sysctr-v4-2-ca5a6e1552e7@nxp.com
2 months agodt-bindings: timer: nxp,sysctr-timer: support i.MX95
Peng Fan [Mon, 5 Feb 2024 03:17:57 +0000 (11:17 +0800)] 
dt-bindings: timer: nxp,sysctr-timer: support i.MX95

Add i.MX95 System counter module compatible string, the SCMI
firmware blocks access to control register, so should not
add "nxp,sysctr-timer" as fallback.

Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20240205-imx-sysctr-v4-1-ca5a6e1552e7@nxp.com
2 months agoMerge tag 'powerpc-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sun, 18 Feb 2024 00:59:31 +0000 (16:59 -0800)] 
Merge tag 'powerpc-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "This is a bit of a big batch for rc4, but just due to holiday hangover
  and because I didn't send any fixes last week due to a late revert
  request. I think next week should be back to normal.

   - Fix ftrace bug on boot caused by exit text sections with
     '-fpatchable-function-entry'

   - Fix accuracy of stolen time on pseries since the switch to
     VIRT_CPU_ACCOUNTING_GEN

   - Fix a crash in the IOMMU code when doing DLPAR remove

   - Set pt_regs->link on scv entry to fix BPF stack unwinding

   - Add missing PPC_FEATURE_BOOKE on 64-bit e5500/e6500, which broke
     gdb

   - Fix boot on some 6xx platforms with STRICT_KERNEL_RWX enabled

   - Fix build failures with KASAN enabled and 32KB stack size

   - Some other minor fixes

  Thanks to Arnd Bergmann, Benjamin Gray, Christophe Leroy, David
  Engraf, Gaurav Batra, Jason Gunthorpe, Jiangfeng Xiao, Matthias
  Schiffer, Nathan Lynch, Naveen N Rao, Nicholas Piggin, Nysal Jan K.A,
  R Nageswara Sastry, Shivaprasad G Bhat, Shrikanth Hegde, Spoorthy,
  Srikar Dronamraju, and Venkat Rao Bagalkote"

* tag 'powerpc-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/iommu: Fix the missing iommu_group_put() during platform domain attach
  powerpc/pseries: fix accuracy of stolen time
  powerpc/ftrace: Ignore ftrace locations in exit text sections
  powerpc/cputable: Add missing PPC_FEATURE_BOOKE on PPC64 Book-E
  powerpc/kasan: Limit KASAN thread size increase to 32KB
  Revert "powerpc/pseries/iommu: Fix iommu initialisation during DLPAR add"
  powerpc: 85xx: mark local functions static
  powerpc: udbg_memcons: mark functions static
  powerpc/kasan: Fix addr error caused by page alignment
  powerpc/6xx: set High BAT Enable flag on G2_LE cores
  selftests/powerpc/papr_vpd: Check devfd before get_system_loc_code()
  powerpc/64: Set task pt_regs->link to the LR value on scv entry
  powerpc/pseries/iommu: Fix iommu initialisation during DLPAR add
  powerpc/pseries/papr-sysparm: use u8 arrays for payloads

2 months agoMerge tag 'bcachefs-2024-02-17' of https://evilpiepirate.org/git/bcachefs
Linus Torvalds [Sat, 17 Feb 2024 21:17:32 +0000 (13:17 -0800)] 
Merge tag 'bcachefs-2024-02-17' of https://evilpiepirate.org/git/bcachefs

Pull bcachefs fixes from Kent Overstreet:
 "Mostly pretty trivial, the user visible ones are:

   - don't barf when replicas_required > replicas

   - fix check_version_upgrade() so it doesn't do something nonsensical
     when we're downgrading"

* tag 'bcachefs-2024-02-17' of https://evilpiepirate.org/git/bcachefs:
  bcachefs: Fix missing va_end()
  bcachefs: Fix check_version_upgrade()
  bcachefs: Clamp replicas_required to replicas
  bcachefs: fix missing endiannes conversion in sb_members
  bcachefs: fix kmemleak in __bch2_read_super error handling path
  bcachefs: Fix missing bch2_err_class() calls

2 months agoMerge tag 'driver-core-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 17 Feb 2024 16:56:41 +0000 (08:56 -0800)] 
Merge tag 'driver-core-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are some driver core fixes, a kobject fix, and a documentation
  update for 6.8-rc5. In detail these changes are:

   - devlink fixes for reported issues with 6.8-rc1

   - topology scheduling regression fix that has been reported by many

   - kobject loosening of checks change in -rc1 is now reverted as some
     codepaths seemed to need the checks

   - documentation update for the CVE process. Has been reviewed by
     many, the last minute change to the document was to bring the .rst
     format back into the the new style rules, the contents did not
     change.

  All of these, except for the documentation update, have been in
  linux-next for over a week. The documentation update has been reviewed
  for weeks by a group of developers, and in public for a week and the
  wording has stabilized for now. If future changes are needed, we can
  do so before 6.8-final is out (or anytime after that)"

* tag 'driver-core-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  Documentation: Document the Linux Kernel CVE process
  Revert "kobject: Remove redundant checks for whether ktype is NULL"
  driver core: fw_devlink: Improve logs for cycle detection
  driver core: fw_devlink: Improve detection of overlapping cycles
  driver core: Fix device_link_flag_is_sync_state_only()
  topology: Set capacity_freq_ref in all cases

2 months agoMerge tag 'char-misc-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sat, 17 Feb 2024 16:52:38 +0000 (08:52 -0800)] 
Merge tag 'char-misc-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char / miscdriver fixes from Greg KH:
 "Here is a small set of char/misc and IIO driver fixes for 6.8-rc5.

  Included in here are:

   - lots of iio driver fixes for reported issues

   - nvmem device naming fixup for reported problem

   - interconnect driver fixes for reported issues

  All of these have been in linux-next for a while with no reported the
  issues (the nvmem patch was included in a different branch in
  linux-next before sent to me for inclusion here)"

* tag 'char-misc-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (21 commits)
  nvmem: include bit index in cell sysfs file name
  iio: adc: ad4130: only set GPIO_CTRL if pin is unused
  iio: adc: ad4130: zero-initialize clock init data
  interconnect: qcom: x1e80100: Add missing ACV enable_mask
  interconnect: qcom: sm8650: Use correct ACV enable_mask
  iio: accel: bma400: Fix a compilation problem
  iio: commom: st_sensors: ensure proper DMA alignment
  iio: hid-sensor-als: Return 0 for HID_USAGE_SENSOR_TIME_TIMESTAMP
  iio: move LIGHT_UVA and LIGHT_UVB to the end of iio_modifier
  staging: iio: ad5933: fix type mismatch regression
  iio: humidity: hdc3020: fix temperature offset
  iio: adc: ad7091r8: Fix error code in ad7091r8_gpio_setup()
  iio: adc: ad_sigma_delta: ensure proper DMA alignment
  iio: imu: adis: ensure proper DMA alignment
  iio: humidity: hdc3020: Add Makefile, Kconfig and MAINTAINERS entry
  iio: imu: bno055: serdev requires REGMAP
  iio: magnetometer: rm3100: add boundary check for the value read from RM3100_REG_TMRC
  iio: pressure: bmp280: Add missing bmp085 to SPI id table
  iio: core: fix memleak in iio_device_register_sysfs
  interconnect: qcom: sm8550: Enable sync_state
  ...

2 months agoMerge tag 'tty-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Sat, 17 Feb 2024 16:46:57 +0000 (08:46 -0800)] 
Merge tag 'tty-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty / serial fixes from Greg KH:
 "Here are three small tty and serial driver fixes for 6.8-rc5:

   - revert a 8250_pci1xxxx off-by-one change that was incorrect

   - two changes to fix the transmit path of the mxs-auart driver,
     fixing a regression in the 6.2 release

  All of these have been in linux-next for over a week with no reported
  issues"

* tag 'tty-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: mxs-auart: fix tx
  serial: core: introduce uart_port_tx_flags()
  serial: 8250_pci1xxxx: partially revert off by one patch

2 months agoMerge tag 'usb-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sat, 17 Feb 2024 16:44:55 +0000 (08:44 -0800)] 
Merge tag 'usb-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB / Thunderbolt fixes from Greg KH:
 "Here are two small fixes for 6.8-rc5:

   - thunderbolt to fix a reported issue on many platforms

   - dwc3 driver revert of a commit that caused problems in -rc1

  Both of these changes have been in linux-next for over a week with no
  reported issues"

* tag 'usb-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  Revert "usb: dwc3: Support EBC feature of DWC_usb31"
  thunderbolt: Fix setting the CNS bit in ROUTER_CS_5

2 months agoMerge tag 'media/v6.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Linus Torvalds [Sat, 17 Feb 2024 16:13:32 +0000 (08:13 -0800)] 
Merge tag 'media/v6.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:

 - regression fix for rkisp1 shared IRQ logic

 - fix atomisp breakage due to a kAPI change

 - permission fix for remote controller BPF support

 - memleak fix in ir_toy driver

 - Kconfig dependency fix for pwm-ir-rx

* tag 'media/v6.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: pwm-ir-tx: Depend on CONFIG_HIGH_RES_TIMERS
  media: ir_toy: fix a memleak in irtoy_tx
  media: rc: bpf attach/detach requires write permission
  media: atomisp: Adjust for v4l2_subdev_state handling changes in 6.8
  media: rkisp1: Fix IRQ handling due to shared interrupts
  media: Revert "media: rkisp1: Drop IRQF_SHARED"

2 months agoMerge tag 'pci-v6.8-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Linus Torvalds [Sat, 17 Feb 2024 16:06:20 +0000 (08:06 -0800)] 
Merge tag 'pci-v6.8-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci fixes from Bjorn Helgaas:

 - Keep bridges in D0 if we need to poll downstream devices for PME to
   resolve a v6.6 regression where we failed to enumerate devices below
   bridges put in D3hot by runtime PM, e.g., NVMe drives connected via
   Thunderbolt or USB4 docks (Alex Williamson)

 - Add Siddharth Vadapalli as PCI TI DRA7XX/J721E reviewer

* tag 'pci-v6.8-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  MAINTAINERS: Add Siddharth Vadapalli as PCI TI DRA7XX/J721E reviewer
  PCI: Fix active state requirement in PME polling

2 months agoMerge tag 'probes-fixes-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 17 Feb 2024 15:59:47 +0000 (07:59 -0800)] 
Merge tag 'probes-fixes-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probes fix from Masami Hiramatsu:

 - tracing/probes: Fix BTF structure member finder to find the members
   which are placed after any anonymous union member correctly.

* tag 'probes-fixes-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/probes: Fix to search structure fields correctly

2 months agoMerge tag '6.8-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sat, 17 Feb 2024 15:56:10 +0000 (07:56 -0800)] 
Merge tag '6.8-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:
 "Five smb3 client fixes, most also for stable:

   - Two multichannel fixes (one to fix potential handle leak on retry)

   - Work around possible serious data corruption (due to change in
     folios in 6.3, for cases when non standard maximum write size
     negotiated)

   - Symlink creation fix

   - Multiuser automount fix"

* tag '6.8-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: Fix regression in writes when non-standard maximum write size negotiated
  smb: client: handle path separator of created SMB symlinks
  smb: client: set correct id, uid and cruid for multiuser automounts
  cifs: update the same create_guid on replay
  cifs: fix underflow in parse_server_interfaces()

2 months agoDocumentation: Document the Linux Kernel CVE process
Greg Kroah-Hartman [Sat, 17 Feb 2024 12:55:31 +0000 (13:55 +0100)] 
Documentation: Document the Linux Kernel CVE process

The Linux kernel project now has the ability to assign CVEs to fixed
issues, so document the process and how individual developers can get a
CVE if one is not automatically assigned for their fixes.

Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Lee Jones <lee@kernel.org>
Link: https://lore.kernel.org/r/2024021731-essence-sadness-28fd@gregkh
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 months agotracing/probes: Fix to search structure fields correctly
Masami Hiramatsu (Google) [Sat, 17 Feb 2024 12:25:42 +0000 (21:25 +0900)] 
tracing/probes: Fix to search structure fields correctly

Fix to search a field from the structure which has anonymous union
correctly.
Since the reference `type` pointer was updated in the loop, the search
loop suddenly aborted where it hits an anonymous union. Thus it can not
find the field after the anonymous union. This avoids updating the
cursor `type` pointer in the loop.

Link: https://lore.kernel.org/all/170791694361.389532.10047514554799419688.stgit@devnote2/
Fixes: 302db0f5b3d8 ("tracing/probes: Add a function to search a member of a struct/union")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2 months agoMerge tag 'i2c-host-fixes-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git...
Wolfram Sang [Sat, 17 Feb 2024 12:13:33 +0000 (13:13 +0100)] 
Merge tag 'i2c-host-fixes-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current

Three fixes are included here. Two are strictly hardware-related
for the i801 and qcom-geni devices. Meanwhile, a fix from Arnd
addresses a compilation error encountered during compile test on
powerpc.

2 months agoMAINTAINERS: Add Siddharth Vadapalli as PCI TI DRA7XX/J721E reviewer
Siddharth Vadapalli [Fri, 16 Feb 2024 06:59:26 +0000 (12:29 +0530)] 
MAINTAINERS: Add Siddharth Vadapalli as PCI TI DRA7XX/J721E reviewer

Since I have been contributing to the driver for a while and wish to help
with the review process, add myself as a reviewer.

Link: https://lore.kernel.org/r/20240216065926.473805-1-s-vadapalli@ti.com
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2 months agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Fri, 16 Feb 2024 22:05:02 +0000 (14:05 -0800)] 
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Three fixes: the two fnic ones are a revert and a refix, which is why
  the diffstat is a bit big. The target one also extracts a function to
  add a check for configuration and so looks bigger than it is"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: fnic: Move fnic_fnic_flush_tx() to a work queue
  scsi: Revert "scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock"
  scsi: target: Fix unmap setup during configuration

2 months agoMerge tag 'wq-for-6.8-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 16 Feb 2024 22:00:19 +0000 (14:00 -0800)] 
Merge tag 'wq-for-6.8-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

Pull workqueue fix from Tejun Heo:
 "Just one patch to revert commit ca10d851b9ad ("workqueue: Override
  implicit ordered attribute in workqueue_apply_unbound_cpumask()").

  This commit could break ordering guarantees for ordered workqueues.
  The problem that the commit tried to resolve partially - making
  ordered workqueues follow unbound cpumask - is fully solved in
  wq/for-6.9 branch"

* tag 'wq-for-6.8-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  Revert "workqueue: Override implicit ordered attribute in workqueue_apply_unbound_cpumask()"

2 months agoMerge tag 'block-6.8-2024-02-16' of git://git.kernel.dk/linux
Linus Torvalds [Fri, 16 Feb 2024 21:55:02 +0000 (13:55 -0800)] 
Merge tag 'block-6.8-2024-02-16' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:
 "Just an nvme pull request via Keith:

   - Fabrics connection error handling (Chaitanya)

   - Use relaxed effects to reduce unnecessary queue freezes (Keith)"

* tag 'block-6.8-2024-02-16' of git://git.kernel.dk/linux:
  nvmet: remove superfluous initialization
  nvme: implement support for relaxed effects
  nvme-fabrics: fix I/O connect error handling

2 months agoMerge tag 'io_uring-6.8-2024-02-16' of git://git.kernel.dk/linux
Linus Torvalds [Fri, 16 Feb 2024 21:51:20 +0000 (13:51 -0800)] 
Merge tag 'io_uring-6.8-2024-02-16' of git://git.kernel.dk/linux

Pull io_uring fix from Jens Axboe:
 "Just a single fix for a regression in how overflow is handled for
  multishot accept requests"

* tag 'io_uring-6.8-2024-02-16' of git://git.kernel.dk/linux:
  io_uring/net: fix multishot accept overflow handling

2 months agoMerge tag 'ceph-for-6.8-rc5' of https://github.com/ceph/ceph-client
Linus Torvalds [Fri, 16 Feb 2024 21:37:15 +0000 (13:37 -0800)] 
Merge tag 'ceph-for-6.8-rc5' of https://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "Additional cap handling fixes from Xiubo to avoid "client isn't
  responding to mclientcaps(revoke)" stalls on the MDS side"

* tag 'ceph-for-6.8-rc5' of https://github.com/ceph/ceph-client:
  ceph: add ceph_cap_unlink_work to fire check_caps() immediately
  ceph: always queue a writeback when revoking the Fb caps

2 months agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Fri, 16 Feb 2024 18:48:14 +0000 (10:48 -0800)] 
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "ARM:

   - Avoid dropping the page refcount twice when freeing an unlinked
     page-table subtree.

   - Don't source the VFIO Kconfig twice

   - Fix protected-mode locking order between kvm and vcpus

  RISC-V:

   - Fix steal-time related sparse warnings

  x86:

   - Cleanup gtod_is_based_on_tsc() to return "bool" instead of an "int"

   - Make a KVM_REQ_NMI request while handling KVM_SET_VCPU_EVENTS if
     and only if the incoming events->nmi.pending is non-zero. If the
     target vCPU is in the UNITIALIZED state, the spurious request will
     result in KVM exiting to userspace, which in turn causes QEMU to
     constantly acquire and release QEMU's global mutex, to the point
     where the BSP is unable to make forward progress.

   - Fix a type (u8 versus u64) goof that results in pmu->fixed_ctr_ctrl
     being incorrectly truncated, and ultimately causes KVM to think a
     fixed counter has already been disabled (KVM thinks the old value
     is '0').

   - Fix a stack leak in KVM_GET_MSRS where a failed MSR read from
     userspace that is ultimately ignored due to ignore_msrs=true
     doesn't zero the output as intended.

  Selftests cleanups and fixes:

   - Remove redundant newlines from error messages.

   - Delete an unused variable in the AMX test (which causes build
     failures when compiling with -Werror).

   - Fail instead of skipping tests if open(), e.g. of /dev/kvm, fails
     with an error code other than ENOENT (a Hyper-V selftest bug
     resulted in an EMFILE, and the test eventually got skipped).

   - Fix TSC related bugs in several Hyper-V selftests.

   - Fix a bug in the dirty ring logging test where a sem_post() could
     be left pending across multiple runs, resulting in incorrect
     synchronization between the main thread and the vCPU worker thread.

   - Relax the dirty log split test's assertions on 4KiB mappings to fix
     false positives due to the number of mappings for memslot 0 (used
     for code and data that is NOT being dirty logged) changing, e.g.
     due to NUMA balancing"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits)
  KVM: arm64: Fix double-free following kvm_pgtable_stage2_free_unlinked()
  RISC-V: KVM: Use correct restricted types
  RISC-V: paravirt: Use correct restricted types
  RISC-V: paravirt: steal_time should be static
  KVM: selftests: Don't assert on exact number of 4KiB in dirty log split test
  KVM: selftests: Fix a semaphore imbalance in the dirty ring logging test
  KVM: x86: Fix KVM_GET_MSRS stack info leak
  KVM: arm64: Do not source virt/lib/Kconfig twice
  KVM: x86/pmu: Fix type length error when reading pmu->fixed_ctr_ctrl
  KVM: x86: Make gtod_is_based_on_tsc() return 'bool'
  KVM: selftests: Make hyperv_clock require TSC based system clocksource
  KVM: selftests: Run clocksource dependent tests with hyperv_clocksource_tsc_page too
  KVM: selftests: Use generic sys_clocksource_is_tsc() in vmx_nested_tsc_scaling_test
  KVM: selftests: Generalize check_clocksource() from kvm_clock_test
  KVM: x86: make KVM_REQ_NMI request iff NMI pending for vcpu
  KVM: arm64: Fix circular locking dependency
  KVM: selftests: Fail tests when open() fails with !ENOENT
  KVM: selftests: Avoid infinite loop in hyperv_features when invtsc is missing
  KVM: selftests: Delete superfluous, unused "stage" variable in AMX test
  KVM: selftests: x86_64: Remove redundant newlines
  ...

2 months agoMerge tag 'trace-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace...
Linus Torvalds [Fri, 16 Feb 2024 18:33:51 +0000 (10:33 -0800)] 
Merge tag 'trace-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix the #ifndef that didn't have the 'CONFIG_' prefix on
   HAVE_DYNAMIC_FTRACE_WITH_REGS

   The fix to have dynamic trampolines work with x86 broke arm64 as the
   config used in the #ifdef was HAVE_DYNAMIC_FTRACE_WITH_REGS and not
   CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS which removed the fix that the
   previous fix was to fix.

 - Fix tracing_on state

   The code to test if "tracing_on" is set incorrectly used
   ring_buffer_record_is_on() which returns false if the ring buffer
   isn't able to be written to.

   But the ring buffer disable has several bits that disable it. One is
   internal disabling which is used for resizing and other modifications
   of the ring buffer. But the "tracing_on" user space visible flag
   should only report if tracing is actually on and not internally
   disabled, as this can cause confusion as writing "1" when it is
   disabled will not enable it.

   Instead use ring_buffer_record_is_set_on() which shows the user space
   visible settings.

 - Fix a false positive kmemleak on saved cmdlines

   Now that the saved_cmdlines structure is allocated via alloc_page()
   and not via kmalloc() it has become invisible to kmemleak. The
   allocation done to one of its pointers was flagged as a dangling
   allocation leak. Make kmemleak aware of this allocation and free.

 - Fix synthetic event dynamic strings

   An update that cleaned up the synthetic event code removed the return
   value of trace_string(), and had it return zero instead of the
   length, causing dynamic strings in the synthetic event to always have
   zero size.

 - Clean up documentation and header files for seq_buf

* tag 'trace-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  seq_buf: Fix kernel documentation
  seq_buf: Don't use "proxy" headers
  tracing/synthetic: Fix trace_string() return value
  tracing: Inform kmemleak of saved_cmdlines allocation
  tracing: Use ring_buffer_record_is_set_on() in tracer_tracing_is_on()
  tracing: Fix HAVE_DYNAMIC_FTRACE_WITH_REGS ifdef