From: Mark Rutland
Date: Tue, 7 Apr 2026 13:16:45 +0000 (+0100)
Subject: entry: Split preemption from irqentry_exit_to_kernel_mode()
X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=041aa7a85390c99b1de86dc28eddcff0890d8186;p=thirdparty%2Fkernel%2Flinux.git

entry: Split preemption from irqentry_exit_to_kernel_mode()

Some architecture-specific work needs to be performed between the state
management for exception entry/exit and the "real" work to handle the
exception. For example, arm64 needs to manipulate a number of exception
masking bits, with different exceptions requiring different masking.

Generally this can all be hidden in the architecture code, but for arm64
the current structure of irqentry_exit_to_kernel_mode() makes this
particularly difficult to handle in a way that is correct, maintainable,
and efficient. The gory details are described in the thread surrounding:

  https://lore.kernel.org/lkml/acPAzdtjK5w-rNqC@J2N7QTR9R3/

The summary is:

* Currently, irqentry_exit_to_kernel_mode() handles both involuntary
  preemption AND the state management necessary for exception return.

* When scheduling (including involuntary preemption), arm64 needs to have
  all arm64-specific exceptions unmasked, though regular interrupts must
  be masked.

* Prior to the state management for exception return, arm64 needs to mask
  a number of arm64-specific exceptions, and perform some work with these
  exceptions masked (with RCU watching, etc.).

While in theory it is possible to handle this with a new arch_*() hook
called somewhere under irqentry_exit_to_kernel_mode(), this is fragile
and complicated, and doesn't match the flow used for exception return to
user mode, which has a separate 'prepare' step (where preemption can
occur) prior to the state management.

To solve this, refactor irqentry_exit_to_kernel_mode() to match the style
of {irqentry,syscall}_exit_to_user_mode(), moving the preemption logic
into a new irqentry_exit_to_kernel_mode_preempt() function, and moving
the state management into a new
irqentry_exit_to_kernel_mode_after_preempt() function. The existing
irqentry_exit_to_kernel_mode() is left as a caller of both of these,
avoiding the need to modify existing callers.

There should be no functional change as a result of this patch.

[ tglx: Updated kernel doc ]

Signed-off-by: Mark Rutland
Signed-off-by: Thomas Gleixner
Reviewed-by: Jinjie Ruan
Acked-by: Peter Zijlstra (Intel)
Link: https://patch.msgid.link/20260407131650.3813777-6-mark.rutland@arm.com
---

diff --git a/include/linux/irq-entry-common.h b/include/linux/irq-entry-common.h
index 66bc168bc6b5a..384520217bfe3 100644
--- a/include/linux/irq-entry-common.h
+++ b/include/linux/irq-entry-common.h
@@ -438,24 +438,46 @@ static __always_inline irqentry_state_t irqentry_enter_from_kernel_mode(struct p
 }
 
 /**
- * irqentry_exit_to_kernel_mode - Run preempt checks and establish state after
- *				   invoking the interrupt handler
+ * irqentry_exit_to_kernel_mode_preempt - Run preempt checks on return to kernel mode
  * @regs:	Pointer to current's pt_regs
  * @state:	Return value from matching call to irqentry_enter_from_kernel_mode()
  *
- * This is the counterpart of irqentry_enter_from_kernel_mode() and runs the
- * necessary preemption check if possible and required. It returns to the caller
- * with interrupts disabled and the correct state vs. tracing, lockdep and RCU
- * required to return to the interrupted context.
+ * This is to be invoked before irqentry_exit_to_kernel_mode_after_preempt() to
+ * allow kernel preemption on return from interrupt.
+ *
+ * Must be invoked with interrupts disabled and CPU state which allows kernel
+ * preemption.
  *
- * It is the last action before returning to the low level ASM code which just
- * needs to return.
+ * After returning from this function, the caller can modify CPU state before
+ * invoking irqentry_exit_to_kernel_mode_after_preempt(), which is required to
+ * re-establish the tracing, lockdep and RCU state for returning to the
+ * interrupted context.
  */
-static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs,
-							  irqentry_state_t state)
+static inline void irqentry_exit_to_kernel_mode_preempt(struct pt_regs *regs,
+							 irqentry_state_t state)
 {
-	lockdep_assert_irqs_disabled();
+	if (regs_irqs_disabled(regs) || state.exit_rcu)
+		return;
+
+	if (IS_ENABLED(CONFIG_PREEMPTION))
+		irqentry_exit_cond_resched();
+}
 
+/**
+ * irqentry_exit_to_kernel_mode_after_preempt - Establish trace, lockdep and RCU state
+ * @regs:	Pointer to current's pt_regs
+ * @state:	Return value from matching call to irqentry_enter_from_kernel_mode()
+ *
+ * This is to be invoked after irqentry_exit_to_kernel_mode_preempt() and before
+ * actually returning to the interrupted context.
+ *
+ * There are no requirements for the CPU state other than being able to complete
+ * the tracing, lockdep and RCU state transitions. After this function returns
+ * the caller must return directly to the interrupted context.
+ */
+static __always_inline void
+irqentry_exit_to_kernel_mode_after_preempt(struct pt_regs *regs, irqentry_state_t state)
+{
 	if (!regs_irqs_disabled(regs)) {
 		/*
 		 * If RCU was not watching on entry this needs to be done
@@ -474,9 +496,6 @@ static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs,
 		}
 
 		instrumentation_begin();
-		if (IS_ENABLED(CONFIG_PREEMPTION))
-			irqentry_exit_cond_resched();
-
 		/* Covers both tracing and lockdep */
 		trace_hardirqs_on();
 		instrumentation_end();
@@ -490,6 +509,32 @@ static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs,
 	}
 }
 
+/**
+ * irqentry_exit_to_kernel_mode - Run preempt checks and establish state after
+ *				   invoking the interrupt handler
+ * @regs:	Pointer to current's pt_regs
+ * @state:	Return value from matching call to irqentry_enter_from_kernel_mode()
+ *
+ * This is the counterpart of irqentry_enter_from_kernel_mode() and combines
+ * the calls to irqentry_exit_to_kernel_mode_preempt() and
+ * irqentry_exit_to_kernel_mode_after_preempt().
+ *
+ * The requirement for the CPU state is that it can schedule. After the function
+ * returns the tracing, lockdep and RCU state transitions are completed and the
+ * caller must return directly to the interrupted context.
+ */
+static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs,
+							  irqentry_state_t state)
+{
+	lockdep_assert_irqs_disabled();
+
+	instrumentation_begin();
+	irqentry_exit_to_kernel_mode_preempt(regs, state);
+	instrumentation_end();
+
+	irqentry_exit_to_kernel_mode_after_preempt(regs, state);
+}
+
 /**
  * irqentry_enter - Handle state tracking on ordinary interrupt entries
  * @regs:	Pointer to pt_regs of interrupted context
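
For illustration only (not part of the patch), a minimal sketch of how an
architecture's exit path could combine the two new helpers with its own
exception masking in between, mirroring the combined
irqentry_exit_to_kernel_mode() above. The
arch_mask_exceptions_for_exception_return() helper is a hypothetical
placeholder for whatever masking an architecture such as arm64 needs:

  #include <linux/irq-entry-common.h>

  /* Hypothetical placeholder for the arch-specific exception masking. */
  static inline void arch_mask_exceptions_for_exception_return(void) { }

  static __always_inline void arch_exit_to_kernel_mode(struct pt_regs *regs,
                                                        irqentry_state_t state)
  {
          /* Scheduling must still be possible at this point. */
          lockdep_assert_irqs_disabled();

          instrumentation_begin();
          irqentry_exit_to_kernel_mode_preempt(regs, state);
          instrumentation_end();

          /*
           * Mask the arch-specific exceptions which must stay masked across
           * the final state transitions and the exception return.
           */
          arch_mask_exceptions_for_exception_return();

          /* Re-establish tracing, lockdep and RCU state, then return. */
          irqentry_exit_to_kernel_mode_after_preempt(regs, state);
  }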