James Morse [Thu, 15 May 2025 16:58:55 +0000 (16:58 +0000)]
MAINTAINERS: Add reviewers for fs/resctrl
resctrl has existed for quite a while as a filesystem interface private to
arch/x86. To allow other architectures to support the same user interface
for similar hardware features, it has been moved to /fs/.
Add those with a vested interest in the common code as reviewers.
Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Reinette Chatre <reinette.chatre@intel.com> Acked-by: Dave Martin <Dave.Martin@arm.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-26-james.morse@arm.com
James Morse [Thu, 15 May 2025 16:58:53 +0000 (16:58 +0000)]
x86/resctrl: Always initialise rid field in rdt_resources_all[]
x86 has an array, rdt_resources_all[], of all possible resources.
The for-each-resource walkers depend on the rid field of all
resources being initialised.
If the array ever grows due to another architecture adding a resource
type that is not defined on x86, the for-each-resources walkers will
loop forever.
Initialise all the rid values in resctrl_arch_late_init() before
any for-each-resource walker can be called.
Dave Martin [Thu, 15 May 2025 16:58:51 +0000 (16:58 +0000)]
x86/resctrl: Prefer alloc(sizeof(*foo)) idiom in rdt_init_fs_context()
rdt_init_fs_context() sizes a typed allocation using an explicit
sizeof(type) expression, which checkpatch.pl complains about.
Since this code is about to be factored out and made generic, this
is a good opportunity to fix the code to size the allocation based
on the target pointer instead, to reduce the chance of future mis-
maintenance.
Fix it.
No functional change.
Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-22-james.morse@arm.com
Dave Martin [Thu, 15 May 2025 16:58:50 +0000 (16:58 +0000)]
x86/resctrl: Squelch whitespace anomalies in resctrl core code
checkpatch.pl complains about some whitespace anomalies in the
resctrl core code.
This doesn't matter, but since this code is about to be factored
out and made generic, this is a good opportunity to fix these
issues and so reduce future checkpatch fuzz.
Fix them.
No functional change.
Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-21-james.morse@arm.com
James Morse [Thu, 15 May 2025 16:58:49 +0000 (16:58 +0000)]
x86/resctrl: Move pseudo lock prototypes to include/linux/resctrl.h
The resctrl pseudo-lock feature allows an architecture to allocate data
into particular cache portions, which are then treated as reserved to
avoid that data ever being evicted. Setting this up is deeply architecture
specific as it involves disabling prefetchers etc. It is not possible
to support this kind of feature on arm64. Risc-V is assumed to be the
same.
The prototypes for the architecture code were added to x86's asm/resctrl.h,
with other architectures able to provide stubs for their architecture. This
forces other architectures to provide identical stubs.
Move the prototypes and stubs to linux/resctrl.h, and switch between them
using the existing Kconfig symbol.
James Morse [Thu, 15 May 2025 16:58:48 +0000 (16:58 +0000)]
x86/resctrl: Fix types in resctrl_arch_mon_ctx_{alloc,free}() stubs
resctrl_arch_mon_ctx_alloc() and resctrl_arch_mon_ctx_free() take an enum
resctrl_event_id that is already defined in resctrl_types.h to be
accessible to asm/resctrl.h.
James Morse [Thu, 15 May 2025 16:58:47 +0000 (16:58 +0000)]
x86/resctrl: Move enum resctrl_event_id to resctrl.h
resctrl_types.h contains common types and constants to enable architectures
to use these types in their definitions within asm/resctrl.h.
enum resctrl_event_id was placed in resctrl_types.h for
resctrl_arch_get_cdp_enabled() and resctrl_arch_set_cdp_enabled(), but
these two functions are no longer inlined by any architecture.
James Morse [Thu, 15 May 2025 16:58:44 +0000 (16:58 +0000)]
x86/resctrl: Add 'resctrl' to the title of the resctrl documentation
The resctrl documentation is titled "User Interface for Resource Control
feature".
Once the documentation follows the code in a move to the filesystem, this
appears in the list of filesystems, but doesn't contain the name of the
filesystem, making it hard to find.
James Morse [Thu, 15 May 2025 16:58:43 +0000 (16:58 +0000)]
x86/resctrl: Split trace.h
trace.h contains all the tracepoints. After the move to /fs/resctrl, some
of these will be left behind. All the pseudo_lock tracepoints remain part
of the architecture. The lone tracepoint in monitor.c moves to /fs/resctrl.
Split trace.h so that each C file includes a different trace header file.
This means the trace header files are not modified when they are moved.
Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-14-james.morse@arm.com
James Morse [Thu, 15 May 2025 16:58:42 +0000 (16:58 +0000)]
x86/resctrl: Expand the width of domid by replacing mon_data_bits
MPAM platforms retrieve the cache-id property from the ACPI PPTT table.
The cache-id field is 32 bits wide. Under resctrl, the cache-id becomes
the domain-id, and is packed into the mon_data_bits union bitfield.
The width of cache-id in this field is 14 bits.
Expanding the union would break 32bit x86 platforms as this union is
stored as the kernfs kn->priv pointer. This saved allocating memory
for the priv data storage.
The firmware on MPAM platforms have used the PPTT cache-id field to
expose the interconnect's id for the cache, which is sparse and uses
more than 14 bits. Use of this id is to enable PCIe direct cache
injection hints. Using this feature with VFIO means the value provided
by the ACPI table should be exposed to user-space.
To support cache-id values greater than 14 bits, convert the
mon_data_bits union to a structure. These are shared between control
and monitor groups, and are allocated on first use. The list of
allocated struct mon_data is free'd when the filesystem is umount()ed.
Co-developed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-13-james.morse@arm.com
James Morse [Thu, 15 May 2025 16:58:41 +0000 (16:58 +0000)]
x86/resctrl: Add end-marker to the resctrl_event_id enum
The resctrl_event_id enum gives names to the counter event numbers on x86.
These are used directly by resctrl.
To allow the MPAM driver to keep an array of these the size of the enum
needs to be known.
Add a 'num_events' enum entry which can be used to size an array. This is
added to the enum to reduce conflicts with another series, which in turn
requires get_arch_mbm_state() to have a default case.
James Morse [Thu, 15 May 2025 16:58:39 +0000 (16:58 +0000)]
x86/resctrl: Drop __init/__exit on assorted symbols
Because ARM's MPAM controls are probed using MMIO, resctrl can't be
initialised until enough CPUs are online to have determined the system-wide
supported num_closid. Arm64 also supports 'late onlined secondaries', where
only a subset of CPUs are online during boot.
These two combine to mean the MPAM driver may not be able to initialise
resctrl until user-space has brought 'enough' CPUs online.
To allow MPAM to initialise resctrl after __init text has been free'd, remove
all the __init markings from resctrl.
The existing __exit markings cause these functions to be removed by the linker
as it has never been possible to build resctrl as a module. MPAM has an error
interrupt which causes the driver to reset and disable itself. Remove the
__exit markings to allow the MPAM driver to tear down resctrl when an error
occurs.
Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-10-james.morse@arm.com
James Morse [Thu, 15 May 2025 16:58:38 +0000 (16:58 +0000)]
x86/resctrl: Resctrl_exit() teardown resctrl but leave the mount point
resctrl_exit() was intended for use when the 'resctrl' module was unloaded.
resctrl can't be built as a module, and the kernfs helpers are not exported so
this is unlikely to change. MPAM has an error interrupt which indicates the
MPAM driver has gone haywire. Should this occur tasks could run with the wrong
control values, leading to bad performance for important tasks. In this
scenario the MPAM driver will reset the hardware, but it needs a way to tell
resctrl that no further configuration should be attempted.
In particular, moving tasks between control or monitor groups does not
interact with the architecture code, so there is no opportunity for the arch
code to indicate that the hardware is no-longer functioning.
Using resctrl_exit() for this leaves the system in a funny state as resctrl is
still mounted, but cannot be un-mounted because the sysfs directory that is
typically used has been removed. Dave Martin suggests this may cause systemd
trouble in the future as not all filesystems can be unmounted.
Add calls to remove all the files and directories in resctrl, and remove the
sysfs_remove_mount_point() call that leaves the system in a funny state. When
triggered, this causes all the resctrl files to disappear. resctrl can be
unmounted, but not mounted again.
Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-9-james.morse@arm.com
James Morse [Thu, 15 May 2025 16:58:37 +0000 (16:58 +0000)]
x86/resctrl: Check all domains are offline in resctrl_exit()
resctrl_exit() removes things like the resctrl mount point directory
and unregisters the filesystem prior to freeing data structures that
were allocated during resctrl_init().
This assumes that there are no online domains when resctrl_exit() is
called. If any domain were online, the limbo or overflow handler could
be scheduled to run.
Add a check for any online control or monitor domains, and document that
the architecture code is required to offline all monitor and control
domains before calling resctrl_exit().
James Morse [Thu, 15 May 2025 16:58:36 +0000 (16:58 +0000)]
x86/resctrl: Rename resctrl_sched_in() to begin with "resctrl_arch_"
resctrl_sched_in() loads the architecture specific CPU MSRs with the
CLOSID and RMID values. This function was named before resctrl was
split to have architecture specific code, and generic filesystem code.
This function is obviously architecture specific, but does not begin
with 'resctrl_arch_', making it the odd one out in the functions an
architecture needs to support to enable resctrl.
Rename it for consistency. This is purely cosmetic.
Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-7-james.morse@arm.com
Amit Singh Tomar [Thu, 15 May 2025 16:58:35 +0000 (16:58 +0000)]
x86/resctrl: Remove the limit on the number of CLOSID
Resctrl allocates and finds free CLOSID values using the bits of a u32.
This restricts the number of control groups that can be created by
user-space.
MPAM has an architectural limit of 2^16 CLOSID values, Intel x86 could
be extended beyond 32 values. There is at least one MPAM platform which
supports more than 32 CLOSID values.
Replace the fixed size bitmap with calls to the bitmap API to allocate
an array of a sufficient size.
ffs() returns '1' for bit 0, hence the existing code subtracts 1 from
the index to get the CLOSID value. find_first_bit() returns the bit
number which does not need adjusting.
[ morse: fixed the off-by-one in the allocator and the wrong not-found
value. Removed the limit. Rephrase the commit message. ]
Signed-off-by: Amit Singh Tomar <amitsinght@marvell.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-6-james.morse@arm.com
With the lack of cpumask_any_andnot_but(), cpumask_any_housekeeping()
has to abuse cpumask_nth() functions.
Update cpumask_any_housekeeping() to use the new cpumask_any_but()
and cpumask_any_andnot_but(). These two functions understand
RESCTRL_PICK_ANY_CPU, which simplifies cpumask_any_housekeeping()
significantly.
Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: James Morse <james.morse@arm.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: James Morse <james.morse@arm.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-5-james.morse@arm.com
Linus Torvalds [Sun, 11 May 2025 18:30:13 +0000 (11:30 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"ARM:
- Avoid use of uninitialized memcache pointer in user_mem_abort()
- Always set HCR_EL2.xMO bits when running in VHE, allowing
interrupts to be taken while TGE=0 and fixing an ugly bug on
AmpereOne that occurs when taking an interrupt while clearing the
xMO bits (AC03_CPU_36)
- Prevent VMMs from hiding support for AArch64 at any EL virtualized
by KVM
- Save/restore the host value for HCRX_EL2 instead of restoring an
incorrect fixed value
- Make host_stage2_set_owner_locked() check that the entire requested
range is memory rather than just the first page
RISC-V:
- Add missing reset of smstateen CSRs
x86:
- Forcibly leave SMM on SHUTDOWN interception on AMD CPUs to avoid
causing problems due to KVM stuffing INIT on SHUTDOWN (KVM needs to
sanitize the VMCB as its state is undefined after SHUTDOWN,
emulating INIT is the least awful choice).
- Track the valid sync/dirty fields in kvm_run as a u64 to ensure KVM
KVM doesn't goof a sanity check in the future.
- Free obsolete roots when (re)loading the MMU to fix a bug where
pre-faulting memory can get stuck due to always encountering a
stale root.
- When dumping GHCB state, use KVM's snapshot instead of the raw GHCB
page to print state, so that KVM doesn't print stale/wrong
information.
- When changing memory attributes (e.g. shared <=> private), add
potential hugepage ranges to the mmu_invalidate_range_{start,end}
set so that KVM doesn't create a shared/private hugepage when the
the corresponding attributes will become mixed (the attributes are
commited *after* KVM finishes the invalidation).
- Rework the SRSO mitigation to enable BP_SPEC_REDUCE only when KVM
has at least one active VM. Effectively BP_SPEC_REDUCE when KVM is
loaded led to very measurable performance regressions for non-KVM
workloads"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: SVM: Set/clear SRSO's BP_SPEC_REDUCE on 0 <=> 1 VM count transitions
KVM: arm64: Fix memory check in host_stage2_set_owner_locked()
KVM: arm64: Kill HCRX_HOST_FLAGS
KVM: arm64: Properly save/restore HCRX_EL2
KVM: arm64: selftest: Don't try to disable AArch64 support
KVM: arm64: Prevent userspace from disabling AArch64 support at any virtualisable EL
KVM: arm64: Force HCR_EL2.xMO to 1 at all times in VHE mode
KVM: arm64: Fix uninitialized memcache pointer in user_mem_abort()
KVM: x86/mmu: Prevent installing hugepages when mem attributes are changing
KVM: SVM: Update dump_ghcb() to use the GHCB snapshot fields
KVM: RISC-V: reset smstateen CSRs
KVM: x86/mmu: Check and free obsolete roots in kvm_mmu_reload()
KVM: x86: Check that the high 32bits are clear in kvm_arch_vcpu_ioctl_run()
KVM: SVM: Forcibly leave SMM mode on SHUTDOWN interception
Linus Torvalds [Sun, 11 May 2025 17:33:25 +0000 (10:33 -0700)]
Merge tag 'timers-urgent-2025-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc timers fixes from Ingo Molnar:
- Fix time keeping bugs in CLOCK_MONOTONIC_COARSE clocks
- Work around absolute relocations into vDSO code that GCC erroneously
emits in certain arm64 build environments
- Fix a false positive lockdep warning in the i8253 clocksource driver
* tag 'timers-urgent-2025-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/i8253: Use raw_spinlock_irqsave() in clockevent_i8253_disable()
arm64: vdso: Work around invalid absolute relocations from GCC
timekeeping: Prevent coarse clocks going backwards
Linus Torvalds [Sun, 11 May 2025 17:29:29 +0000 (10:29 -0700)]
Merge tag 'input-for-v6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
- Synaptics touchpad on multiple laptops (Dynabook Portege X30L-G,
Dynabook Portege X30-D, TUXEDO InfinityBook Pro 14 v5, Dell Precision
M3800, HP Elitebook 850 G1) switched from PS/2 to SMBus mode
- a number of new controllers added to xpad driver: HORI Drum
controller, PowerA Fusion Pro 4, PowerA MOGA XP-Ultra controller,
8BitDo Ultimate 2 Wireless Controller, 8BitDo Ultimate 3-mode
Controller, Hyperkin DuchesS Xbox One controller
- fixes to xpad driver to properly handle Mad Catz JOYTECH NEO SE
Advanced and PDP Mirror's Edge Official controllers
- fixes to xpad driver to properly handle "Share" button on some
controllers
- a fix for device initialization timing and for waking up the
controller in cyttsp5 driver
- a fix for hisi_powerkey driver to properly wake up from s2idle state
- other assorted cleanups and fixes
* tag 'input-for-v6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: xpad - fix xpad_device sorting
Input: xpad - add support for several more controllers
Input: xpad - fix Share button on Xbox One controllers
Input: xpad - fix two controller table values
Input: hisi_powerkey - enable system-wakeup for s2idle
Input: synaptics - enable InterTouch on Dell Precision M3800
Input: synaptics - enable InterTouch on TUXEDO InfinityBook Pro 14 v5
Input: synaptics - enable InterTouch on Dynabook Portege X30L-G
Input: synaptics - enable InterTouch on Dynabook Portege X30-D
Input: synaptics - enable SMBus for HP Elitebook 850 G1
Input: mtk-pmic-keys - fix possible null pointer dereference
Input: xpad - add support for 8BitDo Ultimate 2 Wireless Controller
Input: cyttsp5 - fix power control issue on wakeup
MAINTAINERS: .mailmap: update Mattijs Korpershoek's email address
dt-bindings: mediatek,mt6779-keypad: Update Mattijs' email address
Input: stmpe-ts - use module alias instead of device table
Input: cyttsp5 - ensure minimum reset pulse width
Input: sparcspkr - avoid unannotated fall-through
input/joystick: magellan: Mark __nonstring look-up table
Linus Torvalds [Sun, 11 May 2025 17:23:53 +0000 (10:23 -0700)]
Merge tag 'fixes-2025-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock fixes from Mike Rapoport:
- Mark set_high_memory() as __init to fix section mismatch
- Accept memory allocated in memblock_double_array() to mitigate crash
of SNP guests
* tag 'fixes-2025-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
memblock: Accept allocated memory before use in memblock_double_array()
mm,mm_init: Mark set_high_memory as __init
Vicki Pfau [Sun, 11 May 2025 05:59:25 +0000 (22:59 -0700)]
Input: xpad - fix Share button on Xbox One controllers
The Share button, if present, is always one of two offsets from the end of the
file, depending on the presence of a specific interface. As we lack parsing for
the identify packet we can't automatically determine the presence of that
interface, but we can hardcode which of these offsets is correct for a given
controller.
More controllers are probably fixable by adding the MAP_SHARE_BUTTON in the
future, but for now I only added the ones that I have the ability to test
directly.
Vicki Pfau [Fri, 28 Mar 2025 23:43:36 +0000 (16:43 -0700)]
Input: xpad - fix two controller table values
Two controllers -- Mad Catz JOYTECH NEO SE Advanced and PDP Mirror's
Edge Official -- were missing the value of the mapping field, and thus
wouldn't detect properly.
Ulf Hansson [Thu, 6 Mar 2025 11:50:21 +0000 (12:50 +0100)]
Input: hisi_powerkey - enable system-wakeup for s2idle
To wake up the system from s2idle when pressing the power-button, let's
convert from using pm_wakeup_event() to pm_wakeup_dev_event(), as it allows
us to specify the "hard" in-parameter, which needs to be set for s2idle.
Linus Torvalds [Sat, 10 May 2025 22:50:56 +0000 (15:50 -0700)]
Merge tag 'mm-hotfixes-stable-2025-05-10-14-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc hotfixes from Andrew Morton:
"22 hotfixes. 13 are cc:stable and the remainder address post-6.14
issues or aren't considered necessary for -stable kernels.
About half are for MM. Five OCFS2 fixes and a few MAINTAINERS updates"
* tag 'mm-hotfixes-stable-2025-05-10-14-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits)
mm: fix folio_pte_batch() on XEN PV
nilfs2: fix deadlock warnings caused by lock dependency in init_nilfs()
mm/hugetlb: copy the CMA flag when demoting
mm, swap: fix false warning for large allocation with !THP_SWAP
selftests/mm: fix a build failure on powerpc
selftests/mm: fix build break when compiling pkey_util.c
mm: vmalloc: support more granular vrealloc() sizing
tools/testing/selftests: fix guard region test tmpfs assumption
ocfs2: stop quota recovery before disabling quotas
ocfs2: implement handshaking with ocfs2 recovery thread
ocfs2: switch osb->disable_recovery to enum
mailmap: map Uwe's BayLibre addresses to a single one
MAINTAINERS: add mm THP section
mm/userfaultfd: fix uninitialized output field for -EAGAIN race
selftests/mm: compaction_test: support platform with huge mount of memory
MAINTAINERS: add core mm section
ocfs2: fix panic in failed foilio allocation
mm/huge_memory: fix dereferencing invalid pmd migration entry
MAINTAINERS: add reverse mapping section
x86: disable image size check for test builds
...
Linus Torvalds [Sat, 10 May 2025 16:53:11 +0000 (09:53 -0700)]
Merge tag 'driver-core-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core fix from Greg KH:
"Here is a single driver core fix for a regression for platform devices
that is a regression from a change that went into 6.15-rc1 that
affected Pixel devices. It has been in linux-next for over a week with
no reported problems"
* tag 'driver-core-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
platform: Fix race condition during DMA configure at IOMMU probe time
Linus Torvalds [Sat, 10 May 2025 16:18:05 +0000 (09:18 -0700)]
Merge tag 'usb-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB driver fixes for 6.15-rc6. Included in here
are:
- typec driver fixes
- usbtmc ioctl fixes
- xhci driver fixes
- cdnsp driver fixes
- some gadget driver fixes
Nothing really major, just all little stuff that people have reported
being issues. All of these have been in linux-next this week with no
reported issues"
* tag 'usb-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
xhci: dbc: Avoid event polling busyloop if pending rx transfers are inactive.
usb: xhci: Don't trust the EP Context cycle bit when moving HW dequeue
usb: usbtmc: Fix erroneous generic_read ioctl return
usb: usbtmc: Fix erroneous wait_srq ioctl return
usb: usbtmc: Fix erroneous get_stb ioctl error returns
usb: typec: tcpm: delay SNK_TRY_WAIT_DEBOUNCE to SRC_TRYWAIT transition
USB: usbtmc: use interruptible sleep in usbtmc_read
usb: cdnsp: fix L1 resume issue for RTL_REVISION_NEW_LPM version
usb: typec: ucsi: displayport: Fix NULL pointer access
usb: typec: ucsi: displayport: Fix deadlock
usb: misc: onboard_usb_dev: fix support for Cypress HX3 hubs
usb: uhci-platform: Make the clock really optional
usb: dwc3: gadget: Make gadget_wakeup asynchronous
usb: gadget: Use get_status callback to set remote wakeup capability
usb: gadget: f_ecm: Add get_status callback
usb: host: tegra: Prevent host controller crash when OTG port is used
usb: cdnsp: Fix issue with resuming from L1
usb: gadget: tegra-xudc: ACK ST_RC after clearing CTRL_RUN
Linus Torvalds [Sat, 10 May 2025 16:08:19 +0000 (09:08 -0700)]
Merge tag 'staging-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are three small staging driver fixes for 6.15-rc6. These are:
- bcm2835-camera driver fix
- two axis-fifo driver fixes
All of these have been in linux-next for a few weeks with no reported
issues"
* tag 'staging-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: axis-fifo: Remove hardware resets for user errors
staging: axis-fifo: Correct handling of tx_fifo_depth for size validation
staging: bcm2835-camera: Initialise dev in v4l2_dev
Linus Torvalds [Sat, 10 May 2025 15:52:41 +0000 (08:52 -0700)]
Merge tag 'i2c-for-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
- omap: use correct function to read from device tree
- MAINTAINERS: remove Seth from ISMT maintainership
* tag 'i2c-for-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
MAINTAINERS: Remove entry for Seth Heasley
i2c: omap: fix deprecated of_property_read_bool() use
Linus Torvalds [Sat, 10 May 2025 15:44:36 +0000 (08:44 -0700)]
Merge tag 'for-linus-6.15a-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- A fix for the xenbus driver allowing to use a PVH Dom0 with
Xenstore running in another domain
- A fix for the xenbus driver addressing a rare race condition
resulting in NULL dereferences and other problems
- A fix for the xen-swiotlb driver fixing a problem seen on Arm
platforms
* tag 'for-linus-6.15a-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xenbus: Use kref to track req lifetime
xenbus: Allow PVH dom0 a non-local xenstore
xen: swiotlb: Use swiotlb bouncing if kmalloc allocation demands it
Linus Torvalds [Sat, 10 May 2025 15:36:07 +0000 (08:36 -0700)]
Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull mount fixes from Al Viro:
"A couple of races around legalize_mnt vs umount (both fairly old and
hard to hit) plus two bugs in move_mount(2) - both around 'move
detached subtree in place' logics"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix IS_MNT_PROPAGATING uses
do_move_mount(): don't leak MNTNS_PROPAGATING on failures
do_umount(): add missing barrier before refcount checks in sync case
__legitimize_mnt(): check for MNT_SYNC_UMOUNT should be under mount_lock
Paolo Bonzini [Sat, 10 May 2025 15:11:06 +0000 (11:11 -0400)]
Merge tag 'kvm-x86-fixes-6.15-rcN' of https://github.com/kvm-x86/linux into HEAD
KVM x86 fixes for 6.15-rcN
- Forcibly leave SMM on SHUTDOWN interception on AMD CPUs to avoid causing
problems due to KVM stuffing INIT on SHUTDOWN (KVM needs to sanitize the
VMCB as its state is undefined after SHUTDOWN, emulating INIT is the
least awful choice).
- Track the valid sync/dirty fields in kvm_run as a u64 to ensure KVM
KVM doesn't goof a sanity check in the future.
- Free obsolete roots when (re)loading the MMU to fix a bug where
pre-faulting memory can get stuck due to always encountering a stale
root.
- When dumping GHCB state, use KVM's snapshot instead of the raw GHCB page
to print state, so that KVM doesn't print stale/wrong information.
- When changing memory attributes (e.g. shared <=> private), add potential
hugepage ranges to the mmu_invalidate_range_{start,end} set so that KVM
doesn't create a shared/private hugepage when the the corresponding
attributes will become mixed (the attributes are commited *after* KVM
finishes the invalidation).
- Rework the SRSO mitigation to enable BP_SPEC_REDUCE only when KVM has at
least one active VM. Effectively BP_SPEC_REDUCE when KVM is loaded led
to very measurable performance regressions for non-KVM workloads.
Paolo Bonzini [Sat, 10 May 2025 15:10:02 +0000 (11:10 -0400)]
Merge tag 'kvmarm-fixes-6.15-3' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.15, round #3
- Avoid use of uninitialized memcache pointer in user_mem_abort()
- Always set HCR_EL2.xMO bits when running in VHE, allowing interrupts
to be taken while TGE=0 and fixing an ugly bug on AmpereOne that
occurs when taking an interrupt while clearing the xMO bits
(AC03_CPU_36)
- Prevent VMMs from hiding support for AArch64 at any EL virtualized by
KVM
- Save/restore the host value for HCRX_EL2 instead of restoring an
incorrect fixed value
- Make host_stage2_set_owner_locked() check that the entire requested
range is memory rather than just the first page
Linus Torvalds [Fri, 9 May 2025 23:45:21 +0000 (16:45 -0700)]
Merge tag '6.15-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- Fix dentry leak which can cause umount crash
- Add warning for parse contexts error on compounded operation
* tag '6.15-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: Avoid race in open_cached_dir with lease breaks
smb3 client: warn when parse contexts returns error on compounded operation
Al Viro [Thu, 8 May 2025 19:35:51 +0000 (15:35 -0400)]
fix IS_MNT_PROPAGATING uses
propagate_mnt() does not attach anything to mounts created during
propagate_mnt() itself. What's more, anything on ->mnt_slave_list
of such new mount must also be new, so we don't need to even look
there.
When move_mount() had been introduced, we've got an additional
class of mounts to skip - if we are moving from anon namespace,
we do not want to propagate to mounts we are moving (i.e. all
mounts in that anon namespace).
Unfortunately, the part about "everything on their ->mnt_slave_list
will also be ignorable" is not true - if we have propagation graph
A -> B -> C
and do OPEN_TREE_CLONE open_tree() of B, we get
A -> [B <-> B'] -> C
as propagation graph, where B' is a clone of B in our detached tree.
Making B private will result in
A -> B' -> C
C still gets propagation from A, as it would after making B private
if we hadn't done that open_tree(), but now the propagation goes
through B'. Trying to move_mount() our detached tree on subdirectory
in A should have
* moved B' on that subdirectory in A
* skipped the corresponding subdirectory in B' itself
* copied B' on the corresponding subdirectory in C.
As it is, the logics in propagation_next() and friends ends up
skipping propagation into C, since it doesn't consider anything
downstream of B'.
IOW, walking the propagation graph should only skip the ->mnt_slave_list
of new mounts; the only places where the check for "in that one
anon namespace" are applicable are propagate_one() (where we should
treat that as the same kind of thing as "mountpoint we are looking
at is not visible in the mount we are looking at") and
propagation_would_overmount(). The latter is better dealt with
in the caller (can_move_mount_beneath()); on the first call of
propagation_would_overmount() the test is always false, on the
second it is always true in "move from anon namespace" case and
always false in "move within our namespace" one, so it's easier
to just use check_mnt() before bothering with the second call and
be done with that.
Fixes: 064fe6e233e8 ("mount: handle mount propagation for detached mount trees") Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 29 Apr 2025 01:43:23 +0000 (21:43 -0400)]
do_move_mount(): don't leak MNTNS_PROPAGATING on failures
as it is, a failed move_mount(2) from anon namespace breaks
all further propagation into that namespace, including normal
mounts in non-anon namespaces that would otherwise propagate
there.
Fixes: 064fe6e233e8 ("mount: handle mount propagation for detached mount trees") Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 29 Apr 2025 03:56:14 +0000 (23:56 -0400)]
do_umount(): add missing barrier before refcount checks in sync case
do_umount() analogue of the race fixed in 119e1ef80ecf "fix
__legitimize_mnt()/mntput() race". Here we want to make sure that
if __legitimize_mnt() doesn't notice our lock_mount_hash(), we will
notice their refcount increment. Harder to hit than mntput_no_expire()
one, fortunately, and consequences are milder (sync umount acting
like umount -l on a rare race with RCU pathwalk hitting at just the
wrong time instead of use-after-free galore mntput_no_expire()
counterpart used to be hit). Still a bug...
Fixes: 48a066e72d97 ("RCU'd vfsmounts") Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 27 Apr 2025 19:41:51 +0000 (15:41 -0400)]
__legitimize_mnt(): check for MNT_SYNC_UMOUNT should be under mount_lock
... or we risk stealing final mntput from sync umount - raising mnt_count
after umount(2) has verified that victim is not busy, but before it
has set MNT_SYNC_UMOUNT; in that case __legitimize_mnt() doesn't see
that it's safe to quietly undo mnt_count increment and leaves dropping
the reference to caller, where it'll be a full-blown mntput().
Check under mount_lock is needed; leaving the current one done before
taking that makes no sense - it's nowhere near common enough to bother
with.
Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Fri, 9 May 2025 19:41:34 +0000 (12:41 -0700)]
Merge tag 'drm-fixes-2025-05-10' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Weekly drm fixes, bit bigger than last week, but overall amdgpu/xe
with some ivpu bits and a random few fixes, and dropping the
ttm_backup struct which wrapped struct file and was recently
frowned at.
xe:
- Prevent PF queue overflow
- Hold all forcewake during mocs test
- Remove GSC flush on reset path
- Fix forcewake put on error path
- Fix runtime warning when building without svm
i915:
- Fix oops on resume after disconnecting DP MST sinks during suspend
- Fix SPLC num_waiters refcounting
ivpu:
- Increase timeouts
- Fix deadlock in cmdq ioctl
- Unlock mutices in correct order
v3d:
- Avoid memory leak in job handling"
* tag 'drm-fixes-2025-05-10' of https://gitlab.freedesktop.org/drm/kernel: (32 commits)
drm/i915/dp: Fix determining SST/MST mode during MTP TU state computation
drm/xe: Add config control for svm flush work
drm/xe: Release force wake first then runtime power
drm/xe/gsc: do not flush the GSC worker from the reset path
drm/xe/tests/mocs: Hold XE_FORCEWAKE_ALL for LNCF regs
drm/xe: Add page queue multiplier
drm/amdgpu/hdp7: use memcfg register to post the write for HDP flush
drm/amdgpu/hdp6: use memcfg register to post the write for HDP flush
drm/amdgpu/hdp5.2: use memcfg register to post the write for HDP flush
drm/amdgpu/hdp5: use memcfg register to post the write for HDP flush
drm/amdgpu/hdp4: use memcfg register to post the write for HDP flush
drm/amdgpu: fix pm notifier handling
Revert "drm/amd: Stop evicting resources on APUs in suspend"
drm/amdgpu/vcn: using separate VCN1_AON_SOC offset
drm/amd/display: Fix wrong handling for AUX_DEFER case
drm/amd/display: Copy AUX read reply data whenever length > 0
drm/amd/display: Remove incorrect checking in dmub aux handler
drm/amd/display: Fix the checking condition in dmub aux handling
drm/amd/display: Shift DMUB AUX reply command if necessary
drm/amd/display: Call FP Protect Before Mode Programming/Mode Support
...
Dave Airlie [Fri, 9 May 2025 19:02:38 +0000 (05:02 +1000)]
Merge tag 'drm-xe-fixes-2025-05-09' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- Prevent PF queue overflow
- Hold all forcewake during mocs test
- Remove GSC flush on reset path
- Fix forcewake put on error path
- Fix runtime warning when building without svm
Linus Torvalds [Fri, 9 May 2025 18:30:26 +0000 (11:30 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Catalin Marinas:
"Move the arm64_use_ng_mappings variable from the .bss to the .data
section as it is accessed very early during boot with the MMU off and
before the .bss has been initialised.
This could lead to incorrect idmap page table"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: cpufeature: Move arm64_use_ng_mappings to the .data section to prevent wrong idmap generation
Linus Torvalds [Fri, 9 May 2025 18:17:50 +0000 (11:17 -0700)]
Merge tag 'riscv-for-linus-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- The compressed half-word misaligned access instructions (c.lhu, c.lh,
and c.sh) from the Zcb extension are now properly emulated
- A series of fixes to properly emulate permissions while handling
userspace misaligned accesses
- A pair of fixes for PR_GET_TAGGED_ADDR_CTRL to avoid accessing the
envcfg CSR on systems that don't support that CSR, and to report
those failures up to userspace
- The .rela.dyn section is no longer stripped from vmlinux, as it is
necessary to relocate the kernel under some conditions (including
kexec)
* tag 'riscv-for-linus-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Disallow PR_GET_TAGGED_ADDR_CTRL without Supm
scripts: Do not strip .rela.dyn section
riscv: Fix kernel crash due to PR_SET_TAGGED_ADDR_CTRL
riscv: misaligned: use get_user() instead of __get_user()
riscv: misaligned: enable IRQs while handling misaligned accesses
riscv: misaligned: factorize trap handling
riscv: misaligned: Add handling for ZCB instructions
Linus Torvalds [Fri, 9 May 2025 17:34:50 +0000 (10:34 -0700)]
Merge tag 'block-6.15-20250509' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- Fix for a regression in this series for loop and read/write iterator
handling
- zone append block update tweak
- remove a broken IO priority test
- NVMe pull request via Christoph:
- unblock ctrl state transition for firmware update (Daniel
Wagner)
* tag 'block-6.15-20250509' of git://git.kernel.dk/linux:
block: remove test of incorrect io priority level
nvme: unblock ctrl state transition for firmware update
block: only update request sector if needed
loop: Add sanity check for read/write_iter
Linus Torvalds [Fri, 9 May 2025 16:26:46 +0000 (09:26 -0700)]
Merge tag 'io_uring-6.15-20250509' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
- Fix for linked timeouts arming and firing wrt prep and issue of the
request being managed by the linked timeout
- Fix for a CQE ordering issue between requests with multishot and
using the same buffer group. This is a dumbed down version for this
release and for stable, it'll get improved for v6.16
- Tweak the SQPOLL submit batch size. A previous commit made SQPOLL
manage its own task_work and chose a tiny batch size, bump it from 8
to 32 to fix a performance regression due to that
* tag 'io_uring-6.15-20250509' of git://git.kernel.dk/linux:
io_uring/sqpoll: Increase task_work submission batch size
io_uring: ensure deferred completions are flushed for multishot
io_uring: always arm linked timeouts prior to issue
Linus Torvalds [Fri, 9 May 2025 16:09:49 +0000 (09:09 -0700)]
Merge tag 'modules-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux
Pull modules fix from Petr Pavlu:
"A single fix to prevent use of an uninitialized completion pointer
when releasing a module_kobject in specific situations.
This addresses a latent bug exposed by commit f95bbfe18512 ("drivers:
base: handle module_kobject creation")"
* tag 'modules-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux:
module: ensure that kobject_put() is safe for module type kobjects
Dave Hansen [Thu, 8 May 2025 22:41:32 +0000 (15:41 -0700)]
x86/mm: Eliminate window where TLB flushes may be inadvertently skipped
tl;dr: There is a window in the mm switching code where the new CR3 is
set and the CPU should be getting TLB flushes for the new mm. But
should_flush_tlb() has a bug and suppresses the flush. Fix it by
widening the window where should_flush_tlb() sends an IPI.
Long Version:
=== History ===
There were a few things leading up to this.
First, updating mm_cpumask() was observed to be too expensive, so it was
made lazier. But being lazy caused too many unnecessary IPIs to CPUs
due to the now-lazy mm_cpumask(). So code was added to cull
mm_cpumask() periodically[2]. But that culling was a bit too aggressive
and skipped sending TLB flushes to CPUs that need them. So here we are
again.
=== Problem ===
The too-aggressive code in should_flush_tlb() strikes in this window:
// Turn on IPIs for this CPU/mm combination, but only
// if should_flush_tlb() agrees:
cpumask_set_cpu(cpu, mm_cpumask(next));
next_tlb_gen = atomic64_read(&next->context.tlb_gen);
choose_new_asid(next, next_tlb_gen, &new_asid, &need_flush);
load_new_mm_cr3(need_flush);
// ^ After 'need_flush' is set to false, IPIs *MUST*
// be sent to this CPU and not be ignored.
this_cpu_write(cpu_tlbstate.loaded_mm, next);
// ^ Not until this point does should_flush_tlb()
// become true!
should_flush_tlb() will suppress TLB flushes between load_new_mm_cr3()
and writing to 'loaded_mm', which is a window where they should not be
suppressed. Whoops.
=== Solution ===
Thankfully, the fuzzy "just about to write CR3" window is already marked
with loaded_mm==LOADED_MM_SWITCHING. Simply checking for that state in
should_flush_tlb() is sufficient to ensure that the CPU is targeted with
an IPI.
This will cause more TLB flush IPIs. But the window is relatively small
and I do not expect this to cause any kind of measurable performance
impact.
Update the comment where LOADED_MM_SWITCHING is written since it grew
yet another user.
Peter Z also raised a concern that should_flush_tlb() might not observe
'loaded_mm' and 'is_lazy' in the same order that switch_mm_irqs_off()
writes them. Add a barrier to ensure that they are observed in the
order they are written.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Rik van Riel <riel@surriel.com> Link: https://lore.kernel.org/oe-lkp/202411282207.6bd28eae-lkp@intel.com/ Fixes: 6db2526c1d69 ("x86/mm/tlb: Only trim the mm_cpumask once a second") [2] Reported-by: Stephen Dolan <sdolan@janestreet.com> Cc: stable@vger.kernel.org Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Our QA team reported a 10%-23%, throughput reduction on an io_uring
sqpoll testcase doing IO to a null_blk, that I traced back to a
reduction of the device submission queue depth utilization. It turns out
that, after commit af5d68f8892f ("io_uring/sqpoll: manage task_work
privately"), we capped the number of task_work entries that can be
completed from a single spin of sqpoll to only 8 entries, before the
sqpoll goes around to (potentially) sleep. While this cap doesn't drive
the submission side directly, it impacts the completion behavior, which
affects the number of IO queued by fio per sqpoll cycle on the
submission side, and io_uring ends up seeing less ios per sqpoll cycle.
As a result, block layer plugging is less effective, and we see more
time spent inside the block layer in profilings charts, and increased
submission latency measured by fio.
There are other places that have increased overhead once sqpoll sleeps
more often, such as the sqpoll utilization calculation. But, in this
microbenchmark, those were not representative enough in perf charts, and
their removal didn't yield measurable changes in throughput. The major
overhead comes from the fact we plug less, and less often, when submitting
to the block layer.
In one machine, tested on top of Linux 6.15-rc1, we have the following
baseline:
READ: bw=4994MiB/s (5236MB/s), 4994MiB/s-4994MiB/s (5236MB/s-5236MB/s), io=439GiB (471GB), run=90001-90001msec
With this patch:
READ: bw=5762MiB/s (6042MB/s), 5762MiB/s-5762MiB/s (6042MB/s-6042MB/s), io=506GiB (544GB), run=90001-90001msec
which is a 15% improvement in measured bandwidth. The average
submission latency is noticeably lowered too. As measured by
fio:
Baseline:
lat (usec): min=20, max=241, avg=99.81, stdev=3.38
Patched:
lat (usec): min=26, max=226, avg=86.48, stdev=4.82
If we look at blktrace, we can also see the plugging behavior is
improved. In the baseline, we end up limited to plugging 8 requests in
the block layer regardless of the device queue depth size, while after
patching we can drive more io, and we manage to utilize the full device
queue.
In the baseline, after a stabilization phase, an ordinary submission
looks like:
254,0 1 49942 0.016028795 5977 U N [iou-sqp-5976] 7
After patching, I see consistently more requests per unplug.
254,0 1 4996 0.001432872 3145 U N [iou-sqp-3144] 32
Ideally, the cap size would at least be the deep enough to fill the
device queue, but we can't predict that behavior, or assume all IO goes
to a single device, and thus can't guess the ideal batch size. We also
don't want to let the tw run unbounded, though I'm not sure it would
really be a problem. Instead, let's just give it a more sensible value
that will allow for more efficient batching. I've tested with different
cap values, and initially proposed to increase the cap to 1024. Jens
argued it is too big of a bump and I observed that, with 32, I'm no
longer able to observe this bottleneck in any of my machines.
Imre Deak [Wed, 7 May 2025 15:19:53 +0000 (18:19 +0300)]
drm/i915/dp: Fix determining SST/MST mode during MTP TU state computation
Determining the SST/MST mode during state computation must be done based
on the output type stored in the CRTC state, which in turn is set once
based on the modeset connector's SST vs. MST type and will not change as
long as the connector is using the CRTC. OTOH the MST mode indicated by
the given connector's intel_dp::is_mst flag can change independently of
the above output type, based on what sink is at any moment plugged to
the connector.
Fix the state computation accordingly.
Cc: Jani Nikula <jani.nikula@intel.com> Fixes: f6971d7427c2 ("drm/i915/mst: adapt intel_dp_mtp_tu_compute_config() for 128b/132b SST") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4607 Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://lore.kernel.org/r/20250507151953.251846-1-imre.deak@intel.com
(cherry picked from commit 0f45696ddb2b901fbf15cb8d2e89767be481d59f) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Tom Lendacky [Thu, 8 May 2025 17:24:10 +0000 (12:24 -0500)]
memblock: Accept allocated memory before use in memblock_double_array()
When increasing the array size in memblock_double_array() and the slab
is not yet available, a call to memblock_find_in_range() is used to
reserve/allocate memory. However, the range returned may not have been
accepted, which can result in a crash when booting an SNP guest:
Mitigate this by calling accept_memory() on the memory range returned
before the slab is available.
Prior to v6.12, the accept_memory() interface used a 'start' and 'end'
parameter instead of 'start' and 'size', therefore the accept_memory()
call must be adjusted to specify 'start + size' for 'end' when applying
to kernels prior to v6.12.
Linus Torvalds [Thu, 8 May 2025 21:28:49 +0000 (14:28 -0700)]
Merge tag 'bcachefs-2025-05-08' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:
- Some fixes to help with filesystem analysis: ensure superblock
error count gets written if we go ERO, don't discard the journal
aggressively (so it's available for list_journal -a)
- Fix lost wakeup on arm causing us to get stuck when reading btree
nodes
- Fix fsck failing to exit on ctrl-c
- An additional fix for filesystems with misaligned bucket sizes: we
now ensure that allocations are properly aligned
- Setting background target but not promote target will now leave that
data cached on the foreground target, as it used to
- Revert a change to when we allocate the VFS superblock, this was done
for implementing blk_holder_ops but ended up not being needed, and
allocating a superblock and not setting SB_BORN while we do recovery
caused sync() calls and other things to hang
- Assorted fixes for harmless error messages that caused concern to
users
* tag 'bcachefs-2025-05-08' of git://evilpiepirate.org/bcachefs:
bcachefs: Don't aggressively discard the journal
bcachefs: Ensure superblock gets written when we go ERO
bcachefs: Filter out harmless EROFS error messages
bcachefs: journal_shutdown is EROFS, not EIO
bcachefs: Call bch2_fs_start before getting vfs superblock
bcachefs: fix hung task timeout in journal read
bcachefs: Add missing barriers before wake_up_bit()
bcachefs: Ensure proper write alignment
bcachefs: Improve want_cached_ptr()
bcachefs: thread_with_stdio: fix spinning instead of exiting
v2 (Matt):
refine commit message to have more details
add Fixes tag
move the code to xe_svm.h which already have the config
remove a blank line per codestyle suggestion
Fixes: 63f6e480d115 ("drm/xe: Add SVM garbage collector") Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250502170052.1787973-1-shuicheng.lin@intel.com
(cherry picked from commit 9d80698bcd97a5ad1088bcbb055e73fd068895e2) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Shuicheng Lin [Wed, 7 May 2025 02:23:02 +0000 (02:23 +0000)]
drm/xe: Release force wake first then runtime power
xe_force_wake_get() is dependent on xe_pm_runtime_get(), so for
the release path, xe_force_wake_put() should be called first then
xe_pm_runtime_put().
Combine the error path and normal path together with goto.
Fixes: 85d547608ef5 ("drm/xe/xe_gt_debugfs: Update handling of xe_force_wake_get return") Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://lore.kernel.org/r/20250507022302.2187527-1-shuicheng.lin@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 432cd94efdca06296cc5e76d673546f58aa90ee1) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
drm/xe/gsc: do not flush the GSC worker from the reset path
The workqueue used for the reset worker is marked as WQ_MEM_RECLAIM,
while the GSC one isn't (and can't be as we need to do memory
allocations in the gsc worker). Therefore, we can't flush the latter
from the former.
The reason why we had such a flush was to avoid interrupting either
the GSC FW load or in progress GSC proxy operations. GSC proxy
operations fall into 2 categories:
1) GSC proxy init: this only happens once immediately after GSC FW load
and does not support being interrupted. The only way to recover from
an interruption of the proxy init is to do an FLR and re-load the GSC.
2) GSC proxy request: this can happen in response to a request that
the driver sends to the GSC. If this is interrupted, the GSC FW will
timeout and the driver request will be failed, but overall the GSC
will keep working fine.
Flushing the work allowed us to avoid interruption in both cases (unless
the hang came from the GSC engine itself, in which case we're toast
anyway). However, a failure on a proxy request is tolerable if we're in
a scenario where we're triggering a GT reset (i.e., something is already
gone pretty wrong), so what we really need to avoid is interrupting
the init flow, which we can do by polling on the register that reports
when the proxy init is complete (as that ensure us that all the load and
init operations have been completed).
Note that during suspend we still want to do a flush of the worker to
make sure it completes any operations involving the HW before the power
is cut.
v2: fix spelling in commit msg, rename waiter function (Julia)
Fixes: dd0e89e5edc2 ("drm/xe/gsc: GSC FW load") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4830 Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com> Link: https://lore.kernel.org/r/20250502155104.2201469-1-daniele.ceraolospurio@intel.com
(cherry picked from commit 12370bfcc4f0bdf70279ec5b570eb298963422b5) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Matthew Brost [Tue, 8 Apr 2025 15:59:15 +0000 (08:59 -0700)]
drm/xe: Add page queue multiplier
For an unknown reason the math to determine the PF queue size does is
not correct - compute UMD applications are overflowing the PF queue
which is fatal. A multippier of 8 fixes the problem.
Linus Torvalds [Thu, 8 May 2025 19:09:22 +0000 (12:09 -0700)]
Merge tag 'vfio-v6.15-rc6' of https://github.com/awilliam/linux-vfio
Pull vfio fix from Alex Williamson:
- Fix an issue in vfio-pci huge_fault handling by aligning faults to
the order, resulting in deterministic use of huge pages. This
avoids a race where simultaneous aligned and unaligned faults to
the same PMD can result in a VM_FAULT_OOM and subsequent VM crash.
(Alex Williamson)
* tag 'vfio-v6.15-rc6' of https://github.com/awilliam/linux-vfio:
vfio/pci: Align huge faults to order
Palmer Dabbelt [Thu, 8 May 2025 16:40:21 +0000 (09:40 -0700)]
Merge tag 'riscv-fixes-6.15-rc6' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/alexghiti/linux into fixes
riscv fixes for 6.15-rc6
- A fix to handle compressed halfword load/store instructions misaligned accesses
- A fix to allow user memory access while handling a misaligned access
- 2 fixes to return an error if the pointer masking extension is not implemented on the platform but userspace still tries to access it, which caused oops on some early platforms
- A fix to prevent the stripping of .rela.dyn so that a vmlinux loaded by kexec can successfully boot
* tag 'riscv-fixes-6.15-rc6' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/alexghiti/linux:
riscv: Disallow PR_GET_TAGGED_ADDR_CTRL without Supm
scripts: Do not strip .rela.dyn section
riscv: Fix kernel crash due to PR_SET_TAGGED_ADDR_CTRL
riscv: misaligned: use get_user() instead of __get_user()
riscv: misaligned: enable IRQs while handling misaligned accesses
riscv: misaligned: factorize trap handling
riscv: misaligned: Add handling for ZCB instructions
- gre: fix again IPv6 link-local address generation.
- eth: b53: fix learning on VLAN unaware bridges
Previous releases - always broken:
- wifi: fix out-of-bounds access during multi-link element
defragmentation
- can:
- initialize spin lock on device probe
- fix order of unregistration calls
- openvswitch: fix unsafe attribute parsing in output_userspace()
- eth:
- virtio-net: fix total qstat values
- mtk_eth_soc: reset all TX queues on DMA free
- fbnic: firmware IPC mailbox fixes"
* tag 'net-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (55 commits)
virtio-net: fix total qstat values
net: export a helper for adding up queue stats
fbnic: Do not allow mailbox to toggle to ready outside fbnic_mbx_poll_tx_ready
fbnic: Pull fbnic_fw_xmit_cap_msg use out of interrupt context
fbnic: Improve responsiveness of fbnic_mbx_poll_tx_ready
fbnic: Cleanup handling of completions
fbnic: Actually flush_tx instead of stalling out
fbnic: Add additional handling of IRQs
fbnic: Gate AXI read/write enabling on FW mailbox
fbnic: Fix initialization of mailbox descriptor rings
net: dsa: b53: do not set learning and unicast/multicast on up
net: dsa: b53: fix learning on VLAN unaware bridges
net: dsa: b53: fix toggling vlan_filtering
net: dsa: b53: do not program vlans when vlan filtering is off
net: dsa: b53: do not allow to configure VLAN 0
net: dsa: b53: always rejoin default untagged VLAN on bridge leave
net: dsa: b53: fix VLAN ID for untagged vlan on bridge leave
net: dsa: b53: fix flushing old pvid VLAN on pvid change
net: dsa: b53: fix clearing PVID of a port
net: dsa: b53: keep CPU port always tagged again
...
Linus Torvalds [Thu, 8 May 2025 15:29:13 +0000 (08:29 -0700)]
Merge tag 's390-6.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Heiko Carstens:
- Fix potential use-after-free bug and missing error handling in PCI
code
- Fix dcssblk build error
- Fix last breaking event handling in case of stack corruption to allow
for better error reporting
- Update defconfigs
* tag 's390-6.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/pci: Fix duplicate pci_dev_put() in disable_slot() when PF has child VFs
s390/pci: Fix missing check for zpci_create_device() error return
s390: Update defconfigs
s390/dcssblk: Fix build error with CONFIG_DAX=m and CONFIG_DCSSBLK=y
s390/entry: Fix last breaking event handling in case of stack corruption
s390/configs: Enable options required for TC flow offload
s390/configs: Enable VDPA on Nvidia ConnectX-6 network card
Linus Torvalds [Thu, 8 May 2025 15:22:35 +0000 (08:22 -0700)]
Merge tag 'v6.15-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- Fix UAF closing file table (e.g. in tree disconnect)
- Fix potential out of bounds write
- Fix potential memory leak parsing lease state in open
- Fix oops in rename with empty target
* tag 'v6.15-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: Fix UAF in __close_file_table_ids
ksmbd: prevent out-of-bounds stream writes by validating *pos
ksmbd: fix memory leak in parse_lease_state()
ksmbd: prevent rename with empty string
Aaron Lu [Thu, 8 May 2025 08:30:36 +0000 (16:30 +0800)]
block: remove test of incorrect io priority level
Ever since commit eca2040972b4("scsi: block: ioprio: Clean up interface
definition"), the macro IOPRIO_PRIO_LEVEL() will mask the level value to
something between 0 and 7 so necessarily, level will always be lower than
IOPRIO_NR_LEVELS(8).
Remove this obsolete check.
Reported-by: Kexin Wei <ys.weikexin@h3c.com> Cc: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Aaron Lu <ziqianlu@bytedance.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20250508083018.GA769554@bytedance Signed-off-by: Jens Axboe <axboe@kernel.dk>
KVM: SVM: Set/clear SRSO's BP_SPEC_REDUCE on 0 <=> 1 VM count transitions
Set the magic BP_SPEC_REDUCE bit to mitigate SRSO when running VMs if and
only if KVM has at least one active VM. Leaving the bit set at all times
unfortunately degrades performance by a wee bit more than expected.
Use a dedicated spinlock and counter instead of hooking virtualization
enablement, as changing the behavior of kvm.enable_virt_at_load based on
SRSO_BP_SPEC_REDUCE is painful, and has its own drawbacks, e.g. could
result in performance issues for flows that are sensitive to VM creation
latency.
Defer setting BP_SPEC_REDUCE until VMRUN is imminent to avoid impacting
performance on CPUs that aren't running VMs, e.g. if a setup is using
housekeeping CPUs. Setting BP_SPEC_REDUCE in task context, i.e. without
blasting IPIs to all CPUs, also helps avoid serializing 1<=>N transitions
without incurring a gross amount of complexity (see the Link for details
on how ugly coordinating via IPIs gets).
Samuel Holland [Wed, 7 May 2025 14:52:18 +0000 (07:52 -0700)]
riscv: Disallow PR_GET_TAGGED_ADDR_CTRL without Supm
When the prctl() interface for pointer masking was added, it did not
check that the pointer masking ISA extension was supported, only the
individual submodes. Userspace could still attempt to disable pointer
masking and query the pointer masking state. commit 81de1afb2dd1
("riscv: Fix kernel crash due to PR_SET_TAGGED_ADDR_CTRL") disallowed
the former, as the senvcfg write could crash on older systems.
PR_GET_TAGGED_ADDR_CTRL state does not crash, because it reads only
kernel-internal state and not senvcfg, but it should still be disallowed
for consistency.
Fixes: 09d6775f503b ("riscv: Add support for userspace pointer masking") Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20250507145230.2272871-1-samuel.holland@sifive.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
The .rela.dyn section contains runtime relocations and is only emitted
for a relocatable kernel.
riscv uses this section to relocate the kernel at runtime but that section
is stripped from vmlinux. That prevents kexec to successfully load vmlinux
since it does not contain the relocations info needed.
Fixes: 559d1e45a16d ("riscv: Use --emit-relocs in order to move .rela.dyn in init") Tested-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250408072851.90275-1-alexghiti@rivosinc.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Fixes: 09d6775f503b ("riscv: Add support for userspace pointer masking") Signed-off-by: Nam Cao <namcao@linutronix.de> Cc: stable@vger.kernel.org Reviewed-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20250504101920.3393053-1-namcao@linutronix.de Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
riscv: misaligned: use get_user() instead of __get_user()
Now that we can safely handle user memory accesses while in the
misaligned access handlers, use get_user() instead of __get_user() to
have user memory access checks.
Jakub Kicinski [Wed, 7 May 2025 00:32:21 +0000 (17:32 -0700)]
virtio-net: fix total qstat values
NIPA tests report that the interface statistics reported
via qstat are lower than those reported via ip link.
Looks like this is because some tests flip the queue
count up and down, and we end up with some of the traffic
accounted on disabled queues.
Jakub Kicinski [Wed, 7 May 2025 00:32:20 +0000 (17:32 -0700)]
net: export a helper for adding up queue stats
Older drivers and drivers with lower queue counts often have a static
array of queues, rather than allocating structs for each queue on demand.
Add a helper for adding up qstats from a queue range. Expectation is
that driver will pass a queue range [netdev->real_num_*x_queues, MAX).
It was tempting to always use num_*x_queues as the end, but virtio
seems to clamp its queue count after allocating the netdev. And this
way we can trivaly reuse the helper for [0, real_..).
Paolo Abeni [Thu, 8 May 2025 09:33:32 +0000 (11:33 +0200)]
Merge branch 'fbnic-fw-ipc-mailbox-fixes'
Alexander Duyck says:
====================
fbnic: FW IPC Mailbox fixes
This series is meant to address a number of issues that have been found in
the FW IPC mailbox over the past several months.
The main issues addressed are:
1. Resolve a potential race between host and FW during initialization that
can cause the FW to only have the lower 32b of an address.
2. Block the FW from issuing DMA requests after we have closed the mailbox
and before we have started issuing requests on it.
3. Fix races in the IRQ handlers that can cause the IRQ to unmask itself if
it is being processed while we are trying to disable it.
4. Cleanup the Tx flush logic so that we actually lock down the Tx path
before we start flushing it instead of letting it free run while we are
shutting it down.
5. Fix several memory leaks that could occur if we failed initialization.
6. Cleanup the mailbox completion if we are flushing Tx since we are no
longer processing Rx.
7. Move several allocations out of a potential IRQ/atomic context.
There have been a few optimizations we also picked up since then. Rather
than split them out I just folded them into these diffs. They mostly
address minor issues such as how long it takes to initialize and/or fail so
I thought they could probably go in with the rest of the patches. They
consist of:
1. Do not sleep more than 20ms waiting on FW to respond as the 200ms value
likely originated from simulation/emulation testing.
2. Use jiffies to determine timeout instead of sleep * attempts for better
accuracy.
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
====================
Alexander Duyck [Tue, 6 May 2025 16:00:25 +0000 (09:00 -0700)]
fbnic: Do not allow mailbox to toggle to ready outside fbnic_mbx_poll_tx_ready
We had originally thought to have the mailbox go to ready in the background
while we were doing other things. One issue with this though is that we
can't disable it by clearing the ready state without also blocking
interrupts or calls to mbx_poll as it will just pop back to life during an
interrupt.
In order to prevent that from happening we can pull the code for toggling
to ready out of the interrupt path and instead place it in the
fbnic_mbx_poll_tx_ready path so that it becomes the only spot where the
Rx/Tx can toggle to the ready state. By doing this we can prevent races
where we disable the DMA and/or free buffers only to have an interrupt fire
and undo what we have done.
Alexander Duyck [Tue, 6 May 2025 16:00:18 +0000 (09:00 -0700)]
fbnic: Pull fbnic_fw_xmit_cap_msg use out of interrupt context
This change pulls the call to fbnic_fw_xmit_cap_msg out of
fbnic_mbx_init_desc_ring and instead places it in the polling function for
getting the Tx ready. Doing that we can avoid the potential issue with an
interrupt coming in later from the firmware that causes it to get fired in
interrupt context.