Linus Torvalds [Thu, 7 Aug 2025 04:40:01 +0000 (07:40 +0300)]
Merge tag 'input-for-v6.17-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
- updates to several drivers consuming GPIO APIs to use setters
returning error codes
- an infrastructure that allows defining "overlays" for touchscreens,
carving out regions implementing buttons and other elements from a
bigger sensor, and a corresponding update to the st1232 driver
- an update to AT/PS2 keyboard driver to map F13-F24 by default
- Samsung keypad driver got a facelift
- evdev input handler will now bind to all devices using EV_SYN event
instead of abusing id->driver_info
- two new sub-drivers implementing the F1A (capacitive buttons) and F21
(forcepad button) functions in the Synaptics RMI driver
- support for polling mode in Goodix touchscreen driver
- support for FocalTech FT8716 in the edt-ft5x06 driver
- support for MT6359 in mtk-pmic-keys driver
- removal of the pcf50633-input driver since the platform it was used
on is gone
- new definitions for game controller "grip" buttons (BTN_GRIP*) and
corresponding changes to xpad and hid-steam controller drivers
- a new definition for "performance" key
* tag 'input-for-v6.17-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (38 commits)
HID: hid-steam: Use new BTN_GRIP* buttons
Input: add keycode for performance mode key
Input: max77693 - convert to atomic pwm operation
Input: st1232 - add touch-overlay handling
dt-bindings: input: touchscreen: st1232: add touch-overlay example
Input: touch-overlay - add touchscreen overlay handling
dt-bindings: touchscreen: add touch-overlay property
Input: atkbd - correctly map F13 - F24
Input: xpad - use new BTN_GRIP* buttons
Input: Add and document BTN_GRIP*
Input: xpad - change buttons the D-Pad gets mapped as to BTN_DPAD_*
Documentation: Fix capitalization of XBox -> Xbox
Input: synaptics-rmi4 - add support for F1A
dt-bindings: input: syna,rmi4: Document F1A function
Input: synaptics-rmi4 - add support for Forcepads (F21)
Input: mtk-pmic-keys - add support for MT6359 PMIC keys
Input: remove special handling of id->driver_info when matching
Input: evdev - switch matching to EV_SYN
Input: samsung-keypad - use BIT() and GENMASK() where appropriate
Input: samsung-keypad - use per-chip parameters
...
Linus Torvalds [Thu, 7 Aug 2025 04:38:25 +0000 (07:38 +0300)]
Merge tag 'for-linus-6.17-1' of https://github.com/cminyard/linux-ipmi
Pull ipmi updates from Corey Minyard:
"Some small fixes for the IPMI driver
Nothing huge, some rate limiting on logs, a strncpy fix where the
source and destination could be the same, and removal of some unused
cruft"
* tag 'for-linus-6.17-1' of https://github.com/cminyard/linux-ipmi:
ipmi: Use dev_warn_ratelimited() for incorrect message warnings
char: ipmi: remove redundant variable 'type' and check
ipmi: Fix strcpy source and destination the same
Linus Torvalds [Thu, 7 Aug 2025 04:36:23 +0000 (07:36 +0300)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fix from Jason Gunthorpe:
"Single fix to correct the iov_iter construction in soft iwarp. This
avoids blktest crashes with recent changes to the allocators"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/siw: Fix the sendmsg byte count in siw_tcp_sendpages
Linus Torvalds [Thu, 7 Aug 2025 04:32:50 +0000 (07:32 +0300)]
Merge tag 'vfio-v6.17-rc1-v2' of https://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:
- Fix imbalance where the no-iommu/cdev device path skips too much on
open, failing to increment a reference, but still decrements the
reference on close. Add bounds checking to prevent such underflows
(Jacob Pan)
- Fill missing detach_ioas op for pds_vfio_pci, fixing probe failure
when used with IOMMUFD (Brett Creeley)
- Split SR-IOV VFs to separate dev_set, avoiding unnecessary
serialization between VFs that appear on the same bus (Alex
Williamson)
- Fix a theoretical integer overflow in the mlx5-vfio-pci variant
driver (Artem Sadovnikov)
- Implement missing VF token checking support via vfio cdev/IOMMUFD
interface (Jason Gunthorpe)
- Add a cond_resched() call to avoid holding the CPU too long during
DMA mapping operations (Keith Busch)
* tag 'vfio-v6.17-rc1-v2' of https://github.com/awilliam/linux-vfio:
vfio/type1: conditional rescheduling while pinning
vfio/qat: add support for intel QAT 6xxx virtual functions
vfio/qat: Remove myself from VFIO QAT PCI driver maintainers
vfio/pci: Do vf_token checks for VFIO_DEVICE_BIND_IOMMUFD
vfio/mlx5: fix possible overflow in tracking max message size
vfio/pci: Separate SR-IOV VF dev_set
vfio/pds: Fix missing detach_ioas op
vfio: Prevent open_count decrement to negative
vfio: Fix unbalanced vfio_df_close call in no-iommu mode
Linus Torvalds [Wed, 6 Aug 2025 12:52:56 +0000 (15:52 +0300)]
Merge tag 'for-6.17-fix-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"A single btrfs commit. It fixes a problem that people started to hit
since 6.15.3 during log replay (e.g. after a crash).
The bug is old but got more likely to happen since commit 5e85262e542d ("btrfs: fix fsync of files with no hard links not
persisting deletion") got backported to stable (6.15 only)"
* tag 'for-6.17-fix-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix log tree replay failure due to file with 0 links and extents
Linus Torvalds [Wed, 6 Aug 2025 12:44:25 +0000 (15:44 +0300)]
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
"This is mostly fixes and cleanups and code reworks that trickled in
across the merge window and the weeks leading up. The only substantive
update is the Mediatek ufs driver which accounts for the bulk of the
additions"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (37 commits)
scsi: libsas: Use a bool for sas_deform_port() second argument
scsi: libsas: Move declarations of internal functions to sas_internal.h
scsi: libsas: Make sas_get_ata_info() static
scsi: libsas: Simplify sas_ata_wait_eh()
scsi: libsas: Refactor dev_is_sata()
scsi: sd: Make sd shutdown issue START STOP UNIT appropriately
scsi: arm64: dts: mediatek: mt8195: Add UFSHCI node
scsi: dt-bindings: mediatek,ufs: add MT8195 compatible and update clock nodes
scsi: dt-bindings: mediatek,ufs: Add ufs-disable-mcq flag for UFS host
scsi: ufs: ufs-mediatek: Add UFS host support for MT8195 SoC
scsi: ufs: ufs-pci: Remove control of UIC Completion interrupt for Intel MTL
scsi: ufs: core: Do not write interrupt enable register unnecessarily
scsi: ufs: core: Set and clear UIC Completion interrupt as needed
scsi: ufs: core: Remove duplicated code in ufshcd_send_bsg_uic_cmd()
scsi: ufs: core: Move ufshcd_enable_intr() and ufshcd_disable_intr()
scsi: ufs: ufs-pci: Remove UFS PCI driver's ->late_init() call back
scsi: ufs: ufs-pci: Fix default runtime and system PM levels
scsi: ufs: ufs-pci: Fix hibernate state transition for Intel MTL-like host controllers
scsi: ufs: host: mediatek: Support FDE (AES) clock scaling
scsi: ufs: host: mediatek: Support clock scaling with Vcore binding
...
Filipe Manana [Wed, 30 Jul 2025 18:18:37 +0000 (19:18 +0100)]
btrfs: fix log tree replay failure due to file with 0 links and extents
If we log a new inode (not persisted in a past transaction) that has 0
links and extents, then log another inode with a higher inode number, we
end up failing to replay the log tree with -EINVAL. The steps for
this are:
1) create new file A
2) write some data to file A
3) open an fd on file A
4) unlink file A
5) fsync file A using the previously open fd
6) create file B (has higher inode number than file A)
7) fsync file B
8) power fail before current transaction commits
Now when attempting to mount the fs, the log replay will fail with
-ENOENT at replay_one_extent() when attempting to replay the first
extent of file A. The failure comes when trying to open the inode for
file A in the subvolume tree, since it doesn't exist.
Before commit 5f61b961599a ("btrfs: fix inode lookup error handling
during log replay"), the returned error was -EIO instead of -ENOENT,
since we converted any errors when attempting to read an inode during
log replay to -EIO.
The reason for this is that the log replay procedure fails to ignore
the current inode when we are at the stage LOG_WALK_REPLAY_ALL, our
current inode has 0 links, and the last inode we processed in the previous
stage has a non-zero link count. In other words, the issue is that at
replay_one_extent() we only update wc->ignore_cur_inode if the current
replay stage is LOG_WALK_REPLAY_INODES.
Fix this by updating wc->ignore_cur_inode whenever we find an inode item
regardless of the current replay stage. This is a simple solution and easy
to backport, but later we can do other alternatives like avoid logging
extents or inode items other than the inode item for inodes with a link
count of 0.
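As a rough sketch of that approach (simplified; the surrounding walk variables are assumptions and this is not the literal btrfs diff, only the field and accessor names mentioned here):

    /* Update the flag for every inode item encountered, regardless of
     * whether wc->stage is LOG_WALK_REPLAY_INODES or LOG_WALK_REPLAY_ALL. */
    if (key.type == BTRFS_INODE_ITEM_KEY) {
            struct btrfs_inode_item *item;

            item = btrfs_item_ptr(eb, i, struct btrfs_inode_item);
            wc->ignore_cur_inode = btrfs_inode_nlink(eb, item) == 0;
    }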
The problem with the wc->ignore_cur_inode logic has been around since
commit f2d72f42d5fa ("Btrfs: fix warning when replaying log after fsync
of a tmpfile") but it only became frequent to hit since the more recent
commit 5e85262e542d ("btrfs: fix fsync of files with no hard links not
persisting deletion"), because we stopped skipping inodes with a link
count of 0 when logging, while before the problem would only be triggered
if trying to replay a log tree created with an older kernel which has a
logged inode with 0 links.
Linus Torvalds [Wed, 6 Aug 2025 05:52:59 +0000 (08:52 +0300)]
Merge tag 'ata-6.17-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fixes from Damien Le Moal:
- Cleanup whitespace in messages in libata-core and the pata_pdc2027x,
pata_macio drivers (Colin)
- Fix ata_to_sense_error() to avoid seeing nonsensical sense data for
rare cases where we fail to get sense data from the drive. The
complementary fix to this is to ensure that we always return the
generic "ABORTED COMMAND" sense data for a failed command for which
we have no status or error fields
- The recent changes to link power management (LPM), which now prevent
the user from attempting to set an LPM policy through the
link_power_management_policy attribute, caused some regressions in test
environments because of the error that is now returned when writing
to that attribute when LPM is not supported. To allow users to not
trip on this, introduce the new link_power_management_supported
attribute to allow simple testing of a port/device LPM support (me)
* tag 'ata-6.17-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: pata_pdc2027x: Remove space before newline and abbreviations
ata: pata_macio: Remove space before newline
ata: libata-core: Remove space before newline
ata: libata-sata: Add link_power_management_supported sysfs attribute
ata: libata-scsi: Return aborted command when missing sense and result TF
ata: libata-scsi: Fix ata_to_sense_error() status handling
Linus Torvalds [Wed, 6 Aug 2025 04:32:52 +0000 (07:32 +0300)]
Merge tag 'kbuild-v6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
"This is the last pull request from me.
I'm grateful to have been able to continue as a maintainer for eight
years. From the next cycle, Nathan and Nicolas will maintain Kbuild.
- Fix a shortcut key issue in menuconfig
- Fix missing rebuild of kheaders
- Sort the symbol dump generated by gendwarfksyms
- Support zboot extraction in scripts/extract-vmlinux
- Migrate gconfig to GTK 3
- Add TAR variable to allow overriding the default tar command
- Hand over Kbuild maintainership"
* tag 'kbuild-v6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (92 commits)
MAINTAINERS: hand over Kbuild maintenance
kheaders: make it possible to override TAR
kbuild: userprogs: use correct linker when mixing clang and GNU ld
kconfig: lxdialog: replace strcpy() with strncpy() in inputbox.c
kconfig: lxdialog: replace strcpy with snprintf in print_autowrap
kconfig: gconf: refactor text_insert_help()
kconfig: gconf: remove unneeded variable in text_insert_msg
kconfig: gconf: use hyphens in signals
kconfig: gconf: replace GtkImageMenuItem with GtkMenuItem
kconfig: gconf: Fix Back button behavior
kconfig: gconf: fix single view to display dependent symbols correctly
scripts: add zboot support to extract-vmlinux
gendwarfksyms: order -T symtypes output by name
gendwarfksyms: use preferred form of sizeof for allocation
kconfig: qconf: confine {begin,end}Group to constructor and destructor
kconfig: qconf: fix ConfigList::updateListAllforAll()
kconfig: add a function to dump all menu entries in a tree-like format
kconfig: gconf: show GTK version in About dialog
kconfig: gconf: replace GtkHPaned and GtkVPaned with GtkPaned
kconfig: gconf: replace GdkColor with GdkRGBA
...
Sasha Levin [Tue, 5 Aug 2025 12:58:20 +0000 (08:58 -0400)]
media: venus: Fix OPP table error handling
The venus driver fails to check if dev_pm_opp_find_freq_{ceil,floor}()
returns an error pointer before calling dev_pm_opp_put(). This causes
a crash when OPP tables are not present in the device tree.
Unable to handle kernel access to user memory outside uaccess routines
at virtual address 000000000000002e
...
pc : dev_pm_opp_put+0x1c/0x4c
lr : core_clks_enable+0x4c/0x16c [venus_core]
Add IS_ERR() checks before calling dev_pm_opp_put() to avoid
dereferencing error pointers.
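The pattern described above, as a minimal sketch (the surrounding variables are illustrative, not the exact venus code):

    unsigned long freq = rate;
    struct dev_pm_opp *opp;

    opp = dev_pm_opp_find_freq_ceil(dev, &freq);
    if (!IS_ERR(opp))
            dev_pm_opp_put(opp);    /* never put an error pointer */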
Fixes: b179234b5e59 ("media: venus: pm_helpers: use opp-table for the frequency")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 6 Aug 2025 01:41:21 +0000 (04:41 +0300)]
Merge tag 'perf-fixes-27504' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
Pull perf fixes from Thomas Gleixner:
"Perf fixes for perf_mmap() reference counting to prevent potential
reference count leaks which are caused by:
- VMA splits, which change the offset or size of a mapping, which
causes perf_mmap_close() to ignore the unmap or unmap the wrong
buffer.
- Several internal issues of perf_mmap(), which can cause reference
count leaks in the perf mmap, corrupt accounting or cause leaks in
perf drivers.
The main fix is to prevent VMA splits by implementing the
[may_]split() callback for vm operations.
The other issues are addressed by rearranging code, early returns on
failure and invocation of cleanups.
Also provide a selftest to validate the fixes.
The reference counting should be converted to refcount_t, but that
requires larger refactoring of the code and will be done once these
fixes are upstream"
* tag 'perf-fixes-27504' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git:
selftests/perf_events: Add a mmap() correctness test
perf/core: Prevent VMA split of buffer mappings
perf/core: Handle buffer mapping fail correctly in perf_mmap()
perf/core: Exit early on perf_mmap() fail
perf/core: Don't leak AUX buffer refcount on allocation failure
perf/core: Preserve AUX buffer allocation failure result
Ammar Faizi [Wed, 6 Aug 2025 00:31:05 +0000 (07:31 +0700)]
net: usbnet: Fix the wrong netif_carrier_on() call
The commit referenced in the Fixes tag causes usbnet to malfunction
(identified via git bisect). Post-commit, my external RJ45 LAN cable
fails to connect. Linus also reported the same issue after pulling that
commit.
The code has a logic error: netif_carrier_on() is only called when the
link is already on. Fix this by moving the netif_carrier_on() call
outside the if-statement entirely. This ensures it is always called
when EVENT_LINK_CARRIER_ON is set and properly clears it regardless
of the link state.
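A rough before/after sketch of the described logic error (the flag and helpers come from usbnet, but the surrounding structure is simplified and illustrative, not the literal diff):

    /* Before (buggy): the event is only consumed when the carrier is
     * already up, so a deferred link-up never asserts the carrier. */
    if (netif_carrier_ok(dev->net)) {
            if (test_and_clear_bit(EVENT_LINK_CARRIER_ON, &dev->flags))
                    netif_carrier_on(dev->net);
    }

    /* After: always consume EVENT_LINK_CARRIER_ON and assert the carrier. */
    if (test_and_clear_bit(EVENT_LINK_CARRIER_ON, &dev->flags))
            netif_carrier_on(dev->net);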
Masahiro Yamada [Mon, 4 Aug 2025 14:20:07 +0000 (23:20 +0900)]
MAINTAINERS: hand over Kbuild maintenance
I'm stepping down as the maintainer of Kbuild/Kconfig.
It was enjoyable to refactor and improve the kernel build system,
but due to personal reasons, I believe it's difficult for me to
continue in this role any further.
I discussed this off-list with Nathan and Nicolas, and they have
kindly agreed to take over the maintenance of Kbuild with Odd Fixes.
I'm grateful to them for stepping in.
As for Kconfig, there are currently no designated reviewers, so the
maintainer position will remain vacant for now. I hope someone will
step up to take on the role.
Michał Górny [Tue, 29 Jul 2025 13:24:55 +0000 (15:24 +0200)]
kheaders: make it possible to override TAR
Commit 86cdd2fdc4e3 ("kheaders: make headers archive reproducible")
introduced a number of options specific to GNU tar to the `tar`
invocation in `gen_kheaders.sh` script. This causes the script to fail
to work on systems where `tar` is not GNU tar. This can occur e.g.
on recent Gentoo Linux installations that support using bsdtar from
libarchive instead.
Add a `TAR` make variable to make it possible to override the tar
executable used, e.g. by specifying:
make TAR=gtar
Link: https://bugs.gentoo.org/884061
Reported-by: Sam James <sam@gentoo.org>
Tested-by: Sam James <sam@gentoo.org>
Co-developed-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Michał Górny <mgorny@gentoo.org>
Signed-off-by: Sam James <sam@gentoo.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Thomas Weißschuh [Mon, 28 Jul 2025 13:47:37 +0000 (15:47 +0200)]
kbuild: userprogs: use correct linker when mixing clang and GNU ld
The userprogs infrastructure does not expect clang being used with GNU ld
and in that case uses /usr/bin/ld for linking, not the configured $(LD).
This fallback is problematic as it will break when cross-compiling.
Mixing clang and GNU ld is used for example when building for SPARC64,
as ld.lld is not sufficient; see Documentation/kbuild/llvm.rst.
Relax the check around --ld-path so it gets used for all linkers.
Fixes: dfc1b168a8c4 ("kbuild: userprogs: use correct lld when linking through clang")
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
kconfig: lxdialog: replace strcpy() with strncpy() in inputbox.c
strcpy() performs no bounds checking and can lead to buffer overflows if
the input string exceeds the destination buffer size. This patch replaces
it with strncpy(), and null terminates the input string.
kconfig: lxdialog: replace strcpy with snprintf in print_autowrap
strcpy() does not perform bounds checking and can lead to buffer overflows
if the source string exceeds the destination buffer size. In
print_autowrap(), replace strcpy() with snprintf() to safely copy the
prompt string into the fixed-size tempstr buffer.
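The general shape of both replacements, as a sketch (buffer and variable names here are illustrative):

    char tempstr[MAX_LEN + 1];

    /* before: strcpy(tempstr, prompt);  -- no bounds checking */

    /* bounded copy with guaranteed NUL termination */
    snprintf(tempstr, sizeof(tempstr), "%s", prompt);

    /* or, with strncpy(), terminate explicitly */
    strncpy(instr, input, sizeof(instr) - 1);
    instr[sizeof(instr) - 1] = '\0';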
Keith Busch [Tue, 15 Jul 2025 18:46:22 +0000 (11:46 -0700)]
vfio/type1: conditional rescheduling while pinning
A large DMA mapping request can loop through dma address pinning for
many pages. In cases where THP cannot be used, the repeated vmf_insert_pfn can
be costly, so let the task reschedule as needed to prevent CPU stalls. Failure
to do so has potentially harmful side effects, like increased memory pressure
as unrelated RCU tasks are unable to run their reclaim callbacks, resulting
in OOM conditions.
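A minimal sketch of the pattern (the loop body and helper are hypothetical; the point is the cond_resched() placement):

    /* Yield the CPU periodically inside a long pinning loop so other
     * tasks and RCU callbacks can make progress. */
    for (i = 0; i < npages; i++) {
            ret = pin_one_page(dma, i);     /* hypothetical helper */
            if (ret)
                    break;
            cond_resched();
    }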
vfio/qat: add support for intel QAT 6xxx virtual functions
Extend the qat_vfio_pci variant driver to support QAT 6xxx Virtual
Functions (VFs). Add the relevant QAT 6xxx VF device IDs to the driver's
probe table, enabling proper detection and initialization of these devices.
Update the module description to reflect that the driver now supports all
QAT generations.
Signed-off-by: Małgorzata Mielnik <malgorzata.mielnik@intel.com>
Signed-off-by: Suman Kumar Chakraborty <suman.kumar.chakraborty@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Link: https://lore.kernel.org/r/20250715081150.1244466-1-suman.kumar.chakraborty@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Jason Gunthorpe [Mon, 14 Jul 2025 16:08:25 +0000 (13:08 -0300)]
vfio/pci: Do vf_token checks for VFIO_DEVICE_BIND_IOMMUFD
This was missed during the initial implementation. The VFIO PCI encodes
the vf_token inside the device name when opening the device from the group
FD, something like:
This is used to control access to a VF unless there is co-ordination with
the owner of the PF.
Since we no longer have a device name in the cdev path, pass the token
directly through VFIO_DEVICE_BIND_IOMMUFD using an optional field
indicated by VFIO_DEVICE_BIND_FLAG_TOKEN.
Fixes: 5fcc26969a16 ("vfio: Add VFIO_DEVICE_BIND_IOMMUFD")
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/0-v3-bdd8716e85fe+3978a-vfio_token_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Marcos Alano [Tue, 5 Aug 2025 20:44:29 +0000 (13:44 -0700)]
Input: add keycode for performance mode key
Alienware calls this key "Performance Boost". Dell calls it "G-Mode".
The goal is to have a specific keycode to detect when this key is
pressed, so userspace can act upon it and do what it has to do, usually
enabling the performance power profile.
Lorenzo Stoakes [Sat, 2 Aug 2025 20:55:35 +0000 (22:55 +0200)]
selftests/perf_events: Add a mmap() correctness test
Exercise various mmap(), munmap() and mremap() invocations, which might
cause a perf buffer mapping to be split or truncated.
To avoid hard coding the perf event and having dependencies on
architectures and configuration options, scan through event types in sysfs
and try to open them. On success, try to mmap() and if that succeeds try to
mmap() the AUX buffer.
In case that no AUX buffer supporting event is found, only test the base
buffer mapping. If no mappable event is found or permissions are not
sufficient, skip the tests.
Reserve a PROT_NONE region for both rb and aux tests to allow testing the
case where mremap unmaps beyond the end of a mapped VMA to prevent it from
unmapping unrelated mappings.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Co-developed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Thomas Gleixner [Wed, 30 Jul 2025 21:01:21 +0000 (23:01 +0200)]
perf/core: Prevent VMA split of buffer mappings
The perf mmap code is careful about mmap()'ing the user page with the
ringbuffer and additionally the auxiliary buffer, when the event supports
it. Once the first mapping is established, subsequent mappings have to use
the same offset and the same size in both cases. The reference counting for
the ringbuffer and the auxiliary buffer depends on this being correct.
However, perf does not prevent a related mapping from being split via mmap(2),
munmap(2) or mremap(2). A split of a VMA results in perf_mmap_open() calls,
which take reference counts, but then the subsequent perf_mmap_close()
calls no longer fulfill the offset and size checks. This leads to
reference count leaks.
As perf already has the requirement for subsequent mappings to match the
initial mapping, the obvious consequence is that VMA splits, caused by
resizing of a mapping or partial unmapping, have to be prevented.
Implement the vm_operations_struct::may_split() callback and return
unconditionally -EINVAL.
That ensures that the mapping offsets and sizes cannot be changed after the
fact. Remapping to a different fixed address with the same size is still
possible as it takes the references for the new mapping and drops those of
the old mapping.
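A minimal sketch of the callback described above (function and structure names here are illustrative, not the exact patch):

    /* Refuse any split of a perf buffer VMA so the mapping's offset and
     * size can never change after the initial mmap(). */
    static int perf_mmap_may_split(struct vm_area_struct *vma, unsigned long addr)
    {
            return -EINVAL;
    }

    static const struct vm_operations_struct perf_mmap_vmops = {
            /* .open, .close, .fault, ... as before */
            .may_split      = perf_mmap_may_split,
    };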
Fixes: 45bfb2e50471 ("perf: Add AUX area to ring buffer for raw data streams")
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-27504
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: stable@vger.kernel.org
Thomas Gleixner [Sat, 2 Aug 2025 10:48:55 +0000 (12:48 +0200)]
perf/core: Handle buffer mapping fail correctly in perf_mmap()
After successful allocation of a buffer or a successful attachment to an
existing buffer, perf_mmap() tries to map the buffer read-only into the page
table. If that fails, the already set up page table entries are zapped, but
the other perf specific side effects of that failure are not handled. The
calling code just cleans up the VMA and does not invoke perf_mmap_close().
This leaks reference counts, corrupts user->vm accounting and also results
in an unbalanced invocation of event::event_mapped().
Cure this by moving the event::event_mapped() invocation before the
map_range() call so that on map_range() failure perf_mmap_close() can be
invoked without causing an unbalanced event::event_unmapped() call.
perf_mmap_close() undoes the reference counts and eventually frees buffers.
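A simplified ordering sketch of that change (not the literal diff; the helper names follow the text above):

    /* Announce the mapping before populating the page table, so a
     * map_range() failure can be unwound through perf_mmap_close(),
     * keeping event_mapped()/event_unmapped() balanced. */
    if (event->pmu->event_mapped)
            event->pmu->event_mapped(event, vma->vm_mm);

    ret = map_range(rb, vma);               /* read-only mapping attempt */
    if (ret)
            perf_mmap_close(vma);           /* drops refs, undoes accounting */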
Fixes: b709eb872e19 ("perf: map pages in advance")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: stable@vger.kernel.org
Thomas Gleixner [Sat, 2 Aug 2025 10:49:48 +0000 (12:49 +0200)]
perf/core: Exit early on perf_mmap() fail
When perf_mmap() fails to allocate a buffer, it still invokes the
event_mapped() callback of the related event. On X86 this might increase
the perf_rdpmc_allowed reference counter. But nothing undoes this as
perf_mmap_close() is never called in this case, which causes another
reference count leak.
Return early on failure to prevent that.
Fixes: 1e0fb9ec679c ("perf: Add pmu callbacks to track event mapping and unmapping")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: stable@vger.kernel.org
Thomas Gleixner [Sat, 2 Aug 2025 10:39:39 +0000 (12:39 +0200)]
perf/core: Don't leak AUX buffer refcount on allocation failure
Failure of the AUX buffer allocation leaks the reference count.
Set the reference count to 1 only when the allocation succeeds.
Fixes: 45bfb2e50471 ("perf: Add AUX area to ring buffer for raw data streams")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: stable@vger.kernel.org
Thomas Gleixner [Mon, 4 Aug 2025 20:22:09 +0000 (22:22 +0200)]
perf/core: Preserve AUX buffer allocation failure result
A recent overhaul sets the return value to 0 unconditionally after the
allocations, which causes reference count leaks and corrupts the user->vm
accounting.
Preserve the AUX buffer allocation failure return value, so that the
subsequent code works correctly.
Fixes: 0983593f32c4 ("perf/core: Lift event->mmap_mutex in perf_mmap()")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: stable@vger.kernel.org
Pedro Falcato [Tue, 29 Jul 2025 12:03:48 +0000 (13:03 +0100)]
RDMA/siw: Fix the sendmsg byte count in siw_tcp_sendpages
Ever since commit c2ff29e99a76 ("siw: Inline do_tcp_sendpages()"),
we have been doing this:
static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
                             size_t size)
[...]
        /* Calculate the number of bytes we need to push, for this page
         * specifically */
        size_t bytes = min_t(size_t, PAGE_SIZE - offset, size);
        /* If we can't splice it, then copy it in, as normal */
        if (!sendpage_ok(page[i]))
                msg.msg_flags &= ~MSG_SPLICE_PAGES;
        /* Set the bvec pointing to the page, with len $bytes */
        bvec_set_page(&bvec, page[i], bytes, offset);
        /* Set the iter to $size, aka the size of the whole sendpages (!!!) */
        iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
try_page_again:
        lock_sock(sk);
        /* Sendmsg with $size size (!!!) */
        rv = tcp_sendmsg_locked(sk, &msg, size);
This means we've been sending oversized iov_iters and tcp_sendmsg calls
for a while. This has been a benign bug because sendpage_ok() always
returned true. With the recent slab allocator changes being slowly
introduced into next (that disallow sendpage on large kmalloc
allocations), we have recently hit out-of-bounds crashes, due to slight
differences in iov_iter behavior between the MSG_SPLICE_PAGES and
"regular" copy paths:
(MSG_SPLICE_PAGES)
skb_splice_from_iter
iov_iter_extract_pages
iov_iter_extract_bvec_pages
uses i->nr_segs to correctly stop in its tracks before OoB'ing everywhere
skb_splice_from_iter gets a "short" read
(!MSG_SPLICE_PAGES)
skb_copy_to_page_nocache copy=iov_iter_count
[...]
copy_from_iter
/* this doesn't help */
if (unlikely(iter->count < len))
len = iter->count;
iterate_bvec
... and we run off the bvecs
Fix this by properly setting the iov_iter's byte count, plus sending the
correct byte count to tcp_sendmsg_locked.
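In terms of the excerpt above, the fix boils down to sizing both the iterator and the send to the per-page chunk (a sketch of the corrected calls, not the verbatim diff):

    /* Use $bytes (this page's chunk), not $size (the whole request). */
    iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, bytes);
    rv = tcp_sendmsg_locked(sk, &msg, bytes);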
Link: https://patch.msgid.link/r/20250729120348.495568-1-pfalcato@suse.de
Cc: stable@vger.kernel.org
Fixes: c2ff29e99a76 ("siw: Inline do_tcp_sendpages()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202507220801.50a7210-lkp@intel.com
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: Bernard Metzler <bernard.metzler@linux.dev>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Linus Torvalds [Tue, 5 Aug 2025 13:55:03 +0000 (16:55 +0300)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux
Pull ARM update from Russell King:
"Just one development update this time:
- Finish removing Coresight support"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
ARM: 9449/1: coresight: Finish removal of Coresight support in arch/arm/kernel
Linus Torvalds [Tue, 5 Aug 2025 13:37:05 +0000 (16:37 +0300)]
Merge tag 'exfat-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
Pull exfat updates from Namjae Jeon:
- Use generic_write_sync instead of vfs_fsync_range in exfat_file_write_iter.
This fixes an issue where the fdatasync flag would be set incorrectly.
- Fix a potential infinite loop caused by a self-linked cluster chain.
* tag 'exfat-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: add cluster chain loop check for dir
exfat: fdatasync flag should be same like generic_write_sync()
Linus Torvalds [Tue, 5 Aug 2025 13:02:07 +0000 (16:02 +0300)]
Merge tag 'mm-stable-2025-08-03-12-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:
"Significant patch series in this pull request:
- "mseal cleanups" (Lorenzo Stoakes)
Some mseal cleanups with no intended functional change.
- "Optimizations for khugepaged" (David Hildenbrand)
Improve khugepaged throughput by batching PTE operations for large
folios. This gain is mainly for arm64.
- "x86: enable EXECMEM_ROX_CACHE for ftrace and kprobes" (Mike Rapoport)
A bugfix, additional debug code and cleanups to the execmem code.
- "mm/shmem, swap: bugfix and improvement of mTHP swap in" (Kairui Song)
Bugfixes, cleanups and performance improvements to the mTHP swapin
code"
* tag 'mm-stable-2025-08-03-12-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (38 commits)
mm: mempool: fix crash in mempool_free() for zero-minimum pools
mm: correct type for vmalloc vm_flags fields
mm/shmem, swap: fix major fault counting
mm/shmem, swap: rework swap entry and index calculation for large swapin
mm/shmem, swap: simplify swapin path and result handling
mm/shmem, swap: never use swap cache and readahead for SWP_SYNCHRONOUS_IO
mm/shmem, swap: tidy up swap entry splitting
mm/shmem, swap: tidy up THP swapin checks
mm/shmem, swap: avoid redundant Xarray lookup during swapin
x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations
x86/kprobes: enable EXECMEM_ROX_CACHE for kprobes allocations
execmem: drop writable parameter from execmem_fill_trapping_insns()
execmem: add fallback for failures in vmalloc(VM_ALLOW_HUGE_VMAP)
execmem: move execmem_force_rw() and execmem_restore_rox() before use
execmem: rework execmem_cache_free()
execmem: introduce execmem_alloc_rw()
execmem: drop unused execmem_update_copy()
mm: fix a UAF when vma->mm is freed after vma->vm_refcnt got dropped
mm/rmap: add anon_vma lifetime debug check
mm: remove mm/io-mapping.c
...
Linus Torvalds [Mon, 4 Aug 2025 23:37:29 +0000 (16:37 -0700)]
Merge tag 'i2c-for-6.17-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull more i2c updates from Wolfram Sang:
"A few more patches from I2C. Some are fixes which would be nice to
have in rc1 already, some patches had nearly fallen through the
cracks, some just needed a bit more testing.
- acpi: enable 100kHz workaround for DLL0945
- apple: add support for Apple A7–A11, T2 chips; Kconfig update
- mux: mule: fix error handling path
- qcom-geni: fix controller frequency mapping
- stm32f7: add DMA-safe transfer support
- tegra: use controller reset if device reset is missing
- tegra: remove unnecessary dma_sync*() calls"
* tag 'i2c-for-6.17-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: muxes: mule: Fix an error handling path in mule_i2c_mux_probe()
i2c: Force DLL0945 touchpad i2c freq to 100khz
i2c: apple: Drop default ARCH_APPLE in Kconfig
i2c: qcom-geni: fix I2C frequency table to achieve accurate bus rates
dt-bindings: i2c: apple,i2c: Document Apple A7-A11, T2 compatibles
i2c: tegra: Remove dma_sync_*() calls
i2c: tegra: Use internal reset when reset property is not available
i2c: stm32f7: support i2c_*_dma_safe_msg_buf APIs
Linus Torvalds [Mon, 4 Aug 2025 23:27:21 +0000 (16:27 -0700)]
Merge tag 'f2fs-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"Three main updates: folio conversion by Matthew, switch to a new mount
API by Hongbo and Eric, and several sysfs entries to tune GCs for ZUFS
with finer granularity by Daeho.
There are also patches to address bugs and issues in the existing
features such as GCs, file pinning, write-while-dio-read, contiguous
block allocation, and memory access violations.
Enhancements:
- switch to new mount API and folio conversion
- add sysfs nodes to control F2FS GCs for ZUFS
- improve performance on the nat entry cache
- drop inode from the donation list when the last file is closed
- avoid splitting bio when reading multiple pages
Bug fixes:
- fix to trigger foreground gc during f2fs_map_blocks() in lfs mode
- make sure zoned device GC to use FG_GC in shortage of free section
- fix to calculate dirty data during has_not_enough_free_secs()
- fix to update upper_p in __get_secs_required() correctly
- wait for inflight dio completion, excluding pinned files read using dio
- don't break allocation when crossing contiguous sections
- vm_unmap_ram() may be called from an invalid context
- fix to avoid out-of-boundary access in dnode page
- fix to avoid panic in f2fs_evict_inode
- fix to avoid UAF in f2fs_sync_inode_meta()
- fix to use f2fs_is_valid_blkaddr_raw() in do_write_page()
- fix UAF of f2fs_inode_info in f2fs_free_dic
- fix to avoid invalid wait context issue
- fix bio memleak when committing super block
- handle nat.blkaddr corruption in f2fs_get_node_info()
In addition, there are also clean-ups and minor bug fixes"
* tag 'f2fs-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (109 commits)
f2fs: drop inode from the donation list when the last file is closed
f2fs: add gc_boost_gc_greedy sysfs node
f2fs: add gc_boost_gc_multiple sysfs node
f2fs: fix to trigger foreground gc during f2fs_map_blocks() in lfs mode
f2fs: fix to calculate dirty data during has_not_enough_free_secs()
f2fs: fix to update upper_p in __get_secs_required() correctly
f2fs: directly add newly allocated pre-dirty nat entry to dirty set list
f2fs: avoid redundant clean nat entry move in lru list
f2fs: zone: wait for inflight dio completion, excluding pinned files read using dio
f2fs: ignore valid ratio when free section count is low
f2fs: don't break allocation when crossing contiguous sections
f2fs: remove unnecessary tracepoint enabled check
f2fs: merge the two conditions to avoid code duplication
f2fs: vm_unmap_ram() may be called from an invalid context
f2fs: fix to avoid out-of-boundary access in dnode page
f2fs: switch to the new mount api
f2fs: introduce fs_context_operation structure
f2fs: separate the options parsing and options checking
f2fs: Add f2fs_fs_context to record the mount options
f2fs: Allow sbi to be NULL in f2fs_printk
...
Linus Torvalds [Mon, 4 Aug 2025 17:54:36 +0000 (10:54 -0700)]
Merge tag 'printk-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- Add new "hash_pointers=[auto|always|never]" boot parameter to force
the hashing even with "slab_debug" enabled
- Allow to stop CPU, after losing nbcon console ownership during
panic(), even without proper NMI
- Allow to use the printk kthread immediately even for the 1st
registered nbcon
- Compiler warning removal
* tag 'printk-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: nbcon: Allow reacquire during panic
printk: Allow to use the printk kthread immediately even for 1st nbcon
slab: Decouple slab_debug and no_hash_pointers
vsprintf: Use __diag macros to disable '-Wsuggest-attribute=format'
compiler-gcc.h: Introduce __diag_GCC_all
Linus Torvalds [Mon, 4 Aug 2025 15:58:53 +0000 (08:58 -0700)]
Merge tag 'for-6.17/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mikulas Patocka:
- fix checking for request-based stackable devices (dm-table)
- fix corrupt_bio_byte setup checks (dm-flakey)
- add support for resync w/o metadata devices (dm raid)
- small code simplification (dm, dm-mpath, dm-vdo, dm-raid)
- remove support for asynchronous hashes (dm-verity)
- close smatch warning (dm-zoned-target)
- update the documentation and enable inline-crypto passthrough
(dm-thin)
* tag 'for-6.17/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: set DM_TARGET_PASSES_CRYPTO feature for dm-thin
dm-thin: update the documentation
dm-raid: do not include dm-core.h
vdo: omit need_resched() before cond_resched()
md: dm-zoned-target: Initialize return variable r to avoid uninitialized use
dm-verity: remove support for asynchronous hashes
dm-mpath: don't print the "loaded" message if registering fails
dm-mpath: make dm_unregister_path_selector return void
dm: ima: avoid extra calls to strlen()
dm: Simplify dm_io_complete()
dm: Remove unnecessary return in dm_zone_endio()
dm raid: add support for resync w/o metadata devices
dm-flakey: Fix corrupt_bio_byte setup checks
dm-table: fix checking for rq stackable devices
Linus Torvalds [Mon, 4 Aug 2025 15:37:46 +0000 (08:37 -0700)]
Merge tag 'for-linus' of https://github.com/openrisc/linux
Pull OpenRISC updates from Stafford Horne:
- Replace __ASSEMBLY__ with __ASSEMBLER__ in headers (Thomas Huth)
* tag 'for-linus' of https://github.com/openrisc/linux:
openrisc: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-uapi headers
openrisc: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi headers
Linus Torvalds [Mon, 4 Aug 2025 15:17:28 +0000 (08:17 -0700)]
Merge tag 'apparmor-pr-2025-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor
Pull apparmor updates from John Johansen:
"This has one major feature, it pulls in a cleaned up version of
af_unix mediation that Ubuntu has been carrying for years. It is
placed behind a new abi to ensure that it does cause policy
regressions. With pulling in the af_unix mediation there have been
cleanups and some refactoring of network socket mediation. This
accounts for the majority of the changes in the diff.
In addition there are a few improvements providing minor code
optimizations, several code cleanups, and bug fixes.
Features:
- improve debug printing
- carry mediation check on label (optimization)
- improve ability for compiler to optimize
__begin_current_label_crit_section
- transition from a linked list of rulesets to a vector of rulesets
- don't hardcode profile signal, allow it to be set by policy
- ability to mediate caps via the state machine instead of lut
- Add Ubuntu af_unix mediation, put it behind new v9 abi
Cleanups:
- fix typos and spelling errors
- cleanup kernel doc and code inconsistencies
- remove redundant checks/code
- remove unused variables
- Use str_yes_no() helper function
- mark tables static where appropriate
- make all generated string array headers const char *const
- refactor to doc semantics of file_perm checks
- replace macro calls to network/socket fns with explicit calls
- refactor/cleanup socket mediation code preparing for finer grained
mediation of different network families
- several updates to kernel doc comments
Bug fixes:
- fix incorrect profile->signal range check
- idmap mount fixes
- policy unpack unaligned access fixes
- kfree_sensitive() where appropriate
- fix oops when freeing policy
- fix conflicting attachment resolution
- fix exec table look-ups when stacking isn't first
- fix exec auditing
- mitigate userspace generating overly large xtables"
* tag 'apparmor-pr-2025-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor: (60 commits)
apparmor: fix: oops when trying to free null ruleset
apparmor: fix Regression on linux-next (next-20250721)
apparmor: fix test error: WARNING in apparmor_unix_stream_connect
apparmor: Remove the unused variable rules
apparmor: fix: accept2 being specifie even when permission table is presnt
apparmor: transition from a list of rules to a vector of rules
apparmor: fix documentation mismatches in val_mask_to_str and socket functions
apparmor: remove redundant perms.allow MAY_EXEC bitflag set
apparmor: fix kernel doc warnings for kernel test robot
apparmor: Fix unaligned memory accesses in KUnit test
apparmor: Fix 8-byte alignment for initial dfa blob streams
apparmor: shift uid when mediating af_unix in userns
apparmor: shift ouid when mediating hard links in userns
apparmor: make sure unix socket labeling is correctly updated.
apparmor: fix regression in fs based unix sockets when using old abi
apparmor: fix AA_DEBUG_LABEL()
apparmor: fix af_unix auditing to include all address information
apparmor: Remove use of the double lock
apparmor: update kernel doc comments for xxx_label_crit_section
apparmor: make __begin_current_label_crit_section() indicate whether put is needed
...
John Johansen [Sat, 2 Aug 2025 03:36:06 +0000 (20:36 -0700)]
apparmor: fix: oops when trying to free null ruleset
profile allocation is wrongly setting the number of entries on the
rules vector before any ruleset is assigned. If profile allocation
fails between ruleset allocation and assigning the first ruleset,
free_ruleset() will be called with a null pointer resulting in an
oops.
Don't set the count until a ruleset is actually allocated and
guard against free_ruleset() being called with a null pointer.
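A minimal sketch of the guard described (the simplified body is illustrative, not the actual apparmor diff):

    static void free_ruleset(struct aa_ruleset *rules)
    {
            if (!rules)     /* tolerate a profile whose ruleset was never assigned */
                    return;
            /* ... free rule tables, policydb, etc. ... */
    }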
Reported-by: Ryan Lee <ryan.lee@canonical.com>
Fixes: 217af7e2f4de ("apparmor: refactor profile rules and attachments")
Signed-off-by: John Johansen <john.johansen@canonical.com>
Uwe Kleine-König [Mon, 30 Jun 2025 16:27:26 +0000 (09:27 -0700)]
Input: max77693 - convert to atomic pwm operation
The driver called pwm_config() and pwm_enable() separately. Today both
are wrappers for pwm_apply_might_sleep() and it's more effective to call
this function directly and only once. Also don't configure the
duty_cycle and period if the next operation is to disable the PWM so
configure the PWM in max77693_haptic_enable().
With the direct use of pwm_apply_might_sleep() the need to call
pwm_apply_args() in .probe() is now gone, too, so drop this one.
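A minimal sketch of such a conversion (field and variable names are illustrative, not the literal max77693 diff):

    struct pwm_state state;
    int error;

    /* One atomic update instead of pwm_config() + pwm_enable(). */
    pwm_init_state(haptic->pwm_dev, &state);
    state.period = period_ns;
    state.duty_cycle = duty_ns;
    state.enabled = true;

    error = pwm_apply_might_sleep(haptic->pwm_dev, &state);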
Linus Torvalds [Mon, 4 Aug 2025 03:17:34 +0000 (20:17 -0700)]
Merge tag 'rtc-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"Support for a new RTC in an existing driver and all the drivers
exposing clocks using the common clock framework have been converted
to determine_rate(). Summary:
Subsystem:
- Convert drivers exposing a clock from round_rate() to determine_rate()
Drivers:
- ds1307: oscillator stop flag handling for ds1341
- pcf85063: add support for RV8063"
* tag 'rtc-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (34 commits)
rtc: ds1685: Update Joshua Kinard's email address.
rtc: rv3032: convert from round_rate() to determine_rate()
rtc: rv3028: convert from round_rate() to determine_rate()
rtc: pcf8563: convert from round_rate() to determine_rate()
rtc: pcf85063: convert from round_rate() to determine_rate()
rtc: nct3018y: convert from round_rate() to determine_rate()
rtc: max31335: convert from round_rate() to determine_rate()
rtc: m41t80: convert from round_rate() to determine_rate()
rtc: hym8563: convert from round_rate() to determine_rate()
rtc: ds1307: convert from round_rate() to determine_rate()
rtc: rv3028: fix incorrect maximum clock rate handling
rtc: pcf8563: fix incorrect maximum clock rate handling
rtc: pcf85063: fix incorrect maximum clock rate handling
rtc: nct3018y: fix incorrect maximum clock rate handling
rtc: hym8563: fix incorrect maximum clock rate handling
rtc: ds1307: fix incorrect maximum clock rate handling
rtc: pcf85063: scope pcf85063_config structures
rtc: Optimize calculations in rtc_time64_to_tm()
dt-bindings: rtc: amlogic,a4-rtc: Add compatible string for C3
rtc: ds1307: handle oscillator stop flag (OSF) for ds1341
...
Linus Torvalds [Mon, 4 Aug 2025 02:15:04 +0000 (19:15 -0700)]
Merge tag 'powerpc-6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Madhavan Srinivasan:
- Fixes for several issues in the powernv PCI hotplug path
- Fix htmldoc generation for htm.rst in toctree
- Add jit support for load_acquire and store_release in ppc64 bpf jit
Thanks to Bjorn Helgaas, Hari Bathini, Puranjay Mohan, Saket Kumar
Bhaskar, Shawn Anastasio, Timothy Pearson, and Vishal Parmar
* tag 'powerpc-6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc64/bpf: Add jit support for load_acquire and store_release
docs: powerpc: add htm.rst to toctree
PCI: pnv_php: Enable third attention indicator state
PCI: pnv_php: Fix surprise plug detection and recovery
powerpc/eeh: Make EEH driver device hotplug safe
powerpc/eeh: Export eeh_unfreeze_pe()
PCI: pnv_php: Work around switches with broken presence detection
PCI: pnv_php: Clean up allocated IRQs on unplug
Linus Torvalds [Sun, 3 Aug 2025 23:23:09 +0000 (16:23 -0700)]
Merge tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
"Significant patch series in this pull request:
- "squashfs: Remove page->mapping references" (Matthew Wilcox) gets
us closer to being able to remove page->mapping
- "relayfs: misc changes" (Jason Xing) does some maintenance and
minor feature addition work in relayfs
- "kdump: crashkernel reservation from CMA" (Jiri Bohac) switches
us from static preallocation of the kdump crashkernel's working
memory over to dynamic allocation. So the difficulty of a-priori
estimation of the second kernel's needs is removed and the first
kernel obtains extra memory
- "generalize panic_print's dump function to be used by other
kernel parts" (Feng Tang) implements some consolidation and
rationalization of the various ways in which a failing kernel
splats information at the operator
* tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (80 commits)
tools/getdelays: add backward compatibility for taskstats version
kho: add test for kexec handover
delaytop: enhance error logging and add PSI feature description
samples: Kconfig: fix spelling mistake "instancess" -> "instances"
fat: fix too many log in fat_chain_add()
scripts/spelling.txt: add notifer||notifier to spelling.txt
xen/xenbus: fix typo "notifer"
net: mvneta: fix typo "notifer"
drm/xe: fix typo "notifer"
cxl: mce: fix typo "notifer"
KVM: x86: fix typo "notifer"
MAINTAINERS: add maintainers for delaytop
ucount: use atomic_long_try_cmpxchg() in atomic_long_inc_below()
ucount: fix atomic_long_inc_below() argument type
kexec: enable CMA based contiguous allocation
stackdepot: make max number of pools boot-time configurable
lib/xxhash: remove unused functions
init/Kconfig: restore CONFIG_BROKEN help text
lib/raid6: update recov_rvv.c zero page usage
docs: update docs after introducing delaytop
...
Linus Torvalds [Sun, 3 Aug 2025 22:03:04 +0000 (15:03 -0700)]
Merge tag 'trace-v6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull more tracing updates from Steven Rostedt:
- Remove unneeded goto out statements
Over time, the logic was restructured but left a "goto out" where the
out label simply did a "return ret;". Instead of jumping to this out
label, simply return immediately and remove the out label.
- Add guard(ring_buffer_nest)
Some calls to the tracing ring buffer can happen when the ring buffer
is already being written to at the same context (for example, a
trace_printk() in between a ring_buffer_lock_reserve() and a
ring_buffer_unlock_commit()).
In order to not trigger the recursion detection, these functions use
ring_buffer_nest_start() and ring_buffer_nest_end(). Create a guard()
for these functions so that their use cases can be simplified and not
need to use goto for the release.
- Clean up the tracing code with guard() and __free() logic
There were several locations that were prime candidates for using
guard() and __free() helpers. Switch them over to use them.
- Fix output of function argument traces for unsigned int values
The function tracer with "func-args" option set will record up to 6
argument registers and then use BTF to format them for human
consumption when the trace file is read. There are several arguments
that are "unsigned long" and even "unsigned int" that are either and
address or a mask. It is easier to understand if they were printed
using hexadecimal instead of decimal. The old method just printed all
non-pointer values as signed integers, which made it even worse for
unsigned integers.
* tag 'trace-v6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Have unsigned int function args displayed as hexadecimal
ring-buffer: Convert ring_buffer_write() to use guard(preempt_notrace)
tracing: Use __free(kfree) in trace.c to remove gotos
tracing: Add guard() around locks and mutexes in trace.c
tracing: Add guard(ring_buffer_nest)
tracing: Remove unneeded goto out logic
Linus Torvalds [Sun, 3 Aug 2025 21:16:52 +0000 (14:16 -0700)]
Merge tag 'modules-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux
Pull module updates from Daniel Gomez:
"This is a small set of changes for modules, primarily to extend module
users to use the module data structures in combination with the
already no-op stub module functions, even when support for modules is
disabled in the kernel configuration. This change follows the kernel's
coding style for conditional compilation and allows kunit code to drop
all CONFIG_MODULES ifdefs, which is also part of the changes. This
should allow other parts of the kernel to do the same cleanup.
The remaining changes include a fix for module name length handling
which could potentially lead to the removal of an incorrect module,
and various cleanups"
* tag 'modules-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux:
module: Rename MAX_PARAM_PREFIX_LEN to __MODULE_NAME_LEN
tracing: Replace MAX_PARAM_PREFIX_LEN with MODULE_NAME_LEN
module: Restore the moduleparam prefix length check
module: Remove unnecessary +1 from last_unloaded_module::name size
module: Prevent silent truncation of module name in delete_module(2)
kunit: test: Drop CONFIG_MODULE ifdeffery
module: make structure definitions always visible
module: move 'struct module_use' to internal.h
Linus Torvalds [Sun, 3 Aug 2025 20:49:10 +0000 (13:49 -0700)]
Merge tag 'rust-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Pull Rust updates from Miguel Ojeda:
"Toolchain and infrastructure:
- Enable a set of Clippy lints: 'ptr_as_ptr', 'ptr_cast_constness',
'as_ptr_cast_mut', 'as_underscore', 'cast_lossless' and
'ref_as_ptr'
These are intended to avoid type casts with the 'as' operator,
which are quite powerful, into restricted variants that are less
powerful and thus should help to avoid mistakes
- Remove the 'author' key now that most instances were moved to the
plural one in the previous cycle
'kernel' crate:
- New 'bug' module: add 'warn_on!' macro which reuses the existing
'BUG'/'WARN' infrastructure, i.e. it respects the usual sysctls and
kernel parameters:
warn_on!(value == 42);
To avoid duplicating the assembly code, the same strategy is
followed as for the static branch code in order to share the
assembly between both C and Rust
This required a few rearrangements on C arch headers -- the
existing C macros should still generate the same outputs, thus no
functional change expected there
- 'workqueue' module: add delayed work items, including a
'DelayedWork' struct, a 'impl_has_delayed_work!' macro and an
'enqueue_delayed' method, e.g.:
/// Enqueue the struct for execution on the system workqueue,
/// where its value will be printed 42 jiffies later.
fn print_later(value: Arc<MyStruct>) {
    let _ = workqueue::system().enqueue_delayed(value, 42);
}
- New 'bits' module: add support for 'bit' and 'genmask' functions,
with runtime- and compile-time variants
- 'uaccess' module: add 'UserSliceReader::strcpy_into_buf', which
reads NUL-terminated strings from userspace into a '&CStr'
Introduce 'UserPtr' newtype, similar in purpose to '__user' in C,
to minimize mistakes handling userspace pointers, including mixing
them up with integers and leaking them via the 'Debug' trait. Add
it to the prelude, too
- Start preparations for the replacement of our custom 'CStr' type
with the analogous type in the 'core' standard library. This will
take place across several cycles to make it easier. For this one,
it includes a new 'fmt' module, using upstream method names and
some other cleanups
Replace 'fmt!' with a re-export, which helps Clippy lint properly,
and clean up the found 'uninlined-format-args' instances
- 'dma' module:
- Clarify wording and be consistent in 'coherent' nomenclature
- Convert the 'read!()' and 'write!()' macros to return a 'Result'
- Add 'as_slice()', 'write()' methods in 'CoherentAllocation'
- Expose 'count()' and 'size()' in 'CoherentAllocation' and add
the corresponding type invariants
- Make 'Instant' generic over clock source. This allows the
compiler to assert that arithmetic expressions involving the
'Instant' use 'Instants' based on the same clock source
- Make 'HrTimer' generic over the timer mode. 'HrTimer' timers
take a 'Duration' or an 'Instant' when setting the expiry time,
depending on the timer mode. With this change, the compiler can
check the type matches the timer mode
- Add an abstraction for 'fsleep'. 'fsleep' is a flexible sleep
function that will select an appropriate sleep method depending
on the requested sleep time
- Avoid 64-bit divisions on 32-bit hardware when calculating
timestamps
- Seal the 'HrTimerMode' trait. This prevents users of the
'HrTimerMode' from implementing the trait on their own types
- Pass the correct timer mode ID to 'hrtimer_start_range_ns()'
- 'list' module: remove 'OFFSET' constants, allowing to remove
pointer arithmetic; now 'impl_list_item!' invokes
'impl_has_list_links!' or 'impl_has_list_links_self_ptr!'. Other
simplifications too
- 'types' module: remove 'ForeignOwnable::PointedTo' in favor of a
constant, which avoids exposing the type of the opaque pointer, and
require 'into_foreign' to return non-null
Remove the 'Either<L, R>' type as well. It is unused, and we want
to encourage the use of custom enums for concrete use cases
- 'sync' module: implement 'Borrow' and 'BorrowMut' for 'Arc' types
to allow them to be used in generic APIs
- 'alloc' module: implement 'Borrow' and 'BorrowMut' for 'Box<T, A>';
and 'Borrow', 'BorrowMut' and 'Default' for 'Vec<T, A>'
- 'Opaque' type: add 'cast_from' method to perform a restricted cast
that cannot change the inner type and use it in callers of
'container_of!'. Rename 'raw_get' to 'cast_into' to match it
- 'rbtree' module: add 'is_empty' method
- 'sync' module: new 'aref' submodule to hold 'AlwaysRefCounted' and
'ARef', which are moved from the too general 'types' module which
we want to reduce or eventually remove. Also fix a safety comment
in 'static_lock_class'
'pin-init' crate:
- Add 'impl<T, E> [Pin]Init<T, E> for Result<T, E>', so results are
now (pin-)initializers
- Add 'Zeroable::init_zeroed()' that delegates to 'init_zeroed()'
- New 'zeroed()', a safe version of 'mem::zeroed()' and also provide
it via 'Zeroable::zeroed()'
- Implement 'Zeroable' for 'Option<&T>', 'Option<&mut T>' and for
'Option<[unsafe] [extern "abi"] fn(...args...) -> ret>' for
'"Rust"' and '"C"' ABIs and up to 20 arguments
- Changed blanket impls of 'Init' and 'PinInit' from 'impl<T, E>
[Pin]Init<T, E> for T' to 'impl<T> [Pin]Init<T> for T'
- Renamed 'zeroed()' to 'init_zeroed()'
- Upstream dev news: improve CI more to deny warnings, use
'--all-targets'. Check the synchronization status of the two
'-next' branches in upstream and the kernel
MAINTAINERS:
- Add Vlastimil Babka, Liam R. Howlett, Uladzislau Rezki and Lorenzo
Stoakes as reviewers (thanks everyone)
And a few other cleanups and improvements"
* tag 'rust-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: (76 commits)
rust: Add warn_on macro
arm64/bug: Add ARCH_WARN_ASM macro for BUG/WARN asm code sharing with Rust
riscv/bug: Add ARCH_WARN_ASM macro for BUG/WARN asm code sharing with Rust
x86/bug: Add ARCH_WARN_ASM macro for BUG/WARN asm code sharing with Rust
rust: kernel: move ARef and AlwaysRefCounted to sync::aref
rust: sync: fix safety comment for `static_lock_class`
rust: types: remove `Either<L, R>`
rust: kernel: use `core::ffi::CStr` method names
rust: str: add `CStr` methods matching `core::ffi::CStr`
rust: str: remove unnecessary qualification
rust: use `kernel::{fmt,prelude::fmt!}`
rust: kernel: add `fmt` module
rust: kernel: remove `fmt!`, fix clippy::uninlined-format-args
scripts: rust: emit path candidates in panic message
scripts: rust: replace length checks with match
rust: list: remove nonexistent generic parameter in link
rust: bits: add support for bits/genmask macros
rust: list: remove OFFSET constants
rust: list: add `impl_list_item!` examples
rust: list: use fully qualified path
...
fangzhong.zhou [Sat, 2 Aug 2025 23:15:54 +0000 (07:15 +0800)]
i2c: Force DLL0945 touchpad i2c freq to 100khz
This patch fixes an issue where touchpad cursor movement becomes
slow on the Dell Precision 5560. Force the touchpad frequency to 100 kHz
as a workaround.
Tested on Dell Precision 5560 with 6.14 to 6.14.6. Cursor movement
is now smooth and responsive.
Signed-off-by: fangzhong.zhou <myth5@myth5.com>
[wsa: kept sorting and removed unnecessary parts from commit msg] Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Sven Peter [Thu, 12 Jun 2025 21:11:29 +0000 (21:11 +0000)]
i2c: apple: Drop default ARCH_APPLE in Kconfig
When the first driver for Apple Silicon was upstreamed, we accidentally
included `default ARCH_APPLE` in its Kconfig, which then spread to almost
every subsequent driver. As soon as ARCH_APPLE is set to y, this pulls in
many drivers as built-ins, which is not what we want.
Thus, drop `default ARCH_APPLE` from Kconfig.
Signed-off-by: Sven Peter <sven@kernel.org> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Wolfram Sang [Sun, 3 Aug 2025 20:25:12 +0000 (22:25 +0200)]
Merge tag 'i2c-host-6.17-pt2' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-mergewindow
i2c-host for v6.17, part 2
- apple: add support for Apple A7–A11, T2 chips
- qcom-geni: fix controller frequency mapping
- stm32f7: add DMA-safe transfer support
- tegra: use controller reset if device reset is missing
- tegra: remove unnecessary dma_sync*() calls
Brian Masney [Thu, 10 Jul 2025 15:20:35 +0000 (11:20 -0400)]
rtc: rv3032: convert from round_rate() to determine_rate()
The round_rate() clk ops is deprecated, so migrate this driver from
round_rate() to determine_rate() using the Coccinelle semantic patch
on the cover letter of this series.
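As a rough illustration of the shape of such a conversion (a minimal sketch
with an invented rate table and function name, not any specific driver's
code), a determine_rate() implementation writes the chosen rate into the
request instead of returning it the way round_rate() did:
    #include <linux/clk-provider.h>
    #include <linux/kernel.h>

    /* Illustrative rate table; real drivers use their hardware's rates. */
    static const unsigned long clkout_rates[] = { 1, 32, 1024, 32768 };

    static int clkout_determine_rate(struct clk_hw *hw,
                                     struct clk_rate_request *req)
    {
            int i;

            /* Pick the smallest supported rate that satisfies the request. */
            for (i = 0; i < ARRAY_SIZE(clkout_rates); i++) {
                    if (clkout_rates[i] >= req->rate) {
                            req->rate = clkout_rates[i];
                            return 0;
                    }
            }

            /* Request is above the maximum: clamp to the highest rate. */
            req->rate = clkout_rates[ARRAY_SIZE(clkout_rates) - 1];
            return 0;
    }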
Brian Masney [Thu, 10 Jul 2025 15:20:34 +0000 (11:20 -0400)]
rtc: rv3028: convert from round_rate() to determine_rate()
The round_rate() clk ops is deprecated, so migrate this driver from
round_rate() to determine_rate() using the Coccinelle semantic patch
on the cover letter of this series.
Brian Masney [Thu, 10 Jul 2025 15:20:33 +0000 (11:20 -0400)]
rtc: pcf8563: convert from round_rate() to determine_rate()
The round_rate() clk ops is deprecated, so migrate this driver from
round_rate() to determine_rate() using the Coccinelle semantic patch
on the cover letter of this series.
Brian Masney [Thu, 10 Jul 2025 15:20:32 +0000 (11:20 -0400)]
rtc: pcf85063: convert from round_rate() to determine_rate()
The round_rate() clk ops is deprecated, so migrate this driver from
round_rate() to determine_rate() using the Coccinelle semantic patch
on the cover letter of this series.
Brian Masney [Thu, 10 Jul 2025 15:20:31 +0000 (11:20 -0400)]
rtc: nct3018y: convert from round_rate() to determine_rate()
The round_rate() clk ops is deprecated, so migrate this driver from
round_rate() to determine_rate() using the Coccinelle semantic patch
on the cover letter of this series.
Brian Masney [Thu, 10 Jul 2025 15:20:30 +0000 (11:20 -0400)]
rtc: max31335: convert from round_rate() to determine_rate()
The round_rate() clk ops is deprecated, so migrate this driver from
round_rate() to determine_rate() using the Coccinelle semantic patch
on the cover letter of this series.
Brian Masney [Thu, 10 Jul 2025 15:20:29 +0000 (11:20 -0400)]
rtc: m41t80: convert from round_rate() to determine_rate()
The round_rate() clk ops is deprecated, so migrate this driver from
round_rate() to determine_rate() using the Coccinelle semantic patch
on the cover letter of this series.
Brian Masney [Thu, 10 Jul 2025 15:20:28 +0000 (11:20 -0400)]
rtc: hym8563: convert from round_rate() to determine_rate()
The round_rate() clk ops is deprecated, so migrate this driver from
round_rate() to determine_rate() using the Coccinelle semantic patch
on the cover letter of this series.
Brian Masney [Thu, 10 Jul 2025 15:20:27 +0000 (11:20 -0400)]
rtc: ds1307: convert from round_rate() to determine_rate()
The round_rate() clk ops is deprecated, so migrate this driver from
round_rate() to determine_rate() using the Coccinelle semantic patch
on the cover letter of this series.
Brian Masney [Thu, 10 Jul 2025 15:20:26 +0000 (11:20 -0400)]
rtc: rv3028: fix incorrect maximum clock rate handling
When rv3028_clkout_round_rate() is called with a requested rate higher
than the highest supported rate, it currently returns 0, which disables
the clock. According to the clk API, round_rate() should instead return
the highest supported rate. Update the function to return the maximum
supported rate in this case.
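A minimal sketch of the behavior described above (illustrative table and
function, not the driver's actual code); the same pattern applies to the
other round_rate() fixes below:
    #include <linux/clk-provider.h>
    #include <linux/kernel.h>

    static const unsigned long clkout_rates[] = { 1, 32, 1024, 32768 };

    static long clkout_round_rate(struct clk_hw *hw, unsigned long rate,
                                  unsigned long *prate)
    {
            int i;

            for (i = 0; i < ARRAY_SIZE(clkout_rates); i++)
                    if (clkout_rates[i] >= rate)
                            return clkout_rates[i];

            /*
             * Previously this path returned 0, which disables the clock.
             * Return the highest supported rate instead.
             */
            return clkout_rates[ARRAY_SIZE(clkout_rates) - 1];
    }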
Brian Masney [Thu, 10 Jul 2025 15:20:25 +0000 (11:20 -0400)]
rtc: pcf8563: fix incorrect maximum clock rate handling
When pcf8563_clkout_round_rate() is called with a requested rate higher
than the highest supported rate, it currently returns 0, which disables
the clock. According to the clk API, round_rate() should instead return
the highest supported rate. Update the function to return the maximum
supported rate in this case.
Brian Masney [Thu, 10 Jul 2025 15:20:24 +0000 (11:20 -0400)]
rtc: pcf85063: fix incorrect maximum clock rate handling
When pcf85063_clkout_round_rate() is called with a requested rate higher
than the highest supported rate, it currently returns 0, which disables
the clock. According to the clk API, round_rate() should instead return
the highest supported rate. Update the function to return the maximum
supported rate in this case.
Brian Masney [Thu, 10 Jul 2025 15:20:23 +0000 (11:20 -0400)]
rtc: nct3018y: fix incorrect maximum clock rate handling
When nct3018y_clkout_round_rate() is called with a requested rate higher
than the highest supported rate, it currently returns 0, which disables
the clock. According to the clk API, round_rate() should instead return
the highest supported rate. Update the function to return the maximum
supported rate in this case.
Brian Masney [Thu, 10 Jul 2025 15:20:22 +0000 (11:20 -0400)]
rtc: hym8563: fix incorrect maximum clock rate handling
When hym8563_clkout_round_rate() is called with a requested rate higher
than the highest supported rate, it currently returns 0, which disables
the clock. According to the clk API, round_rate() should instead return
the highest supported rate. Update the function to return the maximum
supported rate in this case.
Brian Masney [Thu, 10 Jul 2025 15:20:21 +0000 (11:20 -0400)]
rtc: ds1307: fix incorrect maximum clock rate handling
When ds3231_clk_sqw_round_rate() is called with a requested rate higher
than the highest supported rate, it currently returns 0, which disables
the clock. According to the clk API, round_rate() should instead return
the highest supported rate. Update the function to return the maximum
supported rate in this case.
Yadan Fan [Thu, 31 Jul 2025 18:14:45 +0000 (02:14 +0800)]
mm: mempool: fix crash in mempool_free() for zero-minimum pools
The mempool wake-up fix introduced in commit a5867a218d7c ("mm: mempool:
fix wake-up edge case bug for zero-minimum pools") inlined the
add_element() logic in mempool_free() to return the element to the
zero-minimum pool:
pool->elements[pool->curr_nr++] = element;
This causes a crash because mempool_init_node() does not do a real
allocation for a zero-minimum pool; it only assigns ZERO_SIZE_PTR to the
elements array, which cannot be dereferenced, and the pre-allocation of
this array never happened because the while test:
while (pool->curr_nr < pool->min_nr)
can never be satisfied when min_nr is zero. So the pool does not actually
reserve any buffer; the only way so far is to call alloc_fn() to get a
buffer from SLUB, but if memory is under high pressure alloc_fn() may
never get a buffer, leaving the waiting thread in an indefinite
wake-sleep loop until free memory becomes available.
This patch changes mempool_init_node() to allocate 1 element for the
elements array of a zero-minimum pool, so that the pool has a reserved
buffer to use. This fixes the crash and lets the waiting thread get the
reserved element when alloc_fn() fails to get a buffer under high memory
pressure.
Also modify add_element() to support zero-minimum pools, simplifying the
zero-minimum handling in mempool_free().
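A minimal sketch of the idea (helper name and parameters are illustrative,
not the actual mempool code):
    #include <linux/slab.h>

    /*
     * Size the elements array for at least one entry, so that a
     * zero-minimum pool gets a real slot that mempool_free() can return
     * an element into, instead of the undereferenceable ZERO_SIZE_PTR.
     */
    static void **alloc_pool_elements(int min_nr, gfp_t gfp_mask, int node_id)
    {
            return kmalloc_array_node(min_nr ? min_nr : 1, sizeof(void *),
                                      gfp_mask, node_id);
    }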
Lorenzo Stoakes [Tue, 29 Jul 2025 11:49:06 +0000 (12:49 +0100)]
mm: correct type for vmalloc vm_flags fields
Several functions refer to the unfortunately named 'vm_flags' field when
referencing vmalloc flags, which happens to be the precise same name used
for VMA flags.
As a result these were erroneously changed to use the vm_flags_t type
(which currently is a typedef equivalent to unsigned long).
Currently this has no impact, but in the future, when vm_flags_t changes,
this will result in issues, so change the type to unsigned long to account
for this.
[lorenzo.stoakes@oracle.com: fixup very disguised vmalloc flags parameter] Link: https://lkml.kernel.org/r/e74dd8de-7e60-47ab-8a45-2c851f3c5d26@lucifer.local Link: https://lkml.kernel.org/r/20250729114906.55347-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reported-by: Harry Yoo <harry.yoo@oracle.com> Closes: https://lore.kernel.org/all/aIgSpAnU8EaIcqd9@hyeyoo/ Reviewed-by: Pedro Falcato <pfalcato@suse.de> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kairui Song [Mon, 28 Jul 2025 07:53:06 +0000 (15:53 +0800)]
mm/shmem, swap: fix major fault counting
If the swapin failed, don't update the major fault count. There is a
long-existing comment explaining why it was done this way; now, with the
previous cleanups, we can finally fix it.
Link: https://lkml.kernel.org/r/20250728075306.12704-9-ryncsn@gmail.com Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Chris Li <chrisl@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Nhat Pham <nphamcs@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kairui Song [Mon, 28 Jul 2025 07:53:05 +0000 (15:53 +0800)]
mm/shmem, swap: rework swap entry and index calculation for large swapin
Instead of calculating the swap entry differently in different swapin
paths, calculate it early before the swap cache lookup and use that for
the lookup and later swapin. And after swapin has brought in a folio,
simply round the swap entry down to the size of the folio.
This is simple and effective enough to verify the swap value. A folio's
swap entry is always aligned by its size. Any kind of parallel split or
race is acceptable because the final shmem_add_to_page_cache ensures that
all entries covered by the folio are correct, and thus there will be no
data corruption.
This also prevents false positive cache lookups. If a shmem read request's
index points to the middle of a large swap entry, previously, shmem will
try the swap cache lookup using the large swap entry's starting value
(which is the first sub swap entry of this large entry). This will lead
to false positive lookup results if only the first few swap entries are
cached but the actual requested swap entry pointed by the index is
uncached. This is not a rare event, as swap readahead always tries to
cache order 0 folios when possible.
And this shouldn't cause any increased repeated faults. Instead, no
matter how the shmem mapping is split in parallel, as long as the mapping
still contains the right entries, the swapin will succeed.
The final object size and stack usage are also reduced due to simplified
code:
./scripts/bloat-o-meter mm/shmem.o.old mm/shmem.o
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-145 (-145)
Function old new delta
shmem_swapin_folio 4056 3911 -145
Total: Before=33242, After=33097, chg -0.44%
And while at it, round down the index too if the swap entry is rounded down.
The index is used either for folio reallocation or confirming the mapping
content. In either case, it should be aligned with the swap folio.
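A conceptual sketch of the rounding described above (the helper is
illustrative, not the literal shmem change):
    #include <linux/mm.h>
    #include <linux/swap.h>

    /* Align the swap entry and the index to the size of the swapped-in
     * folio; folio_nr_pages() is always a power of two. */
    static void shmem_align_to_folio(struct folio *folio, swp_entry_t *entry,
                                     pgoff_t *index)
    {
            long nr = folio_nr_pages(folio);

            entry->val = round_down(entry->val, nr);
            *index = round_down(*index, nr);
    }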
Link: https://lkml.kernel.org/r/20250728075306.12704-8-ryncsn@gmail.com Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Chris Li <chrisl@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Nhat Pham <nphamcs@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kairui Song [Mon, 28 Jul 2025 07:53:04 +0000 (15:53 +0800)]
mm/shmem, swap: simplify swapin path and result handling
Slightly tidy up the swapin and error handling for SWP_SYNCHRONOUS_IO and
non-SWP_SYNCHRONOUS_IO devices. Now swapin will
always use either shmem_swap_alloc_folio or shmem_swapin_cluster, then
check the result.
Simplify the control flow and avoid a redundant goto label.
Link: https://lkml.kernel.org/r/20250728075306.12704-7-ryncsn@gmail.com Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Chris Li <chrisl@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Nhat Pham <nphamcs@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kairui Song [Mon, 28 Jul 2025 07:53:03 +0000 (15:53 +0800)]
mm/shmem, swap: never use swap cache and readahead for SWP_SYNCHRONOUS_IO
For SWP_SYNCHRONOUS_IO devices, if a cache-bypassing THP swapin fails due
to reasons like memory pressure, a partially conflicting swap cache or
ZSWAP being enabled, shmem will fall back to cached order 0 swapin.
Right now the swap cache still has a non-trivial overhead, and readahead
is not helpful for SWP_SYNCHRONOUS_IO devices, so we should always skip
the readahead and swap cache even if the swapin falls back to order 0.
So handle the fallback logic without falling back to the cached read.
Link: https://lkml.kernel.org/r/20250728075306.12704-6-ryncsn@gmail.com Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Chris Li <chrisl@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Nhat Pham <nphamcs@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kairui Song [Mon, 28 Jul 2025 07:53:02 +0000 (15:53 +0800)]
mm/shmem, swap: tidy up swap entry splitting
Instead of keeping different paths that split the entry before the swapin
starts, move the entry splitting to after the swapin has put the folio in
the swap cache (or set the SWAP_HAS_CACHE bit). This way we only need one
place and one unified way to split the large entry. Whenever swapin
brings in a folio smaller than the shmem swap entry, split the entry and
recalculate the entry and index for verification.
This removes duplicated code and function calls, reduces LOC, and the
split is less racy as it is now guarded by the swap cache, so there is a
lower chance of repeated faults due to a raced split. The compiler is also
able to optimize the code further:
bloat-o-meter results with GCC 14:
With DEBUG_SECTION_MISMATCH (-fno-inline-functions-called-once):
./scripts/bloat-o-meter mm/shmem.o.old mm/shmem.o
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-143 (-143)
Function old new delta
shmem_swapin_folio 2358 2215 -143
Total: Before=32933, After=32790, chg -0.43%
With !DEBUG_SECTION_MISMATCH:
add/remove: 0/1 grow/shrink: 1/0 up/down: 1069/-749 (320)
Function old new delta
shmem_swapin_folio 2871 3940 +1069
shmem_split_large_entry.isra 749 - -749
Total: Before=32806, After=33126, chg +0.98%
Since shmem_split_large_entry() is now only called in one place, the
compiler will either generate more compact code or inline it for better
performance.
Link: https://lkml.kernel.org/r/20250728075306.12704-5-ryncsn@gmail.com Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Chris Li <chrisl@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Nhat Pham <nphamcs@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kairui Song [Mon, 28 Jul 2025 07:53:01 +0000 (15:53 +0800)]
mm/shmem, swap: tidy up THP swapin checks
Move all THP swapin related checks under CONFIG_TRANSPARENT_HUGEPAGE, so
they will be trimmed off by the compiler if not needed.
And add a WARN if shmem sees an order > 0 entry when
CONFIG_TRANSPARENT_HUGEPAGE is disabled; that should never happen unless
things went very wrong.
There should be no observable feature change except the newly added WARN.
Link: https://lkml.kernel.org/r/20250728075306.12704-4-ryncsn@gmail.com Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Chris Li <chrisl@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Nhat Pham <nphamcs@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kairui Song [Mon, 28 Jul 2025 07:53:00 +0000 (15:53 +0800)]
mm/shmem, swap: avoid redundant Xarray lookup during swapin
Patch series "mm/shmem, swap: bugfix and improvement of mTHP swap in", v6.
The current THP swapin path has several problems. It may potentially
hang, may cause redundant faults due to false positive swap cache lookup,
and it issues redundant Xarray walks. !CONFIG_TRANSPARENT_HUGEPAGE builds
may also contain unnecessary THP checks.
This series fixes all of the mentioned issues, the code should be more
robust and prepared for the swap table series. Now 4 walks are reduced to
3 (get order & confirm, confirm, insert folio),
!CONFIG_TRANSPARENT_HUGEPAGE build overhead is also minimized, and comes
with a sanity check now.
The performance is slightly better after this series; sequential swapin
of 24G of data from ZRAM, using transparent_hugepage_tmpfs=always (24
samples each):
Before: avg: 10.66s, stddev: 0.04
After patch 1: avg: 10.58s, stddev: 0.04
After patch 2: avg: 10.65s, stddev: 0.05
After patch 3: avg: 10.65s, stddev: 0.04
After patch 4: avg: 10.67s, stddev: 0.04
After patch 5: avg: 9.79s, stddev: 0.04
After patch 6: avg: 9.79s, stddev: 0.05
After patch 7: avg: 9.78s, stddev: 0.05
After patch 8: avg: 9.79s, stddev: 0.04
Several patches each improve the performance a little; in total it is
about 8% faster.
A kernel build test showed a very slight improvement, testing with make -j48
with defconfig in a 768M memcg, also using ZRAM as swap, and
transparent_hugepage_tmpfs=always (6 test runs):
Before: avg: 3334.66s, stddev: 43.76
After patch 1: avg: 3349.77s, stddev: 18.55
After patch 2: avg: 3325.01s, stddev: 42.96
After patch 3: avg: 3354.58s, stddev: 14.62
After patch 4: avg: 3336.24s, stddev: 32.15
After patch 5: avg: 3325.13s, stddev: 22.14
After patch 6: avg: 3285.03s, stddev: 38.95
After patch 7: avg: 3287.32s, stddev: 26.37
After patch 8: avg: 3295.87s, stddev: 46.24
This patch (of 7):
Currently shmem calls xa_get_order to get the swap radix entry order,
requiring a full tree walk. This can be easily combined with the swap
entry value checking (shmem_confirm_swap) to avoid the duplicated lookup
and abort early if the entry is gone already, which should improve
performance.
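A hedged sketch of combining the two lookups in a single Xarray walk (the
helper below is illustrative and assumes the xas_load()/xas_get_order()
Xarray API, not the shmem patch itself):
    #include <linux/xarray.h>

    /* Return the order of the entry at @index if it still matches
     * @expected, or -1 if the entry is gone already. */
    static int check_swap_entry_order(struct xarray *xa, unsigned long index,
                                      void *expected)
    {
            XA_STATE(xas, xa, index);
            int order = -1;

            rcu_read_lock();
            if (xas_load(&xas) == expected)
                    order = xas_get_order(&xas);
            rcu_read_unlock();

            return order;
    }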
Link: https://lkml.kernel.org/r/20250728075306.12704-1-ryncsn@gmail.com Link: https://lkml.kernel.org/r/20250728075306.12704-3-ryncsn@gmail.com Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Chris Li <chrisl@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Nhat Pham <nphamcs@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations
For the most part ftrace uses text poking and can handle ROX memory. The
only place that requires writable memory is create_trampoline() that
updates the allocated memory and in the end makes it ROX.
Use execmem_alloc_rw() in x86::ftrace::alloc_tramp() and enable ROX cache
for EXECMEM_FTRACE when configuration and CPU features allow that.
Link: https://lkml.kernel.org/r/20250713071730.4117334-9-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Instead of overriding this method, add an EXECMEM_KPROBES entry in
execmem_info with pgprot set to PAGE_KERNEL_ROX and use the ROX cache when
configuration and CPU features allow it.
Link: https://lkml.kernel.org/r/20250713071730.4117334-8-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
execmem: drop writable parameter from execmem_fill_trapping_insns()
After the update of execmem_cache_free() that makes memory writable before
updating it, there is no need to update read-only memory, so the writable
parameter to execmem_fill_trapping_insns() is not needed. Drop it.
Link: https://lkml.kernel.org/r/20250713071730.4117334-7-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
execmem: add fallback for failures in vmalloc(VM_ALLOW_HUGE_VMAP)
When execmem populates the ROX cache it uses vmalloc(VM_ALLOW_HUGE_VMAP).
Although vmalloc falls back to allocating base pages if a high-order
allocation fails, it may happen that it still cannot allocate enough
memory.
Right now the ROX cache is only used by modules, and in the majority of
cases the allocations happen at boot time when there is plenty of free
memory, but the upcoming enabling of the ROX cache for ftrace and kprobes
means that execmem allocations can happen when the system is under memory
pressure, and a failure to allocate a large page worth of memory becomes
more likely.
Fall back to regular vmalloc() if vmalloc(VM_ALLOW_HUGE_VMAP) fails.
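A minimal sketch of the described fallback (function name and parameters
are simplified and illustrative, not the in-tree execmem code):
    #include <linux/gfp.h>
    #include <linux/vmalloc.h>

    static void *rox_cache_backing_alloc(unsigned long size)
    {
            /* Try a huge-page-backed mapping first. */
            void *p = vmalloc_huge(size, GFP_KERNEL);

            /* If that still fails, fall back to a regular mapping. */
            if (!p)
                    p = vmalloc(size);

            return p;
    }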
Link: https://lkml.kernel.org/r/20250713071730.4117334-6-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently execmem_cache_free() ignores potential allocation failures that
may happen in execmem_cache_add(). Besides, it uses text poking to fill
the memory with trapping instructions before returning it to cache
although it would be more efficient to make that memory writable, update
it using memcpy and then restore ROX protection.
Rework execmem_cache_free() so that in case of an error it will defer
freeing of the memory to a delayed work.
With this the happy fast path will now change permissions to RW, fill the
memory with trapping instructions using memcpy, restore ROX permissions,
add the memory back to the free cache and clear the relevant entry in
busy_areas.
If any step in the fast path fails, the entry in busy_areas will be marked
as pending_free. These entries will be handled by a delayed work and
freed asynchronously.
To make the fast path faster, use __GFP_NORETRY for memory allocations and
let asynchronous handler try harder with GFP_KERNEL.
Link: https://lkml.kernel.org/r/20250713071730.4117334-4-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Some callers of execmem_alloc() require the memory to be temporarily
writable even when it is allocated from the ROX cache. These callers use
execmem_make_temp_rw() right after the call to execmem_alloc().
Wrap this sequence in a new execmem_alloc_rw() API.
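A hedged sketch of what such a wrapper could look like (error handling and
exact behavior are assumed, not the actual implementation):
    #include <linux/execmem.h>

    void *execmem_alloc_rw(enum execmem_type type, size_t size)
    {
            void *p = execmem_alloc(type, size);

            /* Make ROX-cache memory temporarily writable for the caller. */
            if (p && execmem_make_temp_rw(p, size)) {
                    execmem_free(p);
                    return NULL;
            }

            return p;
    }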
Link: https://lkml.kernel.org/r/20250713071730.4117334-3-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Reviewed-by: Petr Pavlu <petr.pavlu@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "x86: enable EXECMEM_ROX_CACHE for ftrace and kprobes", v3.
These patches enable use of EXECMEM_ROX_CACHE for ftrace and kprobes
allocations on x86.
They also include some ground work in execmem.
Since the execmem model for caching large ROX pages changed from the
initial assumption that the memory that is allocated from ROX cache is
always ROX to the current state where memory can be temporarily made RW
and then restored to ROX, we can stop using text poking to update it.
This also saves the hassle of trying to lock text_mutex in
execmem_cache_free() when kprobes already holds that mutex.
This patch (of 8):
The execmem_update_copy() that used text poking was required when memory
allocated from the ROX cache was always read-only. Since its permissions
can now be switched to read-write, there is no need for a function that
updates memory with text poking.
mm: fix a UAF when vma->mm is freed after vma->vm_refcnt got dropped
By inducing delays in the right places, Jann Horn created a reproducer for
a hard to hit UAF issue that became possible after VMAs were allowed to be
recycled by adding SLAB_TYPESAFE_BY_RCU to their cache.
Race description is borrowed from Jann's discovery report:
lock_vma_under_rcu() looks up a VMA locklessly with mas_walk() under
rcu_read_lock(). At that point, the VMA may be concurrently freed, and it
can be recycled by another process. vma_start_read() then increments the
vma->vm_refcnt (if it is in an acceptable range), and if this succeeds,
vma_start_read() can return a recycled VMA.
In this scenario where the VMA has been recycled, lock_vma_under_rcu()
will then detect the mismatching ->vm_mm pointer and drop the VMA through
vma_end_read(), which calls vma_refcount_put(). vma_refcount_put() drops
the refcount and then calls rcuwait_wake_up() using a copy of vma->vm_mm.
This is wrong: It implicitly assumes that the caller is keeping the VMA's
mm alive, but in this scenario the caller has no relation to the VMA's mm,
so the rcuwait_wake_up() can cause UAF.
The diagram depicting the race:
T1         T2         T3
==         ==         ==
lock_vma_under_rcu
  mas_walk
          <VMA gets removed from mm>
                      mmap
                      <the same VMA is reallocated>
vma_start_read
  __refcount_inc_not_zero_limited_acquire
                      munmap
                      __vma_enter_locked
                        refcount_add_not_zero
vma_end_read
  vma_refcount_put
    __refcount_dec_and_test
                      rcuwait_wait_event
                      <finish operation>
    rcuwait_wake_up [UAF]
Note that rcuwait_wait_event() in T3 does not block because the refcount
was already dropped by T1. At this point T3 can exit and free the mm,
causing a UAF in T1.
To avoid this, we move the vma->vm_mm verification into vma_start_read()
and grab vma->vm_mm to stabilize it before the vma_refcount_put()
operation.
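A hedged sketch of the stabilization idea (placement and details are
assumed, not the actual patch):
    #include <linux/mm.h>
    #include <linux/sched/mm.h>

    /* Keep the mm alive across the refcount drop, so the rcuwait wake-up
     * done by vma_refcount_put() cannot touch a freed mm. */
    static void vma_drop_ref_stable(struct vm_area_struct *vma)
    {
            struct mm_struct *mm = vma->vm_mm;

            mmgrab(mm);
            vma_refcount_put(vma);
            mmdrop(mm);
    }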
If an anon folio is mapped into userspace, its anon_vma must be alive,
otherwise rmap walks can hit UAF.
There were syzkaller reports a few months ago[1][2] of UAFs in rmap walks
that seem to indicate that there can be pages with an elevated mapcount
whose anon_vma has already been freed, but I think we never figured out
the cause; and syzkaller only hit these UAFs when memory pressure randomly
caused reclaim to rmap-walk the affected pages, so of course it didn't
manage to create a reproducer.
Add a VM_WARN_ON_FOLIO() when we add/remove mappings of anonymous folios
to hopefully catch such issues more reliably.
Lorenzo Stoakes [Fri, 25 Jul 2025 14:29:01 +0000 (15:29 +0100)]
mm: remove mm/io-mapping.c
This is dead code; it was used by commit b739f125e4eb ("i915: use
io_mapping_map_user"), but that was reverted a month later by commit
0e4fe0c9f2f9 ("Revert "i915: use io_mapping_map_user"") back in 2021.
Since then nobody has used it, so remove it.
[akpm@linux-foundation.org: update Documentation/core-api/mm-api.rst, per Vlastimil] Link: https://lkml.kernel.org/r/20250725142901.81502-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dev Jain [Thu, 24 Jul 2025 05:23:01 +0000 (10:53 +0530)]
khugepaged: optimize collapse_pte_mapped_thp() by PTE batching
Use PTE batching to batch process PTEs mapping the same large folio. An
improvement is expected due to batching mapcount manipulation on the
folios, and for arm64 which supports contig mappings, the number of
TLB flushes is also reduced.
Note that we do not need to change the check
"if (folio_page(folio, i) != page)": if the i'th page of the folio is
equal to the first page of our batch, then pages i + 1, ...,
i + nr_batch_ptes - 1 of the folio will be equal to the corresponding
pages of our batch, which maps consecutive pages.
Link: https://lkml.kernel.org/r/20250724052301.23844-4-dev.jain@arm.com Signed-off-by: Dev Jain <dev.jain@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Barry Song <baohua@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dev Jain [Thu, 24 Jul 2025 05:23:00 +0000 (10:53 +0530)]
khugepaged: optimize __collapse_huge_page_copy_succeeded() by PTE batching
Use PTE batching to batch process PTEs mapping the same large folio. An
improvement is expected due to batching refcount-mapcount manipulation on
the folios, and for arm64 which supports contig mappings, the number of
TLB flushes is also reduced.
Link: https://lkml.kernel.org/r/20250724052301.23844-3-dev.jain@arm.com Signed-off-by: Dev Jain <dev.jain@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Barry Song <baohua@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
If the underlying folio mapped by the ptes is large, we can process those
ptes in a batch using folio_pte_batch().
For arm64 specifically, this results in a 16x reduction in the number of
ptep_get() calls, since on a contig block, ptep_get() on arm64 will
iterate through all 16 entries to collect a/d bits. Next, ptep_clear()
will cause a TLBI for every contig block in the range via
contpte_try_unfold(). Instead, use clear_ptes() to only do the TLBI at
the first and last contig block of the range.
For split folios, there will be no pte batching; the batch size returned
by folio_pte_batch() will be 1. For pagetable split folios, the ptes will
still point to the same large folio; for arm64, this results in the
optimization described above, and for other arches, a minor improvement is
expected due to a reduction in the number of function calls and the
batching of atomic operations.
This patch (of 3):
Let's add variants to be used where "full" does not apply -- which will
be the majority of cases in the future. "full" really only applies if
we are about to tear down a full MM.
Use get_and_clear_ptes() in existing code; clear_ptes() users will
be added next.
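A hedged sketch of what these variants presumably look like (assumed to be
thin wrappers over the existing *_full_ptes() helpers with full == 0; check
include/linux/pgtable.h for the real definitions):
    #include <linux/mm.h>
    #include <linux/pgtable.h>

    /* Batch-clear @nr PTEs when we are not tearing down the whole MM. */
    static inline pte_t get_and_clear_ptes(struct mm_struct *mm,
                    unsigned long addr, pte_t *ptep, unsigned int nr)
    {
            return get_and_clear_full_ptes(mm, addr, ptep, nr, /* full */ 0);
    }

    static inline void clear_ptes(struct mm_struct *mm, unsigned long addr,
                    pte_t *ptep, unsigned int nr)
    {
            clear_full_ptes(mm, addr, ptep, nr, /* full */ 0);
    }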
Link: https://lkml.kernel.org/r/20250724052301.23844-2-dev.jain@arm.com Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Barry Song <baohua@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>