Linus Torvalds [Fri, 1 Aug 2025 17:29:36 +0000 (10:29 -0700)]
Merge tag 'trace-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:
- Deprecate auto-mounting tracefs to /sys/kernel/debug/tracing
When tracefs was first introduced back in 2014, the directory
/sys/kernel/tracing was added and is the designated location to mount
tracefs. To keep backward compatibility, tracefs was auto-mounted in
/sys/kernel/debug/tracing as well.
All distros now mount tracefs on /sys/kernel/tracing. Having it seen
in two different locations has lead to various issues and
inconsistencies.
The VFS folks have to also maintain debugfs_create_automount() for
this single user.
It's been over 10 years. Tooling and scripts should start replacing
the debugfs location with the tracefs one. The reason tracefs was
created in the first place was to allow access to the tracing
facilities without the need to configure debugfs into the kernel.
Using tracefs should now be more robust.
A new config is created: CONFIG_TRACEFS_AUTOMOUNT_DEPRECATED which is
default y, so that the kernel is still built with the automount. This
config allows those that want to remove the automount from debugfs to
do so.
When tracefs is accessed from /sys/kernel/debug/tracing, the
following printk is triggerd:
pr_warn("NOTICE: Automounting of tracing to debugfs is deprecated and will be removed in 2030\n");
This gives users another 5 years to fix their scripts.
- Use queue_rcu_work() instead of call_rcu() for freeing event filters
The number of filters to be free can be many depending on the number
of events within an event system. Freeing them from softirq context
can potentially cause undesired latency. Use the RCU workqueue to
free them instead.
- Remove pointless memory barriers in latency code
Memory barriers were added to some of the latency code a long time
ago with the idea of "making them visible", but that's not what
memory barriers are for. They are to synchronize access between
different variables. There was no synchronization here making them
pointless.
- Remove "__attribute__()" from the type field of event format
When LLVM is used to compile the kernel with CONFIG_DEBUG_INFO_BTF=y
and PAHOLE_HAS_BTF_TAG=y, some of the format fields get expanded with
the following:
This confuses parsers. Add code to strip these tags from the strings.
- Add eprobe config option CONFIG_EPROBE_EVENTS
Eprobes were added back in 5.15 but were only enabled when another
probe was enabled (kprobe, fprobe, uprobe, etc). The eprobes had no
config option of their own. Add one as they should be a separate
entity.
It's default y to keep with the old kernels but still has
dependencies on TRACING and HAVE_REGS_AND_STACK_ACCESS_API.
- Add eprobe documentation
When eprobes were added back in 5.15 no documentation was added to
describe them. This needs to be rectified.
- Replace open coded cpumask_next_wrap() in move_to_next_cpu()
- Have preemptirq_delay_run() use off-stack CPU mask
- Remove obsolete comment about pelt_cfs event
DECLARE_TRACE() appends "_tp" to trace events now, but the comment
above pelt_cfs still mentioned appending it manually.
- Remove EVENT_FILE_FL_SOFT_MODE flag
The SOFT_MODE flag was required when the soft enabling and disabling
of trace events was first introduced. But there was a bug with this
approach as it only worked for a single instance. When multiple users
required soft disabling and disabling the code was changed to have a
ref count. The SOFT_MODE flag is now set iff the ref count is non
zero. This is redundant and just reading the ref count is good
enough.
- Fix typo in comment
* tag 'trace-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
Documentation: tracing: Add documentation about eprobes
tracing: Have eprobes have their own config option
tracing: Remove "__attribute__()" from the type field of event format
tracing: Deprecate auto-mounting tracefs in debugfs
tracing: Fix comment in trace_module_remove_events()
tracing: Remove EVENT_FILE_FL_SOFT_MODE flag
tracing: Remove pointless memory barriers
tracing/sched: Remove obsolete comment on suffixes
kernel: trace: preemptirq_delay_test: use offstack cpu mask
tracing: Use queue_rcu_work() to free filters
tracing: Replace opencoded cpumask_next_wrap() in move_to_next_cpu()
Linus Torvalds [Fri, 1 Aug 2025 17:23:13 +0000 (10:23 -0700)]
Merge tag 'trace-tools-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing tools updates from Steven Rostedt:
- Introduce enum timerlat_tracing_mode
Now that BPF based sampling has been added to timerlat, add an enum
to represent which mode timerlat is running in
- Add action on timelat threshold feature
A new option, --on-threshold, is added, taking an argument that
further specifies the action. Actions added in this patch are:
- trace[,file=<filename>]: Saves tracefs buffer, optionally taking a
filename
- signal,num=<sig>,pid=<pid>: Sends signal to process. "parent" might
be specified instead of number to send signal to parent process
- shell,command=<command>: Execute shell command
- Allow resuming tracing in timerlat bpf
rtla-timerlat BPF program uses a global variable stored in a .bss
section to store whether tracing has been stopped. Map it to allow it
to resume tracing after it has been stopped
- Add continue action to timerlat
Introduce option to resume tracing after a latency threshold
overflow. The option is implemented as an action named "continue"
- Add action on end feature to timerlat
Implement actions on end next to actions on threshold. A new option,
--on-end is added, parallel to --on-threshold. Instead of being
executed whenever a latency threshold is reached, it is executed at
the end of the measurement
- Have rtla tests check output with grep
Add argument to the check command in the test suite that takes a
regular expression that the output of rtla command is checked
against. This allows testing for specific information in rtla output
in addition to checking the return value
- Add tests for timerlat actions
- Update the documentation for the new features
* tag 'trace-tools-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
rtla/tests: Test timerlat -P option using actions
rtla/tests: Add grep checks for base test cases
Documentation/rtla: Add actions feature
rtla/tests: Limit duration to maximum of 10s
rtla/tests: Add tests for actions
rtla/tests: Check rtla output with grep
rtla/timerlat: Add action on end feature
rtla/timerlat: Add continue action
rtla/timerlat_bpf: Allow resuming tracing
rtla/timerlat: Add action on threshold feature
rtla/timerlat: Introduce enum timerlat_tracing_mode
Linus Torvalds [Fri, 1 Aug 2025 16:46:24 +0000 (09:46 -0700)]
Merge tag 'trace-deferred-unwind-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull initial deferred unwind infrastructure from Steven Rostedt:
"This is the core infrastructure for the deferred unwinder that is
required for sframes[1]. Several other patch series are based on this
work although those patch series are not dependent on each other. In
order to simplify the development, having this core series upstream
will allow the other series to be worked on in parallel. The other
series are:
- The two patches to implement x86 support [2] [3]
- The s390 work [4]
- The perf work [5]
- The ftrace work [6]
- The sframe work [7]
And more is on the way.
The core infrastructure adds the following in kernel APIs:
- int unwind_user_faultable(struct unwind_stacktrace *trace);
Performs a user space stack trace that may fault user pages in.
- int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func);
Allows a tracer to register with the unwind deferred
infrastructure.
- int unwind_deferred_request(struct unwind_work *work, u64 *cookie);
Used when a tracer request a deferred trace. Can be called from
interrupt or NMI context.
Linus Torvalds [Fri, 1 Aug 2025 04:47:36 +0000 (21:47 -0700)]
Merge tag 'drm-next-2025-08-01' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Just a bunch of amdgpu and xe fixes.
amdgpu:
- DSC divide by 0 fix
- clang fix
- DC debugfs fix
- Userq fixes
- Avoid extra evict-restore with KFD
- Backlight fix
- Documentation fix
- RAS fix
- Add new kicker handling
- DSC fix for DCN 3.1.4
- PSR fix
- Atomic fix
- DC reset fixes
- DCN 3.0.1 fix
- MMHUB client mapping fix
xe:
- Fix BMG probe on unsupported mailbox command
- Fix OA static checker warning about null gt
- Fix a NULL vs IS_ERR() bug in xe_i2c_register_adapter
- Fix missing unwind goto in GuC/HuC
- Don't register I2C devices if VF
- Clear whole GuC g2h_fence during initialization
- Avoid call kfree for drmm_kzalloc
- Fix pci_dev reference leak on configfs
- SRIOV: Disable CSC support on VF
* tag 'drm-next-2025-08-01' of https://gitlab.freedesktop.org/drm/kernel: (24 commits)
drm/xe/vf: Disable CSC support on VF
drm/amdgpu: update mmhub 4.1.0 client id mappings
drm/amd/display: Allow DCN301 to clear update flags
drm/amd/display: Pass up errors for reset GPU that fails to init HW
drm/amd/display: Only finalize atomic_obj if it was initialized
drm/amd/display: Avoid configuring PSR granularity if PSR-SU not supported
drm/amd/display: Disable dsc_power_gate for dcn314 by default
drm/amdgpu: add kicker fws loading for gfx12/smu14/psp14
drm/amd/amdgpu: fix missing lock for cper.ring->rptr/wptr access
drm/amd/display: Fix misuse of /** to /* in 'dce_i2c_hw.c'
drm/amd/display: fix initial backlight brightness calculation
drm/amdgpu: Avoid extra evict-restore process.
drm/amdgpu: track whether a queue is a kernel queue in amdgpu_mqd_prop
drm/amdgpu: check if hubbub is NULL in debugfs/amdgpu_dm_capabilities
drm/amdgpu: Initialize data to NULL in imu_v12_0_program_rlc_ram()
drm/amd/display: Fix divide by zero when calculating min ODM factor
drm/xe/configfs: Fix pci_dev reference leak
drm/xe/hw_engine_group: Avoid call kfree() for drmm_kzalloc()
drm/xe/guc: Clear whole g2h_fence during initialization
drm/xe/vf: Don't register I2C devices if VF
...
Linus Torvalds [Fri, 1 Aug 2025 04:39:01 +0000 (21:39 -0700)]
Merge tag 'for-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
Pull power supply and reset updates from Sebastian Reichel:
"Power-supply core:
- battery-info: replace any DT specific bits with fwnode usage
- replace any device-tree code with generic fwnode based handling
Power-supply drivers:
- ug3105_battery: use battery-info API
- qcom_battmgr: report capacity
- qcom_battmgr: support LiPo battery reporting
- add missing missing power-supply ref to a bunch of DT bindings
- update drivers regarding pm_runtime_autosuspend() usage
- misc minor fixes and cleanups
Reset drivers:
- misc minor cleanups"
* tag 'for-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (32 commits)
power: supply: core: fix static checker warning
power: supply: twl4030_charger: Remove redundant pm_runtime_mark_last_busy() calls
power: supply: bq24190: Remove redundant pm_runtime_mark_last_busy() calls
MAINTAINERS: rectify file entry in QUALCOMM SMB CHARGER DRIVER
power: supply: max1720x correct capacity computation
MAINTAINERS: add myself as smbx charger driver maintainer
power: supply: pmi8998_charger: rename to qcom_smbx
power: supply: qcom_pmi8998_charger: fix wakeirq
power: supply: max14577: Handle NULL pdata when CONFIG_OF is not set
power: return the correct error code
power: reset: POWER_RESET_TORADEX_EC should depend on ARCH_MXC
power: supply: cpcap-charger: Fix null check for power_supply_get_by_name
power: supply: bq25980_charger: Constify reg_default array
power: supply: bq256xx_charger: Constify reg_default array
power: reset: at91-sama5d2_shdwc: Refactor wake-up source logging to use dev_info
power: reset: qcom-pon: Rename variables to use generic naming
power: supply: qcom_battmgr: Add lithium-polymer entry
power: supply: qcom_battmgr: Report battery capacity
power: supply: bq24190: Free battery_info
power: supply: ug3105_battery: Switch to power_supply_batinfo_ocv2cap()
...
Linus Torvalds [Fri, 1 Aug 2025 04:26:05 +0000 (21:26 -0700)]
Merge tag 'hid-for-linus-2025073101' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID updates from Jiri Kosina:
- hardening of HID core parser against conversion to 0 bits in s32ton()
by buggy/malicious devices (Alan Stern)
- fix for potential NULL pointer dereference in hid-apple that could be
caused by malicious device with APPLE_MAGIC_BACKLIGHT quirk present
triggering overflow in data field (Qasim Ijaz)
- support for Wake-on-touch in intel-thc (Even Xu)
- support for "Input max input size control" and "Input interrupt
delay" I2C features in order to improve compatibility of THC devices
with legacy HIDI2C touch devices (Even Xu)
- support for Touch Bars on x86 MacBook Pros (Kerem Karabay)
- support for XP-PEN Artist 22R Pro (Joshua Goins)
- third party trackpart support for MacBookPro15,1 (Aditya Garg)
- Apple Magic Keyboard A311[89] USB-C support (Aditya Garg, Grigorii
Sokoli)
- support for operating modes in amd-sfh (Basavaraj Natikar)
- avoid setting up battery timer for Apple and Magicmouse devices
without battery (Aditya Garg)
- fix for behavior of the hid-mcp2221 driver for !CONFIG_IIO cases
(Heiko Schocher)
- other assorted fixups and device ID additions
* tag 'hid-for-linus-2025073101' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (54 commits)
HID: core: Harden s32ton() against conversion to 0 bits
HID: apple: validate feature-report field count to prevent NULL pointer dereference
HID: core: Improve the kerneldoc for hid_report_len()
selftests/hid: sync python tests to hid-tools 0.10
selftests/hid: sync the python tests to hid-tools 0.8
selftests/hid: run ruff format on the python part
HID: magicmouse: use secs_to_jiffies() for battery timeout
HID: apple: use secs_to_jiffies() for battery timeout
HID: magicmouse: avoid setting up battery timer when not needed
HID: apple: avoid setting up battery timer for devices without battery
HID: amd_sfh: Enable operating mode
HID: uclogic: Add support for XP-PEN Artist 22R Pro
HID: rate-limit hid_warn to prevent log flooding
HID: replace scnprintf() with sysfs_emit()
HID: uclogic: make read-only array reconnect_event static const
HID: mcp-2221: Replace manual comparison with min() macro
HID: intel-thc-hid: Separate max input size control conditional list
HID: mcp2221: set gpio pin mode
HID: multitouch: add device ID for Apple Touch Bar
HID: multitouch: specify that Apple Touch Bar is direct
...
Linus Torvalds [Fri, 1 Aug 2025 04:22:04 +0000 (21:22 -0700)]
Merge tag 'v6.17-rc-part1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client updates from Steve French:
- Fix network namespace refcount leak
- Multichannel reconnect fix
- Perf improvement to not do unneeded EA query on native symlinks
- Performance improvement for directory leases to allow extending lease
for actively queried directories
- Improve debugging of directory leases by adding pseudofile to show
them
- Five minor mount cleanup patches
- Minor directory lease cleanup patch
- Allow creating special files via reparse points over SMB1
- Two minor improvements to FindFirst over SMB1
- Two NTLMSSP session setup fixes
* tag 'v6.17-rc-part1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb3 client: add way to show directory leases for improved debugging
smb: client: get rid of kstrdup() when parsing iocharset mount option
smb: client: get rid of kstrdup() when parsing domain mount option
smb: client: get rid of kstrdup() when parsing pass2 mount option
smb: client: get rid of kstrdup() when parsing pass mount option
smb: client: get rid of kstrdup() when parsing user mount option
cifs: Add support for creating reparse points over SMB1
cifs: Do not query WSL EAs for native SMB symlink
cifs: Optimize CIFSFindFirst() response when not searching
cifs: Fix calling CIFSFindFirst() for root path without msearch
smb: client: fix session setup against servers that require SPN
smb: client: allow parsing zero-length AV pairs
cifs: add new field to track the last access time of cfid
smb: change return type of cached_dir_lease_break() to bool
cifs: reset iface weights when we cannot find a candidate
smb: client: fix netns refcount leak after net_passive changes
Merge tag 'bitmap-for-6.17' of https://github.com/norov/linux
Pull bitmap updates from Yury Norov:
- find_random_bit() series (Yury)
- GENMASK() consolidation (Vincent)
- random cleanups (Shaopeng, Ben, Yury)
* tag 'bitmap-for-6.17' of https://github.com/norov/linux:
bitfield: Ensure the return values of helper functions are checked
test_bits: add tests for __GENMASK() and __GENMASK_ULL()
bits: unify the non-asm GENMASK*()
bits: split the definition of the asm and non-asm GENMASK*()
cpumask: Remove unnecessary cpumask_nth_andnot()
watchdog: fix opencoded cpumask_next_wrap() in watchdog_next_cpu()
clocksource: Improve randomness in clocksource_verify_choose_cpus()
cpumask: introduce cpumask_random()
bitmap: generalize node_random()
Merge tag 'sched_ext-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext updates from Tejun Heo:
- Add support for cgroup "cpu.max" interface
- Code organization cleanup so that ext_idle.c doesn't depend on the
source-file-inclusion build method of sched/
- Drop UP paths in accordance with sched core changes
- Documentation and other misc changes
* tag 'sched_ext-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: Fix scx_bpf_reenqueue_local() reference
sched_ext: Drop kfuncs marked for removal in 6.15
sched_ext, rcu: Eject BPF scheduler on RCU CPU stall panic
kernel/sched/ext.c: fix typo "occured" -> "occurred" in comments
sched_ext: Add support for cgroup bandwidth control interface
sched_ext, sched/core: Factor out struct scx_task_group
sched_ext: Return NULL in llc_span
sched_ext: Always use SMP versions in kernel/sched/ext_idle.h
sched_ext: Always use SMP versions in kernel/sched/ext_idle.c
sched_ext: Always use SMP versions in kernel/sched/ext.h
sched_ext: Always use SMP versions in kernel/sched/ext.c
sched_ext: Documentation: Clarify time slice handling in task lifecycle
sched_ext: Make scx_locked_rq() inline
sched_ext: Make scx_rq_bypassing() inline
sched_ext: idle: Make local functions static in ext_idle.c
sched_ext: idle: Remove unnecessary ifdef in scx_bpf_cpu_node()
Merge tag 'cgroup-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- Allow css_rstat_updated() in NMI context to enable memory accounting
for allocations in NMI context.
- /proc/cgroups doesn't contain useful information for cgroup2 and was
updated to only show v1 controllers. This unfortunately broke
something in the wild. Add an option to bring back the old behavior
to ease transition.
- selftest updates and other cleanups.
* tag 'cgroup-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: Add compatibility option for content of /proc/cgroups
selftests/cgroup: fix cpu.max tests
cgroup: llist: avoid memory tears for llist_node
selftests: cgroup: Fix missing newline in test_zswap_writeback_one
selftests: cgroup: Allow longer timeout for kmem_dead_cgroups cleanup
memcg: cgroup: call css_rstat_updated irrespective of in_nmi()
cgroup: remove per-cpu per-subsystem locks
cgroup: make css_rstat_updated nmi safe
cgroup: support to enable nmi-safe css_rstat_updated
selftests: cgroup: Fix compilation on pre-cgroupns kernels
selftests: cgroup: Optionally set up v1 environment
selftests: cgroup: Add support for named v1 hierarchies in test_core
selftests: cgroup_util: Add helpers for testing named v1 hierarchies
Documentation: cgroup: add section explaining controller availability
cgroup: Drop sock_cgroup_classid() dummy implementation
Merge tag 'wq-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
- Prepare for defaulting to unbound workqueue. A separate branch was
created to ease pulling in from other trees but none of the
conversions have landed yet
- Memory allocation profiling support added
- Misc changes
* tag 'wq-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Use atomic_try_cmpxchg_relaxed() in tryinc_node_nr_active()
workqueue: Remove unused work_on_cpu_safe
workqueue: Add new WQ_PERCPU flag
workqueue: Add system_percpu_wq and system_dfl_wq
workqueue: Basic memory allocation profiling support
workqueue: fix opencoded cpumask_next_and_wrap() in wq_select_unbound_cpu()
Merge tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"As usual, many cleanups. The below blurbiage describes 42 patchsets.
21 of those are partially or fully cleanup work. "cleans up",
"cleanup", "maintainability", "rationalizes", etc.
I never knew the MM code was so dirty.
"mm: ksm: prevent KSM from breaking merging of new VMAs" (Lorenzo Stoakes)
addresses an issue with KSM's PR_SET_MEMORY_MERGE mode: newly
mapped VMAs were not eligible for merging with existing adjacent
VMAs.
"mm/damon: introduce DAMON_STAT for simple and practical access monitoring" (SeongJae Park)
adds a new kernel module which simplifies the setup and usage of
DAMON in production environments.
"stop passing a writeback_control to swap/shmem writeout" (Christoph Hellwig)
is a cleanup to the writeback code which removes a couple of
pointers from struct writeback_control.
"drivers/base/node.c: optimization and cleanups" (Donet Tom)
contains largely uncorrelated cleanups to the NUMA node setup and
management code.
"mm: userfaultfd: assorted fixes and cleanups" (Tal Zussman)
does some maintenance work on the userfaultfd code.
"Readahead tweaks for larger folios" (Ryan Roberts)
implements some tuneups for pagecache readahead when it is reading
into order>0 folios.
"selftests/mm: Tweaks to the cow test" (Mark Brown)
provides some cleanups and consistency improvements to the
selftests code.
"Optimize mremap() for large folios" (Dev Jain)
does that. A 37% reduction in execution time was measured in a
memset+mremap+munmap microbenchmark.
"Remove zero_user()" (Matthew Wilcox)
expunges zero_user() in favor of the more modern memzero_page().
"mm/huge_memory: vmf_insert_folio_*() and vmf_insert_pfn_pud() fixes" (David Hildenbrand)
addresses some warts which David noticed in the huge page code.
These were not known to be causing any issues at this time.
"mm/damon: use alloc_migrate_target() for DAMOS_MIGRATE_{HOT,COLD" (SeongJae Park)
provides some cleanup and consolidation work in DAMON.
"use vm_flags_t consistently" (Lorenzo Stoakes)
uses vm_flags_t in places where we were inappropriately using other
types.
"mm/memfd: Reserve hugetlb folios before allocation" (Vivek Kasireddy)
increases the reliability of large page allocation in the memfd
code.
"mm: Remove pXX_devmap page table bit and pfn_t type" (Alistair Popple)
removes several now-unneeded PFN_* flags.
"mm/damon: decouple sysfs from core" (SeongJae Park)
implememnts some cleanup and maintainability work in the DAMON
sysfs layer.
"madvise cleanup" (Lorenzo Stoakes)
does quite a lot of cleanup/maintenance work in the madvise() code.
"madvise anon_name cleanups" (Vlastimil Babka)
provides additional cleanups on top or Lorenzo's effort.
"Implement numa node notifier" (Oscar Salvador)
creates a standalone notifier for NUMA node memory state changes.
Previously these were lumped under the more general memory
on/offline notifier.
"Make MIGRATE_ISOLATE a standalone bit" (Zi Yan)
cleans up the pageblock isolation code and fixes a potential issue
which doesn't seem to cause any problems in practice.
"selftests/damon: add python and drgn based DAMON sysfs functionality tests" (SeongJae Park)
adds additional drgn- and python-based DAMON selftests which are
more comprehensive than the existing selftest suite.
"Misc rework on hugetlb faulting path" (Oscar Salvador)
fixes a rather obscure deadlock in the hugetlb fault code and
follows that fix with a series of cleanups.
"cma: factor out allocation logic from __cma_declare_contiguous_nid" (Mike Rapoport)
rationalizes and cleans up the highmem-specific code in the CMA
allocator.
"mm/migration: rework movable_ops page migration (part 1)" (David Hildenbrand)
provides cleanups and future-preparedness to the migration code.
"mm/damon: add trace events for auto-tuned monitoring intervals and DAMOS quota" (SeongJae Park)
adds some tracepoints to some DAMON auto-tuning code.
"mm/damon: fix misc bugs in DAMON modules" (SeongJae Park)
does that.
"mm/damon: misc cleanups" (SeongJae Park)
also does what it claims.
"mm: folio_pte_batch() improvements" (David Hildenbrand)
cleans up the large folio PTE batching code.
"mm/damon/vaddr: Allow interleaving in migrate_{hot,cold} actions" (SeongJae Park)
facilitates dynamic alteration of DAMON's inter-node allocation
policy.
"Remove unmap_and_put_page()" (Vishal Moola)
provides a couple of page->folio conversions.
"mm: per-node proactive reclaim" (Davidlohr Bueso)
implements a per-node control of proactive reclaim - beyond the
current memcg-based implementation.
"mm/damon: remove damon_callback" (SeongJae Park)
replaces the damon_callback interface with a more general and
powerful damon_call()+damos_walk() interface.
"mm/mremap: permit mremap() move of multiple VMAs" (Lorenzo Stoakes)
implements a number of mremap cleanups (of course) in preparation
for adding new mremap() functionality: newly permit the remapping
of multiple VMAs when the user is specifying MREMAP_FIXED. It still
excludes some specialized situations where this cannot be performed
reliably.
"drop hugetlb_free_pgd_range()" (Anthony Yznaga)
switches some sparc hugetlb code over to the generic version and
removes the thus-unneeded hugetlb_free_pgd_range().
"mm/damon/sysfs: support periodic and automated stats update" (SeongJae Park)
augments the present userspace-requested update of DAMON sysfs
monitoring files. Automatic update is now provided, along with a
tunable to control the update interval.
"Some randome fixes and cleanups to swapfile" (Kemeng Shi)
does what is claims.
"mm: introduce snapshot_page" (Luiz Capitulino and David Hildenbrand)
provides (and uses) a means by which debug-style functions can grab
a copy of a pageframe and inspect it locklessly without tripping
over the races inherent in operating on the live pageframe
directly.
"use per-vma locks for /proc/pid/maps reads" (Suren Baghdasaryan)
addresses the large contention issues which can be triggered by
reads from that procfs file. Latencies are reduced by more than
half in some situations. The series also introduces several new
selftests for the /proc/pid/maps interface.
"__folio_split() clean up" (Zi Yan)
cleans up __folio_split()!
"Optimize mprotect() for large folios" (Dev Jain)
provides some quite large (>3x) speedups to mprotect() when dealing
with large folios.
"selftests/mm: reuse FORCE_READ to replace "asm volatile("" : "+r" (XXX));" and some cleanup" (wang lian)
does some cleanup work in the selftests code.
"tools/testing: expand mremap testing" (Lorenzo Stoakes)
extends the mremap() selftest in several ways, including adding
more checking of Lorenzo's recently added "permit mremap() move of
multiple VMAs" feature.
"selftests/damon/sysfs.py: test all parameters" (SeongJae Park)
extends the DAMON sysfs interface selftest so that it tests all
possible user-requested parameters. Rather than the present minimal
subset"
* tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (370 commits)
MAINTAINERS: add missing headers to mempory policy & migration section
MAINTAINERS: add missing file to cgroup section
MAINTAINERS: add MM MISC section, add missing files to MISC and CORE
MAINTAINERS: add missing zsmalloc file
MAINTAINERS: add missing files to page alloc section
MAINTAINERS: add missing shrinker files
MAINTAINERS: move memremap.[ch] to hotplug section
MAINTAINERS: add missing mm_slot.h file THP section
MAINTAINERS: add missing interval_tree.c to memory mapping section
MAINTAINERS: add missing percpu-internal.h file to per-cpu section
mm/page_alloc: remove trace_mm_alloc_contig_migrate_range_info()
selftests/damon: introduce _common.sh to host shared function
selftests/damon/sysfs.py: test runtime reduction of DAMON parameters
selftests/damon/sysfs.py: test non-default parameters runtime commit
selftests/damon/sysfs.py: generalize DAMON context commit assertion
selftests/damon/sysfs.py: generalize monitoring attributes commit assertion
selftests/damon/sysfs.py: generalize DAMOS schemes commit assertion
selftests/damon/sysfs.py: test DAMOS filters commitment
selftests/damon/sysfs.py: generalize DAMOS scheme commit assertion
selftests/damon/sysfs.py: test DAMOS destinations commitment
...
- support for Wake-on-touch in intel-thc (Even Xu)
- support for "Input max input size control" and "Input interrupt delay"
I2C features in order to improve compatibility of THC devices with
legacy HIDI2C touch devices (Even Xu)
Merge tag 'mtd/for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull mtd updates from Miquel Raynal:
"MTD changes:
- Apart from a binding conversion to yaml, only minor changes/small
fixes have been merged.
Raw NAND changes:
- Minor fixes for various controller drivers like DMA mapping checks,
better timing derivations or bitflip statistics.
- some Hynix NAND flashes were not supporting read-retries, so don't
even try to do it
SPI NAND changes:
- In order to support high-speed modes, certain chips need extra
configuration like adding more dummy cycles. This is now possible,
especially on Winbond chips.
- Aside from that, Gigadevice gets support for a new chip (GD5F1GM9).
SPI NOR changes:
- A notable changes is the fix for exiting 4-byte addressing on
Infineon SEMPER flashes. These flashes do not support the standard
EX4B opcode (E9h), and use a vendor-specific opcode (B8h) instead.
- There is also a fix for unlocking flashes that are write-protected
at power-on. This was caused by using an uninitialized mtd_info in
spi_nor_try_unlock_all()"
* tag 'mtd/for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (26 commits)
mtd: spinand: winbond: Add comment about the maximum frequency
mtd: spinand: winbond: Enable high-speed modes on w35n0xjw
mtd: spinand: winbond: Enable high-speed modes on w25n0xjw
mtd: spinand: Add a ->configure_chip() hook
mtd: spinand: Add a frequency field to all READ_FROM_CACHE variants
mtd: spinand: Fix macro alignment
spi: spi-mem: Take into account the actual maximum frequency
spi: spi-mem: Use picoseconds for calculating the op durations
mtd: rawnand: atmel: set pmecc data setup time
mtd: spinand: propagate spinand_wait() errors from spinand_write_page()
mtd: rawnand: fsmc: Add missing check after DMA map
mtd: rawnand: rockchip: Add missing check after DMA map
mtd: rawnand: hynix: don't try read-retry on SLC NANDs
mtd: rawnand: atmel: Fix dma_mapping_error() address
mtd: nand: brcmnand: fix mtd corrected bits stat
mtd: rawnand: renesas: Add missing check after DMA map
mtd: spinand: gigadevice: Add support for GD5F1GM9 chips
mtd: nand: brcmnand: replace manual string choices with standard helpers
mtd: map: Don't use "proxy" headers
mtd: spi-nor: Fix spi_nor_try_unlock_all()
...
- fix for potential NULL pointer dereference in hid-apple that could be caused
by malicious device with APPLE_MAGIC_BACKLIGHT quirk present triggering
overflow in data field (Qasim Ijaz)
- third party trackpart support for MacBookPro15,1 (Aditya Garg)
- Apple Magic Keyboard A311[89] USB-C support (Aditya Garg, Grigorii Sokolik)
Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk updates from Stephen Boyd:
"This is the usual collection of primarily clk driver updates.
The big part of the diff is all the new Qualcomm clk drivers added for
a few SoCs they're working on. The other two vendors with significant
work this cycle are Renesas and Amlogic. Renesas adds a bunch of clks
to existing drivers and supports some new SoCs while Amlogic is
starting a significant refactoring to simplify their code.
The core framework gained a pair of helpers to get the 'struct device'
or 'struct device_node' associated with a 'struct clk_hw'. Some
associated KUnit tests were added for these simple helpers as well.
Beyond that core change there are lots of little fixes throughout the
clk drivers for the stuff we see every day, wrong clk driver data that
affects tree topology or supported frequencies, etc. They're not found
until the clks are actually used by some consumer device driver.
New Drivers:
- Global, display, gpu, video, camera, tcsr, and rpmh clock
controller for the Qualcomm Milos SoC
- Camera, display, GPU, and video clock controllers for Qualcomm
QCS615
- Video clock controller driver for Qualcomm SM6350
- Camera clock controller driver for Qualcomm SC8180X
- I3C clocks and resets on Renesas RZ/G3E
- Expanded Serial Peripheral Interface (xSPI) clocks and resets on
Renesas RZ/V2H(P) and RZ/V2N
- SPI (RSPI) clocks and resets on Renesas RZ/V2H(P)
- SDHI and I2C clocks on Renesas RZ/T2H and RZ/N2H
- Ethernet clocks and resets on Renesas RZ/G3E
- Initial support for the Renesas RZ/T2H (R9A09G077) and RZ/N2H
(R9A09G087) SoCs
- Ethernet clocks and resets on Renesas RZ/V2H and RZ/V2N
- Timer, I2C, watchdog, GPU, and USB2.0 clocks and resets on Renesas
RZ/V2N
Updates:
- Support atomic PWMs in the PWM clk driver
- clk_hw_get_dev() and clk_hw_get_of_node() helpers
- Replace round_rate() with determine_rate() in various clk drivers
- Convert clk DT bindings to DT schema format for DT validation
- Various clk driver cleanups and refactorings from static analysis
tools and possibly real humans
- A lot of little fixes here and there to things like clk tree
topology, missing frequencies, flagging clks as critical, etc"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (216 commits)
clk: clocking-wizard: Fix the round rate handling for versal
clk: Fix typos
clk: spacemit: ccu_pll: fix error return value in recalc_rate callback
clk: tegra: periph: Make tegra_clk_periph_ops static
clk: tegra: periph: Fix error handling and resolve unsigned compare warning
clk: imx: scu: convert from round_rate() to determine_rate()
clk: imx: pllv4: convert from round_rate() to determine_rate()
clk: imx: pllv3: convert from round_rate() to determine_rate()
clk: imx: pllv2: convert from round_rate() to determine_rate()
clk: imx: pll14xx: convert from round_rate() to determine_rate()
clk: imx: pfd: convert from round_rate() to determine_rate()
clk: imx: frac-pll: convert from round_rate() to determine_rate()
clk: imx: fracn-gppll: convert from round_rate() to determine_rate()
clk: imx: fixup-div: convert from round_rate() to determine_rate()
clk: imx: cpu: convert from round_rate() to determine_rate()
clk: imx: busy: convert from round_rate() to determine_rate()
clk: imx: composite-93: remove round_rate() in favor of determine_rate()
clk: imx: composite-8m: remove round_rate() in favor of determine_rate()
clk: qcom: Remove redundant pm_runtime_mark_last_busy() calls
clk: imx: Remove redundant pm_runtime_mark_last_busy() calls
...
Merge tag 'pwm/for-6.17-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux
Pull pwm fixes from Uwe Kleine-König:
"Two fixes for the mediatek and the imx-tpm driver. Both are old
(v4.12-rc1 and v5.2-rc1 respectively).
The mediatek issue is that both period and duty_cycle were configured
to higher values than requested. For most applications the period part
is no tragedy, but a PWM that is configured for duty_cycle = 0 should
really emit a constant inactive signal. That was noticed by an LED not
being completely off in this case (two commits for one fix: a
preparatory one and the actual fix in the second one).
For the imx-tpm PWM driver the fixed issue is that the first period is
quite a bit too long under some circumstances. So it might take up to
UINT32_MAX << 7 clock ticks until the PWM starts toggling. With an
assumed input clock rate of 166 MHz (completely made up) that's 55
minutes"
* tag 'pwm/for-6.17-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux:
pwm: imx-tpm: Reset counter if CMOD is 0
pwm: mediatek: Fix duty and period setting
pwm: mediatek: Handle hardware enable and clock enable separately
Miguel Ojeda [Thu, 31 Jul 2025 19:41:37 +0000 (21:41 +0200)]
gpu: nova-core: fix up formatting after merge
In the merge 260f6f4fda93 ("Merge tag 'drm-next-2025-07-30' of
https://gitlab.freedesktop.org/drm/kernel"), the formatting in the
conflict resolution doesn't match what `make rustfmt` wants to make it.
Fix it up appropriately.
Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'media/v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- v4l2 core:
- sub-device framework routing improvements
- NV12M tiled variants added to v4l2_format_info
- some fixes at control handler freeing logic
- fixed H264 SEPARATE_COLOUR_PLANE check
- new staging driver: Intel IPU7 PCI
- Rockchip video decoder driver got promoted from staging
- iris: added HEVC/VP9 encoder/decoder support
- vsp1: driver has gained Renesas VSPX support
- uvc:
- switched to vb2 ioctl helpers
- added MSXU 1.5 metadata support
- atomisp: GC0310 sensor driver cleanups in preparation for moving it
out of staging
- Lots of cleanup, fixes and improvements
* tag 'media/v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (310 commits)
media: rkvdec: Unstage the driver
media: rkvdec: Remove TODO file
media: dt-bindings: rockchip: Add RK3576 Video Decoder bindings
media: dt-bindings: rockchip: Document RK3588 Video Decoder bindings
media: amphion: Support dmabuf and v4l2 buffer without binding
media: verisilicon: postproc: 4K support
media: v4l2: Add support for NV12M tiled variants to v4l2_format_info()
media: uvcvideo: Use a count variable for meta_formats instead of 0 terminating
media: uvcvideo: Auto-set UVC_QUIRK_MSXU_META
media: uvcvideo: Introduce V4L2_META_FMT_UVC_MSXU_1_5
media: uvcvideo: Introduce dev->meta_formats
media: Documentation: Add note about UVCH length field
media: uvcvideo: Do not mark valid metadata as invalid
media: uvcvideo: uvc_v4l2_unlocked_ioctl: Invert PM logic
media: core: export v4l2_translate_cmd
media: uvcvideo: Turn on the camera if V4L2_EVENT_SUB_FL_SEND_INITIAL
media: uvcvideo: Remove stream->is_streaming field
media: uvcvideo: Split uvc_stop_streaming()
media: uvcvideo: Handle locks in uvc_queue_return_buffers
media: uvcvideo: Use vb2 ioctl and fop helpers
...
Merge tag 'libnvdimm-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Ira Weiny:
"Header file cleanups. Specifically use specific headers rather than
just kernel.h in libnvdimm.h to reduce header file usage.
The second patch is a fix to CXL but is going through my tree as a
libnvdimm patch caused the issue"
* tag 'libnvdimm-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
cxl: Include range.h in cxl.h
libnvdimm: Don't use "proxy" headers
Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd updates from Jason Gunthorpe:
"This broadly brings the assigned HW command queue support to iommufd.
This feature is used to improve SVA performance in VMs by avoiding
paravirtualization traps during SVA invalidations.
Along the way I think some of the core logic is in a much better state
to support future driver backed features.
Summary:
- IOMMU HW now has features to directly assign HW command queues to a
guest VM. In this mode the command queue operates on a limited set
of invalidation commands that are suitable for improving guest
invalidation performance and easy for the HW to virtualize.
This brings the generic infrastructure to allow IOMMU drivers to
expose such command queues through the iommufd uAPI, mmap the
doorbell pages, and get the guest physical range for the command
queue ring itself.
- An implementation for the NVIDIA SMMUv3 extension "cmdqv" is built
on the new iommufd command queue features. It works with the
existing SMMU driver support for cmdqv in guest VMs.
- Many precursor cleanups and improvements to support the above
cleanly, changes to the general ioctl and object helpers, driver
support for VDEVICE, and mmap pgoff cookie infrastructure.
- Sequence VDEVICE destruction to always happen before VFIO device
destruction. When using the above type features, and also in future
confidential compute, the internal virtual device representation
becomes linked to HW or CC TSM configuration and objects. If a VFIO
device is removed from iommufd those HW objects should also be
cleaned up to prevent a sort of UAF. This became important now that
we have HW backing the VDEVICE.
- Fix one syzkaller found error related to math overflows during iova
allocation"
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (57 commits)
iommu/arm-smmu-v3: Replace vsmmu_size/type with get_viommu_size
iommu/arm-smmu-v3: Do not bother impl_ops if IOMMU_VIOMMU_TYPE_ARM_SMMUV3
iommufd: Rename some shortterm-related identifiers
iommufd/selftest: Add coverage for vdevice tombstone
iommufd/selftest: Explicitly skip tests for inapplicable variant
iommufd/vdevice: Remove struct device reference from struct vdevice
iommufd: Destroy vdevice on idevice destroy
iommufd: Add a pre_destroy() op for objects
iommufd: Add iommufd_object_tombstone_user() helper
iommufd/viommu: Roll back to use iommufd_object_alloc() for vdevice
iommufd/selftest: Test reserved regions near ULONG_MAX
iommufd: Prevent ALIGN() overflow
iommu/tegra241-cmdqv: import IOMMUFD module namespace
iommufd: Do not allow _iommufd_object_alloc_ucmd if abort op is set
iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support
iommu/tegra241-cmdqv: Add user-space use support
iommu/tegra241-cmdqv: Do not statically map LVCMDQs
iommu/tegra241-cmdqv: Simplify deinit flow in tegra241_cmdqv_remove_vintf()
iommu/tegra241-cmdqv: Use request_threaded_irq
iommu/arm-smmu-v3-iommufd: Add hw_info to impl_ops
...
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
- Various minor code cleanups and fixes for hns, iser, cxgb4, hfi1,
rxe, erdma, mana_ib
- Prefetch supprot for rxe ODP
- Remove memory window support from hns as new device FW is no longer
support it
- Remove qib, it is very old and obsolete now, Cornelis wishes to
restructure the hfi1/qib shared layer
- Fix a race in destroying CQs where we can still end up with work
running because the work is cancled before the driver stops
triggering it
- Improve interaction with namespaces:
* Follow the devlink namespace for newly spawned RDMA devices
* Create iopoib net devces in the parent IB device's namespace
* Allow CAP_NET_RAW checks to pass in user namespaces
- A new flow control scheme for IB MADs to try and avoid queue
overflows in the network
- Fix 2G message sizes in bnxt_re
- Optimize mkey layout for mlx5 DMABUF
- New "DMA Handle" concept to allow controlling PCI TPH and steering
tags
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (71 commits)
RDMA/siw: Change maintainer email address
RDMA/mana_ib: add support of multiple ports
RDMA/mlx5: Refactor optional counters steering code
RDMA/mlx5: Add DMAH support for reg_user_mr/reg_user_dmabuf_mr
IB: Extend UVERBS_METHOD_REG_MR to get DMAH
RDMA/mlx5: Add DMAH object support
RDMA/core: Introduce a DMAH object and its alloc/free APIs
IB/core: Add UVERBS_METHOD_REG_MR on the MR object
net/mlx5: Add support for device steering tag
net/mlx5: Expose IFC bits for TPH
PCI/TPH: Expose pcie_tph_get_st_table_size()
RDMA/mlx5: Fix incorrect MKEY masking
RDMA/mlx5: Fix returned type from _mlx5r_umr_zap_mkey()
RDMA/mlx5: remove redundant check on err on return expression
RDMA/mana_ib: add additional port counters
RDMA/mana_ib: Fix DSCP value in modify QP
RDMA/efa: Add CQ with external memory support
RDMA/core: Add umem "is_contiguous" and "start_dma_addr" helpers
RDMA/uverbs: Add a common way to create CQ with umem
RDMA/mlx5: Optimize DMABUF mkey page size
...
Merge tag 'leds-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds
Pull LED updates from Lee Jones:
"Improvements & Fixes:
- A fix for QCOM Flash to prevent incorrect register access when the
driver is re-bound. This is solved by duplicating a static register
array during the probe function, which prevents register addresses
from being miscalculated after multiple binds
- The LP50xx driver now correctly handles the 'reg' property in
device tree child nodes to ensure the multi_index is set correctly.
This prevents issues where LEDs could be controlled incorrectly if
the device tree nodes were processed in a different order. An error
is returned if the reg property is missing or out of range
- Add a Kconfig dependency on between TPS6131x and
V4L2_FLASH_LED_CLASS to prevent a build failure when the driver is
built-in and the V4L2 flash infrastructure is a loadable module
- Fix a potential buffer overflow warning in PCA955x reported by
older GCC versions by using a more precise format specifier when
creating the default LED label.
Cleanups & Refactoring:
- Correct the MAINTAINERS file entry for the TPS6131X flash LED
driver to point to the correct device tree binding file name
- Fix a comment in the Flash Class for the flash_timeout setter to
"flash timeout" from "flash duration" for accuracy
- The of_led_get() function is no longer exported as it has no users
outside of its own module.
Removals:
- Revert the commit to configure LED blink intervals for hardware
offload in the Netdev Trigger. This change was found to break
existing PHY drivers by putting their LEDs into a permanent,
unconditional blinking state.
Device Tree Bindings Updates:
- Update the binding for LP50xx to document that the child reg
property is the index within the LED bank. The example was also
updated to use correct values
- Update the JNCP5623 binding to add 0x39 as a valid I2C address, as
it is used by the NCP5623C variant"
* tag 'leds-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds:
dt-bindings: leds: ncp5623: Add 0x39 as a valid I2C address
Revert "leds: trigger: netdev: Configure LED blink interval for HW offload"
leds: pca955x: Avoid potential overflow when filling default_label (take 2)
leds: Unexport of_led_get()
leds: tps6131x: Add V4L2_FLASH_LED_CLASS dependency
dt-bindings: leds: lp50xx: Document child reg, fix example
leds: leds-lp50xx: Handle reg to get correct multi_index
leds: led-class-flash:: Fix flash_timeout comment
MAINTAINERS: Adjust file entry in TPS6131X FLASH LED DRIVER
leds: flash: leds-qcom-flash: Fix registry access after re-bind
Merge tag 'mfd-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD updates from Lee Jones:
"New Support & Features:
- Add extensive support for the Analog Devices ADP5589 I/O expander,
including core MFD, GPIO, PWM, and a new keypad matrix input
driver. This also adds support for handling various events
including GPI, keypad, reset and unlock ev ents
- Add support for the TI TPS652G1 PMIC, a stripped-down version of
the TPS65224, including core MFD, PFSM, pinctrl, and GPIO support
- Add support for the Apple Silicon System Management Controller
(SMC), including the core MFD driver which handles the RTKit-based
protocol, a new GPIO driver for PMU GPIOs, and a new
reboot/power-off driver.
Improvements & Fixes:
- Dynamically add ADP5585 sub-devices based on device tree properties
- Move ADP5585 oscillator control from the child PWM driver to the
main MFD driver to better handle shared resources
- Add support for a hardware reset pin and VDD regulator to the
ADP5585 driver
- Update the TPS65219 MFD cell's GPIO compatible string for the
TPS65214 to reflect hardware capabilities correctly
- Separate the ChromeOS EC charge-control probing from the USB-PD
subsystem, allowing it to probe independently based on the
dedicated EC_FEATURE_CHARGER
- Fix an interrupt naming typo in the MT6370 driver
- Fix RK806 PMIC reset behavior by allowing the reset mode to be
customized via a new device tree property
- Fix AXP20X regulator cell ID conflicts for secondary PMICs on
boards without an IRQ line connected
- Fix MT6397 keypad sub-device creation to use specific names instead
of a generic one, ensuring correct driver binding
- Fix a build warning in the stm32-timers driver by adding a missing
include for export.h.
Cleanups & Refactoring:
- Refactor the ADP5585 driver to simplify how regmap defaults are
handled, making it easier to add new chip variants
- Introduce per-chip register map structures for the ADP5585/ADP5589
family to handle differences between the devices
- Convert several drivers to use dev_fwnode() instead of
of_fwnode_handle()
- Make various static structures const in the cs40l50, rohm-bd71828,
tps65219, and twl6040 drivers
- Remove redundant pm_runtime_mark_last_busy() calls from several
drivers
- Alphabetize Kconfig entries for Cirrus Logic and Maxim drivers
- Remove unused fields from the 'tps65219' struct
- Update several MFD-related headers to follow the 'Include What You
Use' (IWYU) principle.
Removals:
- Remove the old, platform-data-based adp5589-keys input driver,
which is now superseded by the new MFD-based adp5585-keys driver
- Remove the unused twl6030_mmc_card_detect() functions and
associated header declarations
- Remove the now unused pcf50633/core.h header file
- Remove the fsl,imx8qxp-csr device tree binding, which was being
used incorrectly.
Device Tree Bindings Updates:
- Add support for the Analog Devices ADP5589 I/O expander to the
adi,adp5585.yaml binding
- Add new properties to the adi,adp5585.yaml binding for input
events, including keypad pins, unlock events, and reset events
- Add a reset-gpios property to the adi,adp5585.yaml binding
- Add the TI TPS652G1 PMIC to the ti,tps6594.yaml binding
- Add new bindings for the Apple Mac System Management Controller
(SMC) and its sub-devices: apple,smc.yaml, apple,smc-gpio.yaml, and
apple,smc-reboot.yaml
- Convert the Freescale MXS LRADC binding (mxs-lradc) to YAML schema
format
- Convert and combine the NXP LPC1850 CREG, DMAMUX, and USB OTG PHY
bindings into a single YAML schema file
- Convert the TI TPS65910 binding to YAML schema format
- Add a comment to the samsung,s2mps11.yaml binding to clarify the
use of 'oneOf' for interrupt properties
- Add the rockchip,reset-mode property to the rockchip,rk806.yaml
binding to allow customization of the PMIC's reset behavior"
* tag 'mfd-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (28 commits)
mfd: dt-bindings: Convert TPS65910 to DT schema
mfd: Minor Cirrus/Maxim Kconfig order fixes
mfd: Remove redundant pm_runtime_mark_last_busy() calls
mfd: mt6397: Do not use generic name for keypad sub-devices
mfd: axp20x: Set explicit ID for regulator cell if no IRQ line is present
mfd: mt6370: Fix the interrupt naming typo
mfd: rk8xx-core: Allow to customize RK806 reset mode
dt-bindings: mfd: rk806: Allow to customize PMIC reset mode
mfd: syscon: atmel-smc: Don't use "proxy" headers
mfd: madera: Don't use "proxy" headers
mfd: wm8350-core: Don't use "proxy" headers
dt-bindings: mfd: samsung,s2mps11: Add comment about interrupts properties
mfd: davinci_voicecodec: Don't use "proxy" headers
mfd: pcf50633: Remove the header file core.h
mfd: tps65219: Remove another unused field from 'struct tps65219'
mfd: tps65219: Remove an unused field from 'struct tps65219'
mfd: tps65219: Constify struct regmap_irq_sub_irq_map and tps65219_chip_data
mfd: rohm-bd71828: Constify some structures
dt-bindings: mfd: fsl,imx8qxp-csr: Remove binding documentation
mfd: axp20x: Set explicit ID for AXP313 regulator
...
Merge tag 'integrity-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity update from Mimi Zohar:
"A single commit to permit disabling IMA from the boot command line for
just the kdump kernel.
The exception itself sort of makes sense. My concern is that
exceptions do not remain as exceptions, but somehow morph to become
the norm"
* tag 'integrity-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: add a knob ima= to allow disabling IMA in kdump kernel
Merge tag 'caps-pr-20250729' of git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux
Pull capabilities update from Serge Hallyn:
- Fix broken link in documentation in capability.h
- Correct the permission check for unsafe exec
During exec, different effective and real credentials were assumed to
mean changed credentials, making it impossible in the no-new-privs
case to keep different uid and euid
* tag 'caps-pr-20250729' of git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux:
uapi: fix broken link in linux/capability.h
exec: Correct the permission check for unsafe exec
Merge tag 'sh-for-v6.17-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux
Pull sh update from John Paul Adrian Glaubitz:
"A single patch by Ben Hutchings replaces the hyphen in the exported
shell variable ld-bfd with an underscore to avoid issues with certain
shells such as dash which do not pass through environment variables
whose name includes a hyphen"
* tag 'sh-for-v6.17-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
sh: Do not use hyphen in exported variable name
Merge tag 'fsnotify_for_v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:
"A couple of small improvements for fsnotify subsystem.
The most interesting is probably Amir's change modifying the meaning
of fsnotify fmode bits (and I spell it out specifically because I know
you care about those). There's no change for the common cases of no
fsnotify watches or no permission event watches. But when there are
permission watches (either for open or for pre-content events) but no
FAN_ACCESS_PERM watch (which nobody uses in practice) we are now able
optimize away unnecessary cache loads from the read path"
* tag 'fsnotify_for_v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fsnotify: optimize FMODE_NONOTIFY_PERM for the common cases
fsnotify: merge file_set_fsnotify_mode_from_watchers() with open perm hook
samples: fix building fs-monitor on musl systems
fanotify: sanitize handle_type values when reporting fid
Merge tag 'jfs-6.17' of github.com:kleikamp/linux-shaggy
Pull jfs updates from Dave Kleikamp:
"Fixes and cleanups for JFS filesystem"
* tag 'jfs-6.17' of github.com:kleikamp/linux-shaggy:
jfs: fix metapage reference count leak in dbAllocCtl
jfs: stop using write_cache_pages
jfs: truncate good inode pages when hard link is 0
jfs: jfs_xtree: replace XT_GETPAGE macro with xt_getpage()
jfs: Regular file corruption check
jfs: upper bound check of tree index in dbAllocAG
Merge tag 'for-linus-6.17-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs updates from Mike Marshall:
"Fixes for string handling in debugfs and sysfs:
- change scnprintf to sysfs_emit in sysfs code.
- change sprintf to scnprintf in debugfs code.
- refactor debugfs mask-to-string code for readability and slightly
improved functionality"
* tag 'for-linus-6.17-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
fs/orangefs: Allow 2 more characters in do_c_string()
fs: orangefs: replace scnprintf() with sysfs_emit()
fs/orangefs: use snprintf() instead of sprintf()
Merge tag 'ubifs-for-linus-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs
Pull UBI and UBIFS updates from Richard Weinberger:
"UBIFS:
- No longer use write_cache_pages()
UBI:
- Remove an unused function"
* tag 'ubifs-for-linus-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
ubifs: stop using write_cache_pages
mtd: ubi: Remove unused ubi_flush
Merge tag 'ext4_for_linus_6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Major ext4 changes for 6.17:
- Better scalability for ext4 block allocation
- Fix insufficient credits when writing back large folios
Miscellaneous bug fixes, especially when handling exteded attriutes,
inline data, and fast commit"
* tag 'ext4_for_linus_6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (39 commits)
ext4: do not BUG when INLINE_DATA_FL lacks system.data xattr
ext4: implement linear-like traversal across order xarrays
ext4: refactor choose group to scan group
ext4: convert free groups order lists to xarrays
ext4: factor out ext4_mb_scan_group()
ext4: factor out ext4_mb_might_prefetch()
ext4: factor out __ext4_mb_scan_group()
ext4: fix largest free orders lists corruption on mb_optimize_scan switch
ext4: fix zombie groups in average fragment size lists
ext4: merge freed extent with existing extents before insertion
ext4: convert sbi->s_mb_free_pending to atomic_t
ext4: fix typo in CR_GOAL_LEN_SLOW comment
ext4: get rid of some obsolete EXT4_MB_HINT flags
ext4: utilize multiple global goals to reduce contention
ext4: remove unnecessary s_md_lock on update s_mb_last_group
ext4: remove unnecessary s_mb_last_start
ext4: separate stream goal hits from s_bal_goals for better tracking
ext4: add ext4_try_lock_group() to skip busy groups
ext4: initialize superblock fields in the kballoc-test.c kunit tests
ext4: refactor the inline directory conversion and new directory codepaths
...
Various controller drivers received minor fixes like DMA mapping checks,
better timing derivations or bitflip statistics.
It has also been discovered that some Hynix NAND flashes were not
supporting read-retries, which is not properly supported.
* SPI NAND changes:
In order to support high-speed modes, certain chips need extra
configuration like adding more dummy cycles. This is now possible,
especially on Winbond chips.
Aside from that, Gigadevice gets support for a new chip (GD5F1GM9).
- Fix exiting 4-byte addressing on Infineon SEMPER flashes. These
flashes do not support the standard EX4B opcode (E9h), and use a
vendor-specific opcode (B8h) instead.
- Fix unlocking of flashes that are write-protected at power-on. This
was caused by using an uninitialized mtd_info in
spi_nor_try_unlock_all().
Merge tag 'v6.17-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
"API:
- Allow hash drivers without fallbacks (e.g., hardware key)
Algorithms:
- Add hmac hardware key support (phmac) on s390
- Re-enable sha384 in FIPS mode
- Disable sha1 in FIPS mode
- Convert zstd to acomp
Drivers:
- Lower priority of qat skcipher and aead
- Convert aspeed to partial block API
- Add iMX8QXP support in caam
- Add rate limiting support for GEN6 devices in qat
- Enable telemetry for GEN6 devices in qat
- Implement full backlog mode for hisilicon/sec2"
* tag 'v6.17-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (116 commits)
crypto: keembay - Use min() to simplify ocs_create_linked_list_from_sg()
crypto: hisilicon/hpre - fix dma unmap sequence
crypto: qat - make adf_dev_autoreset() static
crypto: ccp - reduce stack usage in ccp_run_aes_gcm_cmd
crypto: qat - refactor ring-related debug functions
crypto: qat - fix seq_file position update in adf_ring_next()
crypto: qat - fix DMA direction for compression on GEN2 devices
crypto: jitter - replace ARRAY_SIZE definition with header include
crypto: engine - remove {prepare,unprepare}_crypt_hardware callbacks
crypto: engine - remove request batching support
crypto: qat - flush misc workqueue during device shutdown
crypto: qat - enable rate limiting feature for GEN6 devices
crypto: qat - add compression slice count for rate limiting
crypto: qat - add get_svc_slice_cnt() in device data structure
crypto: qat - add adf_rl_get_num_svc_aes() in rate limiting
crypto: qat - relocate service related functions
crypto: qat - consolidate service enums
crypto: qat - add decompression service for rate limiting
crypto: qat - validate service in rate limiting sysfs api
crypto: hisilicon/sec2 - implement full backlog mode for sec
...
Merge tag 'ipe-pr-20250728' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe
Pull ipe update from Fan Wu:
"A single commit from Eric Biggers to simplify the IPE (Integrity
Policy Enforcement) policy audit with the SHA-256 library API"
* tag 'ipe-pr-20250728' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe:
ipe: use SHA-256 library API instead of crypto_shash API
Stephen Boyd [Thu, 31 Jul 2025 16:10:06 +0000 (09:10 -0700)]
Merge branch 'clk-fixes' into clk-next
Resolve conflicts with i.MX95 changes 88768d6f8c13 ("clk:
imx95-blk-ctl: Rename lvds and displaymix csr blk") in clk-imx
and aacc875a448d ("clk: imx: Fix an out-of-bounds access in
dispmix_csr_clk_dev_data") in clk-fixes.
* clk-fixes:
clk: sunxi-ng: v3s: Fix TCON clock parents
clk: sunxi-ng: v3s: Fix CSI1 MCLK clock name
clk: sunxi-ng: v3s: Fix CSI SCLK clock name
dt-bindings: clock: mediatek: Add #reset-cells property for MT8188
clk: imx: Fix an out-of-bounds access in dispmix_csr_clk_dev_data
clk: scmi: Handle case where child clocks are initialized before their parents
clk: sunxi-ng: a523: Mark MBUS clock as critical
Pull documentation updates from Jonathan Corbet:
"It has been a relatively busy cycle for docs, especially the build
system:
- The Perl kernel-doc script was added to 2.3.52pre1 just after the
turn of the millennium. Over the following 25 years, it accumulated
a vast amount of cruft, all in a language few people want to deal
with anymore. Mauro's Python replacement in 6.16 faithfully
reproduced all of the cruft in the hope of avoiding regressions.
Now that we have a more reasonable code base, though, we can work
on cleaning it up; many of the changes this time around are toward
that end.
- A reorganization of the ext4 docs into the usual TOC format.
- Various Chinese translations and updates.
- A new script from Mauro to help with docs-build testing.
- A new document for linked lists
- A sweep through MAINTAINERS fixing broken GitHub git:// repository
links.
...and lots of fixes and updates"
* tag 'docs-6.17' of git://git.lwn.net/linux: (147 commits)
scripts: add origin commit identification based on specific patterns
sphinx: kernel_abi: fix performance regression with O=<dir>
Documentation: core-api: entry: Replace deprecated KVM entry/exit functions
docs: fault-injection: drop reference to md-faulty
docs: document linked lists
scripts: kdoc: make it backward-compatible with Python 3.7
docs: kernel-doc: emit warnings for ancient versions of Python
Documentation/rtla: Describe exit status
Documentation/rtla: Add include common_appendix.rst
docs: kernel: Clarify printk_ratelimit_burst reset behavior
Documentation: ioctl-number: Don't repeat macro names
Documentation: ioctl-number: Shorten macros table
Documentation: ioctl-number: Correct full path to papr-physical-attestation.h
Documentation: ioctl-number: Extend "Include File" column width
Documentation: ioctl-number: Fix linuxppc-dev mailto link
overlayfs.rst: fix typos
docs: kdoc: emit a warning for ancient versions of Python
docs: kdoc: clean up check_sections()
docs: kdoc: directly access the always-there KdocItem fields
docs: kdoc: straighten up dump_declaration()
...
Ben Horgan [Wed, 9 Jul 2025 09:38:08 +0000 (10:38 +0100)]
bitfield: Ensure the return values of helper functions are checked
As type##_replace_bits() has no side effects it is only useful if its
return value is checked. Add __must_check to enforce this usage. To have
the bits replaced in-place typep##_replace_bits() can be used instead.
Although, type_##_get_bits() and type_##_encode_bits() are harder to misuse
they are still only useful if the return value is checked. For
consistency, also add __must_check to these.
Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Vincent Mailhol [Mon, 9 Jun 2025 02:45:47 +0000 (11:45 +0900)]
test_bits: add tests for __GENMASK() and __GENMASK_ULL()
The definitions of GENMASK() and GENMASK_ULL() do not depend any more
on __GENMASK() and __GENMASK_ULL(). Duplicate the existing unit tests
so that __GENMASK{,ULL}() are still covered.
Because __GENMASK() and __GENMASK_ULL() do use GENMASK_INPUT_CHECK(),
drop the TEST_GENMASK_FAILURES negative tests.
It would be good to have a small assembly test case for GENMASK*() in
case somebody decides to unify both in the future. However, I lack
expertise in assembly to do so. Instead add a FIXME message to
highlight the absence of the asm unit test.
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Vincent Mailhol [Mon, 9 Jun 2025 02:45:45 +0000 (11:45 +0900)]
bits: split the definition of the asm and non-asm GENMASK*()
In an upcoming change, the non-asm GENMASK*() will all be unified to
depend on GENMASK_TYPE() which indirectly depend on sizeof(), something
not available in asm.
Instead of adding further complexity to GENMASK_TYPE() to make it work
for both asm and non asm, just split the definition of the two variants.
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Shaopeng Tan [Mon, 23 Jun 2025 07:46:45 +0000 (16:46 +0900)]
cpumask: Remove unnecessary cpumask_nth_andnot()
Commit 94f753143028("x86/resctrl: Optimize cpumask_any_housekeeping()")
switched the only user of cpumask_nth_andnot() to other cpumask
functions, but left the function cpumask_nth_andnot() unused.
This makes function find_nth_andnot_bit() unused as well. Delete them.
Signed-off-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com>
clocksource: Improve randomness in clocksource_verify_choose_cpus()
The current algorithm of picking a random CPU works OK for dense online
cpumask, but if cpumask is non-dense, the distribution of picked CPUs
is skewed.
For example, on 8-CPU board with CPUs 4-7 offlined, the probability of
selecting CPU 0 is 5/8. Accordingly, cpus 1, 2 and 3 are chosen with
probability 1/8 each. The proper algorithm should pick each online CPU
with probability 1/4.
Switch it to cpumask_random(), which has better statistical
characteristics.
CC: Andrew Morton <akpm@linux-foundation.org> Acked-by: John Stultz <jstultz@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Yury Norov [NVIDIA]" <yury.norov@gmail.com>
Steve French [Mon, 28 Jul 2025 17:32:53 +0000 (12:32 -0500)]
smb3 client: add way to show directory leases for improved debugging
When looking at performance issues around directory caching, or debugging
directory lease issues, it is helpful to be able to display the current
directory leases (as we can e.g. or open files). Create pseudo-file
/proc/fs/cifs/open_dirs that displays current directory leases. Here
is sample output:
Steven Rostedt [Tue, 29 Jul 2025 18:23:14 +0000 (14:23 -0400)]
unwind: Finish up unwind when a task exits
On do_exit() when a task is exiting, if a unwind is requested and the
deferred user stacktrace is deferred via the task_work, the task_work
callback is called after exit_mm() is called in do_exit(). This means that
the user stack trace will not be retrieved and an empty stack is created.
Instead, add a function unwind_deferred_task_exit() and call it just
before exit_mm() so that the unwinder can call the requested callbacks
with the user space stack.
Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Indu Bhagat <indu.bhagat@oracle.com> Cc: "Jose E. Marchesi" <jemarch@gnu.org> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Jens Remus <jremus@linux.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Florian Weimer <fweimer@redhat.com> Cc: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/20250729182406.504259474@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Steven Rostedt [Tue, 29 Jul 2025 18:23:13 +0000 (14:23 -0400)]
unwind deferred: Use SRCU unwind_deferred_task_work()
Instead of using the callback_mutex to protect the link list of callbacks
in unwind_deferred_task_work(), use SRCU instead. This gets called every
time a task exits that has to record a stack trace that was requested.
This can happen for many tasks on several CPUs at the same time. A mutex
is a bottleneck and can cause a bit of contention and slow down performance.
As the callbacks themselves are allowed to sleep, regular RCU cannot be
used to protect the list. Instead use SRCU, as that still allows the
callbacks to sleep and the list can be read without needing to hold the
callback_mutex.
Link: https://lore.kernel.org/all/ca9bd83a-6c80-4ee0-a83c-224b9d60b755@efficios.com/ Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Indu Bhagat <indu.bhagat@oracle.com> Cc: "Jose E. Marchesi" <jemarch@gnu.org> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Jens Remus <jremus@linux.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Florian Weimer <fweimer@redhat.com> Cc: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/20250729182406.331548065@kernel.org Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Steven Rostedt [Tue, 29 Jul 2025 18:23:12 +0000 (14:23 -0400)]
unwind: Add USED bit to only have one conditional on way back to user space
On the way back to user space, the function unwind_reset_info() is called
unconditionally (but always inlined). It currently has two conditionals.
One that checks the unwind_mask which is set whenever a deferred trace is
called and is used to know that the mask needs to be cleared. The other
checks if the cache has been allocated, and if so, it resets the
nr_entries so that the unwinder knows it needs to do the work to get a new
user space stack trace again (it only does it once per entering the
kernel).
Use one of the bits in the unwind mask as a "USED" bit that gets set
whenever a trace is created. This will make it possible to only check the
unwind_mask in the unwind_reset_info() to know if it needs to do work or
not and eliminates a conditional that happens every time the task goes
back to user space.
Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Indu Bhagat <indu.bhagat@oracle.com> Cc: "Jose E. Marchesi" <jemarch@gnu.org> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Jens Remus <jremus@linux.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Florian Weimer <fweimer@redhat.com> Cc: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/20250729182406.155422551@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Steven Rostedt [Tue, 29 Jul 2025 18:23:11 +0000 (14:23 -0400)]
unwind deferred: Add unwind_completed mask to stop spurious callbacks
If there's more than one registered tracer to the unwind deferred
infrastructure, it is currently possible that one tracer could cause extra
callbacks to happen for another tracer if the former requests a deferred
stacktrace after the latter's callback was executed and before the task
went back to user space.
Here's an example of how this could occur:
[Task enters kernel]
tracer 1 request -> add cookie to its buffer
tracer 1 request -> add cookie to its buffer
<..>
[ task work executes ]
tracer 1 callback -> add trace + cookie to its buffer
[tracer 2 requests and triggers the task work again]
[ task work executes again ]
tracer 1 callback -> add trace + cookie to its buffer
tracer 2 callback -> add trace + cookie to its buffer
[Task exits back to user space]
This is because the bit for tracer 1 gets set in the task's unwind_mask
when it did its request and does not get cleared until the task returns
back to user space. But if another tracer were to request another deferred
stacktrace, then the next task work will executed all tracer's callbacks
that have their bits set in the task's unwind_mask.
To fix this issue, add another mask called unwind_completed and place it
into the task's info->cache structure. The cache structure is allocated
on the first occurrence of a deferred stacktrace and this unwind_completed
mask is not needed until then. It's better to have it in the cache than to
permanently waste space in the task_struct.
After a tracer's callback is executed, it's bit gets set in this
unwind_completed mask. When the task_work enters, it will AND the task's
unwind_mask with the inverse of the unwind_completed which will eliminate
any work that already had its callback executed since the task entered the
kernel.
When the task leaves the kernel, it will reset this unwind_completed mask
just like it resets the other values as it enters user space.
Link: https://lore.kernel.org/all/20250716142609.47f0e4a5@batman.local.home/ Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Indu Bhagat <indu.bhagat@oracle.com> Cc: "Jose E. Marchesi" <jemarch@gnu.org> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Jens Remus <jremus@linux.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Florian Weimer <fweimer@redhat.com> Cc: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/20250729182405.989222722@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Steven Rostedt [Tue, 29 Jul 2025 18:23:10 +0000 (14:23 -0400)]
unwind deferred: Use bitmask to determine which callbacks to call
In order to know which registered callback requested a stacktrace for when
the task goes back to user space, add a bitmask to keep track of all
registered tracers. The bitmask is the size of long, which means that on a
32 bit machine, it can have at most 32 registered tracers, and on 64 bit,
it can have at most 64 registered tracers. This should not be an issue as
there should not be more than 10 (unless BPF can abuse this?).
When a tracer registers with unwind_deferred_init() it will get a bit
number assigned to it. When a tracer requests a stacktrace, it will have
its bit set within the task_struct. When the task returns back to user
space, it will call the callbacks for all the registered tracers where
their bits are set in the task's mask.
When a tracer is removed by the unwind_deferred_cancel() all current tasks
will clear the associated bit, just in case another tracer gets registered
immediately afterward and then gets their callback called unexpectedly.
To prevent live locks from happening if an event that happens between the
task_work and when the task goes back to user space, triggers the deferred
unwind, have the unwind_mask get cleared on exit to user space and not
after the callback is made.
Move the pending bit from a value on the task_struct to bit zero of the
unwind_mask (saves space on the task_struct). This will allow modifying
the pending bit along with the work bits atomically.
Instead of clearing a work's bit after its callback is called, it is
delayed until exit. If the work is requested again, the task_work is not
queued again and the request will be notified that the task has already been
called by returning a positive number (the same as if it was already
pending).
The pending bit is cleared before calling the callback functions but the
current work bits remain. If one of the called works registers again, it
will not trigger a task_work if its bit is still present in the task's
unwind_mask.
If a new work requests a deferred unwind, then it will set both the
pending bit and its own bit. Note this will also cause any work that was
previously queued and had their callback already executed to be executed
again. Future work will remove these spurious callbacks.
The use of atomic_long bit operations were suggested by Peter Zijlstra: Link: https://lore.kernel.org/all/20250715102912.GQ1613200@noisy.programming.kicks-ass.net/
The unwind_mask could not be converted to atomic_long_t do to atomic_long
not having all the bit operations needed by unwind_mask. Instead it
follows other use cases in the kernel and just typecasts the unwind_mask
to atomic_long_t when using the two atomic_long functions.
Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Indu Bhagat <indu.bhagat@oracle.com> Cc: "Jose E. Marchesi" <jemarch@gnu.org> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Jens Remus <jremus@linux.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Florian Weimer <fweimer@redhat.com> Cc: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/20250729182405.822789300@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Steven Rostedt [Tue, 29 Jul 2025 18:23:09 +0000 (14:23 -0400)]
unwind_user/deferred: Make unwind deferral requests NMI-safe
Make unwind_deferred_request() NMI-safe so tracers in NMI context can
call it and safely request a user space stacktrace when the task exits.
Note, this is only allowed for architectures that implement a safe
cmpxchg. If an architecture requests a deferred stack trace from NMI
context that does not support a safe NMI cmpxchg, it will get an -EINVAL
and trigger a warning. For those architectures, they would need another
method (perhaps an irqwork), to request a deferred user space stack trace.
That can be dealt with later if one of theses architectures require this
feature.
Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Indu Bhagat <indu.bhagat@oracle.com> Cc: "Jose E. Marchesi" <jemarch@gnu.org> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Jens Remus <jremus@linux.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Florian Weimer <fweimer@redhat.com> Cc: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/20250729182405.657072238@kernel.org Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Add an interface for scheduling task work to unwind the user space stack
before returning to user space. This solves several problems for its
callers:
- Ensure the unwind happens in task context even if the caller may be
running in interrupt context.
- Avoid duplicate unwinds, whether called multiple times by the same
caller or by different callers.
- Create a "context cookie" which allows trace post-processing to
correlate kernel unwinds/traces with the user unwind.
A concept of a "cookie" is created to detect when the stacktrace is the
same. A cookie is generated the first time a user space stacktrace is
requested after the task enters the kernel. As the stacktrace is saved on
the task_struct while the task is in the kernel, if another request comes
in, if the cookie is still the same, it will use the saved stacktrace,
and not have to regenerate one.
The cookie is passed to the caller on request, and when the stacktrace is
generated upon returning to user space, it calls the requester's callback
with the cookie as well as the stacktrace. The cookie is cleared
when it goes back to user space. Note, this currently adds another
conditional to the unwind_reset_info() path that is always called
returning to user space, but future changes will put this back to a single
conditional.
A global list is created and protected by a global mutex that holds
tracers that register with the unwind infrastructure. The number of
registered tracers will be limited in future changes. Each perf program or
ftrace instance will register its own descriptor to use for deferred
unwind stack traces.
Note, in the function unwind_deferred_task_work() that gets called when
returning to user space, it uses a global mutex for synchronization which
will cause a big bottleneck. This will be replaced by SRCU, but that
change adds some complex synchronization that deservers its own commit.
Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Indu Bhagat <indu.bhagat@oracle.com> Cc: "Jose E. Marchesi" <jemarch@gnu.org> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Jens Remus <jremus@linux.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Florian Weimer <fweimer@redhat.com> Cc: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/20250729182405.488066537@kernel.org Co-developed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Cache the results of the unwind to ensure the unwind is only performed
once, even when called by multiple tracers.
The cache nr_entries gets cleared every time the task exits the kernel.
When a stacktrace is requested, nr_entries gets set to the number of
entries in the stacktrace. If another stacktrace is requested, if
nr_entries is not zero, then it contains the same stacktrace that would be
retrieved so it is not processed again and the entries is given to the
caller.
Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Indu Bhagat <indu.bhagat@oracle.com> Cc: "Jose E. Marchesi" <jemarch@gnu.org> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Florian Weimer <fweimer@redhat.com> Cc: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/20250729182405.319691167@kernel.org Reviewed-by: Jens Remus <jremus@linux.ibm.com> Reviewed-By: Indu Bhagat <indu.bhagat@oracle.com> Co-developed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Merge tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel
Pull drm updates from Dave Airlie:
"Highlights:
- Intel xe enable Panthor Lake, started adding WildCat Lake
- amdgpu has a bunch of reset improvments along with the usual IP
updates
- msm got VM_BIND support which is important for vulkan sparse memory
- more drm_panic users
- gpusvm common code to handle a bunch of core SVM work outside
drivers.
Detail summary:
Changes outside drm subdirectory:
- 'shrink_shmem_memory()' for better shmem/hibernate interaction
- Rust support infrastructure:
- make ETIMEDOUT available
- add size constants up to SZ_2G
- add DMA coherent allocation bindings
- mtd driver for Intel GPU non-volatile storage
- i2c designware quirk for Intel xe
core:
- atomic helpers: tune enable/disable sequences
- add task info to wedge API
- refactor EDID quirks
- connector: move HDR sink to drm_display_info
- fourcc: half-float and 32-bit float formats
- mode_config: pass format info to simplify
dma-buf:
- heaps: Give CMA heap a stable name
ci:
- add device tree validation and kunit
displayport:
- change AUX DPCD access probe address
- add quirk for DPCD probe
- add panel replay definitions
- backlight control helpers
fbdev:
- make CONFIG_FIRMWARE_EDID available on all arches
fence:
- fix UAF issues
format-helper:
- improve tests
gpusvm:
- introduce devmem only flag for allocation
- add timeslicing support to GPU SVM
panel:
- switch to reference counter drm_panel allocations
- fwnode panel lookup
- Huiling hl055fhv028c support
- Raspberry Pi 7" 720x1280 support
- edp: KDC KD116N3730A05, N160JCE-ELL CMN, N116BCJ-EAK
- simple: AUO P238HAN01
- st7701: Winstar wf40eswaa6mnn0
- visionox: rm69299-shift
- Renesas R61307, Renesas R69328 support
- DJN HX83112B
hdmi:
- add CEC handling
- YUV420 output support
xe:
- WildCat Lake support
- Enable PanthorLake by default
- mark BMG as SRIOV capable
- update firmware recommendations
- Expose media OA units
- aux-bux support for non-volatile memory
- MTD intel-dg driver for non-volatile memory
- Expose fan control and voltage regulator in sysfs
- restructure migration for multi-device
- Restore GuC submit UAF fix
- make GEM shrinker drm managed
- SRIOV VF Post-migration recovery of GGTT nodes
- W/A additions/reworks
- Prefetch support for svm ranges
- Don't allocate managed BO for each policy change
- HWMON fixes for BMG
- Create LRC BO without VM
- PCI ID updates
- make SLPC debugfs files optional
- rework eviction rejection of bound external BOs
- consolidate PAT programming logic for pre/post Xe2
- init changes for flicker-free boot
- Enable GuC Dynamic Inhibit Context switch
i915:
- drm_panic support for i915/xe
- initial flip queue off by default for LNL/PNL
- Wildcat Lake Display support
- Support for DSC fractional link bpp
- Support for simultaneous Panel Replay and Adaptive sync
- Support for PTL+ double buffer LUT
- initial PIPEDMC event handling
- drm_panel_follower support
- DPLL interface renames
- allocate struct intel_display dynamically
- flip queue preperation
- abstract DRAM detection better
- avoid GuC scheduling stalls
- remove DG1 force probe requirement
- fix MEI interrupt handler on RT kernels
- use backlight control helpers for eDP
- more shared display code refactoring
amdgpu:
- add userq slot to INFO ioctl
- SR-IOV hibernation support
- Suspend improvements
- Backlight improvements
- Use scaling for non-native eDP modes
- cleaner shader updates for GC 9.x
- Remove fence slab
- SDMA fw checks for userq support
- RAS updates
- DMCUB updates
- DP tunneling fixes
- Display idle D3 support
- Per queue reset improvements
- initial smartmux support
amdkfd:
- enable KFD on loongarch
- mtype fix for ext coherent system memory
radeon:
- CS validation additional GL extensions
- drop console lock during suspend/resume
- bump driver version
msm:
- VM BIND support
- CI: infrastructure updates
- UBWC single source of truth
- decouple GPU and KMS support
- DP: rework I/O accessors
- DPU: SM8750 support
- DSI: SM8750 support
- GPU: X1-45 support and speedbin support for X1-85
- MDSS: SM8750 support
nova:
- register! macro improvements
- DMA object abstraction
- VBIOS parser + fwsec lookup
- sysmem flush page support
- falcon: generic falcon boot code and HAL
- FWSEC-FRTS: fb setup and load/execute
panfrost:
- MT8370 support
- bo labeling
- 64-bit register access
qaic:
- add RAS support
rockchip:
- convert inno_hdmi to a bridge
rz-du:
- add RZ/V2H(P) support
- MIPI-DSI DCS support
sitronix:
- ST7567 support
sun4i:
- add H616 support
tidss:
- add TI AM62L support
- AM65x OLDI bridge support
bochs:
- drm panic support
vkms:
- YUV and R* format support
- use faux device
vmwgfx:
- fence improvements
hyperv:
- move out of simple
- add drm_panic support"
* tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel: (1479 commits)
drm/tidss: oldi: convert to devm_drm_bridge_alloc() API
drm/tidss: encoder: convert to devm_drm_bridge_alloc()
drm/amdgpu: move reset support type checks into the caller
drm/amdgpu/sdma7: re-emit unprocessed state on ring reset
drm/amdgpu/sdma6: re-emit unprocessed state on ring reset
drm/amdgpu/sdma5.2: re-emit unprocessed state on ring reset
drm/amdgpu/sdma5: re-emit unprocessed state on ring reset
drm/amdgpu/gfx12: re-emit unprocessed state on ring reset
drm/amdgpu/gfx11: re-emit unprocessed state on ring reset
drm/amdgpu/gfx10: re-emit unprocessed state on ring reset
drm/amdgpu/gfx9.4.3: re-emit unprocessed state on kcq reset
drm/amdgpu/gfx9: re-emit unprocessed state on kcq reset
drm/amdgpu: Add WARN_ON to the resource clear function
drm/amd/pm: Use cached metrics data on SMUv13.0.6
drm/amd/pm: Use cached data for min/max clocks
gpu: nova-core: fix bounds check in PmuLookupTableEntry::new
drm/amdgpu: Replace HQD terminology with slots naming
drm/amdgpu: Add user queue instance count in HW IP info
drm/amd/amdgpu: Add helper functions for isp buffers
drm/amd/amdgpu: Initialize swnode for ISP MFD device
...
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- Host driver for GICv5, the next generation interrupt controller for
arm64, including support for interrupt routing, MSIs, interrupt
translation and wired interrupts
- Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
GICv5 hardware, leveraging the legacy VGIC interface
- Userspace control of the 'nASSGIcap' GICv3 feature, allowing
userspace to disable support for SGIs w/o an active state on
hardware that previously advertised it unconditionally
- Map supporting endpoints with cacheable memory attributes on
systems with FEAT_S2FWB and DIC where KVM no longer needs to
perform cache maintenance on the address range
- Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the
guest hypervisor to inject external aborts into an L2 VM and take
traps of masked external aborts to the hypervisor
- Convert more system register sanitization to the config-driven
implementation
- Fixes to the visibility of EL2 registers, namely making VGICv3
system registers accessible through the VGIC device instead of the
ONE_REG vCPU ioctls
- Various cleanups and minor fixes
LoongArch:
- Add stat information for in-kernel irqchip
- Add tracepoints for CPUCFG and CSR emulation exits
- Enhance in-kernel irqchip emulation
- Various cleanups
RISC-V:
- Enable ring-based dirty memory tracking
- Improve perf kvm stat to report interrupt events
- Delegate illegal instruction trap to VS-mode
- MMU improvements related to upcoming nested virtualization
s390x
- Fixes
x86:
- Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O
APIC, PIC, and PIT emulation at compile time
- Share device posted IRQ code between SVM and VMX and harden it
against bugs and runtime errors
- Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups
O(1) instead of O(n)
- For MMIO stale data mitigation, track whether or not a vCPU has
access to (host) MMIO based on whether the page tables have MMIO
pfns mapped; using VFIO is prone to false negatives
- Rework the MSR interception code so that the SVM and VMX APIs are
more or less identical
- Recalculate all MSR intercepts from scratch on MSR filter changes,
instead of maintaining shadow bitmaps
- Advertise support for LKGS (Load Kernel GS base), a new instruction
that's loosely related to FRED, but is supported and enumerated
independently
- Fix a user-triggerable WARN that syzkaller found by setting the
vCPU in INIT_RECEIVED state (aka wait-for-SIPI), and then putting
the vCPU into VMX Root Mode (post-VMXON). Trying to detect every
possible path leading to architecturally forbidden states is hard
and even risks breaking userspace (if it goes from valid to valid
state but passes through invalid states), so just wait until
KVM_RUN to detect that the vCPU state isn't allowed
- Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling
interception of APERF/MPERF reads, so that a "properly" configured
VM can access APERF/MPERF. This has many caveats (APERF/MPERF
cannot be zeroed on vCPU creation or saved/restored on suspend and
resume, or preserved over thread migration let alone VM migration)
but can be useful whenever you're interested in letting Linux
guests see the effective physical CPU frequency in /proc/cpuinfo
- Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been
created, as there's no known use case for changing the default
frequency for other VM types and it goes counter to the very reason
why the ioctl was added to the vm file descriptor. And also, there
would be no way to make it work for confidential VMs with a
"secure" TSC, so kill two birds with one stone
- Dynamically allocation the shadow MMU's hashed page list, and defer
allocating the hashed list until it's actually needed (the TDP MMU
doesn't use the list)
- Extract many of KVM's helpers for accessing architectural local
APIC state to common x86 so that they can be shared by guest-side
code for Secure AVIC
- Various cleanups and fixes
x86 (Intel):
- Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest.
Failure to honor FREEZE_IN_SMM can leak host state into guests
- Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to
prevent L1 from running L2 with features that KVM doesn't support,
e.g. BTF
x86 (AMD):
- WARN and reject loading kvm-amd.ko instead of panicking the kernel
if the nested SVM MSRPM offsets tracker can't handle an MSR (which
is pretty much a static condition and therefore should never
happen, but still)
- Fix a variety of flaws and bugs in the AVIC device posted IRQ code
- Inhibit AVIC if a vCPU's ID is too big (relative to what hardware
supports) instead of rejecting vCPU creation
- Extend enable_ipiv module param support to SVM, by simply leaving
IsRunning clear in the vCPU's physical ID table entry
- Disable IPI virtualization, via enable_ipiv, if the CPU is affected
by erratum #1235, to allow (safely) enabling AVIC on such CPUs
- Request GA Log interrupts if and only if the target vCPU is
blocking, i.e. only if KVM needs a notification in order to wake
the vCPU
- Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to
the vCPU's CPUID model
- Accept any SNP policy that is accepted by the firmware with respect
to SMT and single-socket restrictions. An incompatible policy
doesn't put the kernel at risk in any way, so there's no reason for
KVM to care
- Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and
use WBNOINVD instead of WBINVD when possible for SEV cache
maintenance
- When reclaiming memory from an SEV guest, only do cache flushes on
CPUs that have ever run a vCPU for the guest, i.e. don't flush the
caches for CPUs that can't possibly have cache lines with dirty,
encrypted data
Generic:
- Rework irqbypass to track/match producers and consumers via an
xarray instead of a linked list. Using a linked list leads to
O(n^2) insertion times, which is hugely problematic for use cases
that create large numbers of VMs. Such use cases typically don't
actually use irqbypass, but eliminating the pointless registration
is a future problem to solve as it likely requires new uAPI
- Track irqbypass's "token" as "struct eventfd_ctx *" instead of a
"void *", to avoid making a simple concept unnecessarily difficult
to understand
- Decouple device posted IRQs from VFIO device assignment, as binding
a VM to a VFIO group is not a requirement for enabling device
posted IRQs
- Clean up and document/comment the irqfd assignment code
- Disallow binding multiple irqfds to an eventfd with a priority
waiter, i.e. ensure an eventfd is bound to at most one irqfd
through the entire host, and add a selftest to verify eventfd:irqfd
bindings are globally unique
- Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues
related to private <=> shared memory conversions
- Drop guest_memfd's .getattr() implementation as the VFS layer will
call generic_fillattr() if inode_operations.getattr is NULL
- Fix issues with dirty ring harvesting where KVM doesn't bound the
processing of entries in any way, which allows userspace to keep
KVM in a tight loop indefinitely
- Kill off kvm_arch_{start,end}_assignment() and x86's associated
tracking, now that KVM no longer uses assigned_device_count as a
heuristic for either irqbypass usage or MDS mitigation
Selftests:
- Fix a comment typo
- Verify KVM is loaded when getting any KVM module param so that
attempting to run a selftest without kvm.ko loaded results in a
SKIP message about KVM not being loaded/enabled (versus some random
parameter not existing)
- Skip tests that hit EACCES when attempting to access a file, and
print a "Root required?" help message. In most cases, the test just
needs to be run with elevated permissions"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (340 commits)
Documentation: KVM: Use unordered list for pre-init VGIC registers
RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map()
RISC-V: KVM: Use find_vma_intersection() to search for intersecting VMAs
RISC-V: perf/kvm: Add reporting of interrupt events
RISC-V: KVM: Enable ring-based dirty memory tracking
RISC-V: KVM: Fix inclusion of Smnpm in the guest ISA bitmap
RISC-V: KVM: Delegate illegal instruction fault to VS mode
RISC-V: KVM: Pass VMID as parameter to kvm_riscv_hfence_xyz() APIs
RISC-V: KVM: Factor-out g-stage page table management
RISC-V: KVM: Add vmid field to struct kvm_riscv_hfence
RISC-V: KVM: Introduce struct kvm_gstage_mapping
RISC-V: KVM: Factor-out MMU related declarations into separate headers
RISC-V: KVM: Use ncsr_xyz() in kvm_riscv_vcpu_trap_redirect()
RISC-V: KVM: Implement kvm_arch_flush_remote_tlbs_range()
RISC-V: KVM: Don't flush TLB when PTE is unchanged
RISC-V: KVM: Replace KVM_REQ_HFENCE_GVMA_VMID_ALL with KVM_REQ_TLB_FLUSH
RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize()
RISC-V: KVM: Drop the return value of kvm_riscv_vcpu_aia_init()
RISC-V: KVM: Check kvm_riscv_vcpu_alloc_vector_context() return value
KVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-list
...
Merge tag 'for-linus-6.17-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
- fix for a UAF in the xen gntdev-dmabuf driver
- fix in the xen netfront driver avoiding spurious interrupts
- fix in the gntdev driver avoiding a large stack allocation
- cleanup removing some dead code
- build warning fix
- cleanup of the sysfs code in the xen-pciback driver
* tag 'for-linus-6.17-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/netfront: Fix TX response spurious interrupts
xen/gntdev: remove struct gntdev_copy_batch from stack
xen: fix UAF in dmabuf_exp_from_pages()
xen: Remove some deadcode (x)
xen-pciback: Replace scnprintf() with sysfs_emit_at()
xen/xenbus: fix W=1 build warning in xenbus_va_dev_error function
Merge tag 'trace-unused-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracepoint cleanup from Steven Rostedt:
"Remove or hide unused tracepoints
Tracepoints take up memory (around 5K per tracepoint) even when they
are unused. Changes are being made to detect when a tracepoint is
defined but unused and a warning is shown at build. But those changes
are not yet ready for inclusion.
- Fix some of the unused tracepoints that it detected
Some tracepoints were removed and others were hidden by config
settings to match the config settings of where they are
instantiated. Some tracepoints were moved into architecture
specific code as only one architecture used them.
- Call the ftrace_test_filter tracepoint in an unreachable if
statement
The ftrace_test_filter tracepoint which is defined when ftrace
selftests are configured and is used to test the filter logic, but
the tracepoint is not actually called. It is put into an if
statement to not have it get compiled out, but also not warn for
not being used"
* tag 'trace-unused-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: sched: Hide numa events under CONFIG_NUMA_BALANCING
powerpc/thp: tracing: Hide hugepage events under CONFIG_PPC_BOOK3S_64
tracing: Call trace_ftrace_test_filter() for the event
tracing: arm: arm64: Hide trace events ipi_raise, ipi_entry and ipi_exit
binder: Remove unused binder lock events
PM: tracing: Hide power_domain_target event under ARCH_OMAP2PLUS
PM: tracing: Hide device_pm_callback events under PM_SLEEP
PM: tracing: Hide psci_domain_idle events under ARM_PSCI_CPUIDLE
PM: cpufreq: powernv/tracing: Move powernv_throttle trace event
alarmtimer: Hide alarmtimer_suspend event when RTC_CLASS is not configured
tracing, AER: Hide PCIe AER event when PCIEAER is not configured
Merge tag 'trace-rv-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull runtime verification updates from Steven Rostedt:
- Added Linear temporal logic monitors for RT application
Real-time applications may have design flaws causing them to have
unexpected latency. For example, the applications may raise page
faults, or may be blocked trying to take a mutex without priority
inheritance.
However, while attempting to implement DA monitors for these
real-time rules, deterministic automaton is found to be inappropriate
as the specification language. The automaton is complicated, hard to
understand, and error-prone.
For these cases, linear temporal logic is found to be more suitable.
The LTL is more concise and intuitive.
- Make printk_deferred() public
The new monitors needed access to printk_deferred(). Make them
visible for the entire kernel.
- Add a vpanic() to allow for va_list to be passed to panic.
- Add rtapp container monitor.
A collection of monitors that check for common problems with
real-time applications that cause unexpected latency.
- Add page fault tracepoints to risc-v
These tracepoints are necessary to for the RV monitor to run on
risc-v.
- Fix the behaviour of the rv tool with -s and idle tasks.
- Allow the rv tool to gracefully terminate with SIGTERM
- Adjusts dot2c not to create lines over 100 columns
- Properly order nested monitors in the RV Kconfig file
- Return the registration error in all DA monitor instead of 0
- Update and add new sched collection monitors
Replace tss and sncid monitors with more complete sts:
Not only prove that switches occur in scheduling context and scheduling
needs interrupt disabled but also that each call to the scheduler
disables interrupts to (optionally) switch.
New monitor: nrp
Preemption requires need resched which is cleared by any switch
(includes a non optimal workaround for /nested/ preemptions)
New monitor: sssw
suspension requires setting the task to sleepable and, after the
switch occurs, the task requires a wakeup to come back to runnable
New monitor: opid
waking and need-resched operations occur with interrupts and
preemption disabled or in IRQ without explicitly disabling
preemption"
* tag 'trace-rv-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (48 commits)
rv: Add opid per-cpu monitor
rv: Add nrp and sssw per-task monitors
rv: Replace tss and sncid monitors with more complete sts
sched: Adapt sched tracepoints for RV task model
rv: Retry when da monitor detects race conditions
rv: Adjust monitor dependencies
rv: Use strings in da monitors tracepoints
rv: Remove trailing whitespace from tracepoint string
rv: Add da_handle_start_run_event_ to per-task monitors
rv: Fix wrong type cast in reactors_show() and monitor_reactor_show()
rv: Fix wrong type cast in monitors_show()
rv: Remove struct rv_monitor::reacting
rv: Remove rv_reactor's reference counter
rv: Merge struct rv_reactor_def into struct rv_reactor
rv: Merge struct rv_monitor_def into struct rv_monitor
rv: Remove unused field in struct rv_monitor_def
rv: Return init error when registering monitors
verification/rvgen: Organise Kconfig entries for nested monitors
tools/dot2c: Fix generated files going over 100 column limit
tools/rv: Stop gracefully also on SIGTERM
...
Merge tag 'trace-ringbuffer-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ring-buffer updates from Steven Rostedt:
- Rewind persistent ring buffer on boot
When the persistent ring buffer is being used for live kernel tracing
and the system crashes, the tool that is reading the trace may not
have recorded the data when the system crashed.
Although the persistent ring buffer still has that data, when reading
it after a reboot, it will start where it left off. That is, what was
read will not be accessible.
Instead, on reboot, have the persistent ring buffer restart where the
data starts and this will allow the tooling to recover what was lost
when the crash occurred.
- Remove the ring_buffer_read_prepare_sync() logic
Reading the trace file required stopping writing to the ring buffer
as the trace file is only an iterator and does not consume what it
read. It was originally not safe to read the ring buffer in this mode
and required disabling writing. The ring_buffer_read_prepare_sync()
logic was used to stop each per_cpu ring buffer, call
synchronize_rcu() and then start the iterator. This was used instead
of calling synchronize_rcu() for each per_cpu buffer.
Today, the iterator has been updated where it is safe to read the
trace file while writing to the ring buffer is still occurring. There
is no more need to do this synchronization and it is causing large
delays on machines with many CPUs. Remove this unneeded
synchronization.
- Make static string array a constant in show_irq_str()
Making the string array into a constant has shown to decrease code
text/data size.
* tag 'trace-ringbuffer-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ring-buffer: Make the const read-only 'type' static
ring-buffer: Remove ring_buffer_read_prepare_sync()
tracing: ring_buffer: Rewind persistent ring buffer on reboot
Merge tag 'ftrace-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ftrace updates from Steven Rostedt:
- Keep track of when fgraph_ops are registered or not
Keep accounting of when fgraph_ops are registered as if a fgraph_ops
is registered twice it can mess up the accounting and it will not
work as expected later. Trigger a warning if something registers it
twice as to catch bugs before they are found by things just not
working as expected.
- Make DYNAMIC_FTRACE always enabled for architectures that support it
As static ftrace (where all functions are always traced) is very
expensive and only exists to help architectures support ftrace, do
not make it an option. As soon as an architecture supports
DYNAMIC_FTRACE make it use it. This simplifies the code.
The CONFIG_HAVE_FTRACE_MCOUNT was added to help simplify the
DYNAMIC_FTRACE work, but now every architecture that implements
DYNAMIC_FTRACE also has HAVE_FTRACE_MCOUNT set too, making it
redundant with the HAVE_DYNAMIC_FTRACE.
- Make pid_ptr string size match the comment
In print_graph_proc() the pid_ptr string is of size 11, but the
comment says /* sign + log10(MAX_INT) + '\0' */ which is actually 12.
* tag 'ftrace-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Remove redundant config HAVE_FTRACE_MCOUNT_RECORD
ftrace: Make DYNAMIC_FTRACE always enabled for architectures that support it
fgraph: Keep track of when fgraph_ops are registered or not
fgraph: Make pid_str size match the comment
The above sets the variable "PATCH_START" to HEAD~1 and the
OUTPUT_DIR option to "/work/build/urgent".
This is useful because currently the only way to make a slight change
to a config file is by modifying that config file. For one time
changes, this can be annoying. Having a way to do a one time override
from the command line simplifies the workflow.
Temp variables (PATCH_START) will override every temp variable in the
config file, whereas options will act like a normal OVERRIDE option
and will only affect the session they define.
-DBUILD_OUTPUT=/work/git/linux.git
Replaces the default BUILD_OUTPUT option.
'-DBUILD_OUTPUT[2]=/work/git/linux.git'
Only replaces the BUILD_OUTPUT variable for test #2.
- If an option contains itself, just drop it instead of going into an
infinite loop and failing to parse (it doesn't crash, it detects the
recursion after 100 iterations anyway).
Some configs may define a variable with the same name as the option:
ADD_CONFIG := $(ADD_CONFIG)
But if the option doesn't exist, it the above will fail to parse. In
these cases, just ignore evaluating the option inside the definition
of another option if it has the same name.
- Display the BUILD_DIR and OUTPUT_DIR options at the start of every
test
It is useful to know which kernel source and what destination a test
is using when it starts, in case a mistake is made. This makes it
easier to abort the test if the wrong source or destination is being
used instead of waiting until the test completes.
- Add new PATCHCHECK_SKIP option
When testing a series of commits that also includes changes to the
Linux tools directory, it is useless to test the changes in tools as
they may not affect the kernel itself. Doing tests on the kernel for
changes that do not affect the kernel is a waste of time.
Add a PATCHCHECK_SKIP that takes a series of shas that will be
skipped while doing the individual commit tests.
* tag 'ktest-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
ktest.pl: Add new PATCHCHECK_SKIP option to skip testing individual commits
ktest.pl: Always display BUILD_DIR and OUTPUT_DIR at the start of tests
ktest.pl: Prevent recursion of default variable options
ktest.pl: Have -D option work without a space
ktest.pl: Allow command option -D to override temp variables
ktest.pl: Add -D option to override options
Merge tag 'probes-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes updates from Masami Hiramatsu:
"Stack usage reduction for probe events:
- Allocate string buffers from the heap for uprobe, eprobe, kprobe,
and fprobe events to avoid stack overflow
- Allocate traceprobe_parse_context from the heap to prevent
potential stack overflow
- Fix a typo in the above commit
New features for eprobe and tprobe events:
- Add support for arrays in eprobes
- Support multiple tprobes on the same tracepoint
Improve efficiency:
- Register fprobe-events only when it is enabled to reduce overhead
- Register tracepoints for tprobe events only when enabled to resolve
a lock dependency
Code Cleanup:
- Add kerneldoc for traceprobe_parse_event_name() and
__get_insn_slot()
- Sort #include alphabetically in the probes code
- Remove the unused 'mod' field from the tprobe-event
- Clean up the entry-arg storing code in probe-events
Selftest update
- Enable fprobe events before checking enable_functions in selftests"
* tag 'probes-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: trace_fprobe: Fix typo of the semicolon
tracing: Have eprobes handle arrays
tracing: probes: Add a kerneldoc for traceprobe_parse_event_name()
tracing: uprobe-event: Allocate string buffers from heap
tracing: eprobe-event: Allocate string buffers from heap
tracing: kprobe-event: Allocate string buffers from heap
tracing: fprobe-event: Allocate string buffers from heap
tracing: probe: Allocate traceprobe_parse_context from heap
tracing: probes: Sort #include alphabetically
kprobes: Add missing kerneldoc for __get_insn_slot
tracing: tprobe-events: Register tracepoint when enable tprobe event
selftests: tracing: Enable fprobe events before checking enable_functions
tracing: fprobe-events: Register fprobe-events only when it is enabled
tracing: tprobe-events: Support multiple tprobes on the same tracepoint
tracing: tprobe-events: Remove mod field from tprobe-event
tracing: probe-events: Cleanup entry-arg storing code
Merge tag 'probes-fixes-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fix from Masami Hiramatsu:
- Fix a potential infinite recursion in fprobe by using preempt_*_notrace()
* tag 'probes-fixes-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: fprobe: Fix infinite recursion using preempt_*_notrace()
Merge tag 'bootconfig-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull bootconfig updates from Masami Hiramatsu:
- tools/bootconfig:
- Fix unaligned access when building footer to avoid SIGBUS
- Cleanup bootconfig footer size calculations
- test scripts:
- Fix to add shebang for a test script
- Improve script portability using portable commands
- Improve script portability using printf instead of echo
- Enclose regex with quotes for syntax highlighter
* tag 'bootconfig-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
bootconfig: Fix unaligned access when building footer
tools/bootconfig: scripts/ftrace.sh was missing the shebang line, so added it
tools/bootconfig: Cleanup bootconfig footer size calculations
tools/bootconfig: Replace some echo with printf for more portability
tools/bootconfig: Improve portability
tools: bootconfig: Regex enclosed with quotes to make syntax highlight proper
Merge tag 'slab-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:
- Convert struct slab to its own flags instead of referencing page
flags, which is another preparation step before separating it from
struct page completely.
Along with that, a bunch of documentation fixes and cleanups (Matthew
Wilcox)
- Convert large kmalloc to use frozen pages in order to be consistent
with non-large kmalloc slabs (Vlastimil Babka)
- MAINTAINERS updates (Matthew Wilcox, Lorenzo Stoakes)
- Restore NUMA policy support for large kmalloc, broken by mistake in
v6.1 (Vlastimil Babka)
* tag 'slab-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
MAINTAINERS: add missing files to slab section
slab: Update MAINTAINERS entry
memcg_slabinfo: Fix use of PG_slab
kfence: Remove mention of PG_slab
vmcoreinfo: Remove documentation of PG_slab and PG_hugetlb
doc: Add slab internal kernel-doc
slub: Fix a documentation build error for krealloc()
slab: Add SL_pfmemalloc flag
slab: Add SL_partial flag
slab: Rename slab->__page_flags to slab->flags
doc: Move SLUB documentation to the admin guide
mm, slab: use frozen pages for large kmalloc
mm, slab: restore NUMA policy support for large kmalloc
Merge tag 'rcu.release.v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU updates from Neeraj Upadhyay:
"Expedited grace period updates:
- Protect against early RCU exp quiescent state reporting during exp
grace period initialization
- Remove superfluous barrier in task unblock path
- Remove the CPU online quiescent state report optimization, which is
error prone for certain scenarios
- Add warning for unexpected pending requested expedited quiescent
state on dying CPU
Core:
- Robustify rcu_is_cpu_rrupt_from_idle() by using more accurate
indicators of the actual context tracking state of a CPU
- Handle ->defer_qs_iw_pending field data race
- Enable rcu_normal_wake_from_gp by default on systems with <= 16
CPUs
- Fix lockup in rcu_read_unlock() due to recursive irq_exit() calls
- Refactor expedited handling condition in rcu_read_unlock_special()
- Documentation updates for hotplug and GP init scan ordering,
separation of rcu_state and rnp's gp_seq states, quiescent state
reporting for offline CPUs
torture-scripts:
- Cleanup and improve scripts : remove superfluous warnings for
disabled tests; better handling of kvm.sh --kconfig arg; suppress
some confusing diagnostics; tolerate bad kvm.sh args; add new
diagnostic for build output; fail allmodconfig testing on warnings
- Include RCU_TORTURE_TEST_CHK_RDR_STATE config for KCSAN kernels
- Disable default RCU-tasks and clocksource-wdog testing on arm64
- Add EXPERT Kconfig option for arm64 KCSAN runs
- Remove SRCU-lite testing
rcutorture:
- Start torture writer threads creation after reader threads to
handle race in SRCU-P scenario
- Add SRCU down_read()/up_read() test
- Add diagnostics for delayed SRCU up_read(), unmatched up_read(),
print number of up/down readers and the number of such readers
which migrated to other CPU
- Ignore certain unsupported configurations for trivial RCU test
- Fix splats in RT kernels due to inaccurate checks for BH-disabled
context
- Enable checks and logs to capture intentionally exercised
unexpected scenarios (too short readers) for BUSTED test
- Remove SRCU-lite testing
srcu:
- Expedite SRCU-fast grace periods
- Remove SRCU-lite implementation
- Add guards for SRCU-fast readers
rcu nocb:
- Dump NOCB group leader state on stall detection
- Robustify nocb_cb_kthread pointer accesses
- Fix delayed execution of hurry callbacks when LAZY_RCU is enabled
refscale:
- Fix multiplication overflow in "loops" and "nreaders" calculations"
* tag 'rcu.release.v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (49 commits)
rcu: Document concurrent quiescent state reporting for offline CPUs
rcu: Document separation of rcu_state and rnp's gp_seq
rcu: Document GP init vs hotplug-scan ordering requirements
srcu: Add guards for SRCU-fast readers
rcu: Fix delayed execution of hurry callbacks
rcu: Refactor expedited handling check in rcu_read_unlock_special()
checkpatch: Remove SRCU-lite deprecation
srcu: Remove SRCU-lite implementation
srcu: Expedite SRCU-fast grace periods
rcutorture: Remove support for SRCU-lite
rcutorture: Remove SRCU-lite scenarios
torture: Remove support for SRCU-lite
torture: Make torture.sh --allmodconfig testing fail on warnings
torture: Add "ERROR" diagnostic for testing kernel-build output
torture: Make torture.sh tolerate runs having bad kvm.sh arguments
torture: Add textid.txt file to --do-allmodconfig and --do-rcu-rust runs
torture: Extract testid.txt generation to separate script
torture: Suppress "find" diagnostics from torture.sh --do-none run
torture: Provide EXPERT Kconfig option for arm64 KCSAN torture.sh runs
rcu: Fix rcu_read_unlock() deadloop due to IRQ work
...
Merge tag 'iommu-updates-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu updates from Will Deacon:
"Core:
- Remove the 'pgsize_bitmap' member from 'struct iommu_ops'
- Convert the x86 drivers over to msi_create_parent_irq_domain()
AMD-Vi:
- Add support for examining driver/device internals via debugfs
- Add support for "HATDis" to disable host translation when it is not
supported
- Add support for limiting the maximum host translation level based
on EFR[HATS]
Apple DART:
- Don't enable as built-in by default when ARCH_APPLE is selected
Arm SMMU:
- Devicetree bindings update for the Qualcomm SMMU in the "Milos" SoC
- Support for Qualcomm SM6115 MDSS parts
- Disable PRR on Qualcomm SM8250 as using these bits causes the
hypervisor to explode
Intel VT-d:
- Reorganize Intel VT-d to be ready for iommupt
- Optimize iotlb_sync_map for non-caching/non-RWBF modes
- Fix missed PASID in dev TLB invalidation in cache_tag_flush_all()
Mediatek:
- Fix build warnings when W=1
Samsung Exynos:
- Add support for reserved memory regions specified by the bootloader
TI OMAP:
- Use syscon_regmap_lookup_by_phandle_args() instead of parsing the
node manually
Misc:
- Cleanups and minor fixes across the board"
* tag 'iommu-updates-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (48 commits)
iommu/vt-d: Fix UAF on sva unbind with pending IOPFs
iommu/vt-d: Make iotlb_sync_map a static property of dmar_domain
dt-bindings: arm-smmu: Remove sdm845-cheza specific entry
iommu/amd: Fix geometry.aperture_end for V2 tables
iommu/amd: Wrap debugfs ABI testing symbols snippets in literal code blocks
iommu/amd: Add documentation for AMD IOMMU debugfs support
iommu/amd: Add debugfs support to dump IRT Table
iommu/amd: Add debugfs support to dump device table
iommu/amd: Add support for device id user input
iommu/amd: Add debugfs support to dump IOMMU command buffer
iommu/amd: Add debugfs support to dump IOMMU Capability registers
iommu/amd: Add debugfs support to dump IOMMU MMIO registers
iommu/amd: Refactor AMD IOMMU debugfs initial setup
dt-bindings: arm-smmu: document the support on Milos
iommu/exynos: add support for reserved regions
iommu/arm-smmu: disable PRR on SM8250
iommu/arm-smmu-v3: Revert vmaster in the error path
iommu/io-pgtable-arm: Remove unused macro iopte_prot
iommu/arm-smmu-qcom: Add SM6115 MDSS compatible
iommu/qcom: Fix pgsize_bitmap
...
- Introduce Strongly Connected Component (SCC) in the verifier to
detect loops and refine register liveness (Eduard Zingerman)
- Allow 'void *' cast using bpf_rdonly_cast() and corresponding
'__arg_untrusted' for global function parameters (Eduard Zingerman)
- Improve precision for BPF_ADD and BPF_SUB operations in the verifier
(Harishankar Vishwanathan)
- Teach the verifier that constant pointer to a map cannot be NULL
(Ihor Solodrai)
- Introduce BPF streams for error reporting of various conditions
detected by BPF runtime (Kumar Kartikeya Dwivedi)
- Teach the verifier to insert runtime speculation barrier (lfence on
x86) to mitigate speculative execution instead of rejecting the
programs (Luis Gerhorst)
- Various improvements for 'veristat' (Mykyta Yatsenko)
- For CONFIG_DEBUG_KERNEL config warn on internal verifier errors to
improve bug detection by syzbot (Paul Chaignon)
- Support BPF private stack on arm64 (Puranjay Mohan)
- Introduce bpf_cgroup_read_xattr() kfunc to read xattr of cgroup's
node (Song Liu)
- Introduce kfuncs for read-only string opreations (Viktor Malik)
- Implement show_fdinfo() for bpf_links (Tao Chen)
- Implement mprog API for cgroup-bpf programs (Yonghong Song)
* tag 'bpf-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (192 commits)
selftests/bpf: Migrate fexit_noreturns case into tracing_failure test suite
selftests/bpf: Add selftest for attaching tracing programs to functions in deny list
bpf: Add log for attaching tracing programs to functions in deny list
bpf: Show precise rejected function when attaching fexit/fmod_ret to __noreturn functions
bpf: Fix various typos in verifier.c comments
bpf: Add third round of bounds deduction
selftests/bpf: Test invariants on JSLT crossing sign
selftests/bpf: Test cross-sign 64bits range refinement
selftests/bpf: Update reg_bound range refinement logic
bpf: Improve bounds when s64 crosses sign boundary
bpf: Simplify bounds refinement from s32
selftests/bpf: Enable private stack tests for arm64
bpf, arm64: JIT support for private stack
bpf: Move bpf_jit_get_prog_name() to core.c
bpf, arm64: Fix fp initialization for exception boundary
umd: Remove usermode driver framework
bpf/preload: Don't select USERMODE_DRIVER
selftests/bpf: Fix test dynptr/test_dynptr_memset_xdp_chunks failure
selftests/bpf: Fix test dynptr/test_dynptr_copy_xdp failure
selftests/bpf: Increase xdp data size for arm64 64K page size
...
Merge tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Wrap datapath globals into net_aligned_data, to avoid false sharing
- Preserve MSG_ZEROCOPY in forwarding (e.g. out of a container)
- Add SO_INQ and SCM_INQ support to AF_UNIX
- Add SIOCINQ support to AF_VSOCK
- Add TCP_MAXSEG sockopt to MPTCP
- Add IPv6 force_forwarding sysctl to enable forwarding per interface
- Make TCP validation of whether packet fully fits in the receive
window and the rcv_buf more strict. With increased use of HW
aggregation a single "packet" can be multiple 100s of kB
- Add MSG_MORE flag to optimize large TCP transmissions via sockmap,
improves latency up to 33% for sockmap users
- Convert TCP send queue handling from tasklet to BH workque
- Improve BPF iteration over TCP sockets to see each socket exactly
once
- Remove obsolete and unused TCP RFC3517/RFC6675 loss recovery code
- Support enabling kernel threads for NAPI processing on per-NAPI
instance basis rather than a whole device. Fully stop the kernel
NAPI thread when threaded NAPI gets disabled. Previously thread
would stick around until ifdown due to tricky synchronization
- Allow multicast routing to take effect on locally-generated packets
- Add output interface argument for End.X in segment routing
- MCTP: add support for gateway routing, improve bind() handling
- Don't require rtnl_lock when fetching an IPv6 neighbor over Netlink
- Add a new neighbor flag ("extern_valid"), which cedes refresh
responsibilities to userspace. This is needed for EVPN multi-homing
where a neighbor entry for a multi-homed host needs to be synced
across all the VTEPs among which the host is multi-homed
- Support NUD_PERMANENT for proxy neighbor entries
- Add a new queuing discipline for IETF RFC9332 DualQ Coupled AQM
- Add sequence numbers to netconsole messages. Unregister
netconsole's console when all net targets are removed. Code
refactoring. Add a number of selftests
- Align IPSec inbound SA lookup to RFC 4301. Only SPI and protocol
should be used for an inbound SA lookup
- Support inspecting ref_tracker state via DebugFS
- Don't force bonding advertisement frames tx to ~333 ms boundaries.
Add broadcast_neighbor option to send ARP/ND on all bonded links
- Allow providing upcall pid for the 'execute' command in openvswitch
- Remove DCCP support from Netfilter's conntrack
- Disallow multiple packet duplications in the queuing layer
- Prevent use of deprecated iptables code on PREEMPT_RT
Driver API:
- Support RSS and hashing configuration over ethtool Netlink
- Add dedicated ethtool callbacks for getting and setting hashing
fields
- Add support for power budget evaluation strategy in PSE /
Power-over-Ethernet. Generate Netlink events for overcurrent etc
- Support DPLL phase offset monitoring across all device inputs.
Support providing clock reference and SYNC over separate DPLL
inputs
- Support traffic classes in devlink rate API for bandwidth
management
- Remove rtnl_lock dependency from UDP tunnel port configuration
Device drivers:
- Add a new Broadcom driver for 800G Ethernet (bnge)
- Add a standalone driver for Microchip ZL3073x DPLL
- Remove IBM's NETIUCV device driver
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support zero-copy Tx of DMABUF memory
- take page size into account for page pool recycling rings
- Intel (100G, ice, idpf):
- idpf: XDP and AF_XDP support preparations
- idpf: add flow steering
- add link_down_events statistic
- clean up the TSPLL code
- preparations for live VM migration
- nVidia/Mellanox:
- support zero-copy Rx/Tx interfaces (DMABUF and io_uring)
- optimize context memory usage for matchers
- expose serial numbers in devlink info
- support PCIe congestion metrics
- Meta (fbnic):
- add 25G, 50G, and 100G link modes to phylink
- support dumping FW logs
- Marvell/Cavium:
- support for CN20K generation of the Octeon chips
- Amazon:
- add HW clock (without timestamping, just hypervisor time access)
- Ethernet virtual:
- VirtIO net:
- support segmentation of UDP-tunnel-encapsulated packets
- Google (gve):
- support packet timestamping and clock synchronization
- Microsoft vNIC:
- add handler for device-originated servicing events
- allow dynamic MSI-X vector allocation
- support Tx bandwidth clamping
- Ethernet NICs consumer, and embedded:
- AMD:
- amd-xgbe: hardware timestamping and PTP clock support
- Broadcom integrated MACs (bcmgenet, bcmasp):
- use napi_complete_done() return value to support NAPI polling
- add support for re-starting auto-negotiation
- Broadcom switches (b53):
- support BCM5325 switches
- add bcm63xx EPHY power control
- Synopsys (stmmac):
- lots of code refactoring and cleanups
- TI:
- icssg-prueth: read firmware-names from device tree
- icssg: PRP offload support
- Microchip:
- lan78xx: convert to PHYLINK for improved PHY and MAC management
- ksz: add KSZ8463 switch support
- Intel:
- support similar queue priority scheme in multi-queue and
time-sensitive networking (taprio)
- support packet pre-emption in both
- RealTek (r8169):
- enable EEE at 5Gbps on RTL8126
- Airoha:
- add PPPoE offload support
- MDIO bus controller for Airoha AN7583
- Ethernet PHYs:
- support for the IPQ5018 internal GE PHY
- micrel KSZ9477 switch-integrated PHYs:
- add MDI/MDI-X control support
- add RX error counters
- add cable test support
- add Signal Quality Indicator (SQI) reporting
- dp83tg720: improve reset handling and reduce link recovery time
- support bcm54811 (and its MII-Lite interface type)
- air_en8811h: support resume/suspend
- support PHY counters for QCA807x and QCA808x
- support WoL for QCA807x
- CAN drivers:
- rcar_canfd: support for Transceiver Delay Compensation
- kvaser: report FW versions via devlink dev info
- WiFi:
- extended regulatory info support (6 GHz)
- add statistics and beacon monitor for Multi-Link Operation (MLO)
- support S1G aggregation, improve S1G support
- add Radio Measurement action fields
- support per-radio RTS threshold
- some work around how FIPS affects wifi, which was wrong (RC4 is
used by TKIP, not only WEP)
- improvements for unsolicited probe response handling
- WiFi drivers:
- RealTek (rtw88):
- IBSS mode for SDIO devices
- RealTek (rtw89):
- BT coexistence for MLO/WiFi7
- concurrent station + P2P support
- support for USB devices RTL8851BU/RTL8852BU
- Intel (iwlwifi):
- use embedded PNVM in (to be released) FW images to fix
compatibility issues
- many cleanups (unused FW APIs, PCIe code, WoWLAN)
- some FIPS interoperability
- MediaTek (mt76):
- firmware recovery improvements
- more MLO work
- Qualcomm/Atheros (ath12k):
- fix scan on multi-radio devices
- more EHT/Wi-Fi 7 features
- encapsulation/decapsulation offload
- Broadcom (brcm80211):
- support SDIO 43751 device
- Bluetooth:
- hci_event: add support for handling LE BIG Sync Lost event
- ISO: add socket option to report packet seqnum via CMSG
- ISO: support SCM_TIMESTAMPING for ISO TS
- Bluetooth drivers:
- intel_pcie: support Function Level Reset
- nxpuart: add support for 4M baudrate
- nxpuart: implement powerup sequence, reset, FW dump, and FW loading"
* tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1742 commits)
dpll: zl3073x: Fix build failure
selftests: bpf: fix legacy netfilter options
ipv6: annotate data-races around rt->fib6_nsiblings
ipv6: fix possible infinite loop in fib6_info_uses_dev()
ipv6: prevent infinite loop in rt6_nlmsg_size()
ipv6: add a retry logic in net6_rt_notify()
vrf: Drop existing dst reference in vrf_ip6_input_dst
net/sched: taprio: align entry index attr validation with mqprio
net: fsl_pq_mdio: use dev_err_probe
selftests: rtnetlink.sh: remove esp4_offload after test
vsock: remove unnecessary null check in vsock_getname()
igb: xsk: solve negative overflow of nb_pkts in zerocopy mode
stmmac: xsk: fix negative overflow of budget in zerocopy mode
dt-bindings: ieee802154: Convert at86rf230.txt yaml format
net: dsa: microchip: Disable PTP function of KSZ8463
net: dsa: microchip: Setup fiber ports for KSZ8463
net: dsa: microchip: Write switch MAC address differently for KSZ8463
net: dsa: microchip: Use different registers for KSZ8463
net: dsa: microchip: Add KSZ8463 switch support to KSZ DSA driver
dt-bindings: net: dsa: microchip: Add KSZ8463 switch support
...
Steven Rostedt [Wed, 30 Jul 2025 14:07:55 +0000 (10:07 -0400)]
Documentation: tracing: Add documentation about eprobes
Eprobes was added back in 5.15, but was never documented. It became a
"secret" interface even though it has been a topic of several
presentations. For some reason, when eprobes was added, documenting it
never became a priority, until now.
Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/20250730140945.528135548@kernel.org Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Steven Rostedt [Wed, 30 Jul 2025 14:07:54 +0000 (10:07 -0400)]
tracing: Have eprobes have their own config option
Eprobes were added in 5.15 and were selected whenever any of the other
probe events were selected. If kprobe events were enabled (which it is by
default if kprobes are enabled) it would enable eprobe events as well. The
same for uprobes and fprobes.
Have eprobes have its own config and it gets enabled by default if tracing
is enabled.
Link: https://lore.kernel.org/all/20250729102636.b7cce553e7cc263722b12365@kernel.org/ Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/20250730140945.360286733@kernel.org Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Miquel Raynal [Wed, 18 Jun 2025 12:14:25 +0000 (14:14 +0200)]
mtd: spinand: winbond: Add comment about the maximum frequency
Clarify that Winbond octal capable chips may be clocked at up to 166MHz,
which is their absolute maximum.
No per-operation maximum value (captured with a "0" in the table)
involves that in these cases the maximum frequency of the chip applies,
ie. the one commonly described in the DT.
Miquel Raynal [Wed, 18 Jun 2025 12:14:23 +0000 (14:14 +0200)]
mtd: spinand: winbond: Enable high-speed modes on w25n0xjw
w25n0xjw chips have a high-speed capability hidden in a configuration
register. Once enabled, dual/quad SDR reads may be performed at a much
higher frequency.
Implement the new ->configure_chip() hook for this purpose and configure
the SR4 register accordingly.
Miquel Raynal [Wed, 18 Jun 2025 12:14:22 +0000 (14:14 +0200)]
mtd: spinand: Add a ->configure_chip() hook
There is already a manufacturer hook, which is manufacturer specific but
not chip specific. We no longer have access to the actual NAND identity
at this stage so let's add a per-chip configuration hook to align the
chip configuration (if any) with the core's setting.
Miquel Raynal [Wed, 18 Jun 2025 12:14:21 +0000 (14:14 +0200)]
mtd: spinand: Add a frequency field to all READ_FROM_CACHE variants
These macros had initially no frequency field. When I added the "maximum
operation frequency" field, I did it initially on very common macros and
I decided to add an optional field for that (with VA_ARGS) in order to
prevent massively unreadable changes. I then added new variants in the
spinand.h header, and requested a frequency field for them by
default. Some times later, I also added maximum frequencies to other
existing variants, but I did it incorrectly, without noticing I was
wrong because the field was optional.
This mix is error prone, so let's do what I should have done since the
very beginning: add a frequency field to all READ_FROM_CACHE variants.
Miquel Raynal [Wed, 4 Jun 2025 13:52:19 +0000 (15:52 +0200)]
spi: spi-mem: Take into account the actual maximum frequency
In order to pick the best variant, the duration of each typical
operation is derived and then compared. These durations are based on the
maximum capabilities of the chips, which are commonly the limiting
factors. However there are other possible limiting pieces, such as the
hardware layout, EMC considerations and in some cases, the SPI controller
itself.
We need to take this into account to further refine our variant choice,
so let's use the actual frequency that will be used for the operation
instead of the theoretical maximum.
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Reviewed-by: Mark Brown <broonie@kernel.org>
Miquel Raynal [Wed, 18 Jun 2025 12:14:18 +0000 (14:14 +0200)]
spi: spi-mem: Use picoseconds for calculating the op durations
spi_mem_calc_op_duration() is deriving the duration of a specific op, by
multiplying the number of cycles with the time a cycle will last. This
time was measured in nanoseconds, which means at high frequencies the
delta between two frequencies might not be properly catch due to
roundings.
For instance, the Winbond driver has a changing number of dummy cycles
depending on the speed, adding +8 dummy cycles when running at 166MHz
compared to 162MHz.
Both frequencies would lead to using a 6ns delay per cycle for the op
duration computation, whereas in practice there is a small difference
which actually offsets the number of extra dummy cycles on a normal page
read.
Augmenting the precision of the calculation by using picoseconds
prevents selecting a lower frequency if we can do slightly better with
another frequency involving more cycles. As a result, the above
situation leads to comparing cycles of 6024 and 6172 picoseconds which
leads to picking the most efficient variant.
Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
mtd: spinand: propagate spinand_wait() errors from spinand_write_page()
Since commit 3d1f08b032dc ("mtd: spinand: Use the external ECC engine
logic") the spinand_write_page() function ignores the errors returned
by spinand_wait(). Change the code to propagate those up to the stack
as it was done before the offending change.
Cc: stable@vger.kernel.org Fixes: 3d1f08b032dc ("mtd: spinand: Use the external ECC engine logic") Signed-off-by: Gabor Juhos <j4g8y7@gmail.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>