]> git.ipfire.org Git - thirdparty/linux.git/log
thirdparty/linux.git
3 weeks agotimekeeping: Remove ktime_get_snapshot()
Thomas Gleixner [Fri, 29 May 2026 20:00:56 +0000 (22:00 +0200)] 
timekeeping: Remove ktime_get_snapshot()

All users have been converted to ktime_get_snapshot_id().

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195557.795510496@kernel.org
3 weeks agovirtio_rtc: Use provided clock ID for history snapshot
Thomas Gleixner [Fri, 29 May 2026 20:00:52 +0000 (22:00 +0200)] 
virtio_rtc: Use provided clock ID for history snapshot

The PTP core indicates in system_device_crosststamp::clock_id the clock ID
for which the system time stamp should be taken. That allows to utilize
hardware timestamps with e.g. AUX clocks.

Use ktime_get_snapshot_id() and hand the provided clock ID in.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://patch.msgid.link/20260529195557.744271454@kernel.org
3 weeks agonet/mlx5: Use provided clock ID for history snapshot
Thomas Gleixner [Fri, 29 May 2026 20:00:48 +0000 (22:00 +0200)] 
net/mlx5: Use provided clock ID for history snapshot

The PTP core indicates in system_device_crosststamp::clock_id the clock ID
for which the system time stamp should be taken. That allows to utilize
hardware timestamps with e.g. AUX clocks.

Use ktime_get_snapshot_id() and hand the provided clock ID in.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195557.689836531@kernel.org
3 weeks agoigc: Use provided clock ID for history snapshot
Thomas Gleixner [Fri, 29 May 2026 20:00:44 +0000 (22:00 +0200)] 
igc: Use provided clock ID for history snapshot

The PTP core indicates in system_device_crosststamp::clock_id the clock ID
for which the system time stamp should be taken. That allows to utilize
hardware timestamps with e.g. AUX clocks.

Save the provided clock ID and use it in igc_phc_get_syncdevicetime() for
taking the history snapshot.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195557.637381831@kernel.org
3 weeks agoice/ptp: Use provided clock ID for history snapshot
Thomas Gleixner [Fri, 29 May 2026 20:00:40 +0000 (22:00 +0200)] 
ice/ptp: Use provided clock ID for history snapshot

The PTP core indicates in system_device_crosststamp::clock_id the clock ID
for which then system time stamp should be taken. That allows to utilize
hardware timestamps with e.g. AUX clocks.

Save the provided clock ID and use it in ice_capture_crosststamp() for
taking the history snapshot.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195557.587226681@kernel.org
3 weeks agowifi: iwlwifi: Adopt PTP cross timestamps to core changes
Thomas Gleixner [Fri, 29 May 2026 20:00:36 +0000 (22:00 +0200)] 
wifi: iwlwifi: Adopt PTP cross timestamps to core changes

iwlwifi only supports CLOCK_REALTIME timestamps and provides an incomplete
result without system counter values etc.

It also zeros struct system_device_crosststamp, which is already zeroed in
the core and initialized with the clock ID.

Remove the zeroing and reject any request for a clock ID other than REALTIME.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195557.535447186@kernel.org
3 weeks agotimekeeping: Add CLOCK ID to system_device_crosststamp
Thomas Gleixner [Fri, 29 May 2026 20:00:32 +0000 (22:00 +0200)] 
timekeeping: Add CLOCK ID to system_device_crosststamp

The normal capture for system/device cross timestamps is CLOCK_REALTIME,
but that's meaningless for AUX clocks.

Add a clock_id field to struct system_device_crosststamp and initialize it
with CLOCK_REALTIME at the two places which prepare for cross
timestamps.

After the related code has been cleaned up, the core code will honor the
clock_id field when calculating the system time from the system counter
snapshot.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195557.482153523@kernel.org
3 weeks agotimekeeping: Add system_counterval_t to struct system_device_crosststamp
Thomas Gleixner [Fri, 29 May 2026 20:00:28 +0000 (22:00 +0200)] 
timekeeping: Add system_counterval_t to struct system_device_crosststamp

An upcoming extension to the PTP IOCTL requires to return the system counter
value and the clocksource ID to user space. get_device_system_crosststamp() has
this information already.

Extend struct system_device_crosststamp with a system_counterval_t member
and fill in the data.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195557.429406675@kernel.org
3 weeks agotimekeeping: Add CLOCK_AUX support for ktime_get_snapshot_id()
Thomas Gleixner [Fri, 29 May 2026 20:00:24 +0000 (22:00 +0200)] 
timekeeping: Add CLOCK_AUX support for ktime_get_snapshot_id()

Now that all users are converted it's possible to enable snapshotting of
CLOCK_AUX time. The underlying clocksource is the same as for all other
CLOCK variants.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195557.380601005@kernel.org
3 weeks agotimekeeping: Remove system_time_snapshot::real/boot/raw
Thomas Gleixner [Fri, 29 May 2026 20:00:20 +0000 (22:00 +0200)] 
timekeeping: Remove system_time_snapshot::real/boot/raw

All users are converted over to ktime_get_snapshot_id() and
system_time_snapshot::systime and ::monoraw.

Remove the leftovers.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195557.330029635@kernel.org
3 weeks agoptp: ptp_vmclock: Convert to ktime_get_snapshot_id()
Thomas Gleixner [Fri, 29 May 2026 20:00:16 +0000 (22:00 +0200)] 
ptp: ptp_vmclock: Convert to ktime_get_snapshot_id()

ktime_get_snapshot() is replaced by ktime_get_snapshot_id() which allows to
request a particular CLOCK ID to be captured along with the clocksource
counter.

Convert vmclock over and use the new system_time_snapshot::systime field,
which holds the system timestamp selected by the CLOCK ID argument.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195557.281425262@kernel.org
3 weeks agoKVM: arm64: Use ktime_get_snapshot_id() to snapshot CLOCK_REALTIME
Thomas Gleixner [Fri, 29 May 2026 20:00:12 +0000 (22:00 +0200)] 
KVM: arm64: Use ktime_get_snapshot_id() to snapshot CLOCK_REALTIME

ktime_get_snapshot() is replaced by ktime_get_snapshot_id() which allows to
request a particular CLOCK ID to be captured along with the clocksource
counter.

Convert the usage in kvm_get_ptp_time() over and use the new
system_time_snapshot::systime field, which holds the system timestamp
selected by the CLOCK ID argument.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://patch.msgid.link/20260529195557.225399927@kernel.org
3 weeks agoKVM: arm64: Use ktime_get_snapshot_id() to retrieve CLOCK_BOOTTIME
Thomas Gleixner [Fri, 29 May 2026 20:00:08 +0000 (22:00 +0200)] 
KVM: arm64: Use ktime_get_snapshot_id() to retrieve CLOCK_BOOTTIME

ktime_get_snapshot() is replaced by ktime_get_snapshot_id() which allows to
request a particular CLOCK ID to be captured along with the clocksource
counter.

Convert the tracing mechanism over and use the new
system_time_snapshot::systime field, which holds the system timestamp
selected by the CLOCK ID argument.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://patch.msgid.link/20260529195557.174373054@kernel.org
3 weeks agotracing: Fix CFI violation in probestub being called by tprobes
Eva Kurchatova [Wed, 3 Jun 2026 15:31:42 +0000 (18:31 +0300)] 
tracing: Fix CFI violation in probestub being called by tprobes

The probestub is a function to allow tprobes to hook to a tracepoint to
gain access to its parameters. The function itself is only referenced by
the tracepoint structure which lives in the __tracepoint section. objtool
explicitly ignores that section and when processing functions in the
kernel, if it detects one that has no references it will seal it to have
its ENDBR stripped on boot up.

This means when a tprobe is attached to the sched_wakeup tracepoint, when it
is triggered it will call __probestub_sched_wakeup and due to the missing
ENDBR on a CFI-enabled machine it will take a #CP exception.

Fix this by adding CFI_NOSEAL annotation to probestub declaration.

Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://patch.msgid.link/20260603153147.573589-1-eva.kurchatova@virtuozzo.com
Fixes: d5173f753750 ("objtool: Exclude __tracepoints data from ENDBR checks")
Signed-off-by: Eva Kurchatova <eva.kurchatova@virtuozzo.com>
[ Updated change log ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
3 weeks agonvme: export controller reconnect event count via sysfs
Nilay Shroff [Sat, 16 May 2026 18:36:55 +0000 (00:06 +0530)] 
nvme: export controller reconnect event count via sysfs

When an NVMe-oF link goes down, the driver attempts to recover the
connection by repeatedly reconnecting to the remote controller at
configured intervals. A maximum number of reconnect attempts is also
configured, after which recovery stops and the controller is removed
if the connection cannot be re-established.

The driver maintains a counter, nr_reconnects, which is incremented on
each reconnect attempt. However if in case the reconnect is successful
then this counter reset to zero. Moreover, currently, this counter is
only reported via kernel log messages and is not exposed to userspace.
Since dmesg is a circular buffer, this information may be lost over
time.

So introduce a new accumulator which accumulates nr_reconnect attempts
and also expose this accumulator per-fabric ctrl via a new sysfs
attribute reconnect_count, under diag attribute grroup to provide
persistent visibility into the number of reconnect attempts made by the
host. This information can help users diagnose unstable links or
connectivity issues. Furthermore, this sysfs attribute is also writable
so user may reset it to zero, if needed.

The reconnect_count can also be consumed by monitoring tools such as
nvme-top to improve controller-level observability.

Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
3 weeks agonvme: export controller reset event count via sysfs
Nilay Shroff [Sat, 16 May 2026 18:36:54 +0000 (00:06 +0530)] 
nvme: export controller reset event count via sysfs

The NVMe controller transitions into the RESETTING state during error
recovery, link instability, firmware activation, or when a reset is
explicitly triggered by the user.

Expose a per-ctrl sysfs attribute reset_count, under diag attribute
group to provide visibility into these RESETTING state transitions.
Observing the frequency of reset events can help users identify issues
such as PCIe errors or unstable fabric links. This counter is also
writable thus allowing user to reset its value, if needed.

This counter can also be consumed by monitoring tools such as nvme-top
to improve controller-level observability.

Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
3 weeks agonvme: export I/O failure count when no path is available via sysfs
Nilay Shroff [Sat, 16 May 2026 18:36:53 +0000 (00:06 +0530)] 
nvme: export I/O failure count when no path is available via sysfs

When I/O is submitted to the NVMe namespace head and no available path
can handle the request, the driver fails the I/O immediately. Currently,
such failures are only reported via kernel log messages, which may be
lost over time since dmesg is a circular buffer.

Add a new ns-head sysfs counter io_fail_no_available_path_count, under
diag attribute group to expose the number of I/Os that failed due to the
absence of an available path. This provides persistent visibility into
path-related I/O failures and can help users diagnose the cause of I/O
errors. This counter is also writable and so user may reset its value,
if needed.

This counter can also be consumed by monitoring tools such as nvme-top.

Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
3 weeks agonvme: export I/O requeue count when no path is usable via sysfs
Nilay Shroff [Sat, 16 May 2026 18:36:52 +0000 (00:06 +0530)] 
nvme: export I/O requeue count when no path is usable via sysfs

When the NVMe namespace head determines that there is no currently
available path to handle I/O (for example, while a controller is
resetting/connecting or due to a transient link failure), incoming
I/Os are added to the requeue list.

Currently, there is no visibility into how many I/Os have been requeued
in this situation. Add a new ns-head sysfs counter
io_requeue_no_usable_path_count, under diag attribute group to expose
the number of I/Os that were requeued due to the absence of an available
path. This counter is also writable thus allowing user to reset it, if
needed.

This statistic can help users understand I/O slowdowns or stalls caused
by temporary path unavailability, and can be consumed by monitoring
tools such as nvme-top for real-time observability.

Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
3 weeks agonvme: export command error counters via sysfs
Nilay Shroff [Sat, 16 May 2026 18:36:51 +0000 (00:06 +0530)] 
nvme: export command error counters via sysfs

When an NVMe command completes with an error status, the driver
logs the error to the kernel log. However, these messages may be
lost or overwritten over time since dmesg is a circular buffer.

Expose per-path and ctrl sysfs attribute command_error_count, under
diag attribute group to provide persistent visibility into error
occurrences. This allows users to observe the total number of commands
that have failed on a given path over time, which can be useful for
diagnosing path health and stability.

This attribute is both readable and writable thus allowing user to reset
these counters. These counters can also be consumed by observability
tools such as nvme-top to provide additional insight into NVMe error
behavior.

Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
3 weeks agonvme: export multipath failover count via sysfs
Nilay Shroff [Sat, 16 May 2026 18:36:50 +0000 (00:06 +0530)] 
nvme: export multipath failover count via sysfs

When an NVMe command completes with a path-specific error, the NVMe
driver may retry the command on an alternate controller or path if one
is available. These failover events indicate that I/O was redirected
away from the original path.

Currently, the number of times requests are failed over to another
available path is not visible to userspace. Exposing this information
can be useful for diagnosing path health and stability.

Export per-path sysfs attribute "multipath_failover_count" under diag
attribute group. This attribute is both readable and writable and thus
allowing user to reset the counter. This counter can be consumed by
monitoring tools such as nvme-top to help identify paths that
consistently trigger failovers under load.

Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
3 weeks agonvme: export command retry count via sysfs
Nilay Shroff [Sat, 16 May 2026 18:36:49 +0000 (00:06 +0530)] 
nvme: export command retry count via sysfs

When Advanced Command Retry Enable (ACRE) is configured, a controller
may interrupt command execution and return a completion status
indicating command interrupted with the DNR bit cleared. In this case,
the driver retries the command based on the Command Retry Delay (CRD)
value provided in the completion status.

Currently, these command retries are handled entirely within the NVMe
driver and are not visible to userspace. As a result, there is no
observability into retry behavior, which can be a useful diagnostic
signal.

Expose a per-namespace sysfs attribute command_retries_count, under
diag attribute group to provide visibility into retry activity. This
information can help identify controller-side congestion under load
and enables comparison across paths in multipath setups (for example,
detecting cases where one path experiences significantly more retries
than another under identical workloads).

This exported metric is intended for diagnostics and monitoring tools
such as nvme-top, and does not change command retry behavior. A new
sysfs attribute named "command_retries_count" is added for this purpose.
This attribute is both readable as well as writable. So user could
reset this counter if needed.

Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
3 weeks agonvme: add diag attribute group under sysfs
Nilay Shroff [Sat, 16 May 2026 18:36:48 +0000 (00:06 +0530)] 
nvme: add diag attribute group under sysfs

Add a new diag attribute group under:
/sys/class/nvme/<ctrl>/
/sys/block/<nvme-path-dev>/
/sys/block/<ns-head-dev>/

This new sysfs attribute group will be used to organize NVMe diagnostic
and telemetry-related counters under it.

Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
3 weeks agocoresight: ultrasoc-smb: Fix OOB write in smb_sync_perf_buffer()
Junrui Luo [Thu, 4 Jun 2026 07:34:25 +0000 (15:34 +0800)] 
coresight: ultrasoc-smb: Fix OOB write in smb_sync_perf_buffer()

When the SMB sink is used as a perf AUX sink, smb_update_buffer() calls
smb_sync_perf_buffer() to copy hardware trace data into the perf AUX ring
buffer pages. It derives pg_idx = head >> PAGE_SHIFT from @head, which is
handle->head, and indexes dst_pages[pg_idx]. The pg_idx %= nr_pages
normalization is only applied after the first loop iteration.

This leaves the initial page index underived from the buffer size, which
can result in an out-of-bounds write past dst_pages[] when head exceeds
the AUX buffer size.

Normalize head modulo the AUX buffer size before deriving the page index
and offset, mirroring tmc_etr_sync_perf_buffer().

Fixes: 06f5c2926aaa ("drivers/coresight: Add UltraSoc System Memory Buffer driver")
Reported-by: Yuhao Jiang <danisjiang@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Junrui Luo <moonafterrain@outlook.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/SYBPR01MB788156B3380A36835DB22290AF102@SYBPR01MB7881.ausprd01.prod.outlook.com
3 weeks agocpu: Add lockdep_is_cpus_held()/lockdep_is_cpus_write_held() stubs for !CONFIG_HOTPLU...
Reinette Chatre [Thu, 4 Jun 2026 03:38:47 +0000 (20:38 -0700)] 
cpu: Add lockdep_is_cpus_held()/lockdep_is_cpus_write_held() stubs for !CONFIG_HOTPLUG_CPU

lockdep_is_cpus_held() and lockdep_is_cpus_write_held() are undefined when
!CONFIG_HOTPLUG_CPU. This is ok because their few callers protect the calls
with a "if (IS_ENABLED(CONFIG_HOTPLUG_CPU) ..." check.

It is error prone to require callers to protect lockdep_is_cpus_held()
and lockdep_is_cpus_write_held() with an IS_ENABLED(CONFIG_HOTPLUG_CPU)
check while the custom for equivalent functions, for example the more
prevalent lockdep_is_held(), is to not require similar protection.
It is also inconsistent with CPU hotplug lockdep code self since related
call lockdep_assert_cpus_held() does not require protection.

Create stubs for lockdep_is_cpus_held() and lockdep_is_cpus_write_held()
that returns 1 (LOCK_STATE_UNKNOWN/LOCK_STATE_HELD) when !CONFIG_HOTPLUG_CPU.
This makes the CPU hotplug lockdep checks consistent while following
existing lockdep custom. Drop the "extern" from the function declaration
as part of the move to match kernel coding style.

Keep the IS_ENABLED(CONFIG_HOTPLUG_CPU) checks in existing users since
removing them would change the logic of these expressions.

Reported-by: Sashiko <sashiko-bot@kernel.org>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/7484f0b58fd86153d445819cc4e172adba16cff9.1780543665.git.reinette.chatre@intel.com
Closes: https://sashiko.dev/#/patchset/cover.1780456704.git.reinette.chatre%40intel.com?part=1
3 weeks agoMAINTAINERS: Add include/linux/cpuhplock.h to CPU HOTPLUG area
Reinette Chatre [Thu, 4 Jun 2026 03:38:46 +0000 (20:38 -0700)] 
MAINTAINERS: Add include/linux/cpuhplock.h to CPU HOTPLUG area

The move of CPU hotplug function declarations to include/linux/cpuhplock.h
did not update the CPU HOTPLUG section of MAINTAINERS. Update it now.

Fixes: 195fb517ee25 ("cpu: Move CPU hotplug function declarations into their own header")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/ef78e9d61d0ae041a1f44bd3cf8bb6e733b294d6.1780543665.git.reinette.chatre@intel.com
3 weeks agortla: Fix parsing of multi-character short options
Tomas Glozar [Tue, 2 Jun 2026 12:55:06 +0000 (14:55 +0200)] 
rtla: Fix parsing of multi-character short options

A bug was reported where the parsing of multi-character short options,
be it a short option with an argument specified without space (e.g.
"-p100") or multiple short options in one argument (e.g. -un), ignores
options specific to individual tools.

Furthermore, if the rest of the option is supposed to be an argument, it
gets reinterpreted as a string of options. For example, -p100 gets
interpreted as -100, which is due to hackish implementation read as
--no-thread --no-irq --no-irq with timerlat hist, causing rtla to error
out:

$ rtla timerlat hist -p100
no-irq and no-thread set, there is nothing to do here

This behavior is caused by getopt_long() being called twice on each
argument, once in common_parse_options(), once in [tool]_parse_args():

- common_parse_options() calls getopt_long() with an array of options
  common for all rtla tools, while suppressing errors (opterr = 0).
- If the option fails to parse, common_parse_options() returns 0.
- If 0 is returned from common_parse_options(), [tool]_parse_args()
  calls getopt_long() again, with its own set of options.

* [tool] means one of {osnoise,timerlat}_{top,hist}

At least in glibc, getopt_long() increments its internal nextchar
variable even if the option is not recognized. That means that in the
case of "-p100", common_parse_options() sets nextchar pointing to '1',
and timerlat_hist_parse_args() sees '1', not 'p'; the same then repeats
for the first and second '0'.

As there is no way to restore the correct internal state of
getopt_long() reliably, fix the issue by merging the common options back
to the longopt array and option string of the [tool]_parse_args()
functions using a macro; only the switch part is left in the original
function, which is renamed to set_common_option().

Fixes: 850cd24cb6d6 ("tools/rtla: Add common_parse_options()")
Reported-by: John Kacur <jkacur@redhat.com>
Tested-by: John Kacur <jkacur@redhat.com>
Link: https://lore.kernel.org/r/20260602125506.3325345-1-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
3 weeks agogeneve: fix length used in GRO hint UDP checksum adjustment
Antoine Tenart [Fri, 29 May 2026 14:47:00 +0000 (16:47 +0200)] 
geneve: fix length used in GRO hint UDP checksum adjustment

In geneve_post_decap_hint the length used for adjusting the UDP checksum
should be 'skb->len - gro_hint->nested_tp_offset' (UDP length) instead
of 'skb->len - gro_hint->nested_nh_offset' (IP length).

Fixes: fd0dd796576e ("geneve: use GRO hint option in the RX path")
Cc: Paolo Abeni <pabeni@redhat.com>
Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://sashiko.dev/#/patchset/20260521131436.748832-1-jhs%40mojatatu.com
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260529144713.780938-1-atenart@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks agoMerge patch series "Remove b_end_io from struct buffer_head"
Christian Brauner [Thu, 4 Jun 2026 08:28:17 +0000 (10:28 +0200)] 
Merge patch series "Remove b_end_io from struct buffer_head"

Matthew Wilcox (Oracle) <willy@infradead.org> says:

There are four benefits to this patchset.  First, it removes an
indirect function call from the completion path.  Instead of setting
bio->bi_end_io to end_bio_bh_io_sync() which then calls bh->b_end_io(),
we set bio->bi_end_io to the appropriate completion handler, replacing
two indirect function calls with one.

Second, there is a slight security advantage to this.  It is one fewer
function pointer in the middle of a writable data structure that can
be corrupted.  Third, it shrinks struct buffer_head from 104 bytes to 96
bytes, allowing for appropriximately 7% reduction in the amount of memory
used by buffer_heads (or, alternatively, allows 7% more buffer_heads to
be cached in the same amount of memory).  Fourth, it removes some
atomic operations as the buffer refcount is no longer incremented before
calling the end_io handler.

I've run ext4 through its paces, and everything seems OK.  I've only
compiled ocfs2/gfs2/nilfs/md-bitmap.  Hopefully the maintainers can give
this series a try.  I'm sending the entire series to linux-fsdevel
and cc'ing the fs-specific mailing lists for the fs-specific patches.

* patches from https://patch.msgid.link/20260528173150.1093780-1-willy@infradead.org: (34 commits)
  buffer: Remove end_buffer_write_sync()
  buffer: Change calling convention for end_buffer_read_sync()
  buffer: Remove b_end_io
  buffer: Remove submit_bh()
  md-bitmap: Convert read_file_page and write_file_page to bh_submit()
  nilfs2: Convert nilfs_mdt_submit_block to bh_submit()
  nilfs2: Convert nilfs_gccache_submit_read_data to bh_submit()
  nilfs2: Convert nilfs_btnode_submit_block to bh_submit()
  buffer: Remove mark_buffer_async_write()
  gfs2: Convert gfs2_aspace_write_folio to bh_submit()
  gfs2: Remove use of b_end_io in gfs2_meta_read_endio()
  gfs2: Convert gfs2_dir_readahead to bh_submit()
  gfs2: Convert gfs2_metapath_ra to bh_submit()
  ocfs2: Convert ocfs2_write_super_or_backup to bh_submit()
  ocfs2: Convert ocfs2_read_blocks to bh_submit()
  ocfs2: Convert ocfs2_read_block to bh_submit()
  ocfs2: Convert ocfs2_write_block to bh_submit()
  jbd2: Convert jbd2_write_superblock() to bh_submit()
  jbd2: Convert journal commit to bh_submit()
  ext4: Convert ext4_commit_super() to bh_submit()
  ...

Link: https://patch.msgid.link/20260528173150.1093780-1-willy@infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
3 weeks agobuffer: Remove end_buffer_write_sync()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:47 +0000 (18:31 +0100)] 
buffer: Remove end_buffer_write_sync()

It has no callers left, so delete it.  Inline __end_buffer_write_sync()
into bh_end_write().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-35-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agobuffer: Change calling convention for end_buffer_read_sync()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:46 +0000 (18:31 +0100)] 
buffer: Change calling convention for end_buffer_read_sync()

Unify end_buffer_read_sync() and __end_buffer_read_notouch()
by requiring the caller put the refcount on the buffer.  The only caller
is in the gfs2_meta_read() path, and there we can put the refcount
after locking the buffer.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-34-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agobuffer: Remove b_end_io
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:45 +0000 (18:31 +0100)] 
buffer: Remove b_end_io

This shrinks buffer_head by 8 bytes, letting us pack more buffer heads
per slab.  With a Debian config, it shrinks from 104 bytes to 96 bytes
which is 42 objects per 4KiB page rather than 39, a 7% reduction in the
amount of memory used.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-33-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agobuffer: Remove submit_bh()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:44 +0000 (18:31 +0100)] 
buffer: Remove submit_bh()

No users are left; remove this API.  Also remove/fix comments mentioning
it, and end_bio_bh_io_sync() as it's now unused.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-32-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agomd-bitmap: Convert read_file_page and write_file_page to bh_submit()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:43 +0000 (18:31 +0100)] 
md-bitmap: Convert read_file_page and write_file_page to bh_submit()

Avoid an extra indirect function call by using bh_submit() instead of
submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-31-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: linux-raid@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agonilfs2: Convert nilfs_mdt_submit_block to bh_submit()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:42 +0000 (18:31 +0100)] 
nilfs2: Convert nilfs_mdt_submit_block to bh_submit()

Avoid an extra indirect function call and changing the buffer refcount
by using bh_submit() instead of submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-30-willy@infradead.org
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: linux-nilfs@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agonilfs2: Convert nilfs_gccache_submit_read_data to bh_submit()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:41 +0000 (18:31 +0100)] 
nilfs2: Convert nilfs_gccache_submit_read_data to bh_submit()

Avoid an extra indirect function call and changing the buffer refcount
by using bh_submit() instead of submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-29-willy@infradead.org
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: linux-nilfs@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agonilfs2: Convert nilfs_btnode_submit_block to bh_submit()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:40 +0000 (18:31 +0100)] 
nilfs2: Convert nilfs_btnode_submit_block to bh_submit()

Avoid an extra indirect function call and changing the buffer refcount
by using bh_submit() instead of submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-28-willy@infradead.org
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: linux-nilfs@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agobuffer: Remove mark_buffer_async_write()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:39 +0000 (18:31 +0100)] 
buffer: Remove mark_buffer_async_write()

There are no more callers of this function, so delete it.
end_buffer_async_write() then has only one caller left, so
inline it into bh_end_async_write().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-27-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agogfs2: Convert gfs2_aspace_write_folio to bh_submit()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:38 +0000 (18:31 +0100)] 
gfs2: Convert gfs2_aspace_write_folio to bh_submit()

Avoid an extra indirect function call by using bh_submit() instead of
submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-26-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: gfs2@lists.linux.dev
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agogfs2: Remove use of b_end_io in gfs2_meta_read_endio()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:37 +0000 (18:31 +0100)] 
gfs2: Remove use of b_end_io in gfs2_meta_read_endio()

All buffer heads submitted by gfs2_submit_bhs() use
end_buffer_read_sync() so we can call it directly.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-25-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: gfs2@lists.linux.dev
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agogfs2: Convert gfs2_dir_readahead to bh_submit()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:36 +0000 (18:31 +0100)] 
gfs2: Convert gfs2_dir_readahead to bh_submit()

Avoid an extra indirect function call by using bh_submit() instead of
submit_bh().  Also simplify the control flow now that the buffer
refcount is not put by bh_end_read().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-24-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: gfs2@lists.linux.dev
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agogfs2: Convert gfs2_metapath_ra to bh_submit()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:35 +0000 (18:31 +0100)] 
gfs2: Convert gfs2_metapath_ra to bh_submit()

Avoid an extra indirect function call by using bh_submit() instead
of submit_bh().  Also simplify the control flow now that the buffer
refcount is not put by bh_end_read().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-23-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: gfs2@lists.linux.dev
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agoocfs2: Convert ocfs2_write_super_or_backup to bh_submit()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:34 +0000 (18:31 +0100)] 
ocfs2: Convert ocfs2_write_super_or_backup to bh_submit()

Avoid an extra indirect function call and changing the buffer refcount
by using bh_submit() instead of submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-22-willy@infradead.org
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: ocfs2-devel@lists.linux.dev
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agoocfs2: Convert ocfs2_read_blocks to bh_submit()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:33 +0000 (18:31 +0100)] 
ocfs2: Convert ocfs2_read_blocks to bh_submit()

Avoid an extra indirect function call and changing the buffer refcount
by using bh_submit() instead of submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-21-willy@infradead.org
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: ocfs2-devel@lists.linux.dev
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agoocfs2: Convert ocfs2_read_block to bh_submit()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:32 +0000 (18:31 +0100)] 
ocfs2: Convert ocfs2_read_block to bh_submit()

Avoid an extra indirect function call and changing the buffer refcount
by using bh_submit() instead of submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-20-willy@infradead.org
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: ocfs2-devel@lists.linux.dev
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agoocfs2: Convert ocfs2_write_block to bh_submit()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:31 +0000 (18:31 +0100)] 
ocfs2: Convert ocfs2_write_block to bh_submit()

Avoid an extra indirect function call and changing the buffer
refcount by using bh_submit() instead of submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-19-willy@infradead.org
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: ocfs2-devel@lists.linux.dev
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agojbd2: Convert jbd2_write_superblock() to bh_submit()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:30 +0000 (18:31 +0100)] 
jbd2: Convert jbd2_write_superblock() to bh_submit()

Avoid an extra indirect function call and changing the buffer refcount
by using bh_submit() instead of submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-18-willy@infradead.org
Acked-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agojbd2: Convert journal commit to bh_submit()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:29 +0000 (18:31 +0100)] 
jbd2: Convert journal commit to bh_submit()

Avoid an extra indirect function call by using bh_submit()
instead of submit_bh() in journal_submit_commit_record()
and jbd2_journal_commit_transaction().  These both use
journal_end_buffer_io_sync(), so it's more straightforward to do them
both at once.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-17-willy@infradead.org
Acked-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agoext4: Convert ext4_commit_super() to bh_submit()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:28 +0000 (18:31 +0100)] 
ext4: Convert ext4_commit_super() to bh_submit()

Avoid an extra indirect function call and changing the buffer refcount
by using bh_submit() instead of submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-16-willy@infradead.org
Acked-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agoext4: Convert write_mmp_block_thawed() to bh_submit()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:27 +0000 (18:31 +0100)] 
ext4: Convert write_mmp_block_thawed() to bh_submit()

Avoid an extra indirect function call and changing the buffer refcount
by using bh_submit() instead of submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-15-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agoext4: Convert ext4_fc_submit_bh() to bh_submit()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:26 +0000 (18:31 +0100)] 
ext4: Convert ext4_fc_submit_bh() to bh_submit()

Avoid an extra indirect function call by converting
ext4_end_buffer_io_sync() from bh_end_io_t to bio_end_io_t and
calling bh_submit().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-14-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agoext4; Convert __ext4_read_bh() to bh_submit()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:25 +0000 (18:31 +0100)] 
ext4; Convert __ext4_read_bh() to bh_submit()

Avoid an extra indirect function call and changing the buffer refcount
by converting ext4_end_bitmap_read() from bh_end_io_t to bio_end_io_t
and calling bh_submit().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-13-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agobuffer: Convert __block_write_full_folio to __bh_submit()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:24 +0000 (18:31 +0100)] 
buffer: Convert __block_write_full_folio to __bh_submit()

Avoid an extra indirect function call by using __bh_submit() instead
of submit_bh_wbc().  Since there is only one caller of submit_bh_wbc()
left, inline it into submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-12-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agobuffer: Convert block_read_full_folio to bh_submit()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:23 +0000 (18:31 +0100)] 
buffer: Convert block_read_full_folio to bh_submit()

Avoid an extra indirect function call by using bh_submit() instead of
submit_bh().  Since mark_buffer_async_read() would collapse to a single
function call, inline it into block_read_full_folio() along with its
extensive comment.  Convert end_buffer_async_read_io() to
bh_end_async_read().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-11-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agobuffer: Convert __bh_read_batch to bh_submit()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:22 +0000 (18:31 +0100)] 
buffer: Convert __bh_read_batch to bh_submit()

Avoid an extra indirect function call and changing the buffer refcount
by using bh_submit() instead of submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-10-willy@infradead.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agobuffer: Convert __bh_read to bh_submit()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:21 +0000 (18:31 +0100)] 
buffer: Convert __bh_read to bh_submit()

Avoid an extra indirect function call and changing the buffer refcount
by using bh_submit() instead of submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-9-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agobuffer: Convert __sync_dirty_buffer to bh_submit()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:20 +0000 (18:31 +0100)] 
buffer: Convert __sync_dirty_buffer to bh_submit()

Avoid an extra indirect function call and changing the buffer refcount
by using bh_submit() instead of submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-8-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agobuffer: Convert __bread_slow to bh_submit()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:19 +0000 (18:31 +0100)] 
buffer: Convert __bread_slow to bh_submit()

Avoid an extra indirect function call and changing the buffer refcount
by using bh_submit() instead of submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-7-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agobuffer: Convert write_dirty_buffer to bh_submit()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:18 +0000 (18:31 +0100)] 
buffer: Convert write_dirty_buffer to bh_submit()

Avoid an extra indirect function call and changing the buffer refcount
by using bh_submit() instead of submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-6-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agobuffer: Add bh_end_read(), bh_end_write() and bh_end_async_write()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:17 +0000 (18:31 +0100)] 
buffer: Add bh_end_read(), bh_end_write() and bh_end_async_write()

These are the bio_end_io_t versions of end_buffer_read_sync(),
end_buffer_write_sync() and end_buffer_async_write().  They do not
contain a put_bh() call as it is no longer necessary.

Also add the helper function bio_endio_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-5-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agobuffer: Remove mark_buffer_async_write_endio()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:16 +0000 (18:31 +0100)] 
buffer: Remove mark_buffer_async_write_endio()

All callers of mark_buffer_async_write_endio() pass
end_buffer_async_write, so we can inline mark_buffer_async_write_endio()
into mark_buffer_async_write() and just call that instead.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-4-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agobuffer: Add bh_submit()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:15 +0000 (18:31 +0100)] 
buffer: Add bh_submit()

bh_submit() takes a bio_end_io allowing users to avoid the indirect
function call through bh->b_end_io, and eventually allowing us to remove
bh->b_end_io.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-3-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agobuffer: Remove forward declaration of submit_bh_wbc()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:31:14 +0000 (18:31 +0100)] 
buffer: Remove forward declaration of submit_bh_wbc()

Rearrange functions to avoid this forward declaration.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-2-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agoMerge patch series "eventpoll: Fix epoll_wait() report false negative"
Christian Brauner [Thu, 4 Jun 2026 08:25:14 +0000 (10:25 +0200)] 
Merge patch series "eventpoll: Fix epoll_wait() report false negative"

Nam Cao <namcao@linutronix.de> says:

While staring at epoll, I noticed ep_events_available() looks wrong. I
wrote a small program to confirm, and yes it is definitely wrong.

This series adds a reproducer to kselftest, and fix the bug.

* patches from https://patch.msgid.link/cover.1780422137.git.namcao@linutronix.de:
  eventpoll: Fix epoll_wait() report false negative
  selftests/eventpoll: Add test for multiple waiters

Link: https://patch.msgid.link/cover.1780422137.git.namcao@linutronix.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
3 weeks agoeventpoll: Fix epoll_wait() report false negative
Nam Cao [Tue, 2 Jun 2026 17:51:46 +0000 (19:51 +0200)] 
eventpoll: Fix epoll_wait() report false negative

ep_events_available() checks for available events by looking at
ep->rdllist and ep_is_scanning(). However, this is done without a lock
and can report false negative if ep_start_scan() or ep_done_scan() are
executed by another task concurrently. For example:
_________________________________________________________________________
                                   |ep_start_scan()
                                   |  list_splice_init(&ep->rdllist, ...)
ep_events_available()              |
  !list_empty_careful(&ep->rdllist)|
  || ep_is_scanning(ep)            |
                           |  ep_enter_scan(ep)
___________________________________|_____________________________________

Another example:
_________________________________________________________________________
ep_events_available()              |
                                   |ep_start_scan()
                                   |  list_splice_init(&ep->rdllist, ...)
                           |  ep_enter_scan(ep)
  !list_empty_careful(&ep->rdllist)|
                                   |ep_done_scan()
                                   |  ep_exit_scan(ep)
                                   |  list_splice(..., &ep->rdllist)
  || ep_is_scanning(ep)            |
___________________________________|_____________________________________

In the above examples, ep_events_available() sees no event despite
events being available. In case epoll_wait() is called with timeout=0,
epoll_wait() will wrongly return "no event" to user.

Introduce a sequence lock to resolve this issue.

Measuring the time consumption of 10 million loop iterations doing
epoll_wait(), the following performance drop is observed:

   timeout  #event  before    after    diff
     0ms      0     3727ms   3974ms   +6.6%
     0ms      1     8099ms   9134ms    +13%
     1ms      1    13525ms  13586ms  +0.45%

Considering the use case of epoll_wait() (wait for events, do something
with the events, repeat), it should only contribute to a small portion of
user's CPU consumption. Therefore this performance drop is not alarming.

Fixes: c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()")
Suggested-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Link: https://patch.msgid.link/4363cd8e34a21d4f0d257be1b33e84dc25030fdf.1780422138.git.namcao@linutronix.de
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agoselftests/eventpoll: Add test for multiple waiters
Nam Cao [Tue, 2 Jun 2026 17:51:45 +0000 (19:51 +0200)] 
selftests/eventpoll: Add test for multiple waiters

Add a test whichs creates 64 threads who all epoll_wait() on the same
eventpoll. The source eventfd is written but never read, therefore all the
threads should always see an EPOLLIN event.

This test fails because of a kernel bug, which will be fixed by a follow-up
commit.

Signed-off-by: Nam Cao <namcao@linutronix.de>
Link: https://patch.msgid.link/b11947013563875c046c0b0959c29fd95eeebd34.1780422138.git.namcao@linutronix.de
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agoALSA: seq: oss: Use scoped cleanup for temporary MIDI use lock
Cássio Gabriel [Thu, 4 Jun 2026 04:48:14 +0000 (01:48 -0300)] 
ALSA: seq: oss: Use scoped cleanup for temporary MIDI use lock

The OSS sequencer write and out-of-band paths may receive a temporary
snd_use_lock_t reference from snd_seq_oss_process_event(). This was added
to keep MIDI device data alive until events with embedded SysEx data are
dispatched.

Use a scoped cleanup helper for that temporary reference. This keeps the
lifetime rule local to the variable declaration and avoids future missing
snd_use_lock_free() paths if these event handling paths gain more exits.

No functional change is intended.

Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20260604-alsa-scoped-cleanups-v1-3-10c43152a728@gmail.com
3 weeks agoALSA: core: Add scoped cleanup helper for card references
Cássio Gabriel [Thu, 4 Jun 2026 04:48:13 +0000 (01:48 -0300)] 
ALSA: core: Add scoped cleanup helper for card references

Several ALSA paths acquire temporary card references with snd_card_ref()
and release them manually with snd_card_unref(). control_led.c already
defines a local cleanup helper for this pattern, while other core paths
still open-code the release.

Move the helper to the common ALSA core header and use it in control-layer
card-reference paths. This makes the ownership rule explicit and avoids
future missing-unref mistakes when adding early exits.

No functional change is intended.

Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20260604-alsa-scoped-cleanups-v1-2-10c43152a728@gmail.com
3 weeks agoALSA: control: Use scoped cleanup for user control buffers
Cássio Gabriel [Thu, 4 Jun 2026 04:48:12 +0000 (01:48 -0300)] 
ALSA: control: Use scoped cleanup for user control buffers

User-defined control TLV data and enum names are copied from user space
with vmemdup_user() before being installed in the user_element. Until
ownership is transferred, these temporary buffers have to be released on
every validation exit.

Use __free(kvfree) for the temporary buffers and no_free_ptr() when
ownership is transferred to the user_element. This removes the manual
kvfree() calls from the unchanged-TLV and enum-name validation paths,
makes the ownership hand-off explicit, and keeps the existing allocation
accounting and ABI unchanged.

No functional change is intended.

Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20260604-alsa-scoped-cleanups-v1-1-10c43152a728@gmail.com
3 weeks agonvme-tcp: lockdep: use dynamic lockdep keys per socket instance
Shin'ichiro Kawasaki [Thu, 4 Jun 2026 02:32:08 +0000 (11:32 +0900)] 
nvme-tcp: lockdep: use dynamic lockdep keys per socket instance

When NVMe-TCP controller setup and teardown are repeated with lockdep
enabled, lockdep reports false positives WARN for the following locks:

  1) &q->elevator_lock        : IO scheduler change context
  2) &q->q_usage_counter(io)  : SCSI disk probe context
  3) fs_reclaim               : CPU hotplug bring-up context
  4) cpu_hotplug_lock         : socket establishment context
  5) sk_lock-AF_INET-NVME     : MQ sched dispatch context for the socket
  6) set->srcu                : NVMe controller delete context

The lockdep WARN was observed by running blktests test case nvme/005 for
tcp transport on v7.1-rc1 kernel with a patch. Refer to the Link tag for
the details of the WARN.

This is a false positive because lockdep confuses lock 4) (socket
establishment) with lock 5) (socket in use) for different socket
instances. The locks belong to different sockets, but lockdep treats
them as the same due to shared static lockdep keys.

Fix this by using dynamically allocated lockdep keys per socket instance
instead of static keys nvme_tcp_sk_key[] and nvme_tcp_slock_key[]. Add
nvme_tcp_sk_key and nvme_tcp_slock_key fields to struct nvme_tcp_queue
and pass them to sock_lock_init_class_and_name() for proper lockdep
tracking. Change the argument of nvme_tcp_reclassify_socket() from
'struct socket *' to 'struct nvme_tcp_queue *' to pass both the socket
and the keys. Add CONFIG_DEBUG_LOCK_ALLOC guards to nvme_tcp_alloc_queue()
and nvme_tcp_free_queue() to register and unregister the dynamic keys.
Additionally, move nvme_tcp_reclassify_socket() inside these guards since
it's only needed when lockdep is enabled.

Link: https://lore.kernel.org/linux-nvme/afB5syZbUrppgsDQ@shinmob/
Suggested-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
3 weeks agoMerge patch series "mm: improve write performance with RWF_DONTCACHE"
Christian Brauner [Thu, 4 Jun 2026 08:18:25 +0000 (10:18 +0200)] 
Merge patch series "mm: improve write performance with RWF_DONTCACHE"

Jeff Layton <jlayton@kernel.org> says:

This patch series is intended to improve write performance with
RWF_DONTCACHE. This version fixes additional stat accounting issues
found during review: integer promotion on 32-bit, cgroup writeback
domain migration, folio split flag preservation, and a UAF that could
occur in filemap_dontcache_kick_writeback().

* patches from https://patch.msgid.link/20260511-dontcache-v7-0-2848ddce8090@kernel.org:
  mm: kick writeback flusher for IOCB_DONTCACHE with targeted dirty tracking
  mm: track DONTCACHE dirty pages per bdi_writeback
  mm: preserve PG_dropbehind flag during folio split

Link: https://patch.msgid.link/20260511-dontcache-v7-0-2848ddce8090@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
3 weeks agoALSA: hda/realtek: Add quirk for ASUS VivoBook X509DAP
Andrei Faleichyk [Wed, 3 Jun 2026 21:33:13 +0000 (01:33 +0400)] 
ALSA: hda/realtek: Add quirk for ASUS VivoBook X509DAP

The internal microphone on ASUS VivoBook X509DAP (subsystem ID
0x1043:0x197e) is not detected without a quirk entry. Add
ALC256_FIXUP_ASUS_MIC_NO_PRESENCE to fix the issue.

Signed-off-by: Andrei Faleichyk <andrei.faleichyk@noogadev.com>
Link: https://patch.msgid.link/20260603213313.6298-1-andrei.faleichyk@noogadev.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
3 weeks agomm: kick writeback flusher for IOCB_DONTCACHE with targeted dirty tracking
Jeff Layton [Mon, 11 May 2026 11:58:29 +0000 (07:58 -0400)] 
mm: kick writeback flusher for IOCB_DONTCACHE with targeted dirty tracking

The IOCB_DONTCACHE writeback path in generic_write_sync() calls
filemap_flush_range() on every write, submitting writeback inline in
the writer's context.  Perf lock contention profiling shows the
performance problem is not lock contention but the writeback submission
work itself — walking the page tree and submitting I/O blocks the writer
for milliseconds, inflating p99.9 latency from 23ms (buffered) to 93ms
(dontcache).

Replace the inline filemap_flush_range() call with a flusher kick that
drains dirty pages in the background.  This moves writeback submission
completely off the writer's hot path.

To avoid flushing unrelated buffered dirty data, add a dedicated
WB_start_dontcache bit and wb_check_start_dontcache() handler that uses
the per-wb WB_DONTCACHE_DIRTY counter to determine how many pages to
write back.  The flusher writes back that many pages from the oldest dirty
inodes (not restricted to dontcache-specific inodes). This helps
preserve I/O batching while limiting the scope of expedited writeback.

Like WB_start_all, the WB_start_dontcache bit coalesces multiple
DONTCACHE writes into a single flusher wakeup without per-write
allocations.  Use test_and_clear_bit to atomically consume the kick
request before reading the dirty counter and starting writeback, so that
concurrent DONTCACHE writes during writeback can re-set the bit and
schedule a follow-up flusher run.

Read the dirty counter with wb_stat_sum() (aggregating per-CPU batches)
rather than wb_stat() (which reads only the global counter) to ensure
small writes below the percpu batch threshold are visible to the flusher.

In filemap_dontcache_kick_writeback(), set the WB_start_dontcache bit
inside the unlocked_inode_to_wb_begin/end section for correct cgroup
writeback domain targeting, but defer the wb_wakeup() call until after
the section ends, since wb_wakeup() uses spin_unlock_irq() which would
unconditionally re-enable interrupts while the i_pages xa_lock may still
be held under irqsave during a cgroup writeback switch. Pin the wb with
wb_get() inside the RCU critical section before calling wb_wakeup()
outside it, since cgroup bdi_writeback structures are RCU-freed and the
wb pointer could become invalid after unlocked_inode_to_wb_end() drops
the RCU read lock.

Also add WB_REASON_DONTCACHE as a new writeback reason for tracing
visibility.

dontcache-bench results (same host, T6F_SKL_1920GBF, 251 GiB RAM,
xfs on NVMe, fio io_uring):

Buffered and direct I/O paths are unaffected by this patchset. All
improvements are confined to the dontcache path:

Single-stream throughput (MB/s):
                        Before    After    Change
  seq-write/dontcache      298      897    +201%
  rand-write/dontcache     131      236     +80%

Tail latency improvements (seq-write/dontcache):
  p99:    135,266 us  ->  23,986 us   (-82%)
  p99.9: 8,925,479 us ->  28,443 us   (-99.7%)

Multi-writer (4 jobs, sequential write):
                                Before    After    Change
  dontcache aggregate (MB/s)     2,529    4,532     +79%
  dontcache p99 (us)             8,553    1,002     -88%
  dontcache p99.9 (us)         109,314    1,057     -99%

  Dontcache multi-writer throughput now matches buffered (4,532 vs
  4,616 MB/s).

32-file write (Axboe test):
                                Before    After    Change
  dontcache aggregate (MB/s)     1,548    3,499    +126%
  dontcache p99 (us)            10,170      602     -94%
  Peak dirty pages (MB)          1,837      213     -88%

  Dontcache now reaches 81% of buffered throughput (was 35%).

Competing writers (dontcache vs buffered, separate files):
                                Before    After
  buffered writer                  868      433 MB/s
  dontcache writer                 415      433 MB/s
  Aggregate                      1,284      866 MB/s

  Previously the buffered writer starved the dontcache writer 2:1.
  With per-bdi_writeback tracking, both writers now receive equal
  bandwidth. The aggregate matches the buffered-vs-buffered baseline
  (863 MB/s), indicating fair sharing regardless of I/O mode.

  The dontcache writer's p99.9 latency collapsed from 119 ms to
  33 ms (-73%), eliminating the severe periodic stalls seen in the
  baseline. Both writers now share identical latency profiles,
  matching the buffered-vs-buffered pattern.

The per-bdi_writeback dirty tracking dramatically reduces peak dirty
pages in dontcache workloads, with the 32-file test dropping from
1.8 GB to 213 MB. Dontcache sequential write throughput triples and
multi-writer throughput reaches parity with buffered I/O, with tail
latencies collapsing by 1-2 orders of magnitude.

Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20260511-dontcache-v7-3-2848ddce8090@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agomm: track DONTCACHE dirty pages per bdi_writeback
Jeff Layton [Mon, 11 May 2026 11:58:28 +0000 (07:58 -0400)] 
mm: track DONTCACHE dirty pages per bdi_writeback

Add a per-wb WB_DONTCACHE_DIRTY counter that tracks the number of dirty
pages with the dropbehind flag set (i.e., pages dirtied via RWF_DONTCACHE
writes).

Increment the counter alongside WB_RECLAIMABLE in folio_account_dirtied()
when the folio has the dropbehind flag set, and decrement it in
folio_clear_dirty_for_io() and folio_account_cleaned(). Also decrement it
when a non-DONTCACHE lookup atomically clears the dropbehind flag on a
dirty folio in __filemap_get_folio_mpol(), using folio_test_clear_dropbehind()
to prevent concurrent lookups from double-decrementing the counter, and
guarding the decrement with mapping_can_writeback() to match the increment
path.

Transfer the counter alongside WB_RECLAIMABLE in inode_do_switch_wbs() so
that the stat is properly migrated when an inode switches cgroup writeback
domains.

The counter will be used by the writeback flusher to determine how many
pages to write back when expediting writeback for IOCB_DONTCACHE writes,
without flushing the entire BDI's dirty pages.

Suggested-by: Jan Kara <jack@suse.cz>
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20260511-dontcache-v7-2-2848ddce8090@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agomm: preserve PG_dropbehind flag during folio split
Jeff Layton [Mon, 11 May 2026 11:58:27 +0000 (07:58 -0400)] 
mm: preserve PG_dropbehind flag during folio split

__split_folio_to_order() copies page flags from the original folio to
newly created sub-folios using an explicit allowlist, but PG_dropbehind
is not included. When a large folio with PG_dropbehind set is split,
only the head sub-folio retains the flag; all tail sub-folios silently
lose it and will not be reclaimed eagerly after writeback completes.

Add PG_dropbehind to the flag copy mask so that the drop-behind hint
is preserved across folio splits.

Fixes: a323281cdfec ("mm: add PG_dropbehind folio flag")
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20260511-dontcache-v7-1-2848ddce8090@kernel.org
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agoALSA: usb-audio: qcom: Use PAGE_ALIGN macro for buffer size calculation
wangdicheng [Wed, 3 Jun 2026 09:11:02 +0000 (17:11 +0800)] 
ALSA: usb-audio: qcom: Use PAGE_ALIGN macro for buffer size calculation

Use the kernel's PAGE_ALIGN() macro instead of open-coding the page
alignment calculation. This improves code readability and follows
kernel coding style.

The manual calculation:
  mult = len / PAGE_SIZE;
  remainder = len % PAGE_SIZE;
  len = mult * PAGE_SIZE;
  len += remainder ? PAGE_SIZE : 0;

is equivalent to:
  len = PAGE_ALIGN(len);

Signed-off-by: wangdicheng <wangdicheng@kylinos.cn>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20260603091102.231370-4-wangdich9700@163.com
3 weeks agoALSA: usb-audio: qcom: Fix return value in qc_usb_audio_offload_fill_avail_pcms
wangdicheng [Wed, 3 Jun 2026 09:11:01 +0000 (17:11 +0800)] 
ALSA: usb-audio: qcom: Fix return value in qc_usb_audio_offload_fill_avail_pcms

The function qc_usb_audio_offload_fill_avail_pcms() always returns -1
regardless of how many PCM devices were successfully filled. This makes
it impossible for callers to know the actual number of available PCMs.

Return the actual count of filled PCM devices instead, which allows
callers to verify that all expected PCMs were properly enumerated.

Signed-off-by: wangdicheng <wangdicheng@kylinos.cn>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20260603091102.231370-3-wangdich9700@163.com
3 weeks agoALSA: usb-audio: qcom: Use snprintf for mixer control name formatting
wangdicheng [Wed, 3 Jun 2026 09:11:00 +0000 (17:11 +0800)] 
ALSA: usb-audio: qcom: Use snprintf for mixer control name formatting

The current code uses sprintf() to format control names without bounds
checking, which could lead to buffer overflow if PCM index is large.
Replace sprintf with snprintf to ensure buffer safety.

The ctl_name buffer is 48 bytes, and the formatted string could exceed
this with large PCM index values. Using snprintf with sizeof(ctl_name)
prevents potential buffer overflow.

Signed-off-by: wangdicheng <wangdicheng@kylinos.cn>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20260603091102.231370-2-wangdich9700@163.com
3 weeks agoALSA: usb-audio: qcom: Improve error logging in USB offload
wangdicheng [Wed, 3 Jun 2026 09:10:59 +0000 (17:10 +0800)] 
ALSA: usb-audio: qcom: Improve error logging in USB offload

Add error codes to error messages for better debugging.
This helps identify the root cause when USB audio offload fails.

Error messages now include the actual error code returned by
xhci_sideband operations, making it easier to diagnose failures
during USB audio offload setup.

Signed-off-by: wangdicheng <wangdicheng@kylinos.cn>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20260603091102.231370-1-wangdich9700@163.com
3 weeks agoALSA: hda/realtek: ALC882: Fixup for Clevo P775TM1
Evelyn Ali [Tue, 2 Jun 2026 21:41:22 +0000 (17:41 -0400)] 
ALSA: hda/realtek: ALC882: Fixup for Clevo P775TM1

Clevo P775TM1 laptops come with an ESS Sabre HiFi DAC. Setting
0x1b pin VREF to 80% enables said DAC output.

Signed-off-by: Evelyn Ali <evelynali99@gmail.com>
Link: https://patch.msgid.link/20260602214122.78020-1-evelynali99@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
3 weeks agoMerge patch series "libfs: set SB_I_NOEXEC and SB_I_NODEV in init_pseudo()"
Christian Brauner [Thu, 4 Jun 2026 08:10:58 +0000 (10:10 +0200)] 
Merge patch series "libfs: set SB_I_NOEXEC and SB_I_NODEV in init_pseudo()"

John Hubbard <jhubbard@nvidia.com> says:

This began as a one-line dma-buf fix for a path_noexec() warning added
by commit 1e7ab6f67824 ("anon_inode: rework assertions"). Christoph
pointed out that the fix belongs higher up: a pseudo filesystem has no
reason not to set SB_I_NOEXEC by default. This series does that.

  * Patch 1 sets both flags in init_pseudo(), so every pseudo
    filesystem gets them. This is the only patch that changes a flag,
    and the only one with Fixes:/Cc: stable.

  * Patch 2 drops the assignments that are now redundant in the callers
    that set them by hand.

Most callers already set one or both flags. I audited every
init_pseudo() caller. Here is what patch 1 actually changes for each.
The only visible effect is on dma-buf, where SB_I_NOEXEC silences the
warning. SB_I_NODEV is never consulted on these SB_NOUSER mounts, and
none of the callers that gain SB_I_NOEXEC are executed from.

  caller                       had        patch 1 adds
  ---------------------------  --------   --------------
  fs/anon_inodes.c             both       nothing new
  mm/secretmem.c               both       nothing new
  virt/kvm/guest_memfd.c       both       nothing new
  fs/nsfs.c                    both       nothing new
  fs/pidfs.c                   both       nothing new
  fs/aio.c                     NOEXEC     NODEV
  drivers/dma-buf/dma-buf.c    neither    NOEXEC + NODEV
  net/socket.c                 neither    NOEXEC + NODEV
  fs/pipe.c                    neither    NOEXEC + NODEV
  kernel/resource.c            neither    NOEXEC + NODEV
  fs/erofs/super.c             neither    NOEXEC + NODEV
  fs/btrfs/tests/...           neither    NOEXEC + NODEV
  drivers/vfio/vfio_main.c     neither    NOEXEC + NODEV
  drivers/gpu/drm/drm_drv.c    neither    NOEXEC + NODEV
  drivers/dax/super.c          neither    NOEXEC + NODEV
  block/bdev.c                 neither    NOEXEC + NODEV

* patches from https://patch.msgid.link/20260604025315.245910-1-jhubbard@nvidia.com:
  libfs: drop redundant SB_I_NOEXEC/SB_I_NODEV in init_pseudo() callers
  libfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo()

Link: https://patch.msgid.link/20260604025315.245910-1-jhubbard@nvidia.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
3 weeks agolibfs: drop redundant SB_I_NOEXEC/SB_I_NODEV in init_pseudo() callers
John Hubbard [Thu, 4 Jun 2026 02:53:15 +0000 (19:53 -0700)] 
libfs: drop redundant SB_I_NOEXEC/SB_I_NODEV in init_pseudo() callers

init_pseudo() now sets SB_I_NOEXEC and SB_I_NODEV by default, so the
per-caller assignments are redundant. Drop them.

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Link: https://patch.msgid.link/20260604025315.245910-3-jhubbard@nvidia.com
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agolibfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo()
John Hubbard [Thu, 4 Jun 2026 02:53:14 +0000 (19:53 -0700)] 
libfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo()

Since commit 1e7ab6f67824 ("anon_inode: rework assertions"),
path_noexec() warns when an anonymous-inode file is mmap'd from a
superblock that has not set SB_I_NOEXEC. dma-buf backs its files this
way and never set the flag, so mmap of any exported buffer trips the
warning on a CONFIG_DEBUG_VFS=y kernel:

  WARNING: CPU: 11 PID: 121813 at fs/exec.c:118 path_noexec+0x47/0x50
   do_mmap+0x2b5/0x680
   vm_mmap_pgoff+0x129/0x210
   ksys_mmap_pgoff+0x177/0x240
   __x64_sys_mmap+0x33/0x70

init_pseudo() sets up internal SB_NOUSER mounts that are never
path-reachable. Set both flags here so every pseudo filesystem gets
them by default instead of each caller setting them.

SB_I_NODEV is inert for unreachable mounts. SB_I_NOEXEC has one
visible effect: an executable mapping of a pseudo-fs fd, such as a
dma-buf, now fails with -EPERM, which is the invariant the assertion
enforces. No in-tree caller maps these executable.

Reproduce on CONFIG_DEBUG_VFS=y:

  make -C tools/testing/selftests/dmabuf-heaps
  sudo ./tools/testing/selftests/dmabuf-heaps/dmabuf-heap -t system

Fixes: 1e7ab6f67824 ("anon_inode: rework assertions")
Suggested-by: Christoph Hellwig <hch@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Link: https://patch.msgid.link/20260604025315.245910-2-jhubbard@nvidia.com
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agoiomap: avoid potential null folio->mapping deref during error reporting
Joanne Koong [Thu, 4 Jun 2026 01:18:58 +0000 (18:18 -0700)] 
iomap: avoid potential null folio->mapping deref during error reporting

When a buffered read fails, iomap_finish_folio_read() reports the error
with fserror_report_io(folio->mapping->host, ...). This is called after
ifs->read_bytes_pending has been decremented by the bytes attempted to
be read.

For a folio split across multiple read completions, the folio is only
guaranteed to stay locked while read_bytes_pending > 0. Once
iomap_finish_folio_read() decrements read_bytes_pending, another
in-flight read can complete and end the read on the folio, which unlocks
it. This allows truncate logic to run and detach the folio (set
folio->mapping to NULL). The error reporting path then can dereference a
NULL folio->mapping. As reported by Sam Sun, this is the race that can
occur:

CPU0: failed completion      CPU1: final completion     CPU2: truncate
-----------------------      ----------------------     --------------
read_bytes_pending -= len
finished = false
/* preempted before
   fserror_report_io() */
     read_bytes_pending -= len
     finished = true
     folio_end_read()
truncate clears
folio->mapping
fserror_report_io(
  folio->mapping->host, ...)
      ^ NULL deref

Fix this by reporting the error first before decrementing
ifs->read_bytes_pending.

Fixes: a9d573ee88af ("iomap: report file I/O errors to the VFS")
Cc: stable@vger.kernel.org
Reported-by: Sam Sun <samsun1006219@gmail.com>
Closes: https://lore.kernel.org/linux-fsdevel/CAEkJfYPhWdd59RKmuNLJg-bkypHz7xiOwaWyNVu3A8CUqQCnvg@mail.gmail.com/
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://patch.msgid.link/20260604011858.2297561-1-joannelkoong@gmail.com
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agofhandle: fix UAF due to unlocked ->mnt_ns read in may_decode_fh()
Jann Horn [Wed, 3 Jun 2026 19:31:57 +0000 (21:31 +0200)] 
fhandle: fix UAF due to unlocked ->mnt_ns read in may_decode_fh()

may_decode_fh() accesses mount::mnt_ns without holding any locks; that
means the mount can concurrently be unmounted, and the mnt_namespace can
concurrently be freed after an RCU grace period.

This race can happens as follows, assuming that the mount point was
created by open_tree(..., OPEN_TREE_CLONE):

thread 1            thread 2            RCU
                    __do_sys_open_by_handle_at
                      do_handle_open
                        handle_to_path
                          may_decode_fh
                            is_mounted
                              [mount::mnt_ns access]
                            [mount::mnt_ns access]
__do_sys_close
  fput_close_sync
    __fput
      dissolve_on_fput
        umount_tree
        class_namespace_excl_destructor
          namespace_unlock
            free_mnt_ns
              mnt_ns_tree_remove
                call_rcu(mnt_ns_release_rcu)
                                        mnt_ns_release_rcu
                                          mnt_ns_release
                                            kfree
                            [mnt_namespace::user_ns access] **UAF**

Fix it by taking rcu_read_lock() around the mount::mnt_ns access, like
in __prepend_path().
Additionally, document the semantics of mount::mnt_ns, and use WRITE_ONCE()
for writers that can race with lockless readers.

This bug is unreachable unless one of the following is set:

 - CONFIG_PREEMPTION
 - CONFIG_RCU_STRICT_GRACE_PERIOD

because it requires an RCU grace period to happen during a syscall without
an explicit preemption.

This doesn't seem to have interesting security impact; worst-case, it could
leak the result of an integer comparison to userspace (from the level
check in cap_capable()), cause an endless loop, or crash the kernel by
dereferencing an invalid address.

Fixes: 620c266f3949 ("fhandle: relax open_by_handle_at() permission checks")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Link: https://patch.msgid.link/20260603-vfs-fhandle-uaf-fix-v2-1-d05db76a5084@google.com
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agoMerge tag 'zynqmp-soc-for-7.2' of https://github.com/Xilinx/linux-xlnx into soc/drivers
Linus Walleij [Thu, 4 Jun 2026 07:38:57 +0000 (09:38 +0200)] 
Merge tag 'zynqmp-soc-for-7.2' of https://github.com/Xilinx/linux-xlnx into soc/drivers

arm64: Xilinx SOC changes for 7.2

firmware:
- Add CSU register discovery with sysfs interface

zynqmp_power:
- Fix race condition in event registration
- Fix shutdown and free rx mailbox channel

* tag 'zynqmp-soc-for-7.2' of https://github.com/Xilinx/linux-xlnx:
  firmware: zynqmp: Add dynamic CSU register discovery and sysfs interface
  Documentation: ABI: add sysfs interface for ZynqMP CSU registers
  soc: xilinx: Shutdown and free rx mailbox channel
  soc: xilinx: Fix race condition in event registration

Signed-off-by: Linus Walleij <linusw@kernel.org>
3 weeks agokbuild: rust: rename flag to `-Zdebuginfo-for-profiling` for Rust >= 1.98
Miguel Ojeda [Tue, 2 Jun 2026 15:16:38 +0000 (17:16 +0200)] 
kbuild: rust: rename flag to `-Zdebuginfo-for-profiling` for Rust >= 1.98

Starting with Rust 1.98.0 (expected 2026-08-20), the
`-Zdebug-info-for-profiling` flag has been renamed to
`-Zdebuginfo-for-profiling` (i.e. one less dash, to match `debuginfo`s
in other flags) [1].

Without this change, one gets in the latest nightlies:

    error: unknown unstable option: `debug-info-for-profiling`

Thus pass the right name.

Link: https://github.com/rust-lang/rust/pull/156887
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20260602151638.14358-1-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
3 weeks agoarm64: dts: hisilicon: hi3660-hikey960: move role-switch endpoint into connector
Akash Sukhavasi [Wed, 20 May 2026 21:53:25 +0000 (16:53 -0500)] 
arm64: dts: hisilicon: hi3660-hikey960: move role-switch endpoint into connector

The rt1711h Type-C controller on the HiKey960 has the USB role-switch
endpoint placed as a top-level 'port' node, outside the connector
subnode. This triggers two dtbs_check warnings against
richtek,rt1711h.yaml:

  - 'port' does not match any of the regexes: '^pinctrl-[0-9]+$'
  - connector:ports: 'port@0' is a required property

Move the role-switch endpoint into the connector's port@0, which is
where usb-connector.yaml expects it. Update the DWC3 remote-endpoint
phandle accordingly.

The TCPM core (tcpm.c) looks up the role switch starting from the
connector fwnode via fwnode_usb_role_switch_get(). With the endpoint
inside the connector's port@0, it is found through the primary lookup
path rather than the device-level fallback.

Cross-compiled for arm64. Verified with dt_binding_check and
dtbs_check. Not runtime-tested on hardware.

Signed-off-by: Akash Sukhavasi <akash.sukhavasi@gmail.com>
Signed-off-by: Wei Xu <xuwei5@hisilicon.com>
3 weeks agoiommu/vt-d: Fix RB-tree corruption in probe error path
Pranjal Shrivastava [Thu, 4 Jun 2026 06:03:10 +0000 (14:03 +0800)] 
iommu/vt-d: Fix RB-tree corruption in probe error path

The info->node RB-tree member is zero-initialized via kzalloc. If
a device does not support ATS, the device_rbtree_insert() call is
skipped. If a subsequent probe step fails, the error path jumps to
device_rbtree_remove(), which misinterprets the zeroed node as
a tree root and corrupts the device RB-tree.

Fix this by explicitly initializing the RB-node as empty using
RB_CLEAR_NODE() during initialization and guarding the removal with
RB_EMPTY_NODE().

Fixes: 4f1492efb495 ("iommu/vt-d: Revert ATS timing change to fix boot failure")
Reported-by: sashiko-bot@kernel.org
Closes: https://lore.kernel.org/all/20260525205628.CD4431F000E9@smtp.kernel.org/
Suggested-by: Baolu Lu <baolu.lu@linux.intel.com>
Signed-off-by: Pranjal Shrivastava <praan@google.com>
Link: https://lore.kernel.org/r/20260531170254.60493-2-praan@google.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
3 weeks agoiommu/vt-d: Improve IOMMU fault information
Guanghui Feng [Thu, 4 Jun 2026 06:03:09 +0000 (14:03 +0800)] 
iommu/vt-d: Improve IOMMU fault information

In some environments, multiple PCIe segments exist, and PCIe device
information needs to be differentiated and identified based on the
segment. When an IOMMU fault event occurs, the IOMMU and device segment
information should be output in detail in dmar_fault_do_one.

Signed-off-by: Guanghui Feng <guanghuifeng@linux.alibaba.com>
Link: https://lore.kernel.org/r/20260528022943.1697564-1-guanghuifeng@linux.alibaba.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
3 weeks agoiommu/vt-d: Remove typo from pasid_pte_config_nested()
Michał Grzelak [Thu, 4 Jun 2026 06:03:08 +0000 (14:03 +0800)] 
iommu/vt-d: Remove typo from pasid_pte_config_nested()

Rename pasid_pte_config_nestd() into pasid_pte_config_nested(). Do it to
match other function names ending with _nested().

Signed-off-by: Michał Grzelak <michal.grzelak@intel.com>
Link: https://lore.kernel.org/r/20260509174503.831134-1-michal.grzelak@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
3 weeks agoiommu/vt-d: Clear Present bit before tearing down scalable-mode context entry
Michael Bommarito [Thu, 4 Jun 2026 06:03:07 +0000 (14:03 +0800)] 
iommu/vt-d: Clear Present bit before tearing down scalable-mode context entry

device_pasid_table_teardown() zeroes the 128-bit scalable-mode context
entry with context_clear_entry() while the Present bit is still set. This
creates a window where the hardware can fetch a torn entry, with some
fields already zeroed while Present is still set, leading to unpredictable
behavior or spurious faults. The context-cache invalidation is issued only
after the entry has been zeroed, and intel_pasid_free_table() then frees
the PASID directory pages, so the IOMMU can keep walking a stale Present=1
entry that points at freed memory.

While x86 provides strong write ordering, the compiler may reorder the two
64-bit writes to the entry, and the hardware fetch is not guaranteed to be
atomic with respect to multiple CPU writes.

Commit c1e4f1dccbe9d ("iommu/vt-d: Clear Present bit before tearing down
context entry") fixed this exact pattern in domain_context_clear_one() and
the copied-context path, but device_pasid_table_teardown() was not
converted.

Align it with the "Guidance to Software for Invalidations" in the VT-d
spec, Section 6.5.3.3, using the same ownership handshake as the sibling
fix: clear only the Present bit, flush it to the IOMMU, perform the
context-cache invalidation, and only then zero the rest of the entry.

Fixes: 81e921fd32161 ("iommu/vt-d: Fix NULL domain on device release")
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-7
Link: https://lore.kernel.org/r/20260528025557.3209367-1-michael.bommarito@gmail.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
3 weeks agoiommu/vt-d: Avoid WARNING in sva unbind path
Lu Baolu [Thu, 4 Jun 2026 06:03:06 +0000 (14:03 +0800)] 
iommu/vt-d: Avoid WARNING in sva unbind path

The Intel IOMMU driver allows SVA on devices even if they do not support
PCI/PRI. Commit 39c20c4e83b9 ("iommu/vt-d: Only handle IOPF for SVA when
PRI is supported") modified the SVA bind path to allow this configuration
by skipping IOPF enablement when PRI is missing. However, it failed to
update the unbind path.

This creates an imbalance: the unbind path attempts to disable IOPF for
a device that never had it enabled, triggering a WARNING in
intel_iommu_disable_iopf():

 WARNING: drivers/iommu/intel/iommu.c:3475 at intel_iommu_disable_iopf+0x4f/0x90d
 Call Trace:
  <TASK>
  blocking_domain_set_dev_pasid+0x50/0x70
  iommu_detach_device_pasid+0x89/0xc0
  iommu_sva_unbind_device+0x73/0x150
  xe_vm_close_and_put+0x4d2/0x1200 [xe]

Fix this by bypassing IOPF operations for SVA domains on non-PRI hardware
in both the bind and unbind paths.

Fixes: 39c20c4e83b9 ("iommu/vt-d: Only handle IOPF for SVA when PRI is supported")
Cc: stable@vger.kernel.org
Reported-by: Nareshkumar Gollakoti <naresh.kumar.g@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20260519052917.3729796-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
3 weeks agoUSB: serial: option: add usb-id for Dell Wireless DW5826e-m
Jack Wu [Thu, 4 Jun 2026 02:04:40 +0000 (10:04 +0800)] 
USB: serial: option: add usb-id for Dell Wireless DW5826e-m

Add support for Dell DW5826e-m with USB-id 0x413c:0x81ea

T:  Bus=03 Lev=01 Prnt=01 Port=04 Cnt=01 Dev#=  8 Spd=480  MxCh= 0
D:  Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=413c ProdID=81ea Rev= 5.04
S:  Manufacturer=DELL
S:  Product=DW5826e-m Qualcomm Snapdragon X12 Global LTE-A
S:  SerialNumber=358988870177734
C:* #Ifs= 7 Cfg#= 1 Atr=a0 MxPwr=500mA
A:  FirstIf#=12 IfCount= 2 Cls=02(comm.) Sub=0e Prot=00
I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=usbfs
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=60 Driver=option
E:  Ad=84(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
E:  Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E:  Ad=86(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
E:  Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 4 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
E:  Ad=87(I) Atr=03(Int.) MxPS=  64 Ivl=32ms
I:* If#=12 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=0e Prot=00 Driver=cdc_mbim
E:  Ad=88(I) Atr=03(Int.) MxPS=  64 Ivl=32ms
I:  If#=13 Alt= 0 #EPs= 0 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim
I:* If#=13 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim
E:  Ad=8e(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=0f(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms

Signed-off-by: Jack Wu <jackbb_wu@compal.com>
Reviewed-by: Lars Melin <larsm17@gmail>
Cc: stable@vger.kernel.org
[ johan: reserve also interface 4 ]
Signed-off-by: Johan Hovold <johan@kernel.org>
3 weeks agodmaengine: tegra: Add Tegra264 support
Akhil R [Tue, 31 Mar 2026 10:23:02 +0000 (15:53 +0530)] 
dmaengine: tegra: Add Tegra264 support

Add compatible and chip data to support GPCDMA in Tegra264, which has
differences in register layout and address bits compared to previous
versions.

Signed-off-by: Akhil R <akhilrajeev@nvidia.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20260331102303.33181-10-akhilrajeev@nvidia.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
3 weeks agodmaengine: tegra: Use iommu-map for stream ID
Akhil R [Tue, 31 Mar 2026 10:23:01 +0000 (15:53 +0530)] 
dmaengine: tegra: Use iommu-map for stream ID

Use 'iommu-map', when provided, to get the stream ID to be programmed
for each channel. Iterate over the channels registered and configure
each channel device separately using of_dma_configure_id() to allow
it to use a separate IOMMU domain for the transfer. However, do this
in a second loop since the first loop populates the DMA device channels
list and async_device_register() registers the channels. Both are
prerequisites for using the channel device in the next loop.

Channels will continue to use the same global stream ID if the
'iommu-map' property is not present in the device tree.

Signed-off-by: Akhil R <akhilrajeev@nvidia.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20260331102303.33181-9-akhilrajeev@nvidia.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
3 weeks agodmaengine: tegra: Use managed DMA controller registration
Akhil R [Tue, 31 Mar 2026 10:23:00 +0000 (15:53 +0530)] 
dmaengine: tegra: Use managed DMA controller registration

Switch to managed registration in probe. This simplifies the error
paths in the probe and also removes the requirement of the driver
remove function.

Signed-off-by: Akhil R <akhilrajeev@nvidia.com>
Suggested-by: Frank Li <frank.li@nxp.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20260331102303.33181-8-akhilrajeev@nvidia.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
3 weeks agodmaengine: tegra: Support address width > 39 bits
Akhil R [Tue, 31 Mar 2026 10:22:59 +0000 (15:52 +0530)] 
dmaengine: tegra: Support address width > 39 bits

Tegra264 supports address width of 41 bits. Unlike older SoCs which use
a common high_addr register for upper address bits, Tegra264 has separate
src_high and dst_high registers to accommodate this wider address space.

Add an addr_bits property to the device data structure to specify the
number of address bits supported on each device and use that to program
the appropriate registers.

Update the sg_req struct to remove the high_addr field and use
dma_addr_t for src and dst to store the complete addresses. Extract
the high address bits only when programming the registers.

Signed-off-by: Akhil R <akhilrajeev@nvidia.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20260331102303.33181-7-akhilrajeev@nvidia.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
3 weeks agodmaengine: tegra: Use struct for register offsets
Akhil R [Tue, 31 Mar 2026 10:22:58 +0000 (15:52 +0530)] 
dmaengine: tegra: Use struct for register offsets

Repurpose the struct tegra_dma_channel_regs to define offsets for all the
channel registers. Previously, the struct only held the register values
for each transfer and was wrapped within tegra_dma_sg_req. Move the
values directly into tegra_dma_sg_req and use channel_regs for
storing the register offsets. Update all register reads/writes to use
the struct channel_regs. This prepares for the register offset change
in Tegra264.

Signed-off-by: Akhil R <akhilrajeev@nvidia.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20260331102303.33181-6-akhilrajeev@nvidia.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
3 weeks agodmaengine: tegra: Make reset control optional
Akhil R [Tue, 31 Mar 2026 10:22:57 +0000 (15:52 +0530)] 
dmaengine: tegra: Make reset control optional

On Tegra264, reset is not available for the driver to control as
this is handled by the boot firmware. Hence make the reset control
optional and update the error message to reflect the correct error.

Signed-off-by: Akhil R <akhilrajeev@nvidia.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20260331102303.33181-5-akhilrajeev@nvidia.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
3 weeks agodt-bindings: dma: nvidia,tegra186-gpc-dma: Add iommu-map property
Akhil R [Tue, 31 Mar 2026 10:22:56 +0000 (15:52 +0530)] 
dt-bindings: dma: nvidia,tegra186-gpc-dma: Add iommu-map property

Add iommu-map property to specify separate stream IDs for each DMA
channel. This enables each channel to be in its own IOMMU domain,
keeping memory isolated from other devices sharing the same DMA
controller.

Define the constraints such that if the channel and stream IDs are
contiguous, a single entry can map all the channels, but if the
channels or stream IDs are non-contiguous support multiple entries.

Signed-off-by: Akhil R <akhilrajeev@nvidia.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Acked-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20260331102303.33181-4-akhilrajeev@nvidia.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>