Jason Gunthorpe [Tue, 12 May 2026 16:46:17 +0000 (13:46 -0300)]
iommupt: Fix the end_index calculation in __map_range_leaf()
Sashiko noticed a mismatch of units in this math: num_leaves is
actually the number of leaf *entries* (so a 16-item contiguous leaf
is one num_leaves), while index is in items. The mismatch in maths
causes __map_range_leaf() to exit early instead of efficiently
filling a larger range of contiguous PTEs.
The early exit is caught by the functions above and then
__map_range_leaf() is re-invoked, so there is no functional issue.
Correct the misuse of units by adjusting num_leaves with the leaf
size and avoid the performance cost of looping externally.
There are also some mismatched types for num_leaves; simplify
things to remove the duplicated calculations.
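The unit mismatch described above can be illustrated with a small userspace sketch. This is not the iommupt code: the function names and the items_per_leaf parameter are hypothetical, and the sketch only assumes that index counts items while num_leaves counts leaf entries.

```c
#include <assert.h>

/*
 * Hypothetical illustration, not the iommupt implementation.
 * 'index' is in items; 'num_leaves' is in leaf entries, each of which
 * covers 'items_per_leaf' items (16 for a 16-item contiguous leaf).
 */
static unsigned int end_index_mixed_units(unsigned int index,
					  unsigned int num_leaves)
{
	/* Adds a count of entries to an index in items: stops too early. */
	return index + num_leaves;
}

static unsigned int end_index_same_units(unsigned int index,
					 unsigned int num_leaves,
					 unsigned int items_per_leaf)
{
	assert(items_per_leaf > 0);
	/* Convert entries to items before adding. */
	return index + num_leaves * items_per_leaf;
}
```

With two 16-item leaves starting at index 0, the mixed-unit version stops at item 2, while the converted version covers all 32 items in one pass.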
Jason Gunthorpe [Tue, 12 May 2026 16:46:16 +0000 (13:46 -0300)]
iommupt: Check for missing PAGE_SIZE in the pgsize_bitmap
Sashiko pointed out that the driver could drop PAGE_SIZE from the
pgsize_bitmap. That is technically allowed but nothing does it, and
such an iommu_domain would not be used with the DMA API today.
Still, it is against the design and it is trivial to fix up. Lift
the PT_WARN_ON to the if branch and just skip the fast path.
Jason Gunthorpe [Tue, 12 May 2026 16:46:15 +0000 (13:46 -0300)]
iommu: Handle unmap error when iommu_debug is enabled
Sashiko noticed a latent bug where the map error flow called iommu_unmap(),
which calls iommu_debug_unmap_begin()/iommu_debug_unmap_end(). However,
since this is an error path, the map flow never actually established the
original iommu_debug_map(), so the debug tracking will malfunction.
Lift the unmap error handling into iommu_map_nosync() and reorder it so
the trace_map()/iommu_debug_map() records the partial mapping and then
immediately unmaps it. This avoids creating the unbalanced tracking and
provides saner tracing instead of an unmap unmatched to any map.
Jason Gunthorpe [Tue, 12 May 2026 16:46:14 +0000 (13:46 -0300)]
iommu: Fix up map/unmap debugging for iommupt domains
Sashiko noticed a few issues in this path, and a few more were
found on review. Tidy them up further. These are intertwined
because the debug code depends on some of the WARN_ONs to function
right:
Lift into iommu_map_nosync():
- The might_sleep_if()
- 0 pgsize_bitmap WARN_ON
- Promote the illegal domain->type to a WARN_ON
- WARN_ON for illegal gfp flags
Then remove the return 0 since it is now safe to call
iommu_debug_map().
Lift into __iommu_unmap():
- 0 pgsize_bitmap WARN_ON
- Promote the illegal domain->type to a WARN_ON
- iommu_debug_unmap_begin()
This now pairs with the unconditional iommu_debug_map() on the
mapping side. Thus iommu debugging now works for iommupt along
with some of the other debugging features.
Jason Gunthorpe [Tue, 12 May 2026 16:46:13 +0000 (13:46 -0300)]
iommu: Fix loss of errno on map failure for classic ops
A typo, likely from a rebase, inverted the condition and caused
errors to be lost. Fix it to be "if (ret)".
This was breaking iommu_create_device_direct_mappings() on drivers
that don't use iommupt and don't fully set up their domain in
alloc_pages() (i.e., SMMUv2). In this case the first call of
iommu_create_device_direct_mappings() should fail due to the
incompletely initialized domain. Since it wrongly returns success,
the second call to iommu_create_device_direct_mappings() doesn't
happen and IOMMU_RESV_DIRECT is never set up.
Guixin Liu [Fri, 24 Apr 2026 01:39:23 +0000 (09:39 +0800)]
scsi: target: tcm_loop: Fix NULL ptr dereference
The TCM_LOOP LUN creation process calls device_register() to create the
device, which in turn invokes tcm_loop_driver_probe() registered with
the TCM_LOOP bus to create and register the scsi_host. However, if the
scsi_host memory allocation fails or scsi_add_host() fails, the
device_register() process still returns success. Subsequently, when the
user binds the LUN to a specific backend device, it accesses the NULL or
freed scsi_host.
scsi: ufs: ufs-qcom: Enable Auto Hibern8 clock request support
On platforms that support Auto Hibern8 (AH8), the UFS controller can
autonomously de-assert clk_req signals to the Global Clock Controller
when entering the Hibern8 state. This allows Global Clock Controller
(GCC) to gate unused clocks, improving power efficiency.
Enable the Clock Request feature by setting the UFS_HW_CLK_CTRL_EN bit
in the UFS_AH8_CFG register, as recommended in the Hardware Programming
Guidelines.
scsi: ufs: core: Configure only active lanes during link
The number of connected lanes detected during UFS link startup can be
fewer than the lanes specified in the device tree. The current driver
logic attempts to configure all lanes defined in the device tree,
regardless of their actual availability. This mismatch may cause
failures during power mode changes.
Hence, add a check during link startup to ensure that only the lanes
actually discovered are considered valid. If a mismatch is detected,
fail the initialization early, preventing the driver from entering an
unsupported configuration that could cause power mode transition
failures.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Palash Kambar <palash.kambar@oss.qualcomm.com>
Link: https://patch.msgid.link/20260423102023.3779489-2-palash.kambar@oss.qualcomm.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Core Changes:
- Bugfixes and cleanups to pagemap, dp/mst.
- Add lockdep annotations to gpu buddy manager.
- Updates to drm/dp for PR + VRR.
- Improve documentation's table of contents.
- Bump fpfn and lpfn in ttm to 64-bits.
Driver Changes:
- Assorted bugfixes, cleanups and updates to panthor, nouveau, qaic,
hisilicon.
- Add support for CMN N116BCN-EA1, CMN N140HCA-EEK, IVO M140NWFQ R5, IVO
R140NWFW R0, BOE NT140*, BOE NV133FHM-N4F, AUO B140*, AUO B133HAN06.6 and AUO B116XTN02.3 eDP panels.
- More implementation of AIE4 in amdxdna.
- Update panels to use refcounts instead of devm_kzalloc to make
drm_panel_init static.
- Add support for the RCade Display Adapter to gud.
scsi: isci: Fix use-after-free in device removal path
The ISCI completion tasklet is initialized in isci_host_alloc()
(drivers/scsi/isci/init.c:496) and scheduled from both MSI-X and legacy
interrupt handlers (drivers/scsi/isci/host.c:223,613).
isci_host_deinit() stops the controller and waits for stop completion,
but it never kills completion_tasklet before teardown continues. A
top-of-function tasklet_kill() is not sufficient here: interrupts are
only disabled when isci_host_stop_complete() runs, so until
wait_for_stop() returns the IRQ handlers can still requeue the
tasklet. The tasklet callback also re-enables interrupts after draining
completions, so killing the tasklet before the source is quiesced leaves
the same race open.
Once wait_for_stop() returns, no further IRQ-driven scheduling can
occur. Kill completion_tasklet there so teardown cannot race a queued
tasklet running on a dead ihost. On remove or unload, the stale callback
can otherwise dereference ihost and touch ihost->smu_registers after the
host lifetime ends.
A UML + KASAN analogue reproduced the failure class both with no
tasklet_kill() and with tasklet_kill() placed before source quiesce, and
stayed clean once the kill happened after quiescing the scheduling
source.
This mirrors commit f6ab594672d4 ("scsi: aic94xx: fix use-after-free in
device removal path"), but ISCI needs the kill after wait_for_stop().
Fixes: 6f231dda6808 ("isci: Intel(R) C600 Series Chipset Storage Control Unit Driver")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-7
Assisted-by: Codex:gpt-5-4
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Link: https://patch.msgid.link/20260419210420.2134639-1-michael.bommarito@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
scsi: pm8001: Reject non-fatal dump when controller is crashed
pm80xx_get_non_fatal_dump() can be called even after the controller has
entered a fatal error state. In that case the forensic memory contents
are not safe to access for a non-fatal dump request, and attempting to
do so can trigger a call trace.
Check controller_fatal_error before reading the non-fatal dump buffer
and return -EINVAL when the controller is already in a crashed state.
This prevents non-fatal dump collection from running in an invalid
controller state.
scsi: pm8001: Reject firmware update in fatal error state
pm8001_store_update_fw() allows a firmware update request even when the
controller has already entered a fatal error state.
Firmware update is not valid once the controller is in that state, and
attempting it can lead to a call trace. Reject the request early by
checking controller_fatal_error, set the firmware status to
FAIL_PARAMETERS, and return -EINVAL.
During SAS phy up, link->status is set to DL_STATE_AVAILABLE in
device_links_driver_bound(). This setting then influences
__device_links_no_driver() before driver rmmod and causes a WARNING.
Add the slave_destroy interface to make sure the link is removed after the
workqueue has been flushed.
scsi: mvsas: Don't emit __LINE__ in debug messages
__LINE__ changes quite easily for cleanup commits. So when checking if a
cleanup patch introduces changes to the resulting binary, each usage of
__LINE__ is a source of annoyance.
So instead of __FILE__ and __LINE__, emit __func__ to give at least some
more indication about where the message originates from than __FILE__
alone; with that and the actual message the situation should be clear
enough.
While at it reduce duplication by implementing mv_dprintk() using
mv_printk().
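A minimal userspace sketch of the idea follows. It is not the mvsas code: the macro bodies, the log_buf capture buffer, the MV_DEBUG switch and probe_phy() are assumptions made for illustration, and the ##__VA_ARGS__ form relies on the GNU C extension that kernel code commonly uses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical capture buffer so the output can be inspected. */
static char log_buf[128];

/* Prefix each message with the calling function instead of file and line. */
#define mv_printk(fmt, ...) \
	snprintf(log_buf, sizeof(log_buf), "%s: " fmt, __func__, ##__VA_ARGS__)

/* The debug variant reuses mv_printk() rather than duplicating it. */
#define MV_DEBUG 1
#if MV_DEBUG
#define mv_dprintk(fmt, ...) mv_printk(fmt, ##__VA_ARGS__)
#else
#define mv_dprintk(fmt, ...) do { } while (0)
#endif

/* Hypothetical caller used only to show the resulting prefix. */
static void probe_phy(int phy_id)
{
	mv_dprintk("phy %d is up\n", phy_id);
}
```

Because __func__ does not change when unrelated lines are added or removed, a cleanup patch elsewhere in the file leaves the emitted strings untouched.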
Mike Christie [Mon, 11 May 2026 17:53:17 +0000 (12:53 -0500)]
scsi: sd: Fix return code handling in sd_spinup_disk()
As found by smatch-ci, scsi_execute_cmd() can return negative or positive
values, so we should use an int instead of an unsigned int.
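The effect of the wrong type can be shown with a small sketch. fake_execute_cmd() and the two helpers are hypothetical stand-ins, not the sd driver code; the sketch only assumes a call that may return a negative errno, zero, or a positive status.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in: returns a negative errno, 0, or a positive status. */
static int fake_execute_cmd(int outcome)
{
	return outcome;
}

static bool looks_like_positive_status_unsigned(int outcome)
{
	/* A negative errno wraps to a large positive value here. */
	unsigned int res = (unsigned int)fake_execute_cmd(outcome);

	return res > 0;
}

static bool looks_like_positive_status_signed(int outcome)
{
	int res = fake_execute_cmd(outcome);

	return res > 0;
}
```

With an unsigned variable, a return of -5 is indistinguishable from a positive status; with an int the sign is preserved and the two cases can be handled separately.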
Fixes: b4d0c33a32c3 ("scsi: sd: Fix sshdr use in sd_spinup_disk")
Reported-by: Dan Carpenter <error27@gmail.com>
Closes: https://lore.kernel.org/linux-scsi/agFbI7E6JQwd3wGW@stanley.mountain/T/#u
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20260511175317.114007-1-michael.christie@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Wang Yan [Mon, 11 May 2026 09:30:30 +0000 (17:30 +0800)]
scsi: libiscsi: Fix spelling and format errors
Fix two issues in libiscsi.c:
- Correct typo "numer" to "number" in iscsi_session_setup() comment
- Fix format string "seconds\n." to "seconds.\n" in recv timeout
warning
Signed-off-by: Wang Yan <wangyan01@kylinos.cn>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Link: https://patch.msgid.link/20260511093030.63542-1-wangyan01@kylinos.cn
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
scsi: scsi_transport_srp: Move long delayed work to system_dfl_long_wq
Currently the code enqueues work items using {queue|mod}_delayed_work()
on system_long_wq. This workqueue should be used when long works are
expected, and it is a per-cpu workqueue.
The function(s) end up calling __queue_delayed_work(), which sets a
global timer that could fire anywhere, enqueuing the work where the
timer fired.
Unbound works could benefit from scheduler task placement, to optimize
performance and power consumption. Long work shouldn't stick to a single
CPU.
Recently, a new unbound workqueue specific for long running work has
been added:
c116737e972e ("workqueue: Add system_dfl_long_wq for long unbound works")
Since the workqueue work doesn't rely on per-cpu variables, there is no
obvious reason that justifies the use of a per-cpu workqueue. So replace
system_long_wq with system_dfl_long_wq so that the work may benefit from
scheduler task placement.
Simon Trimmer [Thu, 14 May 2026 15:18:54 +0000 (16:18 +0100)]
ASoC: cs35l56: Log SoundWire status updates only on changes
The SoundWire slave update_status() callback can be invoked when the
status has not changed. To prevent large amounts of log noise with debug
enabled, log them only when the status changes. This also helps with
understanding them, because they now log an actual change in state.
Ingyu Jang [Thu, 14 May 2026 18:52:15 +0000 (03:52 +0900)]
ASoC: ti: omap-dmic: Fix IS_ERR() vs NULL check bug in omap_dmic_select_fclk()
clk_get_parent() returns NULL when the clock has no parent (or when the
input clk is NULL); it never returns an ERR_PTR. The current IS_ERR(mux)
check therefore never triggers - a NULL return falls through silently
to clk_set_parent(NULL, parent_clk), which simply fails with -EINVAL.
Use a NULL check so the dedicated error path runs and the prior
clk_get() reference is released via clk_put().
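The difference between the two checks can be sketched in userspace. is_err() below mimics the usual ERR_PTR range test, and fake_get_parent() is a hypothetical stand-in for a getter that returns NULL on failure; neither is the clk or omap-dmic code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Mimics the ERR_PTR range test: true only for the top MAX_ERRNO addresses. */
static bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical getter that, like the one described, reports failure as NULL. */
static const void *fake_get_parent(bool has_parent)
{
	static const int parent;

	return has_parent ? &parent : NULL;
}

/* The old check: never fires for a NULL return. */
static bool error_path_taken_is_err(const void *mux)
{
	return is_err(mux);
}

/* The corrected check: fires exactly when there is no parent. */
static bool error_path_taken_null(const void *mux)
{
	return mux == NULL;
}
```

Since NULL is outside the error-pointer range, only the NULL check reaches the cleanup path when no parent exists.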
ifb_dev_init() allocates dp->tx_private to dev->num_tx_queues
entries via kzalloc_objs(*txp, dev->num_tx_queues). Both IFB
per-queue RX and TX stats live in those entries: ifb_xmit() updates
txp->rx_stats using the skb queue mapping, ifb_ri_tasklet() updates
txp->tx_stats, and ifb_stats64() aggregates both over
dev->num_tx_queues.
The ethtool stats callbacks instead size and walk the per-queue
stats with dev->real_num_rx_queues and dev->real_num_tx_queues. With
an asymmetric device where the RX queue count exceeds the TX queue
count, for example:
ip link add name ifb10 numtxqueues 1 numrxqueues 8 type ifb
ethtool -S ifb10
ifb_get_ethtool_stats() indexes past the tx_private allocation and
copies adjacent slab data through ETHTOOL_GSTATS.
Use dev->num_tx_queues consistently for the stats strings, the
stats count, and the stats data walks. This reports one RX stats
group and one TX stats group for each backing ifb_q_private entry,
which is the queue set IFB can actually populate.
Reproduced under UML+KASAN at v7.1-rc2:
BUG: KASAN: slab-out-of-bounds in ifb_fill_stats_data+0x3c/0xae
Read of size 8 at addr 0000000062dbd228 by task ethtool/36
ifb_fill_stats_data+0x3c/0xae
ifb_get_ethtool_stats+0xc0/0x129
__dev_ethtool+0x1ca5/0x363c
dev_ethtool+0x123/0x1b3
dev_ioctl+0x56c/0x744
sock_do_ioctl+0x15f/0x1b2
sock_ioctl+0x4d5/0x50a
sys_ioctl+0xd8b/0xde9
With the patch applied, the same UML+KASAN repro is silent and
ethtool -S ifb10 reports only the stats backed by the single
allocated tx_private entry.
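The sizing rule can be sketched with a simplified model. The struct and function names below are hypothetical and much smaller than the real IFB structures; the sketch assumes only what the text states, namely that the per-queue array is allocated with num_tx_queues entries and that every walk over it must use the same bound.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified per-queue state. */
struct fake_q_private {
	unsigned long rx_packets;
	unsigned long tx_packets;
};

struct fake_dev {
	unsigned int num_tx_queues;      /* size of the tx_private allocation */
	unsigned int real_num_rx_queues; /* may be larger than num_tx_queues */
	struct fake_q_private *tx_private;
};

static int fake_dev_init(struct fake_dev *dev, unsigned int ntx,
			 unsigned int nrx)
{
	dev->num_tx_queues = ntx;
	dev->real_num_rx_queues = nrx;
	dev->tx_private = calloc(ntx, sizeof(*dev->tx_private));
	return dev->tx_private ? 0 : -1;
}

/* Bound every stats walk by the allocation size, not by the RX queue count. */
static unsigned int stats_walk_len(const struct fake_dev *dev)
{
	return dev->num_tx_queues;
}

static unsigned long sum_rx_packets(const struct fake_dev *dev)
{
	unsigned long sum = 0;

	for (unsigned int i = 0; i < stats_walk_len(dev); i++)
		sum += dev->tx_private[i].rx_packets;
	return sum;
}
```

For a device created with one TX queue and eight RX queues, the walk visits the single allocated entry instead of reading seven entries past it.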
scsi: ufs: ufshcd-pci: Use PCI_VDEVICE and named initializers for pci array
The pci_device_id array uses a mixture of ways to initialize
ufshcd_pci_tbl[]. List initializers are hard to read unless you have
memorized the order of the struct members. Use PCI_VDEVICE for all entries
and a named initializer for .driver_data.
This allows assigning the members idiomatically without using zeros to
fill the fields before .driver_data (either explicitly or hidden in
PCI_VDEVICE()).
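The readability difference can be sketched with mock definitions. The struct, FAKE_VDEVICE(), the vendor ID and the table contents below are all hypothetical; they only imitate the shape of a PCI ID table and are not the kernel's definitions.

```c
#include <assert.h>

/* Hypothetical mock of a PCI ID entry. */
struct fake_pci_device_id {
	unsigned int vendor, device;
	unsigned int subvendor, subdevice;
	unsigned int class, class_mask;
	unsigned long driver_data;
};

#define FAKE_ANY_ID		(~0u)
#define FAKE_VENDOR_ID_ACME	0x1234u

/* Mock helper that names the vendor/device fields it fills in. */
#define FAKE_VDEVICE(vend, dev) \
	.vendor = FAKE_VENDOR_ID_##vend, .device = (dev), \
	.subvendor = FAKE_ANY_ID, .subdevice = FAKE_ANY_ID

/* List initializer: the reader must know the member order. */
static const struct fake_pci_device_id tbl_positional[] = {
	{ 0x1234u, 0x0001u, FAKE_ANY_ID, FAKE_ANY_ID, 0, 0, 42 },
	{ 0 }
};

/* Named initializers: unset members are zeroed implicitly. */
static const struct fake_pci_device_id tbl_named[] = {
	{ FAKE_VDEVICE(ACME, 0x0001u), .driver_data = 42 },
	{ 0 }
};
```

Both tables describe the same entry; the named form simply omits the filler zeros for class and class_mask.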
There are no changes to the compiled result of the array; verified with
builds for x86 and arm64.
scsi: ufs: tc-dwc-g210-pci: Simplify initialization of pci_device_id array
A list initializer is hard to parse for a human if they don't see or
know the order of the members of struct pci_device_id. So use the
PCI_VDEVICE macro which is much more idiomatic and skip assigning
explicit zeros.
There are no changes to the compiled result of the array; verified with
builds for x86 and arm64.
Or Har-Toov [Wed, 13 May 2026 06:36:40 +0000 (09:36 +0300)]
net/mlx5: Skip disabled vports when setting max TX speed
When setting vports max TX speed during LAG activation or bond state
changes, the code iterates over all eswitch vports. However, some
vports may not be enabled yet.
Skip vports that are not enabled to avoid sending FW commands for
uninitialized vports. Save the LAG aggregated speed in the vport
struct so it can be applied when the vport is enabled later.
Fixes: 50f1d188c580 ("net/mlx5: Propagate LAG effective max_tx_speed to vports")
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260513063640.334132-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jeroen Massar [Wed, 13 May 2026 06:33:02 +0000 (09:33 +0300)]
net/mlx5: Do not restore destination-less TC rules
After IPsec policy/state TX rules are added, any TC flow rule, which
forwards packets to uplink, is modified to forward to IPsec TX tables.
As these tables are destroyed dynamically, whenever there is no
reference to them, the destinations of this kind of rules must be
restored to uplink, unless there is no destination for that rule.
The flow rules FLOW_ACTION_ACCEPT, DROP, TRAP, GOTO and SAMPLE do not
have a destination port, and thus out_count = 0.
At cleanup time of the rules in mlx5_esw_ipsec_modify_flow_dests
we call mlx5_eswitch_restore_ipsec_rule but as the above types
do not have a destination we get an underflow of out_count, as
the port is passed, which is esw_attr->out_count - 1.
This change avoids calling mlx5_eswitch_restore_ipsec_rule when
there are no output destinations and thus avoids the underflow.
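The guard can be sketched as follows. The helper names and the use of a plain int for out_count are assumptions for illustration and are not taken from the mlx5 code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helpers. 'out_count' is the number of output destinations
 * of a rule; rules such as accept or drop have none.
 */
static bool should_restore_dest(int out_count)
{
	/* Nothing to restore when the rule has no destination. */
	return out_count > 0;
}

static int last_dest_index(int out_count)
{
	/* Only meaningful when at least one destination exists. */
	assert(out_count > 0);
	return out_count - 1;
}
```

Checking should_restore_dest() first means out_count - 1 is only computed for rules that actually have a destination.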
Fixes: d1569537a837 ("net/mlx5e: Modify and restore TC rules for IPSec TX rules")
Signed-off-by: Jeroen Massar <jmassar@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260513063302.333761-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Wed, 13 May 2026 06:27:37 +0000 (09:27 +0300)]
net/mlx5e: Don't leak RSS context in case of error
If mlx5e_rx_res_rss_set_rxfh() fails during mlx5e_create_rxfh_context(),
the RSS context is not cleaned up.
This leaves a stale entry in 'res->rss[rss_idx]' that occupies a context
slot.
Destroy the RSS context before returning the error.
Fixes: 6c2509d44636 ("net/mlx5e: Add error flow for ethtool -X command")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260513062737.333259-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Andy Shevchenko [Wed, 13 May 2026 16:26:12 +0000 (18:26 +0200)]
ASoC: cs35l56: Drop malformed default N from Kconfig
First of all, it has to be 'default n' (small letter n), otherwise
it looks for CONFIG_N which is absent and in case of appearance
will enable something unrelated. Second and most important is that
'n' *is* the default 'default' already. Hence just drop the malformed
line.
Chuck Lever [Wed, 13 May 2026 12:58:25 +0000 (08:58 -0400)]
tls: Preserve sk_err across recvmsg() when data has been copied
The sk_err check in tls_rx_rec_wait() consumes the error via
sock_error(), which clears sk_err atomically. When the caller
(tls_sw_recvmsg, tls_sw_splice_read, or tls_sw_read_sock) already
has bytes copied to userspace, it returns those bytes and discards
the error from this call. sk_err is now zero on the socket, so the
next read syscall observes only RCV_SHUTDOWN and reports a clean
EOF instead of the actual error (typically -ECONNRESET).
The race is reachable when tls_read_flush_backlog()'s periodic
sk_flush_backlog() triggers tcp_reset() in the middle of a
multi-record read.
Pass a has_copied flag to tls_rx_rec_wait(). When has_copied is
false, consume sk_err via sock_error() as before. When has_copied
is true, report the error from READ_ONCE() but leave sk_err set:
the caller returns the byte count and discards the err from this
call, and the next read syscall surfaces the preserved sk_err. This
mirrors the tcp_recvmsg() preserve-and-surface pattern.
The decrypt-abort path is unaffected: tls_err_abort() raises
sk_err to EBADMSG after tls_rx_rec_wait() returns, and nothing
on the caller's return path consumes it, so the EBADMSG surfaces
on the next read.
tls_sw_splice_read() passes has_copied=false: it processes
one record per call, so no bytes have been copied within the
function when tls_rx_rec_wait() runs. A reset that arrives
between iterations of splice_direct_to_actor() (the sendfile()
path) is still consumed by sock_error() in the later call, and the
outer loop returns the prior iterations' byte count and drops the
error. tcp_splice_read() exhibits the same pattern at the iteration
boundary; addressing it belongs at the splice_direct_to_actor()
layer and is out of scope here.
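The consume-versus-preserve behaviour of the has_copied flag can be sketched with a mock socket. struct fake_sock, fake_sock_error() and rec_wait_check_err() are hypothetical simplifications, not the TLS code; locking and atomicity are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

#define FAKE_ECONNRESET 104

/* Hypothetical mock holding only the pending error. */
struct fake_sock {
	int sk_err;
};

/* Mimics the consuming read: return the negated error and clear it. */
static int fake_sock_error(struct fake_sock *sk)
{
	int err = sk->sk_err;

	sk->sk_err = 0;
	return -err;
}

static int rec_wait_check_err(struct fake_sock *sk, bool has_copied)
{
	if (!sk->sk_err)
		return 0;
	if (has_copied)
		return -sk->sk_err;	/* report, but leave it for the next read */
	return fake_sock_error(sk);	/* nothing copied yet: consume as before */
}
```

A first call with has_copied set returns the error while leaving sk_err in place, so a following call with nothing copied still observes and consumes it.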
Fixes: c46b01839f7a ("tls: rx: periodically flush socket backlog")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://patch.msgid.link/20260513125825.205189-1-cel@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dawei Feng [Wed, 13 May 2026 15:13:20 +0000 (23:13 +0800)]
octeontx2-pf: fix double free in rvu_rep_rsrc_init()
rvu_rep_rsrc_init() allocates queue memory before calling
otx2_init_hw_resources(). When hardware resource setup fails,
otx2_init_hw_resources() already unwinds the partially initialized
SQ, CQ, and aura state before returning an error. The representor
error path then calls otx2_free_hw_resources() again and can free
the same resources a second time.
Fix this by splitting the cleanup labels so that a failure from
otx2_init_hw_resources() only releases queue memory. Keep the
otx2_free_hw_resources() call for failures that happen after
hardware resource initialization completed successfully.
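The split-label unwinding can be sketched with a mock context. All names below are hypothetical; the sketch assumes, as described above, that the hardware init helper cleans up after itself on failure, so the caller must only release what it allocated on its own.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical mock of the resources involved. */
struct fake_ctx {
	void *queue_mem;
	int hw_ready;
	int hw_free_calls;
};

/* On failure this helper is assumed to have unwound its own partial state. */
static int fake_init_hw(struct fake_ctx *c, int fail)
{
	if (fail)
		return -1;
	c->hw_ready = 1;
	return 0;
}

static void fake_free_hw(struct fake_ctx *c)
{
	c->hw_free_calls++;
	c->hw_ready = 0;
}

static int fake_rsrc_init(struct fake_ctx *c, int fail_hw, int fail_later)
{
	int err;

	c->queue_mem = malloc(64);
	if (!c->queue_mem)
		return -1;

	err = fake_init_hw(c, fail_hw);
	if (err)
		goto err_free_mem;	/* hw state already unwound by the helper */

	if (fail_later) {
		err = -1;
		goto err_free_hw;	/* hw init succeeded: free it here */
	}
	return 0;

err_free_hw:
	fake_free_hw(c);
err_free_mem:
	free(c->queue_mem);
	c->queue_mem = NULL;
	return err;
}
```

A failure inside the hardware init step skips fake_free_hw() entirely, while a later failure runs both cleanup steps exactly once.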
The bug was first flagged by an experimental analysis tool we are
developing for kernel memory-management bugs while analyzing
v6.13-rc1. The tool is still under development and is not yet publicly
available. Manual inspection confirms that the bug is still
present in v7.1-rc3.
Runtime validation was not performed because reproducing this path
requires OcteonTX2 representor hardware.
This series continues the rework of the KSZ driver initiated by a previous
series (see [1]), following the discussion we had here [2].
The KSZ driver got way too convoluted over time because it uses a common
framework to handle more than 20 switches split in 5 families (see below
table)
The previous series ([1]) replaced the unique dsa_switch_ops struct used
by all the KSZ families with one dsa_switch_ops struct for each family.
These dsa_switch_ops structs still rely on common functions that redirect
the calls to ksz_dev_ops operations which are custom to each switch
family. Many of these ksz_dev_ops callbacks have a direct equivalent in the
struct dsa_switch_ops. This series directly connects the implementations of
these ksz_dev_ops operations to the relevant dsa_switch_ops attribute
to get rid of one unnecessary level of indirection.
====================
Vladimir Oltean [Tue, 12 May 2026 13:06:29 +0000 (15:06 +0200)]
net: dsa: microchip: bypass dev_ops for phylink_get_caps()
ksz_phylink_get_caps() is a bit different from other generic methods.
It has a dev_ops->get_caps() call in the middle of the function, and it
does other stuff before (set some supported_interfaces) and after (set
lpi_interfaces from supported_interfaces).
Whereas the dev_ops->get_caps() methods set mac_capabilities and
(optionally) logically OR the supported_interfaces with that of the PCS.
The idea is that this can be expressed more simply, avoiding an indirect
function call to dev_ops->get_caps(). If we tail-call the common
ksz_phylink_get_caps() from individual phylink_get_caps() methods, we do
reorder the settings, but in an inconsequential way (the transfer from
supported_interfaces to lpi_interfaces still sees a complete list of the
supported_interfaces).
Remove the no longer used get_caps() callback from ksz_dev_ops.
Vladimir Oltean [Tue, 12 May 2026 13:06:28 +0000 (15:06 +0200)]
net: dsa: microchip: bypass dev_ops for mirror operations
Mirror operations are handled through a common function that redirects
the treatment to ksz_dev_ops callbacks. This layer of indirection isn't
needed since we now have a dsa_switch_ops for each switch family.
Remove this indirection layer for KSZ switches, by connecting the
ksz_dev_ops :: mirror_add() and mirror_del() operations directly to
dsa_switch_ops.
Remove the now unused mirror callbacks from ksz_dev_ops.
Vladimir Oltean [Tue, 12 May 2026 13:06:27 +0000 (15:06 +0200)]
net: dsa: microchip: bypass dev_ops for FDB and MDB operations
FDB and MDB operations are handled through a common function that
redirects the treatment to ksz_dev_ops callbacks. This layer of
indirection isn't needed since we now have a dsa_switch_ops for each kind
of switch.
Remove one indirection layer for KSZ switches, by connecting the
ksz_dev_ops :: fdb_dump(), fdb_add(), fdb_del(), mdb_add() and mdb_del()
operations directly to dsa_switch_ops.
Remove the FDB and MDB operations from ksz_dev_ops.
Vladimir Oltean [Tue, 12 May 2026 13:06:26 +0000 (15:06 +0200)]
net: dsa: microchip: bypass dev_ops for VLAN operations
VLAN operations are handled through a common function that redirects the
treatment to ksz_dev_ops callbacks. This level of indirection isn't
needed since we now have a dsa_switch_ops for each kind of switch.
Remove this useless layer of indirection by connecting directly the VLAN
operations to the relevant dsa_switch_ops.
Adapt their prototypes to match dsa_switch_ops expectations.
Remove the now unused VLAN callbacks from ksz_dev_ops.
Vladimir Oltean [Tue, 12 May 2026 13:06:25 +0000 (15:06 +0200)]
net: dsa: microchip: bypass dev_ops for change_mtu() operation
MTU changing is done through a common function that redirects the
treatment to a specific ksz_dev_ops callback. This layer of indirection
isn't needed since we now have a dsa_switch_ops struct for each switch
family.
Remove this indirection layer in MTU changing for KSZ switches, by
directly connecting the ksz_dev_ops :: change_mtu() implementations to
dsa_switch_ops.
Remove the no longer used change_mtu() callback from ksz_dev_ops.
Vladimir Oltean [Tue, 12 May 2026 13:06:24 +0000 (15:06 +0200)]
net: dsa: microchip: bypass dev_ops for FDB ageing operations
dsa_switch_ops :: set_ageing_time() goes through ksz_set_ageing_time(),
further dispatched through ksz_dev_ops :: set_ageing_time(). Only
ksz9477 and lan937x provide an implementation for this, so remove the
(optional) method from ksz8463_switch_ops, ksz87xx_switch_ops,
ksz88xx_switch_ops. Also, hook up ksz9477 and lan937x dsa_switch_ops
directly to their respective implementations.
Every switch family provides a dsa_switch_ops :: port_fast_age()
implementation, which is dispatched through ksz_dev_ops ::
flush_dyn_mac_table(). Remove the dev_ops indirection and connect the
flush_dyn_mac_table() methods directly to their respective dsa_switch_ops.
Jens Axboe [Fri, 15 May 2026 01:14:33 +0000 (19:14 -0600)]
Merge tag 'nvme-7.1-2026-05-14' of git://git.infradead.org/nvme into block-7.1
Pull NVMe fixes from Keith:
"- Fix memory leak on a passthrough integrity mapping failure (Keith)
- Hide secrets behind debug option (Hannes)
- Fix pci use-after-free for host memory buffer (Chia-Lin Kao)
- Fix tcp target use-after-free for data digest (Sagi)
- Revert a mistaken quirk (Alan Cui)
- Fix uevent and controller state race condition (Maurizio)
- Fix apple submission queue re-initialization (Nick Chan)"
* tag 'nvme-7.1-2026-05-14' of git://git.infradead.org/nvme:
nvme-apple: Reset q->sq_tail during queue init
nvme: fix race condition between connected uevent and STARTED_ONCE flag
Revert "nvme: add quirk NVME_QUIRK_IGNORE_DEV_SUBNQN for 144d:a808"
nvmet-tcp: Fix potential UAF when ddgst mismatch
nvme-pci: fix use-after-free in nvme_free_host_mem()
nvmet-auth: Do not print DH-HMAC-CHAP secrets
nvme: fix bio leak on mapping failure
nvme: make prp passthrough usage less scary
Jann Horn [Tue, 12 May 2026 14:02:03 +0000 (16:02 +0200)]
net: block MSG_NO_SHARED_FRAGS in sendmsg()
This change should cause no difference in behavior; it just cleans up some
hazardous code that could have become a problem in the future.
MSG_NO_SHARED_FRAGS is a kernel-internal flag that cancels the effect of
MSG_SPLICE_PAGES, another kernel-internal flag that influences the
data-sharing semantics of SKBs.
Prevent passing this flag in from userspace via sendmsg() by adding it to
MSG_INTERNAL_SENDMSG_FLAGS.
This is not currently an observable problem because MSG_NO_SHARED_FRAGS
only has an effect if kernel code adds MSG_SPLICE_PAGES to it.
The only codepath that adds MSG_SPLICE_PAGES to user-supplied flags from
which MSG_NO_SHARED_FRAGS hasn't been cleared is the path
tcp_bpf_sendmsg -> tcp_bpf_send_verdict -> tcp_bpf_push, and that is not a
problem because tcp_bpf_sendmsg always intentionally sets
MSG_NO_SHARED_FRAGS anyway.
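The masking idea can be sketched as follows. The flag values and the sanitize_user_flags() helper are hypothetical, and the sketch assumes, as the text implies, that flags listed in the internal mask are stripped from whatever userspace passes in.

```c
#include <assert.h>

/* Hypothetical flag values chosen only for this illustration. */
#define FAKE_MSG_DONTWAIT		0x0040u
#define FAKE_MSG_SPLICE_PAGES		0x0100u
#define FAKE_MSG_NO_SHARED_FRAGS	0x0200u

/* Internal-only flags that userspace must not be able to set. */
#define FAKE_MSG_INTERNAL_SENDMSG_FLAGS \
	(FAKE_MSG_SPLICE_PAGES | FAKE_MSG_NO_SHARED_FRAGS)

/* Drop internal flags from user-supplied flags, keep everything else. */
static unsigned int sanitize_user_flags(unsigned int flags)
{
	return flags & ~FAKE_MSG_INTERNAL_SENDMSG_FLAGS;
}
```

Adding a flag to the internal mask is enough to keep it from leaking in through the system call path, without touching each caller.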
William Bowling [Wed, 13 May 2026 04:16:35 +0000 (04:16 +0000)]
net: skbuff: preserve shared-frag marker during coalescing
skb_try_coalesce() can attach paged frags from @from to @to. If @from
has SKBFL_SHARED_FRAG set, the resulting @to skb can contain the same
externally-owned or page-cache-backed frags, but the shared-frag marker
is currently lost.
That breaks the invariant relied on by later in-place writers. In
particular, ESP input checks skb_has_shared_frag() before deciding
whether an uncloned nonlinear skb can skip skb_cow_data(). If TCP
receive coalescing has moved shared frags into an unmarked skb, ESP can
see skb_has_shared_frag() as false and decrypt in place over page-cache
backed frags.
Propagate SKBFL_SHARED_FRAG when skb_try_coalesce() transfers paged
frags. The tailroom copy path does not need the marker because it copies
bytes into @to's linear data rather than transferring frag descriptors.
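The propagation rule can be sketched with a mock buffer type. struct fake_skb, FAKE_SHARED_FRAG and the helpers are hypothetical and model only a frag count and a flag word, not real skb handling.

```c
#include <assert.h>
#include <stdbool.h>

#define FAKE_SHARED_FRAG 0x1u

/* Hypothetical mock: just a flag word and a count of paged frags. */
struct fake_skb {
	unsigned int flags;
	unsigned int nr_frags;
};

/* Move the paged frags from 'from' to 'to' and carry the marker along. */
static void fake_coalesce_frags(struct fake_skb *to, struct fake_skb *from)
{
	to->nr_frags += from->nr_frags;
	from->nr_frags = 0;
	to->flags |= from->flags & FAKE_SHARED_FRAG;
}

static bool fake_has_shared_frag(const struct fake_skb *skb)
{
	return skb->nr_frags > 0 && (skb->flags & FAKE_SHARED_FRAG);
}
```

After coalescing, a later in-place writer that consults fake_has_shared_frag() on the destination sees the marker and can take the copy path.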
Fixes: cef401de7be8 ("net: fix possible wrong checksum generation")
Fixes: f4c50a4034e6 ("xfrm: esp: avoid in-place decrypt on shared skb frags")
Signed-off-by: William Bowling <vakzz@zellic.io>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Tested-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://patch.msgid.link/20260513041635.1289541-1-vakzz@zellic.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matt Fleming [Wed, 13 May 2026 11:22:26 +0000 (12:22 +0100)]
net/mlx5e: Fix use-after-free in mlx5e_tx_reporter_timeout_recover
mlx5e_tx_reporter_timeout_recover() accesses sq->netdev after
mlx5e_safe_reopen_channels() has torn down and freed the channel (and
its embedded SQs). Replace the three sq->netdev references with
priv->netdev which is safe because priv outlives channel teardown.
The netdev_err() call already used priv->netdev for this reason; make
the trylock/unlock and health_channel_eq_recover calls consistent.
This fixes the following KASAN splat:
BUG: KASAN: use-after-free in mlx5e_tx_reporter_timeout_recover+0x1dd/0x360 [mlx5_core]
Read of size 8 at addr ffff889860ed0b28 by task kworker/u113:2/5277
Minxi Hou [Tue, 12 May 2026 07:08:41 +0000 (15:08 +0800)]
selftests: openvswitch: add pop_vlan test
Add test_pop_vlan() to verify OVS kernel datapath pop_vlan action
correctly strips 802.1Q VLAN tags from frames.
Test structure:
- Baseline: untagged forwarding validates basic connectivity.
- Negative: forward without pop_vlan, tagged frame is invisible
to ns2 (no VLAN sub-interface), ping fails.
- Positive: pop_vlan strips tag on forward path, push_vlan
restores tag on return path, ping succeeds.
Use static ARP entries to avoid VLAN-tagged ARP complexity.
Rely on ping success/failure for verification -- no tcpdump or
pcap files needed.
Minxi Hou [Tue, 12 May 2026 07:08:40 +0000 (15:08 +0800)]
selftests: openvswitch: add vlan() and encap() flow string parsing
Add VLAN TCI formatting and parsing support to ovs-dpctl.py:
- Add _vlan_dpstr() to decompose TCI into vid/pcp/cfi fields,
with raw tci=0x%04x fallback when cfi=0 for round-trip safety.
- Add _parse_vlan_from_flowstr() boundary check for missing ')'.
- Add encap_ovskey subclass restricting nla_map to L2-L4 attributes
(slots 0-21) that appear inside 802.1Q ENCAP, with metadata
attributes set to "none".
- Check encap parse() return value for unrecognized trailing content.
- Support callable format functions in dpstr() output.
- Change OVS_KEY_ATTR_VLAN type from uint16 to be16 to match the
kernel __be16 wire format; uint16 decodes in host byte order,
which gives wrong values on little-endian architectures.
- Change OVS_KEY_ATTR_ENCAP type from none to encap_ovskey to
enable recursive parsing of 802.1Q encapsulated flow keys.
- Add push_vlan action class with fields matching kernel struct
ovs_action_push_vlan (vlan_tpid, vlan_tci as network-order u16).
- Add push_vlan dpstr format and parse with range validation
(vid 0-4095, pcp 0-7, tpid 0-0xFFFF) and CFI forced to 1.
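The TCI handling described above can be illustrated with a minimal Python sketch. It is not the ovs-dpctl.py code: the helper names (split_tci, build_tci, format_tci) and the exact output strings are hypothetical, and only the standard 802.1Q bit layout (PCP in bits 15-13, CFI in bit 12, VID in bits 11-0) is assumed. The last lines show why a __be16 field must be decoded in network byte order.

```python
import struct

VLAN_VID_MASK = 0x0FFF   # bits 11-0
VLAN_CFI_BIT = 0x1000    # bit 12
VLAN_PCP_SHIFT = 13      # bits 15-13


def split_tci(tci):
    """Split a host-order 16-bit TCI into (vid, pcp, cfi)."""
    vid = tci & VLAN_VID_MASK
    pcp = (tci >> VLAN_PCP_SHIFT) & 0x7
    cfi = (tci >> 12) & 0x1
    return vid, pcp, cfi


def build_tci(vid, pcp):
    """Build a TCI with range checks; CFI is always set to 1."""
    if not 0 <= vid <= 4095:
        raise ValueError("vid out of range")
    if not 0 <= pcp <= 7:
        raise ValueError("pcp out of range")
    return (pcp << VLAN_PCP_SHIFT) | VLAN_CFI_BIT | vid


def format_tci(tci):
    """Hypothetical formatter: fall back to the raw TCI when cfi=0."""
    vid, pcp, cfi = split_tci(tci)
    if cfi == 0:
        return "tci=0x%04x" % tci
    return "vid=%d,pcp=%d,cfi=%d" % (vid, pcp, cfi)


# The wire format is big-endian (network order).
wire = struct.pack("!H", build_tci(100, 3))
as_be16 = struct.unpack("!H", wire)[0]    # correct on every host
as_le_u16 = struct.unpack("<H", wire)[0]  # what a little-endian uint16 read sees
```

Here as_be16 decodes back to vid=100, pcp=3, while as_le_u16 has its two bytes swapped, which is the kind of wrong value the uint16-to-be16 change avoids.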
Maoyi Xie [Tue, 12 May 2026 14:28:07 +0000 (22:28 +0800)]
rds_tcp: close NULL deref window in rds_tcp_set_callbacks
rds_tcp_set_callbacks() links a new rds_tcp_connection onto
rds_tcp_tc_list under rds_tcp_tc_list_lock. It releases the
lock, then assigns tc->t_sock = sock outside the lock.
rds_tcp_tc_info() and rds6_tcp_tc_info() walk rds_tcp_tc_list
under the same lock. Both dereference tc->t_sock->sk without
a NULL check.
A reader can acquire rds_tcp_tc_list_lock between the writer's
spin_unlock and the t_sock store. It then sees a list entry
whose t_sock is NULL. The dereference of tc->t_sock->sk is a
NULL access.
Move tc->t_sock = sock inside rds_tcp_tc_list_lock, before
list_add_tail. A reader holding the lock then observes the
linkage and the t_sock store together.
The restore path is safe. rds_tcp_restore_callbacks() does
list_del_init inside the lock. The matching tc->t_sock = NULL
after unlink is harmless to readers holding the lock.
Fixes: 70041088e3b9 ("RDS: Add TCP transport to RDS")
Suggested-by: Simon Horman <horms@kernel.org>
Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
Reviewed-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260512142807.1855619-1-maoyi.xie@ntu.edu.sg
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
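The ordering fix in the rds_tcp commit can be modelled in user space. The following is a minimal Python sketch, not the kernel code: Conn, conn_list, list_lock and the dict standing in for a socket are all hypothetical. The point it illustrates is that the field a reader dereferences is stored before the entry becomes visible on the list, and both happen under the same lock the reader takes.

```python
import threading


class Conn:
    """Stand-in for a connection object; sock starts out unset."""

    def __init__(self):
        self.sock = None


conn_list = []                # stand-in for the shared list
list_lock = threading.Lock()  # stand-in for the list lock


def set_callbacks(conn, sock):
    # Store the socket and link the entry in one critical section, so a
    # reader holding the lock never sees a linked entry with sock unset.
    with list_lock:
        conn.sock = sock
        conn_list.append(conn)


def restore_callbacks(conn):
    # Unlink under the lock; clearing sock afterwards cannot be observed
    # by a reader walking the list under the lock.
    with list_lock:
        conn_list.remove(conn)
    conn.sock = None


def tc_info():
    # Reader: dereferences sock for every linked entry without a None check.
    with list_lock:
        return [conn.sock["sk"] for conn in conn_list]
```

With the store placed after the lock is dropped instead, tc_info() could run between the append and the store and fail on conn.sock being None.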
Ian Rogers [Thu, 30 Apr 2026 16:17:25 +0000 (09:17 -0700)]
perf metricgroup: Avoid scanning unnecessary PMUs for identifier match
Only uncore PMUs can have an identifier, so add an optimized
perf_pmus__scan routine for that case to avoid all PMU types being
created.
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Falcon <thomas.falcon@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Tue, 14 Apr 2026 17:58:55 +0000 (10:58 -0700)]
perf pmu-events AMD: Switch l2_itlb_misses to bp_l1_tlb_miss_l2_tlb_miss.all
l2_itlb_misses is a valid legacy cache event name, hence allowing it
in all_events in metric.py. l2_itlb_misses was also a json event for
AMD zen1, zen2 and zen3.
For zen4, zen5 and zen6, the check that metric events exist in the
json skipped l2_itlb_misses because it is a valid legacy event. However,
the PMU driver lacks the event mapping, so it is a bad event when
used in the metric.
Add bp_l1_tlb_miss_l2_tlb_miss.all as the l2 itlb miss event (bp =
branch predictor, the AMD way to say itlb), so that it is used in
preference to l2_itlb_misses when the event exists.
Remove l2_itlb_misses from metric.py as the legacy event isn't used by
any metrics and having it is error prone for newer AMD zen models.
Fixes: e596f329668ec2b5 ("perf jevents: Add itlb metric group for AMD")
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Tue, 12 May 2026 05:41:40 +0000 (22:41 -0700)]
perf record: Refactor ARM64 leaf caller setup out of arch
Code in tools/perf/arch causes portability issues/opaqueness and LTO
issues due to the use of weak symbols. Move the adding of LR to the
sample_user_regs into arm64-frame-pointer-unwind-support.c conditional
on EM_HOST == EM_AARCH64 (false on all non-ARM64 builds).
This also better encapsulates the use of the sampled registers by
get_leaf_frame_caller_aarch64 and the set up by the new
add_leaf_frame_caller_opts_aarch64, exposing opportunities for possibly
sampling PC and SP to help the unwinder.
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shimin Guo <shimin.guo@skydio.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
====================
Support BPF traversal of wakeup sources
This patchset adds requisite kfuncs for BPF programs to safely traverse
wakeup_sources, and puts a config flag around the sysfs interface.
Currently, a traversal of wakeup sources requires going through
/sys/class/wakeup/* or /d/wakeup_sources/*. The repeated syscalls to query
sysfs are inefficient, as there can be hundreds of wakeup_sources, with each
wakeup source also having multiple attributes. debugfs is unstable and
insecure.
Adding kfuncs to lock/unlock wakeup sources allows a BPF program to safely
traverse the wakeup sources list, and a kfunc to get the head of the wakeup
sources list is needed to start traversing the list.
On a quiescent Pixel 6 traversing 150 wakeup_sources, I am seeing ~34x
speedup (sampled 75 times in table below). For a device under load, the
speedup is greater.
+-------+----+----------+----------+
| | n | AVG (ms) | STD (ms) |
+-------+----+----------+----------+
| sysfs | 75 | 44.9 | 12.6 |
+-------+----+----------+----------+
| BPF | 75 | 1.3 | 0.7 |
+-------+----+----------+----------+
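As a quick check of the quoted speedup, the ratio of the two table averages can be computed directly. This short sketch uses only the AVG values from the table; the variable names are illustrative.

```python
# Average traversal time in milliseconds, taken from the table above.
sysfs_avg_ms = 44.9
bpf_avg_ms = 1.3

# Ratio of the means; roughly 34.5, consistent with the "~34x" figure.
speedup = sysfs_avg_ms / bpf_avg_ms
```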
The initial attempt at BPF traversal of wakeup_sources was with BPF
iterators [1]. However, BPF already allows for traversing a simple list
with bpf_for(), and this current patchset has the added benefit of being
~2-3x more performant than BPF iterators.
Changes in v4:
- Removed `.owner = THIS_MODULE` for btf_kfunc_id_set per Greg
- Add a graceful exit in selftest if bpf_wakeup_sources_get_head() is not
present due to kernel configs without CONFIG_PM_SLEEP (e.g. s390)
- Relaxed substr match in wakeup_source_unlock_null() selftest
link: https://lore.kernel.org/all/20260331153413.2469218-1-wusamuel@google.com/
Changes in v3:
- Changed return type of bpf_wakeup_sources_get_head() to `void *` per Alexei
- Added failure test for direct dereference of wakeup source head
- Use bpf_core_cast() instead of macros in BPF program per Kumar
link: https://lore.kernel.org/all/20260326112521.2827500-1-wusamuel@google.com/
Changes in v2:
- Dropped CONFIG_PM_WAKEUP_STATS_SYSFS patch for future patchset
- Added declarations for kfuncs to .h to fix sparse and checkpatch warnings
- Added kfunc to get address of wakeup_source's head
- Added example bpf prog selftest for traversal of wakeup sources per Kumar
- Added *_fail.c selftest per Kumar
- More concise commit message in patch 1/2
link: https://lore.kernel.org/all/20260320160055.4114055-1-wusamuel@google.com/
====================
Samuel Wu [Mon, 11 May 2026 17:45:57 +0000 (10:45 -0700)]
selftests/bpf: Add tests for wakeup_sources kfuncs
Introduce a set of BPF selftests to verify the safety and functionality
of wakeup_source kfuncs.
The suite includes:
1. A functional test (test_wakeup_source.c) that iterates over the
global wakeup_sources list. It uses CO-RE to read timing statistics
and validates them in user-space via the BPF ring buffer.
2. A negative test suite (wakeup_source_fail.c) ensuring the BPF
verifier correctly enforces reference tracking and type safety.
3. Enable CONFIG_PM_WAKELOCKS in the test config, allowing creation of
wakeup sources via /sys/power/wake_lock.
A shared header (wakeup_source.h) is introduced to ensure consistent
memory layout for the Ring Buffer data between BPF and user-space.
Samuel Wu [Mon, 11 May 2026 17:45:56 +0000 (10:45 -0700)]
PM: wakeup: Add kfuncs to traverse over wakeup_sources
Iterating through wakeup sources via sysfs or debugfs can be inefficient
or restricted. Introduce BPF kfuncs to allow high-performance and safe
in-kernel traversal of the wakeup_sources list. There is at least a 30x
speedup for walking 150 wakeup sources and all their attributes.
The new kfuncs include:
- bpf_wakeup_sources_get_head() to obtain the list head.
- bpf_wakeup_sources_read_lock/unlock() to manage the SRCU lock.
For verifier safety, the underlying SRCU index is wrapped in an opaque
'struct bpf_ws_lock' pointer. This enables the use of KF_ACQUIRE and
KF_RELEASE flags, allowing the BPF verifier to strictly enforce paired
lock/unlock cycles and prevent resource leaks.
Signed-off-by: Samuel Wu <wusamuel@google.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20260511174559.659782-2-wusamuel@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Ralf Lici [Wed, 13 May 2026 13:26:10 +0000 (15:26 +0200)]
ovpn: disable BHs when updating device stats
ovpn updates dev->dstats from both process and softirq contexts. In
particular, TCP paths may run from socket callbacks, workqueues or
strparser work, while UDP receive and ovpn's ndo_start_xmit path may
update the same per-device dstats from BH context.
Add ovpn device drop-stat helpers that disable BHs around
dev_dstats_rx_dropped() and dev_dstats_tx_dropped(), and use them for
drop accounting.
The successful RX dev_dstats_rx_add() update is already covered by the
BH-disabled section around gro_cells_receive(). For the successful TCP
TX dev_dstats_tx_add() update, replace the existing preempt-disabled
section with a BH-disabled one.
Namhyung Kim [Sun, 10 May 2026 20:23:46 +0000 (13:23 -0700)]
perf trace: Update beautifier script for clone flags
According to the change in the sched.h, update the script to generate
the flags array like below. Note that '+1' is needed to detect bitmask
pattern at index 0.
Namhyung Kim [Sun, 10 May 2026 20:23:45 +0000 (13:23 -0700)]
perf trace: Add beautifier script for fsmount flags
And move the existing one to fsmount_attr.sh to be more precise.
Now the fsmount_flags[] is generated from the mount.h like below.
The ilog2() + 1 is an existing pattern to handle bit flags.
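The "ilog2() + 1" indexing mentioned in the two beautifier commits above can be modelled as follows. This is a Python sketch under a stated assumption, not the perf code: it assumes table slot 0 is reserved for the value zero, so the name for bit n is stored at index n + 1. The flag names and helper functions are hypothetical.

```python
def ilog2(value):
    """Index of the highest set bit (value must be > 0)."""
    return value.bit_length() - 1


def build_table(flags):
    """Place each single-bit flag name at index ilog2(value) + 1."""
    size = max(ilog2(v) for v in flags.values()) + 2
    table = [None] * size
    for name, value in flags.items():
        table[ilog2(value) + 1] = name
    return table


def decode(table, value):
    """Render a bitmask as NAME|NAME, using hex for unnamed bits."""
    if value == 0:
        # Slot 0 is free to hold a name for "no flags set".
        return table[0] or "0"
    names = []
    for idx in range(1, len(table)):
        bit = 1 << (idx - 1)
        if value & bit:
            names.append(table[idx] or hex(bit))
    # Bits beyond the end of the table are reported as one hex remainder.
    covered = len(table) - 1
    leftover = (value >> covered) << covered
    if leftover:
        names.append(hex(leftover))
    return "|".join(names)


# Hypothetical flags, not taken from sched.h or mount.h.
example_flags = {"FLAG_A": 0x1, "FLAG_B": 0x2, "FLAG_D": 0x8}
example_table = build_table(example_flags)
```

Without the offset, the flag with value 0x1 would land in slot 0 and could not be told apart from an entry describing the value zero.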
Namhyung Kim [Sun, 10 May 2026 20:23:44 +0000 (13:23 -0700)]
perf build: Add make check-headers target
Don't print header differences during the perf build as it's noisy.
Most people won't care and will find it annoying.
The check mainly helps the perf trace beautifiers catch up with new
changes, mostly in UAPIs, so make it a separate build target that can be
run occasionally. Make it and the build-test related targets phony.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Linus Torvalds [Thu, 14 May 2026 21:30:01 +0000 (14:30 -0700)]
Merge tag 'hid-for-linus-2026051401' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:
- fixes for a few OOB/UAF in several HID drivers (Florian Pradines, Lee
Jones, Michael Zaidman, Rosalie Wanders, Sangyun Kim and Tomasz
Pakuła)
- more general sanitization of input data, dealing with potentially
malicious hardware in hid-core (Benjamin Tissoires)
- a few device-specific quirks and fixups
* tag 'hid-for-linus-2026051401' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (22 commits)
HID: logitech-hidpp: Add support for newer Bluetooth keyboards
HID: pidff: Fix integer overflow in pidff_rescale
HID: i2c-hid: add reset quirk for BLTP7853 touchpad
HID: core: introduce hid_safe_input_report()
HID: pass the buffer size to hid_report_raw_event
HID: google: hammer: stop hardware on devres action failure
HID: appletb-kbd: run inactivity autodim from workqueues
HID: appletb-kbd: fix UAF in inactivity-timer cleanup path
HID: playstation: Clamp num_touch_reports
HID: magicmouse: Prevent out-of-bounds (OOB) read during DOUBLE_REPORT_ID
HID: mcp2221: fix OOB write in mcp2221_raw_event()
HID: quirks: really enable the intended work around for appledisplay
HID: hid-sjoy: race between init and usage
HID: uclogic: Fix regression of input name assignment
HID: intel-thc-hid: Intel-quickspi: Fix some error codes
HID: hid-lenovo-go-s: restore OS_TYPE after resume from s2idle
HID: elan: Add support for ELAN SB974D touchpad
HID: sony: add missing size validation for Rock Band 3 Pro instruments
HID: sony: add missing size validation for SMK-Link remotes
HID: sony: remove unneeded WARN_ON() in sony_leds_init()
...
Tao Cui [Thu, 14 May 2026 06:50:33 +0000 (14:50 +0800)]
cgroup/rdma: add rdma.events.local for per-cgroup allocation failure attribution
Add per-cgroup local event counters to track RDMA resource limit
exhaustion from the perspective of individual cgroups. The
rdma.events.local file reports two per-resource counters:
- max: number of times this cgroup's limit was the one that blocked
an allocation in the subtree
- alloc_fail: number of allocation attempts originating from this
cgroup that failed due to an ancestor's limit
This mirrors the design of pids.events.local, where events are
attributed to the cgroup that imposed the limit, not necessarily the
cgroup where the allocation was attempted.
Also extend rdma.events with a hierarchical alloc_fail counter that
tracks allocation failures propagating upward from the requesting
cgroup, complementing the existing max counter, so that rdma.events
and rdma.events.local share the same output format.
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
Signed-off-by: Tejun Heo <tj@kernel.org>
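The attribution rules for rdma.events and rdma.events.local can be sketched with a small Python model. It is an illustration under assumptions, not the cgroup code: the Cgroup class, try_charge() and the single unnamed resource are hypothetical, and the model assumes a failed attempt is counted in the requester's local alloc_fail regardless of which cgroup on the path imposed the limit.

```python
class Cgroup:
    """Toy cgroup tracking one resource with an optional limit."""

    def __init__(self, name, parent=None, limit=None):
        self.name = name
        self.parent = parent
        self.limit = limit  # None means no limit
        self.usage = 0
        self.local = {"max": 0, "alloc_fail": 0}  # events.local
        self.hier = {"max": 0, "alloc_fail": 0}   # events

    def path_to_root(self):
        node = self
        while node is not None:
            yield node
            node = node.parent


def try_charge(requester):
    """Charge one unit from requester up to the root, or record a failure."""
    limiter = next(
        (cg for cg in requester.path_to_root()
         if cg.limit is not None and cg.usage + 1 > cg.limit),
        None,
    )
    if limiter is None:
        for cg in requester.path_to_root():
            cg.usage += 1
        return True
    # "max" is attributed to the cgroup whose limit blocked the allocation
    # and propagates upward from there.
    limiter.local["max"] += 1
    for cg in limiter.path_to_root():
        cg.hier["max"] += 1
    # "alloc_fail" is attributed to the cgroup that made the attempt
    # and propagates upward from there.
    requester.local["alloc_fail"] += 1
    for cg in requester.path_to_root():
        cg.hier["alloc_fail"] += 1
    return False


root = Cgroup("root")
parent = Cgroup("parent", root, limit=1)
child = Cgroup("child", parent)
first = try_charge(child)   # fits within parent's limit
second = try_charge(child)  # blocked by parent's limit
```

After the second attempt, parent records a local max event and child records a local alloc_fail event, while the hierarchical counters show the failure on child, parent and root.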
Tao Cui [Thu, 14 May 2026 06:50:32 +0000 (14:50 +0800)]
cgroup/rdma: add rdma.events to track resource limit exhaustion
Add per-device hierarchical event counters to track when RDMA resource
limits are exceeded. The rdma.events file reports max event counts
propagated upward from the cgroup whose limit was hit to all ancestors.
This mirrors the design of pids.events, where events are attributed to
the cgroup that imposed the limit, not necessarily the cgroup where the
allocation was attempted. Userspace can monitor this file via
poll/epoll for real-time notification of resource exhaustion.
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
Signed-off-by: Tejun Heo <tj@kernel.org>
Tao Cui [Thu, 14 May 2026 06:50:31 +0000 (14:50 +0800)]
cgroup/rdma: add rdma.peak for per-device peak usage tracking
rdma.peak tracks the high watermark of resource usage per device,
giving a better baseline on which to set rdma.max. Polling
rdma.current isn't feasible since it would miss short-lived spikes.
This interface is analogous to memory.peak.
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
Signed-off-by: Tejun Heo <tj@kernel.org>
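The reason a high watermark catches what polling misses can be shown with a minimal Python model. The UsageCounter class is hypothetical and tracks a single resource; it only illustrates the idea of updating a peak on every charge.

```python
class UsageCounter:
    """Toy counter tracking current usage and its high watermark."""

    def __init__(self):
        self.current = 0
        self.peak = 0

    def charge(self, amount=1):
        self.current += amount
        # The peak is updated at charge time, so no spike is missed.
        self.peak = max(self.peak, self.current)

    def uncharge(self, amount=1):
        self.current -= amount


counter = UsageCounter()
counter.charge(2)     # steady-state usage
counter.charge(50)    # short-lived spike...
counter.uncharge(50)  # ...gone before anyone polls
```

A reader sampling counter.current at this point sees 2, while counter.peak still reports 52, which is the more useful baseline for choosing a limit.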
Linus Torvalds [Thu, 14 May 2026 21:06:31 +0000 (14:06 -0700)]
Merge tag 'acpi-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI support fixes from Rafael Wysocki:
"These fix several platform drivers that use the ACPI companion of the
given platform device without checking its presence, which may lead to
a NULL pointer dereference or other kind of malfunction if the driver
is forced to match a device without an ACPI companion via driver
override, and restore debug log level for some messages in the ACPI
CPPC library:
- Check ACPI_COMPANION() against NULL during probe in several core
ACPI device drivers (Rafael Wysocki)
- Restore log level of messages in amd_set_max_freq_ratio() (Mario
Limonciello)"
* tag 'acpi-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: PAD: xen: Check ACPI_COMPANION() against NULL
ACPI: driver: Check ACPI_COMPANION() against NULL during probe
Revert "ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn"
Stephen Smalley [Wed, 13 May 2026 18:05:06 +0000 (14:05 -0400)]
lsm: hold cred_guard_mutex for lsm_set_self_attr()
Hold cred_guard_mutex in lsm_set_self_attr(), just as
proc_pid_attr_write() already does before calling the LSM
hook. This only matters for SELinux and AppArmor, which check
whether the process is being ptraced and, if so, whether to
allow the transition.
Cc: stable@vger.kernel.org
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Felix Gu [Wed, 6 May 2026 11:29:02 +0000 (19:29 +0800)]
soc: microchip: mpfs-sys-controller: fix resource leak on probe error
In mpfs_sys_controller_probe(), when device_get_match_data() returns
NULL, it returns -EINVAL directly without freeing the mbox channel
or the allocated sys_controller memory, causing a resource leak.
Fixes: 63b5305ad84d ("soc: microchip: mpfs-sys-controller: add support for pic64gx")
Signed-off-by: Felix Gu <ustc.gu@gmail.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Gary Guo [Tue, 12 May 2026 12:09:48 +0000 (13:09 +0100)]
rust: pin-init: internal: add `PhantomInvariant` and `PhantomInvariantLifetime`
Currently, the `pin_init` library has an `Invariant` type alias, and it is
instantiated using `PhantomData`. Generated code from `pin_data` on the
other hand cannot access the crate-local type alias, so it generates
`PhantomData<fn(T) -> T>` directly. This is all very inconsistent, despite
the exact same use case of ensuring invariance.
Add `PhantomInvariant` and `PhantomInvariantLifetime` and switch all users
that need to express the concept of invariance to use these. They are
polyfills of the unstable types of the same names in the Rust standard library.
Mohamad Alsadhan [Tue, 12 May 2026 12:09:47 +0000 (13:09 +0100)]
rust: pin-init: internal: pin_data: add struct to record field info
Introduce `FieldInfo` struct to encapsulate field and other relevant data,
instead of carrying a pair of `(pinned, field)` in all places. This allows
us to add more information to the struct in the future.
Gary Guo [Tue, 12 May 2026 12:09:46 +0000 (13:09 +0100)]
rust: pin-init: internal: pin_data: use closure for `handle_field`
`handle_field` is currently a function, which precludes it from referencing
things in the scope of the parent function. Given that it's only called
once, inline its contents into the closure that invokes it instead, so it can
directly reference `struct_name` without having to pass it in as an argument.
Michal Wajdeczko [Tue, 12 May 2026 18:33:39 +0000 (20:33 +0200)]
drm/xe/display: Add macro with display driver ops
Instead of updating the drm_driver.fbdev_probe field at runtime,
we can use a macro whose value depends on the actual Kconfig setup.
The .fbdev_probe hook will not be used by the drm core unless we
also enable the DRIVER_MODESET driver feature flag, and this flag
still depends on the xe_modparam.probe_display parameter.