]> git.ipfire.org Git - thirdparty/linux.git/log
thirdparty/linux.git
5 weeks agoMerge branch 'net-fix-protodown-with-macvlan'
Jakub Kicinski [Fri, 8 May 2026 23:50:29 +0000 (16:50 -0700)] 
Merge branch 'net-fix-protodown-with-macvlan'

Ido Schimmel says:

====================
net: Fix protodown with macvlan

When protodown is enabled on a macvlan, two bugs cause the macvlan to
incorrectly gain carrier:

1. Toggling the lower device's carrier while protodown is enabled on the
macvlan causes the macvlan to gain carrier, effectively bypassing the
protodown mechanism.

2. Toggling protodown on and then off on the macvlan while the lower
device has no carrier causes the macvlan to gain carrier, since
netif_change_proto_down() unconditionally turns the carrier on.

Patch #1 is a preparation.

Patch #2 solves the first problem by making netif_carrier_on() return
early when protodown is on.

Patch #3 solves the second problem by only calling netif_carrier_on()
when protodown is turned off if there is no linked net device or if the
linked net device has a carrier.

Patch #4 adds a selftest covering both bugs and the basic protodown
functionality.

Targeting at net-next since these are not regressions (i.e., never
worked).

Note that while these changes are in the core, they should only affect
macvlan as protodown is only supported by macvlan and vxlan and only
the former has a linked net device.
====================

Link: https://patch.msgid.link/20260507105906.891817-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoselftests: net: Add protodown tests
Ido Schimmel [Thu, 7 May 2026 10:59:06 +0000 (13:59 +0300)] 
selftests: net: Add protodown tests

Add a selftest for the protodown mechanism.

Five test cases are included:

1. Basic protodown toggling: Verify that setting protodown on macvlan
   results in DOWN operational state and clearing it restores UP.

2. Same as the previous test case, but with vxlan.

3. Protodown reasons: Verify that protodown cannot be cleared while
   there are active protodown reasons, but can be cleared once all
   reasons are removed.

4. Protodown with lower device being toggled: Verify that toggling the
   lower device's carrier while protodown is on does not cause the
   macvlan to gain carrier.

5. Protodown with lower device down: Verify that toggling protodown
   while the lower device has no carrier does not cause the macvlan to
   gain carrier.

Note that the last two test cases fail without "net: Do not turn on
carrier when protodown is on" and "net: Do not unconditionally turn on
carrier when turning off protodown":

 # ./protodown.sh
 TEST: Basic protodown on/off with macvlan                           [ OK ]
 TEST: Basic protodown on/off with vxlan                             [ OK ]
 TEST: Protodown reasons                                             [ OK ]
 TEST: Protodown with lower device toggled                           [FAIL]
         Macvlan operational state is not DOWN despite protodown
 TEST: Protodown with lower device down                              [FAIL]
         Macvlan is not LOWERLAYERDOWN after clearing protodown

Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260507105906.891817-5-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: Do not unconditionally turn on carrier when turning off protodown
Ido Schimmel [Thu, 7 May 2026 10:59:05 +0000 (13:59 +0300)] 
net: Do not unconditionally turn on carrier when turning off protodown

The protodown functionality allows user space to turn off the carrier of
a net device:

 # ip link add name dummy1 up type dummy
 # ip link add name macvlan1 up link dummy1 type macvlan mode bridge
 # ip link set dev macvlan1 protodown on
 $ ip -br link show dev macvlan1
 macvlan1@dummy1  DOWN           0a:5c:a3:05:c7:86 <NO-CARRIER,BROADCAST,MULTICAST,UP>

When protodown is turned off, the core unconditionally turns on the
carrier of the net device:

 # ip link set dev macvlan1 protodown off
 $ ip -br link show dev macvlan1
 macvlan1@dummy1  UP             0a:5c:a3:05:c7:86 <BROADCAST,MULTICAST,UP,LOWER_UP>

This is wrong as it means that a macvlan can end up with a carrier when
its lower device does not have a carrier:

 # ip link set dev dummy1 carrier off
 $ ip -br link show dev macvlan1
 macvlan1@dummy1  LOWERLAYERDOWN 0a:5c:a3:05:c7:86 <NO-CARRIER,BROADCAST,MULTICAST,UP>
 # ip link set dev macvlan1 protodown on
 # ip link set dev macvlan1 protodown off
 $ ip -br link show dev macvlan1
 macvlan1@dummy1  UP             0a:5c:a3:05:c7:86 <BROADCAST,MULTICAST,UP,LOWER_UP>

Solve this by resolving the linked net device and if one exists, inherit
its carrier state when protodown is turned off. Otherwise, if no linked
net device exists, as before, simply turn on the carrier.

Resolve the linked net device using a new helper and have it return the
device itself (in a similar fashion to dev_get_iflink()) if the device
does not implement both ndo_get_iflink() and get_link_net(). If the
latter is not implemented, it is unclear in which network namespace we
should look up the linked net device. Currently, this helper is only
used for net devices that support protodown (macvlan and vxlan) and for
both it returns the correct result.

Output with the patch:

 # ip link add name dummy1 up type dummy
 # ip link add name macvlan1 up link dummy1 type macvlan mode bridge
 # ip link set dev dummy1 carrier off
 $ ip -br link show dev macvlan1
 macvlan1@dummy1  LOWERLAYERDOWN 0a:5c:a3:05:c7:86 <NO-CARRIER,BROADCAST,MULTICAST,UP>
 # ip link set dev macvlan1 protodown on
 # ip link set dev macvlan1 protodown off
 $ ip -br link show dev macvlan1
 macvlan1@dummy1  LOWERLAYERDOWN 0a:5c:a3:05:c7:86 <NO-CARRIER,BROADCAST,MULTICAST,UP>
 # ip link set dev dummy1 carrier on
 $ ip -br link show dev macvlan1
 macvlan1@dummy1  UP             0a:5c:a3:05:c7:86 <BROADCAST,MULTICAST,UP,LOWER_UP>
 # ip link set dev macvlan1 protodown on
 # ip link set dev macvlan1 protodown off
 $ ip -br link show dev macvlan1
 macvlan1@dummy1  UP             0a:5c:a3:05:c7:86 <BROADCAST,MULTICAST,UP,LOWER_UP>

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260507105906.891817-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: Do not turn on carrier when protodown is on
Ido Schimmel [Thu, 7 May 2026 10:59:04 +0000 (13:59 +0300)] 
net: Do not turn on carrier when protodown is on

The protodown functionality allows user space to turn off the carrier of
a net device:

 # ip link add name dummy1 up type dummy
 # ip link add name macvlan1 up link dummy1 type macvlan mode bridge
 # ip link set dev macvlan1 protodown on
 $ ip -br link show dev macvlan1
 macvlan1@dummy1  DOWN           0a:5c:a3:05:c7:86 <NO-CARRIER,BROADCAST,MULTICAST,UP>

Different applications can set different protodown reasons, which
prevents an application from turning on the carrier of a net device as
long as others want it down:

 # ip link set dev macvlan1 protodown_reason 1 on
 # ip link set dev macvlan1 protodown_reason 2 on
 # ip link set dev macvlan1 protodown off
 Error: Cannot clear protodown, active reasons.
 # ip link set dev macvlan1 protodown_reason 2 off
 # ip link set dev macvlan1 protodown off
 Error: Cannot clear protodown, active reasons.
 # ip link set dev macvlan1 protodown_reason 1 off
 # ip link set dev macvlan1 protodown off
 $ ip -br link show dev macvlan1
 macvlan1@dummy1  UP             0a:5c:a3:05:c7:86 <BROADCAST,MULTICAST,UP,LOWER_UP>

Unfortunately, this mechanism is not very useful when the carrier of a
net device can be toggled by toggling the carrier of its lower device:

 # ip link set dev macvlan1 protodown on
 $ ip -br link show dev macvlan1
 macvlan1@dummy1  DOWN           0a:5c:a3:05:c7:86 <NO-CARRIER,BROADCAST,MULTICAST,UP>
 # ip link set dev dummy1 carrier off
 # ip link set dev dummy1 carrier on
 $ ip -br link show dev macvlan1
 macvlan1@dummy1  UP             0a:5c:a3:05:c7:86 <BROADCAST,MULTICAST,UP,LOWER_UP>

Obviously, this is not the intended behavior and it is unlikely to be
relied on by anyone. In fact, it is a problem for applications like FRR
that use protodown with macvlan on top of a bridge as part of Virtual
Router Redundancy Protocol (VRRP).

Solve this by preventing a net device configured with protodown on from
gaining carrier by making netif_carrier_on() a NOP when protodown is
turned on.

Output with the patch:

 # ip link add name dummy1 up type dummy
 # ip link add name macvlan1 up link dummy1 type macvlan mode bridge
 # ip link set dev macvlan1 protodown on
 $ ip -br link show dev macvlan1
 macvlan1@dummy1  DOWN           0a:5c:a3:05:c7:86 <NO-CARRIER,BROADCAST,MULTICAST,UP>
 # ip link set dev dummy1 carrier off
 # ip link set dev dummy1 carrier on
 $ ip -br link show dev macvlan1
 macvlan1@dummy1  DOWN           0a:5c:a3:05:c7:86 <NO-CARRIER,BROADCAST,MULTICAST,UP>
 # ip link set dev macvlan1 protodown off
 $ ip -br link show dev macvlan1
 macvlan1@dummy1  UP             0a:5c:a3:05:c7:86 <BROADCAST,MULTICAST,UP,LOWER_UP>

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260507105906.891817-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: Set dev->proto_down before changing carrier state
Ido Schimmel [Thu, 7 May 2026 10:59:03 +0000 (13:59 +0300)] 
net: Set dev->proto_down before changing carrier state

A subsequent patch will make netif_carrier_on() a NOP for net devices
that have protodown turned on so that they will not accidentally gain
carrier. As a preparation, set dev->proto_down before calling
netif_carrier_{off,on}().

Note that the only driver that supports protodown and has a notion of a
carrier is macvlan and it is calling netif_carrier_{off,on}() with RTNL
held.

No functional changes intended.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260507105906.891817-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge branch 'keep-phy-link-during-wol-sleep-cycle'
Jakub Kicinski [Fri, 8 May 2026 23:39:00 +0000 (16:39 -0700)] 
Merge branch 'keep-phy-link-during-wol-sleep-cycle'

Justin Chen says:

====================
Keep PHY link during WoL sleep cycle

First we divide the init/deinit path to allow for a partial init/deinit
during a sleep cycle. We also remove some unnecessary small functions at
the same time.

Then we modify the suspend and resume path to allow for a partial bring
down and bring up. This allow us to keep the PHY link up and to resume
network traffic much quicker. Note we only do this when WoL is enabled
since the PHY is already powered. In the non-WoL case we want to follow
the same flow.
====================

Link: https://patch.msgid.link/20260506213114.2002886-1-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: bcmasp: Keep phy link during WoL sleep cycle
Justin Chen [Wed, 6 May 2026 21:31:14 +0000 (14:31 -0700)] 
net: bcmasp: Keep phy link during WoL sleep cycle

We currently more or less restart all the HW on resume. Since we also
stop the PHY, it takes a while for the PHY link to be re-negotiated on
resume. Instead of doing a full restart, we keep the HW state and the
PHY link, that way we can resume network traffic with a much smaller
delay.

Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20260506213114.2002886-3-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: bcmasp: Divide init to allow partial bring up
Justin Chen [Wed, 6 May 2026 21:31:13 +0000 (14:31 -0700)] 
net: bcmasp: Divide init to allow partial bring up

To prepare for a partial bring up of the interface during resume,
we break apart the bcmasp_netif_init() function into smaller chunks
that can be called as necessary. Also consolidate some functions that
do not need to be standalone.

Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20260506213114.2002886-2-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: lan966x: avoid unregistering netdev on register failure
Myeonghun Pak [Wed, 6 May 2026 12:43:11 +0000 (21:43 +0900)] 
net: lan966x: avoid unregistering netdev on register failure

lan966x_probe_port() stores the newly allocated net_device in the
port before calling register_netdev(). If register_netdev() fails,
the probe error path calls lan966x_cleanup_ports(), which sees
port->dev and calls unregister_netdev() for a device that was never
registered.

Destroy the phylink instance created for this port and clear port->dev
before returning the registration error. The common cleanup path now skips
ports without port->dev before reaching the registered netdev cleanup, so
it only handles ports that reached the registered-netdev lifetime.

This also avoids treating an uninitialized FDMA netdev and the failed port
as a NULL == NULL match in the common cleanup path.

Fixes: d28d6d2e37d1 ("net: lan966x: add port module support")
Co-developed-by: Ijae Kim <ae878000@gmail.com>
Signed-off-by: Ijae Kim <ae878000@gmail.com>
Signed-off-by: Myeonghun Pak <mhun512@gmail.com>
Link: https://patch.msgid.link/20260506124331.31945-1-mhun512@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agodt-bindings: net: microchip: Add LAN7500 and LAN7505 devices
Thomas Richard [Wed, 6 May 2026 12:13:03 +0000 (14:13 +0200)] 
dt-bindings: net: microchip: Add LAN7500 and LAN7505 devices

Add bindings for LAN7500 and LAN7505 USB Ethernet Devices which are similar
to LAN9500.

Signed-off-by: Thomas Richard <thomas.richard@bootlin.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/20260506-b4-var-som-om44-lan7500-v2-1-b8af59ab877c@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Fri, 8 May 2026 23:18:35 +0000 (16:18 -0700)] 
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fix from Catalin Marinas:

 - ptrace(PTRACE_SETREGSET) fix to zero the target's fpsimd_state rather
   than the tracer's

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64/fpsimd: ptrace: zero target's fpsimd_state, not the tracer's

5 weeks agoMerge tag 'pci-v7.1-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Linus Torvalds [Fri, 8 May 2026 23:08:58 +0000 (16:08 -0700)] 
Merge tag 'pci-v7.1-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull PCI fixes from Bjorn Helgaas:

 - Don't fallback to bus reset after failed slot reset; a bus reset
   isn't safe if the .reset_slot() callback is implemented (Keith Busch)

 - Update saved_config_space upon resource assignment to fix passthrough
   regressions when x86 pcibios_assign_resources() updates BARs (Lukas
   Wunner)

 - Initialize a temporary pci_dev->dev in sysfs 'new_id' attribute to
   fix a lockdep regression after driver_override was moved from PCI to
   device core (Samiullah Khawaja)

 - Update MAINTAINERS email addresses (Marek Vasut, Hans Zhang)

 - Add MAINTAINERS reviewer for PCIe Cadence IP (Aksh Garg)

* tag 'pci-v7.1-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  MAINTAINERS: Add Aksh Garg as PCIe CADENCE reviewer
  MAINTAINERS: Update Hans Zhang email for PCIe CIX Sky1
  MAINTAINERS: Update Marek Vasut email for PCIe R-Car
  PCI: Initialize temporary device in new_id_store()
  PCI: Update saved_config_space upon resource assignment
  PCI: Don't fallback to bus reset after failed slot reset

5 weeks agoMerge branch 'intel-wired-lan-driver-updates-2026-05-04-i40e-ice-idpf'
Jakub Kicinski [Fri, 8 May 2026 23:01:11 +0000 (16:01 -0700)] 
Merge branch 'intel-wired-lan-driver-updates-2026-05-04-i40e-ice-idpf'

Jacob Keller says:

====================
Intel Wired LAN Driver Updates 2026-05-04 (i40e, ice, idpf)

Matt Volrath fixes two issues with the i40e driver probe routine, ensuring
that PTP is properly cleaned up if the probe fails.

Emil corrects the initialization of the read_dev_clk_lock spinlock in
idpf_ptp_init, ensuring it is initialized prior to when the
ptp_schedule_worker() is called.

Greg KH fixes a double free and use-after free in the idpf auxiliary device
error paths.

Marcin fixes ice_set_rss_hfunc() to use the correct q_opt_flags field,
correcting the assignment and preventing submission of invalid data to the
firmware.

Bart corrects the locking in ice_dcb_rebuild(), ensuring that the tc_mutex
is held over the entire operation.

Ivan fixes the rclk pin state get for E810 devices, ensuring the index is
properly offset by the base_rclk_idx value. This ensures that the correct
pin index is used to look up recovered clock state. He additionally adds
bounds checking to prevent attempting to access pins outside of the pin
state array.

Ivan also moves the CGU register macros to the top of ice_dpll.h, inside
the header guard to avoid duplicate macro definitions should the ice_dpll.h
header is included multiple times.
====================

Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-0-a5ea4dc837a9@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoice: dpll: fix misplaced header macros
Ivan Vecera [Wed, 6 May 2026 21:48:17 +0000 (14:48 -0700)] 
ice: dpll: fix misplaced header macros

The CGU register definitions (ICE_CGU_R10, ICE_CGU_R11 and related field
masks) were placed after the #endif of the _ICE_DPLL_H_ include guard,
leaving them unprotected. Move them inside the guard.

Fixes: ad1df4f2d591 ("ice: dpll: Support E825-C SyncE and dynamic pin discovery")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-8-a5ea4dc837a9@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoice: dpll: fix rclk pin state get for E810
Ivan Vecera [Wed, 6 May 2026 21:48:16 +0000 (14:48 -0700)] 
ice: dpll: fix rclk pin state get for E810

The refactoring of ice_dpll_rclk_state_on_pin_get() to use
ice_dpll_pin_get_parent_idx() omitted the base_rclk_idx adjustment that was
correctly added in the ice_dpll_rclk_state_on_pin_set() path. This breaks
E810 devices where base_rclk_idx is non-zero, causing the wrong hardware
index to be used for pin state lookup and incorrect recovered clock state
to be reported via the DPLL subsystem. E825C is unaffected as its
base_rclk_idx is 0.

While at it, add bounds check against ICE_DPLL_RCLK_NUM_MAX on hw_idx after
the base_rclk_idx subtraction in both ice_dpll_rclk_state_on_pin_{get,set}()
to prevent out-of-bounds access on the pin state array.

Fixes: ad1df4f2d591 ("ice: dpll: Support E825-C SyncE and dynamic pin discovery")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-7-a5ea4dc837a9@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoice: fix locking in ice_dcb_rebuild()
Bart Van Assche [Wed, 6 May 2026 21:48:15 +0000 (14:48 -0700)] 
ice: fix locking in ice_dcb_rebuild()

Move the mutex_lock() call up to prevent that DCB settings change after
the first ice_query_port_ets() call. The second ice_query_port_ets()
call in ice_dcb_rebuild() is already protected by pf->tc_mutex.

This also fixes a bug in an error path, as before taking the first
"goto dcb_error" in the function jumped over mutex_lock() to
mutex_unlock().

This bug has been detected by the clang thread-safety analyzer.

Cc: intel-wired-lan@lists.osuosl.org
Fixes: 242b5e068b25 ("ice: Fix DCB rebuild after reset")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Arpana Arland <arpanax.arland@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-6-a5ea4dc837a9@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoice: fix setting RSS VSI hash for E830
Marcin Szycik [Wed, 6 May 2026 21:48:14 +0000 (14:48 -0700)] 
ice: fix setting RSS VSI hash for E830

ice_set_rss_hfunc() performs a VSI update, in which it sets hashing
function, leaving other VSI options unchanged. However, ::q_opt_flags is
mistakenly set to the value of another field, instead of its original
value, probably due to a typo. What happens next is hardware-dependent:

On E810, only the first bit is meaningful (see
ICE_AQ_VSI_Q_OPT_PE_FLTR_EN) and can potentially end up in a different
state than before VSI update.

On E830, some of the remaining bits are not reserved. Setting them
to some unrelated values can cause the firmware to reject the update
because of invalid settings, or worse - succeed.

Reproducer:
  sudo ethtool -X $PF1 equal 8

Output in dmesg:
  Failed to configure RSS hash for VSI 6, error -5

Fixes: 352e9bf23813 ("ice: enable symmetric-xor RSS for Toeplitz hash function")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-5-a5ea4dc837a9@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoidpf: fix double free and use-after-free in aux device error paths
Greg Kroah-Hartman [Wed, 6 May 2026 21:48:13 +0000 (14:48 -0700)] 
idpf: fix double free and use-after-free in aux device error paths

When auxiliary_device_add() fails in idpf_plug_vport_aux_dev() or
idpf_plug_core_aux_dev(), the err_aux_dev_add label calls
auxiliary_device_uninit() and falls through to err_aux_dev_init.  The
uninit call will trigger put_device(), which invokes the release
callback (idpf_vport_adev_release / idpf_core_adev_release) that frees
iadev.  The fall-through then reads adev->id from the freed iadev for
ida_free() and double-frees iadev with kfree().

Free the IDA slot and clear the back-pointer before uninit, while adev
is still valid, then return immediately.

Commit 65637c3a1811 ("idpf: fix UAF in RDMA core aux dev deinitialization")
fixed the same use-after-free in the matching unplug path in this file but
missed both probe error paths.

Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Cc: stable@kernel.org
Fixes: be91128c579c ("idpf: implement RDMA vport auxiliary dev create, init, and destroy")
Fixes: f4312e6bfa2a ("idpf: implement core RDMA auxiliary dev create, init, and destroy")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-4-a5ea4dc837a9@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoidpf: fix read_dev_clk_lock spinlock init in idpf_ptp_init()
Emil Tantilov [Wed, 6 May 2026 21:48:12 +0000 (14:48 -0700)] 
idpf: fix read_dev_clk_lock spinlock init in idpf_ptp_init()

In idpf_ptp_init(), read_dev_clk_lock is initialized after
ptp_schedule_worker() had already been called (and after
idpf_ptp_settime64() could reach the lock). The PTP aux worker
fires immediately upon scheduling and can call into
idpf_ptp_read_src_clk_reg_direct(), which takes
spin_lock(&ptp->read_dev_clk_lock) on an uninitialized lock, triggering
the lockdep "non-static key" warning:

[12973.796587] idpf 0000:83:00.0: Device HW Reset initiated
[12974.094507] INFO: trying to register non-static key.
...
[12974.097208] Call Trace:
[12974.097213]  <TASK>
[12974.097218]  dump_stack_lvl+0x93/0xe0
[12974.097234]  register_lock_class+0x4c4/0x4e0
[12974.097249]  ? __lock_acquire+0x427/0x2290
[12974.097259]  __lock_acquire+0x98/0x2290
[12974.097272]  lock_acquire+0xc6/0x310
[12974.097281]  ? idpf_ptp_read_src_clk_reg+0xb7/0x150 [idpf]
[12974.097311]  ? lockdep_hardirqs_on_prepare+0xde/0x190
[12974.097318]  ? finish_task_switch.isra.0+0xd2/0x350
[12974.097330]  ? __pfx_ptp_aux_kworker+0x10/0x10 [ptp]
[12974.097343]  _raw_spin_lock+0x30/0x40
[12974.097353]  ? idpf_ptp_read_src_clk_reg+0xb7/0x150 [idpf]
[12974.097373]  idpf_ptp_read_src_clk_reg+0xb7/0x150 [idpf]
[12974.097391]  ? kthread_worker_fn+0x88/0x3d0
[12974.097404]  ? kthread_worker_fn+0x4e/0x3d0
[12974.097411]  idpf_ptp_update_cached_phctime+0x26/0x120 [idpf]
[12974.097428]  ? _raw_spin_unlock_irq+0x28/0x50
[12974.097436]  idpf_ptp_do_aux_work+0x15/0x20 [idpf]
[12974.097454]  ptp_aux_kworker+0x20/0x40 [ptp]
[12974.097464]  kthread_worker_fn+0xd5/0x3d0
[12974.097474]  ? __pfx_kthread_worker_fn+0x10/0x10
[12974.097482]  kthread+0xf4/0x130
[12974.097489]  ? __pfx_kthread+0x10/0x10
[12974.097498]  ret_from_fork+0x32c/0x410
[12974.097512]  ? __pfx_kthread+0x10/0x10
[12974.097519]  ret_from_fork_asm+0x1a/0x30
[12974.097540]  </TASK>

Move the call to spin_lock_init() up a bit to make sure read_dev_clk_lock
is not touched before it's been initialized.

Fixes: 5cb8805d2366 ("idpf: negotiate PTP capabilities and get PTP clock")
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-3-a5ea4dc837a9@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoi40e: Cleanup PTP pins on probe failure
Matt Vollrath [Wed, 6 May 2026 21:48:11 +0000 (14:48 -0700)] 
i40e: Cleanup PTP pins on probe failure

PTP pin structs are allocated early in probe, but never cleaned up.

Fix this by calling i40e_ptp_free_pins in the error path.

To support this, i40e_ptp_free_pins is added to the header and
pin_config is correctly nullified after being freed.

This has been an issue since i40e_ptp_alloc_pins was introduced.

Fixes: 1050713026a08 ("i40e: add support for PTP external synchronization clock")
Reported-by: Kohei Enju <kohei@enjuk.jp>
Cc: stable@vger.kernel.org
Signed-off-by: Matt Vollrath <tactii@gmail.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Kohei Enju <kohei@enjuk.jp>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-2-a5ea4dc837a9@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoi40e: Cleanup PTP registration on probe failure
Matt Vollrath [Wed, 6 May 2026 21:48:10 +0000 (14:48 -0700)] 
i40e: Cleanup PTP registration on probe failure

Fix two conditions which would leak PTP registration on probe failure:

1. i40e_setup_pf_switch can encounter an error in
   i40e_setup_pf_filter_control, call i40e_ptp_init, then return
   non-zero, sending i40e_probe to err_vsis.

2. i40e_setup_misc_vector can return non-zero, sending i40e_probe to
   err_vsis.

Both of these conditions have been present since PTP was introduced in
this driver.

Found with coccinelle.

Fixes: beb0dff1251db ("i40e: enable PTP")
Signed-off-by: Matt Vollrath <tactii@gmail.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-1-a5ea4dc837a9@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: phy: dp83867: add MDI-X management
Luca Ellero [Wed, 6 May 2026 14:19:06 +0000 (16:19 +0200)] 
net: phy: dp83867: add MDI-X management

ethtool on this phy device always reports "MDI-X: Unknown" and doesn't
support forcing it to on or off.
This patch adds support for reading/forcing MDI-X mode from ethtool
properly.

Signed-off-by: Luca Ellero <l.ellero@asem.it>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260506141918.13136-1-l.ellero@asem.it
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: shaper: Reject reparenting of existing nodes
Mohsin Bashir [Wed, 6 May 2026 23:37:45 +0000 (16:37 -0700)] 
net: shaper: Reject reparenting of existing nodes

When an existing node-scope shaper is moved to a different parent
via the group operation, the framework fails to update the leaves
count on both the old and new parent shapers. Only newly created
nodes (handle.id == NET_SHAPER_ID_UNSPEC) trigger the parent
leaves increment at line 1039.

This causes the parent's leaves counter to diverge from the
actual number of children in the xarray. When the node is later
deleted, pre_del_node() allocates an array sized by the stale
leaves count, but the xarray iteration finds more children than
expected, hitting the WARN_ON_ONCE guard and returning -EINVAL.

Rather than adding reparenting support with complex leaves count
bookkeeping, reject group calls that attempt to change an existing
node's parent. Updates to an existing node's rate or leaves under
the same parent remain permitted. We expect that for any modification
of the topology user should always create new groups and let the
kernel garbage collect the leaf-less nodes.

Fixes: 5d5d4700e75d ("net-shapers: implement NL group operation")
Signed-off-by: Mohsin Bashir <hmohsin@meta.com>
Link: https://patch.msgid.link/20260506233745.111895-1-mohsin.bashr@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agogenetlink: free the skb on 'group >= family->n_mcgrps'
Alice Ryhl [Wed, 6 May 2026 20:07:13 +0000 (20:07 +0000)] 
genetlink: free the skb on 'group >= family->n_mcgrps'

These methods generally consume ownership of the provided skb, so even
if an error path is encountered, the skb is freed. This is because the
very first thing they do after some initial setup is to unconditionally
consume the skb via consume_skb(skb). Any subsequent errors lead to the
core netlink layer freeing the skb.

However, there is one check that occurs before ownership is passed,
which is the check for the group index. So if this error condition is
encountered, then the skb is leaked. This error condition is generally
considered a violation of the netlink API, so it's not expected to occur
under normal circumstances. For the same reason, no callers check for
this error condition, and no callers need to be adjusted. However, we
should still follow the same ownership semantics of the rest of the
function. Thus, free the skb in this codepath.

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Suggested-by: Matthew Maurer <mmaurer@google.com>
Fixes: 2a94fe48f32c ("genetlink: make multicast groups const, prevent abuse")
Link: https://lore.kernel.org/r/845b36ba-7b3a-41f2-acb2-b284f253e2ca@lunn.ch
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20260506-genlmsg-return-v2-1-a63ee2a055d6@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agogve: Use generic power management
Vaibhav Gupta [Wed, 6 May 2026 16:50:06 +0000 (16:50 +0000)] 
gve: Use generic power management

Switch to the generic power management and remove the usage of legacy
(pci_driver) hooks.

Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260506165015.641738-1-vaibhavgupta40@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: nsh: fix incorrect header length macros
Ilya Maximets [Thu, 7 May 2026 12:04:26 +0000 (14:04 +0200)] 
net: nsh: fix incorrect header length macros

NSH header length is a 6-bit field that encodes the total length of
the header in 4-byte words.  So the maximum length is 0b111111 * 4,
which is 252 and not 256.  The maximum context length is the same
number minus the length of the base header (8), so 244.

These macros are used to validate push_nsh() action in openvswitch.
Miscalculation here doesn't cause any real issues.  In the worst case
the oversized context is truncated while building the header, so we'll
construct and send a broken packet, which is not a big problem, as any
receiver should validate the fields.  No invalid memory accesses will
happen during the header push.  But we should fix the macros to reject
the incorrect actions in the first place.

Using previously defined values and calculating the length instead
of defining numbers directly, so it's easier to understand where they
come from and harder to make a mistake.

Fixes: 1f0b7744c505 ("net: add NSH header structures and helpers")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20260507120434.2962505-1-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agobnxt_en: Drop pci_save_state() after pci_restore_state()
Lukas Wunner [Thu, 7 May 2026 13:04:59 +0000 (15:04 +0200)] 
bnxt_en: Drop pci_save_state() after pci_restore_state()

Commit 383d89699c50 ("treewide: Drop pci_save_state() after
pci_restore_state()") sought to purge all superfluous invocations of
pci_save_state() from the tree.

Unfortunately the commit missed one invocation in the Broadcom
NetXtreme-C/E driver.  Drop it.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/39de1b025928d9a457976010b2324e7e99baa92a.1778158755.git.lukas@wunner.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: ethtool: fix NULL pointer dereference in phy_reply_size
Quan Sun [Thu, 7 May 2026 13:17:38 +0000 (21:17 +0800)] 
net: ethtool: fix NULL pointer dereference in phy_reply_size

In phy_prepare_data(), several strings such as 'name', 'drvname',
'upstream_sfp_name', and 'downstream_sfp_name' are allocated using
kstrdup(). However, these allocations were not checked  for failure.

If kstrdup() fails for 'name', it returns NULL while the function
continues. This leads to a kernel NULL pointer dereference and panic
later in phy_reply_size() when it unconditionally calls strlen() on
the NULL pointer.

While other strings like 'upstream_sfp_name' might be checked before
access in certain code paths, failing to handle these allocations
consistently can lead to incomplete data reporting or hidden bugs.

Fix this by adding proper NULL checks for all kstrdup() calls in
phy_prepare_data() and implement a centralized error handling path
using goto labels to ensure all previously allocated resources are
freed on failure.

Fixes: 9dd2ad5e92b9 ("net: ethtool: phy: Convert the PHY_GET command to generic phy dump")
Signed-off-by: Quan Sun <2022090917019@std.uestc.edu.cn>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260507131738.1173835-1-2022090917019@std.uestc.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMAINTAINERS: change maintainers for macb Ethernet driver
Nicolas Ferre [Thu, 7 May 2026 12:04:44 +0000 (14:04 +0200)] 
MAINTAINERS: change maintainers for macb Ethernet driver

I would like to hand over the macb maintenance to Théo, as I'm unable to
keep up with the recent flow of patches for this driver. After speaking
with Claudiu, he indicated that he is in the same position as me.
To help with this work, Conor has agreed to act as a reviewer.

I was given responsibility for this driver years ago, and I'm glad to
see it continue with talented developers.

Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Acked-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20260507120444.9733-1-nicolas.ferre@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agodt-bindings: net: lan966x: Accept standard ethernet prefixes
Linus Walleij [Thu, 7 May 2026 09:26:01 +0000 (11:26 +0200)] 
dt-bindings: net: lan966x: Accept standard ethernet prefixes

The dsa.yaml and ethernet-switch.yaml bindings recommend
prefixing ethernet switches and ports with "ethernet-" so
make the LAN966x do the same.

Reported-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Linus Walleij <linusw@kernel.org>
Reviewed-by: Herve Codina <herve.codina@bootlin.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260507-lan966-binding-v1-1-e99293d2a4ec@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: ethernet: atheros: atl2: remove kernel backward-compatibility code
Ethan Nelson-Moore [Wed, 6 May 2026 05:40:27 +0000 (22:40 -0700)] 
net: ethernet: atheros: atl2: remove kernel backward-compatibility code

The atl2 driver contains code for compatibility with old kernels that
do not support module_param_array. Backward compatibility is
irrelevant because this driver is in-tree. Remove this unreachable
code to simplify the driver's handling of module parameters.

Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260506054035.23710-1-enelsonmoore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: napi: Avoid gro timer misfiring at end of busypoll
Dragos Tatulea [Wed, 6 May 2026 09:08:08 +0000 (09:08 +0000)] 
net: napi: Avoid gro timer misfiring at end of busypoll

When in irq deferral mode (defer-hard-irqs > 0), a short enough
gro-flush timeout can trigger before NAPI_STATE_SCHED is cleared if the
last poll in busy_poll_stop() takes too long. This can have the effect
of leaving the queue stuck with interrupts disabled and no timer armed
which results in a tx timeout if there is no subsequent busypoll cycle.

To prevent this, defer the gro-flush timer arm after the last poll.

Fixes: 7fd3253a7de6 ("net: Introduce preferred busy-polling")
Co-developed-by: Martin Karsten <mkarsten@uwaterloo.ca>
Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260506090808.820559-2-dtatulea@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge branch 'ipv6-flowlabel-per-netns-budget-for-unprivileged-callers'
Jakub Kicinski [Fri, 8 May 2026 21:59:17 +0000 (14:59 -0700)] 
Merge branch 'ipv6-flowlabel-per-netns-budget-for-unprivileged-callers'

Maoyi Xie says:

====================
ipv6: flowlabel: per-netns budget for unprivileged callers

From: Maoyi Xie <maoyi.xie@ntu.edu.sg>

This series fixes the cross-tenant DoS in net/ipv6/ip6_flowlabel.c.
v1 through v6 were single-patch postings, each in its own thread.
v6 review pointed out that the existing fl_size read in
mem_check() and the corresponding write in fl_intern() are not in
the same critical section. v7 split the work into 2 patches.

Patch 1/2 is a prerequisite. It moves spin_lock_bh(&ip6_fl_lock)
and the matching unlock from fl_intern() into its only caller
ipv6_flowlabel_get(), so the mem_check() call runs under the same
critical section as the fl_intern() insert. With all writers and
the read of fl_size under the lock, fl_size is converted from
atomic_t to plain int. This is independent of the per-netns
budget. It also makes 2/2 backportable without conflicts.

Patch 2/2 is the v6 patch, rebased on 1/2.

  - flowlabel_count is plain int rather than atomic_t, since the
    previous patch put all writers and readers under ip6_fl_lock.
  - In ip6_fl_gc(), fl_free() is now placed below the fl_size
    and flowlabel_count decrements, removing the v6 cache of
    fl->fl_net.
  - In ip6_fl_purge(), fl_free() stays in its original position.
    The function argument net is used for flowlabel_count.
  - mem_check() uses spaces around the / operator on all four
    expressions, addressing the checkpatch note in v6 review.

Numeric budget (preserved from v6):

  pre-patch:
    global non-CAP_NET_ADMIN budget = FL_MAX_SIZE - FL_MAX_SIZE/4
                                    = 4096 - 1024 = 3072
    per-actor reach                 = 3072

  post-patch:
    FL_MAX_SIZE doubled to 8192
    global non-CAP_NET_ADMIN budget = 8192 - 2048 = 6144
    per-netns ceiling               = 6144 / 2 = 3072
    per-actor reach                 = 3072 (preserved)

CAP_NET_ADMIN against init_user_ns still bypasses both caps.

Reproducer (KASAN VM, 4 cores, qemu): unprivileged netns A holds
3072 flowlabels via 100 procs. Fresh unprivileged netns B then
allocates 32 flowlabels (the FL_MAX_PER_SOCK ceiling for one
socket), the same as a clean baseline. Without the per-netns
ceiling, netns A could push fl_size past FL_MAX_SIZE - FL_MAX_SIZE
/ 4 and netns B would see allocations denied.
====================

Link: https://patch.msgid.link/20260506082416.2259567-1-maoyixie.tju@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoipv6: flowlabel: enforce per-netns limit for unprivileged callers
Maoyi Xie [Wed, 6 May 2026 08:24:16 +0000 (16:24 +0800)] 
ipv6: flowlabel: enforce per-netns limit for unprivileged callers

fl_size, fl_ht and ip6_fl_lock in net/ipv6/ip6_flowlabel.c are
file scope and shared across netns. mem_check() reads fl_size to
decide whether to deny non-CAP_NET_ADMIN callers. capable() runs
against init_user_ns, so an unprivileged user in any non-init
userns can push fl_size past FL_MAX_SIZE - FL_MAX_SIZE / 4 and
starve every other unprivileged userns on the host.

Add struct netns_ipv6::flowlabel_count, bumped and decremented
next to fl_size in fl_intern, ip6_fl_gc and ip6_fl_purge. The new
field fills the existing 4-byte hole after ipmr_seq, so struct
netns_ipv6 stays the same size on 64-bit builds.

Bump FL_MAX_SIZE from 4096 to 8192. It has been 4096 since the
file was added. Machines and connection counts have grown.

mem_check() folds an extra per-netns ceiling into the existing
non-CAP_NET_ADMIN conditional. The ceiling is half of the total
budget that unprivileged callers have ever been able to use, i.e.
(FL_MAX_SIZE - FL_MAX_SIZE / 4) / 2 = 3072 entries. With
FL_MAX_SIZE doubled, this preserves the original per-user reach
of 3K (what an unprivileged caller could already obtain before
this change), while forcing an attacker to spread allocations
across at least two netns to exhaust the global non-CAP_NET_ADMIN
budget.

CAP_NET_ADMIN against init_user_ns still bypasses both caps.

The previous patch took ip6_fl_lock across mem_check and
fl_intern, so the new flowlabel_count read in mem_check and the
new flowlabel_count++ in fl_intern run under the same critical
section. flowlabel_count is therefore plain int, like fl_size.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Suggested-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
Link: https://patch.msgid.link/20260506082416.2259567-3-maoyixie.tju@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoipv6: flowlabel: take ip6_fl_lock across mem_check and fl_intern
Maoyi Xie [Wed, 6 May 2026 08:24:15 +0000 (16:24 +0800)] 
ipv6: flowlabel: take ip6_fl_lock across mem_check and fl_intern

mem_check() in net/ipv6/ip6_flowlabel.c reads fl_size without
holding ip6_fl_lock. fl_intern() takes the lock immediately
afterwards. The two checks therefore race against concurrent
fl_intern, ip6_fl_gc and ip6_fl_purge writers, which makes the
mem_check budget check approximate.

Move spin_lock_bh(&ip6_fl_lock) and the matching unlock from
fl_intern() into its only caller ipv6_flowlabel_get(). The
mem_check() call now runs under the same critical section as the
fl_intern() insert, so the budget check is exact.

With all writers and the read of fl_size under ip6_fl_lock,
convert fl_size from atomic_t to plain int. The four sites that
update or read fl_size are fl_intern (insert path), ip6_fl_gc
(garbage collector, the !sched check and the per-entry decrement),
ip6_fl_purge (per-netns purge), and mem_check (budget check), and
all four now run under ip6_fl_lock.

This is a prerequisite for adding a per-netns budget alongside
fl_size. The follow-up patch adds netns_ipv6::flowlabel_count and
folds it into mem_check().

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Suggested-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
Link: https://patch.msgid.link/20260506082416.2259567-2-maoyixie.tju@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMAINTAINERS: Add self for the 3c509 network driver
Maciej W. Rozycki [Mon, 27 Apr 2026 10:23:17 +0000 (11:23 +0100)] 
MAINTAINERS: Add self for the 3c509 network driver

It appears there's a need for a maintainer for the 3Com EtherLink III
family of Ethernet network adapters.  There is documentation available
and the driver is very mature so the task ought to be of little hassle,
so I think I should be able to squeeze in any issues to be addressed.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/alpine.DEB.2.21.2604271056460.28583@angie.orcam.me.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoselftests: net: add tests for filtered dumps of page pool
Jakub Kicinski [Wed, 6 May 2026 03:48:21 +0000 (20:48 -0700)] 
selftests: net: add tests for filtered dumps of page pool

Add tests for page pool dumps of a specific ifindex.

Link: https://patch.msgid.link/20260506034821.1710113-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge branch 'tcp-two-fixes-for-socket-migration-in-reqsk_timer_handler'
Jakub Kicinski [Fri, 8 May 2026 21:54:55 +0000 (14:54 -0700)] 
Merge branch 'tcp-two-fixes-for-socket-migration-in-reqsk_timer_handler'

Kuniyuki Iwashima says:

====================
tcp: Two fixes for socket migration in reqsk_timer_handler().

The series fixes two bugs in the error path of socket migration
in reqsk_timer_handler().

Patch 1 fixes a potential UAF in reqsk_timer_handler().

Patch 2 fixes imbalanced icsk_accept_queue count.
====================

Link: https://patch.msgid.link/20260506035954.1563147-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: page_pool: support dumping pps of a specific ifindex via Netlink
Jakub Kicinski [Wed, 6 May 2026 03:48:20 +0000 (20:48 -0700)] 
net: page_pool: support dumping pps of a specific ifindex via Netlink

NIPA tries to make sure that HW tests don't modify system state.
It saves the state of page pools, too. Now that I write this commit
message I realize that this is impractical since page pool IDs and
state will get legitimately changed by the tests. But I already
spent a couple of hours implementing the filtering, so..

Link: https://patch.msgid.link/20260506034821.1710113-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agotcp: Fix imbalanced icsk_accept_queue count.
Kuniyuki Iwashima [Wed, 6 May 2026 03:59:19 +0000 (03:59 +0000)] 
tcp: Fix imbalanced icsk_accept_queue count.

When TCP socket migration happens in reqsk_timer_handler(),
@sk_listener will be updated with the new listener.

When we call __inet_csk_reqsk_queue_drop(), the listener must
be the one stored in req->rsk_listener.

The cited commit accidentally replaced oreq->rsk_listener with
sk_listener, leading to imbalanced icsk_accept_queue count.

Let's pass the correct listener to __inet_csk_reqsk_queue_drop().

Fixes: e8c526f2bdf1 ("tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().")
Reported-by: Damiano Melotti <melotti@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260506035954.1563147-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agotcp: Fix potential UAF in reqsk_timer_handler().
Kuniyuki Iwashima [Wed, 6 May 2026 03:59:18 +0000 (03:59 +0000)] 
tcp: Fix potential UAF in reqsk_timer_handler().

When TCP socket migration fails at inet_ehash_insert() in
reqsk_timer_handler(), we jump to the no_ownership: label
and free the new reqsk immediately with __reqsk_free().

Thus, we must stop the new reqsk's timer before jumping to the
label, but the timer might be missed since the cited commit,
resulting in UAF.

As we are in the original reqsk's timer context, we can safely
call timer_delete_sync() for the new reqsk.

Let's pass false to __inet_csk_reqsk_queue_drop() to stop
the new reqsk's timer.

Fixes: 83fccfc3940c ("inet: fix potential deadlock in reqsk_queue_unlink()")
Reported-by: Damiano Melotti <melotti@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260506035954.1563147-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agox86/cpuid: Introduce <asm/cpuid/leaf_types.h>
Ahmed S. Darwish [Fri, 27 Mar 2026 02:15:21 +0000 (03:15 +0100)] 
x86/cpuid: Introduce <asm/cpuid/leaf_types.h>

To centralize all CPUID access across the x86 subsystem, introduce
<asm/cpuid/leaf_types.h>.  It is generated by the x86-cpuid-db project¹ and
provides C99 bitfield listings for all publicly known CPUID leaves.

¹ https://gitlab.com/x86-cpuid.org/x86-cpuid-db/-/blob/v3.0/CHANGELOG.rst

Suggested-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20260327021645.555257-1-darwi@linutronix.de
5 weeks agox86/cpuid: Rename cpuid_leaf()/cpuid_subleaf() APIs
Ahmed S. Darwish [Fri, 27 Mar 2026 02:15:20 +0000 (03:15 +0100)] 
x86/cpuid: Rename cpuid_leaf()/cpuid_subleaf() APIs

A new CPUID model will be added where its APIs will be designated as the
official CPUID API.  Free the cpuid_leaf() and cpuid_subleaf() function
names for that API.  Rename them accordingly to cpuid_read() and
cpuid_read_subleaf().

For kernel/cpuid.c, rename its local file operations read function from
cpuid_read() to cpuid_read_f() so that it does not conflict with the new
API.

No functional change.

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20260327021645.555257-1-darwi@linutronix.de
5 weeks agox86/cpu: Do not include the CPUID API header in asm/processor.h
Ahmed S. Darwish [Fri, 27 Mar 2026 02:15:19 +0000 (03:15 +0100)] 
x86/cpu: Do not include the CPUID API header in asm/processor.h

asm/processor.h includes asm/cpuid/api.h but it does not need it.

Remove the include.

This allows the CPUID APIs header to include <asm/processor.h> at a later
step without introducing a circular dependency.

Note, all call sites which implicitly included the CPUID API through
<asm/processor.h> have been modified to explicitly include the CPUID APIs
instead.

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20260327021645.555257-1-darwi@linutronix.de
5 weeks agodrm/xe/multi_queue: Whitelist QUEUE_TIMESTAMP register
Umesh Nerlige Ramappa [Thu, 7 May 2026 16:20:28 +0000 (09:20 -0700)] 
drm/xe/multi_queue: Whitelist QUEUE_TIMESTAMP register

In a multi-queue use case, when a job is running on the secondary queue,
the CTX_TIMESTAMP does not reflect the queues run ticks. Instead, we use
the QUEUE TIMESTAMP to check how long the job ran. For user space to see
the run ticks for a secondary queue, whitelist the QUEUE_TIMESTAMP
register.

Compute PR: https://github.com/intel/compute-runtime/pull/923

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-24-umesh.nerlige.ramappa@intel.com
5 weeks agoMAINTAINERS: Add Aksh Garg as PCIe CADENCE reviewer
Aksh Garg [Fri, 8 May 2026 06:09:51 +0000 (11:39 +0530)] 
MAINTAINERS: Add Aksh Garg as PCIe CADENCE reviewer

I wish to contribute to the review process for Cadence PCIe IP drivers,
hence add myself as a reviewer.

Signed-off-by: Aksh Garg <a-garg7@ti.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260508060951.840233-1-a-garg7@ti.com
5 weeks agoMAINTAINERS: Update Hans Zhang email for PCIe CIX Sky1
Hans Zhang [Fri, 8 May 2026 02:30:06 +0000 (10:30 +0800)] 
MAINTAINERS: Update Hans Zhang email for PCIe CIX Sky1

Update my email address as my work email account is no longer in use.

Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260508023006.1787674-1-18255117159@163.com
5 weeks agoMAINTAINERS: Update Marek Vasut email for PCIe R-Car
Marek Vasut [Tue, 28 Apr 2026 05:19:54 +0000 (07:19 +0200)] 
MAINTAINERS: Update Marek Vasut email for PCIe R-Car

Use up to date address. No functional change.

Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260428052030.51101-1-marek.vasut+renesas@mailbox.org
5 weeks agoPCI: Initialize temporary device in new_id_store()
Samiullah Khawaja [Tue, 5 May 2026 23:43:27 +0000 (23:43 +0000)] 
PCI: Initialize temporary device in new_id_store()

When setting new_id of a PCI device driver using sysfs a lockdep splat
occurs. This is because new_id_store() builds a temporary pci_dev for
pci_match_device(), which calls device_match_driver_override().  That
depends on the driver_override.lock added by cb3d1049f4ea ("driver core:
generalize driver_override in struct device").

The new driver_override.lock was not initialized in the temporary pci_dev,
resulting in this lockdep splat.

Initialize the temporary pci_dev to fix this.

Repro:

  Build with CONFIG_LOCKDEP=y, boot with QEMU, and add a new ID:

  # echo "8086 10f5" > /sys/bus/pci/drivers/e1000e/new_id

  INFO: trying to register non-static key.
  The code is fine but needs lockdep annotation, or maybe
  you didn't initialize this object before use?
  turning off the locking correctness validator.
  CPU: 2 UID: 0 PID: 177 Comm: liveupdate-iomm Not tainted 7.0.0+ #9 PREEMPT(full)
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
  Call Trace:
   <TASK>
   dump_stack_lvl+0x5d/0x80
   register_lock_class+0x77e/0x790
   lock_acquire+0xbf/0x2e0
   pci_match_device+0x24/0x180
   new_id_store+0x189/0x1d0
   kernfs_fop_write_iter+0x14f/0x210
   vfs_write+0x263/0x5e0
   ksys_write+0x79/0xf0
   do_syscall_64+0x117/0xf80

Fixes: 10a4206a2401 ("PCI: use generic driver_override infrastructure")
Fixes: 8895d3bcb8ba ("PCI: Fail new_id for vendor/device values already built into driver")
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
[bhelgaas: add commit log details and repro, trim backtrace]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260505234327.716630-1-skhawaja@google.com
5 weeks agoPCI: Update saved_config_space upon resource assignment
Lukas Wunner [Wed, 15 Apr 2026 15:56:06 +0000 (17:56 +0200)] 
PCI: Update saved_config_space upon resource assignment

Bernd reports passthrough failure of a Digital Devices Cine S2 V6 DVB
adapter plugged into an ASRock X570S PG Riptide board with BIOS version
P5.41 (09/07/2023):

  ddbridge 0000:05:00.0: detected Digital Devices Cine S2 V6 DVB adapter
  ddbridge 0000:05:00.0: cannot read registers
  ddbridge 0000:05:00.0: fail

BIOS assigns an incorrect BAR to the DVB adapter which doesn't fit into the
upstream bridge window.  The kernel corrects the BAR assignment:

  pci 0000:07:00.0: BAR 0 [mem 0xfffffffffc500000-0xfffffffffc50ffff 64bit]: can't claim; no compatible bridge window
  pci 0000:07:00.0: BAR 0 [mem 0xfc500000-0xfc50ffff 64bit]: assigned

Correction of the BAR assignment happens in an x86-specific fs_initcall,
pcibios_assign_resources(), after device enumeration in a subsys_initcall.
This order was introduced at the behest of Linus in 2004:

  https://git.kernel.org/tglx/history/c/a06a30144bbc

No other architecture performs such a late BAR correction.

Bernd bisected the issue to commit a2f1e22390ac ("PCI/ERR: Ensure error
recoverability at all times"), but it only occurs in the absence of commit
4d4c10f763d7 ("PCI: Explicitly put devices into D0 when initializing").
This combination exists in stable kernel v6.12.70, but not in mainline,
hence Bernd cannot reproduce the issue with mainline.

Since a2f1e22390ac, config space is saved on enumeration, prior to BAR
correction.  Upon passthrough, the corrected BAR is overwritten with the
incorrect saved value by:

  vfio_pci_core_register_device()
    vfio_pci_set_power_state()
      pci_restore_state()

But only if the device's current_state is PCI_UNKNOWN, as it was prior to
commit 4d4c10f763d7.  Since the commit, it is PCI_D0, which changes the
behavior of vfio_pci_set_power_state() to no longer restore the state
without saving it first.

Alexandre is reporting the same issue as Bernd, but in his case, mainline
is affected as well.  The difference is that on Alexandre's system, the
host kernel binds a driver to the device which is unbound prior to
passthrough, whereas on Bernd's system no driver gets bound by the host
kernel.

Unbinding sets current_state to PCI_UNKNOWN in pci_device_remove(), so when
vfio-pci is subsequently bound to the device, pci_restore_state() is once
again called without invoking pci_save_state() first.

To robustly fix the issue, always update saved_config_space upon resource
assignment.

Reported-by: Bernd Schumacher <bernd@bschu.de>
Closes: https://lore.kernel.org/r/acfZrlP0Ua_5D3U4@eldamar.lan/
Reported-by: Alexandre N. <an.tech@mailo.com>
Closes: https://lore.kernel.org/r/dd3c3358-de0f-4a56-9c81-04aceaab4058@mailo.com/
Fixes: a2f1e22390ac ("PCI/ERR: Ensure error recoverability at all times")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Bernd Schumacher <bernd@bschu.de>
Tested-by: Alexandre N. <an.tech@mailo.com>
Cc: stable@vger.kernel.org # v6.12+
Link: https://patch.msgid.link/febc3f354e0c1f5a9f5b3ee9ffddaa44caccf651.1776268054.git.lukas@wunner.de
5 weeks agodrm/xe/multi_queue: Use QUEUE_TIMESTAMP as job timestamp for multi-queue
Umesh Nerlige Ramappa [Thu, 7 May 2026 16:20:27 +0000 (09:20 -0700)] 
drm/xe/multi_queue: Use QUEUE_TIMESTAMP as job timestamp for multi-queue

Each queue in a multi queue group has a dedicated timestamp counter. Use
this QUEUE TIMESTAMP register to capture the start timestamp for the
job.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-23-umesh.nerlige.ramappa@intel.com
5 weeks agodrm/xe/multi_queue: Add trace event for the multi queue timestamp
Umesh Nerlige Ramappa [Thu, 7 May 2026 16:20:26 +0000 (09:20 -0700)] 
drm/xe/multi_queue: Add trace event for the multi queue timestamp

Add a trace event for multi queue timestamp capture.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-22-umesh.nerlige.ramappa@intel.com
5 weeks agodrm/xe/multi_queue: Capture queue run times for active queues
Umesh Nerlige Ramappa [Thu, 7 May 2026 16:20:25 +0000 (09:20 -0700)] 
drm/xe/multi_queue: Capture queue run times for active queues

If a queue is currently active on the CS, query the QUEUE TIMESTAMP
register to get an up to date value of the runtime. To do so, ensure
that the primary queue is active and then check if the secondary queue
is executing on the CS.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-21-umesh.nerlige.ramappa@intel.com
5 weeks agodrm/xe/lrc: Refactor out engine id to hwe conversion
Umesh Nerlige Ramappa [Thu, 7 May 2026 16:20:24 +0000 (09:20 -0700)] 
drm/xe/lrc: Refactor out engine id to hwe conversion

We need to define more helpers that read engine ID specific register, so
move that logic outside of get_ctx_timestamp().

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-20-umesh.nerlige.ramappa@intel.com
5 weeks agodrm/xe/multi_queue: Add helpers to access CS QUEUE TIMESTAMP from lrc
Umesh Nerlige Ramappa [Thu, 7 May 2026 16:20:23 +0000 (09:20 -0700)] 
drm/xe/multi_queue: Add helpers to access CS QUEUE TIMESTAMP from lrc

In secondary queue LRCs, the QUEUE TIMESTAMP register is saved and
restored allowing us to view the individual queue run times. Add helpers
to read this value from the LRC.

BSpec: 73988

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-19-umesh.nerlige.ramappa@intel.com
5 weeks agodrm/xe/multi_queue: Store primary LRC and position info in LRC
Umesh Nerlige Ramappa [Thu, 7 May 2026 16:20:22 +0000 (09:20 -0700)] 
drm/xe/multi_queue: Store primary LRC and position info in LRC

For an LRC belonging to the secondary queue, in order to check if its
context group is active, we need to check the LRC of the primary queue.
In addition to that we want to compare the secondary queue position to
CSMQDEBUG register to check if the queue itself is active.

To do so, store primary LRC and position information in the LRC.

A note on references involved:

- In general the Queue takes a ref on its LRC.
- In addition, for multi-queue,
a. Primary Queue takes a ref for each Secondary LRC.
b. Each Secondary Queue takes a ref to the Primary Queue

In the current patch, each LRC in the queue group is storing a pointer
to primary LRC. Both primary and secondary LRCs are freed only when
primary queue is destroyed. At this time, all secondary queues are
already destroyed, so there is no one using secondary LRCs. We should be
good without taking any additional references.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-18-umesh.nerlige.ramappa@intel.com
5 weeks agodrm/xe/multi_queue: Refactor check for multi queue support for engine class
Umesh Nerlige Ramappa [Thu, 7 May 2026 16:20:21 +0000 (09:20 -0700)] 
drm/xe/multi_queue: Refactor check for multi queue support for engine class

xe exec queue code is using a check to see if a class of engines support
multi queue. This check is also needed by other code, so move it to
xe_gt and export it for others.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-17-umesh.nerlige.ramappa@intel.com
5 weeks agodrm/xe/lrc: Refactor xe_lrc_timestamp to simplify logic
Umesh Nerlige Ramappa [Thu, 7 May 2026 16:20:20 +0000 (09:20 -0700)] 
drm/xe/lrc: Refactor xe_lrc_timestamp to simplify logic

Use a context_active() helper and simplify the timestamp logic.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-16-umesh.nerlige.ramappa@intel.com
5 weeks agodrm/xe: Add timestamp_ms to LRC snapshot
Matthew Brost [Thu, 7 May 2026 16:20:19 +0000 (09:20 -0700)] 
drm/xe: Add timestamp_ms to LRC snapshot

Add a timestamp in milliseconds to the LRC snapshot to make it easier to
reason about how long the LRC has been running and the average duration
of each job.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-15-umesh.nerlige.ramappa@intel.com
5 weeks agodrm/xe/lrc: Use 64 bit ctx timestamp in the LRC snapshot
Umesh Nerlige Ramappa [Thu, 7 May 2026 16:20:18 +0000 (09:20 -0700)] 
drm/xe/lrc: Use 64 bit ctx timestamp in the LRC snapshot

Use the 64 bit value when available for the context timestamp in the LRC
snapshot.

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-14-umesh.nerlige.ramappa@intel.com
5 weeks agobpf: Free reuseport cBPF prog after RCU grace period.
Kuniyuki Iwashima [Sun, 26 Apr 2026 01:26:43 +0000 (01:26 +0000)] 
bpf: Free reuseport cBPF prog after RCU grace period.

Eulgyu Kim reported the splat below with a repro. [0]

The repro sets up a UDP reuseport group with a cBPF prog and
replaces it with a new one while another thread is sending
a UDP packet to the group.

The reuseport prog is freed by sk_reuseport_prog_free().
bpf_prog_put() is called for "e"BPF prog to destruct through
multiple stages while cBPF prog is freed immediately by
bpf_release_orig_filter() and bpf_prog_free().

If a reuseport prog is detached from the setsockopt() path
(reuseport_attach_prog() or reuseport_detach_prog()),
sk_reuseport_prog_free() is called without waiting for RCU
readers to complete, resulting in various bugs.

Let's defer freeing the reuseport cBPF prog after one RCU
grace period.

Note "e"BPF prog is safe as is unless the fast path starts
to touch fields destroyed in bpf_prog_put_deferred() and
__bpf_prog_put_noref().

[0]:
BUG: KASAN: vmalloc-out-of-bounds in reuseport_select_sock+0xedc/0x1220 net/core/sock_reuseport.c:596
Read of size 4 at addr ffffc9000051e004 by task slowme/10208
CPU: 6 UID: 1000 PID: 10208 Comm: slowme Not tainted 7.0.0-geb7ac95ff75e #32 PREEMPT(full)
Hardware name: QEMU Ubuntu 24.04 PC v2 (i440FX + PIIX, arch_caps fix, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Call Trace:
 <IRQ>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0xca/0x240 mm/kasan/report.c:482
 kasan_report+0x118/0x150 mm/kasan/report.c:595
 reuseport_select_sock+0xedc/0x1220 net/core/sock_reuseport.c:596
 udp4_lib_lookup2+0x3bc/0x950 net/ipv4/udp.c:495
 __udp4_lib_lookup+0x768/0xe20 net/ipv4/udp.c:723
 __udp4_lib_lookup_skb+0x297/0x390 net/ipv4/udp.c:752
 __udp4_lib_rcv+0x1312/0x2620 net/ipv4/udp.c:2752
 ip_protocol_deliver_rcu+0x282/0x440 net/ipv4/ip_input.c:207
 ip_local_deliver_finish+0x3bb/0x6f0 net/ipv4/ip_input.c:241
 NF_HOOK+0x30c/0x3a0 include/linux/netfilter.h:318
 NF_HOOK+0x30c/0x3a0 include/linux/netfilter.h:318
 __netif_receive_skb_one_core net/core/dev.c:6181 [inline]
 __netif_receive_skb net/core/dev.c:6294 [inline]
 process_backlog+0xaa4/0x1960 net/core/dev.c:6645
 __napi_poll+0xae/0x340 net/core/dev.c:7709
 napi_poll net/core/dev.c:7772 [inline]
 net_rx_action+0x5d7/0xf50 net/core/dev.c:7929
 handle_softirqs+0x22b/0x870 kernel/softirq.c:622
 do_softirq+0x76/0xd0 kernel/softirq.c:523
 </IRQ>
 <TASK>
 __local_bh_enable_ip+0xf8/0x130 kernel/softirq.c:450
 local_bh_enable include/linux/bottom_half.h:33 [inline]
 rcu_read_unlock_bh include/linux/rcupdate.h:924 [inline]
 __dev_queue_xmit+0x1dd7/0x3710 net/core/dev.c:4890
 neigh_output include/net/neighbour.h:556 [inline]
 ip_finish_output2+0xca9/0x1070 net/ipv4/ip_output.c:237
 NF_HOOK_COND include/linux/netfilter.h:307 [inline]
 ip_output+0x29f/0x450 net/ipv4/ip_output.c:438
 ip_send_skb+0x45/0xc0 net/ipv4/ip_output.c:1508
 udp_send_skb+0xb04/0x1510 net/ipv4/udp.c:1195
 udp_sendmsg+0x1a71/0x2350 net/ipv4/udp.c:1485
 sock_sendmsg_nosec net/socket.c:727 [inline]
 __sock_sendmsg net/socket.c:742 [inline]
 __sys_sendto+0x554/0x680 net/socket.c:2206
 __do_sys_sendto net/socket.c:2213 [inline]
 __se_sys_sendto net/socket.c:2209 [inline]
 __x64_sys_sendto+0xde/0x100 net/socket.c:2209
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x160/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x415a2d
Code: b3 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f6bc31e41e8 EFLAGS: 00000212 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007f6bc31e4cdc RCX: 0000000000415a2d
RDX: 0000000000000001 RSI: 00007f6bc31e421f RDI: 0000000000000003
RBP: 00007f6bc31e4240 R08: 00007f6bc31e4220 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000212 R12: 00007f6bc31e46c0
R13: ffffffffffffffb8 R14: 0000000000000000 R15: 00007ffc9b0d70b0
 </TASK>

Fixes: 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
Reported-by: Eulgyu Kim <eulgyukim@snu.ac.kr>
Reported-by: Taeyang Lee <0wn@theori.io>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20260426012647.3233119-1-kuniyu@google.com
5 weeks agodrm/xe/eustall: Return ENODEV from read if EU stall registers get reset
Harish Chegondi [Wed, 15 Apr 2026 18:27:17 +0000 (11:27 -0700)] 
drm/xe/eustall: Return ENODEV from read if EU stall registers get reset

If a reset (GT or engine) happens during EU stall data sampling, all the
EU stall registers can get reset to 0. This will result in EU stall data
buffers' read and write pointer register values to be out of sync with
the cached values. This will result in read() returning invalid data. To
prevent this, check the value of a EU stall base register. If it is zero,
it indicates a reset may have happened that wiped the register to zero.
If this happens, return ENODEV from read() upon which the user space
should disable and enable EU stall data sampling or close the fd and
open a new fd for a new EU stall data collection session.

This patch has been tested by running two IGT tests simultaneously
xe_eu_stall and xe_exec_reset.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Acked-by: Felix Degrood <felix.j.degrood@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/6935698d70b6c7c347ff8b138209d601030c2c9f.1776200094.git.harish.chegondi@intel.com
5 weeks agoMerge tag 'block-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe...
Linus Torvalds [Fri, 8 May 2026 20:18:13 +0000 (13:18 -0700)] 
Merge tag 'block-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block fixes from Jens Axboe:

 - Fix for ublk not doing an actual issue from the task_work fallback
   path. Any request hitting that should be canceled automatically

 - Fix for uring_cmd prep side handling, for the block side uring_cmd
   discard handling

 - Fix for missing validation of the io and physical block size shifts

 - Fix for a use-after-free in ublk's cancel command handling

* tag 'block-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  ublk: fix use-after-free in ublk_cancel_cmd()
  ublk: validate physical_bs_shift, io_min_shift and io_opt_shift
  block: only read from sqe on initial invocation of blkdev_uring_cmd()
  ublk: don't issue uring_cmd from fallback task work

5 weeks agoMerge tag 'io_uring-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 8 May 2026 20:12:48 +0000 (13:12 -0700)] 
Merge tag 'io_uring-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull io_uring fixes from Jens Axboe:

 - Ensure that the absolute timeouts for both the command side and the
   waiting side honor the callers time namespace

 - Ensure tracked NAPI entries are cleared at unregistration time, as
   the NAPI polling loop checks the list state rather than the general
   NAPI state. This can lead to NAPI polling even after unregistration
   has been done. If unregistered, all NAPI polling should be disabled

 - Fix for eventfd recursive invocation handling

* tag 'io_uring-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER
  io_uring/timeout: honour caller's time namespace for IORING_TIMEOUT_ABS
  io_uring/eventfd: reset deferred signal state
  io_uring/napi: clear tracked NAPI entries on unregister

5 weeks agodrm/xe/multi_queue: Refactor CGP_SYNC send path
Niranjana Vishwanathapura [Fri, 8 May 2026 19:44:29 +0000 (12:44 -0700)] 
drm/xe/multi_queue: Refactor CGP_SYNC send path

Factor the repeated CGP_SYNC action build-and-send sequence into a new
helper guc_exec_queue_send_cgp_sync(). Drop the redundant guc parameter
from __register_exec_queue_group() since it can be derived via
exec_queue_to_guc(q). Remove xe_guc_exec_queue_group_add() which is now
identical to the helper and replace its call site directly.

No functional change.

Assisted-by: GitHub Copilot:claude-sonnet-4.6
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260508194428.61819-6-niranjana.vishwanathapura@intel.com
5 weeks agodrm/xe/multi_queue: Remove redundant assignment in guc_exec_queue_run_job
Niranjana Vishwanathapura [Fri, 8 May 2026 19:44:28 +0000 (12:44 -0700)] 
drm/xe/multi_queue: Remove redundant assignment in guc_exec_queue_run_job

The 'killed_or_banned_or_wedged = true' assignment is redundant
since the variable is never read after that point.

Assisted-by: GitHub Copilot:claude-sonnet-4.6
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260508194428.61819-5-niranjana.vishwanathapura@intel.com
5 weeks agoRevert "ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn"
Mario Limonciello [Mon, 4 May 2026 23:01:37 +0000 (18:01 -0500)] 
Revert "ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn"

Some older systems don't support CPPC in the firmware and this just makes
noise for them when booting.  Drop back to debug.

This reverts commit 21fb59ab4b9767085f4fe1edbdbe3177fbb9ec97.

Fixes: 21fb59ab4b976 ("ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn")
Suggested-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Cc: All applicable <stable@vger.kernel.org>
Link: https://patch.msgid.link/20260504230141.484743-2-mario.limonciello@amd.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
5 weeks agoACPI: PMIC: Replace mutex_lock/unlock() with guard()/scoped_guard()
Maxwell Doose [Thu, 30 Apr 2026 01:35:06 +0000 (20:35 -0500)] 
ACPI: PMIC: Replace mutex_lock/unlock() with guard()/scoped_guard()

Replace mutex_lock() and unlock() calls with the newer guard() and
scoped_guard() macros to make the code more straightforward.

Signed-off-by: Maxwell Doose <m32285159@gmail.com>
[ rjw: Subject tweak, changelog edits ]
Link: https://patch.msgid.link/20260430013507.46259-1-m32285159@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
5 weeks agodrm/nouveau/gsp: Use kzalloc_flex() for r535 display funcs
Rosen Penev [Fri, 8 May 2026 05:20:56 +0000 (22:20 -0700)] 
drm/nouveau/gsp: Use kzalloc_flex() for r535 display funcs

struct nvkm_disp_func ends with the user flexible array member. Allocate
the r535 display function table with kzalloc_flex() instead of open-coding
the size calculation with sizeof().

Assisted-by: Codex:GPT-5.5
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
[dropped nothing-burger sentence from commit message]
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patch.msgid.link/20260508052056.1744665-1-rosenp@gmail.com
5 weeks agonouveau/vmm: use kzalloc_flex
Rosen Penev [Thu, 12 Mar 2026 19:55:29 +0000 (12:55 -0700)] 
nouveau/vmm: use kzalloc_flex

Use the proper macro do to these sizeof calculations.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
[fixed style warning from checkpatch]
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patch.msgid.link/20260312195529.13002-1-rosenp@gmail.com
5 weeks agoMerge branch 'bpf-tcp-fix-type-confusion-in-bpf-helper-functions'
Martin KaFai Lau [Fri, 8 May 2026 16:55:33 +0000 (09:55 -0700)] 
Merge branch 'bpf-tcp-fix-type-confusion-in-bpf-helper-functions'

Kuniyuki Iwashima says:

====================
bpf: tcp: Fix type confusion in bpf helper functions.

bpf_tcp_sock() only check if sk->sk_protocol is IPPROTO_TCP,
but RAW socket can bypass it:

  socket(AF_INET, SOCK_RAW, IPPROTO_TCP)

The same issues exist in other bpf functions:

  * bpf_mptcp_sock_from_subflow()
  * bpf_skc_to_tcp_sock()
  * bpf_skc_to_tcp6_sock()
  * sol_tcp_sockopt()

Patch 1 fixes bpf_tcp_sock() and Patch 2 adds a test for it.
Patch 3 ~ 6 fix the rest of the functions above.

Changes:
  v2:
    * Inverse if (err) to if (!err) in the selftest
    * Add patch 3 ~ 6

  v1: https://lore.kernel.org/bpf/20260430184405.1227386-1-kuniyu@google.com/
      https://lore.kernel.org/mptcp/20260430-mptcp-bpf-mptcp-sock-type-v1-1-d2ed5cda7da9@kernel.org/
====================

Link: https://patch.msgid.link/20260504210610.180150-1-kuniyu@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
5 weeks agobpf: tcp: Fix type confusion in sol_tcp_sockopt().
Kuniyuki Iwashima [Mon, 4 May 2026 21:04:53 +0000 (21:04 +0000)] 
bpf: tcp: Fix type confusion in sol_tcp_sockopt().

sol_tcp_sockopt() only checks if sk->sk_protocol is IPPROTO_TCP,
but RAW socket can bypass it:

  socket(AF_INET, SOCK_RAW, IPPROTO_TCP)

Let's use sk_is_tcp().

Note that initially sol_tcp_sockopt() checked sk->sk_prot->setsockopt.

Fixes: 2ab42c7b871f ("bpf: Check the protocol of a sock to agree the calls to bpf_setsockopt().")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260504210610.180150-7-kuniyu@google.com
5 weeks agobpf: tcp: Fix type confusion in bpf_skc_to_tcp6_sock().
Kuniyuki Iwashima [Mon, 4 May 2026 21:04:52 +0000 (21:04 +0000)] 
bpf: tcp: Fix type confusion in bpf_skc_to_tcp6_sock().

bpf_skc_to_tcp6_sock() only checks if sk->sk_protocol is IPPROTO_TCP
and sk->sk_family is AF_INET6, but RAW socket can bypass it:

  socket(AF_INET6, SOCK_RAW, IPPROTO_TCP)

Let's check sk->sk_type too.

Fixes: af7ec1383361 ("bpf: Add bpf_skc_to_tcp6_sock() helper")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260504210610.180150-6-kuniyu@google.com
5 weeks agobpf: tcp: Fix type confusion in bpf_skc_to_tcp_sock().
Kuniyuki Iwashima [Mon, 4 May 2026 21:04:51 +0000 (21:04 +0000)] 
bpf: tcp: Fix type confusion in bpf_skc_to_tcp_sock().

bpf_skc_to_tcp_sock() only checks if sk->sk_protocol is
IPPROTO_TCP, but RAW socket can bypass it:

  socket(AF_INET, SOCK_RAW, IPPROTO_TCP)

Let's use sk_is_tcp().

Fixes: 478cfbdf5f13 ("bpf: Add bpf_skc_to_{tcp, tcp_timewait, tcp_request}_sock() helpers")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260504210610.180150-5-kuniyu@google.com
5 weeks agomptcp: bpf: Fix type confusion in bpf_mptcp_sock_from_subflow()
Matthieu Baerts (NGI0) [Mon, 4 May 2026 21:04:50 +0000 (21:04 +0000)] 
mptcp: bpf: Fix type confusion in bpf_mptcp_sock_from_subflow()

bpf_mptcp_sock_from_subflow() only checks if sk->sk_protocol is
IPPROTO_TCP, but RAW socket can bypass it:

  socket(AF_INET, SOCK_RAW, IPPROTO_TCP)

In this case, it would NOT be valid to call sk_is_mptcp() which will
assume sk is a pointer to a struct tcp_sock, and wrongly checks for:
tcp_sk(sk)->is_mptcp.

Fixes: 3bc253c2e652 ("bpf: Add bpf_skc_to_mptcp_sock_proto")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260504210610.180150-4-kuniyu@google.com
5 weeks agoselftest: bpf: Add test for bpf_tcp_sock() and RAW socket.
Kuniyuki Iwashima [Mon, 4 May 2026 21:04:49 +0000 (21:04 +0000)] 
selftest: bpf: Add test for bpf_tcp_sock() and RAW socket.

Let's extend sockopt_sk.c to cover bpf_tcp_sock() for the
wrong socket type.

Before:
  # ./test_progs -t sockopt_sk
  [  151.948613] ==================================================================
  [  151.951376] BUG: KASAN: slab-out-of-bounds in sol_tcp_sockopt+0xc7/0x8e0
  [  151.954159] Read of size 8 at addr ffff88801083d760 by task test_progs/1259
  ...
  run_test:FAIL:getsetsockopt unexpected error: -1 (errno 0)
  #427     sockopt_sk:FAIL

After:
  #427     sockopt_sk:OK

While at it, missing free() is fixed up.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260504210610.180150-3-kuniyu@google.com
5 weeks agocrypto/ccp: Skip SNP_INIT if preparation fails
Tycho Andersen (AMD) [Wed, 29 Apr 2026 15:56:36 +0000 (09:56 -0600)] 
crypto/ccp: Skip SNP_INIT if preparation fails

If snp_prepare() fails, SNP_INIT will fail, so skip it and return early.

Note that this is not a change in initialization behavior: if SNP_INIT fails
even before this change, it will still return an error and
__sev_snp_init_locked() will fail initialization of other SEV modes.

Signed-off-by: Tycho Andersen (AMD) <tycho@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
Link: https://lore.kernel.org/20260429155636.540040-1-tycho@kernel.org
5 weeks agox86/sev: Do not initialize SNP if missing CPUs
Tycho Andersen (AMD) [Wed, 29 Apr 2026 15:56:35 +0000 (09:56 -0600)] 
x86/sev: Do not initialize SNP if missing CPUs

The SEV firmware checks that the SNP enable bit is set on each CPU during SNP
initialization, and will fail if not. If there are some CPUs offline, they
will not run the setup functions, so SNP initialization will always fail.

Skip the IPIs in this case and return an error so that the CCP driver can
skip the SNP_INIT that will fail. Also print the CPU masks in order to leave
breadcrumbs so people can figure out what happened.

  [ bp: Massage commit message. ]

Suggested-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Tycho Andersen (AMD) <tycho@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://20260429155636.540040-1-tycho@kernel.org
5 weeks agodrm/nouveau/kms/nvd9-: Remove unused header in crc.c
Lyude Paul [Fri, 1 May 2026 21:57:02 +0000 (17:57 -0400)] 
drm/nouveau/kms/nvd9-: Remove unused header in crc.c

No functional changes.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patch.msgid.link/20260501215703.820656-2-lyude@redhat.com
Reviewed-by: Dave Airlie <airlied@redhat.com>
5 weeks agonilfs2: fix backing_dev_info reference leak
Shuangpeng Bai [Thu, 7 May 2026 15:50:21 +0000 (11:50 -0400)] 
nilfs2: fix backing_dev_info reference leak

setup_bdev_super() already initializes sb->s_bdev and takes a
reference on the block device backing_dev_info when assigning sb->s_bdi.

nilfs_fill_super() takes another reference to the same
backing_dev_info and stores it in sb->s_bdi again. The extra
reference is not paired with a matching bdi_put(), since
generic_shutdown_super() releases sb->s_bdi only once.

Drop the redundant bdi_get() in nilfs_fill_super(). The single
reference taken by setup_bdev_super() is enough and is released
during superblock shutdown.

Fixes: c1e012ea9e83 ("nilfs2: use setup_bdev_super to de-duplicate the mount code")
Signed-off-by: Shuangpeng Bai <shuangpeng.kernel@gmail.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
5 weeks agoworkqueue: Fix wq->cpu_pwq leak in alloc_and_link_pwqs() WQ_UNBOUND path
Breno Leitao [Fri, 8 May 2026 16:22:03 +0000 (09:22 -0700)] 
workqueue: Fix wq->cpu_pwq leak in alloc_and_link_pwqs() WQ_UNBOUND path

For WQ_UNBOUND workqueues, alloc_and_link_pwqs() allocates wq->cpu_pwq
via alloc_percpu() and then calls apply_workqueue_attrs_locked(). On
failure it returns the error directly, bypassing the enomem: label
which holds the only free_percpu(wq->cpu_pwq) in this function.

The caller's error path kfree()s wq without touching wq->cpu_pwq,
leaking one percpu pointer table (nr_cpu_ids * sizeof(void *) bytes) per
failed call.

If kmemleak is enabled, we can see:

  unreferenced object (percpu) 0xc0fffa5b121048 (size 8):
    comm "insmod", pid 776, jiffies 4294682844
    backtrace (crc 0):
      pcpu_alloc_noprof+0x665/0xac0
      __alloc_workqueue+0x33f/0xa20
      alloc_workqueue_noprof+0x60/0x100

Route the error through the existing enomem: cleanup and any error
before this one.

Cc: stable@kernel.org
Fixes: 636b927eba5b ("workqueue: Make unbound workqueues to use per-cpu pool_workqueues")
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
5 weeks agoworkqueue: Release PENDING in __queue_work() drain/destroy reject path
Breno Leitao [Thu, 7 May 2026 11:04:46 +0000 (04:04 -0700)] 
workqueue: Release PENDING in __queue_work() drain/destroy reject path

The caller of __queue_work() owns WORK_STRUCT_PENDING, won via
test_and_set_bit() in queue_work_on()/__queue_delayed_work(). The
state machine documented above __queue_work() requires that owner
to either hand the token to a pwq (insert_work() -> set_work_pwq()),
hand it to a timer, or release it via set_work_pool_and_clear_pending().
try_to_grab_pending() relies on this: when it observes
"PENDING && off-queue" it busy-loops, trusting the current owner to
make progress.

The (__WQ_DESTROYING | __WQ_DRAINING) early-return path violates that
contract. It WARN_ONCE()s and bare-returns, leaving work->data with
PENDING set, WORK_STRUCT_PWQ clear, and work->entry empty.

The path is reachable without explicit API abuse: queue_delayed_work()
arms a timer with PENDING set; if drain_workqueue() runs while the
timer is still pending, delayed_work_timer_fn() -> __queue_work() in
softirq context hits the WARN, current is not a wq worker so
is_chained_work() is false, and the work is silently dropped with
PENDING leaked.

Mirror what clear_pending_if_disabled() already does on its analogous
reject path: unpack the off-queue data and call
set_work_pool_and_clear_pending() to release the token before
returning.

I was able to reproduce this by queueing several slow works on
a max_active=1 wq, arm a delayed_work whose timer fires while
drain_workqueue() is blocked, then call cancel_delayed_work_sync().
Without this patch the cancel livelocks at 100% CPU; with it the cancel
returns immediately.

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
5 weeks agoMerge tag 'v7.1-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Fri, 8 May 2026 17:24:35 +0000 (10:24 -0700)] 
Merge tag 'v7.1-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - Fix for two ACL issues (security fix to validate dacloffset better
   and chmod fix)

 - Fix out of bounds reads (in check_wsl_eas and smb2_check_msg for
   symlinks)

 - Two Kerberos fixes including an important one when AES-256 encryption
   chosen

 - Fix open_cached_dir problem when directory leases disabled

* tag 'v7.1-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: validate dacloffset before building DACL pointers
  smb/client: fix out-of-bounds read in smb2_compound_op()
  smb/client: fix out-of-bounds read in symlink_data()
  smb: client: Zero-pad short GSS session keys per MS-SMB2
  smb: client: Use FullSessionKey for AES-256 encryption key derivation
  smb: client: use kzalloc to zero-initialize security descriptor buffer
  cifs: abort open_cached_dir if we don't request leases

5 weeks agoMerge tag 'spi-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Linus Torvalds [Fri, 8 May 2026 17:14:51 +0000 (10:14 -0700)] 
Merge tag 'spi-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
 "There's two main series here, fixing issues that came up in the
  Microchip QSPI and Freescale i.MX drivers. Both of those could result
  in some quite noticable issues if they were encountered in production.
  We also have one minor documentation fix in the ch341 driver"

* tag 'spi-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: ch341: correct company name in MODULE_DESCRIPTION
  spi: microchip-core-qspi: remove some inline markings
  spi: microchip-core-qspi: don't attempt to transmit during emulated read-only dual/quad operations
  spi: microchip-core-qspi: control built-in cs manually
  spi: imx: Propagate prepare_transfer() error from spi_imx_setupxfer()
  spi: imx: Fix UAF on package-1 prepare failure in spi_imx_dma_data_prepare()
  spi: imx: Fix precedence bug in spi_imx_dma_max_wml_find()

5 weeks agoMerge tag 'regulator-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 8 May 2026 17:07:59 +0000 (10:07 -0700)] 
Merge tag 'regulator-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fix from Mark Brown:
 "A straightforward fix for an incorrect description of one of the
  regulators on the Qualcomm PMH0101"

* tag 'regulator-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: qcom-rpmh: Fix index for pmh0101 ldo16

5 weeks agobpf: tcp: Fix type confusion in bpf_tcp_sock().
Kuniyuki Iwashima [Mon, 4 May 2026 21:04:48 +0000 (21:04 +0000)] 
bpf: tcp: Fix type confusion in bpf_tcp_sock().

bpf_tcp_sock() only checks if sk->sk_protocol is IPPROTO_TCP,
but RAW socket can bypass it:

  socket(AF_INET, SOCK_RAW, IPPROTO_TCP)

Calling bpf_setsockopt() in SOCKOPT prog triggers out-of-bounds
access to another slab object. [0]

Let's use sk_is_tcp().

[0]:
BUG: KASAN: slab-out-of-bounds in sol_tcp_sockopt (net/core/filter.c:5519)
Read of size 8 at addr ffff88801083d760 by task test_progs/1259

CPU: 1 UID: 0 PID: 1259 Comm: test_progs Tainted: G           OE       7.0.0-11175-gb5c111f4967b #1 PREEMPT(full)
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl (lib/dump_stack.c:94 lib/dump_stack.c:120)
 print_report (mm/kasan/report.c:378 mm/kasan/report.c:482)
 kasan_report (mm/kasan/report.c:595)
 sol_tcp_sockopt (net/core/filter.c:5519)
 __bpf_getsockopt (net/core/filter.c:5633)
 bpf_sk_getsockopt (net/core/filter.c:5654)
 bpf_prog_629ba00a1601e9f2__setsockopt+0x86/0x22c
 __cgroup_bpf_run_filter_setsockopt (./include/linux/bpf.h:1402 ./include/linux/filter.h:722 ./include/linux/filter.h:729 kernel/bpf/cgroup.c:81 kernel/bpf/cgroup.c:2026)
 do_sock_setsockopt (net/socket.c:2363)
 __x64_sys_setsockopt (net/socket.c:2406)
 do_syscall_64 (arch/x86/entry/syscall_64.c:63)
 entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)
RIP: 0033:0x7f85f82fe7de
Code: 55 48 63 c9 48 63 ff 45 89 c9 48 89 e5 48 83 ec 08 6a 2c e8 34 69 f7 ff c9 c3 66 90 f3 0f 1e fa 49 89 ca b8 36 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 0a c3 66 0f 1f 84 00 00 00 00 00 48 8b 15 e1
RSP: 002b:00007ffe59dcecd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f85f82fe7de
RDX: 000000000000001c RSI: 0000000000000006 RDI: 000000000000000d
RBP: 00007ffe59dcef20 R08: 000000000000003c R09: 0000000000000000
R10: 00007ffe59dcef00 R11: 0000000000000202 R12: 00007ffe59dcf268
R13: 0000000000000003 R14: 00007f85f9da5000 R15: 000055b2f3201400
 </TASK>

The buggy address belongs to the object at ffff88801083d280
 which belongs to the cache RAW of size 1792
The buggy address is located 1248 bytes inside of
 allocated 1792-byte region [ffff88801083d280ffff88801083d980)

Fixes: 655a51e536c0 ("bpf: Add struct bpf_tcp_sock and BPF_FUNC_tcp_sock")
Reported-by: Damiano Melotti <melotti@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/20260504210610.180150-2-kuniyu@google.com
5 weeks agodrm/xe: Make decision to use Xe2-style blitter instructions a feature flag
Matt Roper [Thu, 7 May 2026 21:00:15 +0000 (14:00 -0700)] 
drm/xe: Make decision to use Xe2-style blitter instructions a feature flag

The blitter engines' MEM_COPY and MEM_SET instructions were added as
part of the same hardware change that introduced service copy engines
(i.e., BCS1-BCS8) which is why the driver checks for service copy engine
presence when deciding whether to use these instructions or the older
XY_* instructions.  However when making this decision the driver should
consider which engines are part of the hardware architecture, not which
engines are present/usable on the current device.  For graphics IP
versions that architecturally include service copy engines (i.e.,
everything Xe2 and later, plus PVC's Xe_HPC) we should use MEM_SET and
MEM_COPY even in if all of the service copy engines wind up getting
fused off.  I.e., we need to decide based on whether the platform's
graphics descriptor contains these engines, rather than whether the
usable engine mask contains them.  This logic got broken when
gt->info.__engine_mask was removed, although in practice that mistake
has been harmless so far because there haven't been any hardware
SKUs that fuse off all of the service copy engines yet.

Replace the incorrect has_service_copy_support() function with a GT
feature flag that tracks more accurately whether the new blitter
instructions are usable.  In addition to fixing incorrect logic if all
service copies are fused off, the flag also makes it more obvious what
the calling code is trying to do; previously it wasn't terribly obvious
why "has service copy engines" was being used as the condition for using
different instructions on all copy engine types.

The new feature flag is named 'has_xe2_blt_instructions' because we
expect this flag to be set for all Xe2 and later platforms (i.e.,
everything officially supported by the Xe driver).  Technically there's
also one Xe1-era platform (PVC) that supports these engines/instructions
and will set this flag, but this still seems to be the most clear and
understandable name for the flag.

Fixes: 61549a2ee594 ("drm/xe: Drop __engine_mask")
Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patch.msgid.link/20260507-xe2_copy-v1-1-26506381b821@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
5 weeks agoDocumentation: core-api/cpu_hotplug: Remove stale cpu0_hotplug docs
Chao Gao [Thu, 7 May 2026 13:47:31 +0000 (06:47 -0700)] 
Documentation: core-api/cpu_hotplug: Remove stale cpu0_hotplug docs

Commit e59e74dc48a3 ("x86/topology: Remove CPU0 hotplug option")
removed the 'cpu0_hotplug' option, but its documentation remained in
cpu_hotplug.rst. Remove the stale entry.

Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20260507134732.254617-1-chao.gao@intel.com
5 weeks agoselftests/cgroup: Fix incorrect variable check in online_cpus()
Hongfu Li [Fri, 8 May 2026 03:34:53 +0000 (11:34 +0800)] 
selftests/cgroup: Fix incorrect variable check in online_cpus()

"OFFLINE_CPUS" is a literal string that is always non-empty. It should
be "$OFFLINE_CPUS" to check the variable's value instead.

Signed-off-by: Hongfu Li <lihongfu@kylinos.cn>
Signed-off-by: Tejun Heo <tj@kernel.org>
5 weeks agosched_ext: Use offsetofend on both sides of the ops_cid layout assert
Tejun Heo [Fri, 8 May 2026 15:50:08 +0000 (05:50 -1000)] 
sched_ext: Use offsetofend on both sides of the ops_cid layout assert

sizeof() includes trailing struct pad, offsetofend() doesn't. On
32-bit PPC, sched_ext_ops_cid tail-pads 4 bytes past @priv and the
assert trips. Use offsetofend() on both sides.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202605081637.DbH4SZ1E-lkp@intel.com/
Fixes: 7e655ed7b953 ("sched_ext: Add bpf_sched_ext_ops_cid struct_ops type")
Signed-off-by: Tejun Heo <tj@kernel.org>
5 weeks agoselftests/sched_ext: Fix select_cpu_dfl link leak on early return
Cheng-Yang Chou [Fri, 8 May 2026 06:55:12 +0000 (14:55 +0800)] 
selftests/sched_ext: Fix select_cpu_dfl link leak on early return

If run() exits early via SCX_EQ/SCX_ASSERT (which calls return
directly), bpf_link__destroy() is never reached and the BPF
scheduler stays loaded. All subsequent tests then fail to attach
because SCX is not in the DISABLED state.

Move bpf_link into a context struct so cleanup() always destroys
it, regardless of how run() exits. Also skip waitpid() for children
where fork() returned -1, avoiding waitpid(-1,...) accidentally
reaping an unrelated child and triggering the early return path.

Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
5 weeks agoMerge tag 'drm-fixes-2026-05-08-1' of https://gitlab.freedesktop.org/drm/kernel
Linus Torvalds [Fri, 8 May 2026 15:23:06 +0000 (08:23 -0700)] 
Merge tag 'drm-fixes-2026-05-08-1' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "Weekly fixes, lots of them but all pretty small, amdgpu and xe are the
  usual but then a large amount of fixes all over.

  core:
   - fix race condition in handle change ioctl

  fb-helper:
   - fix clipping

  rust:
   - fix unsound initialization
   - fix GEM state cleanup
   - fix wrong ARef import

  ttm:
   - update GPU MM stats on pool shrinking

  i915:
   - Re-enable ccs modifiers on dg2

  nova:
   - fix mailing list

  xe:
   - Add NULL check for media_gt in intel_hdcp_gsc_check_status
   - Fix EAGAIN sign in pf_migration_consume
   - Fix MMIO access using PF view instead of VF view during migration
   - Exclude indirect ring state page from ADS engine state size

  amdgpu:
   - GFX9 fixes
   - Hawaii SMU fixes
   - SDMA4 fix
   - GART fix
   - Userq fixes

  amdkfd:
   - GPUVM TLB flush fix
   - Hotplug fix

  radeon:
   - Hawaii SMU fixes

  bochs:
   - fix managed cleanup

  bridge:
   - tda998x: fix sparse warnings on type correctness

  etnaviv:
   - schedule armed jobs

  exynos:
   - managed bridge cleanup

  ivpu:
   - disallow reexport of GEM buffer objects

  noveau:
   - revert support for GA100

  panel:
   - boe-tv101wum-nl16: use correct MIPI_DSI mode
   - feyjang-fy07024di26a30d: fix error reporting
   - himax-hx83102: use correct MIPI_DSI mode
   - himax-hx83121a: fix error checks
   - himax-hx83121a: select DRM_DISPLAY_DSC_HELPER

  qaic:
   - fix RAS message handling

  qxl:
   - clean up polling

  sti:
   - managed bridge cleanup

* tag 'drm-fixes-2026-05-08-1' of https://gitlab.freedesktop.org/drm/kernel: (37 commits)
  drm: Set old handle to NULL before prime swap in change_handle
  drm/bochs: Drop manual put on probe error path
  drm/xe/guc: Exclude indirect ring state page from ADS engine state size
  drm/xe/pf: Fix MMIO access using PF view instead of VF view during migration
  drm/xe/pf: Fix EAGAIN sign in pf_migration_consume()
  drm/xe/hdcp: Add NULL check for media_gt in intel_hdcp_gsc_check_status()
  drm/exynos: remove bridge when component_add fails
  drm/amdgpu: nuke amdgpu_userq_fence_slab v2
  drm/amdgpu/userq: fix access to stale wptr mapping
  drm/amdkfd: Check if there are kfd porcesses using adev by kfd_processes_count
  drm/amdgpu: zero-initialize GART table on allocation
  drm/amdgpu/sdma4: replace BUG_ON with WARN_ON in fence emission
  drm/radeon: add missing revision check for CI
  drm/amdgpu/pm: align Hawaii mclk workaround with radeon
  drm/amdgpu/pm: add missing revision check for CI
  drm/amdgpu/gfx9: drop unnecessary 64-bit fence flag check in KIQ
  drm/amdkfd: Make all TLB-flushes heavy-weight
  drm/panel: himax-hx83102: restore MODE_LPM after sending disable cmds
  drm/panel: boe-tv101wum-nl6: restore MODE_LPM after sending disable cmds
  drm/panel: feiyang-fy07024di26a30d: return display-on error
  ...

5 weeks agodrm/ttm: Fix ttm_bo_swapout() infinite LRU walk on swapout failure
Thomas Hellström [Tue, 28 Apr 2026 09:44:42 +0000 (11:44 +0200)] 
drm/ttm: Fix ttm_bo_swapout() infinite LRU walk on swapout failure

When ttm_tt_swapout() fails, the current code calls
ttm_resource_add_bulk_move() followed by ttm_resource_move_to_lru_tail()
to restore the resource's bulk_move membership.

However, ttm_resource_move_to_lru_tail() places the resource at the tail
of the LRU list which, relative to the walk cursor's hitch node (placed
immediately after the resource when it was yielded), puts the resource
*in front of the* the hitch. The next list_for_each_entry_continue() from
the hitch finds the same resource again, causing an infinite loop.

Fix by deferring del_bulk_move to the success path only.

On the success path, TTM_TT_FLAG_SWAPPED has just been set by
ttm_tt_swapout() but the resource is still tracked in the bulk_move range,
so ttm_resource_del_bulk_move()'s !ttm_resource_unevictable() guard would
incorrectly skip the removal. Introduce
ttm_resource_del_bulk_move_unevictable() which bypasses that guard.

Reported-by: Jatin Kataria <jkataria@netflix.com>
Fixes: fc5d96670eb2 ("drm/ttm: Move swapped objects off the manager's LRU list")
Cc: Christian König <christian.koenig@amd.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <dri-devel@lists.freedesktop.org>
Cc: <stable@vger.kernel.org> # v6.13+
Assisted-by: GitHub_Copilot:claude-sonnet-4.6
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Boqun Feng <boqun@kernel.org>
Link: https://patch.msgid.link/20260428094442.16985-1-thomas.hellstrom@linux.intel.com
5 weeks agoMerge tag 'usb-serial-7.1-rc3' of ssh://gitolite.kernel.org/pub/scm/linux/kernel...
Greg Kroah-Hartman [Fri, 8 May 2026 15:18:43 +0000 (17:18 +0200)] 
Merge tag 'usb-serial-7.1-rc3' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus

Johan writes:

USB serial device ids for 7.1-rc3

Here are some new modem device ids.

This one has been in linux-next with no reported issues.

* tag 'usb-serial-7.1-rc3' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
  USB: serial: option: add Telit Cinterion LE910Cx compositions

5 weeks agoMerge tag 'iommu-fixes-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 8 May 2026 15:16:07 +0000 (08:16 -0700)] 
Merge tag 'iommu-fixes-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu fixes from Joerg Roedel:
 "Core:
   - Cache-flushing fix for non-x86 platforms

  AMD-Vi:
   - Security fix when SEV-SNP is enabled
   - Operator precedence fix in DTE setting"

* tag 'iommu-fixes-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
  iommu/amd: Fix precedence order in set_dte_passthrough()
  iommu/pages: Fix iommu_pages_flush_incoherent() for non-x86
  iommu/amd: Use maximum PPR log buffer size when SNP is enabled on Family 0x19
  iommu/amd: Use maximum Event log buffer size when SNP is enabled on Family 0x19

5 weeks agosched_ext: Use IRQ_WORK_INIT_HARD() to initialize sch->disable_irq_work
Zqiang [Fri, 8 May 2026 11:50:45 +0000 (19:50 +0800)] 
sched_ext: Use IRQ_WORK_INIT_HARD() to initialize sch->disable_irq_work

For built with PREEMPT_RT kernels, the scx_disable_irq_workfn() is
called from per-cpu irq_work kthreads context, this means that
when call the scx_dump_state() in the scx_disable_irq_workfn() to
output current->comm/pid, it always output current irq_work kthread's
comm/pid. this commit therefore use the IRQ_WORK_INIT_HARD() to
initialize sch->disable_irq_work to make scx_disable_irq_workfn() is
called from hardirq context.

Fixes: f4a6c506d118 ("sched_ext: Always bounce scx_disable() through irq_work")
Signed-off-by: Zqiang <qiang.zhang@linux.dev>
Signed-off-by: Tejun Heo <tj@kernel.org>
5 weeks agoarm64/entry: Fix arm64-specific rseq brokenness
Mark Rutland [Fri, 8 May 2026 14:20:23 +0000 (15:20 +0100)] 
arm64/entry: Fix arm64-specific rseq brokenness

Mathias Stearn reports that since v6.19, there are two big issues
affecting rseq:

(1) On arm64 specifically, rseq critical sections aren't aborted when
    they should be.

(2) The 'cpu_id_start' field is no longer written by the kernel in all
    cases it used to be, including some cases where TCMalloc depends on
    the kernel clobbering the field.

This patch fixes issue #1. This patch DOES NOT fix issue #2, which will
need to be addressed by other patches.

The arm64-specific brokenness is a result of commits:

  2fc0e4b4126c ("rseq: Record interrupt from user space")
  39a167560a61 ("rseq: Optimize event setting")

The first commit failed to add a call to rseq_note_user_irq_entry() on
arm64. Thus arm64 never sets rseq_event::user_irq to record that it may
be necessary to abort an active rseq critical section upon return to
userspace. On its own, this commit had no functional impact as the value
of rseq_event::user_irq was not consumed.

The second commit relied upon rseq_event::user_irq to determine whether
or not to bother to perform rseq work when returning to userspace. As
rseq_event::user_irq wasn't set on arm64, this work would be skipped,
and consequently an active rseq critical section would not be aborted.

Fix this by giving arm64 syscall-specific entry/exit paths, and
performing the relevant logic in syscall and non-syscall paths,
including calling rseq_note_user_irq_entry() for non-syscall entry.

Currently arm64 cannot use syscall_enter_from_user_mode(),
syscall_exit_to_user_mode(), and irqentry_exit_to_user_mode(), due to
ordering constraints with exception masking, and risk of ABI breakage
for syscall tracing/audit/etc. For the moment the entry/exit logic is
left as arm64-specific, directly using enter_from_user_mode() and
exit_to_user_mode(), but mirroring the generic code.

I intend to follow up with refactoring/cleanup, as we did for kernel
mode entry paths in commit:

  041aa7a85390 ("entry: Split preemption from irqentry_exit_to_kernel_mode()")

... which will allow arm64 to use the GENERIC_IRQ_ENTRY functions directly.

Fixes: 39a167560a61 ("rseq: Optimize event setting")
Reported-by: Mathias Stearn <mathias@mongodb.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/regressions/CAHnCjA25b+nO2n5CeifknSKHssJpPrjnf+dtr7UgzRw4Zgu=oA@mail.gmail.com/
Link: https://patch.msgid.link/20260508142023.3268622-1-mark.rutland@arm.com
5 weeks agox86/kexec: Push kjump return address even for non-kjump kexec
David Woodhouse [Tue, 28 Apr 2026 20:59:52 +0000 (21:59 +0100)] 
x86/kexec: Push kjump return address even for non-kjump kexec

The version of purgatory code shipped by kexec-tools attempts to look above
the top of its stack to find a return address for a kjump, even in a non-kjump
kexec.

After the commit in Fixes: the word above the stack might not be there,
leading to a fault (which is at least now caught by my exception-handling code
in kexec).

That commit fixed things for the actual kjump path, but no longer
"gratuitously" pushes the unused return address to the stack in the non-kjump
path. Put that *back* in the non-kjump path, to prevent purgatory from
crashing when trying to access it.

Fixes: 2cacf7f23a02 ("x86/kexec: Fix stack and handling of re-entry point for ::preserve_context")
Reported-by: Rohan Kakulawaram <rohanka@google.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Rohan Kakulawaram <rohanka@google.com>
Cc: <stable@kernel.org>
Link: https://patch.msgid.link/32d627134143ffd957891cb697138e839c623211.camel@infradead.org
5 weeks agontfs: avoid leaking uninitialised bytes in new security descriptors
DaeMyung Kang [Thu, 7 May 2026 15:48:52 +0000 (00:48 +0900)] 
ntfs: avoid leaking uninitialised bytes in new security descriptors

ntfs_sd_add_everyone() builds the on-disk security descriptor for a
newly created file by kmalloc()'ing a buffer and then partially
filling it in:

sd = kmalloc(sd_len, GFP_NOFS);
...
sd->revision = 1;
sd->control = SE_DACL_PRESENT | SE_SELF_RELATIVE;
...

The buffer is then handed to ntfs_attr_add() and persisted as the
SECURITY_DESCRIPTOR attribute of the new MFT record.  The descriptor
covers a relative security descriptor header, two SIDs (owner and
group), an ACL header, and a single ACE, but several fields inside
those structures are never written before the buffer is committed
to disk:

  - struct security_descriptor_relative
        @alignment (1 byte)
        @sacl (4 bytes; SE_SACL_PRESENT is not set
                                 but the offset still reaches disk)

  - struct ntfs_sid (3 instances: owner, group, ACE.sid)
        identifier_authority.value[0..4] (5 bytes per SID, 15 total
                                          - only value[5] is set)

  - struct ntfs_acl
        @alignment1 (1 byte)
        @alignment2 (2 bytes)

That is 23 bytes of uninitialised slab memory persisted to disk for
every new file or directory the legacy ntfs driver creates.  The
"+ 4" trailing accounting in sd_len holds ace->sid.sub_authority[0],
which the existing code does explicitly write to zero, so it is
not part of the leak.

Anything later able to read the SECURITY_DESCRIPTOR attribute - the
same NTFS volume mounted on Windows or by another NTFS reader, an
offline forensics tool, an unprivileged user that ends up with read
access to the volume - can recover those bytes.  The leak persists
for the lifetime of the file on disk, not just the lifetime of the
kernel that wrote it.

Switch the allocation to kzalloc() so every byte the on-disk
descriptor covers is zero before the explicit initialisations run.
While there, replace the bare "return -1" allocation-failure path
with a proper -ENOMEM so the error reaches userspace as a meaningful
errno instead of an unrelated -EPERM.

Found by inspection while auditing fs/ntfs new-inode paths.

Fixes: af0db57d4293 ("ntfs: update inode operations")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
5 weeks agontfs: fix out-of-bounds write in ntfs_index_walk_down()
DaeMyung Kang [Thu, 7 May 2026 02:18:31 +0000 (11:18 +0900)] 
ntfs: fix out-of-bounds write in ntfs_index_walk_down()

ntfs_index_walk_down() used to update the index traversal depth
directly before writing parent_pos[] and parent_vcn[]. A malformed
directory index with too many child-node levels can therefore advance
pindex past MAX_PARENT_VCN and write past the fixed arrays in struct
ntfs_index_context, corrupting context state used by later index
traversal.

Use ntfs_icx_parent_inc() for walk-down transitions so the existing
depth limit is enforced before the arrays are updated. Make the helper
check the limit before incrementing pindex so failed callers do not
leave the context at an out-of-range depth.

This is reachable by iterating a crafted NTFS directory after the volume
has been mounted, including read-only mounts. The reproducer uses
getdents64() on an index root that points to an excessively deep chain
of child index blocks.

A crafted directory index with a chain of child-node entries reproduced
UBSAN array-index-out-of-bounds reports in ntfs_index_walk_down() and
subsequent KASAN reports in ntfs_index_walk_up(). With this change, the
same image is rejected with "Index is over 32 level deep" and no KASAN
or UBSAN report is emitted.

Fixes: 0a8ac0c1fa0b ("ntfs: update directory operations")
Suggested-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>