]> git.ipfire.org Git - thirdparty/linux.git/log
thirdparty/linux.git
3 weeks agoMIPS: ip22-gio: switch to dynamic root device
Johan Hovold [Fri, 24 Apr 2026 10:28:48 +0000 (12:28 +0200)] 
MIPS: ip22-gio: switch to dynamic root device

Driver core expects devices to be dynamically allocated and will, for
example, complain loudly when no release function has been provided.

Use root_device_register() to allocate and register the root device
instead of open coding using a static device.

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
3 weeks agoMIPS: ip22-gio: fix device reference leak in probe
Johan Hovold [Fri, 24 Apr 2026 10:28:47 +0000 (12:28 +0200)] 
MIPS: ip22-gio: fix device reference leak in probe

The gio probe function needlessly takes a device reference which is
never released and therefore prevents unbound gio devices from being
freed.

Fixes: e84de0c61905 ("MIPS: GIO bus support for SGI IP22/28")
Cc: stable@vger.kernel.org # 3.3
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
3 weeks agoMIPS: ip22-gio: fix gio device memory leak
Johan Hovold [Fri, 24 Apr 2026 10:28:46 +0000 (12:28 +0200)] 
MIPS: ip22-gio: fix gio device memory leak

The gio device release callback was never wired up so gio devices are
not freed when the last reference is dropped.

Fixes: e84de0c61905 ("MIPS: GIO bus support for SGI IP22/28")
Cc: stable@vger.kernel.org # 3.3
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
3 weeks agoMIPS: ip22-gio: fix kfree() of static object
Johan Hovold [Fri, 24 Apr 2026 10:28:45 +0000 (12:28 +0200)] 
MIPS: ip22-gio: fix kfree() of static object

The gio bus root device is a statically allocated object which must not
be freed by kfree() on failure to register the device or bus.

Fixes: 82242d28ff8b ("MIPS: IP22: Add missing put_device call")
Cc: stable@vger.kernel.org # 3.17
Cc: Levente Kurusa <levex@linux.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
3 weeks agoALSA: doc: usb-audio: Add doc for QUIRK_FLAG_IFB_SILENCE_ON_EMPTY
Rong Zhang [Tue, 26 May 2026 17:49:23 +0000 (01:49 +0800)] 
ALSA: doc: usb-audio: Add doc for QUIRK_FLAG_IFB_SILENCE_ON_EMPTY

QUIRK_FLAG_IFB_SILENCE_ON_EMPTY was introduced into usb-audio before
without appropriate documentation, so add it.

Fixes: a23812004228 ("ALSA: usb-audio: add IFB_SILENCE_ON_EMPTY quirk for Behringer Flow 8")
Signed-off-by: Rong Zhang <i@rong.moe>
Link: https://patch.msgid.link/20260527-uac-quirk-get-cur-vol-v1-1-e9362b712e5e@rong.moe
Signed-off-by: Takashi Iwai <tiwai@suse.de>
3 weeks agoALSA: usb-audio: Add quirk for Novation Mininova
Uwe Küchler [Tue, 26 May 2026 16:20:33 +0000 (18:20 +0200)] 
ALSA: usb-audio: Add quirk for Novation Mininova

Add a device-specific quirk for the Novation Mininova synthesizer
(USB ID 1235:001e) to enable proper recognition and functionality
as a MIDI device.

Signed-off-by: Uwe Küchler <uwe@kuechler.org>
Link: https://patch.msgid.link/20260526162033.7513-1-uwe@kuechler.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
3 weeks agoALSA: seq: oss: Fix UAF at handling events with embedded SysEx data
Takashi Iwai [Tue, 26 May 2026 15:28:41 +0000 (17:28 +0200)] 
ALSA: seq: oss: Fix UAF at handling events with embedded SysEx data

The OSS sequencer processes the input MIDI bytes into a sequencer
event to be dispatched later (in snd_seq_oss_midi_putc() called from
snd_seq_oss_process_event()).  When it's a SysEx data, the event
record contains data.ext.ptr pointer to the original SysEx bytes, and
the referred data is copied into the pool afterwards at dispatching.
The problem is that, if the sequencer port gets closed concurrently
before the dispatch, the OSS sequencer core also releases the
resources (in snd_seq_oss_midi_check_exit_port()), while the pending
event may hold a stale pointer, eventually leading to a UAF at a later
dispatch.

Fortunately, there is already a refcounting mechanism (snd_use_lock_t)
for the OSS MIDI device access, and for addressing the issue above, we
just need to extend the refcount until the event gets dispatched.

This patch extends snd_seq_oss_process_event() to give back the
refcount object, which is in turn released after calling the sequencer
dispatcher with the given event in the caller side.

According to the original report, KASAN report as below:

KASAN slab-use-after-free in snd_seq_event_dup+0x40c/0x470
RIP: 0033:0x7f2cb66a6340
Read of size 6
Call trace:
  dump_stack_lvl+0x73/0xb0 (?:?)
  print_report+0xd1/0x650 (?:?)
  srso_alias_return_thunk+0x5/0xfbef5 (?:?)
  __virt_addr_valid+0x1a7/0x340 (?:?)
  kasan_complete_mode_report_info+0x64/0x200 (?:?)
  kasan_report+0xf7/0x130 (?:?)
  snd_seq_event_dup+0x40c/0x470 (?:?)
  kasan_check_range+0x10c/0x1c0 (?:?)
  __asan_memcpy+0x27/0x70 (?:?)
  snd_seq_event_dup+0x9/0x470 (?:?)
  snd_seq_client_enqueue_event+0x139/0x240 (?:?)
  _raw_spin_unlock_irqrestore+0x4b/0x60 (?:?)
  snd_seq_kernel_client_enqueue+0x102/0x120 (?:?)
  snd_seq_oss_write+0x416/0x4e0 (?:?)
  apparmor_file_permission+0x20/0x30 (?:?)
  odev_write+0x3b/0x60 (?:?)
  vfs_write+0x1ce/0x850 (?:?)
  lock_release+0xc8/0x2a0 (?:?)
  __kasan_check_write+0x18/0x20 (?:?)
  __mutex_unlock_slowpath+0x129/0x510 (?:?)
  ksys_write+0xe1/0x180 (?:?)
  mutex_unlock+0x16/0x20 (?:?)
  odev_ioctl+0x65/0xc0 (?:?)
  __x64_sys_write+0x46/0x60 (?:?)
  x64_sys_call+0x7d/0x20d0 (?:?)
  do_syscall_64+0xc1/0x360 (arch/x86/entry/syscall_64.c:87)
  entry_SYSCALL_64_after_hwframe+0x77/0x7f (?:?)

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-and-tested-by: Zhang Cen <rollkingzzc@gmail.com>
Closes: https://lore.kernel.org/20260521233900.478153-1-rollkingzzc@gmail.com
Link: https://patch.msgid.link/20260526152843.617503-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
3 weeks agoALSA: xen-front: Connect event channel after stream prepare
Cássio Gabriel [Tue, 26 May 2026 12:48:27 +0000 (09:48 -0300)] 
ALSA: xen-front: Connect event channel after stream prepare

The request channel must be connected from ALSA .open(), because hw-rule
queries and the stream open request use it. The event channel is
different: XENSND_EVT_CUR_POS handling uses ALSA runtime buffer and
period geometry, and the corresponding Xen stream parameters are not
submitted to the backend until .prepare() sends XENSND_OP_OPEN.

Currently .open() connects both channels. A backend current-position
event, or a stale event queued for an earlier stream instance, can
therefore reach xen_snd_front_alsa_handle_cur_pos() before
runtime->buffer_size and runtime->period_size are valid.

Add a per-channel connection helper, connect only the request channel in
.open(), connect the event channel after a successful stream prepare,
and disconnect it before stream close/free. Re-check the event-channel
state after taking ring_io_lock so disconnecting the event channel
synchronizes against a threaded IRQ that passed the initial lockless
state test. Keep defensive runtime geometry checks in the position
handler.

Fixes: 1cee559351a7 ("ALSA: xen-front: Implement ALSA virtual sound driver")
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260526-alsa-xen-event-channel-fixes-v1-2-91d3a6a50778@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
3 weeks agoALSA: xen-front: Reset event channel state on stream clear
Cássio Gabriel [Tue, 26 May 2026 12:48:26 +0000 (09:48 -0300)] 
ALSA: xen-front: Reset event channel state on stream clear

xen_snd_front_evtchnl_pair_clear() resets evt_next_id for both
channels. That is correct for the request channel, where evt_next_id is
used to allocate the next request id. It is wrong for the event channel:
incoming events are validated against evt_id, and evt_id is incremented
by evtchnl_interrupt_evt().

This leaves the expected event id from the previous stream instance. A
backend that restarts event ids for a reopened stream can then have valid
current-position events dropped until the stale frontend id catches up.

Reset evt_id for the event channel. Also advance the event-page consumer
to the current producer while clearing the stream, so obsolete events
queued for the previous stream instance are not delivered to the next
ALSA runtime.

Fixes: 1cee559351a7 ("ALSA: xen-front: Implement ALSA virtual sound driver")
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260526-alsa-xen-event-channel-fixes-v1-1-91d3a6a50778@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
3 weeks agoALSA: usb-audio: Add iface reset and delay quirk for TAE1160 USB Audio
Lianqin Hu [Wed, 27 May 2026 03:33:08 +0000 (03:33 +0000)] 
ALSA: usb-audio: Add iface reset and delay quirk for TAE1160 USB Audio

Setting up the interface when suspended/resumeing fail on this card.
Adding a reset and delay quirk will eliminate this problem.

usb 1-1: new full-speed USB device number 2 using xhci-hcd
usb 1-1: New USB device found, idVendor=25aa, idProduct=600b
usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 1-1: Product: TAE1159
usb 1-1: Manufacturer: Generic
usb 1-1: SerialNumber: 20210726905926

Signed-off-by: Lianqin Hu <hulianqin@vivo.com>
Link: https://patch.msgid.link/TYUPR06MB621736D7C85D43200E54E740D2082@TYUPR06MB6217.apcprd06.prod.outlook.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
3 weeks agoALSA: hda/cs420x: Add CS4208 fixup for iMac16,1
Jakub Pisarczyk [Tue, 26 May 2026 20:18:30 +0000 (22:18 +0200)] 
ALSA: hda/cs420x: Add CS4208 fixup for iMac16,1

The 21.5" Retina 4K iMac (Late 2015, DMI product name "iMac16,1") ships
with a Cirrus Logic CS4208 codec wired to an external speaker amplifier
enabled through codec GPIO0 -- the same arrangement as the late-2013
MacBookPro 11,x. Without a matching entry in cs4208_mac_fixup_tbl[] the
fixup picker logs:

    snd_hda_codec_cs420x hdaudioC1D0: CS4208: picked fixup  for codec SSID 106b:0000

i.e. an empty fixup name, GPIO0 stays low, the external amp is never
powered up, and the internal speakers are silent on a stock kernel.

The codec SSID reported by hardware is 0x106b:0x7f00. Reusing CS4208_MBP11
(GPIO0 + SPDIF switch fixup) makes the internal speakers and S/PDIF
output work out of the box, removing the need for users to set
`options snd_hda_intel model=mbp11` via /etc/modprobe.d/.

Tested on iMac16,1 (kernel 6.17.0): four internal drivers
(Left tweeter, Left woofer, Right tweeter, Right woofer, exposed as the
4 channels of the analog-surround-40 ALSA profile) produce audio after
the fixup is applied.

Signed-off-by: Jakub Pisarczyk <pisarz77@gmail.com>
Link: https://patch.msgid.link/20260526201830.34097-1-pisarz77@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
3 weeks agoALSA: hda/realtek: add quirk for HP Dragonfly Folio G3 2-in-1
Fabian Lippold [Tue, 26 May 2026 15:41:01 +0000 (17:41 +0200)] 
ALSA: hda/realtek: add quirk for HP Dragonfly Folio G3 2-in-1

Add PCI quirk for HP Dragonfly Folio G3 (PCI ID 103c:8a06) to select the
CS35L41 SPI4 & GPIO LED fixup variant.

Signed-off-by: Fabian Lippold <fabianlippold1184@gmail.com>
Link: https://patch.msgid.link/20260526154418.1850568-3-fabianlippold1184@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
3 weeks agoARM: dts: aspeed: sanmiguel: Fix the CPU_CHIPTHROT linename
Potin Lai [Sat, 23 May 2026 02:28:08 +0000 (10:28 +0800)] 
ARM: dts: aspeed: sanmiguel: Fix the CPU_CHIPTHROT linename

Fix the GPIO linenames for CPU_CHIPTHROT signals.

The signals were incorrectly marked as output ("-O") while they are
actually input signals ("-I").

[arj: Drop list of changed names]

Signed-off-by: Potin Lai <potin.lai@quantatw.com>
Link: https://patch.msgid.link/20260523-potin-update-sanmiguel-dts-20260522-v1-2-169f5fceb5f9@quantatw.com
Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
3 weeks agoARM: dts: aspeed: sanmiguel: Add IOEXP interrupt pin settings
Potin Lai [Sat, 23 May 2026 02:28:07 +0000 (10:28 +0800)] 
ARM: dts: aspeed: sanmiguel: Add IOEXP interrupt pin settings

Kernel dmesg reports IRQ #44 being disabled due to unhandled
interrupts from multiple PCA953x IO expanders:

    [ 447.047861] irq 44: nobody cared (try booting with the "irqpoll" option)
    [ 447.063124] handlers:
    [ 447.068176] [<2ab869ad>] irq_default_primary_handler threaded [<b8adc310>] pca953x_irq_handler
    [ 447.087268] [<2ab869ad>] irq_default_primary_handler threaded [<b8adc310>] pca953x_irq_handler
    [ 447.106344] [<2ab869ad>] irq_default_primary_handler threaded [<b8adc310>] pca953x_irq_handler
    [ 447.125421] [<2ab869ad>] irq_default_primary_handler threaded [<b8adc310>] pca953x_irq_handler
    [ 447.144513] [<2ab869ad>] irq_default_primary_handler threaded [<b8adc310>] pca953x_irq_handler
    [ 447.163587] [<2ab869ad>] irq_default_primary_handler threaded [<b8adc310>] pca953x_irq_handler
    [ 447.182663] [<2ab869ad>] irq_default_primary_handler threaded [<b8adc310>] pca953x_irq_handler
    [ 447.201756] [<2ab869ad>] irq_default_primary_handler threaded [<b8adc310>] pca953x_irq_handler
    [ 447.220837] Disabling IRQ #44

The affected IOEXP nodes are missing interrupt pin configuration in
the device tree, causing the interrupt line to remain asserted and
resulting in repeated unhandled IRQ events.

Add the required interrupt-related properties for the affected IOEXP
devices to ensure proper interrupt handling and prevent the IRQ from
being disabled.

[arj: Drop markdown code-block fence, favour indentation]

Signed-off-by: Potin Lai <potin.lai@quantatw.com>
Link: https://patch.msgid.link/20260523-potin-update-sanmiguel-dts-20260522-v1-1-169f5fceb5f9@quantatw.com
Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
3 weeks agoARM: dts: aspeed: clemente: Remove IOB NIC TMP421 nodes
Mike Hsieh [Fri, 22 May 2026 10:07:59 +0000 (18:07 +0800)] 
ARM: dts: aspeed: clemente: Remove IOB NIC TMP421 nodes

Remove the TMP421 sensor entry from the DTS, as it is no longer the
primary telemetry source.

Accessing the CX8 NIC via I2C while it is powered off causes voltage
leakage on the bus, leading to EEPROM corruption on shared I2C devices.
Removing this node prevents the BMC from initiating traffic to the NIC
during initialization, protecting the integrity of the shared bus.

Signed-off-by: Mike Hsieh <mike.quanta.115@gmail.com>
Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
3 weeks agoARM: dts: aspeed: Enable networking for Asus Kommando IPMI Card
Anirudh Srinivasan [Tue, 31 Mar 2026 14:18:00 +0000 (09:18 -0500)] 
ARM: dts: aspeed: Enable networking for Asus Kommando IPMI Card

Adds the DT nodes needed for ethernet support for Asus Kommando, with
phy mode set to rgmii-id.

When this DT was originally added, the phy mode was set to rgmii (which
was incorrect). It was suggested to remove networking support from the
DT till the Aspeed networking driver was patched so that the correct phy
mode could be used.

The discussion in [1] mentions that u-boot was inserting clk delays that
weren't needed, which resulted in needing to set the phy mode in linux
to rgmii incorrectly. The solution suggested there was to patch u-boot to
no longer insert these clk delays and use rgmii-id as the phy mode for
any future DTs added to linux.

This DT was tested (on the OpenBMC u-boot fork [2]) with a u-boot DT
modified to insert clk delays of 0 (instead of patching u-boot itself).
[3] adds a u-boot DT for this device (without networking) and describes
how to patch it to add networking support. If this patched DT is used,
then networking works with rgmii-id phy mode in both u-boot and linux.

[1] https://lore.kernel.org/linux-aspeed/ef88bb50-9f2c-458d-a7e5-dc5ecb9c777a@lunn.ch/
[2] https://github.com/openbmc/u-boot/tree/v2019.04-aspeed-openbmc
[3] https://lore.kernel.org/openbmc/20260328-asus-kommando-v2-1-2a656f8cd314@gmail.com/

Signed-off-by: Anirudh Srinivasan <anirudhsriniv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260331-asus-kommando-networking-v2-1-f7d72ae5d40d@gmail.com
Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
3 weeks agoARM: dts: aspeed: rainiera6: Add Meta Rainiera6 BMC
Neil Cheng [Tue, 19 May 2026 02:38:38 +0000 (10:38 +0800)] 
ARM: dts: aspeed: rainiera6: Add Meta Rainiera6 BMC

Add device tree for the Meta (Facebook) Rainiera6 compute node, based on
AST2600 BMC.

Signed-off-by: Neil Cheng <neilcheng0417@gmail.com>
Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
3 weeks agodt-bindings: arm: aspeed: Add Meta Rainiera6 board
Neil Cheng [Tue, 19 May 2026 02:38:37 +0000 (10:38 +0800)] 
dt-bindings: arm: aspeed: Add Meta Rainiera6 board

Document the new compatibles used on Meta Rainiera6.

Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Neil Cheng <neilcheng0417@gmail.com>
Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
3 weeks agoMerge branch 'rtnetlink-rtnl-avoidance-in-rtnl_getlink-and-rtnl_dump_ifinfo'
Jakub Kicinski [Wed, 27 May 2026 02:20:20 +0000 (19:20 -0700)] 
Merge branch 'rtnetlink-rtnl-avoidance-in-rtnl_getlink-and-rtnl_dump_ifinfo'

Eric Dumazet says:

====================
rtnetlink: RTNL avoidance in rtnl_getlink() and rtnl_dump_ifinfo()

Many shell scripts invoke iproute2 commands specifying a device by
its name.

This series improves their performance avoiding RTNL acquisition
for their (repeated) name->index conversion.
====================

Link: https://patch.msgid.link/20260525083542.1565964-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agortnetlink: add RTEXT_FILTER_NAME_ONLY support to rtnl_dump_ifinfo()
Eric Dumazet [Mon, 25 May 2026 08:35:42 +0000 (08:35 +0000)] 
rtnetlink: add RTEXT_FILTER_NAME_ONLY support to rtnl_dump_ifinfo()

When user requests RTEXT_FILTER_NAME_ONLY flag, we limit the dump
parts to:

 - struct nlmsghdr
 - IFLA_IFNAME
 - IFLA_PROP_LIST (alternate names)

- This saves space in the dump, pushing more devices per system call.
- This can be done without acquiring RTNL.

I still have a medium term goal to avoid RTNL in rtnl_dump_ifinfo()
regardless of RTEXT_FILTER_NAME_ONLY being used.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260525083542.1565964-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agortnetlink: do not assume RTNL is held in link_master_filtered()
Eric Dumazet [Mon, 25 May 2026 08:35:41 +0000 (08:35 +0000)] 
rtnetlink: do not assume RTNL is held in link_master_filtered()

RTNL might be no longer held by the caller in the following patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260525083542.1565964-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agortnetlink: do not acquire RTNL in rtnl_getlink() with RTEXT_FILTER_NAME_ONLY
Eric Dumazet [Mon, 25 May 2026 08:35:40 +0000 (08:35 +0000)] 
rtnetlink: do not acquire RTNL in rtnl_getlink() with RTEXT_FILTER_NAME_ONLY

When RTEXT_FILTER_NAME_ONLY is requested, rtnl_fill_ifinfo()
is dumping device attributes which do not need RTNL protection.

Many shell scripts invoke iproute2 commands specifying a device by
its name. After this patch, they will no longer add RTNL pressure.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260525083542.1565964-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: defer netdev_name_node_alt_flush() call to netdev_run_todo()
Eric Dumazet [Mon, 25 May 2026 08:35:39 +0000 (08:35 +0000)] 
net: defer netdev_name_node_alt_flush() call to netdev_run_todo()

In the following patch, we want to call rtnl_fill_prop_list() without
RTNL being held, but after a device reference was taken.

We need to free altnames in netdev_run_todo() instead of
unregister_netdevice_many_notify().

Freeing will only happen once all device references
have been released.

Note that dev->name_node serves as the anchor for altnames,
thus must be also freed in netdev_run_todo().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260525083542.1565964-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agortnetlink: use nla_nest_end_safe() in rtnl_fill_prop_list()
Eric Dumazet [Mon, 25 May 2026 08:35:38 +0000 (08:35 +0000)] 
rtnetlink: use nla_nest_end_safe() in rtnl_fill_prop_list()

Avoid corrupting a netlink message and confuse user space in the
very unlikely case rtnl_fill_prop_list was able to produce a very big
nested element.

This is extremely unlikely, because rtnl_prop_list_size()
provisions nla_total_size(ALTIFNAMSIZ) per altname.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260525083542.1565964-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: thunderx: fix PTP device ref leak in nicvf_probe()
Haoxiang Li [Mon, 25 May 2026 08:26:11 +0000 (16:26 +0800)] 
net: thunderx: fix PTP device ref leak in nicvf_probe()

cavium_ptp_get() acquires a reference to the PTP PCI device
through pci_get_device(). If any initialization step fails
after cavium_ptp_get(), the PTP PCI device reference is leaked.
Add a common error path to release the PTP reference before
returning from probe failures.

Signed-off-by: Haoxiang Li <lihaoxiang@isrc.iscas.ac.cn>
Link: https://patch.msgid.link/20260525082611.61817-1-lihaoxiang@isrc.iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoipv6: validate extension header length before copying to cmsg
Qi Tang [Sat, 23 May 2026 14:32:45 +0000 (22:32 +0800)] 
ipv6: validate extension header length before copying to cmsg

ip6_datagram_recv_specific_ctl() builds IPV6_{HOPOPTS,DSTOPTS,RTHDR}
cmsgs (and their IPV6_2292* legacy counterparts) by trusting the
on-wire hdrlen byte (ptr[1]) when computing the put_cmsg() length.
The length was validated only at parse time (ipv6_parse_hopopts(),
etc.).  An nftables payload-write expression can rewrite hdrlen after
parsing and before the skb reaches recvmsg; the write itself is
in-bounds but put_cmsg() then reads up to ((hdrlen+1) << 3) = 2040
bytes from an 8-byte header.  nftables is reachable from an
unprivileged user namespace, so this is an unprivileged
slab-out-of-bounds read:

  BUG: KASAN: slab-out-of-bounds in put_cmsg+0x3ac/0x540
   put_cmsg+0x3ac/0x540
   udpv6_recvmsg+0xca0/0x1250
   sock_recvmsg+0xdf/0x190
   ____sys_recvmsg+0x1b1/0x620

Add ipv6_get_exthdr_len() which validates that at least two bytes
are accessible before reading the hdrlen field, then checks the
computed length against skb_tail_pointer(skb), returning 0 on
failure.  Extension headers are kept in the linear skb area by
pskb_may_pull() during input, so skb_tail_pointer() is the correct
bound.

Use ipv6_get_exthdr_len() at all non-AH call sites: the five
standalone cmsg blocks (HbH, 2292HbH, 2292DSTOPTS x2, 2292RTHDR)
and the three standard cases in the extension-header walk loop
(DSTOPTS, ROUTING, default).  AH retains an inline bounds check
because its length formula differs ((ptr[1]+2)<<2).

The walk loop also gets a pre-read bounds check at the top to
validate ptr before any case accesses ptr[0] or ptr[1].

When the walk loop detects a corrupted header, return from the
function instead of continuing to process later socket options.

Cc: stable@vger.kernel.org
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Qi Tang <tpluszz77@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260523143245.2281415-1-tpluszz77@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: hsr: require valid EOT supervision TLV
Luka Gejak [Sat, 23 May 2026 13:04:20 +0000 (15:04 +0200)] 
net: hsr: require valid EOT supervision TLV

Supervision frames are only valid if terminated with a zero-length EOT
TLV. The current check fails to reject non-EOT entries as the terminal
TLV, potentially allowing malformed supervision traffic.

Fix this by strictly requiring the terminal TLV to be HSR_TLV_EOT with
a length of zero.

Signed-off-by: Luka Gejak <luka.gejak@linux.dev>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Link: https://patch.msgid.link/20260523130420.62144-1-luka.gejak@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoksmbd: fix FSCTL permission bypass by adding a permission check for FSCTL_SET_SPARSE
Sean Shen [Tue, 26 May 2026 13:07:16 +0000 (22:07 +0900)] 
ksmbd: fix FSCTL permission bypass by adding a permission check for FSCTL_SET_SPARSE

FSCTL_SET_SPARSE in fsctl_set_sparse() modifies the file's sparse
attribute and saves it through xattr without any permission checks.

This exposes two issues:

1) A client on a read-only share can change the sparse attribute
   on files it opened, even though the share is read-only.
   Other FSCTL write operations already check
   test_tree_conn_flag(work->tcon, KSMBD_TREE_CONN_FLAG_WRITABLE),
   but FSCTL_SET_SPARSE does not.

2) Even on writable shares, clients without FILE_WRITE_DATA or
   FILE_WRITE_ATTRIBUTES access should not modify the sparse
   attribute. Similar handle-level checks exist in other functions
   but are missing here.

Add both share-level writable check and per-handle access check.
Use goto out on error to avoid leaking file references.

Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steve French <smfrench@gmail.com>
Signed-off-by: Sean Shen <grayhat@foxmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 weeks agoksmbd: release ksmbd_inode ref via ksmbd_inode_put on lookup paths
Aleksandr Golovnya [Mon, 25 May 2026 18:50:18 +0000 (01:50 +0700)] 
ksmbd: release ksmbd_inode ref via ksmbd_inode_put on lookup paths

ksmbd_query_inode_status() and ksmbd_lookup_fd_inode() both take a
reference on a ksmbd_inode via __ksmbd_inode_lookup() (which performs
atomic_inc_not_zero()) and later release it using a bare
atomic_dec(&ci->m_count).  Unlike ksmbd_inode_put(), a bare
atomic_dec() does not check whether the reference count has reached
zero, so if the caller happens to drop the last reference, the
ksmbd_inode is leaked: it stays in the global inode hash table with
m_count == 0, future __ksmbd_inode_lookup() calls reject it via
atomic_inc_not_zero(), and ksmbd_inode_free() is never invoked.

The race is:

    T1: __ksmbd_inode_lookup()    -> atomic_inc_not_zero(): m_count = 2
    T2: ksmbd_inode_put()         -> atomic_dec_and_test():  m_count = 1
                                                            (not freed)
    T1: atomic_dec(&ci->m_count)                          ->  m_count = 0
        return                                            (LEAK)

In ksmbd_lookup_fd_inode() the matched-fp path (which now also uses
ksmbd_inode_put()) cannot currently reach m_count == 0 because the
matched ksmbd_file holds its own reference on ci, but converting it to
the proper API keeps the three call sites consistent and avoids
future regressions if the locking changes.

Because ksmbd_inode_put() may free the ksmbd_inode if this drops the
last reference, the call must happen after up_read(&ci->m_lock) on the
two affected paths in ksmbd_lookup_fd_inode().  On the no-match path
this is a pure reordering; on the matched path ksmbd_fp_get() is
moved above the unlock so that the returned ksmbd_file is pinned
before the inode reference is released.

Signed-off-by: Aleksandr Golovnya <cofedish@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 weeks agoksmbd: OOB read regression in smb_check_perm_dacl() ACE-walk loops
Ali Ganiyev [Mon, 25 May 2026 01:23:47 +0000 (10:23 +0900)] 
ksmbd: OOB read regression in smb_check_perm_dacl() ACE-walk loops

Commit d07b26f39246 ("ksmbd: require minimum ACE size in
smb_check_perm_dacl()") introduced a transposed bounds check:

    if (offsetof(struct smb_ace, sid) + aces_size < CIFS_SID_BASE_SIZE)

Since offsetof(..sid) is 8 and CIFS_SID_BASE_SIZE is 8, this evaluates
to `aces_size < 0`. Because `aces_size` is always non-negative, this
check becomes dead code and never breaks the loop.

Worse, that commit removed the old 4-byte guard, meaning the loop now
reads `ace->size` (offset 2) even when `aces_size` is 0-3 bytes. This
re-opens a 2-byte heap out-of-bounds (OOB) read past the pntsd allocation
during subsequent SMB2_CREATE operations.

Fix this by properly transposing the comparison to require at least
16 bytes (8-byte offset + 8-byte SID base), matching the correct form
used in smb_inherit_dacl().

Fixes: d07b26f39246 ("ksmbd: require minimum ACE size in smb_check_perm_dacl()")
Cc: stable@vger.kernel.org
Signed-off-by: Ali Ganiyev <ali.qaniyev@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 weeks agoMerge tag 'nfc-7.1-rc6' of https://codeberg.org/linux-nfc/linux
Jakub Kicinski [Wed, 27 May 2026 01:32:34 +0000 (18:32 -0700)] 
Merge tag 'nfc-7.1-rc6' of https://codeberg.org/linux-nfc/linux

David Heidelberg says:

====================
nfc pull request for net:

Code improvements
 - llcp: Fix use-after-free in llcp_sock_release()
 - llcp: Fix use-after-free race in nfc_llcp_recv_cc()
 - hci: fix out-of-bounds read in HCP header parsing
Regression fixes:
 - nxp-nci: i2c: use rising-edge IRQ on ACPI systems

Signed-off-by: David Heidelberg <david@ixit.cz>
* tag 'nfc-7.1-rc6' of https://codeberg.org/linux-nfc/linux:
  nfc: nxp-nci: i2c: use rising-edge IRQ on ACPI systems
  nfc: hci: fix out-of-bounds read in HCP header parsing
  nfc: llcp: Fix use-after-free race in nfc_llcp_recv_cc()
  nfc: llcp: Fix use-after-free in llcp_sock_release()
====================

Link: https://patch.msgid.link/217c0646-8a30-4037-b613-580c2b189729@ixit.cz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agotunnels: do not assume transport header in iptunnel_pmtud_check_icmp()
Eric Dumazet [Fri, 22 May 2026 11:55:12 +0000 (11:55 +0000)] 
tunnels: do not assume transport header in iptunnel_pmtud_check_icmp()

In some cases, iptunnel_pmtud_check_icmp() can be called while
skb transport header is not set.

This triggers an out-of-bound access, because
(typeof(skb->transport_header))~0U is 65535.

Access the icmp header based on IPv4 network header,
after making sure icmp->type is present in skb linear part.

Note that iptunnel_pmtud_check_icmpv6()) is fine.

Fixes: 4cb47a8644cc ("tunnels: PMTU discovery support for directly bridged IP packets")
Reported-by: Damiano Melotti <melotti@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260522115512.1519110-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agovxlan: do not reuse cached ip_hdr() value after skb_tunnel_check_pmtu()
Eric Dumazet [Mon, 25 May 2026 20:36:42 +0000 (20:36 +0000)] 
vxlan: do not reuse cached ip_hdr() value after skb_tunnel_check_pmtu()

skb_tunnel_check_pmtu() can change skb->head.

Reusing old_iph afer skb_tunnel_check_pmtu() can cause an UAF.

Use instead ip_hdr(skb) as done in drivers/net/bareudp.c
and drivers/net/geneve.c.

Found by Sashiko.

Fixes: 4cb47a8644cc ("tunnels: PMTU discovery support for directly bridged IP packets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Link: https://patch.msgid.link/20260525203642.2389723-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agotunnels: load network headers after skb_cow() in iptunnel_pmtud_build_icmp[v6]()
Eric Dumazet [Mon, 25 May 2026 20:13:35 +0000 (20:13 +0000)] 
tunnels: load network headers after skb_cow() in iptunnel_pmtud_build_icmp[v6]()

Sashiko found that iptunnel_pmtud_build_icmp() and
iptunnel_pmtud_build_icmpv6() were caching ip_hdr() and ipv6_hdr()
before an skb_cow() call which can reallocate skb->head.

Fix this possible UAF by initializing the local variables
after the skb_cow() call.

Remove skb_reset_network_header() calls which were not needed.

Fixes: 4cb47a8644cc ("tunnels: PMTU discovery support for directly bridged IP packets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Link: https://patch.msgid.link/20260525201335.2361845-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoMerge tag 'nf-next-26-05-25' of https://git.kernel.org/pub/scm/linux/kernel/git/netfi...
Jakub Kicinski [Wed, 27 May 2026 01:07:28 +0000 (18:07 -0700)] 
Merge tag 'nf-next-26-05-25' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Florian Westphal says:

====================
netfilter: updates for net-next

The following patchset contains Netfilter fixes and small enhancements:

1) Disable 32-bit x_tables compatibility (32bit binaries on 64bit
   kernel) interface in user namespaces.  This is 'last warning'
   before this is removed for good.

2) Add a configuration toggle for netfilter GCOV profiling. Provide
   dedicated toggles for ipset and ipvs.

3) Remove modular support for nfnetlink and restrict it to built-in only.
   From Pablo Neira Ayuso.

4) Use per-rule hash initval in nf_conncount. This avoids unecessary
   lock contention with short keys (e.g. conntrack zones) in different
   namespaces.

5) Use nf_ct_exp_net() in ctnetlink expectation dumps.
   From Pratham Gupta.

6) Remove a dead conditional in nft_set_rbtree.

7) Fix conntrack helper policy updates to apply per-class values correctly.
   From David Carlier.

8) Fix an off-by-one OOB read in nf_conntrack_irc:parse_dcc(). Use strict
   less-than comparison in the newline search loop to respect the
   exclusive-end pointer convention.  From Muhammad Bilal.

9) Fix typos in nf_conntrack_proto_tcp comments.  From Avinash Duduskar.

10) Restore performance optimization in nft_set_pipapo_avx2 by passing
    the next map index. Refactor lookup logic for clarity and add a
    DEBUG_NET check to document this.

11) Avoid (harmless) u16 overflow in nf_conntrack_ftp when parsing FTP PORT
    and EPRT commands.  Ignore commands where single octet exceeds 255.
    From Giuseppe Caruso.

Patch 12, which removes incorrect (and obviously unused) code from
nft_byteorder was kept back to avoid a net -> net-next merge conflict.

* tag 'nf-next-26-05-25' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_conntrack_ftp: avoid u16 overflows
  netfilter: nft_set_pipapo_avx2: restore performance optimization
  netfilter: nf_conntrack_proto_tcp: fix typos in comments
  netfilter: nf_conntrack_irc: fix parse_dcc() off-by-one OOB read
  netfilter: nfnl_cthelper: apply per-class values when updating policies
  netfilter: nft_set_rbtree: remove dead conditional
  netfilter: ctnetlink: use nf_ct_exp_net() in expectation dump
  netfilter: nf_conncount: use per-rule hash initval
  netfilter: allow nfnetlink built-in only
  netfilter: add option for GCOV profiling
  netfilter: x_tables: disable 32bit compat interface in user namespaces
====================

Link: https://patch.msgid.link/20260525182924.28456-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonetconsole: Constify struct configfs_item_operations and configfs_group_operations
Christophe JAILLET [Mon, 25 May 2026 12:23:00 +0000 (14:23 +0200)] 
netconsole: Constify struct configfs_item_operations and configfs_group_operations

'struct configfs_item_operations' and 'configfs_group_operations' are not
modified in this driver.

Constifying these structures moves some data to a read-only section, so
increases overall security, especially when the structure holds some
function pointers.

On a x86_64, with allmodconfig, as an example:
Before:
======
   text    data     bss     dec     hex filename
  64259   24272     608   89139   15c33 drivers/net/netconsole.o

After:
=====
   text    data     bss     dec     hex filename
  64579   23952     608   89139   15c33 drivers/net/netconsole.o

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/7ff56bdb0cee826a56365f930dcdf457b44931df.1779711734.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agomlxsw: spectrum_fid: use a dedicated list head pointer for sorted insert
Maoyi Xie [Mon, 25 May 2026 07:17:59 +0000 (15:17 +0800)] 
mlxsw: spectrum_fid: use a dedicated list head pointer for sorted insert

mlxsw_sp_fid_port_vid_list_add() inserts into a list sorted by
local_port. It walks the list to find the first entry with a
larger local_port, then inserts the new entry before it:

    list_add_tail(&port_vid->list, &tmp_port_vid->list);

If the loop falls through (the new local_port is the largest),
tmp_port_vid runs off the end of the list. &tmp_port_vid->list
then ends up at the list head itself (container_of() offsets
cancel), and list_add_tail() inserts at the tail. So the code
works today.

It is fragile though. Anyone who later adds a read of another
field of tmp_port_vid will hit memory outside the list head.

Track the insertion point with a dedicated list_head pointer.
Initialise insert_before to &fid->port_vid_list, set it to
&tmp_port_vid->list only on early break, and pass insert_before
to list_add_tail(). The cursor is no longer touched after the
loop. Behaviour is unchanged.

Same shape as the Koschel cleanups from 2022 (e.g. 99d8ae4ec8a
tracing, 2966a9918df clockevents, dc1acd5c946 dlm).

Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260525071759.1517576-1-maoyixie.tju@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: dsa: netc: fix unmet Kconfig dependencies for NET_DSA_NETC_SWITCH
Wei Fang [Sun, 24 May 2026 07:03:10 +0000 (15:03 +0800)] 
net: dsa: netc: fix unmet Kconfig dependencies for NET_DSA_NETC_SWITCH

NET_DSA_NETC_SWITCH selects NXP_NTMP, NXP_NETC_LIB and FSL_ENETC_MDIO,
but these symbols depend on NET_VENDOR_FREESCALE which may not be
enabled. This results in Kconfig warnings and linker errors like:

  undefined reference to `ntmp_bpt_update_entry'
  undefined reference to `ntmp_fdbt_search_port_entry'
  undefined reference to `ntmp_free_cbdr'
  undefined reference to `enetc_hw_alloc'
  ...

Therefore, add "depends on NET_VENDOR_FREESCALE" to NET_DSA_NETC_SWITCH,
ensuring that the selected symbols NXP_NTMP, NXP_NETC_LIB and
FSL_ENETC_MDIO, which all depend on NET_VENDOR_FREESCALE, can only be
selected when that dependency is already satisfied.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202605240046.8MvKuOMg-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202605240706.EuGmnrz5-lkp@intel.com/
Fixes: 187fbae024c8 ("net: dsa: netc: introduce NXP NETC switch driver for i.MX94")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260524070310.2429819-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: phy: air_en8811h: add AN8811HB MCU assert/deassert support
Lucien.Jheng [Sun, 24 May 2026 06:39:15 +0000 (14:39 +0800)] 
net: phy: air_en8811h: add AN8811HB MCU assert/deassert support

AN8811HB needs a MCU soft-reset cycle before firmware loading begins.
Assert the MCU (hold it in reset) and immediately deassert (release)
via a dedicated PBUS register pair (0x5cf9f8 / 0x5cf9fc), accessed
through a registered mdio_device at PHY-addr+8.

Add __air_pbus_reg_write() as a low-level helper taking a struct
mdio_device *, create and register the PBUS mdio_device in
an8811hb_probe() and store it in priv->pbusdev, then implement
an8811hb_mcu_assert() / _deassert() on top of it. Add
an8811hb_remove() to unregister the PBUS device on teardown. Wire
both calls into an8811hb_load_firmware() and en8811h_restart_mcu()
so every firmware load or MCU restart on AN8811HB correctly sequences
the reset control registers.

Fixes: 5afda1d734ed ("net: phy: air_en8811h: add Airoha AN8811HB support")
Signed-off-by: Lucien Jheng <lucienzx159@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260524063915.47961-1-lucienzx159@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoselftests: fib_tests: add temporary IPv6 address renewal test
Fernando Fernandez Mancera [Sat, 23 May 2026 10:38:11 +0000 (12:38 +0200)] 
selftests: fib_tests: add temporary IPv6 address renewal test

Add a test to check that temporary IPv6 address is regenerated properly
after the base prefix is deprecated and restored.

Fib6 temporary address renewal test
    TEST: IPv6 temporary address cleanly deprecated and regenerated     [ OK ]

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260523103811.3790-2-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoipv6: addrconf: fix temp address generation after prefix deprecation
Fernando Fernandez Mancera [Sat, 23 May 2026 10:38:10 +0000 (12:38 +0200)] 
ipv6: addrconf: fix temp address generation after prefix deprecation

When a router temporarily deprecates an IPv6 prefix (either by sending a
Router Advertisement with Preferred Lifetime = 0 or by letting the
lifetime expire) and later restores it, the kernel permanently loses its
ability to generate temporary privacy addresses (RFC 8981) for that
prefix.

This happens because the address worker attempts to generate a
replacement temporary address when the current one nears expiration. As
the base prefix is deprecated already, the generation fails after
marking the temporary address as already having spawned a replacement
(ifp->regen_count++).

When the router eventually restores the prefix, the temporary address
becomes active again. However, once it naturally expires, the address
worker sees this temporary address already tried to generate one and
skips the regeneration.

Fix the issue by resetting the regen_count check of the latest temp
address generated for the prefix updated by the incoming RA.

Reported-by: Łukasz Stelmach <steelman@post.pl>
Closes: https://lore.kernel.org/netdev/87340td30q.fsf%25steelman@post.pl/
Suggested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260523103811.3790-1-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agol2tp: use refcount_inc_not_zero in l2tp_session_get_by_ifname
Michael Bommarito [Sat, 23 May 2026 02:34:23 +0000 (22:34 -0400)] 
l2tp: use refcount_inc_not_zero in l2tp_session_get_by_ifname

A reader in l2tp_session_get_by_ifname() can return a pointer to a
session whose refcount has reached zero. The getter takes its
reference with plain refcount_inc(), but every other session getter
in the same file (l2tp_v2_session_get, l2tp_v3_session_get, and the
corresponding _get_next variants) uses refcount_inc_not_zero()
because the IDR/RCU lookup can race with refcount_dec_and_test() ->
l2tp_session_free() -> kfree_rcu(). The ifname getter is the only
outlier; the inconsistency was raised on-list after 979c017803c4
("l2tp: use list_del_rcu in l2tp_session_unhash").

A reader inside rcu_read_lock_bh() that matches session->ifname can
be preempted between the strcmp() and the refcount_inc(). If the
last reference drops on another CPU in that window, the reader's
refcount_inc() runs on a counter that has reached zero. refcount_t
catches the addition-on-zero, prints "refcount_t: addition on 0;
use-after-free", saturates the counter, and returns the saturated
pointer to the caller. Session memory is held live by the in-flight
RCU read section, but the kfree_rcu() callback queued from
l2tp_session_free() will free it once the grace period closes; a
caller that dereferences the returned session past that point hits
a slab-use-after-free. On PREEMPT_RT local_bh_disable() is a per-CPU
sleeping lock and the preemption window is real; on stock PREEMPT
kernels local_bh_disable() is a preempt_count increment that closes
the cross-CPU race in practice (see below).

Use refcount_inc_not_zero() and continue the list walk on failure,
matching the other session getters in the file. The ifname getter
is the only session getter in net/l2tp/ that still uses the bare
refcount_inc() pattern; this change restores file-internal
consistency. The success path is unchanged.

Fixes: abe7a1a7d0b6 ("l2tp: improve tunnel/session refcount helpers")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Reviewed-by: James Chapman <jchapman@katalix.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260523023423.2568972-1-michael.bommarito@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: napi: Skip last poll when arming gro timer in busy poll
Martin Karsten [Sat, 23 May 2026 01:22:20 +0000 (21:22 -0400)] 
net: napi: Skip last poll when arming gro timer in busy poll

Skip the extra call to napi->poll(), if the gro timer is armed at the
end of busy polling. This removes the need for having a separate
__busy_poll_stop() routine and its code is moved directly into the
relevant places in busy_poll_stop(). Remove obsolete comment about
ndo_busy_poll_stop().

This is a follow-up to commit 58e2330bd455 ("net: napi: Avoid gro timer
misfiring at end of busypoll"), which has deferred arming the gro timer
to the end of __busy_poll_stop() to eliminate a race condition between
a short timer and long poll that could leave the queue stuck with
interrupts disabled and no timer armed.

Co-developed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca>
Link: https://patch.msgid.link/20260523012247.1574691-1-mkarsten@uwaterloo.ca
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agollc: Add SPDX id lines to llc header files
Tim Bird [Sat, 23 May 2026 00:23:54 +0000 (18:23 -0600)] 
llc: Add SPDX id lines to llc header files

Add appropriate SPDX-License-Identifier lines to llc header (.h)
files, and remove other license text from the files.

Signed-off-by: Tim Bird <tim.bird@sony.com>
Link: https://patch.msgid.link/20260523002354.28831-1-tim.bird@sony.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agollc: Add SPDX id lines to some llc source files
Tim Bird [Fri, 22 May 2026 22:55:08 +0000 (16:55 -0600)] 
llc: Add SPDX id lines to some llc source files

Most of the lls source files are missing SPDX-License-Identifier
lines.  Add appropriate IDs to these files, and remove other license
info from the header.  In once case, leave the existing id line
and just remove the license reference text.

Signed-off-by: Tim Bird <tim.bird@sony.com>
Link: https://patch.msgid.link/20260522225508.24006-1-tim.bird@sony.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agorust: module_param: use `pr_warn_once!` for null pointer warning
Andreas Hindborg [Mon, 27 Apr 2026 08:11:35 +0000 (10:11 +0200)] 
rust: module_param: use `pr_warn_once!` for null pointer warning

Replace `pr_warn!` and the accompanying TODO with `pr_warn_once!`, now that
the macro is available.

[ Note: Adarsh Das independently authored an identical patch on the
  rust-for-linux list, but it missed the modules tree. ]

Suggested-by: Adarsh Das <adarshdas950@gmail.com>
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
Reviewed-by: Aaron Tomlin <atomlin@atomlin.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Daniel Gomez <da.gomez@samsung.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
3 weeks agoMerge branch 'add-ovs-packet-family-ynl-spec-and-unicast-notification-support'
Jakub Kicinski [Wed, 27 May 2026 00:17:43 +0000 (17:17 -0700)] 
Merge branch 'add-ovs-packet-family-ynl-spec-and-unicast-notification-support'

Minxi Hou says:

====================
Add OVS packet family YNL spec and unicast notification support

This series adds a YAML netlink spec for the OVS_PACKET_FAMILY genetlink
family and a bind-only ntf_bind() helper for receiving unicast
notifications.
====================

Link: https://patch.msgid.link/20260522174154.720293-1-houminxi@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agotools: ynl: add unicast notification receive support
Minxi Hou [Fri, 22 May 2026 17:41:54 +0000 (01:41 +0800)] 
tools: ynl: add unicast notification receive support

Add ntf_bind() method to YnlFamily for binding the netlink
socket without joining a multicast group. This enables receiving
unicast notifications through the existing poll_ntf/check_ntf
path.

The OVS packet family sends MISS and ACTION upcalls via
genlmsg_unicast() to a per-vport PID rather than through a
multicast group. The existing ntf_subscribe() couples bind()
with setsockopt(ADD_MEMBERSHIP), which does not fit the unicast
case. ntf_bind() provides the bind-only alternative, with the
address defaulting to (0, 0) but exposed as an explicit argument.

Signed-off-by: Minxi Hou <houminxi@gmail.com>
Link: https://patch.msgid.link/20260522174154.720293-3-houminxi@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonetlink: specs: add OVS packet family specification
Minxi Hou [Fri, 22 May 2026 17:41:53 +0000 (01:41 +0800)] 
netlink: specs: add OVS packet family specification

Add YAML netlink spec for the OVS_PACKET_FAMILY (ovs_packet).
This completes the set of OVS genetlink family specs (ovs_datapath,
ovs_flow, ovs_vport already exist).

The spec defines three operations: MISS (event), ACTION (event),
and EXECUTE (do). MISS and ACTION are kernel-to-userspace upcalls
sent via genlmsg_unicast(); EXECUTE is the only registered genl
operation.

Key, actions, and egress-tun-key attributes are typed as binary
rather than nest because the nested attribute definitions belong
to the ovs_flow spec and cross-spec references are not supported
by the YNL framework.

Signed-off-by: Minxi Hou <houminxi@gmail.com>
Link: https://patch.msgid.link/20260522174154.720293-2-houminxi@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agorust: kasan: add support for Software Tag-Based KASAN
Alice Ryhl [Wed, 8 Apr 2026 08:32:17 +0000 (08:32 +0000)] 
rust: kasan: add support for Software Tag-Based KASAN

This adds support for Software Tag-Based KASAN (KASAN_SW_TAGS) when
CONFIG_RUST is enabled. This requires that rustc includes support for
the kernel-hwaddress sanitizer, which is available since 1.96.0 [1].

Unlike with clang, we need to pass -Zsanitizer-recover in addition to
-Zsanitizer because the option is not implied automatically.

The kasan makefile uses different names for the flags depending on
whether CC is clang or gcc, but as we require that CC is clang when
using KASAN, we do not need to try to handle mixed gcc/llvm builds when
Rust is enabled.

Link: https://github.com/rust-lang/rust/pull/153049
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Link: https://patch.msgid.link/20260408-kasan-rust-sw-tags-v3-2-e07964d14363@google.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
3 weeks agorust: kasan: KASAN+RUST requires clang
Alice Ryhl [Wed, 8 Apr 2026 08:32:16 +0000 (08:32 +0000)] 
rust: kasan: KASAN+RUST requires clang

Kernel KASAN involves passing various llvm/gcc specific arguments to
the C and Rust compiler. Since these arguments differ between llvm and
gcc, it's not safe to mix an llvm-based rustc with a gcc build when
kasan is enabled.

Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Cc: stable@vger.kernel.org
Fixes: e3117404b411 ("kbuild: rust: Enable KASAN support")
Link: https://patch.msgid.link/20260408-kasan-rust-sw-tags-v3-1-e07964d14363@google.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
3 weeks agokbuild: rust: add AutoFDO support
Alice Ryhl [Tue, 31 Mar 2026 10:57:49 +0000 (10:57 +0000)] 
kbuild: rust: add AutoFDO support

This patch enables AutoFDO build support for Rust code within the Linux
kernel. This allows Rust code to be profiled and optimized based on the
profile.

The RUSTFLAGS variable was suffixed with *_AUTOFDO_CLANG to match the
naming of the config option, which is called CONFIG_AUTOFDO_CLANG.

This implementation has been verified in Android, first by inspecting
the object files and confirming that they look correct. After that,
it was verified as below:

1. Running the binderAddInts benchmark [1] with Rust Binder built as
   rust_binder.ko module, using a Pixel 9 Pro.
2. Collecting a profile on a Pixel 10 Pro XL using the app-launch
   benchmark, which starts different apps many times, on a device with
   Rust Binder as a built-in kernel module. (C Binder was not present on
   the device.)
3. Using the collected profile, run the binderAddInts benchmark again
   with Rust Binder built both as a rust_binder.ko module, and as a
   built-in kernel module.
4. In both cases, Rust Binder without AutoFDO was approximately 13%
   slower than the AutoFDO optimized version. Built-in vs .ko did not
   make a measurable performance difference.

All of the above was verified in conjunction with my helpers inlining
series [2], which confirmed that this worked correctly for helpers too
once [3] was fixed in the helpers inlining series.

Link: https://android.googlesource.com/platform/system/extras/+/920f089/tests/binder/benchmarks/binderAddInts.cpp
Link: https://lore.kernel.org/r/20260203-inline-helpers-v2-0-beb8547a03c9@google.com
Link: https://lore.kernel.org/r/aasPsbMEsX6iGUl8@google.com
Reviewed-by: Rong Xu <xur@google.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Tested-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Nicolas Schier <nsc@kernel.org>
Acked-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20260331-autofdo-v2-1-eb5c5964820d@google.com
[ Reworded for typos. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
3 weeks agoi2c: davinci: fix division by zero on missing clock-frequency
Chaitanya Sabnis [Tue, 26 May 2026 10:22:40 +0000 (15:52 +0530)] 
i2c: davinci: fix division by zero on missing clock-frequency

When the 'clock-frequency' property is missing from the device tree,
the driver falls back to DAVINCI_I2C_DEFAULT_BUS_FREQ. However, this
macro was defined in kHz (100), whereas the device tree property is
expected in Hz.

The probe function divided the fallback value by 1000, causing
integer truncation that resulted in dev->bus_freq = 0. This triggered
a deterministic division-by-zero kernel panic when calculating clock
dividers later in the probe sequence.

Fix this by redefining DAVINCI_I2C_DEFAULT_BUS_FREQ in Hz (100000)
to match the expected device tree property unit, allowing the existing
division logic to work correctly for both cases.

Fixes: b04ce6385979 ("i2c: davinci: kill platform data")
Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://lore.kernel.org/all/20260514044726.57297C2BCB7@smtp.kernel.org/
Signed-off-by: Chaitanya Sabnis <chaitanya.msabnis@gmail.com>
Cc: <stable@vger.kernel.org> # v6.14+
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20260526102240.4949-1-chaitanya.msabnis@gmail.com
3 weeks agoaudit: fix removal of dangling executable rules
Ricardo Robaina [Wed, 13 May 2026 21:47:59 +0000 (18:47 -0300)] 
audit: fix removal of dangling executable rules

When an audited executable is deleted from the disk, its dentry
becomes negative. Any later attempt to delete the associated audit
rule will lead to audit_alloc_mark() encountering this negative
dentry and immediately aborting, returning -ENOENT.

This early abort prevents the subsystem from allocating the temporary
fsnotify mark needed to construct the search key, meaning the kernel
cannot find the existing rule in its own lists to delete it. This
leaves a dangling rule in memory, resulting in the following error
while attempting to delete the rule:

 # ./audit-dupe-exe-deadlock.sh
 No rules
 Error deleting rule (No such file or directory)
 There was an error while processing parameters

 # auditctl -l
 -a always,exit -S all -F exe=/tmp/file -F path=/tmp/file -F key=dr

 # auditctl -D
 Error deleting rule (No such file or directory)
 There was an error while processing parameters

This patch fixes this issue by removing the d_really_is_negative()
check. By doing so, a dummy mark can be successfully generated for
the deleted path, which allows the audit subsystem to properly match
and flush the dangling rule.

Cc: stable@kernel.org
Fixes: 76a53de6f7ff ("VFS/audit: introduce kern_path_parent() for audit")
Acked-by: Waiman Long <longman@redhat.com>
Acked-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Ricardo Robaina <rrobaina@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
3 weeks agoKVM: x86: Treat KVM's virtual PMU as disabled for TDX VMs
Vishal Annapurve [Fri, 22 May 2026 15:15:34 +0000 (15:15 +0000)] 
KVM: x86: Treat KVM's virtual PMU as disabled for TDX VMs

Introduce a "protected PMU" concept, and use it to disable KVM's virtual
PMU for TDX VMs, as the PMU state for TDX VMs is virtualized by the TDX
Module[1], i.e. _can't_ emulated/virtualized by KVM, and KVM doesn't yet
support enabling/exposing PMU functionality for/to TDX VMs.  For now,
simply treat the PMU as disabled, as it's not clear what all needs to be
changed, e.g. KVM needs to do at least:

 1) Configure TD_PARAMS to allow guests to use performance monitoring.
 2) Restrict the TD to a subset of the PEBS counters if supported.
 3) Limit the TD to setup a certain perfmon events using basic/enhanced
    event filtering.

Explicitly disallow enabling the PMU via KVM_CAP_PMU_CAPABILITY for VMs
with a protected PMU to prevent userspace from circumventing KVM's
protections.

Link: https://cdrdv2.intel.com/v1/dl/getContent/733575
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Link: https://patch.msgid.link/20260522151534.652522-1-vannapurve@google.com
[sean: massage shortlog and changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
3 weeks agodrm/i915/display: stop passing i to for_each_pipe_crtc_modeset_{enable, disable}()
Jani Nikula [Wed, 13 May 2026 07:58:40 +0000 (10:58 +0300)] 
drm/i915/display: stop passing i to for_each_pipe_crtc_modeset_{enable, disable}()

Refactor for_each_pipe_crtc_modeset_{enable,disable}() and their
underlying for_each_crtc_in_masks{,_reverse}() helpers to utilize
__UNIQUE_ID() to avoid having to pass the for loop variable to them.

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/2270d4a10663bb55d5b16902b02798234f440517.1778659089.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
3 weeks agodrm/i915/display: stop passing i to for_each_*_intel_crtc_in_state() macros
Jani Nikula [Wed, 13 May 2026 07:58:39 +0000 (10:58 +0300)] 
drm/i915/display: stop passing i to for_each_*_intel_crtc_in_state() macros

None of the for_each_*_intel_crtc_in_state() macros or their users
actually need the CRTC index i variable anymore. Remove them.

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/edb9dc76cb9cad50622a1f425abaf076d1509888.1778659089.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
3 weeks agodrm/i915/display: pass struct intel_display to all for_each_intel_crtc*() macros
Jani Nikula [Wed, 13 May 2026 07:58:38 +0000 (10:58 +0300)] 
drm/i915/display: pass struct intel_display to all for_each_intel_crtc*() macros

Now that the for_each_intel_crtc*() iterator macros primarily use
display->pipe_list for iteration, it's more convenient to pass struct
intel_display to them directly instead of struct drm_device. Make it so.

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/90ec6b84d772a4842d4816efc10042ec4403e996.1778659089.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
3 weeks agodrm/i915/display: always pass display->drm to for_each_intel_crtc*()
Jani Nikula [Wed, 13 May 2026 07:58:37 +0000 (10:58 +0300)] 
drm/i915/display: always pass display->drm to for_each_intel_crtc*()

In preparation for always passing struct intel_display to
for_each_intel_crtc*() family of iterators, start off by unifying their
usage to always having struct intel_display *display around, and passing
display->drm to them.

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/447a5b2309e213abb849601727d45b406d440c88.1778659089.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
3 weeks agodrm/i915/display: switch from drm_for_each_crtc() to for_each_intel_crtc()
Jani Nikula [Wed, 13 May 2026 07:58:36 +0000 (10:58 +0300)] 
drm/i915/display: switch from drm_for_each_crtc() to for_each_intel_crtc()

intel_has_pending_fb_unpin() has the last direct user of
drm_for_each_crtc() in i915. Switch to for_each_intel_crtc() to ensure
pipe order iteration in all cases.

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/8ee4320cd15bc35a8b40676faae6db4b33eb50eb.1778659089.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
3 weeks agodrm/{i915, xe}: move xe_display_flush_cleanup_work() to i915 display
Jani Nikula [Mon, 25 May 2026 11:05:53 +0000 (14:05 +0300)] 
drm/{i915, xe}: move xe_display_flush_cleanup_work() to i915 display

xe_display_flush_cleanup_work() is a bit of an oddball function in xe
display code. There shouldn't be anything this specific or xe
specific. While I'm not sure what the correct refactor for the function
should be, move it to shared display code for starters, next to the
eerily similar but slightly different intel_has_pending_fb_unpin() that
is only called from i915 core.

The main goal here is to unblock some refactors on
for_each_intel_crtc().

v2: Add FIXME comment (Ville)

Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/20260525110553.651208-1-jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
3 weeks agoKVM: selftests: Add nested page fault injection test
Kevin Cheng [Fri, 22 May 2026 23:27:01 +0000 (16:27 -0700)] 
KVM: selftests: Add nested page fault injection test

Add a test that exercises nested page fault injection during L2
execution. L2 executes I/O string instructions (OUTSB/INSB) that access
memory restricted in L1's nested page tables (NPT/EPT), triggering a
nested page fault that L0 must inject to L1.

The test supports both AMD SVM (NPF) and Intel VMX (EPT violation) and
verifies that:
  - The exit reason is an NPF/EPT violation
  - The access type and permission bits are correct
  - The faulting GPA is correct

Three test cases are implemented:
  - Unmap the final data page (final translation fault, OUTSB read)
  - Unmap a PT page (page walk fault, OUTSB read)
  - Write-protect the final data page (protection violation, INSB write)
  - Write-protect a PT page (protection violation on A/D update, OUTSB
    read)

Signed-off-by: Kevin Cheng <chengkev@google.com>
[sean: name it nested_tdp_fault_test, consolidate asserts]
Link: https://patch.msgid.link/20260522232701.3671446-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
3 weeks agoKVM: VMX: Synthesize nested EPT violation GVA_IS_VALID/GVA_TRANSLATED bits
Kevin Cheng [Fri, 22 May 2026 23:27:00 +0000 (16:27 -0700)] 
KVM: VMX: Synthesize nested EPT violation GVA_IS_VALID/GVA_TRANSLATED bits

When injecting an EPT Violation into L2 in response to a fault detected
while emulating an L2 GVA access, synthesize the GVA_IS_VALID and
GVA_TRANSLATED bits using information provided by the walker, instead of
pulling the bits from vmcs02.EXIT_QUALIFICATION.  The information in
vmcs02.EXIT_QUALIFICATION is valid/correct if and only if the fault being
injected into L1 is the direct result of an EPT Violation VM-Exit from L2.

E.g. if KVM is emulating an I/O instruction and the memory operand's
translation through L1's EPT fails, using vmcs02.EXIT_QUALIFICATION is
wrong as the semantics for EXIT_QUALIFICATION would be for an I/O exit,
not an EPT Violation exit.

Opportunistically clean up the formatting for creating the mask of bits
to pull from vmcs02.EXIT_QUALIFICATION.

Signed-off-by: Kevin Cheng <chengkev@google.com>
[sean: use plumbed in @access bits, massage changelog]
Link: https://patch.msgid.link/20260522232701.3671446-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
3 weeks agoKVM: SVM: Fix nested NPF injection of PFERR_GUEST_{PAGE,FINAL}_MASK bits
Kevin Cheng [Fri, 22 May 2026 23:26:59 +0000 (16:26 -0700)] 
KVM: SVM: Fix nested NPF injection of PFERR_GUEST_{PAGE,FINAL}_MASK bits

Fix KVM's generation of PFERR_GUEST_{PAGE,FINAL}_MASK bits when injecting a
Nested Page Fault into L1.  Currently, KVM blindly stuffs GUEST_FINAL into
L1, which is blatantly wrong given that KVM obviously generates NPFs for
page table accesses.

There are two paths that trigger NPF injection: hardware NPF exits (from
L2) and emulation-triggered faults, i.e. when KVM detects a NPF as part of
emulating an L2 GVA access.  For the hardware case, use the bits verbatim
from the VMCB, as KVM is simply forwarding a NPF to L1.  For the emulation
case, propagate the GUEST_{PAGE,FINAL} bits from the access field (which
were recently added for MBEC+GMET support).

To differentiate between the two cases, add "hardware_nested_page_fault"
to "struct x86_exception", and set it when injecting a NPF in response to
an NPF exit from L2.

To help guard against future goofs, assert that exactly one of GUEST_PAGE
or GUEST_FINAL is set when injecting a NPF.  Unlike VMX, there are no
(known) cases where hardware doesn't set either bit, and KVM should always
set one or the other when emulating a GVA access.

Signed-off-by: Kevin Cheng <chengkev@google.com>
[sean: use plumbed in @access bits, massage changelog]
Link: https://patch.msgid.link/20260522232701.3671446-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
3 weeks agonvme-multipath: enable PCI P2PDMA for multipath devices
Kiran Kumar Modukuri [Wed, 13 May 2026 18:51:53 +0000 (11:51 -0700)] 
nvme-multipath: enable PCI P2PDMA for multipath devices

NVMe multipath does not expose BLK_FEAT_PCI_P2PDMA on the head disk
even when all underlying controllers support it.

Set BLK_FEAT_PCI_P2PDMA unconditionally in nvme_mpath_alloc_disk()
alongside the other features.  nvme_update_ns_info_block() already
calls queue_limits_stack_bdev() to stack each path's limits onto the
head disk, which routes through blk_stack_limits().  The core now
clears BLK_FEAT_PCI_P2PDMA automatically if any path (e.g., FC) does
not support it, consistent with how BLK_FEAT_NOWAIT and BLK_FEAT_POLL
are handled.

Tested-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Kiran Kumar Modukuri <kmodukuri@nvidia.com>
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Tested=by: Pranjal Shrivastava <praan@google.com>
Link: https://patch.msgid.link/20260513185153.95552-4-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 weeks agomd: propagate BLK_FEAT_PCI_P2PDMA from member devices to RAID device
Kiran Kumar Modukuri [Wed, 13 May 2026 18:51:52 +0000 (11:51 -0700)] 
md: propagate BLK_FEAT_PCI_P2PDMA from member devices to RAID device

MD RAID does not propagate BLK_FEAT_PCI_P2PDMA from member devices to
the RAID device, preventing peer-to-peer DMA through the RAID layer even
when all underlying devices support it.

Enable BLK_FEAT_PCI_P2PDMA unconditionally in raid0, raid1 and raid10
personalities during queue limits setup.  blk_stack_limits() clears it
automatically if any member device lacks support, consistent with how
BLK_FEAT_NOWAIT and BLK_FEAT_POLL are handled in the block core.

Parity RAID personalities (raid4/5/6) are excluded because they require
CPU access to data pages for parity computation, which is incompatible
with P2P mappings.

Tested with RAID0/1/10 arrays containing multiple NVMe devices with
P2PDMA support, confirming that peer-to-peer transfers work correctly
through the RAID layer.

Tested-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Kiran Kumar Modukuri <kmodukuri@nvidia.com>
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Yu Kuai <yukuai@fygo.io>
Tested=by: Pranjal Shrivastava <praan@google.com>
Link: https://patch.msgid.link/20260513185153.95552-3-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 weeks agoblock: clear BLK_FEAT_PCI_P2PDMA in blk_stack_limits() for non-supporting devices
Chaitanya Kulkarni [Wed, 13 May 2026 18:51:51 +0000 (11:51 -0700)] 
block: clear BLK_FEAT_PCI_P2PDMA in blk_stack_limits() for non-supporting devices

BLK_FEAT_NOWAIT and BLK_FEAT_POLL are cleared in blk_stack_limits()
when an underlying device does not support them.  Apply the same
treatment to BLK_FEAT_PCI_P2PDMA: stacking drivers set it
unconditionally and rely on the core to clear it whenever a
non-supporting member device is stacked.

Tested-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Tested=by: Pranjal Shrivastava <praan@google.com>
Link: https://patch.msgid.link/20260513185153.95552-2-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 weeks agoKVM: x86: Tell ->inject_page_fault() whether or a fault came from hardware
Sean Christopherson [Fri, 22 May 2026 23:26:58 +0000 (16:26 -0700)] 
KVM: x86: Tell ->inject_page_fault() whether or a fault came from hardware

When injecting a page fault (including nested TDP faults into L1), tell the
injection routine whether or not the fault originated in hardware, i.e. if
KVM is effectively forwarding a fault it intercept.  For nested TDP fault
injection, KVM needs to grab PAGE_WALK vs. GUEST_FINAL information from the
VMCB/VMCS, _if_ the fault originated in hardware.

Note, simply checking whether or not the original exit was due a #NPF or
EPT Violation isn't sufficient/correct, as the fault being synthesized for
L1 may or may not be the "same" fault that triggered a VM-Exit from L2.
E.g. if access to emulated MMIO in L2 hits a !PRESENT fault (EPT Violation
or #NPF), e.g. because MMIO caching is disabled or it's the first time the
GPA has been accessed by L2, then KVM will enter the emulator.  If
emulating the MMIO instruction then hits a nested TDP fault, e.g. because
L2 was accessing MMIO with a MOVSQ (memory-to-memory move), or because L1
has since unmapped the code stream, then the TDP fault synthesized to L1
will not be the same emulated fault the triggered the VM-Exit.

No functional change intended (nothing uses the new param, yet...).

Link: https://patch.msgid.link/20260522232701.3671446-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
3 weeks agox86/virt/tdx: Move mk_keyed_paddr() to tdx.c due to no external users
Yan Zhao [Thu, 30 Apr 2026 01:50:14 +0000 (09:50 +0800)] 
x86/virt/tdx: Move mk_keyed_paddr() to tdx.c due to no external users

Move mk_keyed_paddr() from tdx.h to tdx.c to avoid unnecessary header
inclusion and improve encapsulation since there are no users outside of
tdx.c.

No functional change intended.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Acked-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20260430015014.24261-1-yan.y.zhao@intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
3 weeks agox86/tdx: Drop exported function tdx_quirk_reset_page()
Yan Zhao [Thu, 30 Apr 2026 01:50:01 +0000 (09:50 +0800)] 
x86/tdx: Drop exported function tdx_quirk_reset_page()

KVM invokes tdx_quirk_reset_page() to reset TDX control pages (including
S-EPT pages, TDR page, etc.), as all those pages are allocated by KVM TDX
and thus always have struct page.

However, it's also reasonable for KVM to reset those TDX control pages via
tdx_quirk_reset_paddr() directly, eliminating the need to export two
parallel APIs. Keeping tdx_quirk_reset_page() as a one-line helper in the
header file is also unnecessary.

No functional change intended.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Acked-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20260430015001.24242-1-yan.y.zhao@intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
3 weeks agox86/tdx: Use PFN directly for unmapping guest private memory
Sean Christopherson [Thu, 30 Apr 2026 01:49:48 +0000 (09:49 +0800)] 
x86/tdx: Use PFN directly for unmapping guest private memory

Remove struct page assumptions/constraints in APIs for unmapping guest
private memory and have them take physical address directly.

Having core TDX make assumptions that guest private memory must be backed
by struct page (and/or folio) will create subtle dependencies on how
KVM/guest_memfd allocates/manages memory (e.g., whether it uses memory
allocated from core MM, if the memory is refcounted, or if the folio is
split) that are easily avoided. [1].

KVM's MMUs work with PFNs. This is very much an intentional design choice.
It ensures that the KVM MMUs remain flexible and are not too tightly tied
to the regular CPU MMUs and the kernel code around them. Using
"struct page" for TDX guest memory is not a good fit anywhere near the KVM
MMU code [2].

Therefore, for unmapping guest private memory: export
tdx_quirk_reset_paddr() for direct KVM invocation, and convert the SEAMCALL
wrapper API tdh_phymem_page_wbinvd_hkid() to take PFN as input (thus
updating mk_keyed_paddr() and tdh_phymem_page_wbinvd_tdr()).

Intentionally have KVM pass PAGE_SIZE (rather than KVM_HPAGE_SIZE(level))
to tdx_quirk_reset_paddr() in tdx_sept_remove_private_spte() to avoid
mixing in huge page changes. The KVM_BUG_ON() check for !PG_LEVEL_4K in
tdx_sept_remove_private_spte() justifies using PAGE_SIZE.

Do not convert tdx_reclaim_page() to use PFN as input since it currently
does not remove guest private memory.

Use "kvm_pfn_t pfn" for type safety. Using this KVM type is appropriate
since APIs tdh_phymem_page_wbinvd_hkid() and tdx_quirk_reset_paddr() are
exported to KVM only.

[Yan: Use kvm_pfn_t,exclude tdx_reclaim_page(),use tdx_quirk_reset_paddr()]

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Link: https://lore.kernel.org/all/aWgyhmTJphGQqO0Y@google.com
Link: https://lore.kernel.org/all/ac7V0g2q2hN3dU5u@google.com
Acked-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20260430014948.24226-1-yan.y.zhao@intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
3 weeks agox86/tdx: Use PFN directly for mapping guest private memory
Sean Christopherson [Thu, 30 Apr 2026 01:49:29 +0000 (09:49 +0800)] 
x86/tdx: Use PFN directly for mapping guest private memory

Remove struct page assumptions/constraints in the SEAMCALL wrapper APIs for
mapping guest private memory and have them take PFN directly.

Having core TDX make assumptions that guest private memory must be backed
by struct page (and/or folio) will create subtle dependencies on how
KVM/guest_memfd allocates/manages memory (e.g., whether it uses memory
allocated from core MM, if the memory is refcounted, or if the folio is
split) that are easily avoided. [1].

KVM's MMUs work with PFNs. This is very much an intentional design choice.
It ensures that the KVM MMUs remain flexible and are not too tied to the
regular CPU MMUs and the kernel code around them. Using 'struct page' for
TDX guest memory is not a good fit anywhere near the KVM MMU code [2].

Use "kvm_pfn_t pfn" for type safety. Using this KVM type is appropriate
since APIs tdh_mem_page_add() and tdh_mem_page_aug() are exported to KVM
only.

[ Yan: Replace "u64 pfn" with "kvm_pfn_t pfn" ]

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Link: https://lore.kernel.org/all/aWgyhmTJphGQqO0Y@google.com
Link: https://lore.kernel.org/all/ac7V0g2q2hN3dU5u@google.com
Acked-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20260430014929.24210-1-yan.y.zhao@intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
3 weeks agoARM: rockchip: keep reset control around
Heiko Stuebner [Thu, 21 May 2026 21:09:15 +0000 (23:09 +0200)] 
ARM: rockchip: keep reset control around

Do not put the reset control, retain exclusive control over it.
After turning on a CPU, the corresponding reset line must stay
deasserted.

This also avoids calling reset_control_put() before workqueues
are operational.

Fixes: 78ebbff6d1a0 ("reset: handle removing supplier before consumers")
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Tested-by: Steven Price <steven.price@arm.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patch.msgid.link/20260521210915.2331176-1-heiko@sntech.de
3 weeks agoaudit: use 'unsigned int' instead of 'unsigned'
Ricardo Robaina [Thu, 14 May 2026 15:13:20 +0000 (12:13 -0300)] 
audit: use 'unsigned int' instead of 'unsigned'

Address checkpatch.pl warning below, across the audit subsystem:

  WARNING: Prefer 'unsigned int' to bare use of 'unsigned'

Minor cleanup, no functional changes.

Signed-off-by: Ricardo Robaina <rrobaina@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
3 weeks agoblk-mq: reinsert cached request to the list
Keith Busch [Tue, 26 May 2026 15:35:31 +0000 (08:35 -0700)] 
blk-mq: reinsert cached request to the list

A previous commit removed an optimization out of caution for a scenario
that turns out not to be real: all the "queue_exit" goto's are safe to
reinsert the request into the cached_rq's plug list as they are either
from a non-blocking path, or a successful merge that already holds the
queue reference. This optimization is most needed for small sequential
workloads that successfully merge into larger requests.

Fixes: dc278e9bf2b9 ("blk-mq: pop cached request if it is usable")
Suggested-by: Ming Lei <tom.leiming@gmail.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://patch.msgid.link/20260526153531.2365935-1-kbusch@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 weeks agocxl/test: Update mock dev array before calling platform_device_add()
Li Ming [Wed, 20 May 2026 12:14:57 +0000 (20:14 +0800)] 
cxl/test: Update mock dev array before calling platform_device_add()

CXL test environment hits the following error sometimes.

 cxl_mem mem9: endpoint7 failed probe

All mock memdevs are platform firmware devices added by cxl_test module,
and cxl_test module also provides a platform device driver for them to
create a memdev device to CXL subsystem. cxl_test module uses
cxl_rcd/mem_single/mem arrays to store different types of mock memdevs.
CXL drivers calls registered mock functions for a mock memdev by
checking if a given memdev is in these arrays.

When cxl_test module adds these mock memdevs, it always calls
platform_device_add() before adding them to a suitable mock memdev
array. However, there is a small window where CXL drivers calls mock
function for a added memdev before it added to a mock memdev array. In
above case, cxl endpoint driver considers a added memdev was not a mock
memdev, then calling devm_cxl_endpoint_decoders_setup() for it rather
than mock_endpoint_decoders_setup().

An appropriate solution is that adding a new mock device to a mock
device array before calling platform_device_add() for it. It can
guarantee the new mock device is visible to CXL subsystem.

This patch introduces a new helped called cxl_mock_platform_device_add()
to handle the issue, and uses the function for all mock devices addition.

Fixes: 3a2b97b3210b ("cxl/test: Improve init-order fidelity relative to real-world systems")
Signed-off-by: Li Ming <ming.li@zohomail.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Link: https://patch.msgid.link/20260520121457.234404-1-ming.li@zohomail.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
3 weeks agoMerge tag 'nfsd-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Linus Torvalds [Tue, 26 May 2026 20:49:13 +0000 (13:49 -0700)] 
Merge tag 'nfsd-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:
 "Regressions:

   - Tighten bounds checking for sunrpc cache hash tables

   - Don't report key material in the ftrace log

  Stable fix:

   - Fix lockd's implementation of the NLM TEST procedure"

* tag 'nfsd-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  lockd: fix TEST handling when not all permissions are available.
  NFSD: Report whether fh_key was actually updated
  sunrpc: prevent out-of-bounds read in __cache_seq_start()

3 weeks agomodule, riscv: force sh_addr=0 for arch-specific sections
Petr Pavlu [Fri, 27 Mar 2026 07:59:03 +0000 (08:59 +0100)] 
module, riscv: force sh_addr=0 for arch-specific sections

When linking modules with 'ld.bfd -r', sections defined without an address
inherit the location counter, resulting in non-zero sh_addr values in the
resulting .ko files. Relocatable objects are expected to have sh_addr=0 for
all sections. Non-zero addresses are confusing in this context, typically
worse compressible, and may cause tools to misbehave [1].

Force sh_addr=0 for all riscv-specific module sections.

Link: https://sourceware.org/bugzilla/show_bug.cgi?id=33958
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
3 weeks agomodule, m68k: force sh_addr=0 for arch-specific sections
Petr Pavlu [Fri, 27 Mar 2026 07:59:02 +0000 (08:59 +0100)] 
module, m68k: force sh_addr=0 for arch-specific sections

When linking modules with 'ld.bfd -r', sections defined without an address
inherit the location counter, resulting in non-zero sh_addr values in the
resulting .ko files. Relocatable objects are expected to have sh_addr=0 for
all sections. Non-zero addresses are confusing in this context, typically
worse compressible, and may cause tools to misbehave [1].

Force sh_addr=0 for all m68k-specific module sections.

Link: https://sourceware.org/bugzilla/show_bug.cgi?id=33958
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
3 weeks agomodule, arm64: force sh_addr=0 for arch-specific sections
Petr Pavlu [Fri, 27 Mar 2026 07:59:01 +0000 (08:59 +0100)] 
module, arm64: force sh_addr=0 for arch-specific sections

When linking modules with 'ld.bfd -r', sections defined without an address
inherit the location counter, resulting in non-zero sh_addr values in the
resulting .ko files. Relocatable objects are expected to have sh_addr=0 for
all sections. Non-zero addresses are confusing in this context, typically
worse compressible, and may cause tools to misbehave [1].

Force sh_addr=0 for all arm64-specific module sections.

Link: https://sourceware.org/bugzilla/show_bug.cgi?id=33958
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
3 weeks agomodule, arm: force sh_addr=0 for arch-specific sections
Petr Pavlu [Fri, 27 Mar 2026 07:59:00 +0000 (08:59 +0100)] 
module, arm: force sh_addr=0 for arch-specific sections

When linking modules with 'ld.bfd -r', sections defined without an address
inherit the location counter, resulting in non-zero sh_addr values in the
resulting .ko files. Relocatable objects are expected to have sh_addr=0 for
all sections. Non-zero addresses are confusing in this context, typically
worse compressible, and may cause tools to misbehave [1].

Force sh_addr=0 for all arm-specific module sections.

Link: https://sourceware.org/bugzilla/show_bug.cgi?id=33958
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
3 weeks agoMerge tag 'linux_kselftest-kunit-fixes-7.1-rc6' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Tue, 26 May 2026 20:37:26 +0000 (13:37 -0700)] 
Merge tag 'linux_kselftest-kunit-fixes-7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kunit fix from Shuah Khan:
 "Fix a use-after-free in kunit debugfs when using kunit.filter when the
  executor frees dynamically allocated resources after running boot-time
  tests. This resulted in fatal hardware exception due to invalidation
  of capability flags on the reclaimed memory on some architectures such
  as CHERI RISC-V that support the feature, and silent memory corruption
  on others.

  The fix for this couples the lifetime of the filtered suite memory
  allocation to the lifetime of the kunit subsystem and its associated
  VFS nodes. Ownership of the boot-time suite_set is now transferred to
  a global tracker ('kunit_boot_suites'), and the memory is cleanly
  released in kunit_exit() during module teardown"

* tag 'linux_kselftest-kunit-fixes-7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kunit: fix use-after-free in debugfs when using kunit.filter

3 weeks agox86/microcode: Do not access MSR_IA32_PLATFORM_ID when running as a guest
Borislav Petkov [Wed, 13 May 2026 20:06:01 +0000 (22:06 +0200)] 
x86/microcode: Do not access MSR_IA32_PLATFORM_ID when running as a guest

Patch in Fixes: causes the usual:

  unchecked MSR access error: RDMSR from 0x17 at ... (intel_get_platform_id)
  Call Trace:
   early_init_intel
   early_cpu_init
   setup_arch
   _printk
   start_kernel
   x86_64_start_reservations
   x86_64_start_kernel
   common_startup_64

because the kernel is booted in a guest.

In order to avoid it, this MSR access needs to be prevented when running
virtualized. That is usually done by checking X86_FEATURE_HYPERVISOR but
for this particular case it is too early yet.

The platform ID needs to be read as early as when microcode is loaded on
the BSP:

  load_ucode_bsp ... -> get_microcode_blob ... -> intel_find_matching_signature

and by that time, CPUID leafs haven't been parsed yet.

The microcode loader already has logic to check early whether the kernel
is running virtualized so make that globally available to arch/x86/. The
query whether running virtualized is getting more and more prominent in
recent times so might as well make it an arch-global var which the rest
of the code can use.

Fixes: d8630b67ca1ed ("x86/cpu: Add platform ID to CPU info structure")
Reported-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Tested-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://lore.kernel.org/all/20260430020953.1405535-1-binbin.wu@linux.intel.com
3 weeks agoirqchip/gic-v4: Don't advertise VLPIs if no ITS is probed
Mostafa Saleh [Tue, 26 May 2026 12:53:17 +0000 (12:53 +0000)] 
irqchip/gic-v4: Don't advertise VLPIs if no ITS is probed

When accidentally setting “kvm-arm.vgic_v4_enable=1” on a system that has
no MSI controller device tree node and GICv4, it results a panic as
“gic_domain” is NULL and the kernel attempts to access it.

    Unable to handle kernel NULL pointer dereference at virtual address 0000000000000028
    Mem abort info:
      ESR = 0x0000000096000006

    CPU: 1 UID: 0 PID: 295 Comm: lkvm-static Not tainted 7.1.0-rc4-ge3f15ad3970e #5 PREEMPT
    Hardware name: linux,dummy-virt (DT)
    pstate: 81402005 (Nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
    pc : __irq_domain_instantiate+0x1d4/0x578
    lr : __irq_domain_instantiate+0x1cc/0x578

Set vLPI support to false at init time if the host has no ITS, so it
propagates properly to kvm_vgic_global_state.has_gicv4.

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://patch.msgid.link/20260526125317.3672297-1-smostafa@google.com
3 weeks agodrm/xe: Assign queue name in time for drm_sched_init
Tvrtko Ursulin [Sat, 23 May 2026 10:34:18 +0000 (11:34 +0100)] 
drm/xe: Assign queue name in time for drm_sched_init

Currently the queue name is only assigned after the drm scheduler instance
has been created. This loses information with all logging or debug
workqueue facilities so lets re-order things a bit so the name gets
assigned in time.

To be able to assign a GuC ID early we split the allocation into
reservation and publish phases.

First, with the submission state lock held, we reserve the ID in the GuC
ID manager, which serves as an authoritative source of truth. Then we can
drop the lock and reserve entries in the exec queue lookup XArray. This
can be lockless since the NULL entries are invisible both to the kernel
and userspace. Only after the queue has been fully created we replace the
reserved entries with the queue pointer, which can be done locklessly for
single width queues.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20260523103418.61832-1-tvrtko.ursulin@igalia.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
3 weeks agoKVM: x86: Widen x86_exception's error_code to 64 bits
Kevin Cheng [Fri, 22 May 2026 23:26:57 +0000 (16:26 -0700)] 
KVM: x86: Widen x86_exception's error_code to 64 bits

Widen the error_code field in struct x86_exception from u16 to u64 to
accommodate AMD's NPF error code, which defines information bits above
bit 31, e.g. PFERR_GUEST_FINAL_MASK (bit 32), and PFERR_GUEST_PAGE_MASK
(bit 33).

Retain the u16 type for the local errcode variable in walk_addr_generic
as the walker synthesizes conventional #PF error codes that are
architecturally limited to bits 15:0.

Signed-off-by: Kevin Cheng <chengkev@google.com>
Link: https://patch.msgid.link/20260522232701.3671446-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
3 weeks agoKVM: selftests: hyperv_features: test write of 1 to HV_X64_MSR_RESET
Piotr Zarycki [Sat, 23 May 2026 11:18:57 +0000 (13:18 +0200)] 
KVM: selftests: hyperv_features: test write of 1 to HV_X64_MSR_RESET

Writing 1 to HV_X64_MSR_RESET triggers a real vCPU reset; the test
was writing 0 because the host loop was not prepared to handle the
resulting KVM_EXIT_SYSTEM_EVENT. Add the missing handling and write
1 to actually exercise the reset path.

Signed-off-by: Piotr Zarycki <piotr.zarycki@gmail.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://patch.msgid.link/20260523111857.195396-1-piotr.zarycki@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
3 weeks agocall_once:: Fix typo in comment for call_once()
Jiun Jeong [Fri, 1 May 2026 14:44:13 +0000 (23:44 +0900)] 
call_once:: Fix typo in comment for call_once()

Change "succesfully" to "successfully" in the kerneldoc
comment of call_once().

Signed-off-by: Jiun Jeong <jiun.jeong.cs@gmail.com>
Link: https://patch.msgid.link/20260501144413.49419-1-jiun.jeong.cs@gmail.com
[sean: don't scope to KVM, massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
3 weeks agoKVM: selftests: Randomize dirty_log_test's delay before reaping the bitmap
Sean Christopherson [Fri, 22 May 2026 17:02:30 +0000 (10:02 -0700)] 
KVM: selftests: Randomize dirty_log_test's delay before reaping the bitmap

In the dirty log test, randomize the delay before the initial call to get
the dirty log bitmap for a given iteration, so that the amount of memory
dirtied by the guest varies from iteration to iteration, and so that the
user can effectively control the duration (by increasing the interval).

Always waiting 1ms effectively hides a KVM RISC-V bug as the test reaps the
dirty bitmap before the guest has a chance to trigger the problematic flow
in KVM.

Reported-by: Wu Fei <wu.fei9@sanechips.com.cn>
Closes: https://lore.kernel.org/all/202605111130.64BBUXDN013040@mse-fl2.zte.com.cn
Cc: Wu Fei <atwufei@163.com>
Link: https://patch.msgid.link/20260522170230.3518669-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
3 weeks agoKVM: selftests: Add and use kvm_free_fd() to harden against fd goofs
Sean Christopherson [Fri, 22 May 2026 17:15:35 +0000 (10:15 -0700)] 
KVM: selftests: Add and use kvm_free_fd() to harden against fd goofs

Add a kvm_free_fd() macro to close and invalidate a file descriptor, and
use it through the core infrastructure to harden against goofs where a
selftest attempts to reuse a closed file descriptor.

Cc: Bibo Mao <maobibo@loongson.cn>
Cc: Fuad Tabba <tabba@google.com>
Cc: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Link: https://patch.msgid.link/20260522171535.3525890-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
3 weeks agoKVM: selftests: Cast guest_memfd fd to a signed int when checking for >= 0
Sean Christopherson [Fri, 22 May 2026 17:15:34 +0000 (10:15 -0700)] 
KVM: selftests: Cast guest_memfd fd to a signed int when checking for >= 0

When conditionally closing a memory region's guest_memfd file descriptor,
cast the field to a signed it so that negative values are correctly
detected.  Because selftests reuse "struct kvm_userspace_memory_region2"
instead of providing custom storage, they pick up the kernel uAPI's __u32
definition of the file descriptor, not the more common "int" definition,
e.g. that's used for userspace_mem_region.fd.

Fixes: bb2968ad6c33 ("KVM: selftests: Add support for creating private memslots")
Reported-by: Bibo Mao <maobibo@loongson.cn>
Closes: https://lore.kernel.org/all/20260508015013.4108345-1-maobibo@loongson.cn
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Link: https://patch.msgid.link/20260522171535.3525890-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
3 weeks agoKVM: selftests: Remove unnecessary "%s" formatting of a constant string
Sean Christopherson [Fri, 22 May 2026 17:21:51 +0000 (10:21 -0700)] 
KVM: selftests: Remove unnecessary "%s" formatting of a constant string

Drop superfluous %s formatting from assertions in the guest_memfd overlap
testcases, as the string being printed doesn't require runtime formatting.

No functional change intended.

Reported-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Link: https://patch.msgid.link/20260522172151.3530267-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
3 weeks agoKVM: selftests: Test guest_memfd binding overlap without GPA overlap
Zongyao Chen [Fri, 22 May 2026 17:21:50 +0000 (10:21 -0700)] 
KVM: selftests: Test guest_memfd binding overlap without GPA overlap

The guest_memfd binding overlap test recreates the deleted slot with GPA
ranges that overlap the still-live slot.  KVM rejects those attempts from
the generic memslot overlap check before reaching kvm_gmem_bind(), so the
test can pass even if guest_memfd binding overlap detection is broken.

Recreate the slot at its original, non-overlapping GPA and use guest_memfd
offsets that overlap the front and back halves of the other slot's binding.
Expand the guest_memfd so the back-half case remains within the file size.

Fixes: 2feabb855df8 ("KVM: selftests: Expand set_memory_region_test to validate guest_memfd()")
Signed-off-by: Zongyao Chen <ZongYao.Chen@linux.alibaba.com>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>
[sean: keep the existing GPA overlap testcases]
Link: https://patch.msgid.link/20260522172151.3530267-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
3 weeks agoKVM: guest_memfd: Return -EEXIST for overlapping bindings
Zongyao Chen [Fri, 22 May 2026 17:21:49 +0000 (10:21 -0700)] 
KVM: guest_memfd: Return -EEXIST for overlapping bindings

KVM_SET_USER_MEMORY_REGION2 rejects guest_memfd ranges that overlap an
existing binding, but kvm_gmem_bind() currently reports the failure through
its generic -EINVAL path.  That makes binding conflicts indistinguishable
from malformed guest_memfd parameters.

Return -EEXIST when the target guest_memfd range is already bound, matching
the errno used for overlapping GPA memslots and making the two types of
range conflicts report the same class of error to userspace.

Note, returning -EINVAL was definitely not intentional, as guest_memfd
support was accompanied by a selftest to verify that attempting to create
overlapping bindings fails with -EEXIST.  Except the selftest was also
flawed in that it unintentionally overlapped memslot GPAs, and so failed
on KVM's common memslot checks before reaching guest_memfd.

Fixes: a7800aa80ea4 ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory")
Signed-off-by: Zongyao Chen <ZongYao.Chen@linux.alibaba.com>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>
[sean: call out that the original intent was to return -EEXIST]
Link: https://patch.msgid.link/20260522172151.3530267-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
3 weeks agoselftests/nolibc: test against -Wwrite-strings
Thomas Weißschuh [Mon, 25 May 2026 08:27:17 +0000 (10:27 +0200)] 
selftests/nolibc: test against -Wwrite-strings

Users may use this warning when building their own applications.
Make sure that nolibc does not trigger any such warnings.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260525-nolibc-write-strings-v2-3-ab5cc16c7b23@weissschuh.net
3 weeks agoselftests/nolibc: use mutable buffer for execve() argv string
Thomas Weißschuh [Mon, 25 May 2026 08:27:16 +0000 (10:27 +0200)] 
selftests/nolibc: use mutable buffer for execve() argv string

The existing code would trigger a warning under -Wwrite-strings which is
about to be enabled. Use a mutable buffer instead. While in this
specific case, casting away the 'const' would be fine, let's avoid casts
which are not really necessary.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260525-nolibc-write-strings-v2-2-ab5cc16c7b23@weissschuh.net
3 weeks agotools/nolibc: cast default values of program_invocation_name
Thomas Weißschuh [Mon, 25 May 2026 08:27:15 +0000 (10:27 +0200)] 
tools/nolibc: cast default values of program_invocation_name

With -Wwrite-strings the plain assignment triggers a warning as a
'const char *' is assigned to a 'char *', removing the const qualifier.

Casting the const away is fine, as there is no valid modification that
can be done to an empty string anyways.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260525-nolibc-write-strings-v2-1-ab5cc16c7b23@weissschuh.net
3 weeks agospi: dt-bindings: spi-qpic-snand: Add ipq5210 compatible
Varadarajan Narayanan [Thu, 14 May 2026 06:45:31 +0000 (12:15 +0530)] 
spi: dt-bindings: spi-qpic-snand: Add ipq5210 compatible

Since the QPIC-SPI-NAND flash controller present in ipq5210 is the same
as the one found in ipq9574, document the ipq5210 compatible and with
ipq9574 as the fallback.

Signed-off-by: Varadarajan Narayanan <varadarajan.narayanan@oss.qualcomm.com>
Link: https://patch.msgid.link/20260514-ipq5210-nand-v1-1-cbdd7492e826@oss.qualcomm.com
Signed-off-by: Mark Brown <broonie@kernel.org>
3 weeks agothermal: intel: int340x: Check return value of ptc_create_groups()
Aravind Anilraj [Sun, 29 Mar 2026 07:06:42 +0000 (03:06 -0400)] 
thermal: intel: int340x: Check return value of ptc_create_groups()

proc_thermal_ptc_add() ignores the return value of ptc_create_groups()
causing the driver to silenty continue even if sysfs group creation
fails.

The thermal control interface would be unavailable with no indication
of failure.

Check the return value and on failure clean up any sysfs groups that
were successfully created before the error, then propagate the error to
the caller which already handles it correctly via goto err_rem_rapl.

Signed-off-by: Aravind Anilraj <aravindanilraj0702@gmail.com>
Link: https://patch.msgid.link/20260329070642.10721-3-aravindanilraj0702@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
3 weeks agothermal: intel: int340x: Fix potential shift overflow in ptc_mmio_write()
Aravind Anilraj [Sun, 29 Mar 2026 07:06:41 +0000 (03:06 -0400)] 
thermal: intel: int340x: Fix potential shift overflow in ptc_mmio_write()

The value parameter is u32 but is shifted into a u64 register value
without casting first. If the shift amount pushes bits beyond 32, they
are lost. Cast value to u64 before shifting to ensure all bits are
preserved.

Signed-off-by: Aravind Anilraj <aravindanilraj0702@gmail.com>
Link: https://patch.msgid.link/20260329070642.10721-2-aravindanilraj0702@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>