git.ipfire.org Git - thirdparty/kernel/linux.git/log

Merge tag 'hwmon-for-v7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

- ads7871: Fix endianness bug in 16-bit register reads

- lm75: Fix configuration register writes and AS6200/TMP112 setup and
   alarm handling

- lm63: Fix TOCTOU problems

- corsair-psu: Close HID device on probe errors

- ltc2992: Fix overflow and threshold range

- Documentation: fix link to ideapad-laptop.c file

- Remove stale CONFIG_SENSORS_SBRMI Makefile reference

* tag 'hwmon-for-v7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (ads7871) Fix endianness bug in 16-bit register reads
  hwmon: (lm75) Fix configuration register writes.
  hwmon: (lm75) Fix AS6200 and TMP112 setup and alarm handling
  hwmon: (lm63) Add locking to avoid TOCTOU
  hwmon: (corsair-psu) Close HID device on probe errors
  hwmon: Remove stale CONFIG_SENSORS_SBRMI Makefile reference
  Documentation: hwmon: fix link to ideapad-laptop.c file
  hwmon: (ltc2992) Fix u32 overflow in power read path
  hwmon: (ltc2992) Clamp threshold writes to hardware range

Merge tag 'staging-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
"Here are two small staging driver fixes for 7.1-rc3.  They are:

   - vme_user root device leak fix

   - NULL dereference bugfix in the rtl8723bs driver

  Both of these have been in linux-next all this week with no reported
  issues"

* tag 'staging-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: rtl8723bs: os_dep: avoid NULL pointer dereference in rtw_cbuf_alloc
  staging: vme_user: fix root device leak on init failure

Merge tag 'usb-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB driver fixes from Greg KH:
"Here are some small USB driver fixes for 7.1-rc3 to resolve some
  reported issues, and a new device id. These are:

   - usblp driver heap leak fixes

   - ulpi driver memory leak fix

   - typec driver fixes

   - dwc3 driver fix

   - omap dma driver fix

   - new option driver device id addition

  All of these have been in linux-next for over a week with no reported
  issues"

* tag 'usb-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: serial: option: add Telit Cinterion LE910Cx compositions
  usb: usblp: fix uninitialized heap leak via LPGETSTATUS ioctl
  usb: usblp: fix heap leak in IEEE 1284 device ID via short response
  usb: dwc3: Move GUID programming after PHY initialization
  usb: typec: tcpm: fix debug accessory mode detection for sink ports
  usb: typec: tcpm: reset internal port states on soft reset AMS
  usb: ulpi: fix memory leak on ulpi_register() error paths
  USB: omap_udc: DMA: Don't enable burst 4 mode

Merge tag 'i2c-for-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:

- sanitize more input parameters in the core (found by syzkaller)

- usual set of driver fixes (proper completion handling, applying
   quirks, correct workqueue selection...)

- ID additions to simplify dependency handling

- new email address for Peter Rosin

* tag 'i2c-for-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: smbus: reject oversized block transfers in the common path
  MAINTAINERS: Update mail for Peter Rosin
  i2c: stub: Reject I2C block transfers with invalid length
  i2c: Compare the return value of gpiod_get_direction against GPIO_LINE_DIRECTION_OUT
  i2c: dev: prevent integer overflow in I2C_TIMEOUT ioctl
  i2c: acpi: Add ELAN0678 to i2c_acpi_force_100khz_device_ids
  dt-bindings: i2c: apple,i2c: Add t8122 compatible
  i2c: stm32f7: reinit_completion() per transfer not per msg
  dt-bindings: i2c: amlogic: Add compatible for T7 SOC
  i2c: testunit: Replace system_long_wq with system_dfl_long_wq

Merge tag 'powerpc-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Madhavan Srinivasan:

- Fix KASAN sanitization flag for core_$(BITS).o

- Fixes for handling offset values in pseries htmdump

- Fix interrupt mask in cpm1_gpiochip_add16()

- ps3/pasemi fixes to drop redundant result assignment

- Fixes in papr-hvpipe code path

- powerpc/perf: Update check for PERF_SAMPLE_DATA_SRC marked events

Thanks to Aboorva Devarajan, Athira Rajeev, Christophe Leroy (CS GROUP),
Geert Uytterhoeven, Haren Myneni, Krzysztof Kozlowski, Mukesh Kumar
Chaurasiya (IBM), Nathan Chancellor, Ritesh Harjani (IBM), Shivani
Nittor, Sourabh Jain, Thomas Zimmermann, and Venkat Rao Bagalkote.

* tag 'powerpc-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (21 commits)
  powerpc/pasemi: Drop redundant res assignment
  powerpc/ps3: Drop redundant result assignment
  powerpc/vdso: Drop -DCC_USING_PATCHABLE_FUNCTION_ENTRY from 32-bit flags with clang
  arch/powerpc: Drop CONFIG_FIRMWARE_EDID from defconfig files
  powerpc/perf: Update check for PERF_SAMPLE_DATA_SRC marked events
  powerpc/8xx: Fix interrupt mask in cpm1_gpiochip_add16()
  powerpc/vmx: avoid KASAN instrumentation in enter_vmx_ops() for kexec
  powerpc/kdump: fix KASAN sanitization flag for core_$(BITS).o
  pseries/papr-hvpipe: Fix style and checkpatch issues in enable_hvpipe_IRQ()
  pseries/papr-hvpipe: Refactor and simplify hvpipe_rtas_recv_msg()
  pseries/papr-hvpipe: Kill task_struct pointer from struct hvpipe_source_info
  pseries/papr-hvpipe: Simplify spin unlock usage in papr_hvpipe_handle_release()
  pseries/papr-hvpipe: Fix the usage of copy_to_user()
  pseries/papr-hvpipe: Fix & simplify error handling in papr_hvpipe_init()
  pseries/papr-hvpipe: Fix null ptr deref in papr_hvpipe_dev_create_handle()
  pseries/papr-hvpipe: Prevent kernel stack memory leak to userspace
  pseries/papr-hvpipe: Fix race with interrupt handler
  powerpc/pseries/htmdump: Add memory configuration dump support to htmdump module
  powerpc/pseries/htmdump: Fix the offset value used in htm status dump
  powerpc/pseries/htmdump: Fix the offset value used in processor configuration dump
  ...

Merge tag 'x86-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:

- Fix memory map enumeration bug in the Xen e820 parsing code (Juergen
   Gross)

- Re-enable e820 BIOS fallback if e820 table is empty (David Gow)

* tag 'x86-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot/e820: Re-enable BIOS fallback if e820 table is empty
  x86/xen: Fix a potential problem in xen_e820_resolve_conflicts()

ntfs: fix missing kstrdup() error check in ntfs_write_volume_label()

ntfs_write_volume_label() does not check the return value of
kstrdup().  If the allocation fails, vol->volume_label is set to
NULL while the function returns success.  A subsequent
FS_IOC_GETFSLABEL then returns an empty string even though the
on-disk label was updated correctly.

Fix by allocating the new label before taking vol_ni->mrec_lock and
updating any on-disk metadata, so an -ENOMEM from kstrdup() leaves
both the in-memory and on-disk labels untouched and consistent.  On
success the preallocated copy replaces the old vol->volume_label.
Also move mark_inode_dirty_sync() into the success path so that it
is not called when no metadata was actually modified.

Fixes: 6251f0b0de7d ("ntfs: update super block operations")
Suggested-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Zhan Xusheng <zhanxusheng@xiaomi.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

Merge tag 'timers-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Ingo Molnar:
"Fix CPU hotplug activation race in the timer migration code, by
Frederic Weisbecker"

* tag 'timers-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers/migration: Fix another hotplug activation race

Merge tag 'sched-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:

- Fix spurious failures in rseq self-tests (Mark Brown)

- Fix rseq rseq::cpu_id_start ABI regression due to TCMalloc's creative
   use of the supposedly read-only field

   The fix is to introduce a new ABI variant based on a new (larger)
   rseq area registration size, to keep the TCMalloc use of rseq
   backwards compatible on new kernels (Thomas Gleixner)

- Fix wakeup_preempt_fair() for not waking up task (Vincent Guittot)

- Fix s64 mult overflow in vruntime_eligible() (Zhan Xusheng)

* tag 'sched-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Fix wakeup_preempt_fair() for not waking up task
  sched/fair: Fix overflow in vruntime_eligible()
  selftests/rseq: Expand for optimized RSEQ ABI v2
  rseq: Reenable performance optimizations conditionally
  rseq: Implement read only ABI enforcement for optimized RSEQ V2 mode
  selftests/rseq: Validate legacy behavior
  selftests/rseq: Make registration flexible for legacy and optimized mode
  selftests/rseq: Skip tests if time slice extensions are not available
  rseq: Revert to historical performance killing behaviour
  rseq: Don't advertise time slice extensions if disabled
  rseq: Protect rseq_reset() against interrupts
  rseq: Set rseq::cpu_id_start to 0 on unregistration
  selftests/rseq: Don't run tests with runner scripts outside of the scripts

Merge tag 'perf-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf events fixes from Ingo Molnar:

- Fix deadlock in the perf_mmap() failure path (Peter Zijlstra)

- Intel ACR (Auto Counter Reload) fixes (Dapeng Mi):
     - Fix validation and configuration of ACR masks
     - Fix ACR rescheduling bug causing stale masks
     - Disable the PMI on ACR-enabled hardware
     - Enable ACR on Panther Cover uarch too

* tag 'perf-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Enable auto counter reload for DMR
  perf/x86/intel: Disable PMI for self-reloaded ACR events
  perf/x86/intel: Always reprogram ACR events to prevent stale masks
  perf/x86/intel: Improve validation and configuration of ACR masks
  perf/core: Fix deadlock in perf_mmap() failure path

net: wan: fsl_ucc_hdlc: free tx_skbuff in uhdlc_memclean

When the device is removed all allocated resources should be freed.
In uhdlc_memclean the netdev transmit queue was already stopped. But at
this point we may have pending skb in the transmit queue which must be
freed. Therefore iterate over the tx_skbuff pointers and free all
pending skb. The issue was discovered by sashiko.
Tested on a ls1043a board running HDLC in bus mode on kernel 6.12.

https: //sashiko.dev/#/patchset/20260429114208.941011-1-holger.brunck%40hitachienergy.com
Fixes: c19b6d246a35 ("drivers/net: support hdlc function for QE-UCC")
Signed-off-by: Holger Brunck <holger.brunck@hitachienergy.com>
Link: https://patch.msgid.link/20260507155332.3452319-1-holger.brunck@hitachienergy.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'nf-26-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following batch contains Netfilter fixes for net:

1) Allow initial x_tables table replacement without emitting an audit
   log message. Delay the register message until after hooks are wired up
   to avoid unnecessary unregister logs during error unwinding.

2) Fix a NULL dereference by allocating hook ops before adding the
   table to the per-netns list. Use `synchronize_rcu()` during error
   unwinding to ensure the table stops processing packets before
   teardown. Defer audit log register message until all operations
   succeed.

3) Refactor xtables to use a single `xt_unregister_table_pre_exit`
   function. Eliminate code duplication by centralizing table
   unregistration logic within the xtables core. ebtables cannot be
   changed due to incompatibility.

4) Unregister xtables templates before module removal. This prevents
   a race condition where userspace instantiates a new table after the
   pernet unreg removed the current table.

5) Add `xtables_unregister_table_exit` to fully unregister netfilter
   tables during module removal. Unlink the table from dying lists,
   then free hook operations.

6) Implement a two-stage removal scheme for ebtables following the
   x_tables pattern. Assign table->ops while holding the ebt mutex to
   prevent exposing partially-filled structures.

7) Fix ebtables module initialization race. Register the template last
   in table initialization functions. Prevent table instantiation before
   pernet operations are available.

8) Fix a race condition in x_tables module initialization. Ensure
   pernet ops are fully set up before exposing the table to userspace.

9) Fix a race condition in ebtables module initialization, similar to
   previous patch.

10) Restore propagation of helper to expected connection, this is a
    fix-for-recent-fix.

11) Validate that the expectation tuple and mask netlink attributes are
    present when adding expectation via nfqueue, this fixes a possible
    null-ptr-deref.

12) Fix possible rare memleak in the SIP helper in case helper has been
    detached from conntrack entry, from Li Xiasong.

13) Fix refcount leak in nft_ct when creating custom expectation, also
    from Li Xiason.

Patches 1-9 from Florian Westphal.

10) Restore propagation of helper to expected connection, this is a
    fix-for-recent-fix.

11) Check that tuple and mask netlink attributes are set when creating an
    expectation via nfqueue.

* tag 'nf-26-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nft_ct: fix missing expect put in obj eval
  netfilter: nf_conntrack_sip: get helper before allocating expectation
  netfilter: ctnetlink: check tuple and mask in expectations created via nfqueue
  netfilter: nf_conntrack_expect: restore helper propagation via expectation
  netfilter: bridge: eb_tables: close module init race
  netfilter: x_tables: close dangling table module init race
  netfilter: ebtables: close dangling table module init race
  netfilter: ebtables: move to two-stage removal scheme
  netfilter: x_tables: add and use xtables_unregister_table_exit
  netfilter: x_tables: unregister the templates first
  netfilter: x_tables: add and use xt_unregister_table_pre_exit
  netfilter: x_tables: allocate hook ops while under mutex
  netfilter: x_tables: allow initial table replace without emitting audit log message
====================

Link: https://patch.msgid.link/20260507234509.603182-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

drbd: replace genl_magic with explicit netlink serialization

Replace the genl_magic multi-include macro system with explicit
serialization and parsing.

The *_gen files were initially produced from a YNL spec via a
customized ynl-gen-c, but the DRBD netlink family is effectively
frozen, so the generator is kept unmodified.
All new functionality will land in a separate, properly-designed
family.
Carry the resulting code as ordinary in-tree source rather than
landing the spec and generator changes that produced it.

The bulk of the changes are mechanical renames to fit the YNL naming
conventions:
  - Handler functions: drbd_adm_* -> drbd_nl_*_doit/dumpit
  - GENL_MAGIC_VERSION -> DRBD_FAMILY_VERSION
  - GENL_MAGIC_FAMILY_HDRSZ -> sizeof(struct drbd_genlmsghdr)
  - drbd_genl_family -> drbd_nl_family
  - Attribute IDs: T_* -> DRBD_A_*

Remove the nested_attr_tb static global buffer and move to a per-call
allocation approach: each deserialization manages its own nested
attribute table. This will be needed anyway when we eventually move
to parallel_ops, and it's actually simpler this way, so make the
move now.

Replace the functionality of the "sensitive" flag: this was only used
by a single field (shared_secret); open-code redaction logic for that
locally.

Also replace the "invariant" flag: this only had a couple of users,
and those basically never change. Hard code the check directly inline.

The genl_family struct itself is defined manually in drbd_nl.c.

Also replace a couple of drbd-specific wrappers (nla_put_u64_0pad,
drbd_nla_find_nested) with standard kernel functions while we're at
it.

Finally, completely remove the genl_magic system; DRBD was its only
user.

Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20260506124541.1951772-3-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

drbd: move UAPI headers to include/uapi/linux/

drbd.h and drbd_limits.h contain only type definitions, enums, and
constants shared between kernel and userspace. These should be part of
UAPI.

Split the genl_api header into two: the genlmsghdr and the enums are
UAPI, the rest stays there for now (it will be removed by one of the
next commits in this series).

drbd_config.h is clearly DRBD-internal, so move it there.

Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20260506124541.1951772-2-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

sctp: revalidate list cursor after sctp_sendmsg_to_asoc() in SCTP_SENDALL

The SCTP_SENDALL path in sctp_sendmsg() iterates ep->asocs with
list_for_each_entry_safe(), which caches the next entry in @tmp before
the loop body runs.  The body calls sctp_sendmsg_to_asoc(), which may
drop the socket lock inside sctp_wait_for_sndbuf().

While the lock is dropped, another thread can SCTP_SOCKOPT_PEELOFF the
association cached in @tmp, migrating it to a new endpoint via
sctp_sock_migrate() (list_del_init() + list_add_tail() to
newep->asocs), and optionally close the new socket which frees the
association via kfree_rcu().  The cached @tmp can also be freed by a
network ABORT for that association, processed in softirq while the
lock is dropped.

sctp_wait_for_sndbuf() revalidates @asoc (the current entry) on re-lock
via the "sk != asoc->base.sk" and "asoc->base.dead" checks, but nothing
revalidates @tmp.  After a successful return, the iterator advances to
the stale @tmp, yielding either a use-after-free (if the peeled socket
was closed) or a list-walk onto the new endpoint's list head (type
confusion of &newep->asocs as a struct sctp_association *).

Both are reachable from CapEff=0; the type-confusion path gives
controlled indirect call via the outqueue.sched->init_sid pointer.

Fix by re-deriving @tmp from @asoc after sctp_sendmsg_to_asoc()
returns.  @asoc is known to still be on ep->asocs at that point: the
only callers that list_del an association from ep->asocs are
sctp_association_free() (which sets asoc->base.dead) and
sctp_assoc_migrate() (which changes asoc->base.sk), and
sctp_wait_for_sndbuf() checks both under the lock before any
successful return; a tripped check propagates as err < 0 and the loop
bails before the re-derive.

The SCTP_ABORT path in sctp_sendmsg_check_sflags() returns 0 and the
loop hits 'continue' before sctp_sendmsg_to_asoc() is ever called, so
the @tmp cached by list_for_each_entry_safe() still covers the
lock-held free that ba59fb027307 ("sctp: walk the list of asoc
safely") was added for.

Fixes: 4910280503f3 ("sctp: add support for snd flag SCTP_SENDALL process in sendmsg")
Cc: stable@vger.kernel.org
Signed-off-by: Ben Morris <bmorris@anthropic.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20260508001455.3137-1-joycathacker@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ti: icssm-prueth: fix eth_ports_node leak in probe

The error path on of_property_read_u32() failure inside
icssm_prueth_probe() returns without putting eth_ports_node,
which was acquired before the for_each_child_of_node() loop.

Drop it before returning.

Fixes: 511f6c1ae093 ("net: ti: icssm-prueth: Adds ICSSM Ethernet driver")
Signed-off-by: Shitalkumar Gandhi <shitalkumar.gandhi@cambiumnetworks.com>
Link: https://patch.msgid.link/20260506195813.641610-1-shitalkumar.gandhi@cambiumnetworks.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: lan966x: avoid unregistering netdev on register failure

lan966x_probe_port() stores the newly allocated net_device in the
port before calling register_netdev(). If register_netdev() fails,
the probe error path calls lan966x_cleanup_ports(), which sees
port->dev and calls unregister_netdev() for a device that was never
registered.

Destroy the phylink instance created for this port and clear port->dev
before returning the registration error. The common cleanup path now skips
ports without port->dev before reaching the registered netdev cleanup, so
it only handles ports that reached the registered-netdev lifetime.

This also avoids treating an uninitialized FDMA netdev and the failed port
as a NULL == NULL match in the common cleanup path.

Fixes: d28d6d2e37d1 ("net: lan966x: add port module support")
Co-developed-by: Ijae Kim <ae878000@gmail.com>
Signed-off-by: Ijae Kim <ae878000@gmail.com>
Signed-off-by: Myeonghun Pak <mhun512@gmail.com>
Link: https://patch.msgid.link/20260506124331.31945-1-mhun512@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fix from Catalin Marinas:

- ptrace(PTRACE_SETREGSET) fix to zero the target's fpsimd_state rather
than the tracer's

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64/fpsimd: ptrace: zero target's fpsimd_state, not the tracer's

Merge tag 'pci-v7.1-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull PCI fixes from Bjorn Helgaas:

- Don't fallback to bus reset after failed slot reset; a bus reset
   isn't safe if the .reset_slot() callback is implemented (Keith Busch)

- Update saved_config_space upon resource assignment to fix passthrough
   regressions when x86 pcibios_assign_resources() updates BARs (Lukas
   Wunner)

- Initialize a temporary pci_dev->dev in sysfs 'new_id' attribute to
   fix a lockdep regression after driver_override was moved from PCI to
   device core (Samiullah Khawaja)

- Update MAINTAINERS email addresses (Marek Vasut, Hans Zhang)

- Add MAINTAINERS reviewer for PCIe Cadence IP (Aksh Garg)

* tag 'pci-v7.1-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  MAINTAINERS: Add Aksh Garg as PCIe CADENCE reviewer
  MAINTAINERS: Update Hans Zhang email for PCIe CIX Sky1
  MAINTAINERS: Update Marek Vasut email for PCIe R-Car
  PCI: Initialize temporary device in new_id_store()
  PCI: Update saved_config_space upon resource assignment
  PCI: Don't fallback to bus reset after failed slot reset

Merge branch 'intel-wired-lan-driver-updates-2026-05-04-i40e-ice-idpf'

Jacob Keller says:

====================
Intel Wired LAN Driver Updates 2026-05-04 (i40e, ice, idpf)

Matt Volrath fixes two issues with the i40e driver probe routine, ensuring
that PTP is properly cleaned up if the probe fails.

Emil corrects the initialization of the read_dev_clk_lock spinlock in
idpf_ptp_init, ensuring it is initialized prior to when the
ptp_schedule_worker() is called.

Greg KH fixes a double free and use-after free in the idpf auxiliary device
error paths.

Marcin fixes ice_set_rss_hfunc() to use the correct q_opt_flags field,
correcting the assignment and preventing submission of invalid data to the
firmware.

Bart corrects the locking in ice_dcb_rebuild(), ensuring that the tc_mutex
is held over the entire operation.

Ivan fixes the rclk pin state get for E810 devices, ensuring the index is
properly offset by the base_rclk_idx value. This ensures that the correct
pin index is used to look up recovered clock state. He additionally adds
bounds checking to prevent attempting to access pins outside of the pin
state array.

Ivan also moves the CGU register macros to the top of ice_dpll.h, inside
the header guard to avoid duplicate macro definitions should the ice_dpll.h
header is included multiple times.
====================

Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-0-a5ea4dc837a9@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ice: dpll: fix misplaced header macros

The CGU register definitions (ICE_CGU_R10, ICE_CGU_R11 and related field
masks) were placed after the #endif of the _ICE_DPLL_H_ include guard,
leaving them unprotected. Move them inside the guard.

Fixes: ad1df4f2d591 ("ice: dpll: Support E825-C SyncE and dynamic pin discovery")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-8-a5ea4dc837a9@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ice: dpll: fix rclk pin state get for E810

The refactoring of ice_dpll_rclk_state_on_pin_get() to use
ice_dpll_pin_get_parent_idx() omitted the base_rclk_idx adjustment that was
correctly added in the ice_dpll_rclk_state_on_pin_set() path. This breaks
E810 devices where base_rclk_idx is non-zero, causing the wrong hardware
index to be used for pin state lookup and incorrect recovered clock state
to be reported via the DPLL subsystem. E825C is unaffected as its
base_rclk_idx is 0.

While at it, add bounds check against ICE_DPLL_RCLK_NUM_MAX on hw_idx after
the base_rclk_idx subtraction in both ice_dpll_rclk_state_on_pin_{get,set}()
to prevent out-of-bounds access on the pin state array.

Fixes: ad1df4f2d591 ("ice: dpll: Support E825-C SyncE and dynamic pin discovery")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-7-a5ea4dc837a9@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ice: fix locking in ice_dcb_rebuild()

Move the mutex_lock() call up to prevent that DCB settings change after
the first ice_query_port_ets() call. The second ice_query_port_ets()
call in ice_dcb_rebuild() is already protected by pf->tc_mutex.

This also fixes a bug in an error path, as before taking the first
"goto dcb_error" in the function jumped over mutex_lock() to
mutex_unlock().

This bug has been detected by the clang thread-safety analyzer.

Cc: intel-wired-lan@lists.osuosl.org
Fixes: 242b5e068b25 ("ice: Fix DCB rebuild after reset")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Arpana Arland <arpanax.arland@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-6-a5ea4dc837a9@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ice: fix setting RSS VSI hash for E830

ice_set_rss_hfunc() performs a VSI update, in which it sets hashing
function, leaving other VSI options unchanged. However, ::q_opt_flags is
mistakenly set to the value of another field, instead of its original
value, probably due to a typo. What happens next is hardware-dependent:

On E810, only the first bit is meaningful (see
ICE_AQ_VSI_Q_OPT_PE_FLTR_EN) and can potentially end up in a different
state than before VSI update.

On E830, some of the remaining bits are not reserved. Setting them
to some unrelated values can cause the firmware to reject the update
because of invalid settings, or worse - succeed.

Reproducer:
sudo ethtool -X $PF1 equal 8

Output in dmesg:
Failed to configure RSS hash for VSI 6, error -5

Fixes: 352e9bf23813 ("ice: enable symmetric-xor RSS for Toeplitz hash function")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-5-a5ea4dc837a9@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

idpf: fix double free and use-after-free in aux device error paths

When auxiliary_device_add() fails in idpf_plug_vport_aux_dev() or
idpf_plug_core_aux_dev(), the err_aux_dev_add label calls
auxiliary_device_uninit() and falls through to err_aux_dev_init. The
uninit call will trigger put_device(), which invokes the release
callback (idpf_vport_adev_release / idpf_core_adev_release) that frees
iadev. The fall-through then reads adev->id from the freed iadev for
ida_free() and double-frees iadev with kfree().

Free the IDA slot and clear the back-pointer before uninit, while adev
is still valid, then return immediately.

Commit 65637c3a1811 ("idpf: fix UAF in RDMA core aux dev deinitialization")
fixed the same use-after-free in the matching unplug path in this file but
missed both probe error paths.

Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Cc: stable@kernel.org
Fixes: be91128c579c ("idpf: implement RDMA vport auxiliary dev create, init, and destroy")
Fixes: f4312e6bfa2a ("idpf: implement core RDMA auxiliary dev create, init, and destroy")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-4-a5ea4dc837a9@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

idpf: fix read_dev_clk_lock spinlock init in idpf_ptp_init()

In idpf_ptp_init(), read_dev_clk_lock is initialized after
ptp_schedule_worker() had already been called (and after
idpf_ptp_settime64() could reach the lock). The PTP aux worker
fires immediately upon scheduling and can call into
idpf_ptp_read_src_clk_reg_direct(), which takes
spin_lock(&ptp->read_dev_clk_lock) on an uninitialized lock, triggering
the lockdep "non-static key" warning:

[12973.796587] idpf 0000:83:00.0: Device HW Reset initiated
[12974.094507] INFO: trying to register non-static key.
...
[12974.097208] Call Trace:
[12974.097213]  <TASK>
[12974.097218]  dump_stack_lvl+0x93/0xe0
[12974.097234]  register_lock_class+0x4c4/0x4e0
[12974.097249]  ? __lock_acquire+0x427/0x2290
[12974.097259]  __lock_acquire+0x98/0x2290
[12974.097272]  lock_acquire+0xc6/0x310
[12974.097281]  ? idpf_ptp_read_src_clk_reg+0xb7/0x150 [idpf]
[12974.097311]  ? lockdep_hardirqs_on_prepare+0xde/0x190
[12974.097318]  ? finish_task_switch.isra.0+0xd2/0x350
[12974.097330]  ? __pfx_ptp_aux_kworker+0x10/0x10 [ptp]
[12974.097343]  _raw_spin_lock+0x30/0x40
[12974.097353]  ? idpf_ptp_read_src_clk_reg+0xb7/0x150 [idpf]
[12974.097373]  idpf_ptp_read_src_clk_reg+0xb7/0x150 [idpf]
[12974.097391]  ? kthread_worker_fn+0x88/0x3d0
[12974.097404]  ? kthread_worker_fn+0x4e/0x3d0
[12974.097411]  idpf_ptp_update_cached_phctime+0x26/0x120 [idpf]
[12974.097428]  ? _raw_spin_unlock_irq+0x28/0x50
[12974.097436]  idpf_ptp_do_aux_work+0x15/0x20 [idpf]
[12974.097454]  ptp_aux_kworker+0x20/0x40 [ptp]
[12974.097464]  kthread_worker_fn+0xd5/0x3d0
[12974.097474]  ? __pfx_kthread_worker_fn+0x10/0x10
[12974.097482]  kthread+0xf4/0x130
[12974.097489]  ? __pfx_kthread+0x10/0x10
[12974.097498]  ret_from_fork+0x32c/0x410
[12974.097512]  ? __pfx_kthread+0x10/0x10
[12974.097519]  ret_from_fork_asm+0x1a/0x30
[12974.097540]  </TASK>

Move the call to spin_lock_init() up a bit to make sure read_dev_clk_lock
is not touched before it's been initialized.

Fixes: 5cb8805d2366 ("idpf: negotiate PTP capabilities and get PTP clock")
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-3-a5ea4dc837a9@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

i40e: Cleanup PTP pins on probe failure

PTP pin structs are allocated early in probe, but never cleaned up.

Fix this by calling i40e_ptp_free_pins in the error path.

To support this, i40e_ptp_free_pins is added to the header and
pin_config is correctly nullified after being freed.

This has been an issue since i40e_ptp_alloc_pins was introduced.

Fixes: 1050713026a08 ("i40e: add support for PTP external synchronization clock")
Reported-by: Kohei Enju <kohei@enjuk.jp>
Cc: stable@vger.kernel.org
Signed-off-by: Matt Vollrath <tactii@gmail.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Kohei Enju <kohei@enjuk.jp>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-2-a5ea4dc837a9@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

i40e: Cleanup PTP registration on probe failure

Fix two conditions which would leak PTP registration on probe failure:

1. i40e_setup_pf_switch can encounter an error in
   i40e_setup_pf_filter_control, call i40e_ptp_init, then return
   non-zero, sending i40e_probe to err_vsis.

2. i40e_setup_misc_vector can return non-zero, sending i40e_probe to
   err_vsis.

Both of these conditions have been present since PTP was introduced in
this driver.

Found with coccinelle.

Fixes: beb0dff1251db ("i40e: enable PTP")
Signed-off-by: Matt Vollrath <tactii@gmail.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-1-a5ea4dc837a9@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: shaper: Reject reparenting of existing nodes

When an existing node-scope shaper is moved to a different parent
via the group operation, the framework fails to update the leaves
count on both the old and new parent shapers. Only newly created
nodes (handle.id == NET_SHAPER_ID_UNSPEC) trigger the parent
leaves increment at line 1039.

This causes the parent's leaves counter to diverge from the
actual number of children in the xarray. When the node is later
deleted, pre_del_node() allocates an array sized by the stale
leaves count, but the xarray iteration finds more children than
expected, hitting the WARN_ON_ONCE guard and returning -EINVAL.

Rather than adding reparenting support with complex leaves count
bookkeeping, reject group calls that attempt to change an existing
node's parent. Updates to an existing node's rate or leaves under
the same parent remain permitted. We expect that for any modification
of the topology user should always create new groups and let the
kernel garbage collect the leaf-less nodes.

Fixes: 5d5d4700e75d ("net-shapers: implement NL group operation")
Signed-off-by: Mohsin Bashir <hmohsin@meta.com>
Link: https://patch.msgid.link/20260506233745.111895-1-mohsin.bashr@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

genetlink: free the skb on 'group >= family->n_mcgrps'

These methods generally consume ownership of the provided skb, so even
if an error path is encountered, the skb is freed. This is because the
very first thing they do after some initial setup is to unconditionally
consume the skb via consume_skb(skb). Any subsequent errors lead to the
core netlink layer freeing the skb.

However, there is one check that occurs before ownership is passed,
which is the check for the group index. So if this error condition is
encountered, then the skb is leaked. This error condition is generally
considered a violation of the netlink API, so it's not expected to occur
under normal circumstances. For the same reason, no callers check for
this error condition, and no callers need to be adjusted. However, we
should still follow the same ownership semantics of the rest of the
function. Thus, free the skb in this codepath.

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Suggested-by: Matthew Maurer <mmaurer@google.com>
Fixes: 2a94fe48f32c ("genetlink: make multicast groups const, prevent abuse")
Link: https://lore.kernel.org/r/845b36ba-7b3a-41f2-acb2-b284f253e2ca@lunn.ch
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20260506-genlmsg-return-v2-1-a63ee2a055d6@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: nsh: fix incorrect header length macros

NSH header length is a 6-bit field that encodes the total length of
the header in 4-byte words.  So the maximum length is 0b111111 * 4,
which is 252 and not 256.  The maximum context length is the same
number minus the length of the base header (8), so 244.

These macros are used to validate push_nsh() action in openvswitch.
Miscalculation here doesn't cause any real issues.  In the worst case
the oversized context is truncated while building the header, so we'll
construct and send a broken packet, which is not a big problem, as any
receiver should validate the fields.  No invalid memory accesses will
happen during the header push.  But we should fix the macros to reject
the incorrect actions in the first place.

Using previously defined values and calculating the length instead
of defining numbers directly, so it's easier to understand where they
come from and harder to make a mistake.

Fixes: 1f0b7744c505 ("net: add NSH header structures and helpers")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20260507120434.2962505-1-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethtool: fix NULL pointer dereference in phy_reply_size

In phy_prepare_data(), several strings such as 'name', 'drvname',
'upstream_sfp_name', and 'downstream_sfp_name' are allocated using
kstrdup(). However, these allocations were not checked for failure.

If kstrdup() fails for 'name', it returns NULL while the function
continues. This leads to a kernel NULL pointer dereference and panic
later in phy_reply_size() when it unconditionally calls strlen() on
the NULL pointer.

While other strings like 'upstream_sfp_name' might be checked before
access in certain code paths, failing to handle these allocations
consistently can lead to incomplete data reporting or hidden bugs.

Fix this by adding proper NULL checks for all kstrdup() calls in
phy_prepare_data() and implement a centralized error handling path
using goto labels to ensure all previously allocated resources are
freed on failure.

Fixes: 9dd2ad5e92b9 ("net: ethtool: phy: Convert the PHY_GET command to generic phy dump")
Signed-off-by: Quan Sun <2022090917019@std.uestc.edu.cn>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260507131738.1173835-1-2022090917019@std.uestc.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

MAINTAINERS: change maintainers for macb Ethernet driver

I would like to hand over the macb maintenance to Théo, as I'm unable to
keep up with the recent flow of patches for this driver. After speaking
with Claudiu, he indicated that he is in the same position as me.
To help with this work, Conor has agreed to act as a reviewer.

I was given responsibility for this driver years ago, and I'm glad to
see it continue with talented developers.

Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Acked-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20260507120444.9733-1-nicolas.ferre@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: napi: Avoid gro timer misfiring at end of busypoll

When in irq deferral mode (defer-hard-irqs > 0), a short enough
gro-flush timeout can trigger before NAPI_STATE_SCHED is cleared if the
last poll in busy_poll_stop() takes too long. This can have the effect
of leaving the queue stuck with interrupts disabled and no timer armed
which results in a tx timeout if there is no subsequent busypoll cycle.

To prevent this, defer the gro-flush timer arm after the last poll.

Fixes: 7fd3253a7de6 ("net: Introduce preferred busy-polling")
Co-developed-by: Martin Karsten <mkarsten@uwaterloo.ca>
Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260506090808.820559-2-dtatulea@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'ipv6-flowlabel-per-netns-budget-for-unprivileged-callers'

Maoyi Xie says:

====================
ipv6: flowlabel: per-netns budget for unprivileged callers

From: Maoyi Xie <maoyi.xie@ntu.edu.sg>

This series fixes the cross-tenant DoS in net/ipv6/ip6_flowlabel.c.
v1 through v6 were single-patch postings, each in its own thread.
v6 review pointed out that the existing fl_size read in
mem_check() and the corresponding write in fl_intern() are not in
the same critical section. v7 split the work into 2 patches.

Patch 1/2 is a prerequisite. It moves spin_lock_bh(&ip6_fl_lock)
and the matching unlock from fl_intern() into its only caller
ipv6_flowlabel_get(), so the mem_check() call runs under the same
critical section as the fl_intern() insert. With all writers and
the read of fl_size under the lock, fl_size is converted from
atomic_t to plain int. This is independent of the per-netns
budget. It also makes 2/2 backportable without conflicts.

Patch 2/2 is the v6 patch, rebased on 1/2.

  - flowlabel_count is plain int rather than atomic_t, since the
    previous patch put all writers and readers under ip6_fl_lock.
  - In ip6_fl_gc(), fl_free() is now placed below the fl_size
    and flowlabel_count decrements, removing the v6 cache of
    fl->fl_net.
  - In ip6_fl_purge(), fl_free() stays in its original position.
    The function argument net is used for flowlabel_count.
  - mem_check() uses spaces around the / operator on all four
    expressions, addressing the checkpatch note in v6 review.

Numeric budget (preserved from v6):

  pre-patch:
    global non-CAP_NET_ADMIN budget = FL_MAX_SIZE - FL_MAX_SIZE/4
                                    = 4096 - 1024 = 3072
    per-actor reach                 = 3072

  post-patch:
    FL_MAX_SIZE doubled to 8192
    global non-CAP_NET_ADMIN budget = 8192 - 2048 = 6144
    per-netns ceiling               = 6144 / 2 = 3072
    per-actor reach                 = 3072 (preserved)

CAP_NET_ADMIN against init_user_ns still bypasses both caps.

Reproducer (KASAN VM, 4 cores, qemu): unprivileged netns A holds
3072 flowlabels via 100 procs. Fresh unprivileged netns B then
allocates 32 flowlabels (the FL_MAX_PER_SOCK ceiling for one
socket), the same as a clean baseline. Without the per-netns
ceiling, netns A could push fl_size past FL_MAX_SIZE - FL_MAX_SIZE
/ 4 and netns B would see allocations denied.
====================

Link: https://patch.msgid.link/20260506082416.2259567-1-maoyixie.tju@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: flowlabel: enforce per-netns limit for unprivileged callers

fl_size, fl_ht and ip6_fl_lock in net/ipv6/ip6_flowlabel.c are
file scope and shared across netns. mem_check() reads fl_size to
decide whether to deny non-CAP_NET_ADMIN callers. capable() runs
against init_user_ns, so an unprivileged user in any non-init
userns can push fl_size past FL_MAX_SIZE - FL_MAX_SIZE / 4 and
starve every other unprivileged userns on the host.

Add struct netns_ipv6::flowlabel_count, bumped and decremented
next to fl_size in fl_intern, ip6_fl_gc and ip6_fl_purge. The new
field fills the existing 4-byte hole after ipmr_seq, so struct
netns_ipv6 stays the same size on 64-bit builds.

Bump FL_MAX_SIZE from 4096 to 8192. It has been 4096 since the
file was added. Machines and connection counts have grown.

mem_check() folds an extra per-netns ceiling into the existing
non-CAP_NET_ADMIN conditional. The ceiling is half of the total
budget that unprivileged callers have ever been able to use, i.e.
(FL_MAX_SIZE - FL_MAX_SIZE / 4) / 2 = 3072 entries. With
FL_MAX_SIZE doubled, this preserves the original per-user reach
of 3K (what an unprivileged caller could already obtain before
this change), while forcing an attacker to spread allocations
across at least two netns to exhaust the global non-CAP_NET_ADMIN
budget.

CAP_NET_ADMIN against init_user_ns still bypasses both caps.

The previous patch took ip6_fl_lock across mem_check and
fl_intern, so the new flowlabel_count read in mem_check and the
new flowlabel_count++ in fl_intern run under the same critical
section. flowlabel_count is therefore plain int, like fl_size.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Suggested-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
Link: https://patch.msgid.link/20260506082416.2259567-3-maoyixie.tju@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: flowlabel: take ip6_fl_lock across mem_check and fl_intern

mem_check() in net/ipv6/ip6_flowlabel.c reads fl_size without
holding ip6_fl_lock. fl_intern() takes the lock immediately
afterwards. The two checks therefore race against concurrent
fl_intern, ip6_fl_gc and ip6_fl_purge writers, which makes the
mem_check budget check approximate.

Move spin_lock_bh(&ip6_fl_lock) and the matching unlock from
fl_intern() into its only caller ipv6_flowlabel_get(). The
mem_check() call now runs under the same critical section as the
fl_intern() insert, so the budget check is exact.

With all writers and the read of fl_size under ip6_fl_lock,
convert fl_size from atomic_t to plain int. The four sites that
update or read fl_size are fl_intern (insert path), ip6_fl_gc
(garbage collector, the !sched check and the per-entry decrement),
ip6_fl_purge (per-netns purge), and mem_check (budget check), and
all four now run under ip6_fl_lock.

This is a prerequisite for adding a per-netns budget alongside
fl_size. The follow-up patch adds netns_ipv6::flowlabel_count and
folds it into mem_check().

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Suggested-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
Link: https://patch.msgid.link/20260506082416.2259567-2-maoyixie.tju@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

MAINTAINERS: Add self for the 3c509 network driver

It appears there's a need for a maintainer for the 3Com EtherLink III
family of Ethernet network adapters. There is documentation available
and the driver is very mature so the task ought to be of little hassle,
so I think I should be able to squeeze in any issues to be addressed.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/alpine.DEB.2.21.2604271056460.28583@angie.orcam.me.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'tcp-two-fixes-for-socket-migration-in-reqsk_timer_handler'

Kuniyuki Iwashima says:

====================
tcp: Two fixes for socket migration in reqsk_timer_handler().

The series fixes two bugs in the error path of socket migration
in reqsk_timer_handler().

Patch 1 fixes a potential UAF in reqsk_timer_handler().

Patch 2 fixes imbalanced icsk_accept_queue count.
====================

Link: https://patch.msgid.link/20260506035954.1563147-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: Fix imbalanced icsk_accept_queue count.

When TCP socket migration happens in reqsk_timer_handler(),
@sk_listener will be updated with the new listener.

When we call __inet_csk_reqsk_queue_drop(), the listener must
be the one stored in req->rsk_listener.

The cited commit accidentally replaced oreq->rsk_listener with
sk_listener, leading to imbalanced icsk_accept_queue count.

Let's pass the correct listener to __inet_csk_reqsk_queue_drop().

Fixes: e8c526f2bdf1 ("tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().")
Reported-by: Damiano Melotti <melotti@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260506035954.1563147-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: Fix potential UAF in reqsk_timer_handler().

When TCP socket migration fails at inet_ehash_insert() in
reqsk_timer_handler(), we jump to the no_ownership: label
and free the new reqsk immediately with __reqsk_free().

Thus, we must stop the new reqsk's timer before jumping to the
label, but the timer might be missed since the cited commit,
resulting in UAF.

As we are in the original reqsk's timer context, we can safely
call timer_delete_sync() for the new reqsk.

Let's pass false to __inet_csk_reqsk_queue_drop() to stop
the new reqsk's timer.

Fixes: 83fccfc3940c ("inet: fix potential deadlock in reqsk_queue_unlink()")
Reported-by: Damiano Melotti <melotti@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260506035954.1563147-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

x86/cpuid: Introduce <asm/cpuid/leaf_types.h>

To centralize all CPUID access across the x86 subsystem, introduce
<asm/cpuid/leaf_types.h>. It is generated by the x86-cpuid-db project¹ and
provides C99 bitfield listings for all publicly known CPUID leaves.

¹ https://gitlab.com/x86-cpuid.org/x86-cpuid-db/-/blob/v3.0/CHANGELOG.rst

Suggested-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20260327021645.555257-1-darwi@linutronix.de

x86/cpuid: Rename cpuid_leaf()/cpuid_subleaf() APIs

A new CPUID model will be added where its APIs will be designated as the
official CPUID API. Free the cpuid_leaf() and cpuid_subleaf() function
names for that API. Rename them accordingly to cpuid_read() and
cpuid_read_subleaf().

For kernel/cpuid.c, rename its local file operations read function from
cpuid_read() to cpuid_read_f() so that it does not conflict with the new
API.

No functional change.

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20260327021645.555257-1-darwi@linutronix.de

x86/cpu: Do not include the CPUID API header in asm/processor.h

asm/processor.h includes asm/cpuid/api.h but it does not need it.

Remove the include.

This allows the CPUID APIs header to include <asm/processor.h> at a later
step without introducing a circular dependency.

Note, all call sites which implicitly included the CPUID API through
<asm/processor.h> have been modified to explicitly include the CPUID APIs
instead.

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20260327021645.555257-1-darwi@linutronix.de

MAINTAINERS: Add Aksh Garg as PCIe CADENCE reviewer

I wish to contribute to the review process for Cadence PCIe IP drivers,
hence add myself as a reviewer.

Signed-off-by: Aksh Garg <a-garg7@ti.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260508060951.840233-1-a-garg7@ti.com

MAINTAINERS: Update Hans Zhang email for PCIe CIX Sky1

Update my email address as my work email account is no longer in use.

Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260508023006.1787674-1-18255117159@163.com

MAINTAINERS: Update Marek Vasut email for PCIe R-Car

Use up to date address. No functional change.

Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260428052030.51101-1-marek.vasut+renesas@mailbox.org

PCI: Initialize temporary device in new_id_store()

When setting new_id of a PCI device driver using sysfs a lockdep splat
occurs. This is because new_id_store() builds a temporary pci_dev for
pci_match_device(), which calls device_match_driver_override().  That
depends on the driver_override.lock added by cb3d1049f4ea ("driver core:
generalize driver_override in struct device").

The new driver_override.lock was not initialized in the temporary pci_dev,
resulting in this lockdep splat.

Initialize the temporary pci_dev to fix this.

Repro:

  Build with CONFIG_LOCKDEP=y, boot with QEMU, and add a new ID:

  # echo "8086 10f5" > /sys/bus/pci/drivers/e1000e/new_id

  INFO: trying to register non-static key.
  The code is fine but needs lockdep annotation, or maybe
  you didn't initialize this object before use?
  turning off the locking correctness validator.
  CPU: 2 UID: 0 PID: 177 Comm: liveupdate-iomm Not tainted 7.0.0+ #9 PREEMPT(full)
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
  Call Trace:
   <TASK>
   dump_stack_lvl+0x5d/0x80
   register_lock_class+0x77e/0x790
   lock_acquire+0xbf/0x2e0
   pci_match_device+0x24/0x180
   new_id_store+0x189/0x1d0
   kernfs_fop_write_iter+0x14f/0x210
   vfs_write+0x263/0x5e0
   ksys_write+0x79/0xf0
   do_syscall_64+0x117/0xf80

Fixes: 10a4206a2401 ("PCI: use generic driver_override infrastructure")
Fixes: 8895d3bcb8ba ("PCI: Fail new_id for vendor/device values already built into driver")
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
[bhelgaas: add commit log details and repro, trim backtrace]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260505234327.716630-1-skhawaja@google.com

PCI: Update saved_config_space upon resource assignment

Bernd reports passthrough failure of a Digital Devices Cine S2 V6 DVB
adapter plugged into an ASRock X570S PG Riptide board with BIOS version
P5.41 (09/07/2023):

  ddbridge 0000:05:00.0: detected Digital Devices Cine S2 V6 DVB adapter
  ddbridge 0000:05:00.0: cannot read registers
  ddbridge 0000:05:00.0: fail

BIOS assigns an incorrect BAR to the DVB adapter which doesn't fit into the
upstream bridge window.  The kernel corrects the BAR assignment:

  pci 0000:07:00.0: BAR 0 [mem 0xfffffffffc500000-0xfffffffffc50ffff 64bit]: can't claim; no compatible bridge window
  pci 0000:07:00.0: BAR 0 [mem 0xfc500000-0xfc50ffff 64bit]: assigned

Correction of the BAR assignment happens in an x86-specific fs_initcall,
pcibios_assign_resources(), after device enumeration in a subsys_initcall.
This order was introduced at the behest of Linus in 2004:

  https://git.kernel.org/tglx/history/c/a06a30144bbc

No other architecture performs such a late BAR correction.

Bernd bisected the issue to commit a2f1e22390ac ("PCI/ERR: Ensure error
recoverability at all times"), but it only occurs in the absence of commit
4d4c10f763d7 ("PCI: Explicitly put devices into D0 when initializing").
This combination exists in stable kernel v6.12.70, but not in mainline,
hence Bernd cannot reproduce the issue with mainline.

Since a2f1e22390ac, config space is saved on enumeration, prior to BAR
correction.  Upon passthrough, the corrected BAR is overwritten with the
incorrect saved value by:

  vfio_pci_core_register_device()
    vfio_pci_set_power_state()
      pci_restore_state()

But only if the device's current_state is PCI_UNKNOWN, as it was prior to
commit 4d4c10f763d7.  Since the commit, it is PCI_D0, which changes the
behavior of vfio_pci_set_power_state() to no longer restore the state
without saving it first.

Alexandre is reporting the same issue as Bernd, but in his case, mainline
is affected as well.  The difference is that on Alexandre's system, the
host kernel binds a driver to the device which is unbound prior to
passthrough, whereas on Bernd's system no driver gets bound by the host
kernel.

Unbinding sets current_state to PCI_UNKNOWN in pci_device_remove(), so when
vfio-pci is subsequently bound to the device, pci_restore_state() is once
again called without invoking pci_save_state() first.

To robustly fix the issue, always update saved_config_space upon resource
assignment.

Reported-by: Bernd Schumacher <bernd@bschu.de>
Closes: https://lore.kernel.org/r/acfZrlP0Ua_5D3U4@eldamar.lan/
Reported-by: Alexandre N. <an.tech@mailo.com>
Closes: https://lore.kernel.org/r/dd3c3358-de0f-4a56-9c81-04aceaab4058@mailo.com/
Fixes: a2f1e22390ac ("PCI/ERR: Ensure error recoverability at all times")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Bernd Schumacher <bernd@bschu.de>
Tested-by: Alexandre N. <an.tech@mailo.com>
Cc: stable@vger.kernel.org # v6.12+
Link: https://patch.msgid.link/febc3f354e0c1f5a9f5b3ee9ffddaa44caccf651.1776268054.git.lukas@wunner.de

bpf: Free reuseport cBPF prog after RCU grace period.

Eulgyu Kim reported the splat below with a repro. [0]

The repro sets up a UDP reuseport group with a cBPF prog and
replaces it with a new one while another thread is sending
a UDP packet to the group.

The reuseport prog is freed by sk_reuseport_prog_free().
bpf_prog_put() is called for "e"BPF prog to destruct through
multiple stages while cBPF prog is freed immediately by
bpf_release_orig_filter() and bpf_prog_free().

If a reuseport prog is detached from the setsockopt() path
(reuseport_attach_prog() or reuseport_detach_prog()),
sk_reuseport_prog_free() is called without waiting for RCU
readers to complete, resulting in various bugs.

Let's defer freeing the reuseport cBPF prog after one RCU
grace period.

Note "e"BPF prog is safe as is unless the fast path starts
to touch fields destroyed in bpf_prog_put_deferred() and
__bpf_prog_put_noref().

[0]:
BUG: KASAN: vmalloc-out-of-bounds in reuseport_select_sock+0xedc/0x1220 net/core/sock_reuseport.c:596
Read of size 4 at addr ffffc9000051e004 by task slowme/10208
CPU: 6 UID: 1000 PID: 10208 Comm: slowme Not tainted 7.0.0-geb7ac95ff75e #32 PREEMPT(full)
Hardware name: QEMU Ubuntu 24.04 PC v2 (i440FX + PIIX, arch_caps fix, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xca/0x240 mm/kasan/report.c:482
kasan_report+0x118/0x150 mm/kasan/report.c:595
reuseport_select_sock+0xedc/0x1220 net/core/sock_reuseport.c:596
udp4_lib_lookup2+0x3bc/0x950 net/ipv4/udp.c:495
__udp4_lib_lookup+0x768/0xe20 net/ipv4/udp.c:723
__udp4_lib_lookup_skb+0x297/0x390 net/ipv4/udp.c:752
__udp4_lib_rcv+0x1312/0x2620 net/ipv4/udp.c:2752
ip_protocol_deliver_rcu+0x282/0x440 net/ipv4/ip_input.c:207
ip_local_deliver_finish+0x3bb/0x6f0 net/ipv4/ip_input.c:241
NF_HOOK+0x30c/0x3a0 include/linux/netfilter.h:318
NF_HOOK+0x30c/0x3a0 include/linux/netfilter.h:318
__netif_receive_skb_one_core net/core/dev.c:6181 [inline]
__netif_receive_skb net/core/dev.c:6294 [inline]
process_backlog+0xaa4/0x1960 net/core/dev.c:6645
__napi_poll+0xae/0x340 net/core/dev.c:7709
napi_poll net/core/dev.c:7772 [inline]
net_rx_action+0x5d7/0xf50 net/core/dev.c:7929
handle_softirqs+0x22b/0x870 kernel/softirq.c:622
do_softirq+0x76/0xd0 kernel/softirq.c:523
</IRQ>
<TASK>
__local_bh_enable_ip+0xf8/0x130 kernel/softirq.c:450
local_bh_enable include/linux/bottom_half.h:33 [inline]
rcu_read_unlock_bh include/linux/rcupdate.h:924 [inline]
__dev_queue_xmit+0x1dd7/0x3710 net/core/dev.c:4890
neigh_output include/net/neighbour.h:556 [inline]
ip_finish_output2+0xca9/0x1070 net/ipv4/ip_output.c:237
NF_HOOK_COND include/linux/netfilter.h:307 [inline]
ip_output+0x29f/0x450 net/ipv4/ip_output.c:438
ip_send_skb+0x45/0xc0 net/ipv4/ip_output.c:1508
udp_send_skb+0xb04/0x1510 net/ipv4/udp.c:1195
udp_sendmsg+0x1a71/0x2350 net/ipv4/udp.c:1485
sock_sendmsg_nosec net/socket.c:727 [inline]
__sock_sendmsg net/socket.c:742 [inline]
__sys_sendto+0x554/0x680 net/socket.c:2206
__do_sys_sendto net/socket.c:2213 [inline]
__se_sys_sendto net/socket.c:2209 [inline]
__x64_sys_sendto+0xde/0x100 net/socket.c:2209
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x160/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x415a2d
Code: b3 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f6bc31e41e8 EFLAGS: 00000212 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007f6bc31e4cdc RCX: 0000000000415a2d
RDX: 0000000000000001 RSI: 00007f6bc31e421f RDI: 0000000000000003
RBP: 00007f6bc31e4240 R08: 00007f6bc31e4220 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000212 R12: 00007f6bc31e46c0
R13: ffffffffffffffb8 R14: 0000000000000000 R15: 00007ffc9b0d70b0
</TASK>

Fixes: 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
Reported-by: Eulgyu Kim <eulgyukim@snu.ac.kr>
Reported-by: Taeyang Lee <0wn@theori.io>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20260426012647.3233119-1-kuniyu@google.com

Merge tag 'block-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block fixes from Jens Axboe:

- Fix for ublk not doing an actual issue from the task_work fallback
   path. Any request hitting that should be canceled automatically

- Fix for uring_cmd prep side handling, for the block side uring_cmd
   discard handling

- Fix for missing validation of the io and physical block size shifts

- Fix for a use-after-free in ublk's cancel command handling

* tag 'block-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  ublk: fix use-after-free in ublk_cancel_cmd()
  ublk: validate physical_bs_shift, io_min_shift and io_opt_shift
  block: only read from sqe on initial invocation of blkdev_uring_cmd()
  ublk: don't issue uring_cmd from fallback task work

Merge tag 'io_uring-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull io_uring fixes from Jens Axboe:

- Ensure that the absolute timeouts for both the command side and the
   waiting side honor the callers time namespace

- Ensure tracked NAPI entries are cleared at unregistration time, as
   the NAPI polling loop checks the list state rather than the general
   NAPI state. This can lead to NAPI polling even after unregistration
   has been done. If unregistered, all NAPI polling should be disabled

- Fix for eventfd recursive invocation handling

* tag 'io_uring-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER
  io_uring/timeout: honour caller's time namespace for IORING_TIMEOUT_ABS
  io_uring/eventfd: reset deferred signal state
  io_uring/napi: clear tracked NAPI entries on unregister

Revert "ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn"

Some older systems don't support CPPC in the firmware and this just makes
noise for them when booting. Drop back to debug.

This reverts commit 21fb59ab4b9767085f4fe1edbdbe3177fbb9ec97.

Fixes: 21fb59ab4b976 ("ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn")
Suggested-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Cc: All applicable <stable@vger.kernel.org>
Link: https://patch.msgid.link/20260504230141.484743-2-mario.limonciello@amd.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ACPI: PMIC: Replace mutex_lock/unlock() with guard()/scoped_guard()

Replace mutex_lock() and unlock() calls with the newer guard() and
scoped_guard() macros to make the code more straightforward.

Signed-off-by: Maxwell Doose <m32285159@gmail.com>
[ rjw: Subject tweak, changelog edits ]
Link: https://patch.msgid.link/20260430013507.46259-1-m32285159@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Merge branch 'bpf-tcp-fix-type-confusion-in-bpf-helper-functions'

Kuniyuki Iwashima says:

====================
bpf: tcp: Fix type confusion in bpf helper functions.

bpf_tcp_sock() only check if sk->sk_protocol is IPPROTO_TCP,
but RAW socket can bypass it:

  socket(AF_INET, SOCK_RAW, IPPROTO_TCP)

The same issues exist in other bpf functions:

  * bpf_mptcp_sock_from_subflow()
  * bpf_skc_to_tcp_sock()
  * bpf_skc_to_tcp6_sock()
  * sol_tcp_sockopt()

Patch 1 fixes bpf_tcp_sock() and Patch 2 adds a test for it.
Patch 3 ~ 6 fix the rest of the functions above.

Changes:
  v2:
    * Inverse if (err) to if (!err) in the selftest
    * Add patch 3 ~ 6

  v1: https://lore.kernel.org/bpf/20260430184405.1227386-1-kuniyu@google.com/
      https://lore.kernel.org/mptcp/20260430-mptcp-bpf-mptcp-sock-type-v1-1-d2ed5cda7da9@kernel.org/
====================

Link: https://patch.msgid.link/20260504210610.180150-1-kuniyu@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

bpf: tcp: Fix type confusion in sol_tcp_sockopt().

sol_tcp_sockopt() only checks if sk->sk_protocol is IPPROTO_TCP,
but RAW socket can bypass it:

socket(AF_INET, SOCK_RAW, IPPROTO_TCP)

Let's use sk_is_tcp().

Note that initially sol_tcp_sockopt() checked sk->sk_prot->setsockopt.

Fixes: 2ab42c7b871f ("bpf: Check the protocol of a sock to agree the calls to bpf_setsockopt().")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260504210610.180150-7-kuniyu@google.com

bpf: tcp: Fix type confusion in bpf_skc_to_tcp6_sock().

bpf_skc_to_tcp6_sock() only checks if sk->sk_protocol is IPPROTO_TCP
and sk->sk_family is AF_INET6, but RAW socket can bypass it:

socket(AF_INET6, SOCK_RAW, IPPROTO_TCP)

Let's check sk->sk_type too.

Fixes: af7ec1383361 ("bpf: Add bpf_skc_to_tcp6_sock() helper")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260504210610.180150-6-kuniyu@google.com

bpf: tcp: Fix type confusion in bpf_skc_to_tcp_sock().

bpf_skc_to_tcp_sock() only checks if sk->sk_protocol is
IPPROTO_TCP, but RAW socket can bypass it:

socket(AF_INET, SOCK_RAW, IPPROTO_TCP)

Let's use sk_is_tcp().

Fixes: 478cfbdf5f13 ("bpf: Add bpf_skc_to_{tcp, tcp_timewait, tcp_request}_sock() helpers")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260504210610.180150-5-kuniyu@google.com

mptcp: bpf: Fix type confusion in bpf_mptcp_sock_from_subflow()

bpf_mptcp_sock_from_subflow() only checks if sk->sk_protocol is
IPPROTO_TCP, but RAW socket can bypass it:

socket(AF_INET, SOCK_RAW, IPPROTO_TCP)

In this case, it would NOT be valid to call sk_is_mptcp() which will
assume sk is a pointer to a struct tcp_sock, and wrongly checks for:
tcp_sk(sk)->is_mptcp.

Fixes: 3bc253c2e652 ("bpf: Add bpf_skc_to_mptcp_sock_proto")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260504210610.180150-4-kuniyu@google.com

selftest: bpf: Add test for bpf_tcp_sock() and RAW socket.

Let's extend sockopt_sk.c to cover bpf_tcp_sock() for the
wrong socket type.

Before:
  # ./test_progs -t sockopt_sk
  [  151.948613] ==================================================================
  [  151.951376] BUG: KASAN: slab-out-of-bounds in sol_tcp_sockopt+0xc7/0x8e0
  [  151.954159] Read of size 8 at addr ffff88801083d760 by task test_progs/1259
  ...
  run_test:FAIL:getsetsockopt unexpected error: -1 (errno 0)
  #427     sockopt_sk:FAIL

After:
  #427     sockopt_sk:OK

While at it, missing free() is fixed up.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260504210610.180150-3-kuniyu@google.com

crypto/ccp: Skip SNP_INIT if preparation fails

If snp_prepare() fails, SNP_INIT will fail, so skip it and return early.

Note that this is not a change in initialization behavior: if SNP_INIT fails
even before this change, it will still return an error and
__sev_snp_init_locked() will fail initialization of other SEV modes.

Signed-off-by: Tycho Andersen (AMD) <tycho@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
Link: https://lore.kernel.org/20260429155636.540040-1-tycho@kernel.org

x86/sev: Do not initialize SNP if missing CPUs

The SEV firmware checks that the SNP enable bit is set on each CPU during SNP
initialization, and will fail if not. If there are some CPUs offline, they
will not run the setup functions, so SNP initialization will always fail.

Skip the IPIs in this case and return an error so that the CCP driver can
skip the SNP_INIT that will fail. Also print the CPU masks in order to leave
breadcrumbs so people can figure out what happened.

[ bp: Massage commit message. ]

Suggested-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Tycho Andersen (AMD) <tycho@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://20260429155636.540040-1-tycho@kernel.org

nilfs2: fix backing_dev_info reference leak

setup_bdev_super() already initializes sb->s_bdev and takes a
reference on the block device backing_dev_info when assigning sb->s_bdi.

nilfs_fill_super() takes another reference to the same
backing_dev_info and stores it in sb->s_bdi again. The extra
reference is not paired with a matching bdi_put(), since
generic_shutdown_super() releases sb->s_bdi only once.

Drop the redundant bdi_get() in nilfs_fill_super(). The single
reference taken by setup_bdev_super() is enough and is released
during superblock shutdown.

Fixes: c1e012ea9e83 ("nilfs2: use setup_bdev_super to de-duplicate the mount code")
Signed-off-by: Shuangpeng Bai <shuangpeng.kernel@gmail.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>

workqueue: Fix wq->cpu_pwq leak in alloc_and_link_pwqs() WQ_UNBOUND path

For WQ_UNBOUND workqueues, alloc_and_link_pwqs() allocates wq->cpu_pwq
via alloc_percpu() and then calls apply_workqueue_attrs_locked(). On
failure it returns the error directly, bypassing the enomem: label
which holds the only free_percpu(wq->cpu_pwq) in this function.

The caller's error path kfree()s wq without touching wq->cpu_pwq,
leaking one percpu pointer table (nr_cpu_ids * sizeof(void *) bytes) per
failed call.

If kmemleak is enabled, we can see:

  unreferenced object (percpu) 0xc0fffa5b121048 (size 8):
    comm "insmod", pid 776, jiffies 4294682844
    backtrace (crc 0):
      pcpu_alloc_noprof+0x665/0xac0
      __alloc_workqueue+0x33f/0xa20
      alloc_workqueue_noprof+0x60/0x100

Route the error through the existing enomem: cleanup and any error
before this one.

Cc: stable@kernel.org
Fixes: 636b927eba5b ("workqueue: Make unbound workqueues to use per-cpu pool_workqueues")
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Tejun Heo <tj@kernel.org>

workqueue: Release PENDING in __queue_work() drain/destroy reject path

The caller of __queue_work() owns WORK_STRUCT_PENDING, won via
test_and_set_bit() in queue_work_on()/__queue_delayed_work(). The
state machine documented above __queue_work() requires that owner
to either hand the token to a pwq (insert_work() -> set_work_pwq()),
hand it to a timer, or release it via set_work_pool_and_clear_pending().
try_to_grab_pending() relies on this: when it observes
"PENDING && off-queue" it busy-loops, trusting the current owner to
make progress.

The (__WQ_DESTROYING | __WQ_DRAINING) early-return path violates that
contract. It WARN_ONCE()s and bare-returns, leaving work->data with
PENDING set, WORK_STRUCT_PWQ clear, and work->entry empty.

The path is reachable without explicit API abuse: queue_delayed_work()
arms a timer with PENDING set; if drain_workqueue() runs while the
timer is still pending, delayed_work_timer_fn() -> __queue_work() in
softirq context hits the WARN, current is not a wq worker so
is_chained_work() is false, and the work is silently dropped with
PENDING leaked.

Mirror what clear_pending_if_disabled() already does on its analogous
reject path: unpack the off-queue data and call
set_work_pool_and_clear_pending() to release the token before
returning.

I was able to reproduce this by queueing several slow works on
a max_active=1 wq, arm a delayed_work whose timer fires while
drain_workqueue() is blocked, then call cancel_delayed_work_sync().
Without this patch the cancel livelocks at 100% CPU; with it the cancel
returns immediately.

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Tejun Heo <tj@kernel.org>

Merge tag 'v7.1-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

- Fix for two ACL issues (security fix to validate dacloffset better
   and chmod fix)

- Fix out of bounds reads (in check_wsl_eas and smb2_check_msg for
   symlinks)

- Two Kerberos fixes including an important one when AES-256 encryption
   chosen

- Fix open_cached_dir problem when directory leases disabled

* tag 'v7.1-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: validate dacloffset before building DACL pointers
  smb/client: fix out-of-bounds read in smb2_compound_op()
  smb/client: fix out-of-bounds read in symlink_data()
  smb: client: Zero-pad short GSS session keys per MS-SMB2
  smb: client: Use FullSessionKey for AES-256 encryption key derivation
  smb: client: use kzalloc to zero-initialize security descriptor buffer
  cifs: abort open_cached_dir if we don't request leases

Merge tag 'spi-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"There's two main series here, fixing issues that came up in the
  Microchip QSPI and Freescale i.MX drivers. Both of those could result
  in some quite noticable issues if they were encountered in production.
  We also have one minor documentation fix in the ch341 driver"

* tag 'spi-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: ch341: correct company name in MODULE_DESCRIPTION
  spi: microchip-core-qspi: remove some inline markings
  spi: microchip-core-qspi: don't attempt to transmit during emulated read-only dual/quad operations
  spi: microchip-core-qspi: control built-in cs manually
  spi: imx: Propagate prepare_transfer() error from spi_imx_setupxfer()
  spi: imx: Fix UAF on package-1 prepare failure in spi_imx_dma_data_prepare()
  spi: imx: Fix precedence bug in spi_imx_dma_max_wml_find()

Merge tag 'regulator-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fix from Mark Brown:
"A straightforward fix for an incorrect description of one of the
regulators on the Qualcomm PMH0101"

* tag 'regulator-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: qcom-rpmh: Fix index for pmh0101 ldo16

bpf: tcp: Fix type confusion in bpf_tcp_sock().

bpf_tcp_sock() only checks if sk->sk_protocol is IPPROTO_TCP,
but RAW socket can bypass it:

socket(AF_INET, SOCK_RAW, IPPROTO_TCP)

Calling bpf_setsockopt() in SOCKOPT prog triggers out-of-bounds
access to another slab object. [0]

Let's use sk_is_tcp().

[0]:
BUG: KASAN: slab-out-of-bounds in sol_tcp_sockopt (net/core/filter.c:5519)
Read of size 8 at addr ffff88801083d760 by task test_progs/1259

CPU: 1 UID: 0 PID: 1259 Comm: test_progs Tainted: G OE 7.0.0-11175-gb5c111f4967b #1 PREEMPT(full)
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:94 lib/dump_stack.c:120)
print_report (mm/kasan/report.c:378 mm/kasan/report.c:482)
kasan_report (mm/kasan/report.c:595)
sol_tcp_sockopt (net/core/filter.c:5519)
__bpf_getsockopt (net/core/filter.c:5633)
bpf_sk_getsockopt (net/core/filter.c:5654)
bpf_prog_629ba00a1601e9f2__setsockopt+0x86/0x22c
__cgroup_bpf_run_filter_setsockopt (./include/linux/bpf.h:1402 ./include/linux/filter.h:722 ./include/linux/filter.h:729 kernel/bpf/cgroup.c:81 kernel/bpf/cgroup.c:2026)
do_sock_setsockopt (net/socket.c:2363)
__x64_sys_setsockopt (net/socket.c:2406)
do_syscall_64 (arch/x86/entry/syscall_64.c:63)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)
RIP: 0033:0x7f85f82fe7de
Code: 55 48 63 c9 48 63 ff 45 89 c9 48 89 e5 48 83 ec 08 6a 2c e8 34 69 f7 ff c9 c3 66 90 f3 0f 1e fa 49 89 ca b8 36 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 0a c3 66 0f 1f 84 00 00 00 00 00 48 8b 15 e1
RSP: 002b:00007ffe59dcecd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f85f82fe7de
RDX: 000000000000001c RSI: 0000000000000006 RDI: 000000000000000d
RBP: 00007ffe59dcef20 R08: 000000000000003c R09: 0000000000000000
R10: 00007ffe59dcef00 R11: 0000000000000202 R12: 00007ffe59dcf268
R13: 0000000000000003 R14: 00007f85f9da5000 R15: 000055b2f3201400
</TASK>

The buggy address belongs to the object at ffff88801083d280
which belongs to the cache RAW of size 1792
The buggy address is located 1248 bytes inside of
allocated 1792-byte region [ffff88801083d280, ffff88801083d980)

Fixes: 655a51e536c0 ("bpf: Add struct bpf_tcp_sock and BPF_FUNC_tcp_sock")
Reported-by: Damiano Melotti <melotti@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/20260504210610.180150-2-kuniyu@google.com

Documentation: core-api/cpu_hotplug: Remove stale cpu0_hotplug docs

Commit e59e74dc48a3 ("x86/topology: Remove CPU0 hotplug option")
removed the 'cpu0_hotplug' option, but its documentation remained in
cpu_hotplug.rst. Remove the stale entry.

Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20260507134732.254617-1-chao.gao@intel.com

Merge tag 'drm-fixes-2026-05-08-1' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"Weekly fixes, lots of them but all pretty small, amdgpu and xe are the
  usual but then a large amount of fixes all over.

  core:
   - fix race condition in handle change ioctl

  fb-helper:
   - fix clipping

  rust:
   - fix unsound initialization
   - fix GEM state cleanup
   - fix wrong ARef import

  ttm:
   - update GPU MM stats on pool shrinking

  i915:
   - Re-enable ccs modifiers on dg2

  nova:
   - fix mailing list

  xe:
   - Add NULL check for media_gt in intel_hdcp_gsc_check_status
   - Fix EAGAIN sign in pf_migration_consume
   - Fix MMIO access using PF view instead of VF view during migration
   - Exclude indirect ring state page from ADS engine state size

  amdgpu:
   - GFX9 fixes
   - Hawaii SMU fixes
   - SDMA4 fix
   - GART fix
   - Userq fixes

  amdkfd:
   - GPUVM TLB flush fix
   - Hotplug fix

  radeon:
   - Hawaii SMU fixes

  bochs:
   - fix managed cleanup

  bridge:
   - tda998x: fix sparse warnings on type correctness

  etnaviv:
   - schedule armed jobs

  exynos:
   - managed bridge cleanup

  ivpu:
   - disallow reexport of GEM buffer objects

  noveau:
   - revert support for GA100

  panel:
   - boe-tv101wum-nl16: use correct MIPI_DSI mode
   - feyjang-fy07024di26a30d: fix error reporting
   - himax-hx83102: use correct MIPI_DSI mode
   - himax-hx83121a: fix error checks
   - himax-hx83121a: select DRM_DISPLAY_DSC_HELPER

  qaic:
   - fix RAS message handling

  qxl:
   - clean up polling

  sti:
   - managed bridge cleanup

* tag 'drm-fixes-2026-05-08-1' of https://gitlab.freedesktop.org/drm/kernel: (37 commits)
  drm: Set old handle to NULL before prime swap in change_handle
  drm/bochs: Drop manual put on probe error path
  drm/xe/guc: Exclude indirect ring state page from ADS engine state size
  drm/xe/pf: Fix MMIO access using PF view instead of VF view during migration
  drm/xe/pf: Fix EAGAIN sign in pf_migration_consume()
  drm/xe/hdcp: Add NULL check for media_gt in intel_hdcp_gsc_check_status()
  drm/exynos: remove bridge when component_add fails
  drm/amdgpu: nuke amdgpu_userq_fence_slab v2
  drm/amdgpu/userq: fix access to stale wptr mapping
  drm/amdkfd: Check if there are kfd porcesses using adev by kfd_processes_count
  drm/amdgpu: zero-initialize GART table on allocation
  drm/amdgpu/sdma4: replace BUG_ON with WARN_ON in fence emission
  drm/radeon: add missing revision check for CI
  drm/amdgpu/pm: align Hawaii mclk workaround with radeon
  drm/amdgpu/pm: add missing revision check for CI
  drm/amdgpu/gfx9: drop unnecessary 64-bit fence flag check in KIQ
  drm/amdkfd: Make all TLB-flushes heavy-weight
  drm/panel: himax-hx83102: restore MODE_LPM after sending disable cmds
  drm/panel: boe-tv101wum-nl6: restore MODE_LPM after sending disable cmds
  drm/panel: feiyang-fy07024di26a30d: return display-on error
  ...

drm/ttm: Fix ttm_bo_swapout() infinite LRU walk on swapout failure

When ttm_tt_swapout() fails, the current code calls
ttm_resource_add_bulk_move() followed by ttm_resource_move_to_lru_tail()
to restore the resource's bulk_move membership.

However, ttm_resource_move_to_lru_tail() places the resource at the tail
of the LRU list which, relative to the walk cursor's hitch node (placed
immediately after the resource when it was yielded), puts the resource
*in front of the* the hitch. The next list_for_each_entry_continue() from
the hitch finds the same resource again, causing an infinite loop.

Fix by deferring del_bulk_move to the success path only.

On the success path, TTM_TT_FLAG_SWAPPED has just been set by
ttm_tt_swapout() but the resource is still tracked in the bulk_move range,
so ttm_resource_del_bulk_move()'s !ttm_resource_unevictable() guard would
incorrectly skip the removal. Introduce
ttm_resource_del_bulk_move_unevictable() which bypasses that guard.

Reported-by: Jatin Kataria <jkataria@netflix.com>
Fixes: fc5d96670eb2 ("drm/ttm: Move swapped objects off the manager's LRU list")
Cc: Christian König <christian.koenig@amd.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <dri-devel@lists.freedesktop.org>
Cc: <stable@vger.kernel.org> # v6.13+
Assisted-by: GitHub_Copilot:claude-sonnet-4.6
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Boqun Feng <boqun@kernel.org>
Link: https://patch.msgid.link/20260428094442.16985-1-thomas.hellstrom@linux.intel.com

Merge tag 'usb-serial-7.1-rc3' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus

Johan writes:

USB serial device ids for 7.1-rc3

Here are some new modem device ids.

This one has been in linux-next with no reported issues.

* tag 'usb-serial-7.1-rc3' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: option: add Telit Cinterion LE910Cx compositions

Merge tag 'iommu-fixes-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu fixes from Joerg Roedel:
"Core:
   - Cache-flushing fix for non-x86 platforms

  AMD-Vi:
   - Security fix when SEV-SNP is enabled
   - Operator precedence fix in DTE setting"

* tag 'iommu-fixes-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
  iommu/amd: Fix precedence order in set_dte_passthrough()
  iommu/pages: Fix iommu_pages_flush_incoherent() for non-x86
  iommu/amd: Use maximum PPR log buffer size when SNP is enabled on Family 0x19
  iommu/amd: Use maximum Event log buffer size when SNP is enabled on Family 0x19

sched_ext: Use IRQ_WORK_INIT_HARD() to initialize sch->disable_irq_work

For built with PREEMPT_RT kernels, the scx_disable_irq_workfn() is
called from per-cpu irq_work kthreads context, this means that
when call the scx_dump_state() in the scx_disable_irq_workfn() to
output current->comm/pid, it always output current irq_work kthread's
comm/pid. this commit therefore use the IRQ_WORK_INIT_HARD() to
initialize sch->disable_irq_work to make scx_disable_irq_workfn() is
called from hardirq context.

Fixes: f4a6c506d118 ("sched_ext: Always bounce scx_disable() through irq_work")
Signed-off-by: Zqiang <qiang.zhang@linux.dev>
Signed-off-by: Tejun Heo <tj@kernel.org>

arm64/entry: Fix arm64-specific rseq brokenness

Mathias Stearn reports that since v6.19, there are two big issues
affecting rseq:

(1) On arm64 specifically, rseq critical sections aren't aborted when
    they should be.

(2) The 'cpu_id_start' field is no longer written by the kernel in all
    cases it used to be, including some cases where TCMalloc depends on
    the kernel clobbering the field.

This patch fixes issue #1. This patch DOES NOT fix issue #2, which will
need to be addressed by other patches.

The arm64-specific brokenness is a result of commits:

  2fc0e4b4126c ("rseq: Record interrupt from user space")
  39a167560a61 ("rseq: Optimize event setting")

The first commit failed to add a call to rseq_note_user_irq_entry() on
arm64. Thus arm64 never sets rseq_event::user_irq to record that it may
be necessary to abort an active rseq critical section upon return to
userspace. On its own, this commit had no functional impact as the value
of rseq_event::user_irq was not consumed.

The second commit relied upon rseq_event::user_irq to determine whether
or not to bother to perform rseq work when returning to userspace. As
rseq_event::user_irq wasn't set on arm64, this work would be skipped,
and consequently an active rseq critical section would not be aborted.

Fix this by giving arm64 syscall-specific entry/exit paths, and
performing the relevant logic in syscall and non-syscall paths,
including calling rseq_note_user_irq_entry() for non-syscall entry.

Currently arm64 cannot use syscall_enter_from_user_mode(),
syscall_exit_to_user_mode(), and irqentry_exit_to_user_mode(), due to
ordering constraints with exception masking, and risk of ABI breakage
for syscall tracing/audit/etc. For the moment the entry/exit logic is
left as arm64-specific, directly using enter_from_user_mode() and
exit_to_user_mode(), but mirroring the generic code.

I intend to follow up with refactoring/cleanup, as we did for kernel
mode entry paths in commit:

  041aa7a85390 ("entry: Split preemption from irqentry_exit_to_kernel_mode()")

... which will allow arm64 to use the GENERIC_IRQ_ENTRY functions directly.

Fixes: 39a167560a61 ("rseq: Optimize event setting")
Reported-by: Mathias Stearn <mathias@mongodb.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/regressions/CAHnCjA25b+nO2n5CeifknSKHssJpPrjnf+dtr7UgzRw4Zgu=oA@mail.gmail.com/
Link: https://patch.msgid.link/20260508142023.3268622-1-mark.rutland@arm.com

x86/kexec: Push kjump return address even for non-kjump kexec

The version of purgatory code shipped by kexec-tools attempts to look above
the top of its stack to find a return address for a kjump, even in a non-kjump
kexec.

After the commit in Fixes: the word above the stack might not be there,
leading to a fault (which is at least now caught by my exception-handling code
in kexec).

That commit fixed things for the actual kjump path, but no longer
"gratuitously" pushes the unused return address to the stack in the non-kjump
path. Put that *back* in the non-kjump path, to prevent purgatory from
crashing when trying to access it.

Fixes: 2cacf7f23a02 ("x86/kexec: Fix stack and handling of re-entry point for ::preserve_context")
Reported-by: Rohan Kakulawaram <rohanka@google.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Rohan Kakulawaram <rohanka@google.com>
Cc: <stable@kernel.org>
Link: https://patch.msgid.link/32d627134143ffd957891cb697138e839c623211.camel@infradead.org

ntfs: avoid leaking uninitialised bytes in new security descriptors

ntfs_sd_add_everyone() builds the on-disk security descriptor for a
newly created file by kmalloc()'ing a buffer and then partially
filling it in:

sd = kmalloc(sd_len, GFP_NOFS);
...
sd->revision = 1;
sd->control = SE_DACL_PRESENT | SE_SELF_RELATIVE;
...

The buffer is then handed to ntfs_attr_add() and persisted as the
SECURITY_DESCRIPTOR attribute of the new MFT record.  The descriptor
covers a relative security descriptor header, two SIDs (owner and
group), an ACL header, and a single ACE, but several fields inside
those structures are never written before the buffer is committed
to disk:

  - struct security_descriptor_relative
        @alignment (1 byte)
        @sacl (4 bytes; SE_SACL_PRESENT is not set
                                 but the offset still reaches disk)

  - struct ntfs_sid (3 instances: owner, group, ACE.sid)
        identifier_authority.value[0..4] (5 bytes per SID, 15 total
                                          - only value[5] is set)

  - struct ntfs_acl
        @alignment1 (1 byte)
        @alignment2 (2 bytes)

That is 23 bytes of uninitialised slab memory persisted to disk for
every new file or directory the legacy ntfs driver creates.  The
"+ 4" trailing accounting in sd_len holds ace->sid.sub_authority[0],
which the existing code does explicitly write to zero, so it is
not part of the leak.

Anything later able to read the SECURITY_DESCRIPTOR attribute - the
same NTFS volume mounted on Windows or by another NTFS reader, an
offline forensics tool, an unprivileged user that ends up with read
access to the volume - can recover those bytes.  The leak persists
for the lifetime of the file on disk, not just the lifetime of the
kernel that wrote it.

Switch the allocation to kzalloc() so every byte the on-disk
descriptor covers is zero before the explicit initialisations run.
While there, replace the bare "return -1" allocation-failure path
with a proper -ENOMEM so the error reaches userspace as a meaningful
errno instead of an unrelated -EPERM.

Found by inspection while auditing fs/ntfs new-inode paths.

Fixes: af0db57d4293 ("ntfs: update inode operations")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

ntfs: fix out-of-bounds write in ntfs_index_walk_down()

ntfs_index_walk_down() used to update the index traversal depth
directly before writing parent_pos[] and parent_vcn[]. A malformed
directory index with too many child-node levels can therefore advance
pindex past MAX_PARENT_VCN and write past the fixed arrays in struct
ntfs_index_context, corrupting context state used by later index
traversal.

Use ntfs_icx_parent_inc() for walk-down transitions so the existing
depth limit is enforced before the arrays are updated. Make the helper
check the limit before incrementing pindex so failed callers do not
leave the context at an out-of-range depth.

This is reachable by iterating a crafted NTFS directory after the volume
has been mounted, including read-only mounts. The reproducer uses
getdents64() on an index root that points to an excessively deep chain
of child index blocks.

A crafted directory index with a chain of child-node entries reproduced
UBSAN array-index-out-of-bounds reports in ntfs_index_walk_down() and
subsequent KASAN reports in ntfs_index_walk_up(). With this change, the
same image is rejected with "Index is over 32 level deep" and no KASAN
or UBSAN report is emitted.

Fixes: 0a8ac0c1fa0b ("ntfs: update directory operations")
Suggested-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

ntfs: fix out-of-bounds write in ntfs_rl_collapse_range() merge path

ntfs_rl_collapse_range() merges the run on the left of the collapsed
region with the run on its right when they are contiguous. The contiguous
check chooses a clamped index when @new_1st_cnt is 0:

i = new_1st_cnt == 0 ? 1 : new_1st_cnt;
if (ntfs_rle_lcn_contiguous(&new_rl[i - 1], &new_rl[i])) {

but the merge itself uses the unclamped value:

s_rl = &new_rl[new_1st_cnt - 1];
s_rl->length += s_rl[1].length;

When @new_1st_cnt is 0 this computes &new_rl[-1] and writes 8 bytes
before the kvcalloc() runlist buffer. The path is reachable through
fallocate(FALLOC_FL_COLLAPSE_RANGE) starting at vcn 0 against an
attribute whose first run after the collapsed region and the following
run are holes. In that case ntfs_rle_lcn_contiguous() returns true
because both checked entries are LCN_HOLE, so the merge path is entered
with @new_1st_cnt still 0. Such consecutive holes do not occur on a
well-formed runlist (NTFS keeps runlists coalesced in memory), so this
OOB path is only reachable from a crafted volume.

A normal runlist has no element to the left of vcn 0, so the left/right
merge is not valid when @new_1st_cnt is 0. Require @new_1st_cnt to be
positive before checking or performing the merge. This skips the merge
entirely in that case instead of clamping the merge target.

The out-of-bounds write can corrupt an adjacent slab object. On a
non-KASAN kernel, it is reachable after a crafted NTFS volume has been
mounted read-write with the legacy fs/ntfs driver, by a local user that
has write access to the crafted file.

Fixes: 11ccc9107dc4 ("ntfs: update runlist handling and cluster allocator")
Suggested-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

ntfs: fix variable dereferenced before check ni in ntfs_attr_open()

Smatch warnings:
ntfs_attr_open() warn: variable dereferenced before check 'ni'

Moves the ntfs_debug() call after the NULL pointer checks to ensure safe
access to the structure members.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

ntfs: fix default_upcase refcount underflow and UAF on fs_context teardown

ntfs_init_fs_context() allocates a fresh ntfs_volume with vol->upcase
left as NULL. ntfs_free_fs_context() unconditionally calls
ntfs_volume_free() during fs_context teardown, even when ntfs_fill_super()
never ran or already cleaned up. ntfs_volume_free() then executes:

mutex_lock(&ntfs_lock);
if (vol->upcase == default_upcase) {
ntfs_nr_upcase_users--;
vol->upcase = NULL;
}

When the global default_upcase is also NULL (very first mount attempt,
or all prior mounts have released the table), the comparison is
NULL == NULL, and ntfs_nr_upcase_users is decremented even though this
volume never claimed a reference. ntfs_nr_upcase_users is unsigned long,
so the decrement wraps to ULONG_MAX.

A subsequent successful mount can then free the shared table while the
mounted volume still points at it:

  1. ntfs_fill_super() does the temporary ntfs_nr_upcase_users++ at the
     "Generate the global default upcase table if necessary" block. With
     the prior wraparound this brings the counter back to 0.
  2. If the volume's $UpCase matches the default, the match path does
     ntfs_nr_upcase_users++ and sets vol->upcase = default_upcase. The
     counter is now 1.
  3. On the success path, !--ntfs_nr_upcase_users evaluates true and
     default_upcase is kvfree()'d while vol->upcase still points at it.
     Subsequent upcase comparisons through that mount touch freed
     memory.

This was reproduced with KASAN by closing a fresh fsopen("ntfs") context,
then mounting an NTFS image whose $UpCase table matches
generate_default_upcase(), and finally doing a case-insensitive lookup.
KASAN reports the dangling vol->upcase access:

  BUG: KASAN: use-after-free in ntfs_collate_names+0x3b4/0x420
  Read of size 2 at addr ffff888008d40048 by task init/1
   ntfs_collate_names+0x3b4/0x420
   ntfs_lookup_inode_by_name+0x1921/0x3130
   ntfs_lookup+0x193/0xc40
   vfs_statx+0xc7/0x190
   vfs_fstatat+0x4b/0xa0
   __do_sys_newfstatat+0x92/0xf0

The same QEMU reproducer was rerun after this change with KASAN
enabled. It reached "reproducer finished", and the log contained no
KASAN, use-after-free, Oops, or panic signatures.

Guard each comparison with an explicit vol->upcase non-NULL check so a
volume that never took a reference cannot decrement the global users
counter. Apply the same guard to the other default_upcase release sites
so all cleanup paths follow the same ownership rule: only volumes that
actually hold a default_upcase reference may drop one.

Fixes: 1e9ea7e04472 ("Revert "fs: Remove NTFS classic"")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

ntfs: match ntfs_resident_attr_min_value_length with $AttrDef

Update ntfs_resident_attr_min_value_length() to align with $AttrDef.
The $VOLUME_NAME is allowed to have the  size of 0.

The Windows 11 $AttrDef values are as follows:

Attribute Name             (ID)   Size (Min-Max)  Flags

$STANDARD_INFORMATION      (16)   48-72           Resident
$ATTRIBUTE_LIST            (32)   No Limit        Non-resident
$FILE_NAME                 (48)   68-578          Resident, Index
$OBJECT_ID                 (64)   0-256           Resident
$SECURITY_DESCRIPTOR       (80)   No Limit        Non-resident
$VOLUME_NAME               (96)   2-256           Resident
$VOLUME_INFORMATION        (112)  12-12           Resident
$DATA                      (128)  No Limit        (None)
$INDEX_ROOT                (144)  No Limit        Resident
$INDEX_ALLOCATION          (160)  No Limit        Non-resident
$BITMAP                    (176)  No Limit        Non-resident
$REPARSE_POINT             (192)  0-16384         Non-resident
$EA_INFORMATION            (208)  8-8             Resident
$EA                        (224)  0-65536         (None)
$LOGGED_UTILITY_STREAM     (256)  0-65536         Non-resident

Reported-by: woot000 <woot000@woot000.com>
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

ntfs: avoid use-after-free of index inode in ntfs_inode_sync_filename()

ntfs_inode_sync_filename() walks every FILE_NAME attribute and, for
each one that points at a different parent, opens the parent index
inode with ntfs_iget() and locks index_ni->mrec_lock.  All three error
branches (NInoBeingDeleted, ntfs_index_ctx_get failure, ntfs_index_lookup
failure) drop the parent reference before unlocking:

iput(index_vi);
mutex_unlock(&index_ni->mrec_lock);
continue;

index_ni is NTFS_I(index_vi), so the ntfs_inode (and its mrec_lock) is
embedded in the inode allocation.  If the parent directory is not held
outside the icache - no open dentry, recently evicted from dcache, no
other concurrent lookup - ntfs_iget() returns with i_count == 1 and
our iput() drops the last reference.  evict_inode() then runs and
destroy_inode() schedules the slab object for RCU free, while
mutex_unlock() on the next line is still touching index_ni->mrec_lock.

Swap the order so the mutex is dropped while index_vi is still alive,
matching the success path at the bottom of the loop which already
unlocks before iput().

Reproduced under KASAN with a debug build that forces
ntfs_index_ctx_get() to fail when the parent index inode has been
opened with i_count == 1.  KASAN reports a slab-use-after-free read
on the parent's mrec_lock from mutex_unlock() on the writeback worker:

  BUG: KASAN: slab-use-after-free in __mutex_unlock_slowpath+0xb5/0x970
  Read of size 8 at addr ffff8880014b7598 by task kworker/u8:0/12
  Workqueue: writeback wb_workfn (flush-253:0)
  Call Trace:
   mutex_unlock
   ntfs_inode_sync_filename
   __ntfs_write_inode
   ntfs_write_inode
   __writeback_single_inode

  Allocated by task 103:
   ntfs_alloc_big_inode
   ntfs_iget
   ntfs_lookup
   __x64_sys_mkdir

  Freed by task 12:
   ntfs_free_big_inode
   i_callback
   rcu_do_batch

  Last potentially related work creation:
   call_rcu
   destroy_inode
   evict
   dispose_list
   evict_inodes
   ntfs_inode_sync_filename
   __ntfs_write_inode

  The buggy address belongs to the object at ffff8880014b7440
   which belongs to the cache ntfs_big_inode_cache of size 1800

The freed object is the parent directory inode itself: allocated by
mkdir(2) via ntfs_iget(), then released through call_rcu(i_callback)
that destroy_inode() scheduled when evict_inodes() ran from inside
ntfs_inode_sync_filename().  Re-running the same workload with
mutex_unlock() moved before iput() runs cleanly under KASAN.

Fixes: af0db57d4293 ("ntfs: update inode operations")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

ntfs: fix copy length in ntfs_bdev_write() for non-page-aligned start

This is not a normal data I/O hot path.  The single in-tree caller is
the $LogFile emptying path used during read-write mount/remount, and
the bug only becomes visible on NTFS volumes whose cluster_size is
strictly smaller than the kernel's PAGE_SIZE (typically 4 KiB on
x86_64).  Per Microsoft's format command documentation, NTFS supports
allocation unit sizes starting at 512 bytes, so 512 B, 1 KiB and 2 KiB
clusters are uncommon but valid on-disk configurations.  When
cluster_size >= PAGE_SIZE every "start" passed in is page-aligned and
the buggy "from != 0" path is never taken.

ntfs_bdev_write() splits the write across one or more block-device
folios.  Inside the loop, "to" is computed as the *end byte offset*
within the current page (0..PAGE_SIZE), and "from" is the start byte
offset within the page (reset to 0 from the second iteration onward).
The copy length should therefore be "to - from", but the current code
uses "to" directly:

to = min_t(u32, end - offset, PAGE_SIZE);
memcpy_to_folio(folio, from, buf + buf_off, to);
buf_off += to;

When "from != 0" (i.e. "start" is not page-aligned) memcpy_to_folio()
copies "from" extra bytes:

  - it reads "from" bytes past the source buffer into kernel heap;
  - it writes "from" bytes past the requested range into the next part
    of the block-device page (or, if "from + to > PAGE_SIZE", past the
    folio boundary entirely, which trips the VM_BUG_ON inside
    memcpy_to_folio() on CONFIG_DEBUG_VM=y kernels).

"buf_off" is then advanced by the wrong amount, so every subsequent
iteration also reads the source buffer at the wrong offset and writes
the wrong content to disk.

ntfs_empty_logfile() calls

ntfs_bdev_write(sb, empty_buf, NTFS_CLU_TO_B(vol, lcn),
vol->cluster_size);

with empty_buf sized to vol->cluster_size.  On a sub-PAGE_SIZE-cluster
volume, any $LogFile run whose LCN is not aligned to
PAGE_SIZE / cluster_size reaches the non-page-aligned path.  The
over-copy can read beyond empty_buf and overwrite the sectors following
the requested cluster in the block-device page with unrelated kernel
heap contents while $LogFile is being emptied.

A userspace reducer of the same arithmetic and copy loop confirms the
bug under AddressSanitizer: ASan reports a heap-buffer-overflow read
past the source buffer for the buggy length, and the fixed version is
ASan-clean.

Compute the copy length as "to - from" and advance buf_off by the same
amount.

Fixes: 5218cd102aec ("ntfs: update misc operations")
Link: https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/format
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

ntfs: wait for sync mft writes to complete

ntfs_sync_mft_mirror() and write_mft_record_nolock() with @sync set
are both documented as synchronous, but neither actually waits for
the bio they submit nor inspects bi_status.  write_inode() can
return success while dirty mft record bytes are still in flight, and
bio errors are silently dropped: the volume is not marked with
errors and the inode is not redirtied.  This breaks fsync()/sync
metadata durability.

Switch ntfs_sync_mft_mirror() and the @sync path of
write_mft_record_nolock() to submit_bio_wait() and propagate the
returned error to the caller.  Capture ntfs_sync_mft_mirror()'s
return value at its call sites in write_mft_record_nolock() so a
mirror write failure surfaces too.

The @sync parameter only controls the main MFT bio.  The !@sync main
submission is therefore unchanged and still uses ntfs_bio_end_io() to
drop the folio reference taken before submission.  The mirror call
has always been documented as performing synchronous I/O regardless
of @sync, so making it actually block restores the originally
intended contract for both @sync and !@sync callers.

Note this only fixes the synchronous mirror/main paths reachable
from write_mft_record_nolock().  The main MFT write submitted from
ntfs_write_mft_block() (the .writepages path) still does not wait
for completion or check bi_status; that requires a larger
restructuring and is left to a follow-up patch.

Fixes: 115380f9a2f9 ("ntfs: update mft operations")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

ntfs: capture mft mirror sync errors in ntfs_write_mft_block()

After ntfs_sync_mft_mirror() became able to return real I/O errors,
ntfs_write_mft_block() still discards its return value at the call
site inside the per-record loop. A failed $MFTMirr write therefore
leaves the volume looking clean from the writeback path even though
the on-disk mirror is now stale.

Capture the return value and feed it into the function's existing
@err variable using the same "first error wins" pattern already used
on other failure paths. The error is propagated to the caller and,
via the existing tail of the function, sets NVolErrors so umount and
chkdsk see the volume as inconsistent.

Fixes: 115380f9a2f9 ("ntfs: update mft operations")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

ntfs: redirty folio when ntfs_write_mft_block() runs out of memory

ntfs_write_mft_block() is called by writeback_iter() with the folio
locked. When the per-call allocations for @locked_nis or @ref_inos
fail, the function returns -ENOMEM directly without unlocking the
folio. Any later task that needs the folio's lock then stalls, and
the folio's dirty state is silently lost from the writeback
iterator's point of view.

Use folio_redirty_for_writepage() so the folio remains dirty for a
subsequent writeback pass, unlock it, and only then return -ENOMEM
so the caller can propagate the error to fsync()/sync_filesystem().

Fixes: f462fdf3d6a4 ("ntfs: reduce stack usage in ntfs_write_mft_block()")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

ntfs: use base mft_no when looking up base inode for extent record

When the mft record is an extent record, ntfs_may_write_mft_record()
looks up its base inode in the icache. The hash key passed to
find_inode_nowait() must be the base inode's mft number (na.mft_no,
set just above to MREF_LE(m->base_mft_record)), but the code passes
@mft_no, the extent record's own number.

find_inode_nowait() uses its second argument as the hashval, so the
lookup lands in the wrong bucket and almost always returns NULL.
ntfs_may_write_mft_record() then returns false and the writeback
path (ntfs_write_mft_block()) skips that extent record, leaving the
on-disk copy permanently out of sync with the in-memory one.

The original ilookup5_nowait() call this conversion replaced used
na.mft_no. Restore that.

Fixes: 115380f9a2f9 ("ntfs: update mft operations")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

dlm: init per node debugfs before add to node hash

Avoiding potential issues when a node is added to the hash but the
debugfs is not NULL or IS_ERR() so a potential iteration over the hash
and debugfs_remove() will not fail like in dlm_midcomms_exit().

However dlm_midcomms_exit() will be called in module init/exit function
and the hash should be empty anyway at those stages. We change the
behavior as cleanup to avoid potential issues.

Reported-by: Ginger <ginger.jzllee@gmail.com>
Closes: https://lore.kernel.org/gfs2/CAGp+u1ZE7UsQ4sSUHBKQXU8x3M_jwK=ek1urSjEtd3jXQGFmVg@mail.gmail.com
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

dlm: fix add msg handle in send_queue ordered

In a benchmark scenario triggering a lot of requests that triggers a lot
of DLM messages on the network it can be that the mh->seq is not ordered
according the oldest seq number. This ordering is required by
dlm_receive_ack as "before(mh->seq, seq)" will stop to check for older
sequence numbers that are ordered in the tail of "node->send_queue".

The side effects of not having it correct ordered regarding
"before(mh->seq, seq)" are refcounting issues and use-after free.

I only was able to reproduce this issue in a experimental DLM branch
and a user space DLM benchmark that uses io_uring. After changing this I
don't experienced any refcounting with the sending buffer issues anymore.

Fixes: 489d8e559c659 ("fs: dlm: add reliable connection if reconnect")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

dlm: add usercopy whitelist to dlm_cb cache

The dlm_cb slab cache is created with kmem_cache_create(), which
provides no usercopy whitelist. When a callback carries LVB data,
dlm_user_add_ast() copies the LVB into the inline lvbptr[] array within
the slab-allocated struct dlm_callback and redirects ua->lksb.sb_lvbptr
to point to it. copy_result_to_user() then calls copy_to_user() with
this pointer. With CONFIG_HARDENED_USERCOPY enabled, this triggers
usercopy_abort().

Switch to kmem_cache_create_usercopy() with a whitelist covering the
lvbptr field.

Signed-off-by: Ziyi Guo <n7l8m4@u.northwestern.edu>
Acked-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

dlm: use hlist_for_each_entry_srcu for SRCU protected lists

The connection and node hash tables in DLM are protected by SRCU, but
the code currently uses hlist_for_each_entry_rcu() for traversal.
While this works functionally, it is semantically incorrect and triggers
warnings when RCU lockdep debugging is enabled, as it expects regular
RCU read-side critical sections.

This patch replaces the incorrect macros with hlist_for_each_entry_srcu()
and adds the appropriate lockdep expressions using srcu_read_lock_held()
to ensure consistency with the underlying locking mechanism.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Acked-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

Merge tag 'riscv-dt-fixes-for-v7.1-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into arm/fixes

RISC-V devicetrees fixes for v7.1-rc3

Microchip:
Fix a pinctrl misconfiguration caused by a erratum fixed between
engineering sample and production silicon, that causes settings for one
to not apply to the other.

Starfive:
Remove nodes relating to the "camss" video device that has been deleted
entirely from staging.

Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
* tag 'riscv-dt-fixes-for-v7.1-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux:
riscv: dts: microchip: fix icicle i2c pinctrl configuration
riscv: dts: starfive: jh7110: Drop CAMSS node

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

ublk: fix use-after-free in ublk_cancel_cmd()

When ublk_reset_ch_dev() clears io->cmd via ublk_queue_reinit()
concurrently with ublk_cancel_cmd(), ublk_cancel_cmd() can read a
stale pointer and pass it to io_uring_cmd_done(), causing a
use-after-free.

Fix by synchronizing the two paths with ubq->cancel_lock:

- ublk_cancel_cmd(): read and clear io->cmd under cancel_lock,
  then call io_uring_cmd_done() on the saved local copy outside
  the lock.

- ublk_reset_ch_dev(): hold cancel_lock across ublk_queue_reinit()
  so that io->cmd and io->flags are cleared atomically with respect
  to ublk_cancel_cmd().

Fixes: 216c8f5ef0f2 ("ublk: replace monitor with cancelable uring_cmd")
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Link: https://patch.msgid.link/20260508123746.242018-1-tom.leiming@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

batman-adv: bla: put backbone reference on failed claim hash insert

When batadv_bla_add_claim() fails to insert a new claim into the hash, it
leaked a reference to the backbone_gw for which the claim was intended.
Call batadv_backbone_gw_put() on the error path to release the reference
and avoid leaking the backbone_gw object.

Cc: stable@kernel.org
Fixes: 3db0decf1185 ("batman-adv: Fix non-atomic bla_claim::backbone_gw access")
Signed-off-by: Sven Eckelmann <sven@narfation.org>

batman-adv: bla: only purge non-released claims

When batadv_bla_purge_claims() goes through the list of claims, it is only
traversing the hash list with an rcu_read_lock(). Due to a potential
parallel batadv_claim_put(), it can happen that it encounters a claim which
was actually in the process of being released+freed by
batadv_claim_release(). In this case, backbone_gw is set to NULL before the
delayed RCU kfree is started. Calling batadv_bla_claim_get_backbone_gw() is
then no longer allowed because it would cause a NULL-ptr derefence.

To avoid this, only claims with a valid reference counter must be purged.
All others are already taken care of.

Cc: stable@kernel.org
Fixes: 23721387c409 ("batman-adv: add basic bridge loop avoidance code")
Signed-off-by: Sven Eckelmann <sven@narfation.org>

batman-adv: bla: prevent use-after-free when deleting claims

When batadv_bla_del_backbone_claims() removes all claims for a backbone, it
does this by dropping the link entry in the hash list. This list entry
itself was one of the references which need to be dropped at the same time
via batadv_claim_put().

But the batadv_claim_put() must not be done before the last access to the
claim object in this function. Otherwise the claim might be freed already
by the batadv_claim_release() function before the list entry was dropped.

Cc: stable@kernel.org
Fixes: 23721387c409 ("batman-adv: add basic bridge loop avoidance code")
Signed-off-by: Sven Eckelmann <sven@narfation.org>

batman-adv: tp_meter: fix tp_num leak on kmalloc failure

When batadv_tp_start() or batadv_tp_init_recv() fail to allocate a new
tp_vars object, the previously incremented bat_priv->tp_num counter is
never decremented. This causes tp_num to drift upward on each allocation
failure. Since only BATADV_TP_MAX_NUM sessions can be started and the count
is never reduced for these failed allocations, it causes to an exhaustion
of throughput meter sessions. In worst case, no new throughput meter
session can be started until the mesh interface is removed.

The error handling must decrement tp_num releasing the lock and aborting
the creation of an throughput meter session

Cc: stable@kernel.org
Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation")
Signed-off-by: Sven Eckelmann <sven@narfation.org>

batman-adv: stop caching unowned originator pointers in BAT IV

BAT IV keeps the last-hop neighbor address in each neigh_node, but some
paths also cache an originator pointer derived from a temporary lookup.
That pointer is not owned by the neigh_node and may no longer refer to a
live originator entry after purge handling runs.

Stop storing the auxiliary originator pointer in the BAT IV neighbor
state. When BAT IV needs the neighbor originator data, resolve it from
the stored neighbor address and drop the reference again after use.

Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
Cc: stable@kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Signed-off-by: Jiexun Wang <wangjiexun2025@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
[sven: avoid bonding logic for outgoing OGM]
Signed-off-by: Sven Eckelmann <sven@narfation.org>