git.ipfire.org Git - thirdparty/linux.git/log

]> git.ipfire.org Git - thirdparty/linux.git/log

Joy Zou [Tue, 19 May 2026 11:15:18 +0000 (19:15 +0800)]

arm64: dts: imx91-11x11-evk: add reset gpios for ethernet PHYs

Both the PHYs of the EQOS interface and the FEC interface are supported
to be reset by I2C GPIO expander. So add the support to reset PHYs.

Signed-off-by: Joy Zou <joy.zou@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

commit | commitdiff | tree

Joy Zou [Tue, 19 May 2026 11:15:17 +0000 (19:15 +0800)]

arm64: dts: imx91-11x11-evk: add pinctrl for wdog3 reset

The wdog3 node enables fsl,ext-reset-output to assert an external
reset signal upon watchdog timeout, but lacks pinctrl configuration
for the physical pad.

Without proper pinctrl settings, which could cause the watchdog timeout
to fail to reset the board hardware.

Add pinctrl configuration to ensure the pin is properly muxed and
configured for external watchdog reset functionality.

Signed-off-by: Joy Zou <joy.zou@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

commit | commitdiff | tree

Joy Zou [Tue, 19 May 2026 11:15:16 +0000 (19:15 +0800)]

arm64: dts: imx91-9x9-qsb: add pinctrl for wdog3 reset

The wdog3 node enables fsl,ext-reset-output to assert an external
reset signal upon watchdog timeout, but lacks pinctrl configuration
for the physical pad.

Without proper pinctrl settings, which could cause the watchdog timeout
to fail to reset the board hardware.

Add pinctrl configuration to ensure the pin is properly muxed and
configured for external watchdog reset functionality.

Signed-off-by: Joy Zou <joy.zou@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

commit | commitdiff | tree

Joy Zou [Tue, 19 May 2026 11:15:15 +0000 (19:15 +0800)]

arm64: dts: imx91-9x9-qsb: remove unused property clock-frequency from mdio node

The clock-frequency property is not implemented. Remove it to clean up the
device tree.

Signed-off-by: Joy Zou <joy.zou@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

commit | commitdiff | tree

Alice Guo [Tue, 19 May 2026 10:55:16 +0000 (18:55 +0800)]

arm64: dts: freescale: add bootph-all to watchdog nodes for i.MX platforms

Add the bootph-all property to ULP watchdog nodes across multiple i.MX
SoC device trees, ensuring the watchdog is available during all boot
phases.

The affected watchdog nodes are:
- imx8ulp: wdog3
- imx91/93: wdog3, wdog4, wdog5
- imx94: wdog3
- imx95: wdog3
- imx952: wdog3

Signed-off-by: Alice Guo <alice.guo@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

commit | commitdiff | tree

Alice Guo [Tue, 19 May 2026 10:55:15 +0000 (18:55 +0800)]

arm64: dts: imx94: fix DDR PMU interrupt number

The DDR Performance Monitor node was added with incorrect interrupt
number 91, which actually belongs to the wdog4 watchdog. Fix it to the
correct interrupt number 374.

Fixes: e918e5f847b3 ("arm64: dts: imx94: add DDR Perf Monitor node")
Signed-off-by: Alice Guo <alice.guo@nxp.com>
Reviewed-by: Xu Yang <xu.yang_2@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

commit | commitdiff | tree

Khristine Andreea Barbulescu [Thu, 14 May 2026 08:26:39 +0000 (10:26 +0200)]

arm64: dts: s32g: add SAR ADC support for s32g2 and s32g3

Add ADC0 and ADC1 for S32G2 and S32G3 SoCs.

Signed-off-by: Khristine Andreea Barbulescu <khristineandreea.barbulescu@oss.nxp.com>
Reviewed-by: Enric Balletbo i Serra <eballetb@redhat.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

commit | commitdiff | tree

Sherry Sun [Tue, 19 May 2026 05:54:31 +0000 (13:54 +0800)]

arm64: dts: imx943-evk: Fix PCIe EP vpcie-supply

The vpcie-supply property should reference the regulator that controls
the actual M.2 power supply, not the W_DISABLE1# signal.
On imx943-evk:
- reg_m2_wlan controls M.2 W_DISABLE1# signal
- reg_m2_pwr controls the actual M.2 power supply

Fix the vpcie-supply to use reg_m2_pwr for proper power control in
PCIe endpoint mode.

Fixes: 1962c596d51c ("arm64: dts: imx943-evk: Add pcie[0,1] and pcie-ep[0,1] support")
Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

commit | commitdiff | tree

Khristine Andreea Barbulescu [Mon, 18 May 2026 06:35:47 +0000 (08:35 +0200)]

arm64: dts: s32g: add PIT support for s32g2 and s32g3

Add PIT0 and PIT1 for S32G2 and S32G3 SoCs

Signed-off-by: Khristine Andreea Barbulescu <khristineandreea.barbulescu@oss.nxp.com>
Reviewed-by: Enric Balletbo i Serra <eballetb@redhat.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

commit | commitdiff | tree

Frieder Schrempf [Wed, 13 May 2026 13:25:31 +0000 (15:25 +0200)]

arm64: dts: imx8mp-kontron: Reduce EERAM SPI clock frequency

There is an onboard level shifter for the SPI signals that causes
additional propagation delay and renders the SPI transmission
unreliable at 20 MHz. Reduce the clock frequency to a safe value.

Fixes: 946ab10e3f40 ("arm64: dts: Add support for Kontron OSM-S i.MX8MP SoM and BL carrier board")
Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

commit | commitdiff | tree

Sherry Sun [Wed, 22 Apr 2026 09:35:49 +0000 (17:35 +0800)]

arm64: dts: imx95: Add Root Port node and PERST property

Since describing the PCIe PERST# property under Host Bridge node is now
deprecated, it is recommended to add it to the Root Port node, so
creating the Root Port node and add the reset-gpios property in Root
Port.

Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

commit | commitdiff | tree

Sherry Sun [Wed, 22 Apr 2026 09:35:48 +0000 (17:35 +0800)]

arm64: dts: imx8dxl/qm/qxp: Add Root Port node and PERST property

Since describing the PCIe PERST# property under Host Bridge node is now
deprecated, it is recommended to add it to the Root Port node, so
creating the Root Port node and add the reset-gpios property in Root
Port.

Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

commit | commitdiff | tree

Sherry Sun [Wed, 22 Apr 2026 09:35:47 +0000 (17:35 +0800)]

arm64: dts: imx8mq: Add Root Port node and PERST property

Since describing the PCIe PERST# property under Host Bridge node is now
deprecated, it is recommended to add it to the Root Port node, so
creating the Root Port node and add the reset-gpios property in Root
Port.

Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

commit | commitdiff | tree

Sherry Sun [Wed, 22 Apr 2026 09:35:46 +0000 (17:35 +0800)]

arm64: dts: imx8mp: Add Root Port node and PERST property

Since describing the PCIe PERST# property under Host Bridge node is now
deprecated, it is recommended to add it to the Root Port node, so
creating the Root Port node and add the reset-gpios property in Root
Port.

Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

commit | commitdiff | tree

Sherry Sun [Wed, 22 Apr 2026 09:35:45 +0000 (17:35 +0800)]

arm64: dts: imx8mm: Add Root Port node and PERST property

Since describing the PCIe PERST# property under Host Bridge node is now
deprecated, it is recommended to add it to the Root Port node, so
creating the Root Port node and add the reset-gpios property in Root
Port.

Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

commit | commitdiff | tree

Marek Vasut [Thu, 26 Mar 2026 04:43:21 +0000 (05:43 +0100)]

arm64: dts: imx8mp: Add DT overlays for DH i.MX8M Plus DHCOM SoM and boards

Add DT overlays to support DH i.MX8M Plus DHCOM SoM variants and carrier
board expansion modules. The following DT overlays are implemented:
- SoM:
  - DH 660-x00 SoM with 1xRMII PHY
  - DH 660-x00 SoM with 2xRMII PHY
- PDK2:
  - DH 505-200 Display board in edge connector X12 via direct LVDS
  - DH 531-100 SPI/I2C board in header X21
  - DH 531-200 SPI/I2C board in header X22
  - DH 560-200 Display board in edge connector X12
- PDK3:
  - DH 505-200 Display board in edge connector X36 via direct LVDS
  - DH 531-100 SPI/I2C board in header X40
  - DH 531-200 SPI/I2C board in header X41
  - DH 560-300 Display board in edge connector X36
  - EA muRata 2AE M.2 A/E-Key card in connector X20
  - NXP SPF-29853-C1 MINISASTOCSI with OV5640 sensor in connector X31
  - NXP SPF-29853-C1 MINISASTOCSI with OV5640 sensor in connector X29
- PicoITX:
  - DH 626-100 Display board in edge connector X2

Signed-off-by: Marek Vasut <marex@nabladev.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

commit | commitdiff | tree

Jacob Moroni [Thu, 4 Jun 2026 15:41:04 +0000 (15:41 +0000)]

RDMA/irdma: Initialize iwmr->access during MR registration

Initialize iwmr->access during initial user mem registration so
that it contains a valid value during a subsequent rereg_mr.

Otherwise, a rereg_mr that doesn't set IB_MR_REREG_ACCESS (for
example, one that only changes the PD) ends up clearing the
access flags in HW since iwmr->access is zero-initialized, which
is not intended.

Fixes: 5ac388db27c4 ("RDMA/irdma: Add support to re-register a memory region")
Link: https://patch.msgid.link/r/20260604154104.4035581-1-jmoroni@google.com
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

commit | commitdiff | tree

Jacob Moroni [Tue, 2 Jun 2026 21:44:23 +0000 (21:44 +0000)]

RDMA/irdma: Fix OOB read during CQ MR registration

Sashiko pointed out an unrelated bug during a previous patch:
https://sashiko.dev/#/patchset/20260512183852.614045-1-jmoroni%40google.com

This change fixes the bug by eliminating the cqmr->split field which
was not being set properly and instead just checks the CQ resize
feature flag directly.

The cqmr->split field essentially tracks whether IRDMA_FEATURE_CQ_RESIZE
is set, but it was not being set until CQ creation time, which is _after_
CQ memory registration (the only other place where it is referenced).

As a result, it would always be false during MR registration and would
therefore cause irdma_handle_q_mem to populate cqmr->shadow even for GEN_2
HW and beyond:

cqmr->shadow = (dma_addr_t)arr[req->cq_pages];

The issue is that for GEN_2 and beyond, req->cq_pages may be exactly equal
to iwmr->page_cnt and therefore equal to the size of arr, which would cause
an OOB read by one.

Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs")
Link: https://patch.msgid.link/r/20260602214423.1315105-2-jmoroni@google.com
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

commit | commitdiff | tree

Jacob Moroni [Tue, 2 Jun 2026 21:44:22 +0000 (21:44 +0000)]

RDMA/irdma: Remove redundant legacy_mode checks

The driver has the following invariants:

1. legacy_mode is only allowed on GEN_1 hardware (enforced
   in irdma_alloc_ucontext).

2. GEN_1 hardware does not set IRDMA_FEATURE_CQ_RESIZE or
   IRDMA_FEATURE_RTS_AE. These feature flags are only set
   for GEN_2 and GEN_3 hardware.

Therefore, legacy_mode is always false if IRDMA_FEATURE_CQ_RESIZE
or IRDMA_FEATURE_RTS_AE is set, so remove the redundant checks.

Link: https://patch.msgid.link/r/20260602214423.1315105-1-jmoroni@google.com
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

commit | commitdiff | tree

Michael Bommarito [Tue, 2 Jun 2026 19:47:00 +0000 (15:47 -0400)]

RDMA/siw: bound Read Response placement to the RREAD length

In drivers/infiniband/sw/siw/siw_qp_rx.c, siw_proc_rresp() places each
inbound Read Response DDP segment at sge->laddr + wqe->processed and then
accumulates wqe->processed, but it never checks the running total against
the sink buffer length on continuation segments. siw_check_sge() resolves
and validates the sink memory only on the first fragment (the if (!*mem)
branch), and siw_rresp_check_ntoh() compares the cumulative length against
wqe->bytes only on the final segment (the !frx->more_ddp_segs guard).

A connected siw peer that answers an outstanding RREAD with Read Response
segments that keep the DDP Last flag clear, carrying more total payload
than the RREAD requested, drives wqe->processed past the validated sink
buffer; the next siw_rx_data() call writes out of bounds at
sge->laddr + wqe->processed. siw runs iWARP over ordinary routable TCP,
so the peer is the remote end of an established RDMA connection and needs
no local privilege.

Bound every segment before placement, exactly as siw_proc_send() and
siw_proc_write() already do for their tagged and untagged paths, and
terminate the connection with a base-or-bounds DDP error when the
Read Response would overrun the sink buffer.

This is the second receive-path length fix for this file. A separate
change rejects an MPA FPDU length that underflows the per-fragment
remainder in the header decode; that guard does not cover this case,
because here each individual segment length is self-consistent and only
the accumulated placement offset overruns the buffer.

Fixes: 8b6a361b8c48 ("rdma/siw: receive path")
Link: https://patch.msgid.link/r/20260602194700.2273758-1-michael.bommarito@gmail.com
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

commit | commitdiff | tree

Paolo Bonzini [Fri, 5 Jun 2026 16:54:37 +0000 (18:54 +0200)]

Merge tag 'kvmarm-fixes-7.1-5' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 7.1, take #5

- Correctly drop the ITS translation cache reference when it actually
  gets invalidated

- Take the SRCU lock for SW page table walks

- Restore POR_EL0 access to host EL0, avoiding POR_EL0 becoming
  inaccessible from EL0 after running a guest

- Reassign nested_mmus array behind mmu_lock, ensuring that vcpu init
  and MMU notifiers are mutually exclusive

- Correctly handle FEAT_XNX at stage-2

commit | commitdiff | tree

Junrui Luo [Tue, 2 Jun 2026 08:58:48 +0000 (16:58 +0800)]

vfio: prevent infinite loop in vfio_mig_get_next_state() on blocked arc

vfio_mig_get_next_state() walks vfio_from_fsm_table[] one step at a time,
looping to skip optional states the device does not support until
*next_fsm is supported. A blocked transition is encoded as
VFIO_DEVICE_STATE_ERROR, which the trailing return reports as -EINVAL.

The skip loop does not account for the ERROR sentinel.
state_flags_table[ERROR] is ~0U and vfio_from_fsm_table[ERROR][*] is
ERROR, so once *next_fsm becomes ERROR the loop condition stays true and
*next_fsm never changes. The blocked arcs STOP_COPY -> PRE_COPY and
STOP_COPY -> PRE_COPY_P2P map to ERROR yet pass the support check on a
precopy-capable device, causing the loop to spin forever while holding
the driver state mutex. This can result in a soft lockup, and a panic
with softlockup_panic set.

Terminate the skip loop on the ERROR sentinel so a blocked transition
falls through to the existing return and reports -EINVAL.

Fixes: 4db52602a607 ("vfio: Extend the device migration protocol with PRE_COPY")
Reported-by: Yuhao Jiang <danisjiang@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Junrui Luo <moonafterrain@outlook.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/SYBPR01MB7881290BBDE79B61AE6A017FAF122@SYBPR01MB7881.ausprd01.prod.outlook.com
Signed-off-by: Alex Williamson <alex@shazbot.org>

commit | commitdiff | tree

Ankit Agrawal [Tue, 2 Jun 2026 06:30:15 +0000 (06:30 +0000)]

vfio/nvgrace-gpu: Add Blackwell-Next GPU readiness check via CXL DVSEC

Add a CXL DVSEC-based readiness check for Blackwell-Next GPUs alongside
the existing legacy BAR0 polling path. The CXL Device DVSEC offset is
discovered at probe time. Probe, fault and read/write paths then branch
on that to use either the legacy BAR0 polling or the CXL DVSEC polling.

The CXL path polls Memory_Active, requiring MEM_INFO_VALID within 1s and
MEM_ACTIVE within Memory_Active_Timeout (up to 256s) as per CXL spec r4.0
sec 8.1.3.8.2. Given the long worst-case wait, the CXL poll runs outside
memory_lock with only a quick readiness check is done under the lock.

The poll loops sleep with schedule_timeout_killable() and return -EINTR
on a fatal signal. This avoids hung-task panics during the long
uninterruptible wait. Extend this to the legacy based wait as well for
improvement.

In the fault handler the wait runs locklessly before memory_lock. If a
reset races in, the in-lock recheck returns -EAGAIN and the wait is
retried rather than returning a spurious VM_FAULT_SIGBUS.

Add PCI_DVSEC_CXL_MEM_ACTIVE_TIMEOUT to pci_regs.h for the timeout field.

Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Suggested-by: Alex Williamson <alex@shazbot.org>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20260602063015.3915-1-ankita@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>

commit | commitdiff | tree

Li RongQing [Mon, 1 Jun 2026 09:58:18 +0000 (05:58 -0400)]

RDMA/mlx5: Fix state and counter desync on loopback enable failure

In mlx5_ib_enable_lb(), dev->lb.enabled was unconditionally set
to true even if mlx5_nic_vport_update_local_lb() failed.

Fix this by only setting dev->lb.enabled on success. On failure,
roll back the reference counters and return the error.

Link: https://patch.msgid.link/r/20260601095818.2227-1-lirongqing@baidu.com
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

commit | commitdiff | tree

Li RongQing [Mon, 1 Jun 2026 09:56:54 +0000 (05:56 -0400)]

RDMA/mlx5: Fix error propagation in __mlx5_ib_add

__mlx5_ib_add() currently returns -ENOMEM on any stage initialization
failure, losing the actual error code returned by the init function.
This makes it impossible for callers to distinguish between different
failure reasons (e.g. -EINVAL, -EIO, -EOPNOTSUPP) and leads to
misleading error handling.

Fix it by returning the actual error code stored in 'err'.

Link: https://patch.msgid.link/r/20260601095654.2178-1-lirongqing@baidu.com
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

commit | commitdiff | tree

Linus Torvalds [Fri, 5 Jun 2026 16:34:14 +0000 (09:34 -0700)]

Merge tag 'nfs-for-7.1-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client fix from Trond Myklebust:

- Fix a use after free in nfs_write_completion

* tag 'nfs-for-7.1-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: write_completion: dereference loop-local req, not hdr->req

commit | commitdiff | tree

Oliver Hartkopp [Fri, 29 May 2026 15:23:59 +0000 (17:23 +0200)]

ALSA: hda: fix Kconfig dependency of HD Audio PCI

With commit 2d9223d2d64c ("ALSA: hda: Move controller drivers into
sound/hda/controllers directory") the HD Audio drivers have been moved
from linux/sound/pci/hda to linux/sound/hda.

But the Kconfig dependency for SND_HDA_INTEL stayed on SND_PCI instead of
depending on PCI directly. To make the "HD Audio PCI" configuration entry
visible it is currently needed to enable "PCI sound devices" although
no PCI device in the submenu needs to be selected.

Make SND_HDA_INTEL directly depending on hardware/architecture like the
other entries in this Kconfig.

Fixes: 2d9223d2d64c ("ALSA: hda: Move controller drivers into sound/hda/controllers directory")
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260529-hda-kconfig-v1-1-4a2c6a0efd56@hartkopp.net
Signed-off-by: Takashi Iwai <tiwai@suse.de>

commit | commitdiff | tree

Lizhi Hou [Thu, 4 Jun 2026 19:54:59 +0000 (12:54 -0700)]

accel/amdxdna: Require carveout when PASID and force_iova are disabled

When both PASID and force_iova are disabled, carveout memory should be
used. Reject buffer allocations that cannot use carveout memory in this
configuration and return an error.

Fixes: 3cc5d7a59519 ("accel/amdxdna: Add carveout memory support for non-IOMMU systems")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260604195459.2423279-1-lizhi.hou@amd.com

commit | commitdiff | tree

Yong-Xuan Wang [Mon, 1 Jun 2026 10:26:26 +0000 (03:26 -0700)]

KVM: riscv: selftests: Split SBI FWFT into separate feature-specific sublists

Divide the monolithic SBI FWFT (Firmware Features) register list into
separate sublists, each testing a specific FWFT feature independently
with proper dependency checking.

Previously, all FWFT features were tested together in a single sublist.
This caused issues because:
1. Not all FWFT features are available on all platforms
2. Some features depend on specific ISA extensions (e.g., pointer_masking
requires Smnpm)
3. Tests would fail if any single feature was unavailable

Add the feature-specific SBI FWFT sublists with the following
improvements:
- Add check_fwft_feature() helper to verify FWFT feature availability
at runtime
- Update filter_reg() to handle per-feature FWFT register filtering

Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260601-kvm-get_reg_list-v2-v5-5-415d08a2813b@sifive.com
Signed-off-by: Anup Patel <anup@brainfault.org>

commit | commitdiff | tree

Yong-Xuan Wang [Mon, 1 Jun 2026 10:26:25 +0000 (03:26 -0700)]

KVM: riscv: selftests: Refactor ISA and SBI extension sublist macros

Refactor the get-reg-list test to use unified sublist macros for ISA
and SBI extensions, eliminating code duplication and improving
maintainability.

Previously, each extension had its own hand-coded sublist definition
(e.g., SUBLIST_ZICBOM, SUBLIST_AIA, etc.) and the config structures
repeated the same pattern. This made the code verbose and error-prone.

Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260601-kvm-get_reg_list-v2-v5-4-415d08a2813b@sifive.com
Signed-off-by: Anup Patel <anup@brainfault.org>

commit | commitdiff | tree

Yong-Xuan Wang [Mon, 1 Jun 2026 10:26:24 +0000 (03:26 -0700)]

KVM: RISC-V: SBI FWFT: Fix stale feature exposure after runtime extension changes

Fix a bug where FWFT features could be incorrectly exposed to guests
after userspace disables their dependent ISA extensions at runtime.

The 'supported' field in kvm_sbi_fwft_config was set once during vCPU
initialization based on the initial hardware/extension availability.
However, when userspace subsequently disables ISA extensions via the KVM
ONE_REG interface, the 'supported' field was not updated. This caused
the following issues:
1. FWFT features would remain visible and accessible to guests even
   after their prerequisite ISA extensions were disabled
2. Guests could configure FWFT features that depend on disabled
   extensions, leading to undefined behavior
3. The static 'supported' flag and the dynamic supported() callback
   could disagree about feature availability

The fix introduces a two-layer checking mechanism:
1. Add an optional init() callback to the kvm_sbi_fwft_feature structure
   for features that require hardware probing during initialization. This
   separates the one-time hardware detection logic from the runtime
   availability check.
2. Add runtime checks in all FWFT-related functions that call
   feature->supported(vcpu) if the callback exists. This ensures feature
   availability is re-evaluated based on the current ISA extension state.

This approach maintains the cached 'supported' field for initialization-
time decisions while ensuring runtime availability is always determined
by the current vCPU configuration, not initialization-time snapshots.

Fixes: 6b72fd170592 ("RISC-V: KVM: add support for FWFT SBI extension")
Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260601-kvm-get_reg_list-v2-v5-3-415d08a2813b@sifive.com
Signed-off-by: Anup Patel <anup@brainfault.org>

commit | commitdiff | tree

Yong-Xuan Wang [Mon, 1 Jun 2026 10:26:23 +0000 (03:26 -0700)]

KVM: RISC-V: SBI FWFT: Add optional init() callback for hardware probing

Add an optional init() callback to separate one-time hardware probing
from runtime availability checks. For pointer masking, this allows
probing supported PMM lengths during initialization while checking ISA
extension availability at runtime.

Fix try_to_set_pmm() to restore the previous HENVCFG.PMM value after
probing, preventing side effects from hardware detection. Add preemption
protection to ensure CSR probe sequences complete atomically on the same
CPU.

Fixes: 6f576fc0aeb9 ("RISC-V: KVM: Add support for SBI_FWFT_POINTER_MASKING_PMLEN")
Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260601-kvm-get_reg_list-v2-v5-2-415d08a2813b@sifive.com
Signed-off-by: Anup Patel <anup@brainfault.org>

commit | commitdiff | tree

Yong-Xuan Wang [Mon, 1 Jun 2026 10:26:22 +0000 (03:26 -0700)]

KVM: RISC-V: SBI FWFT: Mark vCPU CSRs dirty after setting feature value

Mark the vCPU CSRs as dirty after successfully setting an FWFT feature
value. FWFT features may modify CSRs (e.g., pointer masking modifies
henvcfg.PMM), and failing to mark them dirty can lead to the guest
observing stale CSR state after vCPU scheduling or migration.

Fixes: 1323a5cfe52c ("KVM: riscv: Skip CSR restore if VCPU is reloaded on the same core")
Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260601-kvm-get_reg_list-v2-v5-1-415d08a2813b@sifive.com
Signed-off-by: Anup Patel <anup@brainfault.org>

commit | commitdiff | tree

Jason Gunthorpe [Tue, 2 Jun 2026 19:37:28 +0000 (16:37 -0300)]

IB/cm: Fix av cm device leak on an error path in cm_init_av_by_path()

Codex pointed out that cm_init_av_by_path() can call cm_set_av_port()
which takes a reference on the cm device, but then can immediately return
error if ib_init_ah_attr_from_path() fails.

Since callers like ib_send_cm_req() put the av on the stack this leaks
that cm device reference.

Re-order cm_init_av_by_path() so it doesn't touch the av until it has done
all its failable work, and then update the av in one shot so it is either
left alone or fully init'd.

Sashiko also pointed out that the cm_destroy_av() prior to
cm_init_av_by_path() is harmful as it leaves the AV broken in the error
case and thus the REJ won't send. Since cm_init_av_by_path() is now atomic
it is safe to delete the cm_destroy_av(). On succees the av from
cm_init_av_for_response() is cleaned up by cm_init_av_by_path(), on
failure the 'goto rejected' guarentees the av is destroyed during
ib_destroy_cm_id().

Fixes: 76039ac9095f ("IB/cm: Protect cm_dev, cm_ports and mad_agent with kref and lock")
Link: https://patch.msgid.link/r/0-v1-38292501f539+14f-ib_cm_av_leak_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

commit | commitdiff | tree

Sriharsha Basavapatna [Tue, 2 Jun 2026 14:56:18 +0000 (20:26 +0530)]

RDMA/bnxt_re: Update create_qp to use QP buffer umem attrs

Use ib_umem_get_attr_or_va() helper to pin QP buffer umems.
Pass attribute ids SQ_BUF_UMEM and RQ_BUF_UMEM for respective
buffers.

Link: https://patch.msgid.link/r/20260602145618.21643-1-sriharsha.basavapatna@broadcom.com
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

commit | commitdiff | tree

Arnd Bergmann [Tue, 2 Jun 2026 14:04:34 +0000 (16:04 +0200)]

RDMA/hfi1: Open-code rvt_set_ibdev_name()

clang warns about a function missing a printf attribute:

include/rdma/rdma_vt.h:457:47: error: diagnostic behavior may be improved by adding the 'format(printf, 2, 3)' attribute to the declaration of 'rvt_set_ibdev_name' [-Werror,-Wmissing-format-attribute]
  447 | static inline void rvt_set_ibdev_name(struct rvt_dev_info *rdi,
      | __attribute__((format(printf, 2, 3)))
  448 |                                       const char *fmt, const char *name,
  449 |                                       const int unit)

The helper was originally added as an abstraction for the hfi1 and
qib drivers needing the same thing, but now qib is gone, and hfi1
is the only remaining user of rdma_vt.

Avoid the warning and allow the compiler to check the format string by
open-coding the helper and directly assigning the device name.

Fixes: 5084c8ff21f2 ("IB/{rdmavt, hfi1, qib}: Self determine driver name")
Link: https://patch.msgid.link/r/20260602140453.3542427-1-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

commit | commitdiff | tree

Jason Gunthorpe [Mon, 1 Jun 2026 16:52:33 +0000 (13:52 -0300)]

RDMA/umem: Make ib_umem_is_contiguous() safe on 32 bit

Sashiko points out the roundup_pow_of_two() only uses unsigned long but
dma_addr_t can be u64.

Change this algorithm to be simpler, compute the page size, if any page
size is found and it results in a single block then it is contiguous.

Link: https://patch.msgid.link/r/3-v1-88303e9e509f+f7-ib_umem_types_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

commit | commitdiff | tree

Jason Gunthorpe [Mon, 1 Jun 2026 16:52:32 +0000 (13:52 -0300)]

RDMA/umem: Be careful about boundary conditions in ib_umem_find_best_pgsz()

Several corner cases, especially important on 32 bits:

- umem->iova is u64, the function argument should pass in u64 or
iova will be truncated
- Check that the length is not too large for the iova
- Check that lengths > 4G don't overflow the GENMASK

Link: https://patch.msgid.link/r/2-v1-88303e9e509f+f7-ib_umem_types_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

commit | commitdiff | tree

Linus Torvalds [Fri, 5 Jun 2026 15:34:32 +0000 (08:34 -0700)]

Merge tag 'xfs-fixes-7.1-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Carlos Maiolino:
"A collection of fixes mostly for the RT device, including a small
  refactor that has no functional change"

* tag 'xfs-fixes-7.1-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: Remove mention of PageWriteback
  xfs: abort mount if xfs_fs_reserve_ag_blocks fails
  xfs: factor rtgroup geom write pointer reporting into a helper
  xfs: drop the RTG reference later in xfs_ioc_rtgroup_geometry
  xfs: fix rtgroup cleanup in CoW fork repair
  xfs: fix error returns in CoW fork repair
  xfs: fix overlapping extents returned for pNFS LAYOUTGET
  xfs: fix use of uninitialized imap in xfs_fs_map_blocks error path
  xfs: handle racing deletions in xfs_zone_gc_iter_irec

commit | commitdiff | tree

Linus Torvalds [Fri, 5 Jun 2026 15:28:10 +0000 (08:28 -0700)]

Merge tag 'erofs-for-7.1-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:

- Fix a UAF of sbi->sync_decompress when compressed I/Os
   race with unmount

- Fix a regression introduced this development cycle that
   incorrectly rejects multiple-algorithm images

* tag 'erofs-for-7.1-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: fix EFSCORRUPTED on multi-algorithm images in z_erofs_map_sanity_check()
  erofs: fix use-after-free on sbi->sync_decompress

commit | commitdiff | tree

Linus Torvalds [Fri, 5 Jun 2026 15:23:02 +0000 (08:23 -0700)]

Merge tag 'v7.1-rc7-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

- Fix use after free in SMB2_CANCEL

- Fix race in ksmbd_reopen_durable_fd

- Fix oplock and lease break potential NULL-dref

* tag 'v7.1-rc7-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: fix use-after-free of a deferred file_lock on double SMB2_CANCEL
  ksmbd: fix durable reconnect double-bind race in ksmbd_reopen_durable_fd
  ksmbd: fix NULL-deref of opinfo->conn in oplock/lease break notifiers

commit | commitdiff | tree

Tejun Heo [Mon, 1 Jun 2026 18:37:28 +0000 (08:37 -1000)]

bpf: Replace scratch PTE atomically when allocating arena pages

apply_range_set_cb() maps the pages for a new arena allocation and returned
-EBUSY when the target PTE was already populated. Kernel-fault recovery
leaves the per-arena scratch page in unallocated arena PTEs, so a later
bpf_arena_alloc_pages() over such a page hits that -EBUSY, and every
subsequent allocation of it fails the same way. Allocation must install the
real page over scratch instead.

Overwriting the scratch PTE in place is a valid->valid change, which arm64
forbids without break-before-make. Route through an invalid entry instead:
ptep_try_set() fills only a none slot, so the PTE goes scratch->none->page.
On finding scratch, clear it and flush_tlb_before_set() before retrying. The
new flush_tlb_before_set() is a no-op except on arches like arm64 that need
the break-before-make TLB invalidate. The loop also copes with a concurrent
fault re-scratching the slot.

Arches without ptep_try_set() never install the scratch page, so keep the
must-be-empty check and set_pte_at() for them.

Fixes: dc11a4dba246 ("bpf: Recover arena kernel faults with scratch page")
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260601183728.1800490-1-tj@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Zhenghang Xiao [Sat, 30 May 2026 20:45:28 +0000 (21:45 +0100)]

misc: fastrpc: fix use-after-free race in fastrpc_map_create

fastrpc_map_lookup returns a raw pointer after releasing fl->lock. The
caller fastrpc_map_create then calls fastrpc_map_get (kref_get_unless_zero)
on this unprotected pointer. A concurrent MEM_UNMAP can free the map
between the lock release and the kref operation, resulting in a
use-after-free on the freed slab object.

Restore the take_ref parameter to fastrpc_map_lookup so the reference
is acquired atomically under fl->lock before the pointer is exposed to
the caller.

Fixes: 10df039834f8 ("misc: fastrpc: Skip reference for DMA handles")
Cc: stable@vger.kernel.org
Signed-off-by: Zhenghang Xiao <kipreyyy@gmail.com>
Signed-off-by: Srinivas Kandagatla <srini@kernel.org>
Link: https://patch.msgid.link/20260530204528.116920-5-srini@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Mukesh Ojha [Sat, 30 May 2026 20:45:27 +0000 (21:45 +0100)]

misc: fastrpc: Fix NULL pointer dereference in rpmsg callback

A NULL pointer dereference was observed on Hawi at boot when the DSP
sends a glink message before fastrpc_rpmsg_probe() has completed
initialization:

  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000178
  pc : _raw_spin_lock_irqsave+0x34/0x8c
  lr : fastrpc_rpmsg_callback+0x3c/0xcc [fastrpc]
  ...
  Call trace:
   _raw_spin_lock_irqsave+0x34/0x8c (P)
   fastrpc_rpmsg_callback+0x3c/0xcc [fastrpc]
   qcom_glink_native_rx+0x538/0x6a4
   qcom_glink_smem_intr+0x14/0x24 [qcom_glink_smem]

The faulting address 0x178 corresponds to the lock variable inside
struct fastrpc_channel_ctx, confirming that cctx is NULL when
fastrpc_rpmsg_callback() attempts to take the spinlock.

There are two issues here. First, dev_set_drvdata() is called before
spin_lock_init() and idr_init(), leaving a window where the callback
can retrieve a valid cctx pointer but operate on an uninitialized
spinlock. Second, the rpmsg channel becomes live as soon as the driver
is bound, so fastrpc_rpmsg_callback() can fire before dev_set_drvdata()
is called at all, resulting in dev_get_drvdata() returning NULL.

Fix both issues by moving all cctx initialization ahead of
dev_set_drvdata() so the structure is fully initialized before it
becomes visible to the callback, and add a NULL check in
fastrpc_rpmsg_callback() as a guard against any remaining window.

Fixes: f6f9279f2bf0 ("misc: fastrpc: Add Qualcomm fastrpc basic driver model")
Cc: stable@vger.kernel.org
Signed-off-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Srinivas Kandagatla <srini@kernel.org>
Link: https://patch.msgid.link/20260530204528.116920-4-srini@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Junrui Luo [Sat, 30 May 2026 20:45:26 +0000 (21:45 +0100)]

misc: fastrpc: fix DMA address corruption due to find_vma misuse

fastrpc_get_args() uses find_vma() to look up the VMA for a user-provided
pointer and compute a DMA address offset. When the address falls in a gap
before the returned VMA, (ptr & PAGE_MASK) - vma->vm_start underflows,
corrupting the DMA address sent to the DSP.

Replace find_vma() with vma_lookup(), which returns NULL when the address
is not contained within any VMA.

Cc: stable@vger.kernel.org
Fixes: 80f3afd72bd4 ("misc: fastrpc: consider address offset before sending to DSP")
Reported-by: Yuhao Jiang <danisjiang@gmail.com>
Signed-off-by: Junrui Luo <moonafterrain@outlook.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Srinivas Kandagatla <srini@kernel.org>
Link: https://patch.msgid.link/20260530204528.116920-3-srini@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Anandu Krishnan E [Sat, 30 May 2026 20:45:25 +0000 (21:45 +0100)]

misc: fastrpc: fix use-after-free of fastrpc_user in workqueue context

There is a race between fastrpc_device_release() and the workqueue
that processes DSP responses. When the user closes the file descriptor,
fastrpc_device_release() frees the fastrpc_user structure. Concurrently,
an in-flight DSP invocation can complete and fastrpc_rpmsg_callback()
schedules context cleanup via schedule_work(&ctx->put_work). If the
workqueue runs fastrpc_context_free() in parallel with or after
fastrpc_device_release() has freed the user structure, it dereferences
the freed fastrpc_user. Depending on the state of the context at the
time of the race, any one of the following accesses can be hit:

1. fastrpc_buf_free() calls fastrpc_ipa_to_dma_addr(buf->fl->cctx, ...)
    to strip the SID bits from the stored IOVA before passing the
    physical address to dma_free_coherent().

2. fastrpc_free_map() reads map->fl->cctx->vmperms[0].vmid to
    reconstruct the source permission bitmask needed for the
    qcom_scm_assign_mem() call that returns memory from the DSP VM
    back to HLOS.

3. fastrpc_free_map() acquires map->fl->lock to safely remove the
    map node from the fl->maps list.

The resulting use-after-free manifests as:

  pc : fastrpc_buf_free+0x38/0x80 [fastrpc]
  lr : fastrpc_context_free+0xa8/0x1b0 [fastrpc]
  fastrpc_context_free+0xa8/0x1b0 [fastrpc]
  fastrpc_context_put_wq+0x78/0xa0 [fastrpc]
  process_one_work+0x180/0x450
  worker_thread+0x26c/0x388

Add kref-based reference counting to fastrpc_user. Have each invoke
context take a reference on the user at allocation time and release it
when the context is freed. Release the initial reference in
fastrpc_device_release() at file close. Move the teardown of the user
structure — freeing pending contexts, maps, mmaps, and the channel
context reference — into the kref release callback fastrpc_user_free(),
so that it runs only when the last reference is dropped, regardless of
whether that happens at device close or after the final in-flight
context completes.

Fixes: 6cffd79504ce ("misc: fastrpc: Add support for dmabuf exporter")
Cc: stable@kernel.org
Signed-off-by: Anandu Krishnan E <anandu.e@oss.qualcomm.com>
Signed-off-by: Srinivas Kandagatla <srini@kernel.org>
Link: https://patch.msgid.link/20260530204528.116920-2-srini@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

DaeMyung Kang [Sat, 30 May 2026 14:35:12 +0000 (23:35 +0900)]

ntfs: validate resident volume name values on lookup

The shared lookup-time attribute validator now has a safe caller path for
$VOLUME_NAME corruption: ntfs_write_volume_label() no longer treats
lookup errors as an absent label, and the mount path reinitializes its
search context before continuing to $VOLUME_INFORMATION.

Add $VOLUME_NAME-specific resident value validation. A volume name is
stored as a UTF-16LE string, so reject odd byte lengths, and reject
values longer than the NTFS volume label limit. Empty labels remain
valid.

Also reject non-resident $VOLUME_NAME records. $VOLUME_NAME is required
to be resident, like $FILE_NAME; a crafted non-resident record would
otherwise pass lookup and ntfs_write_volume_label() would remove it as if
it were a normal resident attribute.

Cc: stable@vger.kernel.org # v7.1
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

commit | commitdiff | tree

DaeMyung Kang [Sat, 30 May 2026 14:35:11 +0000 (23:35 +0900)]

ntfs: reinit search context before volume information lookup

On mount the volume inode is searched for $VOLUME_NAME and then, reusing
the same search context, for $VOLUME_INFORMATION. The $VOLUME_NAME lookup
is optional and its result is otherwise ignored.

Once lookup-time validation can reject a corrupt $VOLUME_NAME with -EIO,
the search context is left in an undefined state: ntfs_attr_find()
documents that on an actual error @ctx->attr is undefined. Continuing the
$VOLUME_INFORMATION search from that context is not contractually valid.

Reinitialize the search context before the $VOLUME_INFORMATION lookup so
it always starts from a well-defined state regardless of the
$VOLUME_NAME lookup outcome.

Cc: stable@vger.kernel.org # v7.1
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

commit | commitdiff | tree

DaeMyung Kang [Sat, 30 May 2026 14:35:10 +0000 (23:35 +0900)]

ntfs: do not replace volume name after lookup errors

ntfs_write_volume_label() removes an existing $VOLUME_NAME attribute and
then adds the replacement. The old code only distinguished lookup success
from all other results, so any lookup error was treated like an absent
label and the add path still ran.

That is unsafe once lookup-time validation rejects corrupt $VOLUME_NAME
records with -EIO: the corrupt record would remain in place and a second
$VOLUME_NAME record could be appended next to it.

Only add the replacement after the old label was removed successfully or
after lookup returned -ENOENT. Propagate all other lookup errors, and
also stop if removing the old attribute fails.

Cc: stable@vger.kernel.org # v7.1
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

commit | commitdiff | tree

DaeMyung Kang [Sat, 30 May 2026 14:35:09 +0000 (23:35 +0900)]

ntfs: validate attribute values on lookup

ntfs_attr_find() and ntfs_external_attr_find() check that generic
resident attribute values fit in their attribute records and that
fixed-size resident values are large enough. For variable-length resident
formats, however, the fixed part is not enough: embedded length fields
can still point callers past the resident value.

A crafted image can set a small resident $FILE_NAME value_length while
leaving file_name_length large. Callers then trust file_name_length and
read past the resident value when converting or comparing the name. This
was reproduced with a crafted image under KASAN as a slab-out-of-bounds
read from the kmalloc-1k MFT record copy. The stack included
ntfs_lookup(), ntfs_iget(), ntfs_read_locked_inode(), ntfs_attr_name_get(),
ntfs_ucstonls(), and utf16s_to_utf8s().

Add a shared attribute value validator and use it before a lookup path
can return an attribute, including the AT_UNUSED enumeration case where
callers inspect returned attributes directly. The helper validates
resident value bounds, minimum resident value sizes, variable-length
$FILE_NAME fields, and non-resident mapping-pairs metadata that was
previously checked separately in both lookup paths.

This also preserves the intended resident @val matching semantics in the
external attribute lookup path. The old duplicated validation block
overwrote the actual resident value length with the type-specific minimum
length before comparing @val, so variable-length resident values could
fail to match even when the bytes were identical. Keep the comparison on
the actual value length, and make ntfs_attrlist_entry_add() compare
resident attributes with lowest_vcn zero instead of reading the
non-resident union member after a successful resident match.

Reject non-resident $FILE_NAME records too: the format requires
$FILE_NAME to be resident and callers treat returned records as resident.

Cc: stable@vger.kernel.org # v7.1
Fixes: 6ceb4cc81ef3 ("ntfs: add bound checking to ntfs_attr_find")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

commit | commitdiff | tree

Samuel Moelius [Wed, 3 Jun 2026 17:41:09 +0000 (17:41 +0000)]

ntfs: detect mapping-pairs LCN accumulator overflow

The NTFS mapping-pairs parser accumulates relative LCN deltas in a
signed integer. A corrupted attribute can drive that addition past
the representable range.

One corrupt runlist shape sets the accumulated LCN to S64_MAX and
then adds a delta of 1 in the next mapping-pairs entry.

Signed overflow is undefined and can turn an invalid runlist into a
different set of physical clusters.

Check the LCN addition for overflow before storing the next run.

Cc: stable@vger.kernel.org # v7.1
Assisted-by: Codex:gpt-5.5-cyber-preview
Signed-off-by: Samuel Moelius <sam.moelius@trailofbits.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

commit | commitdiff | tree

Marco Crivellari [Thu, 14 May 2026 13:54:08 +0000 (15:54 +0200)]

ntfs: Add WQ_PERCPU to alloc_workqueue users

This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:

commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")

The refactoring is going to alter the default behavior of
alloc_workqueue() to be unbound by default.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU. For more details see the Link tag below.

In order to keep alloc_workqueue() behavior identical, explicitly request
WQ_PERCPU.

Cc: stable@vger.kernel.org # v7.1
Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

commit | commitdiff | tree

Hyunchul Lee [Tue, 2 Jun 2026 04:53:24 +0000 (13:53 +0900)]

ntfs: serialize volume label accesses

Protect vol->volume_label with a mutex and snaphost the label before
copy_to_user. This prevent a use-after-free when FS_IOC_SETFSLABEL
replaces the vol->volume_label and FS_IOC_GETTSLABEL reads it
concurrently.

Cc: stable@vger.kernel.org # v7.1
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

commit | commitdiff | tree

Ron de Bruijn [Sat, 30 May 2026 00:19:18 +0000 (09:19 +0900)]

ntfs: fix off-by-one in mapping pairs decoding bounds checks

In ntfs_mapping_pairs_decompress(), attr_end points one byte past the
end of the attribute record:

    attr_end = (u8 *)attr + le32_to_cpu(attr->length);

The two bounds checks validating that mapping pair data bytes fit within
the attribute use strict greater-than (>), which allows a one-byte
out-of-bounds read when the data extends exactly to attr_end:

  b = *buf & 0xf;
  if (b) {
      if (unlikely(buf + b > attr_end))   // off-by-one
          goto io_error;
      for (deltaxcn = (s8)buf[b--]; b; b--)
          deltaxcn = (deltaxcn << 8) + buf[b];
  }

When buf + b == attr_end, the check evaluates to false and buf[b] reads
one byte past the valid attribute boundary. The same pattern appears in
the LCN delta bytes check.

Fix both checks to use >= so that buf[b] at exactly attr_end is
correctly rejected as out of bounds.

Cc: stable@vger.kernel.org # v7.1
Signed-off-by: Ron de Bruijn <rmbruijn@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

commit | commitdiff | tree

Hyunchul Lee [Thu, 28 May 2026 02:15:35 +0000 (11:15 +0900)]

ntfs: not change 0-byte $DATA attribute to non-resident

When ntfs_resident_attr_resize() cannot grow a resident attribute in
place, it retries after converting other resident attributes to
non-resident to free space in the MFT recrord.

Do not select zero-length resident $DATA attributes for this conversion.
fsck treats 0-byte non-resident $DATA attribute as corruptions.

Cc: stable@vger.kernel.org # v7.1
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

commit | commitdiff | tree

Zhao Zhang [Tue, 2 Jun 2026 08:43:33 +0000 (16:43 +0800)]

bpf: Reject fragmented frames in devmap

Devmap broadcast redirects clone the packet for all but the last
destination.

For native XDP, that clone path copies only the linear xdp_frame data,
while fragmented frames keep skb_shared_info in tailroom outside the
linear area. Cloning such a frame leaves XDP_FLAGS_HAS_FRAGS set but
without valid frag metadata, and the later free path can interpret
uninitialized tail data as skb_shared_info, leading to an out-of-bounds
access during frame return.

Reject fragmented native XDP frames in dev_map_enqueue_clone().

Add the same restriction to the generic XDP clone path in
dev_map_redirect_clone(). Generic XDP represents fragmented packets as
nonlinear skbs, and rejecting them here keeps clone-based broadcast
support aligned between native and generic XDP.

Fixes: e624d4ed4aa8 ("xdp: Extend xdp_redirect_map with broadcast support")
Cc: stable@kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Zhengchuan Liang <zcliangcn@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Assisted-by: Codex:GPT-5.4
Signed-off-by: Zhao Zhang <zzhan461@ucr.edu>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/21c2d153dd25603d359069a02bf06779b51f6423.1780385378.git.zzhan461@ucr.edu
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Colin Ian King [Wed, 27 May 2026 07:49:41 +0000 (08:49 +0100)]

ntfs: Fix spelling mistake "etnry" -> "entry"

There is a spelling mistake in a ntfs_error message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

commit | commitdiff | tree

DaeMyung Kang [Sun, 24 May 2026 05:42:37 +0000 (14:42 +0900)]

ntfs: free link name from ntfs_name_cache

ntfs_link() converts the new link name with ntfs_nlstoucs() using
NTFS_MAX_NAME_LEN. In this case ntfs_nlstoucs() allocates the result
from ntfs_name_cache, and its contract requires callers to release the
buffer with kmem_cache_free(ntfs_name_cache, ...).

All other ntfs_nlstoucs() callers in namei.c do that, but ntfs_link()
uses kfree(), which mismatches the allocator for successfully converted
names.

The conversion failure path reaches the common out label with uname ==
NULL. That was harmless for kfree(), but kmem_cache_free() does not
provide the same NULL contract. Return directly on conversion failure
and free successful conversions with ntfs_name_cache.

Fixes: af0db57d4293 ("ntfs: update inode operations")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

commit | commitdiff | tree

Namjae Jeon [Thu, 21 May 2026 12:30:01 +0000 (21:30 +0900)]

ntfs: remove unnecessary NULL checks before kfree

NULL check before kfree() is unnecessary and triggers coccinelle warnings.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

commit | commitdiff | tree

Namjae Jeon [Thu, 21 May 2026 12:27:41 +0000 (21:27 +0900)]

ntfs: remove unnecessary ternary boolean conversion

Coccinelle warned about unnecessary patterns when
assigning to bool variables.

Simply assign the condition directly.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

commit | commitdiff | tree

DaeMyung Kang [Sun, 17 May 2026 03:44:47 +0000 (12:44 +0900)]

ntfs: remove unsupported quota handling

The ntfs driver does not implement quota accounting. It creates
new inodes with the NTFS 1.2 $STANDARD_INFORMATION layout and does
not maintain the NTFS 3.x owner_id/quota_charged fields or the
$Quota usage records that Windows would need for meaningful quota
accounting.

The only runtime quota path left in the driver is the remount-rw
code that tries to mark $Quota/$Q out of date, plus the mount-time
code that loads $Quota and its $Q index solely to support that
marker.

Since the driver does not maintain the per-file quota metadata,
setting QUOTA_FLAG_OUT_OF_DATE does not make the quota state
meaningful, and failures in this unsupported path can unnecessarily
block remount-rw or force a mount read-only.

Remove the quota marker, the $Quota/$Q loading state, and the
unused quota volume flag. Keep the on-disk quota layout definitions
in layout.h so the documented NTFS structures remain available.

Suggested-by: Hyunchul Lee <hyc.lee@gmail.com>
Link: https://lore.kernel.org/all/CANFS6bYTzioqZjYt=51Kb9RdR3MKXaez_fh_WCLoym093VxFmg@mail.gmail.com/
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

commit | commitdiff | tree

Thorsten Blum [Mon, 11 May 2026 09:50:33 +0000 (11:50 +0200)]

ntfs: use str_plural in ntfs_attr_make_non_resident

Replace the manual ternary "s" pluralization with str_plural() to
simplify the code.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

commit | commitdiff | tree

Hyunchul Lee [Sat, 23 May 2026 04:14:23 +0000 (13:14 +0900)]

ntfs: add bounds check before accessing EA entries

in ntfs_ea_lookup and ntfs_listxattr, this verifies that there is enough
space in the EA entry before accessing the next_entry_offset field of
the EA entry.

Cc: stable@vger.kernel.org # v7.1
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

commit | commitdiff | tree

Hyunchul Lee [Sat, 23 May 2026 04:14:22 +0000 (13:14 +0900)]

ntfs: validate index entries on reading

Validate index entries immediately after reading an index root or index
block from disk. This eliminates repeated checks in lookup and readdir,
and reduce the risk of missing checks in those paths.

Cc: stable@vger.kernel.org # v7.1
Tested-by: woot000 <woot000@woot000.com>
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

commit | commitdiff | tree

Hyunchul Lee [Sat, 23 May 2026 04:14:21 +0000 (13:14 +0900)]

ntfs: centalize $INDEX_ROOT header validation

Add a dedicated helper to perform stricter validation of $INDEX_ROOT and
use it for both directory inodes and named index inodes. This keeps the
root size and header geometry checks consistent across both read paths.

Cc: stable@vger.kernel.org # v7.1
Tested-by: woot000 <woot000@woot000.com>
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

commit | commitdiff | tree

Hyunchul Lee [Sat, 23 May 2026 04:14:20 +0000 (13:14 +0900)]

ntfs: validate index block header more strictly

Modify ntfs_index_block_inconsisent() to perform stricter validation of
INDEX_HEADER geometry in INDX blocks, and update
ntfs_lookup_inode_by_name() to use that function to validate INDX
blocks.

Cc: stable@vger.kernel.org # v7.1
Tested-by: woot000 <woot000@woot000.com>
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

commit | commitdiff | tree

Bjorn Andersson [Sat, 30 May 2026 20:44:21 +0000 (21:44 +0100)]

slimbus: qcom-ngd-ctrl: Avoid ABBA on tx_lock/ctrl->lock

During the SSR/PDR down notification the tx_lock is taken with the
intent to provide synchronization with active DMA transfers.

But during this period qcom_slim_ngd_down() is invoked, which ends up in
slim_report_absent(), which takes the slim_controller lock. In multiple
other codepaths these two locks are taken in the opposite order (i.e.
slim_controller then tx_lock).

The result is a lockdep splat, and a possible deadlock:

  rprocctl/449 is trying to acquire lock:
  ffff00009793e620 (&ctrl->lock){+.+.}-{4:4}, at: slim_report_absent (drivers/slimbus/core.c:322) slimbus

  but task is already holding lock:
  ffff00009793fb50 (&ctrl->tx_lock){+.+.}-{4:4}, at: qcom_slim_ngd_ssr_pdr_notify (drivers/slimbus/qcom-ngd-ctrl.c:1475) slim_qcom_ngd_ctrl

  which lock already depends on the new lock.

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&ctrl->tx_lock);
                                lock(&ctrl->lock);
                                lock(&ctrl->tx_lock);
   lock(&ctrl->lock);

The assumption is that the comment refers to the desire to not call
qcom_slim_ngd_exit_dma() while we have an ongoing DMA TX transaction.
But any such transaction is initiated and completed within a single
qcom_slim_ngd_xfer_msg().

Prior to calling qcom_slim_ngd_exit_dma() the slim_controller is torn
down, all child devices are notified that the slimbus is gone and the
child devices are removed.

Stop taking the tx_lock in qcom_slim_ngd_ssr_pdr_notify() to avoid the
deadlock.

Fixes: a899d324863a ("slimbus: qcom-ngd-ctrl: add Sub System Restart support")
Cc: stable@vger.kernel.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
Signed-off-by: Srinivas Kandagatla <srini@kernel.org>
Link: https://patch.msgid.link/20260530204421.116824-9-srini@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Bjorn Andersson [Sat, 30 May 2026 20:44:20 +0000 (21:44 +0100)]

slimbus: qcom-ngd-ctrl: Balance pm_runtime enablement for NGD

The pm_runtime_enable() and pm_runtime_use_autosuspend() calls are
supposed to be balanced on exit, add these calls.

Fixes: 917809e2280b ("slimbus: ngd: Add qcom SLIMBus NGD driver")
Cc: stable@vger.kernel.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
Signed-off-by: Srinivas Kandagatla <srini@kernel.org>
Link: https://patch.msgid.link/20260530204421.116824-8-srini@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Bjorn Andersson [Sat, 30 May 2026 20:44:19 +0000 (21:44 +0100)]

slimbus: qcom-ngd-ctrl: Initialize controller resources in controller

The work structs and work queue are controller resources, create and
destroy them in the controller context. Creating them as part of the
child device's probe path seems to be okay now that the controller's
probe has been updated, but if for some reason the child does not probe
successfully a SSR or PDR notification will schedule_work() on an
uninitialized "ngd_up_work".

Move the initialization of these controller resources to the controller
probe function to avoid any issues, and to clarify the ownership.

Fixes: 917809e2280b ("slimbus: ngd: Add qcom SLIMBus NGD driver")
Cc: stable@vger.kernel.org
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
Signed-off-by: Srinivas Kandagatla <srini@kernel.org>
Link: https://patch.msgid.link/20260530204421.116824-7-srini@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Bjorn Andersson [Sat, 30 May 2026 20:44:18 +0000 (21:44 +0100)]

slimbus: qcom-ngd-ctrl: Register callbacks after creating the ngd

When the remoteproc starts in parallel with the NGD driver being probed,
or the remoteproc is already up when the PDR lookup is being registered,
or in the theoretical event that we get an interrupt from the hardware,
these callbacks will operate on uninitialized data. This result in
issues to boot the affected boards.

One such example can be seen in the following fault, where
qcom_slim_ngd_ssr_pdr_notify() schedules work on the NULL ngd_up_work.

[   21.858578] ------------[ cut here ]------------
[   21.858745] WARNING: kernel/workqueue.c:2338 at __queue_work+0x5e0/0x790, CPU#2: kworker/2:2/116
...
[   21.859251] Call trace:
[   21.859255]  __queue_work+0x5e0/0x790 (P)
[   21.859265]  queue_work_on+0x6c/0xf0
[   21.859273]  qcom_slim_ngd_ssr_pdr_notify+0x110/0x150 [slim_qcom_ngd_ctrl]
[   21.859304]  qcom_slim_ngd_ssr_notify+0x24/0x40 [slim_qcom_ngd_ctrl]
[   21.859318]  notifier_call_chain+0xa4/0x230
[   21.859329]  srcu_notifier_call_chain+0x64/0xb8
[   21.859338]  ssr_notify_start+0x40/0x78 [qcom_common]
[   21.859355]  rproc_start+0x130/0x230
[   21.859367]  rproc_boot+0x3d4/0x518
...

Move the enablement of interrupts, and the registration of SSR and PDR
until after the NGD device has been registered.

This could be further refined by moving initialization to the control
driver probe and by removing the platform driver model from the picture.

Fixes: 917809e2280b ("slimbus: ngd: Add qcom SLIMBus NGD driver")
Cc: stable@vger.kernel.org
Reviewed-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
Signed-off-by: Srinivas Kandagatla <srini@kernel.org>
Link: https://patch.msgid.link/20260530204421.116824-6-srini@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Bjorn Andersson [Sat, 30 May 2026 20:44:17 +0000 (21:44 +0100)]

slimbus: qcom-ngd-ctrl: Correct PDR and SSR cleanup ownership

PDR and SSR callbacks are registred from the controller probe function,
but currently released from the child device's remove function.

The remove() function should only be unwinding what was done in the
same device's probe() function.

Fixes: 917809e2280b ("slimbus: ngd: Add qcom SLIMBus NGD driver")
Cc: stable@vger.kernel.org
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
Signed-off-by: Srinivas Kandagatla <srini@kernel.org>
Link: https://patch.msgid.link/20260530204421.116824-5-srini@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Bjorn Andersson [Sat, 30 May 2026 20:44:16 +0000 (21:44 +0100)]

slimbus: qcom-ngd-ctrl: Fix probe error path ordering

qcom_slim_ngd_ctrl_probe() first registers the SSR callback then
allocates the PDR context, as such the error path needs to come in
opposite order to allow us to unroll each step.

Fixes: 16f14551d0df ("slimbus: qcom-ngd: cleanup in probe error path")
Cc: stable@vger.kernel.org
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
Signed-off-by: Srinivas Kandagatla <srini@kernel.org>
Link: https://patch.msgid.link/20260530204421.116824-4-srini@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Bjorn Andersson [Sat, 30 May 2026 20:44:15 +0000 (21:44 +0100)]

slimbus: qcom-ngd-ctrl: Fix up platform_driver registration

Device drivers should not invoke platform_driver_register()/unregister()
in their probe and remove paths. They should further not rely on
platform_driver_unregister() as their only means of "deleting" their
child devices.

Introduce a helper to unregister the child device and move the
platform_driver_register()/unregister() to module_init()/exit().

Fixes: 917809e2280b ("slimbus: ngd: Add qcom SLIMBus NGD driver")
Cc: stable@vger.kernel.org
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
Signed-off-by: Srinivas Kandagatla <srini@kernel.org>
Link: https://patch.msgid.link/20260530204421.116824-3-srini@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Bartosz Golaszewski [Sat, 30 May 2026 20:44:14 +0000 (21:44 +0100)]

slimbus: qcom-ngd-ctrl: fix OF node refcount

Platform devices created with platform_device_alloc() call
platform_device_release() when the last reference to the device's
kobject is dropped. This function calls of_node_put() unconditionally.
This works fine for devices created with platform_device_register_full()
but users of the split approach (platform_device_alloc() +
platform_device_add()) must bump the reference of the of_node they
assign manually. Add the missing call to of_node_get().

Cc: stable@vger.kernel.org
Fixes: 917809e2280b ("slimbus: ngd: Add qcom SLIMBus NGD driver")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Srinivas Kandagatla <srini@kernel.org>
Link: https://patch.msgid.link/20260530204421.116824-2-srini@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

DaeMyung Kang [Fri, 22 May 2026 14:20:48 +0000 (23:20 +0900)]

ntfs: avoid heap allocation for free-cluster readahead state

get_nr_free_clusters() allocates a temporary file_ra_state before it
publishes the precomputed free cluster count, sets NVolFreeClusterKnown(),
and wakes vol->free_waitq. If that allocation fails, the worker returns
without setting the flag or waking waiters, so callers waiting for the free
count can block indefinitely.

The readahead state is only used synchronously while scanning the bitmap.
Keep it on the stack and pass it by address to the readahead helper. This
eliminates the early allocation failure path instead of adding a special
case that publishes a conservative count and wakes the waitqueue.
Zero-initialize the on-stack state because file_ra_state_init() only sets
ra_pages and prev_pos.

Apply the same treatment to __get_nr_free_mft_records(), which scans the
MFT bitmap with the same short-lived readahead state.

Cc: stable@vger.kernel.org # v7.1
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

commit | commitdiff | tree

Hyunchul Lee [Thu, 21 May 2026 05:37:03 +0000 (14:37 +0900)]

ntfs: skip extent mft records in writeback to prevent deadlock

This patch fixes the ABBA deadlock between extent_lock and extent
mrec_lock triggered by xfstests generic/113, that occurs since the commit
6994acf33bae ("ntfs: use base mft_no when looking up base inode for
extent record").

Path A (inode writeback):
  VFS writeback
    -> ntfs_write_inode()
      -> __ntfs_write_inode()
        -> mutex_lock(&ni->extent_lock)
        -> mutex_lock(&tni->mrec_lock)

Path B (MFT folio writeback):
  VFS writeback of $MFT dirty folios
    -> ntfs_mft_writepages()
      -> ntfs_write_mft_block()
        -> ntfs_may_write_mft_record()
          -> holds one extent mrec_lock from a previous iteration
          -> tries to acquire another base inode extent_lock

By removing all extent_lock and extent mrec_lock acquisition from the MFT
folio writeback path, the ABBA lock ordering is eliminated:

Path A: __ntfs_write_inode(): extent_lock -> mrec_lock
Path B (removed): ntfs_write_mft_block(): mrec_lock -> extent_lock

Path B is always redundant for extent records because:

1. mark_mft_record_dirty(ext_ni) does NOT dirty the MFT folio.
   It only sets NInoDirty(ext_ni) and marks the base VFS inode dirty
   via __mark_inode_dirty(I_DIRTY_DATASYNC), which triggers Path A.
   Therefore, normal extent modifications never create a situation where
   the MFT folio is dirty and Path B is not scheduled.

2. The MFT folio only gets dirtied via ntfs_mft_mark_dirty() inside
   ntfs_mft_record_alloc(). But all identified callers in attrib.c
   (ntfs_attr_add, ntfs_attr_record_move_away,
   ntfs_attr_make_non_resident, ntfs_attr_record_resize) follow through
   with mark_mft_record_dirty(), which triggers Path A to write the
   complete record.

3. ntfs_evict_big_inode() calls ntfs_commit_inode() before freeing extent
   inodes, ensuring all dirty extents are flushed via Path A before the
   base inode leaves the icache.

Cc: stable@vger.kernel.org # v7.1
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

commit | commitdiff | tree

DaeMyung Kang [Thu, 21 May 2026 10:17:51 +0000 (19:17 +0900)]

ntfs: only alias volume $UpCase to default on exact match

load_and_init_upcase() currently aliases vol->upcase to the global
default upcase whenever the shared prefix matches, and then truncates
vol->upcase_len to that shorter prefix. The result is correct only by
accident: upcase[] accesses in name collation are gated by upcase_len,
so the prefix-equality alias produces the same fold output as keeping
the volume's own shorter table.

Still, prefix equality is not equality: the volume table is logically
distinct from the default and should not be replaced by it unless they
are byte-for-byte identical. Use memcmp() to compare the complete table
in one expression and drop the now-redundant upcase_len rewrite.

No user-visible change is expected for compliant volumes whose $UpCase
has exactly default_upcase_len entries; shorter volume tables are no
longer aliased to the default.

Cc: stable@vger.kernel.org # v7.1
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

commit | commitdiff | tree

DaeMyung Kang [Thu, 21 May 2026 10:17:49 +0000 (19:17 +0900)]

ntfs: free volume-wide resources on fill_super failure

ntfs_fill_super()'s err_out_now path frees only the volume struct via
kfree(vol), leaving several vol-owned allocations behind on every mount
failure:

  - vol->nls_map, loaded by ntfs_init_fs_context() via
    load_nls_default() (or replaced by an explicit nls= option in
    ntfs_parse_param()), is never unload_nls()'d.

  - vol->volume_label, allocated by load_system_files() through
    ntfs_ucstonls() once the $Volume name attribute has been parsed, is
    not released by load_system_files()'s own error labels nor by the
    fill_super() inline cleanup that only runs on d_make_root()
    failure.  Any later failure inside load_system_files() leaks it.

  - vol->lcn_empty_bits_per_page was kvfree()'d in
    unl_upcase_iput_tmp_ino_err_out_now without clearing the pointer,
    so it could not be folded into a single common cleanup.

Because the failure paths never call ntfs_volume_free() and never reach
the d_make_root() inline cleanup block (it sits above the label and is
jumped over by the load_system_files() / kvmalloc failure gotos), these
resources accumulate per failed mount attempt with no chance of
recovery short of unloading the module.  This is a silent leak: the
inodes loaded prior to failure remain hashed but generic_shutdown_super()
skips evict_inodes() when sb->s_root is unset, so no CHECK_DATA_CORRUPTION
warning is emitted either.

Move the per-volume frees down to err_out_now and drop the
lcn_empty_bits_per_page kvfree() from the upper label so the cleanup is
performed exactly once on every failure path.  Using unconditional
kvfree() / kfree() / unload_nls() is safe because they all accept NULL
and the upper labels that previously freed nls_map (the d_make_root()
inline cleanup) already clear the pointer.

Cc: stable@vger.kernel.org # v7.1
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

commit | commitdiff | tree

Bartosz Golaszewski [Sat, 30 May 2026 20:43:40 +0000 (21:43 +0100)]

nvmem: core: fix use-after-free bugs in error paths

Fix several instances of error paths in which we call
__nvmem_device_put() - which may end up freeing the underlying memory
and other resources - and then keep on using the nvmem structure. Always
put the reference to the nvmem device as the last step before returning
the error code.

Cc: stable@vger.kernel.org
Fixes: 7ae6478b304b ("nvmem: core: rework nvmem cell instance creation")
Fixes: e888d445ac33 ("nvmem: resolve cells from DT at registration time")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Srinivas Kandagatla <srini@kernel.org>
Link: https://patch.msgid.link/20260530204340.116743-3-srini@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Andre Heider [Sat, 30 May 2026 20:43:39 +0000 (21:43 +0100)]

nvmem: layouts: onie-tlv: fix hang on unknown types

The EEPROM on my board has a vendor specific entry of type 0x41. When
stumbling upon that, this driver hangs in an endless loop.

Fix it by keep incrementing the offset on unknown entries, so the loop
will eventually stop.

Fixes: d3c0d12f6474 ("nvmem: layouts: onie-tlv: Add new layout driver")
Cc: Stable@vger.kernel.org
Signed-off-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Srinivas Kandagatla <srini@kernel.org>
Link: https://patch.msgid.link/20260530204340.116743-2-srini@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Greg Kroah-Hartman [Fri, 5 Jun 2026 15:17:30 +0000 (17:17 +0200)]

Merge tag 'svc_fixes_for_v7.1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux into char-misc-linus

Dinh writes:

firmware: stratix10-svc and stratix10-rsu: fixes for v7.1
- Return -EOPNOTSUPP when ATF async is not supported
- Fix SVC driver from loading entirely when asynchronous ops is not
  supported in older ATF.
- Fix a NULL pointer dereference on a timeout in rsu_send_msg()

* tag 'svc_fixes_for_v7.1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux:
  firmware: stratix10-rsu: Fix NULL deref on rsu_send_msg() timeout in probe
  firmware: stratix10-svc: Don't fail probe when async ops unsupported
  firmware: stratix10-svc: Return -EOPNOTSUPP when ATF async unsupported

commit | commitdiff | tree

Greg Kroah-Hartman [Fri, 5 Jun 2026 15:16:27 +0000 (17:16 +0200)]

Merge tag 'usb-serial-7.1-rc7' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus

Johan writes:

USB serial fixes for 7.1-rc7

Here are two fixes for buffer overflows in the io_ti driver and a new
modem device id.

All have been in linux-next with no reported issues.

* tag 'usb-serial-7.1-rc7' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
  USB: serial: option: add usb-id for Dell Wireless DW5826e-m
  USB: serial: io_ti: fix heap overflow in build_i2c_fw_hdr()
  USB: serial: io_ti: fix heap overflow in get_manuf_info()

commit | commitdiff | tree

Song Chen [Wed, 3 Jun 2026 09:19:10 +0000 (17:19 +0800)]

bpf: Reject registration of duplicated kfunc

Search for duplicated kfunc in btf_vmlinux and btf_modules
before a kernel module attempts to register a kfunc.
If kfunc would shadow existing kfunc then pr_err() and
reject module loading.

Reviewed-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Song Chen <chensong_2000@126.com>
Link: https://lore.kernel.org/r/20260603091910.7212-1-chensong_2000@126.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Emil Tsalapatis [Thu, 4 Jun 2026 18:42:52 +0000 (14:42 -0400)]

MAINTAINERS: BPF: Add self as reviewer and run parse_maintainers.pl

Add myself as a reviewer for the BPF subsystem. While at it, run
./scripts/parse_maintainers.pl --order and reorder the BPF-related
entries in the file accordingly.

Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260604184252.9917-1-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Alexei Starovoitov [Fri, 5 Jun 2026 15:00:09 +0000 (08:00 -0700)]

Merge branch 'bpf-introduce-resizable-hash-map'

Mykyta Yatsenko says:

====================
bpf: Introduce resizable hash map

This patch series introduces BPF_MAP_TYPE_RHASH, a new hash map type that
leverages the kernel's rhashtable to provide resizable hash map for BPF.

The existing BPF_MAP_TYPE_HASH uses a fixed number of buckets determined at
map creation time. While this works well for many use cases, it presents
challenges when:

1. The number of elements is unknown at creation time
2. The element count varies significantly during runtime
3. Memory efficiency is important (over-provisioning wastes memory,
under-provisioning hurts performance)

BPF_MAP_TYPE_RHASH addresses these issues by using rhashtable, which
automatically grows and shrinks based on load factor.

The implementation wraps the kernel's rhashtable with BPF map operations:

- Uses bpf_mem_alloc for RCU-safe memory management
- Supports all standard map operations (lookup, update, delete, get_next_key)
- Supports batch operations (lookup_batch, lookup_and_delete_batch)
- Supports BPF iterators for traversal
- Supports BPF_F_LOCK for spin locks in values
- Requires BPF_F_NO_PREALLOC flag (elements allocated on demand)
- In-place updates for improved performance.
- max_entries serves as a hard limit, not bucket count
- Uses bit_spin_lock() + local_irq_save() for bucket locking,
similar to existing BPF hashmap's raw_spin_lock_irqsave(), insertions and
deletes may fail.
- Iterations are best-effort, if resize, insertions, deletions take place
concurrently, iterations may visit same elements multiple times or skip
elements.
- Lock out insertions, when running special fields destructor to guarantee
its completion.

The series includes comprehensive tests:
- Basic operations in test_maps (lookup, update, delete, get_next_key)
- BPF program tests for lookup/update/delete semantics
- Seq file tests

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
Update implementation
---------------------
Current implementation of the BPF_MAP_TYPE_RHASH does not provide
the same strong guarantees on the values consistency under concurrent
reads/writes as BPF_MAP_TYPE_HASH.
BPF_MAP_TYPE_HASH allocates a new element and atomically swaps the
pointer. BPF_MAP_TYPE_RHASH does memcpy in place with no lock held.
rhash trades consistency for speed, concurrent readers can observe
partially updated data. Two concurrent writers to the same key can
also interleave, producing mixed values. This is similar to arraymap
update implementation, including handling of the special fields.
As a solution, user may use BPF_F_LOCK to guarantee consistent reads
and write serialization.

Summary of the read consistency guarantees:

  map type     |  write mechanism |  read consistency
  -------------+------------------+--------------------------
  htab         |  alloc, swap ptr |  always consistent (RCU)
  htab  F_LOCK |  in-place + lock |  consistent if reader locks
  -------------+------------------+--------------------------
  rhtab        |  in-place memcpy |  torn reads
  rhtab F_LOCK |  in-place + lock |  consistent if reader locks

Benchmarks
----------
1. LOOKUP  (single producer, M events/sec)
  key | max | nr    |    htab |   rhtab | ratio | delta
  ----+-----+-------+---------+---------+-------+-------
    8 |  1K |   750 |   99.85 |   81.92 | 0.82x |  -18 %
    8 |  1K |    1K |  100.71 |   80.19 | 0.80x |  -20 %
    8 |  1M |  750K |   23.37 |   72.09 | 3.08x | +208 %
    8 |  1M |    1M |   13.39 |   53.72 | 4.01x | +301 %
   32 |  1K |   750 |   51.57 |   42.78 | 0.83x |  -17 %
   32 |  1K |    1K |   50.81 |   45.83 | 0.90x |  -10 %
   32 |  1M |  750K |   11.27 |   15.29 | 1.36x |  +36 %
   32 |  1M |    1M |    7.32 |    8.75 | 1.19x |  +19 %
  256 |  1K |   750 |    7.58 |    7.88 | 1.04x |   +4 %
  256 |  1K |    1K |    7.43 |    7.81 | 1.05x |   +5 %
  256 |  1M |  750K |    3.69 |    4.27 | 1.16x |  +16 %
  256 |  1M |    1M |    2.60 |    3.12 | 1.20x |  +20 %

Pattern:
  * Small map (1K): htab wins for 8 / 32 byte keys by 10-20 %
    because the preallocated bucket array fits in L1.  Equalises
    at 256 byte keys.
  * Large map (1M): rhtab wins everywhere, up to 4x at high load
    factor with 8 byte keys.
  * Higher load factor amplifies rhtab's lead: rhtab grows the
    bucket array; htab stays at user-declared max.

2. FULL UPDATE  (M events/sec per producer, -p 7)

  htab  per-producer:
    20.33   22.02   19.27   23.61   24.18   23.17   21.07
    mean  21.94   range  19.27 - 24.18

  rhtab per-producer:
   133.51  129.47   74.52  129.29  102.26  129.98  107.64
    mean 115.24   range  74.52 - 133.51

  speedup (mean): 5.25x   (+425 %)

In-place memcpy avoids the per-update alloc + RCU pointer swap
that htab pays.

3. MEMORY  (overwrite, -p 8, no --preallocated)

  value_size |  htab ops/s | rhtab ops/s | htab mem | rhtab mem
  -----------+-------------+-------------+----------+----------
       32 B  |  122.87 k/s |  133.04 k/s | 2.47 MiB | 2.49 MiB
     4096 B  |   64.43 k/s |   65.38 k/s | 6.74 MiB | 6.44 MiB
  rhtab/htab :  +8 % ops, +0.8 % mem   (32 B)
                +1 % ops,  -4  % mem (4096 B)

SUMMARY

  * Small / well-fitting map: htab is faster (cache-friendly
    fixed bucket array), but only by ~10-20 %.
  * Large / high-load-factor map: rhtab is dramatically faster
    (1.2x to 4x) because rhashtable resizes to keep the load
    factor sane while htab stays stuck at user-declared max.
  * Update-heavy workloads: rhtab is ~5x faster per producer
    via in-place memcpy.
  * Memory benchmark: effectively on par

---
Changes in v7:
- rhashtable_next_key: move into lib/rhashtable.c, drop params argument
  (Herbert).
- rhashtable_next_key: kdoc clarifies that behavior on tables with
  duplicate keys is undefined (sashiko).
- rhashtable: include Herbert's "Use irq work for shrinking" patch so
  __rhashtable_remove_fast_one() can fire the shrink path from NMI
  context (Herbert).
- hashtab: fix u32 multiply overflow in __rhtab_map_lookup_and_delete_batch
  copy_to_user; cast total to size_t before multiplying by key_size /
  value_size (sashiko, bot+bpf-ci).
- hashtab: allow kptr/refcount fields in rhtab values (same model as
  array map).
- Link to v6: https://patch.msgid.link/20260602-rhash-v6-0-1bfd35a4184f@meta.com

Changes in v6:
- rhashtable_next_key: advance past duplicate keys in the main bucket
  chain to avoid an infinite loop when there are duplicate keys
  (sashiko).
- rhashtable_next_key: return ERR_PTR(-EOPNOTSUPP) on rhltable (sashiko).
- rhashtable: selftest pre-sizes the table to avoid concurrent rehash
  triggering spurious failures (sashiko).
- hashtab: real rhtab_map_mem_usage in the basic commit; move
  bpf_map_free_internal_structs from rhtab_free_elem into the
  special-fields commit where it does meaningful work (bot+bpf-ci).
- bpf_iter (seq_file): switch to rhashtable_walk_* for stronger
  coverage under concurrent rehash; get_next_key and batch keep
  rhashtable_next_key (sashiko).
- iter ops: rhtab_map_get_next_key adds IS_ERR check
  before dereferencing the element pointer (sashiko).
- iter ops: bpf_each_rhash_elem removes cond_resched() (sashiko).
- iter ops: batch returns -EAGAIN (not -ENOENT) on cursor delete,
  so userspace can distinguish lost cursor from end-of-iteration
  and restart from NULL (sashiko).

- Link to v5: https://patch.msgid.link/20260528-rhash-v5-0-7205191b6c57@meta.com

Changes in v5:
- rhashtable_next_key: add kdoc WARNING to highlight lack of rehash
  detection and unbounded iteration (Herbert).
- rhashtable: selftest now checks IS_ERR() before PTR_ERR comparison
  on the missing-key path (bot+bpf-ci).
- hashtab: drop dead stub bodies and unused map_ops registrations
  from the basic commit; iteration commit adds bodies, structs, and
  registrations together. .map_get_next_key keeps a stub registration
  in the basic commit because the syscall dispatcher does not
  NULL-check it; iteration commit replaces the stub body with the
  real implementation (bot+bpf-ci).
- hashtab: fix batch cursor advancement. v4 stashed the lookahead
  element key but then resumed via next_key(cursor), skipping that
  element across batch boundaries and orphaning it on
  lookup_and_delete_batch. v5 stashes the lookahead key and looks
  it up directly on the next batch entry (bot+bpf-ci, sashiko v3).
- hashtab: document torn-read race in rhtab_map_update_existing,
  matching arraymap semantics (bot+bpf-ci).
- Link to v4: https://patch.msgid.link/20260513-rhash-v4-0-dd3d541ccb0b@meta.com

Changes in v4:
- rhashtable: introduce rhashtable_next_key(), drop walker-based
  iteration for BPF (also drops earlier rhashtable_walk_enter_from()
  proposal).
- map_extra: presize hint via lower 32 bits (nelem_hint), capped at
  U16_MAX.
- Automatic shrinking enabled (was missing despite being advertised).
- Reject key_size > U16_MAX (rhashtable_params.key_len is u16).
- Replace irqs_disabled() guard with bpf_disable_instrumentation around
  bucket-lock paths: closes same-CPU NMI tracing recursion without
  rejecting legitimate IRQ-context callers.
- lookup_and_delete reordered: unlink before copy to avoid populating
  user buffer on concurrent-unlink -ENOENT.
- update_existing reordered: copy then free_fields, matching arraymap.
- Word-sized key fast path (sizeof(long) bytes), inlined hashfn/cmpfn
  via static-const rhashtable_params; works on both 32-bit and 64-bit.
- check_and_init_map_value() on insert (zero special-field bytes from
  recycled bpf_mem_alloc memory; previously bpf_spin_lock could read
  garbage and qspinlock would deadlock).
- BPF_SPIN_LOCK / BPF_RES_SPIN_LOCK allowlist moved to the special-
  fields commit so each commit is bisect-safe.
- Link to v3: https://patch.msgid.link/20260424-rhash-v3-0-d0fa0ce4379b@meta.com

Changes in v3:
- Squash all commits implementing basic functions into one (Alexei)
- Remove selftests that were not necessary (Alexei)
- Resize detection for kernel full iterations, error out on resize (Alexei)
- Remove second lookup in get_next_key() (Emil)
- __acquires(RCU)/__releases(RCU) on seq_start/seq_stop (Emil)
- Use bpf_map_check_op_flags() where it makes sense (Leon)
- Benchmarks refresh, experiment with alternative hash functions
- Rely on iterator invalidation during rehash to handle table resizes:
fail on resize where we fully iterate on table inside kernel, dont fail on
resize where iteration goes through userspace. Exception -
rhtab_map_free_internal_structs() should be just safe to iterate fully
in kernel, no risk of infinite loop, because no user holding reference.
- Handle special fields during in-place updates (Emil, sashiko)
- Link to v2: https://lore.kernel.org/all/20260408-rhash-v2-0-3b3675da1f6e@meta.com/

Changes in v2:
- Added benchmarks
- Reworked all functions that walk the rhashtable, use walk API, instead
of directly accessing tbl and future_tbl
- Added rhashtable_walk_enter_from() into rhashtable to support O(1)
iteration continuations
- Link to v1: https://lore.kernel.org/r/20260205-rhash-v1-0-30dd6d63c462@meta.com

---
====================

Link: https://patch.msgid.link/20260605-rhash-v7-0-5b8e05f8630d@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Mykyta Yatsenko [Fri, 5 Jun 2026 11:41:29 +0000 (04:41 -0700)]

selftests/bpf: Add resizable hashmap to benchmarks

Support resizable hashmap in BPF map benchmarks.

1. LOOKUP  (single producer, M events/sec)

  key | max | nr    |    htab |   rhtab | ratio | delta
  ----+-----+-------+---------+---------+-------+-------
    8 |  1K |   750 |   99.85 |   81.92 | 0.82x |  -18 %
    8 |  1K |    1K |  100.71 |   80.19 | 0.80x |  -20 %
    8 |  1M |  750K |   23.37 |   72.09 | 3.08x | +208 %
    8 |  1M |    1M |   13.39 |   53.72 | 4.01x | +301 %
   32 |  1K |   750 |   51.57 |   42.78 | 0.83x |  -17 %
   32 |  1K |    1K |   50.81 |   45.83 | 0.90x |  -10 %
   32 |  1M |  750K |   11.27 |   15.29 | 1.36x |  +36 %
   32 |  1M |    1M |    7.32 |    8.75 | 1.19x |  +19 %
  256 |  1K |   750 |    7.58 |    7.88 | 1.04x |   +4 %
  256 |  1K |    1K |    7.43 |    7.81 | 1.05x |   +5 %
  256 |  1M |  750K |    3.69 |    4.27 | 1.16x |  +16 %
  256 |  1M |    1M |    2.60 |    3.12 | 1.20x |  +20 %

Pattern:
  * Small map (1K): htab wins for 8 / 32 byte keys by 10-20%
  * Large map (1M): rhtab wins everywhere, up to 4x at high load
    factor with 8 byte keys.
  * Higher load factor amplifies rhtab's lead: rhtab grows the
    bucket array; htab stays at user-declared max.

2. FULL UPDATE  (M events/sec per producer)

  htab  per-producer:
    20.33   22.02   19.27   23.61   24.18   23.17   21.07
    mean  21.94   range  19.27 - 24.18

  rhtab per-producer:
   133.51  129.47   74.52  129.29  102.26  129.98  107.64
    mean 115.24   range  74.52 - 133.51

  speedup (mean): 5.25x   (+425 %)

In-place memcpy avoids the per-update alloc + RCU pointer swap
that htab pays.

3. MEMORY

  value_size |  htab ops/s | rhtab ops/s | htab mem | rhtab mem
  -----------+-------------+-------------+----------+----------
       32 B  |  122.87 k/s |  133.04 k/s | 2.47 MiB | 2.49 MiB
     4096 B  |   64.43 k/s |   65.38 k/s | 6.74 MiB | 6.44 MiB
  rhtab/htab :  +8 % ops, +0.8 % mem   (32 B)
                +1 % ops,  -4  % mem (4096 B)

Throughput effectively tied

SUMMARY

  * Small / well-fitting map: htab is faster (cache-friendly
    fixed bucket array), but only by ~10-20 %.
  * Large / high-load-factor map: rhtab is dramatically faster
    (1.2x to 4x) because rhashtable resizes to keep the load
    factor sane while htab stays stuck at user-declared max.
  * Update-heavy workloads: rhtab is ~5x faster per producer
    via in-place memcpy.
  * Memory benchmark: effectively on par.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20260605-rhash-v7-12-5b8e05f8630d@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Mykyta Yatsenko [Fri, 5 Jun 2026 11:41:28 +0000 (04:41 -0700)]

bpftool: Add rhash map documentation

Make bpftool documentation aware of the resizable hash map.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260605-rhash-v7-11-5b8e05f8630d@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Mykyta Yatsenko [Fri, 5 Jun 2026 11:41:27 +0000 (04:41 -0700)]

selftests/bpf: Add BPF iterator tests for resizable hash map

Test basic BPF iterator functionality for BPF_MAP_TYPE_RHASH,
verifying all elements are visited.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20260605-rhash-v7-10-5b8e05f8630d@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Mykyta Yatsenko [Fri, 5 Jun 2026 11:41:26 +0000 (04:41 -0700)]

selftests/bpf: Add basic tests for resizable hash map

Test basic map operations (lookup, update, delete) for
BPF_MAP_TYPE_RHASH including boundary conditions like duplicate
key insertion and deletion of nonexistent keys.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20260605-rhash-v7-9-5b8e05f8630d@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Mykyta Yatsenko [Fri, 5 Jun 2026 11:41:25 +0000 (04:41 -0700)]

libbpf: Support resizable hashtable

Add BPF_MAP_TYPE_RHASH to libbpf's map type name table and feature
probing so that libbpf-based tools can create and identify resizable
hash maps.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260605-rhash-v7-8-5b8e05f8630d@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Mykyta Yatsenko [Fri, 5 Jun 2026 11:41:24 +0000 (04:41 -0700)]

bpf: Optimize word-sized keys for resizable hashtable

Specialize the lookup/update/delete paths for keys whose size matches
sizeof(long) (4 bytes on 32-bit, 8 bytes on 64-bit). A static-const
rhashtable_params lets the compiler inline a custom XOR-fold hashfn and
a single-word equality cmpfn, eliminating the indirect jhash dispatch.
The same hashfn and cmpfn are installed into rhashtable's stored params
at rhashtable_init time, so the rehash worker, slow-path inserts, and
rhashtable_next_key() all agree with the inlined fast paths.

The seq_file BPF iterator uses rhashtable_walk_* and is unaffected.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20260605-rhash-v7-7-5b8e05f8630d@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Mykyta Yatsenko [Fri, 5 Jun 2026 11:41:23 +0000 (04:41 -0700)]

bpf: Allow special fields in resizable hashtab

Add support for timers, workqueues, task work, spin locks and kptrs.
Without this, users needing deferred callbacks, BPF_F_LOCK, or
refcounted kernel pointers in a dynamically-sized map have no option -
fixed-size htab is the only map supporting these field types.
Resizable hashtab should offer the same capability.

kptr semantics under in-place updates are identical to array map.

Properly clean up BTF record fields on element delete and map
teardown by wiring up bpf_obj_free_fields through a memory allocator
destructor, matching the pattern used by htab for non-prealloc maps.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20260605-rhash-v7-6-5b8e05f8630d@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Mykyta Yatsenko [Fri, 5 Jun 2026 11:41:22 +0000 (04:41 -0700)]

bpf: Implement iteration ops for resizable hashtab

Implement get_next_key, batch lookup/lookup-and-delete, for_each_map_elem
callback, and the seq_file BPF iterator for BPF_MAP_TYPE_RHASH.

get_next_key() and batch use rhashtable_next_key() — stateless,
matches the syscall UAPI shape (no kernel-side iterator state).
get_next_key falls back to the first key when prev_key was
concurrently deleted (matches htab semantics). Batch reports
cursor loss as -EAGAIN so userspace can distinguish it from
end-of-iteration (-ENOENT) and restart from NULL.

The seq_file BPF iterator uses rhashtable_walk_* instead. It runs
only from read() syscall context, so the walker's spin_lock is
safe, and seq_file's per-fd state lets the walker handle rehash
correctly (retry on -EAGAIN) for stronger coverage than the
stateless API can provide.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20260605-rhash-v7-5-5b8e05f8630d@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Mykyta Yatsenko [Fri, 5 Jun 2026 11:41:21 +0000 (04:41 -0700)]

bpf: Implement resizable hashmap basic functions

Use rhashtable_lookup_likely() for lookups, rhashtable_remove_fast()
for deletes, and rhashtable_lookup_get_insert_fast() for inserts.

Updates modify values in place under RCU rather than allocating a
new element and swapping the pointer (as regular htab does). This
trades read consistency for performance: concurrent readers may
see partial updates. BPF_F_LOCK support and special-field
handling (timers, kptrs, etc.) follow in a later commit.

Initialize rhashtable with bpf_mem_alloc element cache. Require
BPF_F_NO_PREALLOC. Limit max_entries to 2^31. Free elements via
rhashtable_free_and_destroy().

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20260605-rhash-v7-4-5b8e05f8630d@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Herbert Xu [Fri, 5 Jun 2026 11:41:20 +0000 (04:41 -0700)]

rhashtable: Use irq work for shrinking

Use irq work for automatic shrinking so that this may be called in NMI
context.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20260605-rhash-v7-3-5b8e05f8630d@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Mykyta Yatsenko [Fri, 5 Jun 2026 11:41:19 +0000 (04:41 -0700)]

rhashtable: Add selftest for rhashtable_next_key()

Insert n elements, then verify:
- NULL prev_key walks from the beginning, visiting all n
- non-existing prev_key returns ERR_PTR(-ENOENT)

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/20260605-rhash-v7-2-5b8e05f8630d@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Mykyta Yatsenko [Fri, 5 Jun 2026 11:41:18 +0000 (04:41 -0700)]

rhashtable: Add rhashtable_next_key() API

Introduce a simpler iteration mechanism for rhashtable that lets
the caller continue from an arbitrary position by supplying the
previous key, without the per-iterator state of the
rhashtable_walk_* API.

void *rhashtable_next_key(struct rhashtable *ht,
const void *prev_key);

Caller holds RCU; passes NULL prev_key for the first element or
the previously returned key to advance. Walks tbl->future_tbl
chain so in-flight rehashes are observed.

Best-effort: in case of concurrent resize, provides no guarantees:
- may produce duplicate elements
- may skip any amount of elements
- termination of the loop is not guaranteed in case of
sustained rehash. Callers are advised to bound loop externally
or avoid inserting new elements during such loop.

Returns ERR_PTR(-ENOENT) if prev_key is not found.
Behavior on tables with duplicate keys is undefined.
rhltable is not supported — returns ERR_PTR(-EOPNOTSUPP).

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/20260605-rhash-v7-1-5b8e05f8630d@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Inochi Amaoto [Thu, 28 May 2026 11:38:39 +0000 (19:38 +0800)]

RISC-V: KVM: Enhance the logging check for mmu mapping

When enabling dirty ring, the dirty bitmap is disable, and the logging
check is always false as the RISC-V architecture does not select
"NEED_KVM_DIRTY_RING_WITH_BITMAP". Although the dirty log is recorded
since the write path already trying to add the dirty log, the logic for
logging check is broken and some side effect will occurs.

Enhance the logging check for mmu mapping so it can check both the dirty
ring and the dirty bitmap.

Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260528113840.2629186-1-inochiama@gmail.com
Signed-off-by: Anup Patel <anup@brainfault.org>

commit | commitdiff | tree

Pablo Neira Ayuso [Thu, 4 Jun 2026 06:21:13 +0000 (08:21 +0200)]

netfilter: conntrack: call nf_ct_gre_keymap_destroy() if master helper is pptp

For GRE flows, validate that the ct master helper (if any) is pptp
before calling nf_ct_gre_keymap_destroy(), so the helper data area
can be accessed safely. Note that only the pptp helper provides a
.destroy callback.

Fixes: e56894356f60 ("netfilter: conntrack: remove l4proto destroy hook")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

commit | commitdiff | tree

Pablo Neira Ayuso [Thu, 4 Jun 2026 06:21:12 +0000 (08:21 +0200)]

netfilter: conntrack: revert ct extension genid infrastructure

This infrastructure is not used anymore after moving ct timeout and
helper to use datapath refcount to track object use.

Revert commit c56716c69ce1 ("netfilter: extensions: introduce extension
genid count") this patch disables all ct extensions (leading to NULL)
for unconfirmed conntracks, when this is only targeted at ct helper and
ct timeout. There is also codebase that dereferences the ct extension
without checking for NULL which could lead to crash.

Fixes: c56716c69ce1 ("netfilter: extensions: introduce extension genid count")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

A mirror of Linus' kernel repository

RSS Atom