Jeremy Kerr [Mon, 8 Jun 2026 01:25:41 +0000 (09:25 +0800)]
net: mctp: usb: don't fail mctp_usb_rx_queue on a deferred submission
In the ndo_open path, a deferred queue open will report a failure, and
so the netdev will not be ndo_stop()ed, leaving us with the rx_retry
work potentially pending.
Don't report a deferred queue as an error, as we are still operational.
This means we use the ndo_stop() path for future cleanup, which handles
rx_retry_work cancellation.
That urb completion can then re-schedule rx_retry_work.
Strengthen the sequencing between the stop (preventing another requeue)
and the cancel by updating both atomically under a new rx lock. After
setting ->rx_stopped, and cancelling pending work, we know that the
requeue cannot occur, so all that's left is killing any pending urb.
Minxi Hou [Thu, 4 Jun 2026 16:30:16 +0000 (00:30 +0800)]
selftests/net/openvswitch: guard command substitutions against empty output
When ip-link output is unavailable, when the upcall daemon log has not
been written yet, or when pahole does not know the OVS drop subsystem
ID, the affected command substitutions silently produce empty strings.
The caller then passes empty sha= or pid= arguments to ovs_add_flow,
or matches against wrong drop reason codes, all without a diagnostic.
Add [ -z ] guards immediately after each assignment. For test_arp_ping,
also align the MAC extraction to use awk '/link\/ether/' as in
test_pop_vlan. The drop_reason guard returns ksft_skip because an
absent subsystem ID is an environment issue, not a test failure.
91b9aed7381c ("ARM: dts: aspeed-g6: Add nodes for i3c controllers") currently
causes a new warning:
... /ahb/apb/bus@1e7a0000/syscon@0: failed to match any schema with compatible: ['aspeed,ast2600-i3c-global', 'syscon']
The patch necessary to address it has an R-b tag from Krzysztof[2] but as best
I can tell is yet to be applied to the MFD tree. I've left the change in for now
as the warning will resolve once the binding patch is applied.
Lastly, as part of improving support for the Kommando card Anirudh has also
addressed[1] the persistent pain we've had with the phy-mode property for the
AST2600 MACs. Thanks to Andrew Lunn for being on the case for a while now, and
for those who tested Anirudh's patch.
Merge tag 'riscv-sophgo-dt-for-v7.2' of https://github.com/sophgo/linux into soc/dt
RISC-V Devicetrees for v7.2
Sophgo:
For CV18xx series:
- Add bindings for Milk-V "Duo S" board.
For SG2042:
- The CPU unit addresses incorrectly used decimal numbers,
which is wrong for nodes whose value is >= 10. Now
corrected to use hexadecimal.
- The MSI controller actually only supports 16 interrupts;
corrected to match the actual situation.
- PCIe RCs are cache-coherent with the CPU. Marked it out
for RC nodes.
For SG2044:
- The same as SG2042, use hex for CPU unit address.
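The unit-address fix for SG2042 and SG2044 can be illustrated with a small sketch. It assumes a hypothetical helper, cpu_node_name(), that is not part of the series; it only shows that the text after '@' is the reg value printed in hexadecimal, so decimal and hex names diverge from value 10 onwards.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the patches): build the devicetree node
 * name for a CPU whose "reg" value is `hartid`. The unit address after
 * '@' is hexadecimal without a 0x prefix.
 * Returns the number of characters written, excluding the trailing NUL.
 */
static int cpu_node_name(char *buf, size_t len, unsigned int hartid)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "cpu@%x", hartid);
}
```

For values below 10 the name is the same either way (cpu@9); value 10 becomes cpu@a rather than cpu@10, which would actually describe reg value 16.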
In addition, update Chen Wang's email address for Sophgo
SoC maintainer.
Signed-off-by: Chen Wang <unicorn_wang@outlook.com>
* tag 'riscv-sophgo-dt-for-v7.2' of https://github.com/sophgo/linux:
riscv: dts: sophgo: reduce SG2042 MSI count to 16
riscv: dts: sophgo: sg2042: use hex for CPU unit address
riscv: dts: sophgo: sg2044: use hex for CPU unit address
riscv: dts: sophgo: Add dma-coherent to SG2042 PCIe controllers
dt-bindings: soc: sophgo: add sg2000 plic and clint documentation
dt-bindings: soc: sophgo: add Milk-V Duo S board compatibles
MAINTAINERS: update Chen Wang's email address
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Merge tag 'aspeed-7.2-drivers-0' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bmc/linux into soc/drivers
aspeed: First batch of driver changes for v7.2
While bc13f14f5cd3 ("soc: aspeed: cleanup dead default for
ASPEED_SOCINFO") was committed just now, it has been in -next for a while as
b333a0f1c857411d83a02aa6f1d9ecc7666d6179. The commit is fresh as I moved it
between branches.
Other than that it's just the one other patch from Krzysztof tidying up the
location of MODULE_DEVICE_TABLE().
* tag 'aspeed-7.2-drivers-0' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bmc/linux:
soc: aspeed: cleanup dead default for ASPEED_SOCINFO
soc: aspeed: Move MODULE_DEVICE_TABLE next to the table itself
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Merge tag 'amlogic-arm64-dt-for-v7.2-v2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/amlogic/linux into soc/dt
Amlogic ARM64 DT for v7.2 v2 take 1:
- Khadas VIM4 (T7 SoC) features:
- Memory layout fixup
- GIC register range
- Model name fixup
- PWM, eMMC, SD card and SDIO support
- PWM LED
- I2C pinctrl node
- Khadas VIM1s features:
- Bluetooth
- PWM LED
- Power Key
- Function Key via SARADC
- RTC
- Remote control keymap
- Bluetooth node for Phicomm N1
Merge tag 'thead-dt-for-v7.2' of https://git.kernel.org/pub/scm/linux/kernel/git/fustini/linux into soc/dt
T-HEAD Devicetrees for 7.2
Enable WiFi on two TH1520 boards: BeagleV Ahead and Lichee Pi 4A.
The BeagleV Ahead board uses an AP6203BM WiFi module connected to SDIO1.
The Lichee Pi 4A has an RTL8723DS WiFi module also connected to SDIO1.
The module reset line is driven through a PCA9557 GPIO expander on the
I2C1 bus.
* tag 'thead-dt-for-v7.2' of https://git.kernel.org/pub/scm/linux/kernel/git/fustini/linux:
riscv: dts: thead: Enable wifi on the BeagleV-Ahead
riscv: dts: thead: Enable WiFi on Lichee Pi 4A
riscv: dts: thead: Add TH1520 I2C1 controller
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Merge tag 'tenstorrent-dt-for-v7.2' of https://git.kernel.org/pub/scm/linux/kernel/git/tenstorrent/linux into soc/dt
Tenstorrent device tree for v7.2
Add a riscv,pmu node to the Tenstorrent Blackhole SoC device tree. This
enables OpenSBI to expose the SBI PMU extension, allowing Linux perf to
use the 4 programmable counters (mhpmcounter3-6) across 3 event classes:
instruction commit, microarchitectural, and memory system events.
Extend the RISC-V IOMMU device tree bindings to document the Tenstorrent
IOMMU used in the Tenstorrent Atlantis SoC. A second register range is
added which contains M-mode only registers like PMAs and PMPs. The
binding will be used by OpenSBI and potentially other M-mode software.
* tag 'tenstorrent-dt-for-v7.2' of https://git.kernel.org/pub/scm/linux/kernel/git/tenstorrent/linux:
dt-bindings: iommu: riscv: Add bindings for Tenstorrent RISC-V IOMMU
riscv: dts: tenstorrent: Add PMU node to blackhole for Linux perf support
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Merge tag 'sunxi-drivers-for-7.2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into soc/drivers
Allwinner driver changes for 7.2
Mostly changes to the SRAM driver to allow for one SRAM region to be
"claimed" by multiple changes. When a region is "claimed" it is removed
or disconnected from the CPU's view. This is needed on the H6 and H616,
which have one alias region seemingly shared between the video codec
engine and the display engine.
One minor fix for the RSB driver is also included.
* tag 'sunxi-drivers-for-7.2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
bus: sunxi-rsb: Always check register address validity
soc: sunxi: sram: Add H616 SRAM regions
soc: sunxi: sram: Support claiming multiple regions per device
soc: sunxi: sram: Allow SRAM to be claimed multiple times
soc: sunxi: sram: Const-ify sunxi_sram_func data and references
dt-bindings: sram: sunxi-sram: Add H616 SRAM regions
dt-bindings: sram: Document Allwinner H616 VE SRAM
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
gpio: mt7621: fix interrupt banks mapping on gpio chips
The GPIO controller's registers are organized as sets of eight 32-bit
registers with each set controlling a bank of up to 32 pins. A single
interrupt is shared for all of the banks handled by the controller.
The driver implements this using three gpio chip instances, each one
with its own irq chip. Every pin can generate an interrupt, giving a
total of 96 possible interrupts. It looks like interrupts are not
properly mapped to the gpio bank with this solution. The problem report
is at the lore link [0].
The device tree uses two cells for this, so only the interrupt pin and the
interrupt type are described there. Changing to three cells to also describe
the bank, and implementing 'of_node_instance_match()', would work as well, but
it would be an ABI breakage and also a bit incoherent, since the gpios
themselves also use two cells and are properly mapped to the desired bank
through their pin number in 'of_xlate()'.
That said, register a linear IRQ domain of 96 interrupts shared by the
three gpio chip instances, so the bank and the interrupt are properly
decoded and devices using gpio IRQs work properly.
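As a rough illustration of the decoding that a single 96-entry linear domain makes possible, the sketch below splits a hardware IRQ number into a bank and a pin. The names MTK_BANK_CNT, MTK_BANK_WIDTH, struct bank_pin and hwirq_to_bank_pin() are assumptions made for this example and are not taken from the driver.

```c
#include <assert.h>

#define MTK_BANK_CNT   3	/* assumed: three gpio chip instances */
#define MTK_BANK_WIDTH 32	/* up to 32 pins per bank */

struct bank_pin {
	unsigned int bank;	/* which gpio chip instance */
	unsigned int pin;	/* pin offset inside that bank */
};

/* Decode a linear hwirq (0..95) into the bank and the pin within it. */
static struct bank_pin hwirq_to_bank_pin(unsigned int hwirq)
{
	struct bank_pin bp;

	assert(hwirq < MTK_BANK_CNT * MTK_BANK_WIDTH);
	bp.bank = hwirq / MTK_BANK_WIDTH;
	bp.pin = hwirq % MTK_BANK_WIDTH;
	return bp;
}
```

With this mapping the two-cell device tree description stays unchanged: the pin number alone is enough to find the bank, which is the same idea the gpio side already relies on.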
Marco Scardovi [Sun, 7 Jun 2026 23:05:02 +0000 (01:05 +0200)]
gpio: rockchip: fix generic IRQ chip leak on remove
The driver allocates domain generic chips using
irq_alloc_domain_generic_chips() during probe. However, on driver
remove/teardown, the generic chips are not automatically freed when the
IRQ domain is removed because the domain flags do not include
IRQ_DOMAIN_FLAG_DESTROY_GC.
This causes both the domain generic chips structure and the associated
generic chips to be leaked. Additionally, the generic chips remain on
the global gc_list and may later be visited by generic IRQ chip suspend,
resume, or shutdown callbacks after the GPIO bank has been removed,
potentially resulting in a use-after-free and kernel crash.
Fix the resource leak by explicitly calling
irq_domain_remove_generic_chips() before removing the IRQ domain in
rockchip_gpio_remove().
Fixes: 936ee2675eee ("gpio/rockchip: add driver for rockchip gpio")
Assisted-by: Antigravity:gemini-3.5-flash
Signed-off-by: Marco Scardovi <scardracs@disroot.org>
Link: https://patch.msgid.link/20260607230504.35392-2-scardracs@disroot.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
gpio-mockup validates only that each second gpio_mockup_ranges value is
non-negative before creating the mock chips. The fixed-base form uses
the second value as the first GPIO number after the range, while the
dynamic-base form uses it as the number of GPIOs.
gpio_mockup_register_chip() stores the resulting number of GPIOs in a
u16 and passes it through a PROPERTY_ENTRY_U16("nr-gpios", ...). Values
greater than U16_MAX therefore truncate silently. For example,
gpio_mockup_ranges=-1,65537 creates a one-line mock GPIO chip instead of
rejecting the invalid request.
Reject zero-width, reversed, and over-U16 ranges before registering any
mock chip.
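The range check described above can be sketched in user space as follows. This is a minimal model, not the driver code: mockup_range_ngpio() and truncate_to_u16() are hypothetical names, and -1 stands in for whatever error code the driver returns.

```c
#include <assert.h>
#include <stdint.h>

/* Shows what silently happens when a too-large count is stored in a u16. */
static uint16_t truncate_to_u16(int value)
{
	return (uint16_t)value;
}

/*
 * Compute the number of GPIOs for one (base, second) pair.
 * base < 0:  dynamic base, `second` is the number of GPIOs.
 * base >= 0: fixed base, `second` is the first GPIO number after the range.
 * Returns the line count, or -1 for zero-width, reversed or over-U16 ranges.
 */
static int mockup_range_ngpio(int base, int second)
{
	long ngpio;

	/* The existing validation already guarantees a non-negative value. */
	assert(second >= 0);

	ngpio = (base < 0) ? (long)second : (long)second - (long)base;
	if (ngpio <= 0 || ngpio > UINT16_MAX)
		return -1;
	return (int)ngpio;
}
```

For the example in the text, truncate_to_u16(65537) yields 1, which is the one-line chip that used to be created, while mockup_range_ngpio(-1, 65537) rejects the request.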
====================
ipv4: igmp: annotate diagnostic procfs data races
This patch series addresses several unannotated data races between lockless
RCU-protected diagnostic reads in /proc/net/igmp (igmp_mc_seq_show())
and concurrent writes in serialized paths (RTNL and group spinlocks).
Following the precedent in commit 061c0aa740d5 ("ipv4: igmp: annotate
data-races around im->users"), we annotate these intentional data races
using READ_ONCE() and WRITE_ONCE() macros.
- Patch 1 annotates races around `in_dev->mc_count` (interface-level joins).
- Patch 2 annotates races around active timer-related state tracking fields
(`tm_running`, `reporter`, `expires`) on individual multicast groups.
====================
Yuyang Huang [Fri, 5 Jun 2026 01:43:18 +0000 (10:43 +0900)]
ipv4: igmp: annotate data-races around timer-related fields
/proc/net/igmp walks the multicast list locklessly under RCU and reads
timer-related fields (im->tm_running, im->reporter, im->timer.expires)
to print the timer state of multicast memberships. Concurrently, these
fields are modified under im->lock spinlock in timer management paths
(igmp_stop_timer(), igmp_start_timer(), and igmp_timer_expire()). Fix this
intentional lockless snapshot by annotating the lockless reads with
READ_ONCE() and the updates with WRITE_ONCE().
Yuyang Huang [Fri, 5 Jun 2026 01:43:17 +0000 (10:43 +0900)]
ipv4: igmp: annotate data-races around in_dev->mc_count
/proc/net/igmp walks the multicast list for IPv4 interfaces locklessly
under RCU and prints state->in_dev->mc_count. Concurrently, device
init/destruction and multicast join/leave paths update the count
under the RTNL lock. Fix this intentional lockless snapshot by
annotating the read with READ_ONCE() and the updates with WRITE_ONCE().
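The annotation pattern used by both patches can be modelled in user space as below. The READ_ONCE()/WRITE_ONCE() macros here are simplified stand-ins built on volatile accesses (they rely on the GCC/Clang __typeof__ extension), and struct in_device_model with its helpers is a hypothetical reduction of the real structures.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel macros: force a single access. */
#define READ_ONCE(x)		(*(volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

struct in_device_model {
	int mc_count;	/* updated under a lock, read locklessly */
};

/* Writer side: callers are assumed to be serialized (RTNL in the kernel). */
static void mc_count_inc(struct in_device_model *dev)
{
	assert(dev != NULL);
	/* The plain read is fine because writers exclude each other. */
	WRITE_ONCE(dev->mc_count, dev->mc_count + 1);
}

/* Reader side: lockless snapshot, as done when printing the procfs file. */
static int mc_count_snapshot(struct in_device_model *dev)
{
	assert(dev != NULL);
	return READ_ONCE(dev->mc_count);
}
```

The reader may still observe a stale value; the annotations only document that the race is intentional and keep the compiler from tearing or repeating the access.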
Ruoyu Wang [Tue, 9 Jun 2026 07:33:13 +0000 (15:33 +0800)]
gpio: zynq: fix runtime PM leak on remove
pm_runtime_get_sync() increments the runtime PM usage counter even when it
returns an error. zynq_gpio_remove() uses it to keep the controller active
while removing the GPIO chip, but never drops the usage counter again.
Balance the get with pm_runtime_put_noidle() after disabling runtime PM.
Fixes: 3242ba117e9b ("gpio: Add driver for Zynq GPIO controller")
Signed-off-by: Ruoyu Wang <ruoyuw560@gmail.com>
Link: https://patch.msgid.link/20260609073313.5-1-ruoyuw560@gmail.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Anton Leontev [Thu, 4 Jun 2026 16:59:38 +0000 (19:59 +0300)]
hv_netvsc: use kmap_local_page in netvsc_copy_to_send_buf
netvsc_copy_to_send_buf() copies page buffer entries into the VMBus
send buffer using phys_to_virt() on the entry PFN. Entries for the
RNDIS header and the skb linear data come from kmalloc'd memory and
are always in the kernel direct map, but entries for skb fragments
reference page cache or user pages, which on 32-bit x86 with
CONFIG_HIGHMEM=y can live above the LOWMEM boundary. For such a page
phys_to_virt() returns an address outside the direct map and the
subsequent memcpy() faults on the transmit softirq path, which is
fatal.
Map the pages with kmap_local_page() instead, handling two properties
of the page buffer entries:
- pb[i].pfn is a Hyper-V PFN at HV_HYP_PAGE_SIZE (4K) granularity,
not a native PFN. Reconstruct the physical address first and derive
the native page from it, so the mapping stays correct where
PAGE_SIZE > HV_HYP_PAGE_SIZE (e.g. arm64 with 64K pages).
- Since commit 41a6328b2c55 ("hv_netvsc: Preserve contiguous PFN
grouping in the page buffer array"), an entry describes a full
physically contiguous fragment and pb[i].len can exceed PAGE_SIZE,
while kmap_local_page() maps a single page. Copy page by page,
splitting at native page boundaries.
The copy path only handles packets smaller than the send section size
(6144 bytes by default); larger packets take the cp_partial path where
only the RNDIS header is copied. So entries here are bounded by the
section size and a copy is split at most once on 4K-page systems. On
!CONFIG_HIGHMEM configs kmap_local_page() folds to page_address() and
no mapping work is added.
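The address arithmetic behind the two properties above can be sketched without any kernel mapping calls. The sketch assumes a page buffer entry carries a 4K-granular PFN, an offset and a length; struct chunk and split_entry() are hypothetical names, and each resulting chunk is what a single kmap_local_page() plus memcpy() would cover.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HV_HYP_PAGE_SHIFT 12	/* Hyper-V PFNs are 4K granular */

struct chunk {
	uint64_t native_pfn;	/* native page to map */
	uint32_t offset;	/* offset inside that native page */
	uint32_t len;		/* bytes to copy from it */
};

/*
 * Split one entry into chunks that never cross a native page boundary.
 * `page_shift` is the native PAGE_SHIFT (12 for 4K, 16 for 64K pages).
 * Returns the number of chunks stored in `out` (at most `max`).
 */
static size_t split_entry(uint64_t hv_pfn, uint32_t offset, uint32_t len,
			  unsigned int page_shift, struct chunk *out,
			  size_t max)
{
	/* Rebuild the physical address from the Hyper-V PFN first. */
	uint64_t phys = (hv_pfn << HV_HYP_PAGE_SHIFT) + offset;
	uint64_t page_size = 1ULL << page_shift;
	size_t n = 0;

	assert(page_shift >= HV_HYP_PAGE_SHIFT);
	while (len > 0 && n < max) {
		uint32_t off = (uint32_t)(phys & (page_size - 1));
		uint32_t room = (uint32_t)(page_size - off);
		uint32_t take = len < room ? len : room;

		out[n].native_pfn = phys >> page_shift;
		out[n].offset = off;
		out[n].len = take;
		n++;
		phys += take;
		len -= take;
	}
	return n;
}
```

On a 4K-page system an entry that starts near the end of a page is split once; on a 64K-page system the same Hyper-V PFN lands inside a larger native page and the in-page offset grows accordingly.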
Convert the device-tree parsing path to the generic fwnode/device
property accessors so the driver can be probed on ACPI and swnode
platforms as well as OF. The helper is renamed from
i2c_mux_reg_probe_dt() to i2c_mux_reg_probe_fw() to reflect that.
The child-node branch uses is_acpi_device_node() rather than
is_acpi_node(): the latter also matches ACPI data nodes (the
_DSD hierarchical-property children used by PRP0001-style
firmware), which have no ACPI handle and would make
acpi_get_local_address() fall back to evaluating _ADR against the
root namespace and return -ENODATA. Routing data nodes through
fwnode_property_read_u32() instead lets them resolve the "reg"
property the same way OF and swnode children do.
Behavioural preservations (deliberate, to avoid regressing existing
users):
- The three-way endian fallback is kept verbatim: an explicit
"little-endian" property wins, then "big-endian", and otherwise
the host's compile-time byte order. device_is_big_endian() is
not used here because it ignores "little-endian" and introduces
"native-endian" semantics, which would diverge from the binding.
- The "if (!mux->data.reg)" guard around
devm_platform_get_and_ioremap_resource() in probe() is kept.
drivers/platform/mellanox/mlx-platform.c registers i2c-mux-reg
platform_devices with no memory resource and supplies a
pre-set .reg / .reg_size through struct
i2c_mux_reg_platform_data; without the guard those
registrations would fail in probe().
- The "if (!mux->data.reg)" ioremap block (and the paired
reg_size validation that depends on it) is hoisted above
i2c_get_adapter(mux->data.parent), so the fwnode path
preserves master's ordering of "ioremap before parent-adapter
get". For platdata users the validation runs from a slightly
earlier position, but mux->data.reg_size is already set from
platdata by then, so the order is functionally neutral.
The OF-only of_address_to_resource() translation in the old
probe_dt() is dropped because the same address is available from
the platform_device resource table on OF as well as ACPI, and the
existing fallback in probe() ioremaps it.
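The three-way endian fallback listed under the behavioural preservations reduces to a small decision function. The sketch below is a model only: mux_reg_is_little_endian() and host_is_little_endian() are hypothetical names, and the two booleans stand for the presence of the "little-endian" and "big-endian" properties.

```c
#include <assert.h>
#include <stdbool.h>

/* Detect the byte order of the machine running this sketch. */
static bool host_is_little_endian(void)
{
	const unsigned int probe = 1;

	return *(const unsigned char *)&probe == 1;
}

/*
 * Priority order described above: an explicit "little-endian" property
 * wins, then "big-endian", and otherwise the host byte order is used.
 */
static bool mux_reg_is_little_endian(bool has_le_prop, bool has_be_prop,
				     bool host_is_le)
{
	if (has_le_prop)
		return true;
	if (has_be_prop)
		return false;
	return host_is_le;
}
```

A caller would pass host_is_little_endian() as the last argument; for instance, assert(mux_reg_is_little_endian(true, true, false)) holds because the explicit "little-endian" property takes priority over everything else.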
Acked-by: Peter Rosin <peda@lysator.liu.se>
Signed-off-by: Abdurrahman Hussain <abdurrahman@nexthop.ai>
Assisted-by: Claude-Code:claude-opus-4-7
Assisted-by: sashiko:gemini-3.1-pro-preview
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Andy Shevchenko [Tue, 25 Nov 2025 09:40:11 +0000 (10:40 +0100)]
i2c: acpi: Return -ENOENT when no resources found in i2c_acpi_client_count()
Some users want to return an error to the upper layers when
i2c_acpi_client_count() returns 0. Follow the common pattern
in such cases, i.e. return -ENOENT instead of 0.
While at it, fix the kernel-doc warning about missing return value
description.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Merge tag 'renesas-dts-for-v7.2-tag2' of https://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel into soc/dt
Renesas DTS updates for v7.2 (take two)
- Add timer (MTU3) and xSPI FLASH support for the RZ/T2H and RZ/N2H
SoCs and their EVK boards,
- Add PCIe support for the RZ/V2N SoC and the RZ/V2N EVK board,
- Add support for the R-Car M3Le SoC and the Geist development board,
- Specify ethernet PHY reset timings on various R-Car boards,
- Add (more) serial, I2C, DMA, and sound support for the RZ/G3L SoC,
- Add PSCI, Multifunctional Interface (MFIS), and SCMI support for the
R-Car X5H SoC and Ironhide development board,
- Add serial DMA support for the RZ/G2L SoC,
- Add keyboard, I2C, Versa clock, and audio support for the RZ/G3L
SMARC SoM and EVK boards,
- Miscellaneous fixes and improvements.
Takashi Iwai [Tue, 9 Jun 2026 07:49:04 +0000 (09:49 +0200)]
ALSA: aloop: Drop superfluous break
When converting the spinlock to guard(), a break statement was put in
the scoped_guard block in loopback_jiffies_timer_function(), but it's
obviously superfluous (although harmless). Better to drop it to
avoid confusion.
Qu Wenruo [Wed, 13 May 2026 04:36:21 +0000 (14:06 +0930)]
btrfs: introduce support for huge folios
With all the previous preparations, it's finally time to enable the
huge folio support.
- The max folio size
Here we define BTRFS_MAX_FOLIO_SIZE, which is fixed at 2MiB.
This will ensure we have a large enough but not too large folio for
btrfs. This limit applies to all systems regardless of page size.
Then we also define BTRFS_MAX_BLOCKS_PER_FOLIO, which depends on
CONFIG_BTRFS_EXPERIMENTAL.
If it's an experimental build, BTRFS_MAX_BLOCKS_PER_FOLIO is 512,
otherwise it's BITS_PER_LONG.
The filemap max order will be calculated using both
BTRFS_MAX_FOLIO_SIZE and BTRFS_MAX_BLOCKS_PER_FOLIO.
E.g. for 64K page size with 64K fs block size, the limit will be
BTRFS_MAX_FOLIO_SIZE (2M), which limits the filemap max order to 5.
This will be lower than the old order (6), but folios larger than 2M
are rarely any better for IO performance. Meanwhile excessively large
folios can cause other problems like stalling the IO pipeline for too
long.
For 4K page size and 4K fs block size, the limit will be increased to
2M from the old 256K.
This new size is constrained by both BTRFS_MAX_FOLIO_SIZE (2M) and
BTRFS_MAX_BLOCKS_PER_FOLIO (512 * 4K), allowing x86_64 to achieve huge
folio support, and the filemap max order will be 9.
- btrfs_bio_ctrl::submit_bitmap
This will be enlarged to contain BTRFS_MAX_BLOCKS_PER_FOLIO bits, and
this will be on-stack memory.
This will increase on-stack memory usage by 56 bytes compared to the
baseline (before the first patch in the series).
- Local @delalloc_bitmap inside writepage_delalloc()
Unfortunately we cannot afford to handle an allocation error here, thus
again we use on-stack memory.
Thus this will increase on-stack memory usage by 56 bytes again.
So unfortunately this means during the delalloc window, the writeback path
will have +112 bytes on-stack memory usage, and for other cases the
writeback path will have +56 bytes on-stack memory usage.
The +56 bytes (btrfs_bio_ctrl::submit_bitmap) can be removed
after we have reworked the compression submission, so the current
on-stack submit_bitmap is mostly a workaround until then.
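The numbers quoted above can be reproduced with a small calculation. This is a sketch of the arithmetic only, assuming the max order is the log2 of the smaller of the two limits divided by the page size; max_folio_order(), ilog2_u64() and bitmap_bytes() are hypothetical helpers, not btrfs functions.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_FOLIO_SIZE (2ULL * 1024 * 1024)	/* 2MiB, as stated above */

/* Integer log2 for a non-zero value. */
static unsigned int ilog2_u64(uint64_t value)
{
	unsigned int order = 0;

	while (value >>= 1)
		order++;
	return order;
}

/*
 * Max folio order when a folio may hold at most `max_blocks` blocks and,
 * if `size_cap` is non-zero, at most `size_cap` bytes.
 */
static unsigned int max_folio_order(uint64_t page_size, uint64_t block_size,
				    uint64_t max_blocks, uint64_t size_cap)
{
	uint64_t limit = max_blocks * block_size;

	if (size_cap && limit > size_cap)
		limit = size_cap;
	assert(page_size > 0 && limit >= page_size);
	return ilog2_u64(limit / page_size);
}

/* Bytes needed for a bitmap with `bits` bits (bits is a multiple of 8). */
static unsigned int bitmap_bytes(unsigned int bits)
{
	return bits / 8;
}
```

With 64 blocks per folio and no size cap (the old limit on 64-bit), both 4K and 64K page sizes give order 6. With 512 blocks and the 2MiB cap, 64K pages give order 5 and 4K pages give order 9. The stack growth follows from the bitmap size: a 512-bit bitmap takes 64 bytes against 8 bytes for a single 64-bit unsigned long, hence +56 bytes per on-stack bitmap.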
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Wed, 13 May 2026 04:36:20 +0000 (14:06 +0930)]
btrfs: migrate btrfs_bio_ctrl::submit_bitmap to support larger bitmaps
[CURRENT LIMIT]
Btrfs currently only supports sub-bitmaps (e.g. dirty bitmap) no larger
than BITS_PER_LONG.
One call site that utilizes this limit is btrfs_bio_ctrl::submit_bitmap,
which makes it very simple and straightforward to just grab an unsigned
long value and assign it to submit_bitmap.
Unfortunately that limit prevents us from supporting huge folios.
For 4K page size and block size, a huge folio (order 9) means 512 blocks
inside a 2M folio.
[ENHANCEMENT]
Instead of using a fixed unsigned long value, change
btrfs_bio_ctrl::submit_bitmap to an unsigned long pointer.
And for cases where an unsigned long can hold the whole bitmap,
introduce @submit_bitmap_value, and just point that pointer to that
unsigned long.
Then update all direct users of bio_ctrl->submit_bitmap to use the
pointer version.
There are several call sites that get extra changes:
- @range_bitmap inside extent_writepage_io()
Which is only utilized to truncate the bitmap.
Since we do not want to allocate new memory just for such temporary
usage, change the original bitmap_set() and bitmap_and() into
bitmap_clear() for the ranges outside of the target range.
- Getting dirty subpage bitmap inside writepage_delalloc()
Since we're passing an unsigned long pointer now, we need to go with
different handling (bs == ps, blocks_per_folio <= BITS_PER_LONG,
blocks_per_folio > BITS_PER_LONG).
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Wed, 13 May 2026 04:36:19 +0000 (14:06 +0930)]
btrfs: prepare subpage operations to support more than BITS_PER_LONG sub-bitmaps
[CURRENT LIMIT]
Btrfs currently only supports sub-bitmaps (e.g. dirty bitmap) no larger
than BITS_PER_LONG.
That limit allows us to easily grab an unsigned long without the need to
properly allocate memory for a larger bitmap.
Unfortunately that limit prevents us from supporting huge folios.
For 4K page size and block size, a huge folio (order 9) means 512 blocks
inside a 2M folio.
[ENHANCEMENT]
To allow direct bitmap operations without allocating new memory,
introduce two different ways to access the subpage bitmaps:
- Return an unsigned long value
This only happens if blocks_per_folio <= BITS_PER_LONG.
We read out the sub-bitmap into an unsigned long, and return the
value.
This is the old existing method.
This involves get_bitmap_value_##name() helper functions.
And this time the helper functions are defined as inline functions
instead of macros to provide better type checks.
- Return a pointer where the sub-bitmap starts
This only happens if blocks_per_folio >= BITS_PER_LONG.
This is the new method for sub-bitmaps larger than BITS_PER_LONG.
Since the sizes of sub-bitmaps are all aligned to BITS_PER_LONG, we
can directly access the start word of the sub-bitmap.
This involves get_bitmap_pointer_##name() helper functions.
Then change the existing sub-bitmaps users to use the new helpers:
- Bitmap dumping
Switch between get_bitmap_value_##name() and
get_bitmap_pointer_##name() depending on the sub-bitmap size.
- btrfs_get_subpage_dirty_bitmap()
Rename it to btrfs_get_subpage_dirty_bitmap_value() to follow the new
value/pointer naming.
Since we do not support huge folios yet, there is no pointer version
for the dirty bitmap.
Furthermore, add the support for block size == page size cases for
btrfs_get_subpage_dirty_bitmap_value(), so that the caller no longer
needs to check if the folio needs subpage handling.
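The two access methods can be sketched as below, assuming the sub-bitmaps are stored back to back in one array of unsigned long, each blocks_per_folio bits wide. The function names sub_bitmap_value() and sub_bitmap_pointer() are hypothetical; the real helpers are generated per bitmap name.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Value method: only valid when a sub-bitmap fits in one unsigned long.
 * Reads sub-bitmap number `idx` bit by bit and returns it as a value.
 */
static unsigned long sub_bitmap_value(const unsigned long *bitmaps,
				      unsigned int idx,
				      unsigned int blocks_per_folio)
{
	unsigned long start = (unsigned long)idx * blocks_per_folio;
	unsigned long value = 0;
	unsigned int i;

	assert(blocks_per_folio <= BITS_PER_LONG);
	for (i = 0; i < blocks_per_folio; i++) {
		unsigned long bit = start + i;

		if (bitmaps[bit / BITS_PER_LONG] &
		    (1UL << (bit % BITS_PER_LONG)))
			value |= 1UL << i;
	}
	return value;
}

/*
 * Pointer method: valid when the sub-bitmap size is a multiple of
 * BITS_PER_LONG, so every sub-bitmap starts on a word boundary.
 */
static unsigned long *sub_bitmap_pointer(unsigned long *bitmaps,
					 unsigned int idx,
					 unsigned int blocks_per_folio)
{
	assert(blocks_per_folio % BITS_PER_LONG == 0);
	return bitmaps + (unsigned long)idx * blocks_per_folio / BITS_PER_LONG;
}
```

The pointer method needs no copy and no allocation, which is why it suits sub-bitmaps too large to return by value.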
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Thu, 7 May 2026 17:59:32 +0000 (19:59 +0200)]
btrfs: simplify how first hit is passed to __btrfs_abort_transaction()
Optimize the btrfs_abort_transaction() for size as it (by our
convention) must be put right after the error condition is detected.
The exact file:line is reported so there's a portion that must be
inlined. As this is cold code it bloats functions. In the previous patch
"btrfs: move transaction abort message to __btrfs_abort_transaction()"
the error message was moved to the common helper, saving about 20KiB of
btrfs.ko and several instructions per call site and some stack space.
There's little left to be optimized, we need to keep the atomic
test_and_set_bit() and to convey that as 'first hit' to
__btrfs_abort_transaction().
Right now it's a bool, which takes 8 bytes on stack for each call but
it's 1 bit of information. We can encode that to some of the other
parameters.
For that let's use the 'error' parameter, by convention it's negative
errno so we can reliably detect if it's the first hit or a later error.
Also the negation is usually implemented by a single instruction (NEG on
x86_64) so the resulting object code is kept short.
This reduces btrfs.ko by 8K and stack in several functions by 8 bytes.
Cumulative effect with the other commit is -30K of btrfs.ko. While the
encoding is an implementation detail, it's contained within the API.
Making the transaction abort calls very light is desired.
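The idea of folding the 'first hit' bit into the sign of the error value can be sketched as follows. Which sign marks the first hit is an assumption made for this example (here: positive means first hit), and encode_abort_error(), decode_first_hit() and decode_error() are hypothetical names rather than the btrfs implementation.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Callers always start with a negative errno. Assumed encoding: negate it
 * when this is the first abort, pass it through unchanged otherwise.
 */
static int encode_abort_error(int error, bool first_hit)
{
	assert(error < 0);
	return first_hit ? -error : error;
}

/* Recover the flag: a positive value can only come from the negation. */
static bool decode_first_hit(int encoded)
{
	return encoded > 0;
}

/* Recover the original negative errno regardless of the flag. */
static int decode_error(int encoded)
{
	return encoded > 0 ? -encoded : encoded;
}
```

Because a valid error is never zero or positive, the sign is a free bit of information, and the separate bool argument is no longer needed at each call site.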
David Sterba [Thu, 7 May 2026 17:59:31 +0000 (19:59 +0200)]
btrfs: validate negative error number passed to btrfs_abort_transaction()
In preparation to encode more information to the error value add a step
that verifies if the value is valid (i.e. < 0). This works for
compile-time and runtime (in debugging mode).
The compile-time check recognizes direct constants and defines an array
type. An invalid condition leads to negative array size which is caught
by compiler.
The runtime check constructs the array type from the condition and only
verifies the correct size, as we don't need to tweak the size to be
negative.
The sizeof() expressions do not generate any code. In the debugging
config the warning adds about 9KiB of btrfs.ko code size.
The array size trick is needed as we can't use static_assert(), not even
with __builtin_constant_p().
Sample error message:
In file included from inode.c:40:
inode.c: In function ‘__cow_file_range_inline’:
transaction.h:261:26: error: size of unnamed array is negative
261 | (void)sizeof(char[-!(__builtin_constant_p(error) ? (error) < 0 : 1)]); \
| ^
transaction.h:275:9: note: in expansion of macro ‘VERIFY_NEGATIVE_ERROR’
275 | VERIFY_NEGATIVE_ERROR(error); \
| ^~~~~~~~~~~~~~~~~~~~~
inode.c:665:17: note: in expansion of macro ‘btrfs_abort_transaction’
665 | btrfs_abort_transaction(trans, 17);
| ^~~~~~~~~~~~~~~~~~~~~~~
Filipe Manana [Thu, 21 May 2026 14:19:37 +0000 (15:19 +0100)]
btrfs: fix invalid pointer dereference in __btrfs_run_delayed_refs()
In the beginning of the loop, we try to obtain a locked delayed ref head,
if 'locked_ref' is currently NULL, by calling btrfs_select_ref_head(),
which can return an error pointer. If the error pointer is -EAGAIN we do
a continue and go back to the beginning of the loop, which will not try
again to call btrfs_select_ref_head() since 'locked_ref' is no longer
NULL but it's ERR_PTR(-EAGAIN), and then we do:
spin_lock(&locked_ref->lock);
against an ERR_PTR(-EAGAIN) value, generating an invalid pointer
dereference.
Fix this by ensuring that 'locked_ref' is set to NULL when
btrfs_select_ref_head() returns ERR_PTR(-EAGAIN) and incrementing 'count'
as well, to prevent infinite looping. We do this by doing a goto to the
bottom of the loop that already sets 'locked_ref' to NULL and does a
cond_resched(), with an increment to 'count' right before the goto.
These measures were in place before the refactoring in commit 0110a4c43451
("btrfs: refactor __btrfs_run_delayed_refs loop") but were unintentionally
lost afterwards.
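The control flow of the fix can be modelled in user space as below. Everything here is a simplified stand-in: err_ptr(), is_err() and ptr_err() mimic the kernel's ERR_PTR helpers, and struct selector, select_ref_head() and run_refs() are hypothetical reductions of the real loop.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* User-space stand-ins for ERR_PTR(), IS_ERR() and PTR_ERR(). */
static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static int is_err(const void *p) { return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }

struct ref_head { int id; };

/* Hypothetical source: -EAGAIN `busy` times, then one head, then NULL. */
struct selector { int busy; int handed_out; struct ref_head head; };

static struct ref_head *select_ref_head(struct selector *s)
{
	if (s->busy > 0) {
		s->busy--;
		return err_ptr(-EAGAIN);
	}
	if (!s->handed_out) {
		s->handed_out = 1;
		return &s->head;
	}
	return NULL;
}

/* Returns 0 or a negative errno; *processed counts heads handled. */
static int run_refs(struct selector *s, int max_count, int *processed)
{
	struct ref_head *locked_ref = NULL;
	int count = 0;

	assert(s != NULL && processed != NULL);
	do {
		if (!locked_ref) {
			locked_ref = select_ref_head(s);
			if (is_err(locked_ref)) {
				if (ptr_err(locked_ref) != -EAGAIN)
					return (int)ptr_err(locked_ref);
				/* Count the attempt and reset below; a plain
				 * "continue" would keep the error pointer. */
				count++;
				goto again;
			}
			if (!locked_ref)
				break;
		}
		(*processed)++;	/* stands in for the work done under the lock */
		count++;
again:
		locked_ref = NULL;
	} while (count < max_count);
	return 0;
}
```

Resetting locked_ref at the bottom guarantees the next pass calls the selector again instead of dereferencing the error pointer, and incrementing count bounds the loop even if the selector keeps reporting -EAGAIN.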
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/ag8ARRwykv8bpJ87@stanley.mountain/
Fixes: 0110a4c43451 ("btrfs: refactor __btrfs_run_delayed_refs loop")
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
KangNing Liao [Thu, 21 May 2026 12:29:45 +0000 (20:29 +0800)]
btrfs: protect sb_write_pointer() with invalidate lock
sb_write_pointer() reads the super block from the block device page cache
using read_cache_page_gfp(). This has the same race with BLKBSZSET as the
one fixed by commit 3f29d661e568 ("btrfs: sync read disk super and set
block size").
Take the mapping invalidate lock around read_cache_page_gfp() to
serialize the read against block size changes.
Filipe Manana [Thu, 14 May 2026 16:35:40 +0000 (17:35 +0100)]
btrfs: tracepoints: show inode type in btrfs_sync_file_enter() event
Print the type of the inode (directory, regular file, symlink, etc) to
facilitate debugging.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Thu, 14 May 2026 15:11:43 +0000 (16:11 +0100)]
btrfs: tracepoints: add trace event for btrfs_sync_log()
btrfs_sync_log() is one of the main functions called during a fsync.
Add trace events for when entering and exiting that function.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 11 May 2026 15:38:25 +0000 (16:38 +0100)]
btrfs: tracepoints: add trace event for btrfs_log_new_name()
btrfs_log_new_name() is an important function that affects inode logging
and is called during link and rename operations. Add trace events for when
entering and exiting that function.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 11 May 2026 15:13:18 +0000 (16:13 +0100)]
btrfs: tracepoints: add trace event for btrfs_record_new_subvolume()
btrfs_record_new_subvolume() is an important operation that affects
inode logging and is called during subvolume creation. Add a trace event
for it to help debug issues.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 11 May 2026 15:05:13 +0000 (16:05 +0100)]
btrfs: tracepoints: add trace event for btrfs_record_snapshot_destroy()
btrfs_record_snapshot_destroy() is an important operation that affects
inode logging and is called during subvolume/snapshot deletion as well as
during rmdir. Add a trace event for it to help debug issues.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 11 May 2026 14:51:13 +0000 (15:51 +0100)]
btrfs: tracepoints: add trace event for btrfs_record_unlink_dir()
btrfs_record_unlink_dir() is an important operation that affects inode
logging and is called during unlink and rename operations. Add a trace
event for it to help debug issues.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Fri, 8 May 2026 16:09:48 +0000 (17:09 +0100)]
btrfs: tracepoints: add trace event for log_new_delayed_dentries()
log_new_delayed_dentries() is an important step called during a fsync, as
well as during rename and link operations on inodes that were previously
logged. Add trace events for when entering and exiting that function.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Thu, 7 May 2026 12:05:16 +0000 (13:05 +0100)]
btrfs: use simple assertions where enough during inode logging and replay
In overwrite_item():
There's no point in printing the root's ID if the assertion fails, since
it can only be BTRFS_TREE_LOG_OBJECTID if it fails.
In log_new_delayed_dentries():
There's no point in using a verbose assertion to print the value of
ctx->logging_new_delayed_dentries because it's a boolean, so if the
assertion fails we know its value is true (1).
So convert them to simpler assertions to make the code less verbose.
It also slightly reduces the object size, at least on x86_64 using
Debian's gcc 14.2.0-19 (if CONFIG_BTRFS_ASSERT is enabled in the kernel
config, which is the case for SUSE distributions for example).
Before:
$ size fs/btrfs/btrfs.ko
   text	   data	    bss	    dec	    hex	filename
2028244	 197176	  15624	2241044	 223214	fs/btrfs/btrfs.ko
After:
$ size fs/btrfs/btrfs.ko
   text	   data	    bss	    dec	    hex	filename
2028228	 197176	  15624	2241028	 223204	fs/btrfs/btrfs.ko
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Thu, 7 May 2026 12:03:13 +0000 (13:03 +0100)]
btrfs: tracepoints: add trace event for log_conflicting_inodes()
log_conflicting_inodes() is an important step called during a fsync, as
well as during rename and link operations on inodes that were previously
logged. Add trace events for when entering and exiting that function.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Thu, 7 May 2026 10:17:55 +0000 (11:17 +0100)]
btrfs: tracepoints: add trace event for add_conflicting_inode()
add_conflicting_inode() is an important step called during a fsync, as
well as during rename and link operations on inodes that were previously
logged. Add trace events for when entering and exiting that function.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Merge tag 'mtk-dts64-for-v7.2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mediatek/linux into soc/dt
MediaTek ARM64 DeviceTree updates
This adds improvements for already supported SoCs and devices.
In particular:
- Adds support for the MT7981 SoC's Crypto Accelerator
- Enables gpio-keys and leds found on the MT7981b based
Xiaomi AX3000T router
- Adds new variants of MT7988 BananaPi BPi-R4 Pro
- ...and some spare cleanups for all BPi-R4 Pro boards
- Adds a MediaTek MT6365 devicetree and uses it in all of
the relevant supported boards in place of MT6359, where
needed (the MT6365 PMIC is a fully compatible variant
of the MT6359 PMIC, but still not named MT6359).
- Adds correct power supplies for CPUs and devices on a
variety of MediaTek Chromebooks and Genio AIoT boards,
including:
- MT8186 Corsola Chromebooks
- MT8192 Asurada Chromebooks
- MT8195 Cherry Chromebooks
- MT8390 Genio based boards
- MT8395 Genio based boards
- Adds HDMI TX support for Ezurio Tungsten 510/700 boards.
* tag 'mtk-dts64-for-v7.2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mediatek/linux: (37 commits)
arm64: dts: mediatek: add LED and key support on Xiaomi AX3000T
arm64: dts: mediatek: mt8195-cherry: Sort top level nodes correctly
arm64: dts: mediatek: mt8195-cherry: Fix names for EC controlled regulators
arm64: dts: mediatek: mt8192-asurada: Add (BT|WIFI)_KILL_1V8_L GPIO line names
arm64: dts: mediatek: mt8192-asurada: Fix SPI-NOR flash compatible
arm64: dts: mediatek: mt8390-tungsten-smarc: add HDMI support
arm64: dts: mediatek: mt8188-geralt: Add little core CPU power supplies
arm64: dts: mediatek: mt8188-geralt: Add MT6359 PMIC supplies
arm64: dts: mediatek: mt8195-cherry: Add vusb33 supplies for XHCI controllers
arm64: dts: mediatek: mt8195-cherry: Add supply for SPI NOR flash
arm64: dts: mediatek: mt8195-cherry: Fix VBUS regulator description
arm64: dts: mediatek: mt8195-cherry: Add supplies for ChromeOS EC regulators
arm64: dts: mediatek: mt8195-cherry: Add MT6315 PMIC supplies
arm64: dts: mediatek: mt8195-cherry: Add MT6359 PMIC supplies
arm64: dts: mediatek: mt8192-asurada: Fix WiFi regulator description
arm64: dts: mediatek: mt8192-asurada: Add SPI NOR flash power supply
arm64: dts: mediatek: mt8192-asurada: Add CPU power supplies
arm64: dts: mediatek: mt8192-asurada: Add supplies for ChromeOS EC regulators
arm64: dts: mediatek: mt8192-asurada: Add MT6315 PMIC supplies
arm64: dts: mediatek: mt8192-asurada: Add MT6359 PMIC supplies
...
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
These are some of the issues that LLM review reported against netconsole,
and they are being addressed here ahead of the big refactors.
I was working on some big refactors and got several "pre-existing issue"
reports during LLM review of that series. They make it hard to guarantee
that the refactor itself does not introduce any bug, so let's fix these
pre-existing bugs first and then submit the refactor.
The issues fixed in this patchset were reported during the review of
https://lore.kernel.org/all/20260524-netconsole_move_more-v1-0-909d1ab398b4@debian.org/
Not all of them are fixed here, only those that were easy to reason about.
Why net-next and not the 'net' tree?
Most of the functions being fixed here moved from netpoll to netconsole,
so fixing them in 'net' would cause merge conflicts from 'net' to
'net-next'. I therefore decided to fix them in 'net-next', given that we
are at 7.1-rc6 already. Sorry if that is not the right approach.
====================
Breno Leitao [Thu, 4 Jun 2026 16:10:14 +0000 (09:10 -0700)]
netconsole: close netdevice unregister window during target resume
process_resume_target() removes the target from target_list before
calling resume_target() so that netpoll_setup() can run with interrupts
enabled, then re-adds it once setup completes. netpoll_setup() acquires a
net_device reference (netdev_hold()) and releases the RTNL before
returning.
While the target is off target_list and the RTNL is not held,
netconsole_netdev_event() cannot find it. If the egress device is
unregistered in that window, the NETDEV_UNREGISTER notifier walks
target_list, misses the resuming target, and never tears it down. The
target is then re-added in STATE_ENABLED still holding a reference to the
now-unregistered device, leaking it and hanging unregister_netdevice() in
netdev_wait_allrefs().
Re-check under RTNL before re-publishing the target: if the device left
NETREG_REGISTERED while we were off the list, run do_netpoll_cleanup() and
mark the target disabled. Taking the RTNL across the check and the
list_add() serialises against the NETDEV_UNREGISTER notifier, which also
runs under RTNL, so the device is either still registered (and the
notifier will find the re-added target later) or already unregistering
(and we drop the reference here). netdev_wait_allrefs() runs from
netdev_run_todo() outside the RTNL, so dropping the reference here cannot
deadlock against the pending unregister.
Breno Leitao [Thu, 4 Jun 2026 16:10:13 +0000 (09:10 -0700)]
netconsole: clean up deactivated targets dropped before the cleanup worker
drop_netconsole_target() downgrades a STATE_DEACTIVATED target to
STATE_DISABLED and then only calls netpoll_cleanup() when the target is
STATE_ENABLED. A target becomes STATE_DEACTIVATED when its underlying
interface is unregistered: netconsole_netdev_event() moves it to
target_cleanup_list, and netconsole_process_cleanups_core() is expected
to run do_netpoll_cleanup() on it.
Now that drop_netconsole_target() takes target_cleanup_list_lock around
the unlink, a configfs removal racing with NETDEV_UNREGISTER can pull the
target off target_cleanup_list before the cleanup worker processes it.
The notifier drops the lock before calling
netconsole_process_cleanups_core(), so the worker then iterates a list
that no longer contains the target and never runs do_netpoll_cleanup() on
it. Because drop_netconsole_target() has already rewritten the state to
STATE_DISABLED, its own STATE_ENABLED check is false and netpoll_cleanup()
is skipped too. The net_device reference taken by netpoll_setup() is then
leaked and unregister_netdevice() hangs forever in netdev_wait_allrefs().
Capture whether the target still owns a netpoll before the state is
downgraded and clean it up for both STATE_ENABLED and STATE_DEACTIVATED
targets. netpoll_cleanup() is idempotent -- it skips when np->dev is
already NULL -- so it is safe even when the cleanup worker won the race
and already tore the netpoll down.
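The ordering described above can be illustrated with a small userspace model. This is only a sketch: the `example_*` names, the `holds_dev` flag standing in for the net_device reference, and the simplified state enum are all assumptions for illustration, not the netconsole code.

```c
#include <stdbool.h>

/* Hypothetical, simplified target states. */
enum example_state { EX_DISABLED, EX_ENABLED, EX_DEACTIVATED };

struct example_target {
	enum example_state state;
	bool holds_dev;		/* models the net_device reference */
};

/* Idempotent: does nothing if the reference is already gone. */
static void example_cleanup(struct example_target *t)
{
	if (!t->holds_dev)
		return;
	t->holds_dev = false;
}

static void example_drop_target(struct example_target *t)
{
	/* Decide before the state is rewritten below. */
	bool had_netpoll = t->state == EX_ENABLED ||
			   t->state == EX_DEACTIVATED;

	if (t->state == EX_DEACTIVATED)
		t->state = EX_DISABLED;

	if (had_netpoll)
		example_cleanup(t);
}
```

Checking the state after the downgrade instead would skip the cleanup for a deactivated target, which is the leak the commit message describes.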
Breno Leitao [Thu, 4 Jun 2026 16:10:12 +0000 (09:10 -0700)]
netconsole: take target_cleanup_list_lock in drop_netconsole_target()
drop_netconsole_target() unlinks the target while only holding
target_list_lock. However, when the underlying interface has been
unregistered, netconsole_netdev_event() moves the target from
target_list to target_cleanup_list, and netconsole_process_cleanups_core()
walks that list under target_cleanup_list_lock only.
If a user removes the configfs target at the same time the cleanup
worker is iterating target_cleanup_list, list_del() can corrupt the list
because the two paths take disjoint locks while operating on the same
list node.
Acquire target_cleanup_list_lock around the list_del() so the unlink is
serialised against netconsole_process_cleanups_core() regardless of
which list the target currently belongs to. The state transition that
downgrades STATE_DEACTIVATED to STATE_DISABLED is left intact and is
performed under the same combined locking, preserving the existing
ordering with resume_target().
Breno Leitao [Thu, 4 Jun 2026 16:10:11 +0000 (09:10 -0700)]
netconsole: do not dequeue pooled skbs that cannot satisfy len
find_skb() falls back to np->skb_pool when the GFP_ATOMIC alloc_skb()
fails. The pool is refilled by refill_skbs(), which always allocates
buffers of MAX_SKB_SIZE (ethhdr + iphdr + udphdr + MAX_UDP_CHUNK ==
1502 bytes).
netconsole, however, computes the requested length dynamically as:
    total_len + np->dev->needed_tailroom
If the egress device declares a non-zero needed_tailroom (e.g. some
tunnel or hardware accelerator devices), the required length can exceed
MAX_SKB_SIZE. The pooled skb is then handed back to the caller, which
immediately performs skb_put(skb, len), trips the tail > end check, and
triggers skb_over_panic().
Leave the normal alloc_skb(len, GFP_ATOMIC) path untouched -- the slab
allocator can still satisfy oversized requests when memory is available,
so senders to devices with non-zero needed_tailroom keep working in the
common case. Only the pool fallback is gated: when alloc_skb() failed
and len exceeds the pool buffer size, skip the skb_dequeue() instead of
burning a pre-allocated skb on a request that would later trip
skb_over_panic(). Reserving pool entries for requests they can actually
satisfy also keeps the panic path, which depends on the pool being
primed, intact.
When that drop happens, emit a rate-limited net_warn() so the user
notices that netconsole is unable to push messages on the egress device.
The warn is skipped under in_nmi() for the same reason schedule_work()
is: printk machinery taken by net_warn_ratelimited() is not NMI-safe and
would risk recursing into the same nbcon console we are servicing.
MAX_SKB_SIZE / MAX_UDP_CHUNK were private to net/core/netpoll.c. Move
them to include/linux/netpoll.h so netconsole can reference the same
definition that refill_skbs() uses, keeping the two in sync by
construction. The header now pulls in <linux/ip.h> and <linux/udp.h>
explicitly so MAX_SKB_SIZE remains self-contained for any future user.
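A minimal userspace sketch of the gating idea follows, assuming a toy pool that only counts buffers. The `example_*` names and the `dropped` counter are hypothetical; only the 1502-byte buffer size comes from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

#define EX_POOL_BUF_SIZE 1502	/* MAX_SKB_SIZE value quoted in the commit message */

struct example_buf_pool {
	int available;	/* pre-allocated buffers left */
	int dropped;	/* requests refused because they were too large */
};

/* Returns true if a pooled buffer was handed out for a request of len bytes. */
static bool example_pool_take(struct example_buf_pool *pool, size_t len)
{
	if (len > EX_POOL_BUF_SIZE) {
		/* The buffer could never hold len bytes: keep it in the pool. */
		pool->dropped++;
		return false;
	}
	if (pool->available == 0)
		return false;
	pool->available--;
	return true;
}
```

An oversized request leaves `available` unchanged, so the pool stays primed for requests it can actually satisfy.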
Breno Leitao [Thu, 4 Jun 2026 16:10:10 +0000 (09:10 -0700)]
netconsole: do not schedule skb pool refill from NMI
When alloc_skb() fails in find_skb(), the fallback path dequeues an skb
from np->skb_pool and unconditionally calls schedule_work() to top the
pool back up. schedule_work() ends up taking the workqueue pool locks,
which are not NMI-safe.
netconsole_write() is registered as the nbcon write_atomic callback and
is explicitly marked CON_NBCON_ATOMIC_UNSAFE, meaning it is invoked from
emergency/panic contexts including NMIs. If the NMI interrupts a thread
already holding the workqueue pool lock, calling schedule_work()
self-deadlocks and the panic message that was being printed is lost.
Introduce netcons_skb_pop() to fold the pool dequeue and the refill
request into a single helper. The helper skips schedule_work() when
called from NMI context; the pool is best-effort, so the refill is simply
deferred to the next non-NMI find_skb() call that exhausts alloc_skb()
and hits the fallback again. This keeps the fast path untouched and the
locking rules around the fallback pool documented in one place.
Note this only removes the schedule_work() hazard from the NMI path. The
allocation itself is still not fully NMI-safe: the alloc_skb(GFP_ATOMIC)
attempted first may take slab locks, and the skb_dequeue() fallback takes
np->skb_pool.lock, so either can deadlock if the NMI interrupts a holder
of those locks. Closing those windows requires an NMI-safe (lockless) skb
pool and is left to a follow-up; this patch addresses the schedule_work()
deadlock, which is both the most likely and the easiest to trigger.
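The helper's behaviour can be sketched in userspace as below. This is an illustration under stated assumptions: `example_skb_pool`, `example_pool_pop()` and the `refill_requested` flag (standing in for schedule_work()) are hypothetical, and the caller passes the NMI state explicitly instead of querying in_nmi().

```c
#include <stdbool.h>

struct example_skb_pool {
	int available;		/* buffers left in the fallback pool */
	bool refill_requested;	/* stands in for schedule_work() */
};

/* Pop one buffer; request a refill only when not in NMI context. */
static bool example_pool_pop(struct example_skb_pool *pool, bool in_nmi_ctx)
{
	if (pool->available == 0)
		return false;
	pool->available--;

	/* Deferring the refill is acceptable: the pool is best-effort. */
	if (!in_nmi_ctx)
		pool->refill_requested = true;
	return true;
}
```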
Merge tag 'mtk-soc-for-v7.2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mediatek/linux into soc/drivers
MediaTek SoC driver updates
This adds subsys ID compatibility in MediaTek CMDQ, paving
the way for adding support for the MT8196 SoC, and fixes
the Multimedia System (MMSYS) routing masks for the MT8167
SoC.
* tag 'mtk-soc-for-v7.2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mediatek/linux:
soc: mediatek: mtk-mmsys: Restore MT8167 routing masks lost during merge
soc: mediatek: mtk-cmdq: Add cmdq_pkt_jump_rel_temp() for removing shift_pa
soc: mediatek: Use pkt_write function pointer for subsys ID compatibility
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Rui Qi [Thu, 4 Jun 2026 08:32:08 +0000 (16:32 +0800)]
selftests/livepatch: fix resource leak in test_klp_syscall init error path
In livepatch_init(), if klp_enable_patch() fails, the previously
created kobject and sysfs file are never cleaned up, causing a
resource leak. Capture the return value and add proper cleanup
on the error path.
Ingyu Jang [Tue, 19 May 2026 08:52:14 +0000 (17:52 +0900)]
wifi: mt76: Drop unneeded mt76_register_debugfs_fops() return checks
mt76_register_debugfs_fops() returns the dentry from
debugfs_create_dir(), which yields an error pointer on failure
(notably ERR_PTR(-ENODEV) when CONFIG_DEBUG_FS=n), never NULL. Per
commit ff9fb72bc077 ("debugfs: return error values, not NULL"),
callers do not need to check the return value.
Drop the dead !dir checks in mt7615/mt7915/mt7921/mt7925/mt7996
_init_debugfs(). Converting them to IS_ERR() instead would have
exposed a probe abort on CONFIG_DEBUG_FS=n, since each
*_init_debugfs() caller propagates the helper's return value.
This patch supersedes an earlier proposal that converted the checks
to IS_ERR().
Devin Wittmayer [Fri, 15 May 2026 18:39:21 +0000 (11:39 -0700)]
wifi: mt76: mt7921: assert sniffer on chanctx change
mt7921_change_chanctx() configures the channel for monitor vifs but
does not re-assert sniffer mode. mt7925_change_chanctx() does. Match
mt7925 by adding the missing mt7921_mcu_set_sniffer(true) call,
completing the architectural pattern from commit 914189af23b8 ("wifi:
mt76: mt7921: fix channel switch fail in monitor mode").
The user-visible regression this asymmetry produced on v6.17 and v6.18
was addressed by commit cdb2941a516c ("Revert "wifi: mt76: mt792x:
improve monitor interface handling"") in v6.19 and backported to the
6.17.y and 6.18.y stable trees. This patch is defense in depth in
case the NO_VIRTUAL_MONITOR change is reintroduced in a future series.
Dawei Feng [Thu, 4 Jun 2026 14:37:56 +0000 (22:37 +0800)]
octeontx2-af: fix memory leak in rvu_setup_hw_resources()
If rvu_npc_exact_init() fails in rvu_setup_hw_resources(), the function
returns directly instead of jumping to the error handling path. This
causes a resource leak for the previously initialized CGX, NPC, fwdata,
and MSI-X states.
Fix this by replacing the direct return with goto cgx_err to ensure
proper cleanup.
The bug was first flagged by an experimental analysis tool we are
developing for kernel memory-management bugs while analyzing
v6.13-rc1. The tool is still under development and is not yet publicly
available. Manual inspection confirms that the bug is still present in
v7.1-rc6.
An x86_64 allyesconfig build showed no new warnings. As we do not have
access to Marvell OcteonTX2 RVU AF hardware, no runtime testing could be
performed.
Fixes: 3571fe07a090 ("octeontx2-af: Drop rules for NPC MCAM")
Cc: stable@vger.kernel.org
Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn>
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Link: https://patch.msgid.link/20260604143756.1524482-1-dawei.feng@seu.edu.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
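The error-path pattern behind this fix can be sketched generically. The two-step setup, the `example_*` names and the `free_a` label are hypothetical and much simpler than rvu_setup_hw_resources(); the point is only the difference between a direct return and a jump to the unwind label.

```c
#include <errno.h>

struct example_state {
	int a_ready;
	int b_ready;
};

static int example_init_a(struct example_state *s)
{
	s->a_ready = 1;
	return 0;
}

static void example_free_a(struct example_state *s)
{
	s->a_ready = 0;
}

static int example_init_b(struct example_state *s, int fail)
{
	if (fail)
		return -ENOMEM;
	s->b_ready = 1;
	return 0;
}

static int example_setup(struct example_state *s, int fail_b)
{
	int err;

	err = example_init_a(s);
	if (err)
		return err;	/* nothing to undo yet */

	err = example_init_b(s, fail_b);
	if (err)
		goto free_a;	/* a bare "return err" here would leak A */

	return 0;

free_a:
	example_free_a(s);
	return err;
}
```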
When FIELD_GET returns 0 for the retry count, subtracting 1 causes
an unsigned integer underflow, resulting in tx_retries becoming a
very large value (0xFFFFFFFF for u32).
Fix by checking if count is non-zero before subtracting 1.
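A minimal sketch of the guarded subtraction, assuming a made-up register layout: `EXAMPLE_RETRY_MASK`, its shift, and the helper names are hypothetical stand-ins for the driver's FIELD_GET() usage.

```c
#include <stdint.h>

/* Hypothetical layout: bits 4..10 of the status word hold the retry count. */
#define EXAMPLE_RETRY_MASK  0x7f0u
#define EXAMPLE_RETRY_SHIFT 4

static uint32_t example_field_get(uint32_t word)
{
	return (word & EXAMPLE_RETRY_MASK) >> EXAMPLE_RETRY_SHIFT;
}

/* Retries are "count - 1", but only when the count is non-zero. */
static uint32_t example_tx_retries(uint32_t word)
{
	uint32_t count = example_field_get(word);

	return count ? count - 1 : 0;
}
```

Without the check, a count of 0 would wrap around to 0xFFFFFFFF because the arithmetic is unsigned.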
JB Tsai [Tue, 3 Mar 2026 05:36:36 +0000 (13:36 +0800)]
wifi: mt76: mt7921: add auto regdomain switch support
Implement 802.11d-based automatic regulatory domain switching to
dynamically determine the regulatory domain at runtime.
Extend the scan-done event structure by reusing reserved padding and
appending new fields; the layout and values remain backward-compatible
with existing users.
Jakub Kicinski [Sun, 7 Jun 2026 00:24:01 +0000 (17:24 -0700)]
selftests: drv-net: gro: signal over-coalescing more reliably
The GRO test is very timing-sensitive: packets may be delayed
by the network or just sent slowly. Because of this we retry
each test case up to 6 times.
This makes perfect sense for positive cases, in which we want
to see coalescing. Negative test cases, which modify headers
and expect no coalescing, should have the opposite treatment.
We should really try 6 times and make sure that each time
the test failed. This would, however, require that we annotate
each test to indicate whether it's positive or negative.
Let's start with a simpler improvement. Do not allow
retries if we detected over-coalescing. Previously the negative
case would have to get lucky at least once in 6 tries to pass.
Now the first failure breaks the retry loop.
For background - NICs tend to ignore the contents of the TCP
timestamp option, so that test case commonly fails. In NIPA
having 6 attempts, however, was enough for some NICs to get
multiple successful runs in a row, getting the test cases
auto-classified as expected to pass, even though the NIC does
not comply with the expectations.
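The retry policy can be sketched as a small C function. This is an illustration only and not the selftest's code: the outcome enum, `example_run_case()` and the pre-recorded `attempts` array are assumptions made to keep the example self-contained.

```c
#include <stdbool.h>
#include <stddef.h>

enum example_outcome {
	EX_OK,			/* observed what the test case expects */
	EX_UNDER_COALESCED,	/* fewer merges than expected, may be timing */
	EX_OVER_COALESCED	/* packets merged that must stay separate */
};

/* Walk the recorded attempts; return true if the case passes. */
static bool example_run_case(const enum example_outcome *attempts,
			     size_t max_tries)
{
	for (size_t i = 0; i < max_tries; i++) {
		if (attempts[i] == EX_OK)
			return true;
		/* Over-coalescing is not a timing artifact: do not retry. */
		if (attempts[i] == EX_OVER_COALESCED)
			return false;
	}
	return false;
}
```

With this policy a device that over-coalesces on the first attempt fails immediately, even if a later attempt would have looked correct.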
Bryam Vargas [Sun, 7 Jun 2026 01:18:27 +0000 (01:18 +0000)]
isofs: bound Rock Ridge symlink components to the SL record
get_symlink_chunk() and the SL handling in
parse_rock_ridge_inode_internal() walk the variable-length components of
a Rock Ridge "SL" (symbolic link) record. Each component is a two-byte
header (flags, len) followed by len bytes of text, so it occupies
slp->len + 2 bytes. Both loops read slp->len and advance to the next
component, and get_symlink_chunk() additionally does
memcpy(rpnt, slp->text, slp->len), but neither checks that the component
lies within the SL record before dereferencing it.
A crafted SL record whose component declares a len that runs past the
record (rr->len) therefore triggers an out-of-bounds read of up to 255
bytes. When the record sits at the tail of its backing buffer - for
example a small kmalloc()ed continuation block reached through a CE
record - the read crosses the allocation; get_symlink_chunk() then
copies the out-of-bounds bytes into the symlink body returned to user
space by readlink(), disclosing adjacent kernel memory.
ISO 9660 images are routinely mounted from untrusted removable media -
desktop environments auto-mount them (e.g. via udisks2) without
CAP_SYS_ADMIN - so the record contents are attacker-controlled.
Reject any component that does not fit in the remaining record bytes
before using it. In get_symlink_chunk() return NULL, like the existing
output-buffer (plimit) checks, so a malformed record makes readlink()
fail with -EIO rather than silently returning a truncated target; in
parse_rock_ridge_inode_internal() stop the inode-size walk.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Suggested-by: Michael Bommarito <michael.bommarito@gmail.com>
Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me>
Link: https://patch.msgid.link/20260607011823.217748-1-hexlabsecurity@proton.me
Signed-off-by: Jan Kara <jack@suse.cz>
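The bounds check can be sketched in userspace as follows. This is a simplified model, not the isofs code: it assumes the record is a flat byte array of (flags, len, text[len]) components, and `example_copy_components()` and its -1 error return are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy the text of every component in rec[0..rec_len) into out.
 * Returns the number of bytes copied, or -1 if a component does not fit
 * in the remaining record bytes or in the output buffer.
 */
static int example_copy_components(const uint8_t *rec, size_t rec_len,
				   uint8_t *out, size_t out_len)
{
	size_t pos = 0, copied = 0;

	while (pos < rec_len) {
		size_t remaining = rec_len - pos;
		size_t len;

		if (remaining < 2)		/* header itself is truncated */
			return -1;
		len = rec[pos + 1];
		if (len + 2 > remaining)	/* text runs past the record */
			return -1;
		if (len > out_len - copied)	/* output buffer too small */
			return -1;

		memcpy(out + copied, rec + pos + 2, len);
		copied += len;
		pos += len + 2;
	}
	return (int)copied;
}
```

The check runs before the length is used for the copy, so a component that declares 200 bytes in a 3-byte tail is rejected instead of being read.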
Sean Wang [Wed, 1 Apr 2026 18:23:22 +0000 (13:23 -0500)]
wifi: mt76: mt792x: report txpower for the requested vif link
mt792x currently reports txpower from the generic PHY cached state,
which may not match the requested vif/link context.
Resolve the requested link channel and derive txpower from that channel
instead, with fallback to the current PHY chandef if no valid chanctx is
available.
Reported-by: Devin Wittmayer <lucid_duck@justthetip.ca>
Closes: https://lore.kernel.org/linux-wireless/20260130215839.53270-1-lucid_duck@justthetip.ca/
Tested-by: Devin Wittmayer <lucid_duck@justthetip.ca>
Tested-by: Satadru Pramanik <satadru@gmail.com>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Link: https://patch.msgid.link/20260401182322.64355-3-sean.wang@kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
wifi: mt76: mt7996: limit work in set_bitrate_mask
Calls to mt7996_set_bitrate_mask() would propagate work for all stations
on the ieee80211_hw regardless of the vif specified in the call. To
prevent unnecessary work in FW, limit setting the sta_rate to only the
specified vif in mt7996_sta_rate_ctrl_update().
Fixes: afff4325548f0 ("wifi: mt76: mt7996: Use proper link_id in link_sta_rc_update callback")
Signed-off-by: Dylan Eskew <dylan.eskew@candelatech.com>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260408145057.2356878-2-dylan.eskew@candelatech.com
Signed-off-by: Felix Fietkau <nbd@nbd.name>
wifi: mt76: mt7996: reduce phy work in set_coverage
mt7996_set_coverage_class() already iterates over each phy and calls
mt7996_mac_set_coverage_class() for it. Thus, the phy2 and phy3
configuration logic in mt7996_mac_set_coverage_class() can be dropped.
Sean Wang [Sat, 25 Apr 2026 16:09:30 +0000 (11:09 -0500)]
wifi: mt76: mt7921u: add MT7902 USB support
Add the 0e8d:7902 USB ID and select the MT7902 WM firmware. Use the
same USB queue mapping as mt7921/mt7925 so MT7902U can bind and probe
through the mt7921u driver.
Jiajia Liu [Tue, 2 Jun 2026 05:43:49 +0000 (13:43 +0800)]
wifi: mt76: transform aspm_conf for pci_disable_link_state
Commit b478e162f227 ("PCI/ASPM: Consolidate link state defines") changed
PCIE_LINK_STATE_L0S from 1 to (BIT(0) | BIT(1)). As a result,
PCI_EXP_LNKCTL_ASPM_L0S (1) and PCI_EXP_LNKCTL_ASPM_L1 (2) no longer match
PCIE_LINK_STATE_L0S (3) and PCIE_LINK_STATE_L1 (4).
On platforms that enable both ASPM L0s and L1, mt76_pci_disable_aspm() is
therefore unable to disable L1. Fix this by translating aspm_conf into PCIe
link state flags before calling pci_disable_link_state().
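The translation can be sketched with local constants that mirror the values quoted in the commit message. The `EX_*` macro names and `example_aspm_to_link_state()` are hypothetical; the real driver uses the kernel's PCI_EXP_LNKCTL_* and PCIE_LINK_STATE_* definitions.

```c
#include <stdint.h>

/* Link Control register ASPM bits, values as quoted above. */
#define EX_LNKCTL_ASPM_L0S 0x1u
#define EX_LNKCTL_ASPM_L1  0x2u

/* Link state flags after the consolidation, values as quoted above. */
#define EX_LINK_STATE_L0S  0x3u	/* BIT(0) | BIT(1) */
#define EX_LINK_STATE_L1   0x4u

/* Translate register bits into link state flags instead of passing them raw. */
static uint32_t example_aspm_to_link_state(uint16_t aspm_conf)
{
	uint32_t state = 0;

	if (aspm_conf & EX_LNKCTL_ASPM_L0S)
		state |= EX_LINK_STATE_L0S;
	if (aspm_conf & EX_LNKCTL_ASPM_L1)
		state |= EX_LINK_STATE_L1;
	return state;
}
```

Passing the raw register value 3 (L0s and L1 enabled) would be read as L0s only, because 3 has no overlap with the L1 flag value 4.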
Jiajia Liu [Thu, 28 May 2026 03:38:14 +0000 (11:38 +0800)]
wifi: mt76: add wcid publish check in mt76_sta_add
Since mt7925_mac_sta_add() publishes the wcid, add a publish check in
mt76_sta_add() to avoid reinitializing wcid->poll_list.
dev->sta_poll_list corruption was found when using mt7925 on 7.1-rc4.
According to the corruption report, prev->next was changed to point to itself.
wlan0: disconnect from AP 90:fb:5d:94:8b:e3 for new auth to 90:fb:5d:94:8b:e2
wlan0: authenticate with 90:fb:5d:94:8b:e2 (local address=84:9e:56:9c:7e:6b)
wlan0: send auth to 90:fb:5d:94:8b:e2 (try 1/3)
slab kmalloc-8k start ffff8c80958a6000 pointer offset 4160 size 8192
list_add corruption. prev->next should be next (ffff8c808a7488f8), but was ffff8c80958a7040. (prev=ffff8c80958a7040).
Lorenzo Bianconi [Fri, 22 May 2026 07:24:52 +0000 (09:24 +0200)]
wifi: mt76: mt7996: remove redundant pdev->bus check in probe
Drop the unnecessary pdev->bus NULL check in mt7996_pci_probe() since
the pointer is already dereferenced earlier in mt76_pci_disable_aspm(),
making the check dead code. Silences the related Smatch warning.
Lorenzo Bianconi [Sun, 31 May 2026 08:55:04 +0000 (10:55 +0200)]
wifi: mt76: mt7996: fix reading zeroed info->control.flags after mt76_tx_status_skb_add()
mt76_tx_status_skb_add() zeroes the mt76_tx_cb struct stored at
info->status.status_driver_data via memset(). Since info->control and
info->status are members of the same union in ieee80211_tx_info,
this overwrites info->control.flags.
In mt7996_tx_prepare_skb(), mt76_tx_status_skb_add() is called before
mt7996_mac_write_txwi(), which re-reads info->control.flags to extract
IEEE80211_TX_CTRL_MLO_LINK. Because the field has been zeroed, the
link_id always resolves to 0 for frames using global_wcid, leading to
incorrect TXWI configuration.
Fix this by passing link_id as an explicit parameter to
mt7996_mac_write_txwi(). In mt7996_tx_prepare_skb(), the link_id is
already extracted from info->control.flags before the destructive
mt76_tx_status_skb_add() call. For the beacon and inband discovery
callers in mcu.c, use link_conf->link_id directly.
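The aliasing problem and the fix can be modelled with a small union. This is a sketch under stated assumptions: `example_tx_info`, the 8-byte driver data area, and the encoding of the link ID in the top four bits of `flags` are all hypothetical and much simpler than struct ieee80211_tx_info.

```c
#include <stdint.h>
#include <string.h>

struct example_tx_info {
	union {
		struct { uint32_t flags; } control;
		struct { uint8_t driver_data[8]; } status;
	} u;
};

/* Hypothetical encoding: link ID lives in the top four bits of flags. */
static unsigned int example_link_id(uint32_t flags)
{
	return flags >> 28;
}

/* Models the status setup: zeroes storage shared with control.flags. */
static void example_status_add(struct example_tx_info *info)
{
	memset(&info->u.status, 0, sizeof(info->u.status));
}

/* Read the link ID first, then pass it on explicitly. */
static unsigned int example_prepare(struct example_tx_info *info)
{
	unsigned int link_id = example_link_id(info->u.control.flags);

	example_status_add(info);	/* control.flags is gone after this */
	return link_id;			/* would be handed to the TXWI writer */
}
```

Re-reading `u.control.flags` after `example_status_add()` yields 0, which is why the value has to be captured beforehand and passed as a parameter.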
Lorenzo Bianconi [Sun, 31 May 2026 09:38:57 +0000 (11:38 +0200)]
wifi: mt76: mt7996: Fix possible NULL pointer dereference in mt7996_mac_write_txwi_80211()
For injected frames (e.g. via radiotap), mac80211 can pass
info->control.vif = NULL, as explicitly noted in struct ieee80211_tx_info.
Check the vif pointer before calling ieee80211_vif_is_mld() in
mt7996_mac_write_txwi_80211() in order to avoid a possible NULL
pointer dereference.
Lorenzo Bianconi [Sun, 31 May 2026 09:10:59 +0000 (11:10 +0200)]
wifi: mt76: mt7996: Fix possible token leak in mt7996_tx_prepare_skb()
If the link_conf or link_sta lookup fails in mt7996_tx_prepare_skb(),
the mt7996 driver leaks an already allocated tx token. Fix the issue by
releasing the token in case of error.
wifi: mt76: mt7915: validate skb length in txpower SKU query
In mt7915_mcu_get_txpower_sku(), the response skb from
mt76_mcu_send_and_get_msg() is used in memcpy without validating
its length:
For TX_POWER_INFO_RATE:
    memcpy(res, skb->data + 4, sizeof(res));
where sizeof(res) is MT7915_SKU_RATE_NUM * 2 = 322 bytes.
For TX_POWER_INFO_PATH:
    memcpy(txpower, skb->data + 4, len);
In both cases, if the firmware returns a response shorter than
the expected size, the memcpy reads beyond the skb data buffer.
The data surfaces to userspace via debugfs (txpower_sku and
txpower_path).
Add length checks for both code paths before the memcpy.
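A minimal sketch of such a length check, written against a plain byte buffer rather than an skb. The 4-byte header offset matches the `skb->data + 4` in the message; `example_copy_payload()` and its -1 error return are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EX_HDR_LEN 4	/* bytes skipped before the payload */

/* Copy want bytes of payload only if the response really contains them. */
static int example_copy_payload(const uint8_t *data, size_t data_len,
				void *dst, size_t want)
{
	if (data_len < EX_HDR_LEN || data_len - EX_HDR_LEN < want)
		return -1;	/* short response: refuse instead of over-reading */

	memcpy(dst, data + EX_HDR_LEN, want);
	return 0;
}
```

The comparison is written as a subtraction on the buffer length so that it cannot overflow for large `want` values.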
where MT7925_EVT_RSP_LEN is 512. If the firmware returns a response
shorter than 520 bytes (8 + 512), this reads beyond the skb data
buffer. The over-read data is then returned to userspace via nla_put()
in mt7925_testmode_dump().
Add a length check before the memcpy to ensure the skb contains
sufficient data.
mt792x_tx() rewrites addr1/addr2/addr3 by treating skb->data as
an 802.11 header for MLD traffic.
That is only valid for native 802.11 frames. Direct 802.3 TX can also
reach this path with IEEE80211_TX_CTL_HW_80211_ENCAP set, where
skb->data is not an 802.11 header.
Skip the MLD header rewrite for HW-encap packets to avoid corrupting
802.3 frame contents.
Sean Wang [Sat, 25 Apr 2026 15:47:21 +0000 (10:47 -0500)]
wifi: mt76: mt7925: program BA state on active links
With MLO, traffic for one TID can be sent on any active link. Programming
BA state only on the default link leaves the other active links out of
sync.
Program BA state on all active links instead.
Fixes: 766ea2cf5a39 ("Revert "wifi: mt76: mt7925: Update mt7925_mcu_uni_[tx,rx]_ba for MLO"")
Tested-by: Yao Ting Hsieh <yao-ting.hsieh@mediatek.com>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Link: https://patch.msgid.link/20260425154721.738101-3-sean.wang@kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Sean Wang [Sat, 25 Apr 2026 15:47:19 +0000 (10:47 -0500)]
wifi: mt76: mt7925: keep TX BA state in the primary WCID
For MLO, the same TID can run over different links. Keeping TX BA state in
a link WCID makes the state depend on which link starts aggregation first.
Store it in the primary WCID instead, so the BA state stays stable across
links.
Fixes: 44eb173bdd4f ("wifi: mt76: mt7925: add link handling in mt7925_txwi_free")
Tested-by: Yao Ting Hsieh <yao-ting.hsieh@mediatek.com>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Link: https://patch.msgid.link/20260425154721.738101-1-sean.wang@kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>