Joanne Koong [Tue, 19 May 2026 05:28:07 +0000 (22:28 -0700)]
fuse: re-lock request before returning from fuse_ref_folio()
fuse_ref_folio() unlocks the request but does not re-lock it before
returning. fuse_chan_abort() can end the request and the async end
callback (e.g. fuse_writepage_free()) can free the args while the
subsequent copy chain logic after fuse_ref_folio() accesses them,
leading to use-after-free issues.
Fix this by locking the request in fuse_ref_folio() before returning.
Fixes: c3021629a0d8 ("fuse: support splice() reading from fuse device") Cc: stable@vger.kernel.org Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Joanne Koong [Tue, 19 May 2026 05:28:06 +0000 (22:28 -0700)]
fuse: re-lock request before replacing page cache folio
fuse_try_move_folio() unlocks the request on entry but does not
re-lock it on the success path. This means fuse_chan_abort() can end the
request and free the fuse_io_args (e.g. fuse_readpages_end()) while the
subsequent copy chain logic after fuse_try_move_folio() accesses the
fuse_io_args, leading to use-after-free issues.
Fix this by calling lock_request() before replace_page_cache_folio().
This ensures the request is locked on the success path which will
prevent the fuse_io_args from being freed while the later copying logic
runs, and also ensures that the ap->folios[i]->mapping is never null
since ap->folios[i] will always point to the newfolio after
replace_page_cache_folio().
Fixes: ce534fb05292 ("fuse: allow splice to move pages") Cc: stable@vger.kernel.org Reported-by: Lei Lu <llfamsec@gmail.com> Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Merge branch 'xfrm: XFRM_MSG_MIGRATE_STATE new netlink message'
Antony Antony says:
====================
The current XFRM_MSG_MIGRATE interface is tightly coupled to policy and
SA migration, and it lacks the information required to reliably migrate
individual SAs. This makes it unsuitable for IKEv2 deployments,
dual-stack setups (IPv4/IPv6), and scenarios where policies are managed
externally (e.g., by daemons other than the IKE daemon).
Mandatory SA selector list
The current API requires a non-empty SA selector list, which does not
reflect the IKEv2 use case.
A single Child SA may correspond to multiple policies,
and SA discovery already occurs via address and reqid matching. With
dual-stack Child SAs this leads to excessive churn: the current method
would have to be called up to six times (in/out/fwd × v4/v6) per SA,
while the new method only requires two calls.
Selectors lack SPI (and marks)
XFRM_MSG_MIGRATE cannot uniquely identify an SA when multiple SAs share
the same policies (per-CPU SAs, SELinux label-based SAs, etc.). Without
the SPI, the kernel may update the wrong SA instance.
Reqid cannot be changed
Some implementations allocate reqids based on traffic selectors. In
host-to-host or selector-changing scenarios, the reqid must change,
which the current API cannot express.
Because strongSwan and other implementations manage policies
independently of the kernel, an interface that updates only a specific
SA - with complete and unambiguous identification - is required.
The SA selector, x->sel, cannot be changed, which matters especially in
Transport mode.
XFRM_MSG_MIGRATE_STATE provides that interface. It supports migration
of a single SA identified via xfrm_usersa_id (including SPI), covering
encap removal (fixed in this patch set), reqid updates, address changes,
and other SA-specific parameters. It avoids the structural limitations
of XFRM_MSG_MIGRATE and provides a simpler, extensible mechanism for
precise per-SA migration without involving policies.
This method also allows migrating SA selectors typically used with
host-to-host in Transport mode.
New migration steps: first install a block policy, remove the old policy,
call XFRM_MSG_MIGRATE_STATE for each state, then re-install the
policies and remove the block policy.
If the target SA tuple (daddr, SPI, proto, family) is already
occupied, the operation returns -EEXIST. In this case the original
SA is not preserved. Userspace must handle -EEXIST by
re-establishing the SA at the IKE level and managing policies.
====================
David Laight [Mon, 8 Jun 2026 12:42:42 +0000 (13:42 +0100)]
fbdev: sm501fb: Fix buffer errors in OF binding code
The code that gets the frame buffer mode from OF has 'use after free',
'buffer overrun' and memory leaks.
info->edid_data isn't freed if the probe functions fail or if
pd->def_mode is set.
If both the CRT and PANEL are enabled info->edid_data is used after
being freed and is freed twice.
The string returned by of_get_property(np, "mode", &len) is just
written over either the static "640x480-16@60" or the module parameter
string without any regard for the length (which is most likely longer).
Use kstrdup() for the OF mode and free everything before freeing 'info'.
Fixes: 4295f9bf74a88 ("video, sm501: add OF binding to support SM501") Signed-off-by: David Laight <david.laight.linux@gmail.com> Signed-off-by: Helge Deller <deller@gmx.de>
Helge Deller [Tue, 9 Jun 2026 07:22:09 +0000 (09:22 +0200)]
fbdev/arm: Export acorndata_8x8 font symbol for bootloader
The text display code used in the Risc PC kernel image decompression
code uses arch/arm/boot/compressed/font.c, which includes
lib/fonts/font_acorn_8x8.c, which further includes <linux/font.h>.
Since commit 97df8960240a ("lib/fonts: Provide helpers for calculating
glyph pitch and size") <linux/font.h> contains inline functions that
require __do_div64, which is not linked into the ARM kernel
decompressor. This makes Risc PC zImages fail to build.
Resolve this issue by defining the BOOTLOADER symbol and using it to avoid
a static declaration of the acorndata_8x8 symbol. That way it can be
referenced by the ARM bootloader, and other static math functions and
symbols (like __do_div64) stay static and don't get unnecessarily included
in the ARM kernel bootloader decompressor object file.
Fixes: 97df8960240a ("lib/fonts: Provide helpers for calculating glyph pitch and size") Reported-by: Ethan Nelson-Moore <enelsonmoore@gmail.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Cc: linux-arm-kernel@lists.infradead.org Cc: Russell King <linux@armlinux.org.uk> Signed-off-by: Helge Deller <deller@gmx.de>
fbdev: Wrap fbcon updates from vga-switcheroo in helper
Handle console remapping in fbcon in fb_switch_output(). Vga-switcheroo
invokes this functionality before switching physical outputs to a new
graphics device. Open-coding fbcon state in vga-switcheroo exposed fbdev
implementation details.
Vga-switcheroo is used for switching physical outputs among graphics
hardware. This functionality is only supported by DRM drivers. A later
update will further move fb_switch_output() into DRM's fbdev emulation;
thus fully decoupling vga-switcheroo from fbdev.
v3:
- remove Kconfig dependency related to fbcon (Geert)
v2:
- use '#if defined' (Helge)
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Helge Deller <deller@gmx.de>
fbdev: Wrap user-invoked calls to fb_blank() in helper
Handle fbcon during blanking in fb_blank_from_user(). First blank the
hardware, then blank fbcon. Same for unblanking. Update all callers and
resolve the duplicated logic.
With the new helper, fbdev's sysfb code no longer maintains fbcon state
by itself.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Helge Deller <deller@gmx.de>
fbdev: Wrap user-invoked calls to fb_set_var() in helper
Handle fbcon during display updates in fb_set_var_from_user(). Check
with fbcon if the mode change is possible, update hardware state and
finally update fbcon. Update all callers.
Only the FBIOPUT_VSCREENINFO ioctl currently does all steps. Other
mode-change callers in sysfs and driver code are missing fbcon-related
steps.
With the new helper, ps3fb and sh_mobile_lcdcfb no longer maintain
fbcon state themselves.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Helge Deller <deller@gmx.de>
Hongling Zeng [Tue, 2 Jun 2026 08:54:21 +0000 (16:54 +0800)]
fbdev: omap2: fix use-after-free in omapfb_mmap
omapfb_mmap() has a race condition with OMAPFB_SETUP_PLANE ioctl that
can lead to use-after-free:
The fb_mmap() entry point holds mm_lock but not lock (fb_info->lock),
while ioctl handlers like OMAPFB_SETUP_PLANE hold lock but not mm_lock.
This allows concurrent execution.
In omapfb_mmap():
1. rg = omapfb_get_mem_region(ofbi->region); // Get old region ref
2. start = omapfb_get_region_paddr(ofbi); // Read from NEW region
3. len = fix->smem_len; // Read from NEW region
4. vm_iomap_memory(vma, start, len); // Map NEW region memory
5. atomic_inc(&rg->map_count); // Increment OLD region!
Concurrently, OMAPFB_SETUP_PLANE can:
- Reassign ofbi->region = new_rg
- Update fix->smem_len
- OMAPFB_SETUP_MEM then checks NEW region's map_count (0!) and frees it
This leaves userspace with a mapping to freed physical memory.
The fix is to read all required values (start, len) from the same
region reference (rg) that will have its map_count incremented,
preventing the region from being freed while still mapped.
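The idea behind the fix can be sketched in user-space C. This is a simplified model, not the driver code: struct region, struct fb, struct mapping and mmap_snapshot() are hypothetical names, and all locking is left out. It only shows that the start address, the length and the incremented map_count are taken from one and the same region reference.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an omapfb memory region. */
struct region {
	uint64_t paddr;		/* physical start address */
	size_t size;		/* length in bytes */
	int map_count;		/* number of live user mappings */
};

/* Hypothetical stand-in for the per-framebuffer state. */
struct fb {
	struct region *region;	/* may be reassigned by a concurrent ioctl */
};

struct mapping {
	uint64_t start;
	size_t len;
	struct region *owner;	/* region whose map_count was incremented */
};

/*
 * Read the region pointer once and derive every value from that snapshot,
 * so the mapped memory and the incremented counter belong to one region.
 */
static struct mapping mmap_snapshot(struct fb *fb)
{
	struct region *rg = fb->region;
	struct mapping m = {
		.start = rg->paddr,
		.len = rg->size,
		.owner = rg,
	};

	rg->map_count++;
	return m;
}
```

Even if fb->region is reassigned right after the snapshot, the mapping still describes the region whose map_count was raised.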
AlbertoArostegui [Wed, 27 May 2026 15:39:13 +0000 (15:39 +0000)]
fbdev: pxa168fb: use devm_ioremap_resource() for MMIO
pxa168fb maps the LCD controller register resource with devm_ioremap(),
which does not request the memory region. Use devm_ioremap_resource()
instead so the MMIO range is claimed before being mapped.
Eduardo Silva [Mon, 1 Jun 2026 19:46:44 +0000 (21:46 +0200)]
fbdev: grvga: Fix CLUT register address offset in comment
The comment does not match the actual address offset. According
to the GRLIB IP Library Reference Manual (p. 2119), the CLUT register
is at offset 0x28, not the value stated in the comment.
Lu Yao [Thu, 30 Apr 2026 06:01:37 +0000 (14:01 +0800)]
fbcon: don't suspend/resume when vc is graphics mode
There is no need to suspend/resume fbcon when the vc is in graphics mode.
Doing so may cause errors, e.g.:
Xorg is started with a single screen, and an external screen is then
plugged in. After logging out in Xorg, the fbdev info may use the screen
that was connected later, because the info always uses the first
connected connector in the list in drm_setup_crtcs_fb().
When S3 is then executed, fbcon finds that the information does not match
and performs an atomic update to switch the fb. However, Xorg does not
re-bind the crtc fb and continues issuing ioctls. At this point, the fb
is incorrect.
With some modifications by Helge Deller.
Signed-off-by: Lu Yao <yaolu@kylinos.cn> Signed-off-by: Helge Deller <deller@gmx.de>
Li RongQing [Fri, 15 May 2026 01:02:02 +0000 (21:02 -0400)]
fbdev: sm712: Fix operator precedence in big_swap macro
The big_swap(p) macro was intended to swap bytes within 16-bit halves
of a 32-bit value. However, because the bitwise shift operators (<<, >>)
have higher precedence than the bitwise AND operator (&), the original
code failed to perform any shifting on the masked bits.
For example, 'p & 0xff00ff00 >> 8' was evaluated as 'p &
(0xff00ff00 >> 8)', effectively neutralizing the intended swap.
Fix this by adding parentheses to ensure the bitwise AND is performed
before the shift, correctly implementing the byte swap logic.
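The precedence problem can be reproduced in plain C. The sketch below is an illustration, not the driver source: big_swap_buggy() and big_swap_fixed() are hypothetical function names, and the buggy variant is written with explicit parentheses that mirror how the compiler parses the unparenthesized expression.

```c
#include <stdint.h>

/*
 * How "p & 0xff00ff00 >> 8 | p & 0x00ff00ff << 8" is parsed: the shifts
 * bind tighter than '&', so the masks are shifted instead of the data.
 * The two shifted masks together cover every bit, so p comes back unchanged.
 */
static uint32_t big_swap_buggy(uint32_t p)
{
	return (p & (0xff00ff00u >> 8)) | (p & (0x00ff00ffu << 8));
}

/* Mask first, then shift: swaps the two bytes inside each 16-bit half. */
static uint32_t big_swap_fixed(uint32_t p)
{
	return ((p & 0xff00ff00u) >> 8) | ((p & 0x00ff00ffu) << 8);
}
```

For the input 0x11223344 the buggy form returns 0x11223344, while the fixed form returns 0x22114433.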
Fixes: 1461d66728648 ("staging: sm7xxfb: merge sm712fb with fbdev") Cc: stable@vger.kernel.org Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Helge Deller <deller@gmx.de>
esp: fix page frag reference leak on skb_to_sgvec failure
In esp_output_tail(), when esp->inplace is false, the old skb page frags
are replaced with a new page from the xfrm page_frag cache. The source
scatterlist (sg) is built from the old frags before the replacement, and
esp_ssg_unref() is responsible for releasing the old page references
after the crypto operation completes.
However, if the second skb_to_sgvec() call (which builds the destination
scatterlist from the new page) fails, the code jumps to error_free which
only calls kfree(tmp). The old page frag references captured in the
source scatterlist are never released:
1. sg[] is built from old frags via skb_to_sgvec() (no extra get_page)
2. nr_frags is set to 1 and frag[0] is replaced with the new page
3. Second skb_to_sgvec() fails -> goto error_free
Fix this by adding a bool parameter to esp_ssg_unref() that, when true,
unconditionally unrefs the source scatterlist frags. Since req->src is
not yet initialized by aead_request_set_crypt() at the point of the
error, the source scatterlist is obtained directly via esp_req_sg().
Existing callers pass false to preserve the original behavior.
The same issue exists in both esp4 and esp6 as the code is identical.
Wen Gong [Thu, 4 Jun 2026 09:58:31 +0000 (15:28 +0530)]
wifi: ath12k: enable IEEE80211_VHT_EXT_NSS_BW_CAPABLE when NSS ratio is reported
When firmware reports NSS ratio support, SUPPORTS_VHT_EXT_NSS_BW is enabled in
ath12k. However, IEEE80211_VHT_EXT_NSS_BW_CAPABLE must also be set to make the
advertisement valid.
According to IEEE Std 802.11-2024, Subclause 9.4.2.156.3 (Supported VHT-MCS and
NSS Set subfields), the VHT Extended NSS BW Capable bit indicates whether a STA
is capable of interpreting the Extended NSS BW Support subfield of the VHT
capabilities information field. Advertising extended NSS BW support without
setting this capability bit is therefore invalid.
Without this change, mac80211 detects the inconsistency and logs:
ieee80211 phy0: copying sband (band 1) due to VHT EXT NSS BW flag
This indicates that mac80211 implicitly aligns IEEE80211_VHT_EXT_NSS_BW_CAPABLE
during ieee80211_register_hw(). Explicitly setting the bit in ath12k avoids this
fixup and ensures capabilities are advertised correctly by the driver.
This change follows the same approach as the existing ath11k fix.
https://lore.kernel.org/all/20211013073704.15888-1-wgong@codeaurora.org/
Baochen Qiang [Tue, 9 Jun 2026 02:10:47 +0000 (10:10 +0800)]
wifi: ath12k: fix EAPOL TX failure caused by stale tcl_metadata bits
On WCN7850, after the following sequence:
1. load ath12k and connect to a non-MLO AP
2. disconnect and connect to an MLO AP
3. disconnect and reconnect to the non-MLO AP
the third connection always fails with a 4-Way handshake timeout. The
supplicant transmits message 2 of 4 four times in response to AP
retries of message 1, but the AP never sees any of them.
ath12k_dp_vdev_tx_attach() composes dp_link_vif->tcl_metadata using |=,
but dp_link_vif is embedded in struct ath12k_dp_vif and its slots are
reused across vif/peer teardown and setup. Since tcl_metadata is never
cleared on detach, vdev_id bits from a previous attach remain set when
the same link slot is reused with a different vdev_id. In this specific
issue, the same link slot is used for vdev_id 0, then vdev_id 1, then
vdev_id 0 again; the OR yields tcl_metadata == 0x9, which encodes
vdev_id 1 in the HTT_TCL_META_DATA_VDEV_ID field even though
ti.vdev_id is 0. Firmware then routes the EAPOL frame to the wrong
vdev and the AP never receives message 2.
Use plain assignment instead of |= so the field is fully recomputed
from the current arvif on every attach.
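The stale-bit effect is easy to reproduce outside the driver. In the sketch below the bit layout (a type value of 1 in bits 1:0, the vdev_id starting at bit 3) is an assumption chosen because it reproduces the 0x9 value quoted above. The macro and function names are hypothetical and are not the ath12k definitions.

```c
#include <stdint.h>

/* Assumed layout, for illustration only. */
#define META_TYPE_MASK	0x3u	/* bits 1:0 */
#define META_VDEV_SHIFT	3	/* vdev_id starts at bit 3 */
#define META_VDEV_MASK	0xffu

static uint32_t compose(uint32_t type, uint32_t vdev_id)
{
	return (type & META_TYPE_MASK) |
	       ((vdev_id & META_VDEV_MASK) << META_VDEV_SHIFT);
}

/* Buggy pattern: OR into a slot that is never cleared on detach. */
static void attach_or(uint32_t *slot, uint32_t vdev_id)
{
	*slot |= compose(1, vdev_id);
}

/* Fixed pattern: recompute the whole field on every attach. */
static void attach_assign(uint32_t *slot, uint32_t vdev_id)
{
	*slot = compose(1, vdev_id);
}

static uint32_t meta_vdev_id(uint32_t meta)
{
	return (meta >> META_VDEV_SHIFT) & META_VDEV_MASK;
}
```

Attaching vdev_id 0, 1, 0 through attach_or() leaves the slot at 0x9 with a decoded vdev_id of 1. The same sequence through attach_assign() ends at 0x1 with vdev_id 0.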
Michal Piekos [Sat, 16 May 2026 05:34:16 +0000 (07:34 +0200)]
arm64: dts: allwinner: a523: add gpadc node
Describe GPADC block on Allwinner A523.
Tested on Radxa Cubie A5E:
- 2 connected channels are showing voltages in agreement with
schematics.
BOOT-SEL-ADC ~500mV
BOM-ADC ~1800mV
- 3rd channel exposed on 40pin header is showing correct voltages when
connected to known voltage source.
MIPI CSI-2 is supported on the A83T with a dedicated controller that
covers both the protocol and D-PHY. It is connected to the only CSI
receiver with a fwnode graph link. Note that the CSI receiver supports
both this MIPI CSI-2 source and a parallel source.
An empty port with a label for the MIPI CSI-2 sensor input is also
defined for convenience.
Arnd Bergmann [Tue, 9 Jun 2026 13:27:33 +0000 (15:27 +0200)]
Merge tag 'tegra-for-7.2-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into soc/drivers
soc/tegra: Changes for v7.2-rc1
These changes update some maintainer contact information, add a modern
way of reading the chip information and clean up and enhance some
existing code.
* tag 'tegra-for-7.2-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
soc/tegra: Use ARM SMCCC to get chip ID, revision, and platform info
soc/tegra: fuse: Register nvmem lookups at probe
Documentation: ABI: Take over as contact for sysfs-driver-tegra-fuse
MAINTAINERS: Move Peter De Schrijver to CREDITS
bus: tegra-aconnect: Use dev_err_probe for probe error paths
====================
net: mctp: usb: minor fixes for MCTP over USB transport driver
This series adds a couple of fixes in the ndo_open / ndo_stop path for
the MCTP over USB transport, where we are incorrectly sequencing two
error cases.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
====================
Jeremy Kerr [Mon, 8 Jun 2026 01:25:41 +0000 (09:25 +0800)]
net: mctp: usb: don't fail mctp_usb_rx_queue on a deferred submission
In the ndo_open path, a deferred queue open will report a failure, and
so the netdev will not be ndo_stop()ed, leaving us with the rx_retry
work potentially pending.
Don't report a deferred queue as an error, as we are still operational.
This means we use the ndo_stop() path for future cleanup, which handles
rx_retry_work cancellation.
That urb completion can then re-schedule rx_retry_work.
Strengthen the sequencing between the stop (preventing another requeue)
and the cancel by updating both atomically under a new rx lock. After
setting ->rx_stopped, and cancelling pending work, we know that the
requeue cannot occur, so all that's left is killing any pending urb.
Minxi Hou [Thu, 4 Jun 2026 16:30:16 +0000 (00:30 +0800)]
selftests/net/openvswitch: guard command substitutions against empty output
When ip-link output is unavailable, when the upcall daemon log has not
been written yet, or when pahole does not know the OVS drop subsystem
ID, the affected command substitutions silently produce empty strings.
The caller then passes empty sha= or pid= arguments to ovs_add_flow,
or matches against wrong drop reason codes, all without a diagnostic.
Add [ -z ] guards immediately after each assignment. For test_arp_ping,
also align the MAC extraction to use awk '/link\/ether/' as in
test_pop_vlan. The drop_reason guard returns ksft_skip because an
absent subsystem ID is an environment issue, not a test failure.
91b9aed7381c ("ARM: dts: aspeed-g6: Add nodes for i3c controllers") currently
causes a new warning:
... /ahb/apb/bus@1e7a0000/syscon@0: failed to match any schema with compatible: ['aspeed,ast2600-i3c-global', 'syscon']
The patch necessary to address it has an R-b tag from Krzysztof[2] but as best
I can tell is yet to be applied to the MFD tree. I've left the change in for now
as the warning will resolve once the binding patch is applied.
Lastly, as part of improving support for the Kommando card Anirudh has also
addressed[1] the persistent pain we've had with the phy-mode property for the
AST2600 MACs. Thanks to Andrew Lunn for being on the case for a while now, and
for those who tested Anirudh's patch.
Merge tag 'riscv-sophgo-dt-for-v7.2' of https://github.com/sophgo/linux into soc/dt
RISC-V Devicetrees for v7.2
Sophgo:
For CV18xx series:
- Add bindings for Milk-V "Duo S" board.
For SG2042:
- The CPU unit address incorrectly used decimal numbers,
especially for those nodes whose value is >= 10. Now
corrected to use hexadecimal.
- The MSI controller actually only supports 16 interrupts;
corrected to match the actual situation.
- PCIe RCs are cache-coherent with the CPU. Mark the RC
nodes accordingly.
For SG2044:
- The same as SG2042, use hex for CPU unit address.
In addition, update Chen Wang's email address for the Sophgo
SoC maintainer.
Signed-off-by: Chen Wang <unicorn_wang@outlook.com>
* tag 'riscv-sophgo-dt-for-v7.2' of https://github.com/sophgo/linux:
riscv: dts: sophgo: reduce SG2042 MSI count to 16
riscv: dts: sophgo: sg2042: use hex for CPU unit address
riscv: dts: sophgo: sg2044: use hex for CPU unit address
riscv: dts: sophgo: Add dma-coherent to SG2042 PCIe controllers
dt-bindings: soc: sophgo: add sg2000 plic and clint documentation
dt-bindings: soc: sophgo: add Milk-V Duo S board compatibles
MAINTAINERS: update Chen Wang's email address
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Merge tag 'aspeed-7.2-drivers-0' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bmc/linux into soc/drivers
aspeed: First batch of driver changes for v7.2
While bc13f14f5cd3 ("soc: aspeed: cleanup dead default for
ASPEED_SOCINFO") was committed just now it has been in -next for a while as b333a0f1c857411d83a02aa6f1d9ecc7666d6179. The commit is fresh as I moved it
between branches.
Other than that it's just the one other patch from Krzysztof tidying up the
location of MODULE_DEVICE_TABLE().
* tag 'aspeed-7.2-drivers-0' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bmc/linux:
soc: aspeed: cleanup dead default for ASPEED_SOCINFO
soc: aspeed: Move MODULE_DEVICE_TABLE next to the table itself
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Merge tag 'amlogic-arm64-dt-for-v7.2-v2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/amlogic/linux into soc/dt
Amlogic ARM64 DT for v7.2 v2 take 1:
- Khadas VIM4 (T7 SoC) features:
- Memory layout fixup
- GIC register range
- Model name fixup
- PWM, eMMC, SD card and SDIO support
- PWM LED
- I2C pinctrl node
- Khadas VIM1s features:
- Bluetooth
- PWM LED
- Power Key
- Function Key via SARADC
- RTC
- Remote control keymap
- Bluetooth node for Phicomm N1
Merge tag 'thead-dt-for-v7.2' of https://git.kernel.org/pub/scm/linux/kernel/git/fustini/linux into soc/dt
T-HEAD Devicetrees for 7.2
Enable wifi on two TH1520 boards: BeagleV Ahead and Lichee Pi 4a.
The BeagleV Ahead board uses an AP6203BM WiFi module connected to SDIO1.
The Lichee Pi 4A has an RTL8723DS WiFi module also connected to SDIO1.
The module reset line is driven through a PCA9557 GPIO expander on the
I2C1 bus.
* tag 'thead-dt-for-v7.2' of https://git.kernel.org/pub/scm/linux/kernel/git/fustini/linux:
riscv: dts: thead: Enable wifi on the BeagleV-Ahead
riscv: dts: thead: Enable WiFi on Lichee Pi 4A
riscv: dts: thead: Add TH1520 I2C1 controller
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Merge tag 'tenstorrent-dt-for-v7.2' of https://git.kernel.org/pub/scm/linux/kernel/git/tenstorrent/linux into soc/dt
Tenstorrent device tree for v7.2
Add a riscv,pmu node to the Tenstorrent Blackhole SoC device tree. This
enables OpenSBI to expose the SBI PMU extension, allowing Linux perf to
use the 4 programmable counters (mhpmcounter3-6) across 3 event classes:
instruction commit, microarchitectural, and memory system events.
Extend the RISC-V IOMMU device tree bindings to document the Tenstorrent
IOMMU used in the Tenstorrent Atlantis SoC. A second register range is
added which contains M-mode only registers like PMAs and PMPs. The
binding will be used by OpenSBI and potentially other M-mode software.
* tag 'tenstorrent-dt-for-v7.2' of https://git.kernel.org/pub/scm/linux/kernel/git/tenstorrent/linux:
dt-bindings: iommu: riscv: Add bindings for Tenstorrent RISC-V IOMMU
riscv: dts: tenstorrent: Add PMU node to blackhole for Linux perf support
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Merge tag 'sunxi-drivers-for-7.2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into soc/drivers
Allwinner driver changes for 7.2
Mostly changes to the SRAM driver to allow for one SRAM region to be
"claimed" by multiple changes. When a region is "claimed" it is removed
or disconnected from the CPU's view. This is needed on the H6 and H616,
which have one alias region seemingly shared between the video codec
engine and the display engine.
One minor fix for the RSB driver is also included.
* tag 'sunxi-drivers-for-7.2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
bus: sunxi-rsb: Always check register address validity
soc: sunxi: sram: Add H616 SRAM regions
soc: sunxi: sram: Support claiming multiple regions per device
soc: sunxi: sram: Allow SRAM to be claimed multiple times
soc: sunxi: sram: Const-ify sunxi_sram_func data and references
dt-bindings: sram: sunxi-sram: Add H616 SRAM regions
dt-bindings: sram: Document Allwinner H616 VE SRAM
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
gpio: mt7621: fix interrupt banks mapping on gpio chips
The GPIO controller's registers are organized as sets of eight 32-bit
registers with each set controlling a bank of up to 32 pins. A single
interrupt is shared for all of the banks handled by the controller.
The driver implements this using three gpio chip instances, each one
with its own irq chip. Every single pin can generate interrupts, for
a total of 96 possible interrupts. It looks like there is a problem
with interrupts being properly mapped to the gpio bank using this solution.
The problem report is at the lore link [0].
Device tree is using two cells for this, so only the interrupt pin and the
interrupt type are described there. Changing to three cells to also set up
the bank and implementing 'of_node_instance_match()' would also work, but
this would be an ABI breakage and also a bit incoherent, since the gpios
themselves also use two cells and are properly mapped to the desired bank
through their pin number in 'of_xlate()'.
That said, register a linear IRQ domain of the total of 96 interrupts shared
by the three gpio chip instances so the bank and the interrupt are properly
decoded and devices using gpio IRQs work properly.
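With one linear domain covering all 96 interrupts, a single number is enough to recover both the bank and the pin within it. The sketch below assumes the simplest linear layout (index = bank * 32 + pin), which the text implies but does not spell out. The names decode_hwirq(), struct bank_pin and the two constants are hypothetical.

```c
#define NUM_BANKS	3	/* three gpio chip instances, per the text */
#define PINS_PER_BANK	32	/* up to 32 pins per bank, per the text */

struct bank_pin {
	unsigned int bank;
	unsigned int pin;
};

/*
 * Split a linear interrupt index (0..95) into bank and pin.
 * Returns 0 on success, -1 if the index is outside the domain.
 */
static int decode_hwirq(unsigned int hwirq, struct bank_pin *out)
{
	if (hwirq >= NUM_BANKS * PINS_PER_BANK)
		return -1;

	out->bank = hwirq / PINS_PER_BANK;
	out->pin = hwirq % PINS_PER_BANK;
	return 0;
}
```

Under this assumed layout, index 37 decodes to bank 1, pin 5, and index 96 is rejected.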
Marco Scardovi [Sun, 7 Jun 2026 23:05:02 +0000 (01:05 +0200)]
gpio: rockchip: fix generic IRQ chip leak on remove
The driver allocates domain generic chips using
irq_alloc_domain_generic_chips() during probe. However, on driver
remove/teardown, the generic chips are not automatically freed when the
IRQ domain is removed because the domain flags do not include
IRQ_DOMAIN_FLAG_DESTROY_GC.
This causes both the domain generic chips structure and the associated
generic chips to be leaked. Additionally, the generic chips remain on
the global gc_list and may later be visited by generic IRQ chip suspend,
resume, or shutdown callbacks after the GPIO bank has been removed,
potentially resulting in a use-after-free and kernel crash.
Fix the resource leak by explicitly calling
irq_domain_remove_generic_chips() before removing the IRQ domain in
rockchip_gpio_remove().
Fixes: 936ee2675eee ("gpio/rockchip: add driver for rockchip gpio") Assisted-by: Antigravity:gemini-3.5-flash Signed-off-by: Marco Scardovi <scardracs@disroot.org> Link: https://patch.msgid.link/20260607230504.35392-2-scardracs@disroot.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
gpio-mockup validates only that each second gpio_mockup_ranges value is
non-negative before creating the mock chips. The fixed-base form uses
the second value as the first GPIO number after the range, while the
dynamic-base form uses it as the number of GPIOs.
gpio_mockup_register_chip() stores the resulting number of GPIOs in a
u16 and passes it through a PROPERTY_ENTRY_U16("nr-gpios", ...). Values
greater than U16_MAX therefore truncate silently. For example,
gpio_mockup_ranges=-1,65537 creates a one-line mock GPIO chip instead of
rejecting the invalid request.
Reject zero-width, reversed, and over-U16 ranges before registering any
mock chip.
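The truncation and the check that prevents it can be modelled in user-space C. This is a sketch under stated assumptions, not the driver patch: range_ngpio() and truncated_ngpio() are hypothetical helpers, and a negative base is taken to select the dynamic-base form.

```c
#include <stdint.h>

/* What storing the count in a u16 does without validation. */
static uint16_t truncated_ngpio(int ngpio)
{
	return (uint16_t)ngpio;
}

/*
 * Compute the number of GPIOs for one (base, second) pair.
 * Dynamic base (base < 0): second is the number of GPIOs.
 * Fixed base: second is the first GPIO number after the range.
 * Returns the count, or -1 for zero-width, reversed or over-U16 ranges.
 */
static int range_ngpio(int base, int second)
{
	long long ngpio = base < 0 ? (long long)second
				   : (long long)second - base;

	if (ngpio <= 0 || ngpio > UINT16_MAX)
		return -1;

	return (int)ngpio;
}
```

Here truncated_ngpio(65537) yields 1, matching the one-line chip described above, while range_ngpio(-1, 65537) rejects the request.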
====================
ipv4: igmp: annotate diagnostic procfs data races
This patch series addresses several unannotated data races between lockless
RCU-protected diagnostic reads in /proc/net/igmp (igmp_mc_seq_show())
and concurrent writes in serialized paths (RTNL and group spinlocks).
Following the precedent in commit 061c0aa740d5 ("ipv4: igmp: annotate
data-races around im->users"), we annotate these intentional data races
using READ_ONCE() and WRITE_ONCE() macros.
- Patch 1 annotates races around `in_dev->mc_count` (interface-level joins).
- Patch 2 annotates races around active timer-related state tracking fields
(`tm_running`, `reporter`, `expires`) on individual multicast groups.
====================
Yuyang Huang [Fri, 5 Jun 2026 01:43:18 +0000 (10:43 +0900)]
ipv4: igmp: annotate data-races around timer-related fields
/proc/net/igmp walks the multicast list locklessly under RCU and reads
timer-related fields (im->tm_running, im->reporter, im->timer.expires)
to print the timer state of multicast memberships. Concurrently, these
fields are modified under im->lock spinlock in timer management paths
(igmp_stop_timer(), igmp_start_timer(), and igmp_timer_expire()). Fix this
intentional lockless snapshot by annotating the lockless reads with
READ_ONCE() and the updates with WRITE_ONCE().
Yuyang Huang [Fri, 5 Jun 2026 01:43:17 +0000 (10:43 +0900)]
ipv4: igmp: annotate data-races around in_dev->mc_count
/proc/net/igmp walks the multicast list for IPv4 interfaces locklessly
under RCU and prints state->in_dev->mc_count. Concurrently, device
init/destruction and multicast join/leave paths update the count
under the RTNL lock. Fix this intentional lockless snapshot by
annotating the read with READ_ONCE() and the updates with WRITE_ONCE().
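The annotation pattern used by both patches can be approximated in user-space C. The helpers below are simplified stand-ins for the kernel's READ_ONCE() and WRITE_ONCE() macros, restricted to int. The struct and function names are hypothetical, and the writer-side lock is only indicated by comments.

```c
/* Simplified user-space stand-ins for READ_ONCE()/WRITE_ONCE() on int. */
static inline int read_once_int(const int *p)
{
	return *(const volatile int *)p;
}

static inline void write_once_int(int *p, int v)
{
	*(volatile int *)p = v;
}

/* Hypothetical model of the per-interface multicast state. */
struct in_dev_model {
	int mc_count;
};

/* Writer side: callers are assumed to hold the serializing lock. */
static void mc_join(struct in_dev_model *d)
{
	write_once_int(&d->mc_count, d->mc_count + 1);
}

static void mc_leave(struct in_dev_model *d)
{
	write_once_int(&d->mc_count, d->mc_count - 1);
}

/* Lockless reader side, as a diagnostic show function would do. */
static int mc_count_show(const struct in_dev_model *d)
{
	return read_once_int(&d->mc_count);
}
```

The writer may read the field plainly because it is the only updater under the lock; only the store and the lockless load carry the annotation.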
Ruoyu Wang [Tue, 9 Jun 2026 07:33:13 +0000 (15:33 +0800)]
gpio: zynq: fix runtime PM leak on remove
pm_runtime_get_sync() increments the runtime PM usage counter even when it
returns an error. zynq_gpio_remove() uses it to keep the controller active
while removing the GPIO chip, but never drops the usage counter again.
Balance the get with pm_runtime_put_noidle() after disabling runtime PM.
Fixes: 3242ba117e9b ("gpio: Add driver for Zynq GPIO controller") Signed-off-by: Ruoyu Wang <ruoyuw560@gmail.com> Link: https://patch.msgid.link/20260609073313.5-1-ruoyuw560@gmail.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Anton Leontev [Thu, 4 Jun 2026 16:59:38 +0000 (19:59 +0300)]
hv_netvsc: use kmap_local_page in netvsc_copy_to_send_buf
netvsc_copy_to_send_buf() copies page buffer entries into the VMBus
send buffer using phys_to_virt() on the entry PFN. Entries for the
RNDIS header and the skb linear data come from kmalloc'd memory and
are always in the kernel direct map, but entries for skb fragments
reference page cache or user pages, which on 32-bit x86 with
CONFIG_HIGHMEM=y can live above the LOWMEM boundary. For such a page
phys_to_virt() returns an address outside the direct map and the
subsequent memcpy() faults on the transmit softirq path, which is
fatal.
Map the pages with kmap_local_page() instead, handling two properties
of the page buffer entries:
- pb[i].pfn is a Hyper-V PFN at HV_HYP_PAGE_SIZE (4K) granularity,
not a native PFN. Reconstruct the physical address first and derive
the native page from it, so the mapping stays correct where
PAGE_SIZE > HV_HYP_PAGE_SIZE (e.g. arm64 with 64K pages).
- Since commit 41a6328b2c55 ("hv_netvsc: Preserve contiguous PFN
grouping in the page buffer array"), an entry describes a full
physically contiguous fragment and pb[i].len can exceed PAGE_SIZE,
while kmap_local_page() maps a single page. Copy page by page,
splitting at native page boundaries.
The copy path only handles packets smaller than the send section size
(6144 bytes by default); larger packets take the cp_partial path where
only the RNDIS header is copied. So entries here are bounded by the
section size and a copy is split at most once on 4K-page systems. On
!CONFIG_HIGHMEM configs kmap_local_page() folds to page_address() and
no mapping work is added.
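The page-by-page splitting can be sketched in user-space C. In this model a flat byte array stands in for physical memory and a plain memcpy() stands in for the map, copy and unmap sequence. copy_entry_by_page() and its parameters are hypothetical, and the shift of 12 reflects the 4K Hyper-V page granularity mentioned above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HV_PFN_SHIFT	12	/* Hyper-V PFNs are in 4K units */

/*
 * Copy one page buffer entry, splitting at native page boundaries.
 * mem models physical memory; page_size is the native page size.
 * Returns the number of chunks that were copied.
 */
static size_t copy_entry_by_page(uint8_t *dst, const uint8_t *mem,
				 uint64_t hv_pfn, uint32_t offset,
				 size_t len, size_t page_size)
{
	/* Reconstruct the physical address from the 4K-granular PFN. */
	uint64_t phys = (hv_pfn << HV_PFN_SHIFT) + offset;
	size_t chunks = 0;

	while (len) {
		size_t page_off = (size_t)(phys % page_size);
		size_t chunk = page_size - page_off;

		if (chunk > len)
			chunk = len;

		/* Stand-in for mapping the page, copying and unmapping. */
		memcpy(dst, mem + phys, chunk);

		dst += chunk;
		phys += chunk;
		len -= chunk;
		chunks++;
	}

	return chunks;
}
```

An entry that starts 0x100 bytes before a 4K boundary and is 0x300 bytes long is copied in two chunks with 4K native pages, and in one chunk with 64K native pages.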
Convert the device-tree parsing path to the generic fwnode/device
property accessors so the driver can be probed on ACPI and swnode
platforms as well as OF. The helper is renamed from
i2c_mux_reg_probe_dt() to i2c_mux_reg_probe_fw() to reflect that.
The child-node branch uses is_acpi_device_node() rather than
is_acpi_node(): the latter also matches ACPI data nodes (the
_DSD hierarchical-property children used by PRP0001-style
firmware), which have no ACPI handle and would make
acpi_get_local_address() fall back to evaluating _ADR against the
root namespace and return -ENODATA. Routing data nodes through
fwnode_property_read_u32() instead lets them resolve the "reg"
property the same way OF and swnode children do.
Behavioural preservations (deliberate, to avoid regressing existing
users):
- The three-way endian fallback is kept verbatim: an explicit
"little-endian" property wins, then "big-endian", and otherwise
the host's compile-time byte order. device_is_big_endian() is
not used here because it ignores "little-endian" and introduces
"native-endian" semantics, which would diverge from the binding.
- The "if (!mux->data.reg)" guard around
devm_platform_get_and_ioremap_resource() in probe() is kept.
drivers/platform/mellanox/mlx-platform.c registers i2c-mux-reg
platform_devices with no memory resource and supplies a
pre-set .reg / .reg_size through struct
i2c_mux_reg_platform_data; without the guard those
registrations would fail in probe().
- The "if (!mux->data.reg)" ioremap block (and the paired
reg_size validation that depends on it) is hoisted above
i2c_get_adapter(mux->data.parent), so the fwnode path
preserves master's ordering of "ioremap before parent-adapter
get". For platdata users the validation runs from a slightly
earlier position, but mux->data.reg_size is already set from
platdata by then, so the order is functionally neutral.
The OF-only of_address_to_resource() translation in the old
probe_dt() is dropped because the same address is available from
the platform_device resource table on OF as well as ACPI, and the
existing fallback in probe() ioremaps it.
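The three-way endian fallback kept above amounts to the following decision, shown here as a standalone sketch. mux_is_big_endian() is a hypothetical name, and the boolean arguments stand in for the property lookups and the host byte order rather than for the driver's actual variables.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the fallback order: the two property
 * flags model "little-endian" / "big-endian" being present on the node.
 */
static bool mux_is_big_endian(bool has_little_endian, bool has_big_endian,
			      bool host_is_big_endian)
{
	if (has_little_endian)
		return false;		/* explicit "little-endian" wins */
	if (has_big_endian)
		return true;		/* then explicit "big-endian" */
	return host_is_big_endian;	/* otherwise compile-time byte order */
}
```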
Acked-by: Peter Rosin <peda@lysator.liu.se> Signed-off-by: Abdurrahman Hussain <abdurrahman@nexthop.ai> Assisted-by: Claude-Code:claude-opus-4-7 Assisted-by: sashiko:gemini-3.1-pro-preview Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Andy Shevchenko [Tue, 25 Nov 2025 09:40:11 +0000 (10:40 +0100)]
i2c: acpi: Return -ENOENT when no resources found in i2c_acpi_client_count()
Some users want to return an error to the upper layers when
i2c_acpi_client_count() returns 0. Follow the common pattern
in such cases, i.e. return -ENOENT instead of 0.
While at it, fix the kernel-doc warning about missing return value
description.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Merge tag 'renesas-dts-for-v7.2-tag2' of https://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel into soc/dt
Renesas DTS updates for v7.2 (take two)
- Add timer (MTU3) and xSPI FLASH support for the RZ/T2H and RZ/N2H
SoCs and their EVK boards,
- Add PCIe support for the RZ/V2N SoC and the RZ/V2N EVK board,
- Add support for the R-Car M3Le SoC and the Geist development board,
- Specify ethernet PHY reset timings on various R-Car boards,
- Add (more) serial, I2C, DMA, and sound support for the RZ/G3L SoC,
- Add PSCI, Multifunctional Interface (MFIS), and SCMI support for the
R-Car X5H SoC and Ironhide development board,
- Add serial DMA support for the RZ/G2L SoC,
- Add keyboard, I2C, Versa clock, and audio support for the RZ/G3L
SMARC SoM and EVK boards,
- Miscellaneous fixes and improvements.
Takashi Iwai [Tue, 9 Jun 2026 07:49:04 +0000 (09:49 +0200)]
ALSA: aloop: Drop superfluous break
When converting the spinlock to guard(), a break statement was put in
the scoped_guard block in loopback_jiffies_timer_function(). It is
harmless but superfluous, so drop it to avoid confusion.
Qu Wenruo [Wed, 13 May 2026 04:36:21 +0000 (14:06 +0930)]
btrfs: introduce support for huge folios
With all the previous preparations, it's finally time to enable the
huge folio support.
- The max folio size
Here we define BTRFS_MAX_FOLIO_SIZE, which is fixed at 2MiB.
This will ensure we have a large enough but not too large folio for
btrfs. This limit applies to all systems regardless of page size.
Then we also define BTRFS_MAX_BLOCKS_PER_FOLIO, which depends on
CONFIG_BTRFS_EXPERIMENTAL.
If it's an experimental build, BTRFS_MAX_BLOCKS_PER_FOLIO is 512,
otherwise it's BITS_PER_LONG.
The filemap max order will be calculated using both
BTRFS_MAX_FOLIO_SIZE and BTRFS_MAX_BLOCKS_PER_FOLIO.
E.g. for 64K page size with 64K fs block size, the limit will be
BTRFS_MAX_FOLIO_SIZE (2M), which limits the filemap max order to 5.
This will be lower than the old order (6), but folios larger than 2M
are rarely any better for IO performance. Meanwhile excessively large
folios can cause other problems like stalling the IO pipeline for too
long.
For 4K page size and 4K fs block size, the limit will be increased to
2M from the old 256K.
This new size is constrained by both BTRFS_MAX_FOLIO_SIZE (2M) and
BTRFS_MAX_BLOCKS_PER_FOLIO (512 * 4K), allowing x86_64 to achieve huge
folio support, and the filemap max order will be 9.
- btrfs_bio_ctrl::submit_bitmap
This will be enlarged to contain BTRFS_MAX_BLOCKS_PER_FOLIO bits, and
this will be on-stack memory.
This will increase on-stack memory usage by 56 bytes compared to the
baseline (before the first patch in the series).
- Local @delalloc_bitmap inside writepage_delalloc()
Unfortunately we cannot afford to handle an allocation error here, thus
again we use on-stack memory.
Thus this will increase on-stack memory usage by 56 bytes again.
So unfortunately this means during the delalloc window, the writeback path
will have +112 bytes on-stack memory usage, and for other cases the
writeback path will have +56 bytes on-stack memory usage.
The +56 bytes (btrfs_bio_ctrl::submit_bitmap) can be removed
after we have reworked the compression submission, so the current
on-stack submit_bitmap is mostly a workaround until then.
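The order numbers quoted above can be reproduced with a few lines of arithmetic. This is a sketch under the assumption that the effective limit is the smaller of the 2MiB cap and block size times the per-folio block limit; max_folio_order() is a hypothetical helper, not the function used by btrfs.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_FOLIO_SIZE	(2u * 1024 * 1024)	/* BTRFS_MAX_FOLIO_SIZE, 2MiB */

/* Hypothetical helper: largest folio order allowed by both limits. */
static unsigned int max_folio_order(uint32_t page_size, uint32_t block_size,
				    uint32_t max_blocks_per_folio)
{
	uint64_t limit = (uint64_t)block_size * max_blocks_per_folio;
	unsigned int order = 0;

	assert(page_size && block_size);

	if (limit > MAX_FOLIO_SIZE)
		limit = MAX_FOLIO_SIZE;
	/* order = floor(log2(limit / page_size)) */
	while (((uint64_t)page_size << (order + 1)) <= limit)
		order++;
	return order;
}
```

For 4K pages and 4K blocks this gives order 9 with a 512-block limit and order 6 (256K) with a 64-block limit; for 64K pages and 64K blocks the 2MiB cap gives order 5.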
Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Wed, 13 May 2026 04:36:20 +0000 (14:06 +0930)]
btrfs: migrate btrfs_bio_ctrl::submit_bitmap to support larger bitmaps
[CURRENT LIMIT]
Btrfs currently only supports sub-bitmaps (e.g. dirty bitmap) no larger
than BITS_PER_LONG.
One call site that utilizes this limit is btrfs_bio_ctrl::submit_bitmap,
which makes it very simple and straightforward to just grab an unsigned
long value and assign it to submit_bitmap.
Unfortunately that limit prevents us from supporting huge folios.
For 4K page size and block size, a huge folio (order 9) means 512 blocks
inside a 2M folio.
[ENHANCEMENT]
Instead of using a fixed unsigned long value, change
btrfs_bio_ctrl::submit_bitmap to an unsigned long pointer.
And for cases where an unsigned long can hold the whole bitmap,
introduce @submit_bitmap_value, and just point that pointer to that
unsigned long.
Then update all direct users of bio_ctrl->submit_bitmap to use the
pointer version.
There are several call sites that get extra changes:
- @range_bitmap inside extent_writepage_io()
Which is only utilized to truncate the bitmap.
Since we do not want to allocate new memory just for such temporary
usage, change the original bitmap_set() and bitmap_and() into
bitmap_clear() for the ranges outside of the target range.
- Getting dirty subpage bitmap inside writepage_delalloc()
Since we're passing an unsigned long pointer now, we need to go with
different handling (bs == ps, blocks_per_folio <= BITS_PER_LONG,
blocks_per_folio > BITS_PER_LONG).
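The pointer-plus-inline-value arrangement and the in-place truncation can be sketched in userspace as follows. struct bio_ctrl_sketch, clear_bits() and truncate_bitmap() are hypothetical stand-ins; the kernel code uses bitmap_clear() on the real btrfs_bio_ctrl.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical, trimmed-down stand-in for btrfs_bio_ctrl. */
struct bio_ctrl_sketch {
	unsigned long *submit_bitmap;		/* storage currently in use */
	unsigned long submit_bitmap_value;	/* inline storage, small case */
};

static void clear_bits(unsigned long *map, unsigned int start, unsigned int nr)
{
	for (unsigned int i = start; i < start + nr; i++)
		map[i / BITS_PER_LONG] &= ~(1UL << (i % BITS_PER_LONG));
}

/* Keep only [start, start + len) by clearing everything outside it. */
static void truncate_bitmap(unsigned long *map, unsigned int nbits,
			    unsigned int start, unsigned int len)
{
	assert(start + len <= nbits);
	clear_bits(map, 0, start);
	clear_bits(map, start + len, nbits - (start + len));
}
```

Clearing the two outside ranges needs no temporary mask, which is the point of replacing bitmap_set() plus bitmap_and().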
Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Wed, 13 May 2026 04:36:19 +0000 (14:06 +0930)]
btrfs: prepare subpage operations to support more than BITS_PER_LONG sub-bitmaps
[CURRENT LIMIT]
Btrfs currently only supports sub-bitmaps (e.g. dirty bitmap) no larger
than BITS_PER_LONG.
That limit allows us to easily grab an unsigned long without the need to
properly allocate memory for a larger bitmap.
Unfortunately that limit prevents us from supporting huge folios.
For 4K page size and block size, a huge folio (order 9) means 512 blocks
inside a 2M folio.
[ENHANCEMENT]
To allow direct bitmap operations without allocating new memory,
introduce two different ways to access the subpage bitmaps:
- Return an unsigned long value
This only happens if blocks_per_folio <= BITS_PER_LONG.
We read out the sub-bitmap into an unsigned long, and return the
value.
This is the old existing method.
This involves get_bitmap_value_##name() helper functions.
And this time the helper functions are defined as inline functions
instead of macros to provide better type checks.
- Return a pointer where the sub-bitmap starts
This only happens if blocks_per_folio >= BITS_PER_LONG.
This is the new method for sub-bitmaps larger than BITS_PER_LONG.
Since the sizes of sub-bitmaps are all aligned to BITS_PER_LONG, we
can directly access the start word of the sub-bitmap.
This involves get_bitmap_pointer_##name() helper functions.
Then change the existing sub-bitmaps users to use the new helpers:
- Bitmap dumping
Switch between get_bitmap_value_##name() and
get_bitmap_pointer_##name() depending on the sub-bitmap size.
- btrfs_get_subpage_dirty_bitmap()
Rename it to btrfs_get_subpage_dirty_bitmap_value() to follow the new
value/pointer naming.
Since we do not support huge folios yet, there is no pointer version
for the dirty bitmap.
Furthermore, add the support for block size == page size cases for
btrfs_get_subpage_dirty_bitmap_value(), so that the caller no longer
needs to check if the folio needs subpage handling.
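A userspace sketch of the two access styles follows. It assumes the sub-bitmaps are stored back to back in one unsigned long array, sub-bitmap idx starting at bit idx * blocks_per_folio; the function names are hypothetical and are not the generated get_bitmap_value_##name() / get_bitmap_pointer_##name() helpers themselves.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)

/* Small case: copy the sub-bitmap out into a single unsigned long. */
static inline unsigned long get_bitmap_value(const unsigned long *bitmaps,
					     unsigned int blocks_per_folio,
					     unsigned int idx)
{
	unsigned int start = idx * blocks_per_folio;
	unsigned long value = 0;

	assert(blocks_per_folio <= BITS_PER_LONG);
	for (unsigned int i = 0; i < blocks_per_folio; i++) {
		unsigned int bit = start + i;

		if (bitmaps[bit / BITS_PER_LONG] & (1UL << (bit % BITS_PER_LONG)))
			value |= 1UL << i;
	}
	return value;
}

/* Large case: the sub-bitmap is word aligned, return where it starts. */
static inline unsigned long *get_bitmap_pointer(unsigned long *bitmaps,
						unsigned int blocks_per_folio,
						unsigned int idx)
{
	assert(blocks_per_folio % BITS_PER_LONG == 0);
	return bitmaps + (idx * blocks_per_folio) / BITS_PER_LONG;
}
```

The value variant has to copy because a small sub-bitmap can start in the middle of a word; the pointer variant can hand out the storage directly.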
Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Thu, 7 May 2026 17:59:32 +0000 (19:59 +0200)]
btrfs: simplify how first hit is passed to __btrfs_abort_transaction()
Optimize the btrfs_abort_transaction() for size as it (by our
convention) must be put right after the error condition is detected.
The exact file:line is reported so there's a portion that must be
inlined. As this is cold code it bloats functions. In the previous patch
"btrfs: move transaction abort message to __btrfs_abort_transaction()"
the error message was moved to the common helper, saving about 20KiB of
btrfs.ko, several instructions per call site and some stack space.
There's little left to be optimized, we need to keep the atomic
test_and_set_bit() and to convey that as 'first hit' to
__btrfs_abort_transaction().
Right now it's a bool, which takes 8 bytes on stack for each call but
it's 1 bit of information. We can encode that to some of the other
parameters.
For that let's use the 'error' parameter, by convention it's negative
errno so we can reliably detect if it's the first hit or a later error.
Also the negation is usually implemented by a single instruction (NEG on
x86_64) so the resulting object code is kept short.
This reduces btrfs.ko by 8K and stack in several functions by 8 bytes.
Cumulative effect with the other commit is -30K of btrfs.ko. While the
encoding is an implementation detail, it's contained within the API.
Making the transaction abort calls very light is desired.
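One way to fold the flag into the sign is sketched below. Which case gets the flipped sign is an assumption made for this sketch only (here a positive value marks the first hit), and both helper names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Assumption: the first hit is passed as the negated (positive) errno. */
static int encode_abort_error(int error, bool first_hit)
{
	assert(error < 0);		/* callers pass a negative errno */
	return first_hit ? -error : error;
}

/* Recover both pieces of information from the single int. */
static void decode_abort_error(int encoded, int *error, bool *first_hit)
{
	*first_hit = encoded > 0;
	*error = encoded > 0 ? -encoded : encoded;
}
```

This only works because a valid error is never zero or positive, which is what the negative-errno convention guarantees.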
David Sterba [Thu, 7 May 2026 17:59:31 +0000 (19:59 +0200)]
btrfs: validate negative error number passed to btrfs_abort_transaction()
In preparation to encode more information to the error value add a step
that verifies if the value is valid (i.e. < 0). This works for
compile-time and runtime (in debugging mode).
The compile-time check recognizes direct constants and defines an array
type. An invalid condition leads to negative array size which is caught
by compiler.
The runtime check constructs the array type from the condition and only
verifies the correct size, as we don't need to tweak the size to be
negative.
The sizeof() expressions do not generate any code. In the debugging
config the warning adds about 9KiB of btrfs.ko code size.
The array size trick is needed as we can't use static_assert(), not even
with __builtin_constant_p().
Sample error message:
In file included from inode.c:40:
inode.c: In function ‘__cow_file_range_inline’:
transaction.h:261:26: error: size of unnamed array is negative
261 | (void)sizeof(char[-!(__builtin_constant_p(error) ? (error) < 0 : 1)]); \
| ^
transaction.h:275:9: note: in expansion of macro ‘VERIFY_NEGATIVE_ERROR’
275 | VERIFY_NEGATIVE_ERROR(error); \
| ^~~~~~~~~~~~~~~~~~~~~
inode.c:665:17: note: in expansion of macro ‘btrfs_abort_transaction’
665 | btrfs_abort_transaction(trans, 17);
| ^~~~~~~~~~~~~~~~~~~~~~~
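For reference, the negative array size trick in its simplest form looks like the sketch below. This is a constant-only variant written for illustration, not the VERIFY_NEGATIVE_ERROR macro from transaction.h; the macro and function names are hypothetical.

```c
#include <assert.h>

/*
 * Simplified, constant-only variant: the array type has size 1 when the
 * error is negative and size -1 (a compile error) otherwise.
 */
#define VERIFY_NEGATIVE_CONST(error) \
	((void)sizeof(char[((error) < 0) ? 1 : -1]))

/* Hypothetical caller; replacing -5 with 17 makes the build fail. */
static int abort_sketch(void)
{
	VERIFY_NEGATIVE_CONST(-5);	/* sizeof() operand is not evaluated */
	return -5;
}
```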
Filipe Manana [Thu, 21 May 2026 14:19:37 +0000 (15:19 +0100)]
btrfs: fix invalid pointer dereference in __btrfs_run_delayed_refs()
In the beginning of the loop, we try to obtain a locked delayed ref head,
if 'locked_ref' is currently NULL, by calling btrfs_select_ref_head(),
which can return an error pointer. If the error pointer is -EAGAIN we do
a continue and go back to the beginning of the loop, which will not try
again to call btrfs_select_ref_head() since 'locked_ref' is no longer
NULL but it's ERR_PTR(-EAGAIN), and then we do:
spin_lock(&locked_ref->lock);
against an ERR_PTR(-EAGAIN) value, generating an invalid pointer
dereference.
Fix this by ensuring that 'locked_ref' is set to NULL when
btrfs_select_ref_head() returns ERR_PTR(-EAGAIN) and incrementing 'count'
as well, to prevent infinite looping. We do this by doing a goto to the
bottom of the loop that already sets 'locked_ref' to NULL and does a
cond_resched(), with an increment to 'count' right before the goto.
These measures were in place before the refactoring in commit 0110a4c43451
("btrfs: refactor __btrfs_run_delayed_refs loop") but were unintentionally
lost afterwards.
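The control flow of the fix can be modelled in userspace as below. This is a heavily simplified sketch: the ERR_PTR helpers are local stand-ins for the kernel macros, select_ref_head() is a hypothetical selector that only ever reports -EAGAIN or NULL, and the loop condition is reduced to a plain count limit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's error-pointer helpers. */
#define ERR_PTR(err)	((void *)(intptr_t)(err))
#define IS_ERR(ptr)	((uintptr_t)(ptr) >= (uintptr_t)-4095)
#define PTR_ERR(ptr)	((int)(intptr_t)(ptr))

struct ref_head { int id; };

/* Hypothetical selector: reports -EAGAIN a few times, then runs dry. */
static struct ref_head *select_ref_head(int *eagain_left)
{
	if (*eagain_left > 0) {
		(*eagain_left)--;
		return ERR_PTR(-EAGAIN);
	}
	return NULL;
}

/* Returns how many iterations were charged to count. */
static unsigned long run_refs_sketch(int eagain_left, unsigned long max_count)
{
	struct ref_head *locked_ref = NULL;
	unsigned long count = 0;

	do {
		if (!locked_ref) {
			locked_ref = select_ref_head(&eagain_left);
			if (!locked_ref)
				break;
			if (IS_ERR(locked_ref) && PTR_ERR(locked_ref) == -EAGAIN) {
				count++;	/* bound the retries */
				goto next;	/* never lock an error pointer */
			}
		}
		/* ... spin_lock(&locked_ref->lock) and processing go here ... */
next:
		locked_ref = NULL;	/* force a fresh selection next time */
	} while (count < max_count);

	return count;
}
```

Without the reset at the bottom, the second iteration would skip the selection and dereference the error pointer; without the count increment, a selector that keeps returning -EAGAIN would spin forever.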
Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/linux-btrfs/ag8ARRwykv8bpJ87@stanley.mountain/ Fixes: 0110a4c43451 ("btrfs: refactor __btrfs_run_delayed_refs loop") Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
KangNing Liao [Thu, 21 May 2026 12:29:45 +0000 (20:29 +0800)]
btrfs: protect sb_write_pointer() with invalidate lock
sb_write_pointer() reads the super block from the block device page cache
using read_cache_page_gfp(). This has the same race with BLKBSZSET as the
one fixed by commit 3f29d661e568 ("btrfs: sync read disk super and set
block size").
Take the mapping invalidate lock around read_cache_page_gfp() to
serialize the read against block size changes.
Filipe Manana [Thu, 14 May 2026 16:35:40 +0000 (17:35 +0100)]
btrfs: tracepoints: show inode type in btrfs_sync_file_enter() event
Print the type of the inode (directory, regular file, symlink, etc) to
facilitate debugging.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Thu, 14 May 2026 15:11:43 +0000 (16:11 +0100)]
btrfs: tracepoints: add trace event for btrfs_sync_log()
btrfs_sync_log() is one of the main functions called during a fsync.
Add trace events for when entering and exiting that function.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 11 May 2026 15:38:25 +0000 (16:38 +0100)]
btrfs: tracepoints: add trace event for btrfs_log_new_name()
btrfs_log_new_name() is an important function that affects inode logging
and is called during link and rename operations. Add trace events for when
entering and exiting that function.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 11 May 2026 15:13:18 +0000 (16:13 +0100)]
btrfs: tracepoints: add trace event for btrfs_record_new_subvolume()
btrfs_record_new_subvolume() is an important operation that affects
inode logging and is called during subvolume creation. Add a trace event
for it to help debug issues.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 11 May 2026 15:05:13 +0000 (16:05 +0100)]
btrfs: tracepoints: add trace event for btrfs_record_snapshot_destroy()
btrfs_record_snapshot_destroy() is an important operation that affects
inode logging and is called during subvolume/snapshot deletion as well as
during rmdir. Add a trace event for it to help debug issues.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 11 May 2026 14:51:13 +0000 (15:51 +0100)]
btrfs: tracepoints: add trace event for btrfs_record_unlink_dir()
btrfs_record_unlink_dir() is an important operation that affects inode
logging and is called during unlink and rename operations. Add a trace
event for it to help debug issues.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Fri, 8 May 2026 16:09:48 +0000 (17:09 +0100)]
btrfs: tracepoints: add trace event for log_new_delayed_dentries()
log_new_delayed_dentries() is an important step called during a fsync, as
well as during rename and link operations on inodes that were previously
logged. Add trace events for when entering and exiting that function.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Thu, 7 May 2026 12:05:16 +0000 (13:05 +0100)]
btrfs: use simple assertions where enough during inode logging and replay
In overwrite_item():
There's no point in printing the root's ID if the assertion fails, since
it can only be BTRFS_TREE_LOG_OBJECTID if it fails.
In log_new_delayed_dentries():
There's no point in using a verbose assertion to print the value of
ctx->logging_new_delayed_dentries because it's a boolean, so if the
assertion fails we know its value is true (1).
So convert them to simpler assertions to make the code less verbose.
It also slightly reduces the object size, at least on x86_64 using
Debian's gcc 14.2.0-19 (if CONFIG_BTRFS_ASSERT is enabled in the kernel
config, which is the case for SUSE distributions for example).
Before:
$ size fs/btrfs/btrfs.ko
   text    data     bss     dec     hex filename
2028244  197176   15624 2241044  223214 fs/btrfs/btrfs.ko
After:
$ size fs/btrfs/btrfs.ko
   text    data     bss     dec     hex filename
2028228  197176   15624 2241028  223204 fs/btrfs/btrfs.ko
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Thu, 7 May 2026 12:03:13 +0000 (13:03 +0100)]
btrfs: tracepoints: add trace event for log_conflicting_inodes()
log_conflicting_inodes() is an important step called during a fsync, as
well as during rename and link operations on inodes that were previously
logged. Add trace events for when entering and exiting that function.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Thu, 7 May 2026 10:17:55 +0000 (11:17 +0100)]
btrfs: tracepoints: add trace event for add_conflicting_inode()
add_conflicting_inode() is an important step called during a fsync, as
well as during rename and link operations on inodes that were previously
logged. Add trace events for when entering and exiting that function.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Merge tag 'mtk-dts64-for-v7.2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mediatek/linux into soc/dt
MediaTek ARM64 DeviceTree updates
This adds improvements for already supported SoCs and devices.
In particular:
- Adds support for the MT7981 SoC's Crypto Accelerator
- Enables gpio-keys and leds found on the MT7981b based
Xiaomi AX3000T router
- Adds new variants of MT7988 BananaPi BPi-R4 Pro
- ...and some spare cleanups for all BPi-R4 Pro boards
- Adds a MediaTek MT6365 devicetree and uses it in all of
the relevant supported boards in place of MT6359, where
needed (the MT6365 PMIC is a fully compatible variant
of the MT6359 PMIC, but still not named MT6359).
- Adds correct power supplies for CPUs and devices on a
variety of MediaTek Chromebooks and Genio AIoT boards,
including:
- MT8186 Corsola Chromebooks
- MT8192 Asurada Chromebooks
- MT8195 Cherry Chromebooks
- MT8390 Genio based boards
- MT8395 Genio based boards
- Adds HDMI TX support for Ezurio Tungsten 510/700 boards.
* tag 'mtk-dts64-for-v7.2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mediatek/linux: (37 commits)
arm64: dts: mediatek: add LED and key support on Xiaomi AX3000T
arm64: dts: mediatek: mt8195-cherry: Sort top level nodes correctly
arm64: dts: mediatek: mt8195-cherry: Fix names for EC controlled regulators
arm64: dts: mediatek: mt8192-asurada: Add (BT|WIFI)_KILL_1V8_L GPIO line names
arm64: dts: mediatek: mt8192-asurada: Fix SPI-NOR flash compatible
arm64: dts: mediatek: mt8390-tungsten-smarc: add HDMI support
arm64: dts: mediatek: mt8188-geralt: Add little core CPU power supplies
arm64: dts: mediatek: mt8188-geralt: Add MT6359 PMIC supplies
arm64: dts: mediatek: mt8195-cherry: Add vusb33 supplies for XHCI controllers
arm64: dts: mediatek: mt8195-cherry: Add supply for SPI NOR flash
arm64: dts: mediatek: mt8195-cherry: Fix VBUS regulator description
arm64: dts: mediatek: mt8195-cherry: Add supplies for ChromeOS EC regulators
arm64: dts: mediatek: mt8195-cherry: Add MT6315 PMIC supplies
arm64: dts: mediatek: mt8195-cherry: Add MT6359 PMIC supplies
arm64: dts: mediatek: mt8192-asurada: Fix WiFi regulator description
arm64: dts: mediatek: mt8192-asurada: Add SPI NOR flash power supply
arm64: dts: mediatek: mt8192-asurada: Add CPU power supplies
arm64: dts: mediatek: mt8192-asurada: Add supplies for ChromeOS EC regulators
arm64: dts: mediatek: mt8192-asurada: Add MT6315 PMIC supplies
arm64: dts: mediatek: mt8192-asurada: Add MT6359 PMIC supplies
...
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
These are some of the issues that an LLM review reported in netconsole;
they are addressed here ahead of some larger refactors.
While working on those refactors, the LLM review flagged a number of
"pre-existent issues" that make it hard to guarantee that the refactor
itself does not introduce bugs. So let's fix these pre-existing bugs
first and submit the refactor afterwards.
The issues fixed in this patchset were reported during the review of
https://lore.kernel.org/all/20260524-netconsole_move_more-v1-0-909d1ab398b4@debian.org/
Not all of them are fixed here, only those that were easy to reason about.
Why net-next and not the 'net' tree?
Most of the functions being fixed here moved from netpoll to
netconsole, so fixing them in 'net' would cause merge conflicts between
'net' and 'net-next'. I therefore decided to fix them in 'net-next', given
that we are already at 7.1-rc6. Sorry if that is not the right approach.
====================
Breno Leitao [Thu, 4 Jun 2026 16:10:14 +0000 (09:10 -0700)]
netconsole: close netdevice unregister window during target resume
process_resume_target() removes the target from target_list before
calling resume_target() so that netpoll_setup() can run with interrupts
enabled, then re-adds it once setup completes. netpoll_setup() acquires a
net_device reference (netdev_hold()) and releases the RTNL before
returning.
While the target is off target_list and the RTNL is not held,
netconsole_netdev_event() cannot find it. If the egress device is
unregistered in that window, the NETDEV_UNREGISTER notifier walks
target_list, misses the resuming target, and never tears it down. The
target is then re-added in STATE_ENABLED still holding a reference to the
now-unregistered device, leaking it and hanging unregister_netdevice() in
netdev_wait_allrefs().
Re-check under RTNL before re-publishing the target: if the device left
NETREG_REGISTERED while we were off the list, run do_netpoll_cleanup() and
mark the target disabled. Taking the RTNL across the check and the
list_add() serialises against the NETDEV_UNREGISTER notifier, which also
runs under RTNL, so the device is either still registered (and the
notifier will find the re-added target later) or already unregistering
(and we drop the reference here). netdev_wait_allrefs() runs from
netdev_run_todo() outside the RTNL, so dropping the reference here cannot
deadlock against the pending unregister.
Breno Leitao [Thu, 4 Jun 2026 16:10:13 +0000 (09:10 -0700)]
netconsole: clean up deactivated targets dropped before the cleanup worker
drop_netconsole_target() downgrades a STATE_DEACTIVATED target to
STATE_DISABLED and then only calls netpoll_cleanup() when the target is
STATE_ENABLED. A target becomes STATE_DEACTIVATED when its underlying
interface is unregistered: netconsole_netdev_event() moves it to
target_cleanup_list, and netconsole_process_cleanups_core() is expected
to run do_netpoll_cleanup() on it.
Now that drop_netconsole_target() takes target_cleanup_list_lock around
the unlink, a configfs removal racing with NETDEV_UNREGISTER can pull the
target off target_cleanup_list before the cleanup worker processes it.
The notifier drops the lock before calling
netconsole_process_cleanups_core(), so the worker then iterates a list
that no longer contains the target and never runs do_netpoll_cleanup() on
it. Because drop_netconsole_target() has already rewritten the state to
STATE_DISABLED, its own STATE_ENABLED check is false and netpoll_cleanup()
is skipped too. The net_device reference taken by netpoll_setup() is then
leaked and unregister_netdevice() hangs forever in netdev_wait_allrefs().
Capture whether the target still owns a netpoll before the state is
downgraded and clean it up for both STATE_ENABLED and STATE_DEACTIVATED
targets. netpoll_cleanup() is idempotent -- it skips when np->dev is
already NULL -- so it is safe even when the cleanup worker won the race
and already tore the netpoll down.
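The ordering that matters here, capturing ownership before the state is rewritten, is shown in the userspace sketch below. The struct and both functions are hypothetical models; dev stands in for np->dev and the cleanups counter exists only to make the idempotence visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum target_state { STATE_DISABLED, STATE_ENABLED, STATE_DEACTIVATED };

/* Hypothetical, trimmed-down target. */
struct target_sketch {
	enum target_state state;
	const char *dev;	/* stands in for np->dev */
	int cleanups;		/* counts real teardowns */
};

/* Idempotent like netpoll_cleanup(): a second call finds dev == NULL. */
static void cleanup_sketch(struct target_sketch *t)
{
	if (!t->dev)
		return;
	t->dev = NULL;
	t->cleanups++;
}

static void drop_target_sketch(struct target_sketch *t)
{
	/* Capture ownership before the state is rewritten. */
	bool owns_netpoll = t->state == STATE_ENABLED ||
			    t->state == STATE_DEACTIVATED;

	assert(t);
	if (t->state == STATE_DEACTIVATED)
		t->state = STATE_DISABLED;
	if (owns_netpoll)
		cleanup_sketch(t);
}
```

Checking the state after the downgrade, as the old code effectively did, would see STATE_DISABLED and skip the cleanup for a deactivated target.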
Breno Leitao [Thu, 4 Jun 2026 16:10:12 +0000 (09:10 -0700)]
netconsole: take target_cleanup_list_lock in drop_netconsole_target()
drop_netconsole_target() unlinks the target while only holding
target_list_lock. However, when the underlying interface has been
unregistered, netconsole_netdev_event() moves the target from
target_list to target_cleanup_list, and netconsole_process_cleanups_core()
walks that list under target_cleanup_list_lock only.
If a user removes the configfs target at the same time the cleanup
worker is iterating target_cleanup_list, list_del() can corrupt the list
because the two paths take disjoint locks while operating on the same
list node.
Acquire target_cleanup_list_lock around the list_del() so the unlink is
serialised against netconsole_process_cleanups_core() regardless of
which list the target currently belongs to. The state transition that
downgrades STATE_DEACTIVATED to STATE_DISABLED is left intact and is
performed under the same combined locking, preserving the existing
ordering with resume_target().
Breno Leitao [Thu, 4 Jun 2026 16:10:11 +0000 (09:10 -0700)]
netconsole: do not dequeue pooled skbs that cannot satisfy len
find_skb() falls back to np->skb_pool when the GFP_ATOMIC alloc_skb()
fails. The pool is refilled by refill_skbs(), which always allocates
buffers of MAX_SKB_SIZE (ethhdr + iphdr + udphdr + MAX_UDP_CHUNK ==
1502 bytes).
netconsole, however, computes the requested length dynamically as
total_len + np->dev->needed_tailroom
If the egress device declares a non-zero needed_tailroom (e.g. some
tunnel or hardware accelerator devices), the required length can exceed
MAX_SKB_SIZE. The pooled skb is then handed back to the caller, which
immediately performs skb_put(skb, len), trips the tail > end check, and
triggers skb_over_panic().
Leave the normal alloc_skb(len, GFP_ATOMIC) path untouched -- the slab
allocator can still satisfy oversized requests when memory is available,
so senders to devices with non-zero needed_tailroom keep working in the
common case. Only the pool fallback is gated: when alloc_skb() failed
and len exceeds the pool buffer size, skip the skb_dequeue() instead of
burning a pre-allocated skb on a request that would later trip
skb_over_panic(). Reserving pool entries for requests they can actually
satisfy also keeps the panic path, which depends on the pool being
primed, intact.
When that drop happens, emit a rate-limited net_warn() so the user
notices that netconsole is unable to push messages on the egress device.
The warn is skipped under in_nmi() for the same reason schedule_work()
is: printk machinery taken by net_warn_ratelimited() is not NMI-safe and
would risk recursing into the same nbcon console we are servicing.
MAX_SKB_SIZE / MAX_UDP_CHUNK were private to net/core/netpoll.c. Move
them to include/linux/netpoll.h so netconsole can reference the same
definition that refill_skbs() uses, keeping the two in sync by
construction. The header now pulls in <linux/ip.h> and <linux/udp.h>
explicitly so MAX_SKB_SIZE remains self-contained for any future user.
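The gating of the pool fallback can be modelled as follows. This is a sketch, not the netconsole code: the pool is reduced to a counter, alloc_ok models the outcome of alloc_skb(len, GFP_ATOMIC), and the enum and function names are hypothetical. The header sizes are written out as constants (14 + 20 + 8 bytes plus the 1460-byte MAX_UDP_CHUNK) to match the 1502 quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* ethhdr (14) + iphdr (20) + udphdr (8) + MAX_UDP_CHUNK (1460) = 1502 */
#define MAX_SKB_SIZE	(14 + 20 + 8 + 1460)

enum skb_source { SKB_NONE, SKB_FROM_ALLOC, SKB_FROM_POOL };

struct pool_sketch { unsigned int available; };

/* Hypothetical model of the find_skb() decision. */
static enum skb_source find_skb_sketch(struct pool_sketch *pool, size_t len,
				       bool alloc_ok)
{
	assert(pool);
	if (alloc_ok)
		return SKB_FROM_ALLOC;	/* normal path, pool untouched */
	if (len > MAX_SKB_SIZE)
		return SKB_NONE;	/* pooled buffers too small, keep them */
	if (!pool->available)
		return SKB_NONE;
	pool->available--;
	return SKB_FROM_POOL;
}
```

An oversized request that misses the allocator leaves the pool count unchanged, so the entries stay available for requests they can satisfy.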
Breno Leitao [Thu, 4 Jun 2026 16:10:10 +0000 (09:10 -0700)]
netconsole: do not schedule skb pool refill from NMI
When alloc_skb() fails in find_skb(), the fallback path dequeues an skb
from np->skb_pool and unconditionally calls schedule_work() to top the
pool back up. schedule_work() ends up taking the workqueue pool locks,
which are not NMI-safe.
netconsole_write() is registered as the nbcon write_atomic callback and
is explicitly marked CON_NBCON_ATOMIC_UNSAFE, meaning it is invoked from
emergency/panic contexts including NMIs. If the NMI interrupts a thread
already holding the workqueue pool lock, calling schedule_work()
self-deadlocks and the panic message that was being printed is lost.
Introduce netcons_skb_pop() to fold the pool dequeue and the refill
request into a single helper. The helper skips schedule_work() when
called from NMI context; the pool is best-effort, so the refill is simply
deferred to the next non-NMI find_skb() call that exhausts alloc_skb()
and hits the fallback again. This keeps the fast path untouched and the
locking rules around the fallback pool documented in one place.
Note this only removes the schedule_work() hazard from the NMI path. The
allocation itself is still not fully NMI-safe: the alloc_skb(GFP_ATOMIC)
attempted first may take slab locks, and the skb_dequeue() fallback takes
np->skb_pool.lock, so either can deadlock if the NMI interrupts a holder
of those locks. Closing those windows requires an NMI-safe (lockless) skb
pool and is left to a follow-up; this patch addresses the schedule_work()
deadlock, which is both the most likely and the easiest to trigger.
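The behaviour of the new helper can be modelled in userspace as below. This is a sketch only: the in_nmi argument stands in for the kernel's in_nmi(), the refills_scheduled counter stands in for schedule_work(), and the struct and function names are hypothetical rather than the actual netcons_skb_pop() implementation.

```c
#include <assert.h>
#include <stdbool.h>

struct skb_pool_sketch {
	unsigned int available;
	unsigned int refills_scheduled;
};

/* Hypothetical model: dequeue from the pool, refill only outside NMI. */
static bool skb_pop_sketch(struct skb_pool_sketch *pool, bool in_nmi)
{
	bool got = pool->available > 0;

	assert(pool);
	if (got)
		pool->available--;
	if (!in_nmi)
		pool->refills_scheduled++;	/* models schedule_work() */
	return got;
}
```

A pop from NMI context still consumes a pooled buffer but leaves the refill to the next caller that reaches the fallback outside NMI.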