Eduard Bostina [Mon, 23 Mar 2026 17:59:44 +0000 (19:59 +0200)]
dt-bindings: watchdog: Convert TS-4800 to DT schema
Convert the Technologic Systems TS-4800 watchdog timer bindings
to DT schema.
Signed-off-by: Eduard Bostina <egbostina@gmail.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Mark Brown [Mon, 1 Jun 2026 14:52:59 +0000 (15:52 +0100)]
ASoC: codecs: pcm3168a: Fix and updates for power management
Cezary Rojewski <cezary.rojewski@intel.com> says:
Set of changes composed of one fix and two improvements.
The fix leads the series and addresses "unbalanced disables" warnings
coming from the regulator core during the S4 (hibernation) scenario.
The SLEEP_PM_OPS are unset for the driver. Hibernation (S4) causes no
resume (skipped thanks to smart_suspend=true) yet still performs the
suspend sequence unconditionally, see device_complete() in
drivers/base/power/main.c. In essence, we end up with double suspend
(double disable) and thus the warning. Exemplary stack:
In regard to the improvements, both aim to drop redundant operations.
One targets pm_runtime_idle() - no need to fire it manually, the
device-driver core will do that for us - while the second replaces a
preprocessor directive with a pm_runtime_status_suspended() check.
No !CONFIG_PM dependency equals better code coverage with default
kconfigs.
Cezary Rojewski [Mon, 25 May 2026 20:17:59 +0000 (22:17 +0200)]
ASoC: codecs: pcm3168a: Prevent regulator double-disable in S4
The SLEEP_PM_OPS are unset for the driver. Hibernation (S4) causes no
resume (skipped thanks to smart_suspend=true) yet still performs the
suspend sequence unconditionally, see device_complete() in
drivers/base/power/main.c.
If S4 runs for an already suspended pcm3168a device, we end up with an
"unbalanced disables" warning from the regulators. Assigning the
operations fixes the problem.
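The double disable can be pictured with a small userspace model of an enable counter. This is only a sketch: `regulator_sketch` and its helpers are hypothetical stand-ins and not the regulator core API.

```c
/* Hypothetical model of a supply with an enable counter (not the kernel API). */
struct regulator_sketch {
	int enable_count;	/* number of outstanding enables */
	int unbalanced;		/* disables issued while already off */
};

static void sketch_enable(struct regulator_sketch *r)
{
	r->enable_count++;
}

/* Returns 0 on success, -1 for an unbalanced disable. */
static int sketch_disable(struct regulator_sketch *r)
{
	if (r->enable_count == 0) {
		r->unbalanced++;
		return -1;
	}
	r->enable_count--;
	return 0;
}
```

In this model a runtime suspend followed by a second, unconditional suspend produces exactly one unbalanced disable.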
Yash Suthar [Mon, 1 Jun 2026 14:35:21 +0000 (23:35 +0900)]
tracing: Replace BUG_ON with lockdep_assert_held in uprobe_buffer functions
Replace BUG_ON(!mutex_is_locked(&event_mutex)) with
lockdep_assert_held(&event_mutex) in uprobe_buffer_enable() and
uprobe_buffer_disable().
BUG_ON() will crash the kernel. mutex_is_locked() only checks whether
any task holds the lock, not whether the calling task does.
lockdep_assert_held() also checks that the current task holds the lock
and does not crash the kernel when the assertion fails.
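The difference between "locked by someone" and "held by the caller" can be shown with a toy userspace model. Everything here (`mutex_sketch`, the integer task ids) is hypothetical and only mirrors the distinction, not the kernel mutex or lockdep implementation.

```c
#include <stdbool.h>

/* Toy mutex that remembers which task took it (task ids are made up). */
struct mutex_sketch {
	bool locked;
	int owner;
};

static void sketch_lock(struct mutex_sketch *m, int task)
{
	m->locked = true;
	m->owner = task;
}

/* Analogous to mutex_is_locked(): true if any task holds the lock. */
static bool sketch_is_locked(const struct mutex_sketch *m)
{
	return m->locked;
}

/* Analogous in spirit to lockdep_assert_held(): true only for the owner. */
static bool sketch_held_by(const struct mutex_sketch *m, int task)
{
	return m->locked && m->owner == task;
}
```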
Mark Brown [Mon, 1 Jun 2026 14:30:29 +0000 (15:30 +0100)]
ASoC: rsnd: Add RZ/G3E audio driver support
John Madieu <john.madieu.xa@bp.renesas.com> says:
Add audio support for the Renesas RZ/G3E SoC to the R-Car Sound
driver. The RZ/G3E audio subsystem is based on R-Car Sound IP but
has several differences requiring dedicated handling:
- SSI operates exclusively in BUSIF mode (no PIO)
- 2-4 BUSIF channels per SSI (layout differs from R-Car)
- Separate register regions for SCU, ADG, SSIU, SSI accessed by name
- Per-SSI ADG and SSIF supply clocks
- Dedicated audmapp clock/reset for Audio DMAC peri-peri
- Per-SSI and per-module reset controllers via CPG
- Unprefixed DT sub-node names (ssi, ssiu, src, ...) instead of
rcar_sound,xxx
- Hyphenated indexed clock/reset names (ssi-0, src-0, adg-ssi-0,
audio-clka, ...) instead of the legacy dotted form
John Madieu [Mon, 25 May 2026 11:02:30 +0000 (11:02 +0000)]
ASoC: rsnd: Add system suspend/resume support
Add system suspend/resume support for the ASoC rsnd driver, required
for RZ/G3E platforms. Distribute the per-module suspend/resume work
across the relevant files (adg.c, ssi.c, ssiu.c, src.c, ctu.c, mix.c,
dvc.c, dma.c) rather than centralising it in core.c.
John Madieu [Mon, 25 May 2026 11:02:29 +0000 (11:02 +0000)]
ASoC: rsnd: Support unprefixed DT node names for RZ/G3E
The RZ/G3E device tree binding uses standard unprefixed node names
("ssi", "ssiu", "src", "dvc", "mix", "ctu") instead of the legacy
"rcar_sound," prefixed names used by R-Car bindings.
Convert rsnd_parse_of_node() from a macro into a function that tries
the legacy prefixed name first, then falls back to the unprefixed name
by stripping the "rcar_sound," prefix. This makes the driver work
transparently with both old and new bindings.
While at it, update the related comments in dma.c, ssi.c and ssiu.c
that reference the hardcoded "rcar_sound,ssiu" / "rcar_sound,ssi"
names to note that the driver now accepts both the prefixed and the
unprefixed forms.
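The lookup order described above can be sketched in userspace as follows. This is an illustration under stated assumptions: `has_child()` is a hypothetical stand-in for a device tree child lookup and `find_node_name()` is not the driver's rsnd_parse_of_node().

```c
#include <stddef.h>
#include <string.h>

#define LEGACY_PREFIX "rcar_sound,"

/* Hypothetical stand-in for a DT child lookup by name. */
static int has_child(const char *const *children, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(children[i], name) == 0)
			return 1;
	return 0;
}

/*
 * Try the legacy prefixed name first, then the same name with the
 * "rcar_sound," prefix stripped. Returns the matching name or NULL.
 */
static const char *find_node_name(const char *const *children, size_t n,
				  const char *legacy_name)
{
	size_t plen = strlen(LEGACY_PREFIX);

	if (has_child(children, n, legacy_name))
		return legacy_name;
	if (strncmp(legacy_name, LEGACY_PREFIX, plen) == 0 &&
	    has_child(children, n, legacy_name + plen))
		return legacy_name + plen;
	return NULL;
}
```

Trying the legacy name first keeps existing R-Car device trees on the path they always used.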
John Madieu [Mon, 25 May 2026 11:02:28 +0000 (11:02 +0000)]
ASoC: rsnd: src: Add SRC reset support for RZ/G3E
The RZ/G3E SoC exposes a shared SCU reset controller used by all SRC
modules. Acquire it once and pass it through each instance's
rsnd_mod_init() so it is wired into the rsnd_mod->rstc plumbing.
devm_reset_control_get_optional_shared() returns NULL when no reset
is described in DT, leaving existing R-Car generations unaffected.
Without every one of them enabled, no SCU register is reachable.
Hold these in a new struct rsnd_src_ctrl and acquire them with
devm_clk_get_optional_enabled(). scu_supply is intentionally left
untouched by the system suspend/resume path added later in the
series, so SCU registers stay reachable across PM transitions.
John Madieu [Mon, 25 May 2026 11:02:26 +0000 (11:02 +0000)]
ASoC: rsnd: adg: Look up RZ/G3E clkin under audio-clk{a,b,c,i}
The R-Car Sound ADG block has up to four external master-clock inputs
named CLKA, CLKB, CLKC and CLKI by the silicon. On Gen2 R-Car these
come from DT under the legacy names "clk_a", "clk_b", "clk_c", "clk_i"
defined by renesas,rsnd.yaml. Gen4 collapses them to a single "clkin".
The new standalone RZ/G3E sound binding (renesas,r9a09g047-sound.yaml)
uses the standard DT naming convention with a vendor-meaningful prefix
that matches the SoC datasheet pin labels: "audio-clka", "audio-clkb",
"audio-clkc", "audio-clki".
Add a third clkin name table for RZ/G3E and dispatch to it from
rsnd_adg_get_clkin() in the same style as the existing Gen4 branch.
The CLKA/B/C/I enum values, the clkin[] array, and the BRGA/BRGB
derivation are unchanged - only the DT lookup names differ.
John Madieu [Mon, 25 May 2026 11:02:24 +0000 (11:02 +0000)]
ASoC: rsnd: Add ADG reset support for RZ/G3E
RZ/G3E requires the ADG reset line to be deasserted for the audio
subsystem to operate. The ADG module clock is already managed via
rsnd_adg_clk_enable/disable() through adg->adg, so no additional
clock handling is needed.
Add support for the optional "adg" reset control on Renesas RZ/G3E SoC.
John Madieu [Mon, 25 May 2026 11:02:23 +0000 (11:02 +0000)]
ASoC: rsnd: Add SSI reset support for RZ/G3E platform
Acquire the per-SSI reset controller and pass it through
rsnd_mod_init() so it is wired into the rsnd_mod->rstc plumbing.
The RZ/G3E SoC exposes one reset line per SSI instance. Use the
indexed-name rsnd_devm_reset_control_get_optional_indexed() helper
so the same code accepts both the hyphenated RZ/G3E names
("ssi-0", "ssi-1", ...) and the legacy dotted names used by R-Car
("ssi.0", ...).
The helper returns NULL when no reset is described in DT, leaving
existing R-Car generations unaffected.
The RZ/G3E also has only two pairs of BUSIF error-status registers
instead of four, and the SSI always operates in BUSIF mode: the
SSI_MODE0 BUSIF/PIO select bit is not implemented and must not be written.
While at it, add RSND_SSIU_BUSIF_STATUS_COUNT_2 as a capability flag in
the match data, consumed via struct rsnd_ssiu_ctrl, to parametrise the two
BUSIF error-status loops.
John Madieu [Mon, 25 May 2026 11:02:21 +0000 (11:02 +0000)]
ASoC: rsnd: ssiu: Add shared SSI reset controller support
The RZ/G3E SoC exposes a single shared "ssi-all" reset that gates all
SSI/SSIU modules. Acquire it at SSIU probe and pass it through
rsnd_mod_init() so it is wired into the rsnd_mod->rstc plumbing.
devm_reset_control_get_optional_shared() returns NULL when no reset is
described in DT, leaving existing R-Car generations unaffected.
John Madieu [Mon, 25 May 2026 11:02:20 +0000 (11:02 +0000)]
ASoC: rsnd: Add RZ/G3E DMA address calculation support
RZ/G3E has different DMA register base addresses and offset
calculations compared to R-Car platforms.
Add dedicated rsnd_rzg3e_dma_addr() function with dispatch from
rsnd_dma_addr(), following the existing per-generation pattern.
The function reuses rsnd_dma_addr_lookup() and rsnd_dma_addr_map.
John Madieu [Mon, 25 May 2026 11:02:19 +0000 (11:02 +0000)]
ASoC: rsnd: Refactor DMA address tables with named structs
Replace the raw multi-dimensional array used for DMA address lookup in
rsnd_gen2_dma_addr() with properly named structs: rsnd_dma_addr (in/out
pair), rsnd_dma_addr_dir (capture/playback arrays), and
rsnd_dma_addr_map (src/ssi/ssiu module sets).
While at it, extract the common lookup logic (is_ssi / use_src / use_cmd
evaluation and table indexing) into a shared rsnd_dma_addr_lookup()
function.
No functional change. This is a preparatory refactor for upcoming RZ/G3E
support which will add its own DMA address map using the same struct and
lookup function.
John Madieu [Mon, 25 May 2026 11:02:17 +0000 (11:02 +0000)]
ASoC: rsnd: Add RZ/G3E SoC probing and register map
RZ/G3E audio subsystem has a different register layout compared to
R-Car Gen2/Gen3/Gen4, as described below:
- Different base address organization (SCU, ADG, SSIU, SSI as
separate regions accessed by name)
- Additional registers: AUDIO_CLK_SEL3, SSI_MODE3, SSI_CONTROL2
- Different register offsets within each region
Add RZ/G3E SoC's audio subsystem register layouts and probe support.
John Madieu [Mon, 25 May 2026 11:02:16 +0000 (11:02 +0000)]
ASoC: rsnd: Support hyphen or dot in indexed clock and reset names
The rsnd driver historically looks up per-instance clocks and resets
using dot-separated names matching the ones declared in R-Car device
tree bindings ("ssi.0", "src.0", "adg.ssi.0", ...). The dot separator
is unusual for device tree clock-names / reset-names and newer
Renesas SoC bindings (RZ/G3E and later) use the more standard hyphen
form ("ssi-0", "src-0", ...).
Rather than force every existing R-Car user to rename their DT entries,
add a small set of helpers that try the hyphen form first and fall
back to the dot form. While at it, convert the existing indexed
devm_clk_get() call sites in the SSI, SRC, CTU, DVC and MIX probes to use
the new helpers and drop the now unused per-module name buffers and
NAME_SIZE defines.
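A minimal sketch of the "hyphen first, dot as fallback" lookup, assuming a hypothetical `name_exists()` in place of the clock or reset lookup. The helper name and signature below are illustrative and are not the helpers added by this commit.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for "is this clock/reset name present in DT?". */
static int name_exists(const char *const *names, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(names[i], name) == 0)
			return 1;
	return 0;
}

/*
 * Build "<base>-<idx>" and, if that is absent, "<base>.<idx>".
 * Returns 0 with the matching name in out, or -1 if neither exists.
 */
static int find_indexed_name(const char *const *names, size_t n,
			     const char *base, int idx,
			     char *out, size_t outlen)
{
	static const char seps[] = { '-', '.' };

	for (size_t i = 0; i < sizeof(seps); i++) {
		int len = snprintf(out, outlen, "%s%c%d", base, seps[i], idx);

		if (len < 0 || (size_t)len >= outlen)
			return -1;	/* name did not fit in the buffer */
		if (name_exists(names, n, out))
			return 0;
	}
	return -1;
}
```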
John Madieu [Mon, 25 May 2026 11:02:15 +0000 (11:02 +0000)]
ASoC: rsnd: Add reset controller support to rsnd_mod
The RZ/G3E SoC requires per-module reset control for the audio subsystem.
Add reset controller support to struct rsnd_mod and update rsnd_mod_init()
to accept and handle a reset_control parameter and mirror it in
rsnd_mod_quit().
John Madieu [Mon, 25 May 2026 11:02:14 +0000 (11:02 +0000)]
ASoC: rsnd: Fix RSND_SOC_MASK width to single nibble
RSND_SOC_MASK was defined as (0xFF << 4), spanning bits 4-11. This is
wider than needed since only nibble B (bits 7:4) is used for SoC
identifiers. Narrow it to (0xF << 4) to match the intended single-nibble
allocation and prevent overlap with bits 8-11 which will be used by
upcoming RZ series flags.
No functional change, since the only current user (RSND_SOC_E) fits
within a single nibble.
Fixes: ba164a49f8f7 ("ASoC: rsnd: src: Avoid a potential deadlock")
Signed-off-by: John Madieu <john.madieu.xa@bp.renesas.com>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://patch.msgid.link/20260525110230.4014435-3-john.madieu.xa@bp.renesas.com
Signed-off-by: Mark Brown <broonie@kernel.org>
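The effect of narrowing the RSND_SOC_MASK can be checked with a few lines of bit arithmetic. The two mask values come from the commit message; the SoC identifier value and the bit-8 flag are assumptions made for illustration only.

```c
#include <stdint.h>

#define SKETCH_SOC_MASK_OLD	(0xFFu << 4)	/* bits 11:4, as before the fix */
#define SKETCH_SOC_MASK_NEW	(0xFu << 4)	/* bits 7:4, as after the fix */
#define SKETCH_SOC_E		(1u << 4)	/* assumed value inside nibble B */
#define SKETCH_RZ_FLAG		(1u << 8)	/* hypothetical future flag in bits 11:8 */

/* Extract the SoC identifier field from a flags word. */
static uint32_t soc_id(uint32_t flags, uint32_t mask)
{
	return flags & mask;
}
```

With only the SoC identifier set, both masks give the same result, which matches the "no functional change" claim. Once a flag in bits 8-11 is also set, only the narrow mask still returns the bare identifier.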
John Madieu [Mon, 25 May 2026 11:02:13 +0000 (11:02 +0000)]
ASoC: dt-bindings: sound: Add DT binding for RZ/G3E sound
Add a standalone device tree binding for the Renesas RZ/G3E (R9A09G047)
sound controller.
The RZ/G3E sound IP is based on R-Car Sound but differs in several ways:
- Uses unprefixed sub-node names (ssi, ssiu, src, dvc, mix, ctu) instead
of R-Car's rcar_sound,xxx prefixed names.
- Supports up to 5 DMA controllers per direction, allowing multiple DMA
entries with repeated channel names in SSIU, SRC and DVC sub-nodes.
- Has 47 clocks including per-SSI ADG clocks (adg-ssi-[0-9]), SCU clocks
(scu, scu_x2, scu_supply), SSIF supply clock, AUDMAC peri-peri clock,
and ADG clock.
- Has 14 reset lines including SCU, ADG and AUDMAC peri-peri resets.
- SSI operates exclusively in BUSIF mode.
These differences make the RZ/G3E binding incompatible with the existing
renesas,rsnd.yaml, so it is added as a separate standalone binding with
its own $ref to dai-common.yaml.
Mark Brown [Mon, 1 Jun 2026 14:13:58 +0000 (15:13 +0100)]
ASoC: nau8822: add support for supply regulators
Alexey Charkov <alchark@flipper.net> says:
The Nuvoton NAU8822 codec has four power supply pins: VDDA, VDDB, VDDC
and VDDSPK, which must be online and stable before the device can be
accessed over I2C. On boards where these rails are software-controlled,
probing the codec before the regulators are up results in -ENXIO errors
during register access.
This short series adds optional regulator support to both the device
tree binding and the driver, so platforms that need explicit power
sequencing can describe and enforce it:
Alexey Charkov [Mon, 25 May 2026 09:20:46 +0000 (13:20 +0400)]
ASoC: codecs: nau8822: add support for supply regulators
NAU8822 has four power supply pins: VDDA, VDDB, VDDC, and VDDSPK, which
need to be online and stable before communication with the device is
attempted.
Request and enable these regulators at init time, if provided. Also wait
for 100 us after powering up the supply regulators before attempting to
access the device registers, as recommended by the datasheet.
This helps avoid -ENXIO errors when the codec is probed before the
regulators are ready.
Aaron Kling [Mon, 25 May 2026 06:47:44 +0000 (01:47 -0500)]
spi: tegra210-quad: Allocate DMA memory for DMA engine
When the SPI controllers are running in DMA mode, it is the DMA engine
that performs the memory accesses rather than the SPI controller. Pass
the DMA engine's struct device pointer to the DMA API to make sure the
correct DMA operations are used.
Carlos Song [Mon, 25 May 2026 06:29:28 +0000 (14:29 +0800)]
spi: imx: replace dmaengine_terminate_all() with dmaengine_terminate_sync()
dmaengine_terminate_all() has been deprecated, so replace it with
dmaengine_terminate_sync().
Fixes: ba9b28652c75 ("spi: imx: enable DMA mode for target operation")
Fixes: a450c8b77f92 ("spi: imx: handle DMA submission errors with dma_submit_error()")
Signed-off-by: Carlos Song <carlos.song@nxp.com>
Link: https://patch.msgid.link/20260525062928.3191821-1-carlos.song@oss.nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Mark Brown [Mon, 1 Jun 2026 14:08:08 +0000 (15:08 +0100)]
spi: fsl-lpspi: fix DMA termination issues
Carlos Song (OSS) <carlos.song@oss.nxp.com> says:
This series fixes two issues in the fsl-lpspi DMA transfer error paths.
Patch 1 replaces the deprecated dmaengine_terminate_all() with
dmaengine_terminate_sync() across all error paths in
fsl_lpspi_dma_transfer().
Patch 2 fixes a missing RX DMA channel termination when TX descriptor
preparation fails. Since the RX channel is already submitted and issued
before the TX descriptor is prepared, returning -EINVAL without
terminating the RX channel leaves it running against buffers that the
SPI core will unmap, potentially causing memory corruption.
Carlos Song [Mon, 25 May 2026 06:23:57 +0000 (14:23 +0800)]
spi: fsl-lpspi: terminate the RX channel on TX prepare failure path
When dmaengine_prep_slave_sg() fails for the TX channel, the error path
terminates the TX DMA channel but leaves the RX channel running. Since
the RX channel was already submitted and issued prior to preparing
the TX descriptor, returning -EINVAL causes the SPI core to unmap the
DMA buffers while the RX DMA engine continues writing to them, leading
to potential memory corruption or use-after-free.
Terminate the RX channel before returning on the TX prepare failure path.
Felix Gu [Fri, 22 May 2026 12:40:48 +0000 (20:40 +0800)]
spi: atmel: fix DMA channel and bounce buffer leaks
The original code set use_dma to false when dma_alloc_coherent() for
bounce buffers failed, but DMA channels acquired earlier via
atmel_spi_configure_dma() were never freed.
When devm_request_irq() or clk_prepare_enable() failed later in probe,
the driver also did not release DMA channels or bounce buffers already
allocated.
The out_free_dma error path released DMA channels but did not free the
bounce buffers.
Fix by moving bounce buffer allocation into atmel_spi_configure_dma()
and registering the devres cleanup for DMA channels and bounce buffers.
Rosen Penev [Fri, 22 May 2026 01:45:15 +0000 (18:45 -0700)]
ASoC: mediatek: mt2701: fix snprintf bounds
For whatever reason, GCC is unable to figure out that i2s_num is a
single digit number, with MT2701_BASE_CLK_NUM being the maximum value it
represents. Add a min() call to help it out and fix W=1 errors regarding
snprintf bounds.
Rosen Penev [Tue, 26 May 2026 20:22:47 +0000 (13:22 -0700)]
net: ibm: emac: Reserve VLAN header in MJS limit
The IBM EMAC programs its Maximum Jumbo Size (MJS) drop
threshold from ndev->mtu directly. The hardware sizes the threshold
against the L2 frame minus the ethernet header, but does not
discount the 802.1Q tag, so a frame carrying a VLAN tag and a full
1500-byte payload exceeds MJS by exactly 4 bytes and is dropped.
This is normally hidden because JPSM (and therefore the MJS check)
only engages when the MTU is raised above ETH_DATA_LEN. With the
qca8k DSA tagger the conduit MTU is bumped by QCA_HDR_LEN to 1502
during dsa_conduit_setup(), which is enough to enable JPSM and
expose the off-by-VLAN-tag in the limit.
Pad MJS by VLAN_HLEN so a VLAN-tagged full-MTU frame passes.
Reported on Meraki MX60 (qca8k switch): tagged VLAN
traffic drops at 1500-byte payload, while 1496 bytes works
and untagged 1500 bytes works.
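The arithmetic behind the reported drops can be modelled in a few lines. This is a simplified sketch, assuming the hardware compares everything after the Ethernet header against MJS and that the 2-byte qca8k tag counts toward that length. The function names are hypothetical.

```c
#define SKETCH_VLAN_HLEN	4	/* 802.1Q tag size in bytes */
#define SKETCH_QCA_HDR_LEN	2	/* qca8k DSA tag size in bytes */

/* MJS as programmed before and after the change. */
static unsigned int mjs_old(unsigned int mtu)
{
	return mtu;
}

static unsigned int mjs_new(unsigned int mtu)
{
	return mtu + SKETCH_VLAN_HLEN;
}

/* Returns 1 if a frame with this payload fits under the MJS threshold. */
static int frame_passes(unsigned int mjs, unsigned int payload, int vlan_tagged)
{
	unsigned int len = payload + SKETCH_QCA_HDR_LEN +
			   (vlan_tagged ? SKETCH_VLAN_HLEN : 0);

	return len <= mjs;
}
```

With a conduit MTU of 1502, this model reproduces the three observations from the report and shows the padded threshold admitting the tagged full-size frame.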
Replace %ld with %pe and PTR_ERR(path) with path pointer.
The %pe specifier automatically converts error pointers to
human-readable error names instead of raw error codes.
Zhang Yi [Fri, 24 Apr 2026 10:42:01 +0000 (18:42 +0800)]
ext4: fix LOGFLUSH shutdown ordering to allow ordered-mode data writeback
In EXT4_GOING_FLAGS_LOGFLUSH mode, the EXT4_FLAGS_SHUTDOWN flag was set
before calling ext4_force_commit(). This caused ordered-mode data
writeback (triggered by journal commit) to fail with -EIO, since
ext4_do_writepages() checks for the shutdown flag. The journal would
then be aborted prematurely before the commit could succeed.
Fix this by calling ext4_force_commit() first, then setting the
shutdown flag, so that pending data can be written back correctly.
Note that moving ext4_force_commit() before setting the shutdown flag
creates a small window in which new writes may occur and generate new
journal transactions. When the journal is subsequently aborted, the
new transactions will not be able to write to disk. This is intentional
because LOGFLUSH's semantics are to flush pre-existing journal entries
before shutdown, not to guarantee atomicity for writes that race with
the ioctl.
Fixes: 783d94854499 ("ext4: add EXT4_IOC_GOINGDOWN ioctl")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Baokun Li <libaokun@linux.alibaba.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260424104201.1930823-1-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Ryota Sakamoto [Tue, 27 Jan 2026 14:23:23 +0000 (23:23 +0900)]
ext4: replace KUnit tests for memcmp() with KUNIT_ASSERT_MEMEQ()
Replace the memcmp()-based KUnit checks with KUNIT_ASSERT_MEMEQ() to
improve debugging: it prints a hex dump of the buffers when the assertion
fails, whereas memcmp() only returns an integer difference.
Sven Eckelmann [Mon, 4 May 2026 19:32:24 +0000 (21:32 +0200)]
batman-adv: use neigh_node's orig_node only as id
The orig_node member of struct batadv_neigh_node is no longer used in
B.A.T.M.A.N. IV. But batadv_neigh_node_create() is still storing it.
Only batadv_v_ogm_route_update() uses it to check if we route toward
it - not needing the data stored in the batadv_orig_node object itself,
but merely a pointer to identify the originator.
The field cannot hold a proper reference because that would create a
reference cycle, so it must never be dereferenced. Rename it to
orig_node_id and mark it __private to make any future attempt to
dereference it immediately noticeable.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Both OGMv1 (on the primary interface) and OGM2 unconditionally reallocated
their packet buffer on every transmission cycle, regardless of whether the
required size had changed. This meant a kfree/kmalloc pair even when the
TVLV payload size was identical to the previous send.
Introduce struct batadv_ogm_buf to encapsulate the OGM packet buffer
together with its current length, allocated capacity, and fixed header
length. This consolidates the separate buf/len arguments that were
previously threaded through each call site.
In batadv_tvlv_realloc_packet_buff(), the capacity is rounded up to the
next power of two so that small growth or shrinkage in TVLV data does not
trigger a reallocation. When kmalloc fails but the existing buffer is large
enough to hold the new data, the oversized buffer is reused rather than
returning an error.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
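The capacity rounding described above can be sketched as follows. The struct and function names are hypothetical, and the "reallocate only when the rounded capacity changes" rule is an assumption drawn from the description, not a copy of batadv_tvlv_realloc_packet_buff().

```c
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next power of two; returns 0 if that would overflow. */
static size_t round_up_pow2(size_t n)
{
	size_t cap = 1;

	if (n > (SIZE_MAX >> 1) + 1)
		return 0;
	while (cap < n)
		cap <<= 1;
	return cap;
}

/* Hypothetical mirror of a buffer that tracks length and capacity. */
struct ogm_buf_sketch {
	size_t len;		/* bytes currently used */
	size_t capacity;	/* bytes allocated, a power of two */
};

/* Returns 1 if storing new_len bytes needs a different allocation size. */
static int needs_realloc(const struct ogm_buf_sketch *buf, size_t new_len)
{
	return round_up_pow2(new_len) != buf->capacity;
}
```

A buffer holding 100 bytes in a 128-byte allocation can grow to 120 bytes without reallocating, while 130 bytes crosses into the next power of two.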
Sven Eckelmann [Sun, 3 May 2026 20:46:15 +0000 (22:46 +0200)]
batman-adv: tt: replace open-coded overflow check with helper
The commit 6043a632dd06 ("batman-adv: reject oversized global TT response
buffers") introduced an open-coded check to ensure that the allocated
buffer size can be stored in a u16. The check_add_overflow() helper can
perform the addition and overflow check in one step, so use that instead.
Acked-by: Antonio Quartulli <antonio@mandelbit.com>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
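In userspace the same "add and check in one step" pattern is available through the GCC/Clang builtin __builtin_add_overflow(), which the kernel's check_add_overflow() wraps. The sketch below assumes a 16-bit result as described above. `tt_buf_len()` and its parameters are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Add two sizes and store the sum in a 16-bit result.
 * Returns 0 on success, -1 if the sum does not fit in uint16_t.
 */
static int tt_buf_len(size_t hdr_len, size_t entries_len, uint16_t *out)
{
	if (__builtin_add_overflow(hdr_len, entries_len, out))
		return -1;
	return 0;
}
```

The builtin checks the mathematically exact sum against the destination type, so no separate comparison against the u16 maximum is needed.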
Sven Eckelmann [Tue, 12 May 2026 17:37:05 +0000 (19:37 +0200)]
batman-adv: replace non-atomic last_ttvn with (READ|WRITE)_ONCE
The last TT version number of a meshif is only accessed as plain
loads/stores and does not require full atomic_t semantics. Convert to a
native integer and replace its users with READ_ONCE()/WRITE_ONCE() to avoid
load/store tearing.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
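The conversion pattern shared by this and the following batman-adv commits can be approximated in userspace with volatile accesses. The macros below are simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE(), they rely on the GNU `__typeof__` extension, and the struct layout and field type are assumptions.

```c
#include <stdint.h>

/* Simplified stand-ins for READ_ONCE()/WRITE_ONCE() (GCC/Clang only). */
#define READ_ONCE_SKETCH(x)	(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical slice of the mesh state; the field type is assumed. */
struct mesh_sketch {
	uint8_t last_ttvn;	/* plain integer instead of atomic_t */
};

static void publish_ttvn(struct mesh_sketch *mesh, uint8_t ttvn)
{
	WRITE_ONCE_SKETCH(mesh->last_ttvn, ttvn);
}

static uint8_t current_ttvn(struct mesh_sketch *mesh)
{
	return READ_ONCE_SKETCH(mesh->last_ttvn);
}
```

The volatile access forces the compiler to emit a single load or store for the field, which is what prevents tearing. It does not add any ordering guarantees beyond that.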
Sven Eckelmann [Tue, 12 May 2026 17:37:05 +0000 (19:37 +0200)]
batman-adv: replace non-atomic packet_size_max with (READ|WRITE)_ONCE
The maximum packet size of a meshif is only accessed as plain loads/stores
and does not require full atomic_t semantics. Convert to a native integer
and replace its users with READ_ONCE()/WRITE_ONCE() to avoid load/store
tearing.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Sven Eckelmann [Tue, 12 May 2026 17:37:05 +0000 (19:37 +0200)]
batman-adv: replace non-atomic mesh state with (READ|WRITE)_ONCE
The mesh state is only accessed as plain loads/stores and does not require
full atomic_t semantics. Convert to an enum and replace its users with
READ_ONCE()/WRITE_ONCE() to avoid load/store tearing.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Sven Eckelmann [Tue, 12 May 2026 17:37:05 +0000 (19:37 +0200)]
batman-adv: replace non-atomic vlan config fields with (READ|WRITE)_ONCE
The vlan configuration values are only accessed as plain loads/stores and
do not require full atomic_t semantics. Convert these fields to native
integer types and replace their users with READ_ONCE()/WRITE_ONCE() to
avoid load/store tearing.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Sven Eckelmann [Tue, 12 May 2026 17:37:05 +0000 (19:37 +0200)]
batman-adv: replace non-atomic hardif config fields with (READ|WRITE)_ONCE
The hardif configuration values are only accessed as plain loads/stores and
do not require full atomic_t semantics. Convert these fields to native
integer types and replace their users with READ_ONCE()/WRITE_ONCE() to
avoid load/store tearing.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Sven Eckelmann [Tue, 12 May 2026 17:37:05 +0000 (19:37 +0200)]
batman-adv: replace non-atomic meshif config fields with (READ|WRITE)_ONCE
The meshif configuration values are only accessed as plain loads/stores and
do not require full atomic_t semantics. Convert these fields to native
integer types and replace their users with READ_ONCE()/WRITE_ONCE() to
avoid load/store tearing.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Sven Eckelmann [Wed, 13 May 2026 19:37:46 +0000 (21:37 +0200)]
batman-adv: extract netdev wifi detection information object
Previously, wifi_flags were stored directly in batadv_hard_iface, which is
created for every network interface on the system (including those never
attached to a mesh interface). This wastes memory and complicates the
long-term goal of lazily allocating batadv_hard_iface only for interfaces
that actually join a mesh.
The problem is that several batman-adv features need wifi detection for
net_devices (and their underlying devices) regardless of whether a
batadv_hard_iface exists for them:
* B.A.T.M.A.N. IV TQ hop penalty calculation
* B.A.T.M.A.N. V ELP probing / throughput estimation
* AP isolation
To decouple wifi detection from batadv_hard_iface lifetime, introduce a
global rhashtable (batadv_wifi_net_devices) mapping net_device pointers to
batadv_wifi_net_device_state objects. Only net_devices that are actually
detected as (indirect) wifi interfaces occupy an entry, keeping the common
(non-wifi) case allocation-free.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Nikolay Kuratov [Tue, 26 May 2026 16:29:32 +0000 (19:29 +0300)]
net/mlx5: Reorder completion before putting command entry in cmd_work_handler
Assuming callback != NULL && !page_queue, cmd_work_handler takes
command entry with refcnt == 1 from mlx5_cmd_invoke.
If either semaphore timeout or index allocation error happens,
it does final cmd_ent_put(ent). To avoid access to freed memory,
notify slotted completion before cmd_ent_put.
This is a theoretical issue found by the Svace static analyser.
Cc: stable@vger.kernel.org
Fixes: 485d65e135712 ("net/mlx5: Add a timeout to acquire the command queue semaphore")
Fixes: 0e2909c6bec90 ("net/mlx5: Fix variable not being completed when function returns")
Signed-off-by: Nikolay Kuratov <kniv@yandex-team.ru>
Reviewed-by: Md Haris Iqbal <haris.iqbal@linux.dev>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260526162932.501584-1-kniv@yandex-team.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Florian Westphal [Tue, 12 May 2026 13:36:14 +0000 (15:36 +0200)]
netfilter: nft_byteorder: remove multi-register support
64-bit byteorder conversion is broken when several registers need to be
converted because the source register array advances in steps of 4 bytes
instead of 8:
for (i = ...
src64 = nft_reg_load64(&src[i]);
~~~~~ u32 *src
nft_reg_store64(&dst64[i],
Remove the multi-register support, it has other issues as well:
Pablo points out that commit caf3ef7468f7 ("netfilter: nf_tables: prevent OOB access in nft_byteorder_eval")
alters semantics: before the loop operated on registers, i.e.
for ( ... )
dst32[i] = htons((u16)src32[i])
.. but after the patch it will operate on bytes, which makes this
useless to convert e.g. concatenations, which store each compound
in its own register.
Multi-convert of u32 has one theoretical application:
ct mark . meta mark . tcp dport @intervalset
Because ct mark and meta mark are host byte order, use with
intervals has to convert the byteorder for ct/meta mark value
to network byte order (bigendian).
I.e. two separate calls. Theoretically it could be changed to do:
[ meta load mark => reg 1 ]
[ ct load mark => reg 9 ]
[ byteorder reg 1 = htonl(reg 1, 4, 8) ]
...
But then all it would take is to change the set to
meta mark . tcp dport . ct mark
... and we'd be back to two "byteorder" calls. IOW, support to
convert a range of registers is both dysfunctional and dubious.
Simplify this: remove the feature.
Pablo Neira Ayuso points out that nftables before 1.1.0 can generate
incorrect byteorder conversions (see 9fe58952c45a,
"evaluate: skip byteorder conversion for selector smaller than 2 bytes"
in nftables.git). Affected rulesets fail to load with this change and
old userspace due to the 'len != size' check.
Fixes: c301f0981fdd ("netfilter: nf_tables: fix pointer math issue in nft_byteorder_eval()")
Cc: <stable+noautosel@kernel.org> # may break rule load with old nftables versions
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Link: https://lore.kernel.org/netfilter-devel/20240206104336.ctigqpkunom2ufmn@lion.mk-sys.cz/
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
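The stride problem in the nft_byteorder commit above can be reproduced in userspace: indexing a u32 register array by the loop counter moves 4 bytes per step, so the second 64-bit load starts in the middle of the first value. The helpers below are hypothetical. `load_nth_strided()` only shows what an 8-byte stride would look like and is not what the commit does, since the commit removes the multi-register path entirely.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load 8 bytes starting at a 32-bit register slot. */
static uint64_t load64(const uint32_t *reg)
{
	uint64_t val;

	memcpy(&val, reg, sizeof(val));
	return val;
}

/* Indexes the u32 array directly: advances 4 bytes per step. */
static uint64_t load_nth_buggy(const uint32_t *src, size_t i)
{
	return load64(&src[i]);
}

/* Advances two 32-bit slots (8 bytes) per 64-bit value. */
static uint64_t load_nth_strided(const uint32_t *src, size_t i)
{
	return load64(&src[2 * i]);
}
```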
Yiming Qian [Sat, 23 May 2026 12:29:10 +0000 (12:29 +0000)]
netfilter: bridge: make ebt_snat ARP rewrite writable
The ebtables SNAT target keeps the Ethernet source address rewrite
behind skb_ensure_writable(skb, 0). This is intentional: at the bridge
ebtables hooks the Ethernet header is addressed through
skb_mac_header()/eth_hdr(), while skb->data points at the Ethernet
payload. Asking skb_ensure_writable() for ETH_HLEN bytes would check
the payload, not the Ethernet header, and would reintroduce the small
packet regression fixed by commit 63137bc5882a.
However, the optional ARP sender hardware address rewrite is different.
It writes through skb_store_bits() at an offset relative to skb->data:
skb_header_pointer() only safely reads the ARP header; it does not make
the later sender hardware address range writable. If that range is
still held in a nonlinear skb fragment backed by a splice-imported file
page, skb_store_bits() maps the frag page and copies the new MAC address
directly into it.
Ensure the ARP SHA range is writable before reading the ARP header and
before calling skb_store_bits().
Fixes: 63137bc5882a ("netfilter: ebtables: Fixes dropping of small packets in bridge nat")
Reported-by: Yiming Qian <yimingqian591@gmail.com>
Signed-off-by: Yiming Qian <yimingqian591@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jiayuan Chen [Thu, 28 May 2026 11:09:19 +0000 (19:09 +0800)]
netfilter: nft_ct: bail out on template ct in get eval
I noticed this issue while looking at a historic syzbot report [1].
A rule like the one below is enough to trigger the bug:
table ip t {
chain pre {
type filter hook prerouting priority raw;
ct zone set 1
ct original saddr 1.2.3.4 accept
}
}
The first expression attaches a per-cpu template ct via
nft_ct_set_zone_eval() (nf_ct_tmpl_alloc -> kzalloc, tuple is all
zero, nf_ct_l3num(ct) == 0). The next expression then calls
nft_ct_get_eval() on the same skb, treats the template as a real ct
and hits the 16-byte memcpy path. With dreg at NFT_REG32_15 this
overflows past struct nft_regs on the kernel stack; with smaller
dreg values it silently clobbers adjacent registers.
Reject template ct at the eval entry and in nft_ct_get_fast_eval(),
mirroring the check nft_ct_set_eval() already has. Additionally,
bound the address copy in NFT_CT_SRC / NFT_CT_DST by priv->len
instead of by nf_ct_l3num(ct): nf_ct_get_tuple() zeroes the tuple
before pkt_to_tuple() fills in only the protocol-relevant leading
bytes, so the trailing bytes of tuple->{src,dst}.u3.all are
well-defined zero. priv->len is validated at rule load, so the
copy size is now bounded by the destination register rather than
by an untrusted field on the conntrack.
Tristan Madani [Wed, 27 May 2026 13:57:50 +0000 (13:57 +0000)]
netfilter: nft_tunnel: fix use-after-free on object destroy
nft_tunnel_obj_destroy() calls metadata_dst_free() which directly
kfree()s the metadata_dst, ignoring the dst_entry refcount. Packets
that took a reference via dst_hold() in nft_tunnel_obj_eval() and
are still queued (e.g. in a netem qdisc) are left with a dangling
pointer. When these packets are eventually dequeued, dst_release()
operates on freed memory.
Replace metadata_dst_free() with dst_release() so the metadata_dst
is freed only after all references are dropped. The dst subsystem
already handles metadata_dst cleanup in dst_destroy() when
DST_METADATA is set.
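A minimal userspace sketch of why releasing a reference is safer than freeing directly. The names `struct obj`, `obj_hold()` and `obj_release()` are hypothetical, and the counter is a plain int rather than the atomic refcount the kernel uses.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object; freed_flag lets a caller observe the free. */
struct obj {
	int refcnt;
	int *freed_flag;
};

static struct obj *obj_alloc(int *freed_flag)
{
	struct obj *o = malloc(sizeof(*o));

	if (!o)
		return NULL;
	o->refcnt = 1;          /* the creator owns the first reference */
	o->freed_flag = freed_flag;
	return o;
}

static void obj_hold(struct obj *o)
{
	o->refcnt++;
}

/* Drop one reference; the memory goes away only with the last one. */
static void obj_release(struct obj *o)
{
	assert(o->refcnt > 0);
	if (--o->refcnt == 0) {
		*o->freed_flag = 1;
		free(o);
	}
}
```

If the owner calls obj_release() on destroy, a queued user that still holds a reference keeps the object alive until it drops that reference too.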
netfilter: synproxy: add mutex to guard hook reference counting
As the synproxy infrastructure registers netfilter hooks on demand when a
user adds the first iptables target or nftables expression, concurrent
additions can race each other.
Introduce a mutex to serialize access to the refcount control blocks from
both frontends. While a per-namespace mutex might be more efficient, it
is not needed for a target/expression like SYNPROXY.
Jiayuan Chen [Tue, 26 May 2026 02:02:27 +0000 (10:02 +0800)]
netfilter: nft_fib_ipv6: bail out of sibling walk if rt got unlinked
This was reported by Sashiko [1].
The RCU walk over rt->fib6_siblings can spin forever if rt is unlinked
mid-iteration: rt->fib6_siblings.next still points into the old ring,
so the loop never meets &rt->fib6_siblings as its terminator.
fib6_purge_rt() always does WRITE_ONCE(rt->fib6_nsiblings, 0) before
list_del_rcu(), so readers can use rt->fib6_nsiblings == 0 as the
detach signal. The same pattern is used in fib6_info_uses_dev() and
rt6_nlmsg_size().
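The detach signal can be illustrated with a single-threaded sketch. `struct route` and `count_siblings()` are hypothetical; the real code walks an RCU list and reads the counter with READ_ONCE(), which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node in a circular sibling ring. */
struct route {
	struct route *next;
	unsigned int nsiblings;
};

/*
 * Count the siblings of rt by walking the ring until it comes back to rt.
 * If rt has been detached, rt->next still points into the old ring and
 * the walk would never see rt again, so nsiblings == 0 is used as the
 * signal to stop. Returns -1 in that case.
 */
static int count_siblings(const struct route *rt)
{
	const struct route *it;
	int n = 0;

	assert(rt != NULL);
	for (it = rt->next; it != rt; it = it->next) {
		if (rt->nsiblings == 0)
			return -1;
		n++;
	}
	return n;
}
```

The check sits inside the loop because the detach can happen at any point during the walk, not only before it starts.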
Julian Anastasov [Mon, 25 May 2026 04:07:44 +0000 (07:07 +0300)]
ipvs: clear the svc scheduler ptr early on edit
While unbinding the old scheduler, ip_vs_edit_service() clears
the svc->scheduler ptr only after the scheduler module has initiated
RCU callbacks. As a result, packets can still use the old
scheduler at a time when svc->sched_data has already been freed
after the RCU grace period.
Fix it by clearing the ptr early in ip_vs_unbind_scheduler(),
before the done_service method schedules any RCU callbacks.
Also, if the new scheduler fails to initialize when replacing
the old scheduler, try to restore the old scheduler while still
returning the error code.
With PREEMPT_RCU this triggers a splat because the caller of
smp_processor_id() can be preempted while inside an RCU critical section.
If the xt_NFQUEUE target is invoked via the nft_compat_eval() path, we are
inside an RCU critical section.
Just use the raw version instead.
Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
drivers/net/ethernet/microsoft/mana/mana_en.c: 17bfe0a8c014e ("net: mana: Add NULL guards in teardown path to prevent panic on attach failure") d07efe5a6e641 ("net: mana: Use per-queue allocation for tx_qp to reduce allocation size")
Gray Huang [Thu, 7 May 2026 03:35:41 +0000 (11:35 +0800)]
arm64: dts: rockchip: Add Bluetooth support for Khadas Edge 2L
Enable Bluetooth support for the Ampak AP6275P module on the
Khadas Edge 2L. This involves enabling the UART5 interface for
HCI communication and defining the required regulators and
power-sequence pins.
Gray Huang [Thu, 7 May 2026 03:35:40 +0000 (11:35 +0800)]
arm64: dts: rockchip: Enable USB for Khadas Edge 2L
The Khadas Edge 2L board provides one USB 3.0 Host port and
one USB 2.0 port (connected via an internal hub). Enable the
corresponding DWC3 controllers and PHYs.
Chen-Yu Tsai [Tue, 5 May 2026 17:29:02 +0000 (01:29 +0800)]
arm64: dts: rockchip: Disable removed devices from rk3399-nanopi-r4s
While the design of the NanoPi R4S is based on the common NanoPi 4
family, it is trimmed down a lot.
Disable all the peripherals on the SoC that are not used, and delete
all the external components that are not present.
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
[feels like the cleaner option than moving those peripherals into a new
rk3399-nanopi-allothers.dtsi, as there are not that many r4s variants] Link: https://patch.msgid.link/20260505172903.33271-1-wens@kernel.org Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Chen-Yu Tsai [Tue, 5 May 2026 16:52:43 +0000 (00:52 +0800)]
arm64: dts: rockchip: Fix EEPROM compatible on rk3399-nanopi-r4s-enterprise
The EEPROM used on the R4S (enterprise) is the 24AA025E48T-I/OT from
Microchip. This is a 2-Kbit EEPROM with a 16-byte page size. The latter
half of the EEPROM is read-only, and the last 48 bits contain a globally
unique MAC address. That is to say, this is not an ordinary EEPROM.
The compatible for this type of EEPROM was introduced later than the
board. Switch over to the correct compatible now that it is available.
Cássio Gabriel [Mon, 1 Jun 2026 01:23:35 +0000 (22:23 -0300)]
ALSA: core: Use flexible array for card private data
snd_card_new() and snd_devm_card_new() allocate struct snd_card
together with optional driver-private storage. The storage is currently
described only by open-coded sizeof(*card) + extra_size arithmetic, and
snd_card_init() reaches it by manually adding sizeof(struct snd_card) to
the card pointer.
Make the trailing storage explicit with a flexible array member. Use
kzalloc_flex() for the regular allocation path and struct_size() for the
devres allocation size. This documents the layout and avoids open-coded
variable-size object arithmetic.
Align the flexible array to unsigned long long so the driver-private area
does not become less aligned than the old sizeof(struct snd_card) tail
address on 32-bit ABIs.
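A userspace sketch of the layout described above, using standard C instead of the kernel helpers. `struct card`, `priv` and `card_new()` are hypothetical names; calloc() with an explicit overflow check stands in for what kzalloc_flex() and struct_size() provide in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical card object with explicit trailing driver-private storage. */
struct card {
	int number;
	size_t priv_size;
	/* Aligned so the private area is usable for any 64-bit member. */
	_Alignas(unsigned long long) unsigned char priv[];
};

static_assert(offsetof(struct card, priv) % _Alignof(unsigned long long) == 0,
	      "private area must start on an unsigned long long boundary");

/* Allocate the card and `extra` bytes of private data in one zeroed block. */
static struct card *card_new(size_t extra)
{
	struct card *c;

	if (extra > SIZE_MAX - sizeof(*c))
		return NULL;    /* size computation would overflow */
	c = calloc(1, sizeof(*c) + extra);
	if (c)
		c->priv_size = extra;
	return c;
}
```

Callers reach the private area through `c->priv` rather than by adding sizeof(struct card) to the object pointer.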
Cássio Gabriel [Sun, 31 May 2026 23:41:41 +0000 (20:41 -0300)]
ALSA: seq: Use flexible array for device arguments
snd_seq_device_new() allocates struct snd_seq_device together with a
caller-specific argument area. SNDRV_SEQ_DEVICE_ARGPTR() reaches that
area by adding sizeof(struct snd_seq_device) to the object pointer.
Make the trailing storage explicit with a flexible array and allocate it
with kzalloc_flex(). This makes the object layout self-describing and
avoids open-coded size arithmetic in the allocation and accessor.
Reject negative argsize values before calculating the allocation size.
Current in-tree callers pass either zero or sizeof() values, but the
function takes an int size argument and should not let a negative value
flow into unsigned allocation arithmetic.
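The negative-size concern can be shown with a small sketch. `struct seq_device_sketch` and `device_alloc_size()` are hypothetical, and -EINVAL is an assumed error code, not one taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical device object followed by a caller-specific argument area. */
struct seq_device_sketch {
	int id;
	unsigned char args[];
};

/*
 * Compute the allocation size for a device with `argsize` bytes of
 * arguments. A negative int converted to size_t becomes a huge value,
 * so it is rejected before any unsigned arithmetic happens.
 */
static int device_alloc_size(int argsize, size_t *size)
{
	assert(size != NULL);
	if (argsize < 0)
		return -EINVAL;
	*size = sizeof(struct seq_device_sketch) + (size_t)argsize;
	return 0;
}
```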
Ashwin Gundarapu [Sun, 24 May 2026 15:35:27 +0000 (21:05 +0530)]
ext2: Remove deprecated DAX support
DAX support in ext2 was deprecated in commit d5a2693f93e4 ("ext2:
Deprecate DAX") with a removal deadline of end of 2025. Remove all DAX
code from ext2 as scheduled.
This removes the DAX mount option, IOMAP DAX support, DAX file
operations, DAX address_space_operations, and the DAX fault handler.
Hao Li [Fri, 29 May 2026 03:50:52 +0000 (11:50 +0800)]
mm/slub: detach and reattach partial slabs in batch
get_partial_node_bulk() moves each selected slab from the node's
partial list to the local pc->slabs list using a remove_partial() and
list_add() pair. In practice, the loop often detaches several adjacent
slabs. Doing this individually repeatedly manipulates list pointers
while holding n->list_lock, which causes unnecessary churn.
To demonstrate this, the counts below show how often single vs. multiple
consecutive slabs are retrieved during a will-it-scale mmap stress test:
The data confirms that retrieving multiple contiguous slabs is highly
frequent.
To optimize this, track contiguous runs of matching slabs and move each
run in a single operation using list_bulk_move_tail(). This reduces list
pointer churn inside the lock critical section.
Apply the same optimization to __refill_objects_node() when reattaching
leftover partial slabs back to the node's partial list.
The will-it-scale mmap benchmark shows a 2% to 5% performance improvement
after applying this patch.
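Moving a contiguous run in one splice can be sketched with a minimal doubly linked list. The helper below is written to mirror what list_bulk_move_tail() is described as doing; it is a userspace illustration, not the slab code, and `bulk_move_tail()` is a local name.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/*
 * Move the run [first, last] from its current list to the tail of head.
 * The pointer updates are the same six stores whether the run holds one
 * node or many, which is what makes batching cheaper than a per-node
 * remove and add.
 */
static void bulk_move_tail(struct list_head *head, struct list_head *first,
			   struct list_head *last)
{
	assert(head != NULL && first != NULL && last != NULL);
	/* Close the gap in the source list. */
	first->prev->next = last->next;
	last->next->prev = first->prev;
	/* Splice the run in before head, i.e. at the tail. */
	head->prev->next = first;
	first->prev = head->prev;
	last->next = head;
	head->prev = last;
}
```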
Hao Li [Fri, 29 May 2026 03:50:51 +0000 (11:50 +0800)]
mm/slub: introduce helpers for node partial slab state
Wrap partial slab count inc/dec and flag set/clear into
helper functions to reduce code duplication.
Note that __add_partial() is called locklessly in
early_kmem_cache_node_alloc(), but since there is no such use case for
removal, __remove_partial() does not exist.
Suggested-by: Harry Yoo <harry@kernel.org> Signed-off-by: Hao Li <hao.li@linux.dev> Link: https://patch.msgid.link/20260529035120.81304-2-hao.li@linux.dev Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org> Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Shengming Hu [Thu, 28 May 2026 11:35:37 +0000 (19:35 +0800)]
mm/slub: use empty sheaf helpers for oversized sheaves
Oversized prefilled sheaves are allocated separately because their
capacity can be larger than the cache's regular sheaf capacity. After
they are flushed, however, they are empty sheaves as well, and should be
released through the same empty-sheaf helper.
Allocate oversized prefilled sheaves with __alloc_empty_sheaf() and free
them with free_empty_sheaf() after a failed prefill or after they are
returned and flushed. This keeps the oversized and pfmemalloc return paths
consistent, including the SLAB_KMALLOC-specific __GFP_NO_OBJ_EXT and
mark_obj_codetag_empty() handling.
Keep the caller-GFP filtering in alloc_empty_sheaf() instead of
__alloc_empty_sheaf(). In particular, do not clear OBJCGS_CLEAR_MASK in
the raw helper, so the oversized prefill path does not unexpectedly drop
caller-provided flags such as __GFP_NOFAIL. The SLAB_KMALLOC-specific
addition of __GFP_NO_OBJ_EXT remains in __alloc_empty_sheaf(), matching
the free_empty_sheaf() assumption.
Since oversized sheaves are now allocated and freed through the empty
sheaf helpers, SHEAF_ALLOC and SHEAF_FREE also account for oversized
sheaves. Update the stat comments accordingly.
Keep the capacity initialization in the oversized prefill path, since
capacity is currently only used for prefilled sheaves.
ARM: orion5x: update board check in mss2_pci_init() to use the DT
The mss2_pci_init() function contains a check for the ARM machine ID
via the machine_is_mss2() macro. The board concerned now supports only
FDT booting, which does not use machine IDs, and therefore the code
should be updated to check the DT compatible property instead. The
machine was converted to FDT booting in commit fbf04d814d0a ("ARM:
orion5x: convert Maxtor Shared Storage II to the Device Tree"). The
presence of this machine ID check prevents the removal of machine IDs
no longer used by the kernel from arch/arm/tools/mach-types, because
the machine_is_*() macros are generated from mach-types. To resolve
this issue, use of_machine_is_compatible() instead.
Rosen Penev [Mon, 11 May 2026 03:21:44 +0000 (20:21 -0700)]
RISC-V: KVM: Use flexible array for APLIC IRQ state
Store the per-source APLIC IRQ state in the APLIC allocation instead
of allocating it separately.
This ties the IRQ state lifetime directly to the APLIC state, removes a
separate allocation failure path, and lets __counted_by() describe the
array bounds.
Takao Sato [Tue, 26 May 2026 16:09:57 +0000 (13:09 -0300)]
xfrm: iptfs: preserve shared-frag marker in iptfs_consume_frags()
iptfs_consume_frags() transfers paged fragments from one socket buffer
to another but fails to propagate the SKBFL_SHARED_FRAG flag. This is
the same class of bug that was fixed in skb_try_coalesce() for
CVE-2026-46300: when fragments backed by read-only page-cache pages are
merged, the marker indicating their shared nature must be preserved so
that ESP can decide correctly whether in-place encryption is safe.
Apply the same two-line fix used in skb_try_coalesce() to
iptfs_consume_frags().
arm: mvebu_v5_defconfig: remove stale MACH_LINKSTATION_LSCHL reference
The legacy board file for MACH_LINKSTATION_LSCHL was removed in
commit ecfe69639157 ("ARM: orion5x: remove legacy support of ls-chl")
after it was converted to DT booting, but a reference to it remained in
mvebu_v5_defconfig. Drop this unused code.
Miguel Ojeda [Sat, 30 May 2026 09:58:09 +0000 (11:58 +0200)]
rust: cpufreq: clean new `clippy::map_or_identity` lint for Rust 1.98.0
Starting with Rust 1.98.0 (expected 2026-08-20), Clippy is likely
introducing a new lint `clippy::map_or_identity` [1][2], which currently
triggers in a single case:
warning: expression can be simplified using `Result::unwrap_or()`
--> rust/kernel/cpufreq.rs:1326:60
|
1326 | PolicyCpu::from_cpu(cpu_id).map_or(0, |mut policy| T::get(&mut policy).map_or(0, |f| f))
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#map_or_identity
= note: `-W clippy::map-or-identity` implied by `-W clippy::all`
= help: to override `-W clippy::all` add `#[allow(clippy::map_or_identity)]`
help: consider using `unwrap_or`
|
1326 - PolicyCpu::from_cpu(cpu_id).map_or(0, |mut policy| T::get(&mut policy).map_or(0, |f| f))
1326 + PolicyCpu::from_cpu(cpu_id).map_or(0, |mut policy| T::get(&mut policy).unwrap_or(0))
|
This check is redundant _and_ confusing, since it suggests that
!vma->vm_ops is a condition that can be reached there. It can't, as
vma_is_anonymous() is literally a !vma->vm_ops check :)
Remove the redundant check.
Link: https://lore.kernel.org/20260527184751.4147364-4-rppt@kernel.org Fixes: 0f48947c4232 ("userfaultfd: introduce vm_uffd_ops") Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Suggested-by: Lorenzo Stoakes <ljs@kernel.org> Reviewed-by: Lorenzo Stoakes <ljs@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Liam R. Howlett <liam@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: David Carlier <devnexen@gmail.com> Cc: Michael Bommarito <michael.bommarito@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
userfaultfd: refuse to __mfill_atomic_pte() for unsupported VMAs
__mfill_atomic_pte() unconditionally dereferences ops because there is an
assumption that VMAs that can undergo mfill_* operations are vetted on
registration and must have valid vm_uffd_ops.
Add a guard against potential bugs and make sure __mfill_atomic_pte()
bails out if ops is NULL.
Link: https://lore.kernel.org/20260527184751.4147364-3-rppt@kernel.org Fixes: ad9ac3081332 ("userfaultfd: introduce vm_uffd_ops->alloc_folio()") Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Suggested-by: Lorenzo Stoakes <ljs@kernel.org> Reviewed-by: Lorenzo Stoakes <ljs@kernel.org> Reviewed-by: David CARLIER <devnexen@gmail.com> Cc: David Hildenbrand <david@kernel.org> Cc: Liam R. Howlett <liam@infradead.org> Cc: Michael Bommarito <michael.bommarito@gmail.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
userfaultfd: verify VMA state across UFFDIO_COPY retry
Patch series "userfaultfd: verify VMA state across UFFDIO_COPY retry", v2.
... and two more small fixes.
This patch (of 3):
mfill_copy_folio_retry() drops the VMA lock for copy_from_user() and
reacquires it afterwards. The destination VMA can be replaced during that
window.
The existing check compares vma_uffd_ops() before and after the retry, but
if a shmem VMA with MAP_SHARED is replaced with a shmem VMA with
MAP_PRIVATE (or vice versa) the replacement goes undetected.
A change from MAP_PRIVATE to MAP_SHARED treats the folio allocated
with shmem_alloc_folio() as anonymous, which triggers a BUG() when
mfill_atomic_install_pte() calls folio_add_new_anon_rmap().
A change from MAP_SHARED to MAP_PRIVATE allows injection of folios into
the page cache of the original VMA.
No change is needed for hugetlb because it never uses
mfill_copy_folio_retry().
Introduce helpers for more comprehensive comparison of VMA state:
- mfill_retry_state_save() to save the relevant VMA state into a struct
mfill_retry_state (original uffd_ops, relevant VMA flags, vm_file and
pgoff) before dropping the lock
- mfill_retry_state_changed() to compare the saved state with the state
of the VMA acquired after retaking the locks
- mfill_retry_state_put() to release vm_file pinning.
Use DEFINE_FREE() cleanup to wrap mfill_retry_state_put() to avoid
complicating error handling paths in mfill_copy_folio_retry().
Link: https://lore.kernel.org/20260527184751.4147364-1-rppt@kernel.org Link: https://lore.kernel.org/20260527184751.4147364-2-rppt@kernel.org Fixes: 292411fda25b ("mm/userfaultfd: detect VMA type change after copy retry in mfill_copy_folio_retry()") Fixes: 6ab703034f14 ("userfaultfd: mfill_atomic(): remove retry logic") Co-developed-by: Michael Bommarito <michael.bommarito@gmail.com> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Suggested-by: Peter Xu <peterx@redhat.com> Co-developed-by: David Carlier <devnexen@gmail.com> Signed-off-by: David Carlier <devnexen@gmail.com> Reviewed-by: Lorenzo Stoakes <ljs@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Liam R. Howlett <liam@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
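The save-and-compare pattern for the retry can be sketched as follows. Everything here is hypothetical (`struct vma_sketch`, `struct retry_state`, `SKETCH_VM_SHARED`, both helpers); it only shows the shape of the comparison and omits the vm_file pinning and the put helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_VM_SHARED 0x1UL /* hypothetical flag standing in for the relevant VMA flags */

/* Hypothetical, much reduced view of a VMA. */
struct vma_sketch {
	const void *uffd_ops;
	unsigned long flags;
	const void *file;
	unsigned long pgoff;
};

/* Snapshot taken before the lock is dropped. */
struct retry_state {
	const void *uffd_ops;
	unsigned long flags;
	const void *file;
	unsigned long pgoff;
};

static void retry_state_save(struct retry_state *s, const struct vma_sketch *v)
{
	assert(s != NULL && v != NULL);
	s->uffd_ops = v->uffd_ops;
	s->flags = v->flags & SKETCH_VM_SHARED;
	s->file = v->file;
	s->pgoff = v->pgoff;
}

/* True if the VMA found after retaking the lock differs from the snapshot. */
static bool retry_state_changed(const struct retry_state *s,
				const struct vma_sketch *v)
{
	assert(s != NULL && v != NULL);
	return s->uffd_ops != v->uffd_ops ||
	       s->flags != (v->flags & SKETCH_VM_SHARED) ||
	       s->file != v->file ||
	       s->pgoff != v->pgoff;
}
```

Comparing only the ops pointer would miss the shared-to-private flip, which is why the flags are part of the snapshot.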
Yin Tirui [Tue, 26 May 2026 10:13:37 +0000 (18:13 +0800)]
mm/huge_memory: update file PMD counter before folio_put()
__split_huge_pmd_locked() updates the file/shmem RSS counter after
dropping the PMD mapping's folio reference. If folio_put() drops the last
reference, mm_counter_file() can later read freed folio state via
folio_test_swapbacked().
Move the counter update before folio_put().
Link: https://lore.kernel.org/20260526101337.1984081-1-yintirui@huawei.com Fixes: fadae2953072 ("thp: use mm_file_counter to determine update which rss counter") Signed-off-by: Yin Tirui <yintirui@huawei.com> Reviewed-by: Lorenzo Stoakes <ljs@kernel.org> Acked-by: David Hildenbrand (arm) <david@kernel.org> Reviewed-by: Lance Yang <lance.yang@linux.dev> Reviewed-by: Dev Jain <dev.jain@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Chen Jun <chenjun102@huawei.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Liam R. Howlett <liam@infradead.org> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Yang Shi <yang.shi@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yin Tirui [Tue, 26 May 2026 10:13:55 +0000 (18:13 +0800)]
mm/huge_memory: update file PUD counter before folio_put()
__split_huge_pud_locked() updates the file/shmem RSS counter after
dropping the PUD mapping's folio reference. If folio_put() drops the last
reference, mm_counter_file() can later read freed folio state via
folio_test_swapbacked().
Move the counter update before folio_put().
Link: https://lore.kernel.org/20260526101355.1984244-1-yintirui@huawei.com Fixes: dbe54153296d ("mm/huge_memory: add vmf_insert_folio_pud()") Signed-off-by: Yin Tirui <yintirui@huawei.com> Reviewed-by: Lorenzo Stoakes <ljs@kernel.org> Acked-by: David Hildenbrand (arm) <david@kernel.org> Reviewed-by: Lance Yang <lance.yang@linux.dev> Reviewed-by: Dev Jain <dev.jain@arm.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Chen Jun <chenjun102@huawei.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Liam R. Howlett <liam@infradead.org> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Muchun Song [Mon, 25 May 2026 02:52:13 +0000 (10:52 +0800)]
mm/hugetlb_vmemmap: fix incorrect vmemmap restore in rollback
vmemmap_restore_pte() rebuilds restored vmemmap pages from a tail-page
template derived from compound_head(). This is wrong when the current PTE
already maps a page whose contents are not tail-page metadata.
In the rollback path of vmemmap_remap_free(), the first restored PTE is
backed by vmemmap_head and contains head-page metadata. Reconstructing
that page from a tail-page template overwrites the head-page state and
corrupts the restored vmemmap page.
Fix this by copying the full page from the page currently mapped by the
PTE. Also pass vmemmap_tail to the rollback walk so only PTEs backed by
the shared tail page are restored, while the head PTE remains mapped to
vmemmap_head. Add VM_WARN_ON_ONCE() checks for unexpected cases.
Link: https://lore.kernel.org/20260525025213.2229628-1-songmuchun@bytedance.com Fixes: c0b495b91a47 ("mm/hugetlb: refactor code around vmemmap_walk") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Kiryl Shutsemau <kas@kernel.org> Acked-by: Oscar Salvador (SUSE) <osalvador@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Mon, 25 May 2026 16:22:55 +0000 (09:22 -0700)]
mm/damon/ops-common: call folio_test_lru() after folio_get()
damon_get_folio() speculatively calls folio_test_lru() before
folio_try_get(). The folio can get freed and reallocated to a tail page.
In that case, VM_BUG_ON_PGFLAGS() in const_folio_flags() can be triggered.
Remove the speculative call.
Also stop marking the folio_test_lru() check right after a successful
folio_try_get() as unlikely.
The race should be rare. Also the problem can happen only if the kernel
has enabled CONFIG_DEBUG_VM_PGFLAGS. No real world report of this issue
has been made so far. This fix is based on only theoretical analysis.
That said, a bug is a bug. A similar issue was also fixed via commit 3203b3ab0fcf ("mm/filemap: don't call folio_test_locked() without a
reference in next_uptodate_folio()"). I don't expect this change to
have a meaningful impact on DAMON performance in the real world, though I
will be happy to be corrected by real-world reports.
The issue was discovered [1] by Sashiko.
Link: https://lore.kernel.org/20260525162256.8317-1-sj@kernel.org Link: https://lore.kernel.org/20260517234112.89245-1-sj@kernel.org Fixes: 3f49584b262c ("mm/damon: implement primitives for the virtual memory address spaces") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Fernand Sieber <sieberf@amazon.com> Cc: Leonard Foerster <foersleo@amazon.de> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: <stable@vger.kernel.org> # 5.15.x Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Frank Li [Fri, 22 May 2026 20:38:08 +0000 (16:38 -0400)]
dt-bindings: trivial-devices: add fsl,mc1323
Add the Freescale 2.4 GHz IEEE® 802.15.4/ZigBee MC1323 to fix the
CHECK_DTBS warning below.
arch/arm/boot/dts/nxp/imx/imx53-smd.dtb: /soc/bus@50000000/spba-bus@50000000/spi@50010000/mc1323@0: failed to match any schema with compatible: ['fsl,mc1323']
Since the i.MX53 platform is more than 20 years old, it is difficult to
find detailed information about how the MC1323 was used on the i.MX53 SMD
board, as the functionality depended on firmware.
Frank Li [Thu, 21 May 2026 19:37:32 +0000 (15:37 -0400)]
dt-bindings: display: imx: Add television encoder (TVE) for imx53
Add the television encoder (TVE) for the legacy i.MX53 (over 15 years old)
to fix the CHECK_DTBS warning below:
arch/arm/boot/dts/nxp/imx/imx53-ard.dtb: /soc/bus@60000000/tve@63ff0000: failed to match any schema with compatible: ['fsl,imx53-tve']
Rosen Penev [Sat, 30 May 2026 01:12:55 +0000 (18:12 -0700)]
rbd: check snap_count against RBD_MAX_SNAP_COUNT
snap_count is u32 but the comparison is against a SIZE_MAX-derived value
(~2^61 on 64-bit), which clang flags as always false with
-Wtautological-constant-out-of-range-compare.
The proper check here should be that snap_count does not go over
RBD_MAX_SNAP_COUNT.
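A small sketch of the intended check. `MAX_SNAP_COUNT` is a placeholder for RBD_MAX_SNAP_COUNT, and the value 510 is an assumption used only for illustration; `snap_count_ok()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SNAP_COUNT 510u /* assumed placeholder for RBD_MAX_SNAP_COUNT */

static_assert(MAX_SNAP_COUNT <= UINT32_MAX, "limit must be reachable by a u32");

/*
 * Compare the 32-bit count against a limit that a u32 can actually
 * exceed. A limit derived from SIZE_MAX is far above UINT32_MAX on
 * 64-bit, so a comparison against it can never reject anything.
 */
static bool snap_count_ok(uint32_t snap_count)
{
	return snap_count <= MAX_SNAP_COUNT;
}
```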
Haoze Xie [Sat, 30 May 2026 06:11:54 +0000 (14:11 +0800)]
rust: block: fix GenDisk cleanup paths
GenDiskBuilder::build() still has fallible work after
__blk_mq_alloc_disk(), but its error path only recovers the
foreign queue data. That leaks the temporary gendisk and
request_queue until later teardown. If the caller moved the last
Arc<TagSet<T>> into build(), the leaked queue can retain blk-mq
state after the tag set is dropped.
Fix the pre-registration failure path by dropping the temporary
gendisk reference with put_disk() before recovering queue_data,
so disk_release() can tear down the owned queue.
Also pair GenDisk::drop() with put_disk() after del_gendisk().
Once a Rust GenDisk has been added with device_add_disk(),
del_gendisk() only unregisters it; the final gendisk reference
still has to be dropped to complete the release path.
Paul Moore [Sat, 23 May 2026 16:00:26 +0000 (12:00 -0400)]
bpf: Fix security_bpf_prog_load() error handling
If security_bpf_prog_load() fails there is no need to call into
security_bpf_prog_free() as the LSM will handle the cleanup of any partial
LSM state before returning to the caller with an error. Thankfully this
isn't an issue with any of the existing code as the LSMs which currently
provide BPF hook callback implementations don't allocate any internal
state, but this is something we want to fix for potential future users.
Taegu Ha [Thu, 28 May 2026 06:21:55 +0000 (15:21 +0900)]
bpf: reject overlarge global subprog argument sizes
Global subprogram argument checking derives generic pointer sizes from BTF
and passes the resolved size to check_mem_reg() as a u32. The access-size
validation path then uses a signed int, and stack pointers negate the value
before calling check_helper_mem_access().
This creates a wrap when BTF describes a pointee size larger than S32_MAX.
For example, a global subprogram argument of type:
int (*p)[0x3fffffff]
has a BTF-resolved pointee size of 0xfffffffc bytes. At a call site the
caller can pass a pointer to a 4-byte stack slot at fp-4. The current
PTR_TO_STACK path computes:
size = -(int)mem_size
so 0xfffffffc becomes -4 as a signed int and the negation validates only
a 4-byte stack range. That range is covered by the caller's stack slot,
so the call is accepted.
The callee is then verified independently with R1 as PTR_TO_MEM and
mem_size 0xfffffffc. A small instruction such as:
r0 = *(u32 *)(r1 + 4)
is accepted as being inside that BTF-described memory region. At run time,
however, the actual argument value is still fp-4, so r1 + 4 addresses fp+0,
outside the 4-byte object that the caller provided.
Reject sizes that cannot be represented by the verifier's signed
access-size API before the stack-specific negation. Add a verifier
regression test for the oversized BTF argument.
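The wrap described above can be reproduced in plain C. The helper names are hypothetical, and the u32-to-int conversion of an out-of-range value is implementation-defined in ISO C; the sketch assumes the two's complement wrap that GCC and Clang provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static_assert(sizeof(int) == 4, "sketch assumes a 32-bit int");

/* Pointee size of int (*p)[0x3fffffff]: 0x3fffffff elements of 4 bytes. */
static uint32_t pointee_size(void)
{
	return 0x3fffffffu * 4u;
}

/*
 * Model of the old stack path: the u32 size is converted to int and
 * negated. Only meant for the value discussed here; negating INT_MIN
 * would overflow.
 */
static int stack_access_size_old(uint32_t mem_size)
{
	return -(int)mem_size;
}

/* Model of the fix: refuse sizes a signed 32-bit access size cannot hold. */
static bool size_fits_signed(uint32_t mem_size)
{
	return mem_size <= (uint32_t)INT32_MAX;
}
```

The 0xfffffffc byte region is checked as if it were 4 bytes wide, which is why the size has to be rejected before the negation.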
Patch 1 fixes a redundant MOV in the arm64 JIT's
emit_stack_arg_store_imm() and clarifies the stack layout comments. This
is not a bug fix but an improvement.
Patch 2 bumps the stack argument tests from 6-8 args to at least 10 so
they actually exercise the native stack on arm64, where x0-x7 cover the
first 8 arguments.
====================
Puranjay Mohan [Thu, 28 May 2026 16:17:48 +0000 (09:17 -0700)]
selftests/bpf: Use at least 10 args in stack argument tests
On arm64, the first 8 arguments are passed in registers (x0-x7), so
tests with 8 or fewer arguments never exercise the native stack argument
path in the JIT. Increase argument counts to at least 10 across all
BPF-to-BPF subprog and kfunc stack argument tests so that at least 2
arguments land on the arm64 stack.
For the two-callees test, bump foo1 from 8 to 10 and foo2 from 10 to 12
args to preserve the different-stack-depth flavor of the test.
The bpf_kfunc_call_stack_arg_mem kfunc is left unchanged at 7 args to
avoid breaking the precision backtracking test which relies on hardcoded
verifier log instruction indices.