git.ipfire.org Git - thirdparty/linux.git/log

ipv6: mld: rename mldv2_mrc() and add mldv2_qqi()

Rename mldv2_mrc() to mldv2_mrd() as it is used to calculate
the Maximum Response Delay from the Maximum Response Code.

Introduce a new API mldv2_qqi() to define the existing
calculation logic of QQI from QQIC. This also organizes
the existing mld_update_qi() API.

Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Ujjal Roy <royujjal@gmail.com>
Link: https://patch.msgid.link/20260502131907.987-3-royujjal@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv4: igmp: get rid of IGMPV3_{QQIC,MRC} and simplify calculation

Get rid of the IGMPV3_MRC macro and use the igmpv3_mrt() API to
calculate the Max Resp Time from the Maximum Response Code.

Similarly, for IGMPV3_QQIC, use the igmpv3_qqi() API to calculate
the Querier's Query Interval from the QQIC field.

Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Ujjal Roy <royujjal@gmail.com>
Link: https://patch.msgid.link/20260502131907.987-2-royujjal@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: usb: asix: ax88772: re-add usbnet_link_change() in phylink callbacks

Commit e0bffe3e6894 ("net: asix: ax88772: migrate to phylink") replaced
the asix_adjust_link() PHY callback with phylink's mac_link_up() and
mac_link_down() handlers, but did not carry over the usbnet_link_change()
notification that commit 805206e66fab ("net: asix: fix "can't send until
first packet is send" issue") had added.

As a result, the original symptom returns: when the link comes up,
usbnet is never notified, so the RX URB submission stays dormant until
some other event (e.g. a transmitted packet triggering the status
endpoint interrupt) wakes it up.

This is reproducible with the Apple A1277 USB Ethernet Adapter
(05ac:1402, AX88772A based) on a Banana Pro using a static IPv4
configuration. After bringing the interface up, no incoming packets are
received until the first outgoing frame triggers usbnet's RX path.

Restore the link change notification, gated on a carrier transition so
the call remains idempotent if the status endpoint also reports the
change later.

Fixes: e0bffe3e6894 ("net: asix: ax88772: migrate to phylink")
Signed-off-by: Markus Baier <Markus.Baier@soslab.tu-darmstadt.de>
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20260501163941.107668-1-Markus.Baier@soslab.tu-darmstadt.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-convert-af_netlink-and-af_vsock-to-getsockopt_iter-api'

Breno Leitao says:

====================
net: Convert AF_NETLINK and AF_VSOCK to getsockopt_iter API

Continue the work to convert protocols to the new getsockopt_iter API.

Convert AF_NETLINK and AF_VSOCK getsockopt implementations to the new
sockopt_t/getsockopt_iter API, and add kselftests that verify the size
and errno semantics are preserved across the conversion.

I chose these two socket families because they are probably one of the
most used protocols,, ensuring that any potential bugs will be
discovered and reported quickly.
====================

Link: https://patch.msgid.link/20260501-getsock_one-v1-0-810ce23ea70e@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: selftests: add getsockopt_iter regression tests

Add a single kselftest covering the proto_ops getsockopt_iter
conversions for AF_NETLINK and AF_VSOCK, using one fixture per protocol:

netlink:

NETLINK_PKTINFO covers the flag-style int path (exact size, oversize
clamp, undersize -EINVAL); NETLINK_LIST_MEMBERSHIPS covers the
size-discovery path that always reports the required buffer length back
via optlen, even when the user buffer is too small to receive any group
bits.

vsock:
SO_VM_SOCKETS_BUFFER_SIZE covers the u64 path (exact size, oversize
clamp, undersize -EINVAL).

Each fixture also exercises an unknown optname and a bogus level so
the returned-length / errno semantics preserved by the sockopt_t
conversion are pinned down.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260501-getsock_one-v1-3-810ce23ea70e@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

vsock: convert to getsockopt_iter

Convert AF_VSOCK's getsockopt implementation to use the new
getsockopt_iter callback with sockopt_t. The single
vsock_connectible_getsockopt() callback is shared by both
vsock_stream_ops and vsock_seqpacket_ops, so both proto_ops are
updated to use .getsockopt_iter.

Key changes:
- Replace (char __user *optval, int __user *optlen) with sockopt_t *opt
- Use opt->optlen for buffer length (input) and returned size (output)
- Use copy_to_iter() instead of put_user()/copy_to_user()

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260501-getsock_one-v1-2-810ce23ea70e@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlink: convert to getsockopt_iter

Convert AF_NETLINK's getsockopt implementation to use the new
getsockopt_iter callback with sockopt_t.

Key changes:
- Replace (char __user *optval, int __user *optlen) with sockopt_t *opt
- Use opt->optlen for buffer length (input) and returned size (output)
- Use copy_to_iter() instead of put_user()/copy_to_user()
- For NETLINK_LIST_MEMBERSHIPS: walk the groups bitmap and emit each
  u32 sequentially via copy_to_iter(), then set opt->optlen to the
  total size required (ALIGN(BITS_TO_BYTES(ngroups), sizeof(u32))).
  The wrapper writes opt->optlen back to userspace even on partial
  failure, preserving the existing API that lets userspace discover
  the needed allocation size.

Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260501-getsock_one-v1-1-810ce23ea70e@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

spi: spacemit: add u64 cast to NSEC_PER_SEC to avoid 32-bit overflow

NSEC_PER_SEC expands to the long constant 1000000000L, so NSEC_PER_SEC *
BITS_PER_BYTE (8 * 10^9) overflows on 32-bit-long architectures
before the result reaches the u64 nsec_per_word.

Promote the multiplication to u64 by casting the first operand, which is
NSEC_PER_SEC.

Fixes: efcd8b9d1111 ("spi: spacemit: introduce SpacemiT K1 SPI controller driver")
Suggested-by: Alex Elder <elder@riscstar.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202605050437.RS6mmV2b-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202605050317.Tf9j487w-lkp@intel.com/
Signed-off-by: Guodong Xu <guodong@riscstar.com>
Link: https://patch.msgid.link/20260505-spi-spacemit-k1-fix-overflow-v1-1-77564c2e4e86@riscstar.com
Signed-off-by: Mark Brown <broonie@kernel.org>

net/sched: add qstats_cpu_drop_inc() helper

1) Using this_cpu_inc() is better than going through this_cpu_ptr():

- Single instruction on x86.
- Store tearing prevention.

2) Change tcf_action_update_stats() to use this_cpu_add().

3) Add WRITE_ONCE() to __qdisc_qstats_drop() and qstats_drop_inc()
   in preparation for lockless "tc qdisc show".

$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/0 grow/shrink: 3/17 up/down: 72/-216 (-144)
Function                                     old     new   delta
dualpi2_enqueue_skb                          462     511     +49
tcf_ife_act                                 1061    1077     +16
taprio_enqueue                               613     620      +7
codel_qdisc_enqueue                          149     143      -6
tcf_vlan_act                                 684     676      -8
tcf_skbedit_act                              626     618      -8
tcf_police_act                               725     717      -8
tcf_mpls_act                                1297    1289      -8
tcf_gate_act                                 310     302      -8
tcf_gact_act                                 222     214      -8
tcf_csum_act                                2438    2430      -8
tcf_bpf_act                                  709     701      -8
tcf_action_update_stats                      124     115      -9
pie_qdisc_enqueue                            865     856      -9
pfifo_enqueue                                116     107      -9
choke_enqueue                               2069    2059     -10
plug_enqueue                                 139     128     -11
bfifo_enqueue                                121     110     -11
tcf_nat_act                                 1501    1489     -12
gred_enqueue                                1743    1668     -75
Total: Before=24388609, After=24388465, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20260501135916.2566766-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ASoC: mediatek: Add support for MT8196 SoC

Cyril Chao <Cyril.Chao@mediatek.com> says:

This series of patches adds support for Mediatek AFE of MT8196 SoC.

ASoC: mediatek: mt8196: add machine driver with nau8825

Add support for mt8196 board with nau8825.

Signed-off-by: Darren Ye <darren.ye@mediatek.com>
Signed-off-by: Cyril Chao <Cyril.Chao@mediatek.com>
Link: https://patch.msgid.link/20260430022417.32282-11-Cyril.Chao@mediatek.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: dt-bindings: mediatek,mt8196-nau8825: Add audio sound card

Add soundcard bindings for the MT8196 SoC with the NAU8825 audio codec.

Signed-off-by: Darren Ye <darren.ye@mediatek.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Cyril Chao <Cyril.Chao@mediatek.com>
Link: https://patch.msgid.link/20260430022417.32282-10-Cyril.Chao@mediatek.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: mediatek: mt8196: add platform driver

Add mt8196 platform driver.

Signed-off-by: Darren Ye <darren.ye@mediatek.com>
Signed-off-by: Cyril Chao <Cyril.Chao@mediatek.com>
Link: https://patch.msgid.link/20260430022417.32282-9-Cyril.Chao@mediatek.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: dt-bindings: mediatek,mt8196-afe: add audio AFE

Add mt8196 audio AFE.

Signed-off-by: Darren Ye <darren.ye@mediatek.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Cyril Chao <Cyril.Chao@mediatek.com>
Link: https://patch.msgid.link/20260430022417.32282-8-Cyril.Chao@mediatek.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: mediatek: mt8196: support TDM in platform driver

Add mt8196 TDM DAI driver support.

Signed-off-by: Darren Ye <darren.ye@mediatek.com>
Signed-off-by: Cyril Chao <Cyril.Chao@mediatek.com>
Link: https://patch.msgid.link/20260430022417.32282-7-Cyril.Chao@mediatek.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: mediatek: mt8196: support I2S in platform driver

Add mt8196 I2S DAI driver support.

Signed-off-by: Darren Ye <darren.ye@mediatek.com>
Signed-off-by: Cyril Chao <Cyril.Chao@mediatek.com>
Link: https://patch.msgid.link/20260430022417.32282-6-Cyril.Chao@mediatek.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: mediatek: mt8196: support ADDA in platform driver

Add mt8196 ADDA DAI driver support.

Signed-off-by: Darren Ye <darren.ye@mediatek.com>
Signed-off-by: Cyril Chao <Cyril.Chao@mediatek.com>
Link: https://patch.msgid.link/20260430022417.32282-5-Cyril.Chao@mediatek.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: mediatek: mt8196: support audio clock control

Add audio clock wrapper and audio tuner control.

Signed-off-by: Darren Ye <darren.ye@mediatek.com>
Signed-off-by: Cyril Chao <Cyril.Chao@mediatek.com>
Link: https://patch.msgid.link/20260430022417.32282-4-Cyril.Chao@mediatek.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: mediatek: mt8196: add common header

Add header files for register definitions and structures.

Signed-off-by: Darren Ye <darren.ye@mediatek.com>
Signed-off-by: Cyril Chao <Cyril.Chao@mediatek.com>
Link: https://patch.msgid.link/20260430022417.32282-3-Cyril.Chao@mediatek.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: mediatek: common: modify mtk afe platform driver for mt8196

Mofify the pcm pointer interface to support 64-bit address access.

Signed-off-by: Darren Ye <darren.ye@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Tested-by: Louis-Alexis Eyraud <louisalexis.eyraud@collabora.com>
Signed-off-by: Cyril Chao <Cyril.Chao@mediatek.com>
Link: https://patch.msgid.link/20260430022417.32282-2-Cyril.Chao@mediatek.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: mediatek: mt2701: add machine driver for on-chip HDMI codec

Add a simple ASoC machine driver that wires the MT2701/MT7623N
AFE HDMI playback path to the on-chip HDMI transmitter exposed
as a generic hdmi-audio-codec "i2s-hifi" DAI.

The driver binds to "mediatek,mt2701-hdmi-audio". MT7623N device
trees carry "mediatek,mt7623n-hdmi-audio" as a board-specific
fallback, matching the dt-binding.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/1dafc8147ee09d44de53b69bca792bdbfe13e8b0.1776998727.git.daniel@makrotopia.org
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: mediatek: mt2701: add HDMI audio memif, FE and BE DAIs

Extend the MT2701/MT7623N AFE driver with the HDMI playback path:

  - a new HDMI DMA memif (MT2701_MEMIF_HDMI) mapped to the
    AFE_HDMI_OUT_{CON0,BASE,CUR,END} registers;
  - a PCM_HDMI front-end DAI (S16_LE only, 44.1k/48k) which feeds
    the memif via DPCM;
  - an HDMI BE DAI wrapping the AFE_8CH_I2S_OUT_CON engine that
    serialises L/R samples towards the on-chip HDMI transmitter.

Sample-rate programming uses the empirically determined
HDMI_BCK_DIV = 45 * 48000 / rate - 1 formula in AUDIO_TOP_CON3,
which covers 44.1 kHz and 48 kHz within the 6-bit divider range.
The AFE_HDMI_CONN0 interconnect is programmed to route memif
output pairs to the serializer inputs with L/R in the right order
for hdmi-audio-codec.

The existing I2S engine helpers (mt2701_mclk_configuration,
mt2701_i2s_path_enable, mt2701_afe_i2s_path_disable) are reused
for the HDMI BE so that MCLK at 128*fs and the ASYS I2S3 FS field
are programmed and cleanly released across open/close cycles.

Only S16_LE and 44.1k/48k are exposed to userspace. Other rates
fall outside the 6-bit BCK divider range, and wider sample
formats require DMA BIT_WIDTH programming that the current memif
setup does not handle. These limits match what the MT8173 AFE
driver exposes for its HDMI path.

The HDMI-related AFE registers (AUDIO_TOP_CON3, AFE_HDMI_OUT_CON0,
AFE_HDMI_CONN0, AFE_8CH_I2S_OUT_CON) are added to the suspend
backup list so that the existing mtk_afe_suspend/resume framework
saves and restores them across system sleep cycles.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/a580d6d676bcdba3d8bad94bb3bcb91a336bf1ba.1776998727.git.daniel@makrotopia.org
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: mediatek: mt2701: add optional HDMI audio path clocks

The HDMI audio output path on MT2701/MT7623N is rooted in HADDS2PLL
and gated by the audio_hdmi, audio_spdf and audio_apll power gates.
Acquire these four clocks from device tree using devm_clk_get_optional
so that existing platforms which do not wire up HDMI audio keep
probing unchanged. Actual clock enable/prepare is deferred to the
upcoming HDMI DAI startup path.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/5e24890acf597b04485145b5056ad8b161b4cbda.1776998727.git.daniel@makrotopia.org
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: mediatek: mt2701: add AFE HDMI register definitions

Add register offsets and bit defines for the MT2701/MT7623N AFE
HDMI audio output path: the HDMI BCK divider in AUDIO_TOP_CON3,
the HDMI output memif control and descriptor registers, the 8-bit
AFE_HDMI_CONN0 interconnect, and the AFE_8CH_I2S_OUT_CON engine
that drives the HDMI TX serial link. These are a prerequisite for
adding an HDMI playback path to the mt2701 AFE driver and have no
behavioural effect on their own.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/2c2a2e3e5d01da4a130160f5d5ffbd2a3808fe12.1776998727.git.daniel@makrotopia.org
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: dt-bindings: mediatek,mt2701-hdmi-audio: add MT2701 HDMI audio

Describe the sound card node that routes the MT2701/MT7623N AFE
HDMI playback path to the on-chip HDMI transmitter. This is
separate from the AFE platform binding (mediatek,mt2701-audio)
because it represents board-level audio routing between the AFE
and the HDMI codec, not an additional IP block. MT7623N boards
carry the same IP and use the mt7623n- compatible as a fallback
to mt2701-.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/b18cd550bcfab8c5a8a765d68a9ab5bae3f1c4e7.1776998727.git.daniel@makrotopia.org
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: dt-bindings: mt2701-afe-pcm: add HDMI audio path clocks

Document four additional optional clocks feeding the HDMI audio
output path on MT2701: the HADDS2 PLL (root of the HDMI audio
clock tree), the HDMI audio and S/PDIF interface power gates,
and the audio APLL root gate. Older device trees that do not
wire these up remain valid via minItems. MT7622 does not have
HDMI audio hardware, so its compatible is restricted to the
base set of 34 clocks.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/1a871e4494d514a270c7c27c080e338b2dbf6559.1776998727.git.daniel@makrotopia.org
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: tas2770: Deal with bogus initial temperature value

TAS2770 initialises the temperature readout registers to 0.
This value persists until the chip is fully powered up and
the ADC starts sampling. The ADC then persists the last sampled
temperature during software shutdown.

The ADC should therefore never return 0 in normal operating
conditions, so return -ENODATA and mark it as a fault condition
using HWMON_T_FAULT.

Fixes: ff73e2780169 ("ASoC: tas2770: expose die temp to hwmon")
Signed-off-by: James Calligeros <jcalligeros99@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: tas2764: Deal with bogus initial temperature register value

The TAS2764 datasheet specifies that the chip initialises the
temperature register such that the temperature reading is 2.6 *C,
ostensibly to prevent tripping the chip's protection circuitry.
The chip is not capable of representing 2.6 *C however, and the
register is actually initialised to 0. The ADC does not start
sampling until the chip is powered up, and the last sampled
temperature persists in the register during software shutdown.
Therefore, any reading returning 0 is almost certain to be
from before the ADC has actually started sampling, meaning that
it is invalid.

Return -ENODATA early if the temperature has not yet been sampled
by the chip, and indicate a fault condition using HWMON_T_FAULT.

Fixes: 186dfc85f9a8 ("ASoC: tas2764: expose die temp to hwmon")
Signed-off-by: James Calligeros <jcalligeros99@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>

net: phy: realtek: Add support for PHY LEDs on RTL8221B

Realtek RTL8221B Ethernet PHY supports three LED pins which are used to
indicate link status and activity. Add netdev trigger support for them.

Signed-off-by: Chukun Pan <amadeus@jmu.edu.cn>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260501100002.755672-1-amadeus@jmu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netpoll: pass buffer size to egress_dev() to avoid MAC truncation

egress_dev() formats np->dev_mac via snprintf() but receives buf as
a bare char *, so it cannot derive the buffer size from the pointer. The
size argument was hardcoded to MAC_ADDR_STR_LEN (3 * ETH_ALEN - 1 = 17),
which is silly wrong in two ways:

1) misleading kernel log output on the MAC-selected target path
    (np->dev_name[0] == '\0'); for example "aa:bb:cc:dd:ee:ff doesn't
    exist, aborting" was logged as "aa:bb:cc:dd:ee:f doesn't exist,
    aborting".

2) the second argument of snprintf is the size of the buffer, not the
    size of what you want to write.

Add a bufsz parameter to egress_dev() and pass sizeof(buf) from each
caller, matching the standard snprintf() idiom and removing the
hardcoded size from the helper.

Every caller already declares "char buf[MAC_ADDR_STR_LEN + 1]" so the
formatted MAC continues to fit.

Tested by booting with
  netconsole=6665@/aa:bb:cc:dd:ee:ff,6666@10.0.0.1/00:11:22:33:44:55
on a kernel without a matching device. Pre-fix dmesg shows
"aa:bb:cc:dd:ee:f doesn't exist, aborting"; post-fix shows the full
"aa:bb:cc:dd:ee:ff doesn't exist, aborting".

Fixes: f8a10bed32f5 ("netconsole: allow selection of egress interface via MAC address")
Cc: stable@vger.kernel.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260501-netpoll_snprintf_fix-v1-1-84b0566e6597@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Set gc_in_progress to true in unix_gc().

Igor Ushakov reported that unix_gc() could run with gc_in_progress
being false if the work is scheduled while running:

  Thread 1         Thread 2                     Thread 3
  --------         --------                     --------
                   unix_schedule_gc()           unix_schedule_gc()
                   `- if (!gc_in_progress)      `- if (!gc_in_progress)
                      |- gc_in_progress = true     |
                      `- queue_work()              |
  unix_gc() <----------------/                     |
  |                                                |- gc_in_progress = true
  ...                                              `- queue_work()
  |                                                       |
  `- gc_in_progress = false                               |
                                                          |
  unix_gc() <---------------------------------------------'
  |
  ... /* gc_in_progress == false */
  |
  `- gc_in_progress = false

unix_peek_fpl() relies on gc_in_progress not to confuse GC
by MSG_PEEK.

Let's set gc_in_progress to true in unix_gc().

Fixes: 8b90a9f819dc ("af_unix: Run GC on only one CPU.")
Reported-by: Igor Ushakov <sysroot314@gmail.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260501073945.1884564-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: taprio: prepare taprio_dump() for RTNL removal

We soon will no longer hold RTNL in qdisc dumps.

Add READ_ONCE()/WRITE_ONCE() annotations.

Note: taprio already uses RCU to protect most of its fields.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260501064247.2027688-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

spi: Consistently define pci_device_ids using named initializers

The .driver_data member of the various struct pci_device_id arrays were
initialized by list expressions. This isn't easily readable if you're
not into PCI. Using named initializers is more explicit and thus easier
to parse. Also skip explicit assignments of 0 (which the compiler then
takes care of).

This change doesn't introduce changes to the compiled pci_device_id
arrays. Tested on x86 and arm64.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Link: https://patch.msgid.link/20260504142117.2116978-2-u.kleine-koenig@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>

selftests/resctrl: Reduce L2 impact on CAT test

The L3 CAT test loads a buffer into cache that is proportional to the L3
size allocated for the workload and measures cache misses when accessing
the buffer as a test of L3 occupancy. When loading the buffer it can be
assumed that a portion of the buffer will be loaded into the L2 cache and
depending on cache design may not be present in L3. It is thus possible
for data to not be in L3 but also not trigger an L3 cache miss when
accessed.

Reduce impact of L2 on the L3 CAT test by, if L2 allocation is supported,
minimizing the portion of L2 that the workload can allocate into. This
encourages most of buffer to be loaded into L3 and support better
comparison between buffer size, cache portion, and cache misses when
accessing the buffer.

Link: https://lore.kernel.org/r/1f5aad318889cd6d4f9a8d8b0fbe83e3848d41a9.1775266384.git.reinette.chatre@intel.com
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

selftests/resctrl: Simplify perf usage in CAT test

The CAT test relies on the PERF_COUNT_HW_CACHE_MISSES event to determine if
modifying a cache portion size is successful. This event is configured to
report the data as part of an event group, but no other events are added to
the group.

Remove the unnecessary PERF_FORMAT_GROUP format setting. This eliminates
the need for struct perf_event_read and results in read() of the associated
file descriptor to return just one value associated with the
PERF_COUNT_HW_CACHE_MISSES event of interest.

Link: https://lore.kernel.org/r/fb69325eba5031b735fa79effaaacd797c9c6040.1775266384.git.reinette.chatre@intel.com
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

selftests/resctrl: Remove requirement on cache miss rate

As the CAT test reads the same buffer into different sized cache portions
it compares the number of cache misses against an expected percentage
based on the size of the cache portion.

Systems and test conditions vary. The CAT test is a test of resctrl
subsystem health and not a test of the hardware architecture so it is not
required to place requirements on the size of the difference in cache
misses, just that the number of cache misses when reading a buffer
increase as the cache portion used for the buffer decreases.

Remove additional constraint on how big the difference between cache
misses should be as the cache portion size changes. Only test that the
cache misses increase as the cache portion size decreases. This remains
a good sanity check of resctrl subsystem health while reducing impact
of hardware architectural differences and the various conditions under
which the test may run.

Increase the size difference between cache portions to additionally avoid
any consequences resulting from smaller increments.

Link: https://lore.kernel.org/r/6de4da5486354c0f25fef0d194956470cb744041.1775266384.git.reinette.chatre@intel.com
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

selftests/resctrl: Raise threshold at which MBM and PMU values are compared

Commit 501cfdba0a40 ("selftests/resctrl: Do not compare performance
counters and resctrl at low bandwidth") introduced a threshold under which
memory bandwidth values from MBM and performance counters are not compared.
This is needed because MBM and the PMUs do not have an identical view of
memory bandwidth since PMUs can count all memory traffic while MBM does not
count "overhead" (for example RAS) traffic that cannot be attributed to an
RMID. As a ratio this difference in view of memory bandwidth is pronounced
at low memory bandwidths.

The 750MiB threshold was chosen arbitrarily after comparisons on different
platforms. Exposed to more platforms after introduction this threshold has
proven to be inadequate.

Having accurate comparison between performance counters and MBM requires
careful management of system load as well as control of features that
introduce extra memory traffic, for example, patrol scrub. This is not
appropriate for the resctrl selftests that are intended to run on a
variety of systems with various configurations.

Increase the memory bandwidth threshold under which no comparison is made
between performance counters and MBM. Add additional leniency by increasing
the percentage of difference that will be tolerated between these counts.

There is no impact to the validity of the resctrl selftests results as a
measure of resctrl subsystem health.

Link: https://lore.kernel.org/r/b374c33ddd324130d6255cbb91c3dd500e8277e7.1775266384.git.reinette.chatre@intel.com
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

selftests/resctrl: Increase size of buffer used in MBM and MBA tests

Errata for Sierra Forest [1] (SRF42) and Granite Rapids [2] (GNR12)
describe the problem that MBM on Intel RDT may overcount memory bandwidth
measurements. The resctrl tests compare memory bandwidth reported by iMC
PMU to that reported by MBM causing the tests to fail on these systems
depending on the settings of the platform related to the errata.

Since the resctrl tests need to run under various conditions it is not
possible to ensure system settings are such that MBM will not overcount.
It has been observed that the overcounting can be controlled via the
buffer size used in the MBM and MBA tests that rely on comparisons
between iMC PMU and MBM measurements.

Running the MBM test on affected platforms with different buffer sizes it
can be observed that the difference between iMC PMU and MBM counts reduce
as the buffer size increases. After increasing the buffer size to more
than 4X the differences between iMC PMU and MBM become insignificant.

Increase the buffer size used in MBM and MBA tests to 4X L3 size to reduce
possibility of tests failing due to difference in counts reported by iMC
PMU and MBM.

Link: https://lore.kernel.org/r/1bd4d8c5fc791234b0a9da94f29a3e278ba2f7ee.1775266384.git.reinette.chatre@intel.com
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://edc.intel.com/content/www/us/en/design/products-and-solutions/processors-and-chipsets/sierra-forest/xeon-6700-series-processor-with-e-cores-specification-update/errata-details/
Link: https://edc.intel.com/content/www/us/en/design/products-and-solutions/processors-and-chipsets/birch-stream/xeon-6900-6700-6500-series-processors-with-p-cores-specification-update/011US/errata-details/
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

selftests/resctrl: Support multiple events associated with iMC

The resctrl selftests discover needed parameters to perf_event_open() via
sysfs. The PMU associated with every memory controller (iMC) is discovered
via the /sys/bus/event_source/devices/uncore_imc_N/type file while
the read memory bandwidth event type and umask is discovered via
/sys/bus/event_source/devices/uncore_imc_N/events/cas_count_read.

Newer systems may have multiple events that expose read memory bandwidth.
Running a recent kernel that includes
commit 6a8a48644c4b ("perf/x86/intel/uncore: Add per-scheduler IMC CAS count events")
on these systems expose the multiple events. For example,
/sys/bus/event_source/devices/uncore_imc_N/events/cas_count_read_sch0
/sys/bus/event_source/devices/uncore_imc_N/events/cas_count_read_sch1

Support parsing of iMC PMU properties when the PMU may have multiple events
to measure read memory bandwidth. The PMU only needs to be discovered once.
Split the parsing of event details from actual PMU discovery in order to
loop over all events associated with the PMU. Match all events with the
cas_count_read prefix instead of requiring there to be one file with that
name.

Make the parsing code more robust. With strings passed around to create
needed paths, use snprintf() instead of sprintf() to ensure there is
always enough space to create the path while using the standard PATH_MAX
for path lengths. Ensure there is enough room in imc_counters_config[]
before attempting to add an entry.

Link: https://lore.kernel.org/r/b03ca0fa21a09500c56ee589e32516c2c5effeaf.1775266384.git.reinette.chatre@intel.com
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Zide Chen <zide.chen@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

selftests/resctrl: Prepare for parsing multiple events per iMC

The events needed to read memory bandwidth are discovered by iterating
over every memory controller (iMC) within /sys/bus/event_source/devices.
Each iMC's PMU is assumed to have one event to measure read memory
bandwidth that is represented by the sysfs cas_count_read file. The event's
configuration is read from "cas_count_read" and stored as an element of
imc_counters_config[] by read_from_imc_dir() that receives the
index of the array where to store the configuration as argument.

It is possible that an iMC's PMU may have more than one event that should
be used to measure memory bandwidth.

Change semantics to not provide the index of the array to
read_from_imc_dir() but instead a pointer to the index. This enables
read_from_imc_dir() to store configurations for more than one event by
incrementing the index to imc_counters_config[] itself.

Ensure that the same type is consistently used for the index as it is
passed around during counter configuration.

Link: https://lore.kernel.org/r/549e026d20af0381349e645c912e6470fce8bd7e.1775266384.git.reinette.chatre@intel.com
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Zide Chen <zide.chen@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

selftests/resctrl: Do not store iMC counter value in counter config structure

The MBM and MBA tests compare MBM memory bandwidth measurements against
the memory bandwidth event values obtained from each memory controller's
PMU. The memory bandwidth event settings are discovered from the memory
controller details found in /sys/bus/event_source/devices/uncore_imc_N and
stored in struct imc_counter_config.

In addition to event settings struct imc_counter_config contains
imc_counter_config::return_value in which the associated event value is
stored on every read.

The event value is consumed and immediately recorded at regular intervals.
The stored value is never consumed afterwards, making its storage as part
of event configuration unnecessary.

Remove the return_value member from struct imc_counter_config. Instead
just use a more aptly named "measurement" local variable for use during
event reading.

Link: https://lore.kernel.org/r/e0b6ad2755e2fd802f54b0bc07eeb90247baca19.1775266384.git.reinette.chatre@intel.com
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

selftests/resctrl: Reduce interference from L2 occupancy during cache occupancy test

The CMT test creates a new control group that is also capable of monitoring
and assigns the workload to it. The workload allocates a buffer that by
default fills a portion of the L3 and keeps reading from the buffer,
measuring the L3 occupancy at intervals. The test passes if the workload's
L3 occupancy is within 15% of the buffer size.

The CMT test does not take into account that some of the workload's data
may land in L2/L1. Matching L3 occupancy to the size of the buffer while
a portion of the buffer can be allocated into L2 is not accurate.

Take the L2 cache into account to improve test accuracy:
- Reduce the workload's L2 cache allocation to the minimum on systems that
   support L2 cache allocation. Do so with a new utility in preparation for
   all L3 cache allocation tests needing the same capability.
- Increase the buffer size to accommodate data that may be allocated into
   the L2 cache. Use a buffer size double the L3 portion to keep using the
   L3 portion size as goal for L3 occupancy while taking into account that
   some of the data may be in L2.

Running the CMT test on a sample system while introducing significant
cache misses using "stress-ng --matrix-3d 0 --matrix-3d-zyx" shows
significant improvement in L3 cache occupancy:

Before:

    # Starting CMT test ...
    # Mounting resctrl to "/sys/fs/resctrl"
    # Cache size :335544320
    # Writing benchmark parameters to resctrl FS
    # Write schema "L3:0=fffe0" to resctrl FS
    # Write schema "L3:0=1f" to resctrl FS
    # Benchmark PID: 7089
    # Checking for pass/fail
    # Pass: Check cache miss rate within 15%
    # Percent diff=12
    # Number of bits: 5
    # Average LLC val: 73269248
    # Cache span (bytes): 83886080
    ok 1 CMT: test

After:
    # Starting CMT test ...
    # Mounting resctrl to "/sys/fs/resctrl"
    # Cache size :335544320
    # Writing benchmark parameters to resctrl FS
    # Write schema "L3:0=fffe0" to resctrl FS
    # Write schema "L3:0=1f" to resctrl FS
    # Write schema "L2:1=0x1" to resctrl FS
    # Benchmark PID: 7171
    # Checking for pass/fail
    # Pass: Check cache miss rate within 15%
    # Percent diff=0
    # Number of bits: 5
    # Average LLC val: 83755008
    # Cache span (bytes): 83886080
    ok 1 CMT: test

Link: https://lore.kernel.org/r/00445fa64c251b86b86023f87220ee1ad8561460.1775266384.git.reinette.chatre@intel.com
Reported-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/lkml/aO+7MeSMV29VdbQs@e133380.arm.com/
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

selftests/resctrl: Improve accuracy of cache occupancy test

Dave Martin reported inconsistent CMT test failures. In one experiment
the first run of the CMT test failed because of too large (24%) difference
between measured and achievable cache occupancy while the second run passed
with an acceptable 4% difference.

The CMT test is susceptible to interference from the rest of the system.
This can be demonstrated with a utility like stress-ng by running the CMT
test while introducing cache misses using:

   stress-ng --matrix-3d 0 --matrix-3d-zyx

Below shows an example of the CMT test failing because of a significant
difference between measured and achievable cache occupancy when run with
interference:
    # Starting CMT test ...
    # Mounting resctrl to "/sys/fs/resctrl"
    # Cache size :335544320
    # Writing benchmark parameters to resctrl FS
    # Benchmark PID: 7011
    # Checking for pass/fail
    # Fail: Check cache miss rate within 15%
    # Percent diff=99
    # Number of bits: 5
    # Average LLC val: 235929
    # Cache span (bytes): 83886080
    not ok 1 CMT: test

The CMT test creates a new control group that is also capable of monitoring
and assigns the workload to it. The workload allocates a buffer that by
default fills a portion of the L3 and keeps reading from the buffer,
measuring the L3 occupancy at intervals. The test passes if the workload's
L3 occupancy is within 15% of the buffer size.

By not adjusting any capacity bitmasks the workload shares the cache with
the rest of the system. Any other task that may be running could evict
the workload's data from the cache causing it to have low cache occupancy.

Reduce interference from the rest of the system by ensuring that the
workload's control group uses the capacity bitmask found in the user
parameters for L3 and that the rest of the system can only allocate into
the inverse of the workload's L3 cache portion. Other tasks can thus no
longer evict the workload's data from L3.

With the above adjustments the CMT test is more consistent. Repeating the
CMT test while generating interference with stress-ng on a sample
system after applying the fixes show significant improvement in test
accuracy:

    # Starting CMT test ...
    # Mounting resctrl to "/sys/fs/resctrl"
    # Cache size :335544320
    # Writing benchmark parameters to resctrl FS
    # Write schema "L3:0=fffe0" to resctrl FS
    # Write schema "L3:0=1f" to resctrl FS
    # Benchmark PID: 7089
    # Checking for pass/fail
    # Pass: Check cache miss rate within 15%
    # Percent diff=12
    # Number of bits: 5
    # Average LLC val: 73269248
    # Cache span (bytes): 83886080
    ok 1 CMT: test

Link: https://lore.kernel.org/r/b160592179f88069cdc679563e152007998a0d76.1775266384.git.reinette.chatre@intel.com
Reported-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/lkml/aO+7MeSMV29VdbQs@e133380.arm.com/
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

regulator: Add PM8150 PMIC support

Rakesh Kota <rakesh.kota@oss.qualcomm.com> says:

PM8150 is a power management IC. It is used in shikra boards.

Link: https://patch.msgid.link/20260429-add_pm8150_regulators-v1-0-9879c0967cf0@oss.qualcomm.com
Signed-off-by: Mark Brown <broonie@kernel.org>

regulator: qcom_smd: Add PM8150 regulators

The PM8150 is found on boards with shikra SoCs and It
provides 10 SMPS and 18 LDO regulators.

Signed-off-by: Rakesh Kota <rakesh.kota@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://patch.msgid.link/20260429-add_pm8150_regulators-v1-2-9879c0967cf0@oss.qualcomm.com
Signed-off-by: Mark Brown <broonie@kernel.org>

regulator: dt-bindings: qcom,smd-rpm-regulator: Document PM8150 IC

Document the pm8150 compatible string and available regulators in
the QCOM SMD RPM regulator documentation.

Signed-off-by: Rakesh Kota <rakesh.kota@oss.qualcomm.com>
Link: https://patch.msgid.link/20260429-add_pm8150_regulators-v1-1-9879c0967cf0@oss.qualcomm.com
Signed-off-by: Mark Brown <broonie@kernel.org>

sched/isolation: Make HK_TYPE_KTHREAD an alias of HK_TYPE_DOMAIN

Since commit 041ee6f3727a ("kthread: Rely on HK_TYPE_DOMAIN for preferred
affinity management"), kthreads default to use the HK_TYPE_DOMAIN
cpumask. IOW, it is no longer affected by the setting of the nohz_full
boot kernel parameter.

That means HK_TYPE_KTHREAD should now be an alias of HK_TYPE_DOMAIN
instead of HK_TYPE_KERNEL_NOISE to correctly reflect the current kthread
behavior. Make the change as HK_TYPE_KTHREAD is still being used in
some networking code.

Fixes: 041ee6f3727a ("kthread: Rely on HK_TYPE_DOMAIN for preferred affinity management")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ipvs: Guard access of HK_TYPE_KTHREAD cpumask with RCU

The ip_vs_ctl.c file and the associated ip_vs.h file are the only places
in the kernel where HK_TYPE_KTHREAD cpumask is being retrieved and used.
Now that HK_TYPE_KTHREAD/HK_TYPE_DOMAIN cpumask can be changed at run
time. We need to use RCU to guard access to this cpumask to avoid a
potential UAF problem as the returned cpumask may be freed before it
is being used.

We can replace HK_TYPE_KTHREAD by HK_TYPE_DOMAIN as they are aliases
of each other, but keeping the HK_TYPE_KTHREAD name can highlight the
fact that it is the kthread initiated by ipvs that is being controlled.

Fixes: 03ff73510169 ("cpuset: Update HK_TYPE_DOMAIN cpumask from cpuset")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ipvs: fix shift-out-of-bounds in ip_vs_rht_desired_size

Calling roundup_pow_of_two() with 0 has undefined result:

UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
shift exponent 64 is too large for 64-bit type 'unsigned long'
CPU: 1 UID: 0 PID: 77 Comm: kworker/u8:4 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026
Workqueue: events_unbound conn_resize_work_handler
Call Trace:
<TASK>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
ubsan_epilogue+0xa/0x30 lib/ubsan.c:233
__ubsan_handle_shift_out_of_bounds+0x385/0x410 lib/ubsan.c:494
__roundup_pow_of_two include/linux/log2.h:57 [inline]
ip_vs_rht_desired_size+0x2cf/0x410 net/netfilter/ipvs/ip_vs_core.c:240
ip_vs_conn_desired_size net/netfilter/ipvs/ip_vs_conn.c:765 [inline]
conn_resize_work_handler+0x1b6/0x14c0 net/netfilter/ipvs/ip_vs_conn.c:822
process_one_work kernel/workqueue.c:3302 [inline]
process_scheduled_works+0xb5d/0x1860 kernel/workqueue.c:3385
worker_thread+0xa53/0xfc0 kernel/workqueue.c:3466
kthread+0x388/0x470 kernel/kthread.c:436
ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>

Reported-by: syzbot+217f1db9c791e27fe54a@syzkaller.appspotmail.com
Fixes: b655388111cf ("ipvs: add resizable hash tables")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ipvs: fix races around est_mutex and est_cpulist

Sashiko reports for races and possible crash around
the usage of est_cpulist_valid and sysctl_est_cpulist.
The problem is that we do not lock est_mutex in some
places which can lead to wrong write ordering and
as result problems when calling cpumask_weight()
and cpumask_empty().

Fix them by moving the est_max_threads read/write under
locked est_mutex. Do the same for one ip_vs_est_reload_start()
call to protect the cpumask_empty() usage of sysctl_est_cpulist.

To remove the chance of deadlock while stopping the
estimation kthreads, keep the data structure for kthread 0
even after last estimator is removed and do not hold mutexes
while stopping this task. Now we will use a new flag 'needed'
to know when kthread 0 should run. The kthreads above 0
do not use mutexes, so stop them under est_mutex because
their kthread data still can be destroyed if they do not
serve estimators. Now all kthreads will be started by
the est_reload_work to properly serialize the stop/start
for kthread 0.

Reduce the use of service_mutex in ip_vs_est_calc_phase()
because under est_mutex we can safely walk est_kt_arr to
stop the kthreads above slot 0.

As ip_vs_stop_estimator() for tot_stats should be called
under service_mutex, do it early in the netns exit path
in ip_vs_flush() to avoid locking the mutex again later.
It still should be called in ip_vs_control_net_cleanup_sysctl()
when we are called during netns init error. Use -2 for ktid
as indicator if estimator was already stopped.

Finally, fix use-after-free for kd->est_row in
ip_vs_est_calc_phase(). est->ktrow should simply switch to
a delay value while estimator is linked to est_temp_list.

Link: https://sashiko.dev/#/patchset/20260331165015.2777765-1-longman%40redhat.com
Link: https://sashiko.dev/#/patchset/20260420171308.87192-1-ja%40ssi.bg
Link: https://sashiko.dev/#/patchset/20260422125123.40658-1-ja%40ssi.bg
Link: https://sashiko.dev/#/patchset/20260424175858.54752-1-ja%40ssi.bg
Link: https://sashiko.dev/#/patchset/20260425103918.7447-1-ja%40ssi.bg
Fixes: f0be83d54217 ("ipvs: add est_cpulist and est_nice sysctl vars")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ipvs: do not leak dest after get from dest trash

Sashiko warns about leaked dest if ip_vs_start_estimator()
fails in ip_vs_add_dest(). Add ip_vs_trash_put_dest() to
put back the dest into dest trash.

Link: https://sashiko.dev/#/patchset/20260428175725.72050-1-ja%40ssi.bg
Fixes: 705dd3444081 ("ipvs: use kthreads for stats estimation")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ipvs: fix the spin_lock usage for RT build

syzbot reports for sleeping function called from invalid context [1].
The recently added code for resizable hash tables uses
hlist_bl bit locks in combination with spin_lock for
the connection fields (cp->lock).

Fix the following problems:

* avoid using spin_lock(&cp->lock) under locked bit lock
because it sleeps on PREEMPT_RT

* as the recent changes call ip_vs_conn_hash() only for newly
allocated connection, the spin_lock can be removed there because
the connection is still not linked to table and does not need
cp->lock protection.

* the lock can be removed also from ip_vs_conn_unlink() where we
are the last connection user.

* the last place that is fixed is ip_vs_conn_fill_cport()
where now the cp->lock is locked before the other locks to
ensure other packets do not modify the cp->flags in non-atomic
way. Here we make sure cport and flags are changed only once
if two or more packets race to fill the cport. Also, we fill
cport early, so that if we race with resizing there will be
valid cport key for the hashing. Add a warning if too many
hash table changes occur for our RCU read-side critical
section which is error condition but minor because the
connection still can expire gracefully. Still, restore the
cport to 0 to allow retransmitted packet to properly fill
the cport. Problems reported by Sashiko.

[1]:
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 16, name: ktimers/0
preempt_count: 2, expected: 0
RCU nest depth: 3, expected: 3
8 locks held by ktimers/0/16:
#0: ffffffff8de5f260 (local_bh){.+.+}-{1:3}, at: __local_bh_disable_ip+0x3c/0x420 kernel/softirq.c:163
#1: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: __local_bh_disable_ip+0x3c/0x420 kernel/softirq.c:163
#2: ffff8880b8826360 (&base->expiry_lock){+...}-{3:3}, at: spin_lock include/linux/spinlock_rt.h:45 [inline]
#2: ffff8880b8826360 (&base->expiry_lock){+...}-{3:3}, at: timer_base_lock_expiry kernel/time/timer.c:1502 [inline]
#2: ffff8880b8826360 (&base->expiry_lock){+...}-{3:3}, at: __run_timer_base+0x120/0x9f0 kernel/time/timer.c:2384
#3: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:300 [inline]
#3: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
#3: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: __rt_spin_lock kernel/locking/spinlock_rt.c:50 [inline]
#3: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: rt_spin_lock+0x1e0/0x400 kernel/locking/spinlock_rt.c:57
#4: ffffc90000157a80 ((&cp->timer)){+...}-{0:0}, at: call_timer_fn+0xd4/0x5e0 kernel/time/timer.c:1745
#5: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:300 [inline]
#5: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
#5: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: ip_vs_conn_unlink net/netfilter/ipvs/ip_vs_conn.c:315 [inline]
#5: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: ip_vs_conn_expire+0x257/0x2390 net/netfilter/ipvs/ip_vs_conn.c:1260
#6: ffffffff8de5f260 (local_bh){.+.+}-{1:3}, at: __local_bh_disable_ip+0x3c/0x420 kernel/softirq.c:163
#7: ffff888068d4c3f0 (&cp->lock#2){+...}-{3:3}, at: spin_lock include/linux/spinlock_rt.h:45 [inline]
#7: ffff888068d4c3f0 (&cp->lock#2){+...}-{3:3}, at: ip_vs_conn_unlink net/netfilter/ipvs/ip_vs_conn.c:324 [inline]
#7: ffff888068d4c3f0 (&cp->lock#2){+...}-{3:3}, at: ip_vs_conn_expire+0xd4a/0x2390 net/netfilter/ipvs/ip_vs_conn.c:1260
Preemption disabled at:
[<ffffffff898a6358>] bit_spin_lock include/linux/bit_spinlock.h:38 [inline]
[<ffffffff898a6358>] hlist_bl_lock+0x18/0x110 include/linux/list_bl.h:149
CPU: 0 UID: 0 PID: 16 Comm: ktimers/0 Tainted: G W L syzkaller #0 PREEMPT_{RT,(full)}
Tainted: [W]=WARN, [L]=SOFTLOCKUP
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/18/2026
Call Trace:
<TASK>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
__might_resched+0x329/0x480 kernel/sched/core.c:9162
__rt_spin_lock kernel/locking/spinlock_rt.c:48 [inline]
rt_spin_lock+0xc2/0x400 kernel/locking/spinlock_rt.c:57
spin_lock include/linux/spinlock_rt.h:45 [inline]
ip_vs_conn_unlink net/netfilter/ipvs/ip_vs_conn.c:324 [inline]
ip_vs_conn_expire+0xd4a/0x2390 net/netfilter/ipvs/ip_vs_conn.c:1260
call_timer_fn+0x192/0x5e0 kernel/time/timer.c:1748
expire_timers kernel/time/timer.c:1799 [inline]
__run_timers kernel/time/timer.c:2374 [inline]
__run_timer_base+0x6a3/0x9f0 kernel/time/timer.c:2386
run_timer_base kernel/time/timer.c:2395 [inline]
run_timer_softirq+0xb7/0x170 kernel/time/timer.c:2405
handle_softirqs+0x1de/0x6d0 kernel/softirq.c:622
__do_softirq kernel/softirq.c:656 [inline]
run_ktimerd+0x69/0x100 kernel/softirq.c:1151
smpboot_thread_fn+0x541/0xa50 kernel/smpboot.c:160
kthread+0x388/0x470 kernel/kthread.c:436
ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>

Reported-by: syzbot+504e778ddaecd36fdd17@syzkaller.appspotmail.com
Link: https://sashiko.dev/#/patchset/20260415200216.79699-1-ja%40ssi.bg
Link: https://sashiko.dev/#/patchset/20260420165539.85174-4-ja%40ssi.bg
Link: https://sashiko.dev/#/patchset/20260422135823.50489-4-ja%40ssi.bg
Fixes: 2fa7cc9c7025 ("ipvs: switch to per-net connection table")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ipvs: fix races around the conn_lfactor and svc_lfactor sysctl vars

Sashiko warns that the new sysctls vars can be changed
after the hash tables are destroyed and their respective
resizing works canceled, leading to mod_delayed_work()
being called for canceled works.

Solve this in different ways. conn_tab can be present even
without services and is destroyed only on netns exit, so use
disable_delayed_work_sync() to disable the work instead of
adding more synchronization mechanisms.

As for the svc_table, it is destroyed when the services
are deleted, so we must be sure that netns exit is not
called yet (the check for 'enable') and the work is
not canceled by checking all under same mutex lock.

Also, use WRITE_ONCE when updating the sysctl vars as we
already read them with READ_ONCE.

Link: https://sashiko.dev/#/patchset/20260410112352.23599-1-fw%40strlen.de
Fixes: 8d7de5477e47 ("ipvs: add conn_lfactor and svc_lfactor sysctl vars")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ipvs: fixes for the new ip_vs_status info

Sashiko reports some problems for the recently added
/proc/net/ip_vs_status:

* ip_vs_status_show() as a table reader may run long after the
conn_tab and svc_table table are released. While ip_vs_conn_flush()
properly changes the conn_tab_changes counter when conn_tab is removed,
ip_vs_del_service() and ip_vs_flush() were missing such change for
the svc_table_changes counter. As result, readers like
ip_vs_dst_event() and ip_vs_status_show() may continue to use
a freed table after a cond_resched_rcu() call.

* While counting the buckets in ip_vs_status_show() make sure we
traverse only the needed number of entries in the chain. This also
prevents possible overflow of the 'count' variable.

* Add check for 'loops' to prevent infinite loops while restarting
the traversal on table change.

* While IP_VS_CONN_TAB_MAX_BITS is 20 on 32-bit platforms and
there is no risk to overflow when multiplying the number of
conn_tab buckets to 100, prefer the div_u64() helper to make
the following dividing safer.

* Use 0440 permissions for ip_vs_status to restrict the
info only to root due to the exported information for hash
distribution.

Link: https://sashiko.dev/#/patchset/20260410112352.23599-1-fw%40strlen.de
Fixes: 9a9ccef907a7 ("ipvs: add ip_vs_status info")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Merge tag 'linux_kselftest-fixes-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:

- Fix extra test number increment in ksft_exit_skip() that results in
   incorrect KTAP result

- Fix regression introduced by addition of explicit constructor orders
   for fixture tests. This addition broke the ordering of those relative
   to non-fixture tests and the reverse-constructor-order detection

* tag 'linux_kselftest-fixes-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: harness: Restore order of test functions
  selftests: kselftest: fix wrong test number in ksft_exit_skip

selftests: ovpn: reduce ping count in test.sh

The second stage of test.sh ("run baseline data traffic") performs a
basic connectivity check with ping -qfc 500 -w 3. On slower CI
instances this is too strict for TCP: the RTT is high enough that 500
echo requests do not reliably complete within 3 seconds, so the stage
flakes and the test fails even though the ovpn setup is healthy.

Reduce the packet count to 100 for both the plain and 3000-byte pings in
that stage. This still verifies peer setup, key exchange, routing, and
data-path traffic, without making the basic connectivity check depend on
timing out under load.

Fixes: 959bc330a439 ("testing/selftests: add test tool and scripts for ovpn module")
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>

ovpn: ensure packet delivery happens with BH disabled

ovpn injects decrypted packets into the netdev RX path through
ovpn_netdev_write() which invokes gro_cells_receive() and
dev_dstats_rx_add().

ovpn_netdev_write() is normally called in softirq context,
however, in case of TCP connections it may also be invoked
process context.

When this happens gro_cells_receive() will throw a warning:

  [  230.183747][   T12] WARNING: net/core/gro_cells.c:30 at gro_cells_receive+0x708/0xaa0, CPU#1: kworker/u16:0/12

and lockdep will also report a potential inconsistent lock state:

  WARNING: inconsistent lock state
  7.0.0-rc4+ #246 Tainted: G        W
  --------------------------------
  inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.

because attempts to acquire gro_cells->bh_lock by both
contexts may lead to a deadlock.

At the same time, dev_dstats_rx_add() does not expect to race
with a softirq (which may happen when invoked in process context),
because the latter may access its per-cpu state and corrupt
it.

Fix all this by invoking local_bh_disable/enable() around
gro_cells_receive() and dev_dstats_rx_add() to ensure that
bottom halves are always disabled before calling both of
them.

Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>

ovpn: reset MAC header before passing skb up

After decapsulating a packet, the skb->mac_header still points to the
outer transport header.

Fix this by calling skb_reset_mac_header() in ovpn_netdev_write() to
ensure the MAC header points to the beginning of
the inner IP/network packet, as expected by the rest of the stack.

Reported-by: Minqiang Chen <ptpt52@gmail.com>
Fixes: 8534731dbf2d ("ovpn: implement packet processing")
Signed-off-by: Qingfang Deng <qingfang.deng@linux.dev>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>

ARM: dts: imx6ul-var-som: add support for LVDS display panel

Add support for the LD configuration option (LVDS encoder assembled on SOM)
so that the LVDS display panel on the concerto EVK board works properly.
Not all VAR-SOM-6UL SOMs have the LD configuration option so factor out
this functionality to a separate dtsi.

Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

ARM: dts: imx6ul-var-som: factor out audio support

Not all boards use the audio codec, so factor out this functionality to a
separate dtsi.

Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

ARM: dts: imx6ul-var-som: add support for EC configuration option (ENET1)

ENET1 is currently disabled and not supported/working on the concerto EVK.
Add support for this optional configuration in a separate dtsi, so that it
can be selectively enabled/disabled.

Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

ARM: dts: imx6ul-var-som: factor out ENET2 ethernet support

Not all boards use the ethernet ENET2 port, so factor out this
functionality to a separate dtsi. On the concerto board, this
uses the ethernet PHY assembled on it.

Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

ARM: dts: imx6ul-var-som: add proper Wifi and Bluetooth support

Add proper support for the optional Wifi and Bluetooth configuration on
VAR-SOM-6UL so that it works out of the box, without any custom scripts.
The Wifi/BT module support is mutually exclusive with SD card interface.

Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

ARM: dts: imx6ul-var-som: factor out SD card support

Move SD support to a separate dtsi, since it cannot be used at the
same time as the Wifi/BT module. Also not all boards support the SD card.
Move pinctrl_usdhc1* to the common imx6ul-var-som-common dtsi so that it
can be used by the future Wifi/BT dtsi.

Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

ARM: dts: imx6ul-var-som-concerto: order DT properties

reorder pinctrl_gpio_leds to respect alphabetical order.

Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

ARM: dts: imx6ul-var-som-concerto: Factor out common parts for all CPU variants

Export common parts to the Variscite VAR-SOM-6UL dtsi so that they can be
reused on other boards and to simplify adding future dedicated device tree
files for each CPU variant.

Move i2c1 pinctrl to var-som dtsi pinmux, so that it can be reused by other
boards.

Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

ARM: dts: imx6ul-var-som: Factor out common parts for all CPU variants

Factor out the parts on the Variscite VAR-SOM-6UL [1] that are common to
all CPU variants (6UL, 6ULL, etc). This will simplify adding future
dedicated device tree files for each CPU variant.

Link https://dev.variscite.com/var-som-6ul [1]

Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

dt-bindings: arm: fsl: add variscite,var-som-imx6ull

Add support for the imx6ull CPU variant of the Variscite concerto
board evaluation kit with a VAR-SOM-6UL.

Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

dt-bindings: arm: fsl: change incorrect VAR-SOM-MX6UL references

There is no Variscite module named VAR-SOM-MX6UL, but there is VAR-SOM-MX6
and also VAR-SOM-6UL, so it is confusing at first to know to which one it
refers to. The imx6ul-var-som* dts/dtsi supports only the VAR-SOM-6UL [1],
not VAR-SOM-MX6 [2], so modify comments and model descriptions accordingly.

Link https://dev.variscite.com/var-som-6ul [1]
Link: https://dev.variscite.com/var-som-mx6
Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://lore.kernel.org/all/20260305180651.1827087-4-hugo@hugovil.com/
Signed-off-by: Frank Li <Frank.Li@nxp.com>

ARM: dts: imx6ul-var-som: change incorrect VAR-SOM-MX6UL references

There is no Variscite module named VAR-SOM-MX6UL, but there is VAR-SOM-MX6
and also VAR-SOM-6UL, so it is confusing at first to know to which one it
refers to. The imx6ul-var-som* dts/dtsi supports only the VAR-SOM-6UL [1],
not VAR-SOM-MX6 [2], so modify comments and model descriptions accordingly.

Link https://dev.variscite.com/var-som-6ul [1]
Link: https://dev.variscite.com/var-som-mx6
Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

ARM: dts: imx6ul-var-som: fix warning for boolean property with a value

dmesg warning:

  OF: /soc/bus@2000000/ethernet@20b4000/mdio/ethernet-phy@3: Read of
      boolean property 'micrel,rmii-reference-clock-select-25-mhz' with a
      value.

Using of_property_read_bool() for non-boolean properties is deprecated and
results in a warning during runtime since commit c141ecc3cecd
("of: Warn when of_property_read_bool() is used on non-boolean properties")

micrel,rmii-reference-clock-select-25-mhz is a boolean property and should
not have a value, so remove it.

Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

ARM: dts: imx6ul-var-som: fix warning for non-existent dc-supply property

The dc-supply property is non-existent in Linux now, nor when this DTS file
was created when importing it from Variscite own kernel.

Therefore remove it to fix this warning:

    imx6ul-var-som-concerto.dtb: cpu@0 (arm,cortex-a7): Unevaluated
    properties are not allowed ('dc-supply' was unexpected)
        from schema $id: http://devicetree.org/schemas/arm/cpus.yaml

Fixes: 9d6a67d9c7a9 ("ARM: dts: imx6ul: Add Variscite VAR-SOM-MX6UL SoM support")
Cc: stable@kernel.org
Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

arm64: dts: add support for NXP i.MX95 15x15 audio board (version 2)

i.MX Audio Board is a configurable and functional audio processing
platform. Integrating a variety of audio input and output interfaces into
the system, the i.MX Audio Board supports HDMI input, HDMI eARC,
S/PDIF I/O, 2-ch ADC line-in, 24-ch DAC line-out and more. Based on these
features, rich audio application cases can be realized.

This is a basic device tree supporting with i.MX95 15x15 SoC and Audio
board (version 2).

- Six Cortex-A55
- NXP PCAL6416 GPIO expanders
- RGB LEDs via GPIO expander
- LPI2C1, LPI2C2, LPI2C3 controllers
- LPUART1 (console)
- USDHC1 (eMMC), USDHC2 (SD)
- Three DAC (AK4458)
- One ADC (AK5552)

Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

dt-bindings: arm: fsl: Add compatible for i.MX95 15x15 audio board (version 2)

Introduce a new DT compatible string for the NXP i.MX95 15x15 audio board
(version 2).

i.MX Audio Board is a configurable and functional audio processing
platform. Integrating a variety of audio input and output interfaces into
the system, the i.MX Audio Board supports HDMI input, HDMI eARC,
S/PDIF I/O, 2-ch ADC line-in, 24-ch DAC line-out and more. Based on these
features, rich audio application cases can be realized.

Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

sched_ext: Add __printf format attributes to scx_vexit() and bstr formatters

scx_vexit() forwards (fmt, args) to vscnprintf(); bstr_format() and
__bstr_format() forward fmt to bstr_printf(); the BPF kfunc wrappers
scx_bpf_exit_bstr(), scx_bpf_error_bstr() and scx_bpf_dump_bstr() in
turn forward fmt to those formatters. None of them have __printf(),
so clang -Wmissing-format-attribute fires on the forwarded calls and
C-side callers don't get format-string checking.

Annotate the six functions with __printf(N, 0) matching the fmt
parameter position in each.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/all/202605041112.Y6OG7v9r-lkp@intel.com/
Signed-off-by: Tejun Heo <tj@kernel.org>

nilfs2: reject CLEAN_SEGMENTS ioctl with out-of-range segment numbers

Syzbot reported a hung task in nilfs_transaction_begin() where multiple
tasks performing chmod() on a nilfs2 mount blocked for over 143 seconds
waiting to acquire ns_segctor_sem for read:

  INFO: task syz.0.17:5918 blocked for more than 143 seconds.
  Call Trace:
   schedule+0x164/0x360
   rwsem_down_read_slowpath+0x6d9/0x940
   down_read+0x99/0x2e0
   nilfs_transaction_begin+0x364/0x710 fs/nilfs2/segment.c:221
   nilfs_setattr+0x124/0x2c0 fs/nilfs2/inode.c:921
   notify_change+0xc1a/0xf40
   chmod_common+0x273/0x4a0
   do_fchmodat+0x12d/0x230

The writer holding ns_segctor_sem was a concurrent
NILFS_IOCTL_CLEAN_SEGMENTS caller, stuck inside printk while emitting
per-element warnings from nilfs_sufile_updatev():

   __nilfs_msg+0x373/0x450 fs/nilfs2/super.c:78
   nilfs_sufile_updatev+0x21c/0x6d0 fs/nilfs2/sufile.c:186
   nilfs_sufile_freev fs/nilfs2/sufile.h:93 [inline]
   nilfs_free_segments fs/nilfs2/segment.c:1140 [inline]
   nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1261 [inline]
   nilfs_segctor_do_construct+0x1f55/0x76c0
   nilfs_clean_segments+0x3bd/0xa50
   nilfs_ioctl_clean_segments fs/nilfs2/ioctl.c:922 [inline]
   nilfs_ioctl+0x261f/0x2780

The root cause is that user-supplied segment numbers are not validated
before nilfs_clean_segments() begins doing work; the range check on
each segnum is performed deep inside the call chain by
nilfs_sufile_updatev(), which emits a nilfs_warn() per invalid entry
while still holding the segctor lock and the sufile mi_sem.  Under load
(repeated invocations across multiple mounts saturating the global
printk path), the cumulative printk latency keeps ns_segctor_sem held
long enough to trip the hung_task watchdog, blocking concurrent
operations such as chmod() that need ns_segctor_sem for read.

Fix by validating the contents of kbufs[4] in nilfs_clean_segments()
immediately after acquiring ns_segctor_sem via nilfs_transaction_lock().
Holding ns_segctor_sem serializes the check against
nilfs_ioctl_resize(), which can modify ns_nsegments, so the validation
uses a consistent value.  Out-of-range segment numbers are rejected
with -EINVAL before any segment-cleaning work begins, so the bad
entries never reach the per-element diagnostic path inside
nilfs_sufile_updatev().

Reported-by: syzbot+62f0f99d2f2bb8e3bbd7@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=62f0f99d2f2bb8e3bbd7
Tested-by: syzbot+62f0f99d2f2bb8e3bbd7@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Fixes: 071cb4b81987 ("nilfs2: eliminate removal list of segments")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>

docs: cgroup-v1: Update charge-commit section

Commit 1d8f136a421f ("memcg/hugetlb: remove memcg hugetlb
try-commit-cancel protocol") removed mem_cgroup_commit_charge() and
mem_cgroup_cancel_charge(), but the docs still refer to those functions.
There is no longer any charge cancellation.

Update the docs to match the code.

Signed-off-by: T.J. Mercier <tjmercier@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

sched_ext: idle: Recheck prev_cpu after narrowing allowed mask

scx_select_cpu_dfl() narrows @allowed to @cpus_allowed & @p->cpus_ptr
when the BPF caller supplies a @cpus_allowed that differs from
@p->cpus_ptr and @p doesn't have full affinity. However,
@is_prev_allowed was computed against the original (wider)
@cpus_allowed, so the prev_cpu fast paths could pick a @prev_cpu that
is in @cpus_allowed but not in @p->cpus_ptr, violating the intended
invariant that the returned CPU is always usable by @p. The kernel
masks this via the SCX_EV_SELECT_CPU_FALLBACK fallback, but the
behavior contradicts the documented contract.

Move the @is_prev_allowed evaluation past the narrowing block so it
tests against the final @allowed mask.

Fixes: ee9a4e92799d ("sched_ext: idle: Properly handle invalid prev_cpu during idle selection")
Cc: stable@vger.kernel.org # v6.16+
Assisted-by: Claude <noreply@anthropic.com>
Signed-off-by: David Carlier <devnexen@gmail.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

sched_ext: Normalize exit dump header to "on CPU N"

Unify to uppercase to match the UEI output.

Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

sched_ext: Remove redundant rcu_read_lock/unlock() in sysrq_handle_sched_ext_reset()

sysrq_handle_sched_ext_reset() is called from __handle_sysrq(), which
already holds rcu_read_lock() while invoking the sysrq handler. Remove
the redundant rcu_read_lock/unlock() pair.

Signed-off-by: Zqiang <qiang.zhang@linux.dev>
Reviewed-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

drm/sti: remove bridge when sti_hda component_add fails

Use devm_drm_bridge_add() so the bridge is released if probe fails after
registration, and drop the manual drm_bridge_remove() in remove().

Check the return value of devm_drm_bridge_add().

Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com>
Fixes: d28726efc637 ("drm/sti: hda: add bridge before attaching")
Cc: stable@vger.kernel.org
Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Acked-by: Raphaël Gallais-Pou <rgallaispou@gmail.com>
Link: https://patch.msgid.link/20260423200622.325076-1-osama.abdelkader@gmail.com
Signed-off-by: Raphael Gallais-Pou <raphael.gallais-pou@foss.st.com>

docs: kselftest: Document the FORCE_TARGETS build variable

FORCE_TARGETS has been part of the kselftest build system for
some time but is absent from the developer documentation. Without
an entry here, users relying on kselftest in CI pipelines would
have to read the selftests Makefile directly to discover the
option.

A build that exits zero despite some targets failing can mask
real breakage and mislead automated systems into reporting
success. Add a dedicated section so that CI authors can easily
find and adopt FORCE_TARGETS=1 to turn such silent partial
failures into hard errors.

Link: https://lore.kernel.org/r/20260417-selftests-docs-v1-1-32e4a78214eb@suse.com
Signed-off-by: Ricardo B. Marlière <rbm@suse.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

isa: switch to dynamic root device

Driver core expects devices to be dynamically allocated and will, for
example, complain loudly if a device that lacks a release function is
ever freed.

Use root_device_register() to allocate and register the root device
instead of open coding using a static device.

Note that this also fixes a reference leak in case device_register()
fails which may be flagged by static checkers.

Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: William Breathitt Gray <wbg@kernel.org>
Link: https://patch.msgid.link/20260424102400.2615677-1-johan@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>

Merge tag 'for-linus-7.1-2' of https://github.com/cminyard/linux-ipmi

Pull IPMI fixes from Corey Minyard:
"Fix a number of issues that came up recently

  The first two fixes are workarounds for buggy IPMI hardware. The
  hardware says it has data for the IPMI driver to read constantly, so
  the driver reads the data constantly, causing any new requests to be
  blocked.

  The first fix was to check for invalid data right when the data was
  read from the device and stop the operation there (there was a later
  check for invalid data, but it could not stop the operation at that
  point). It turned out the device was providing good data, so that
  didn't fix the issue, but it's still a good check.

  The second fix stops fetching this data after a few fetches and allows
  other operations to occur. The driver won't work very well, but at
  least it won't wedge. This seems to fix the issue.

  The third issue is a problem I spotted while working on the previous
  issue where if a certain memory allocation failed the driver would
  stop working.

  The fourth issue is a problem was a missing set to NULL on a PTR_ERR()
  return, introduced in the previous series for 7.1"

* tag 'for-linus-7.1-2' of https://github.com/cminyard/linux-ipmi:
  ipmi:ssif: NULL thread on error
  ipmi:si: Return state to normal if message allocation fails
  ipmi: Add limits to event and receive message requests
  ipmi: Check event message buffer response for bad data

driver core: class: fix typo in struct class documentation

Fix a spelling error in the comment for the ns_type member of struct class.
Change "detemine" to "determine".

Signed-off-by: Prabhudasu Vatala <prabhudasuvatala@gmail.com>
Link: https://patch.msgid.link/20260503141826.27462-1-prabhudasuvatala@gmail.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>

kunit: Fix spelling mistakes in comments and messages

Fix two spelling mistakes in kunit tooling:
Bascially -> Basically
higer -> higher

Link: https://lore.kernel.org/r/20260501162739.3861-1-always.starving0@gmail.com
Signed-off-by: Jinseok Kim <always.starving0@gmail.com>
Reviewed-by: David Gow <david@davidgow.net>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

kunit: qemu_configs: Add or1k / openrisc configuration

Add a basic configuration to run kunit tests on or1k / openrisc.

Link: https://lore.kernel.org/r/20260427-kunit-or1k-v1-2-9d3109e991e8@weissschuh.net
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: David Gow <david@davidgow.net>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

gen_compile_commands: Ignore libgcc.a

Some architectures link libgcc.a from the toolchain into the kernel.
gen_compile_commands trie to read the kbuild .cmd files of its
constituent object files, which are not available.

Flat out ignore libgcc.a, as it is not built as part of the kernel
anyways.

Link: https://lore.kernel.org/r/20260427-kunit-or1k-v1-1-9d3109e991e8@weissschuh.net
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

sched_ext: Skip past-sched_ext_dead() tasks in scx_task_iter_next_locked()

scx_task_iter's cgroup-scoped mode can return tasks whose
sched_ext_dead() has already completed: cgroup_task_dead() removes
from cset->tasks after sched_ext_dead() in finish_task_switch() and is
irq-work deferred on PREEMPT_RT. The global mode is fine -
sched_ext_dead() removes from scx_tasks via list_del_init() first.

Callers (sub-sched enable prep/abort/apply, scx_sub_disable(),
scx_fail_parent()) assume returned tasks are still on @sch and trip
WARN_ON_ONCE() or operate on torn-down state otherwise.

Set %SCX_TASK_OFF_TASKS in sched_ext_dead() under @p's rq lock and
have scx_task_iter_next_locked() skip flagged tasks under the same
lock. Setter and reader serialize on the per-task rq lock - no race.

Signed-off-by: Tejun Heo <tj@kernel.org>

cgroup, sched_ext: Include exiting tasks in cgroup iter

a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup") made
css_task_iter_advance() skip exiting tasks so cgroup.procs stays consistent
with waitpid() visibility. Unfortunately, this broke scx_task_iter.

scx_task_iter walks either scx_tasks (global) or a cgroup subtree via
css_task_iter() and the two modes are expected to cover the same set of
tasks. After the above change the cgroup-scoped mode silently skips tasks
past exit_signals() that are still on scx_tasks.

scx_sub_enable_workfn()'s abort path is one of the symptoms: an exiting
SCX_TASK_SUB_INIT task can race past the cgroup iter leaking
__scx_init_task() state. Other iterations share the same gap.

Add CSS_TASK_ITER_WITH_DEAD to opt out of the skip and use it from
scx_task_iter().

Fixes: b0e4c2f8a0f0 ("sched_ext: Implement cgroup subtree iteration for scx_task_iter")
Reported-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

MAINTAINERS: Update mail for Peter Rosin

I'm resigning from my position at Axentia.

Signed-off-by: Peter Rosin <peda@lysator.liu.se>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>

cgroup: Defer css percpu_ref kill on rmdir until cgroup is depopulated

A chain of commits going back to v7.0 reworked rmdir to satisfy the
controller invariant that a subsystem's ->css_offline() must not run while
tasks are still doing kernel-side work in the cgroup.

[1] d245698d727a ("cgroup: Defer task cgroup unlink until after the task is done switching out")
[2] a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup")
[3] 1b164b876c36 ("cgroup: Wait for dying tasks to leave on rmdir")
[4] 4c56a8ac6869 ("cgroup: Fix cgroup_drain_dying() testing the wrong condition")
[5] 13e786b64bd3 ("cgroup: Increment nr_dying_subsys_* from rmdir context")

[1] moved task cset unlink from do_exit() to finish_task_switch() so a
task's cset link drops only after the task has fully stopped scheduling.
That made tasks past exit_signals() linger on cset->tasks until their final
context switch, which led to a series of problems as what userspace expected
to see after rmdir diverged from what the kernel needs to wait for. [2]-[5]
tried to bridge that divergence: [2] filtered the exiting tasks from
cgroup.procs; [3] had rmdir(2) sleep in TASK_UNINTERRUPTIBLE for them; [4]
fixed the wait's condition; [5] made nr_dying_subsys_* visible
synchronously.

The cgroup_drain_dying() wait in [3] turned out to be a dead end. When the
rmdir caller is also the reaper of a zombie that pins a pidns teardown (e.g.
host PID 1 systemd reaping orphan pids that were re-parented to it during
the same teardown), rmdir blocks in TASK_UNINTERRUPTIBLE waiting for those
pids to free, the pids can't free because PID 1 is the reaper and it's stuck
in rmdir, and the system A-A deadlocks. No internal lock ordering breaks
this; the wait itself is the bug.

The css killing side that drove the original reorder, however, can be made
cleanly asynchronous: ->css_offline() is already async, run from
css_killed_work_fn() driven by percpu_ref_kill_and_confirm(). The fix is to
make that chain start only after all tasks have left the cgroup. rmdir's
user-visible side then returns as soon as cgroup.procs and friends are
empty, while ->css_offline() still runs only after the cgroup is fully
drained.

Verified by the original reproducer (pidns teardown + zombie reaper, runs
under vng) which hangs vanilla and succeeds here, and by per-commit
deterministic repros for [2], [3], [4], [5] with a boot parameter that
widens the post-exit_signals() window so each state is reliably reachable.
Some stress tests on top of that.

cgroup_apply_control_disable() has the same shape of pre-existing race:
when a controller is disabled via subtree_control, kill_css() ran
synchronously while tasks past exit_signals() could still be linked to
the cgroup's csets, and ->css_offline() could fire before they drained.
This patch preserves the existing synchronous behavior at that call site
(kill_css_sync() + kill_css_finish() back-to-back) and a follow-up patch
will defer kill_css_finish() there using a per-css trigger.

This seems like the right approach and I don't see problems with it. The
changes are somewhat invasive but not excessively so, so backporting to
-stable should be okay. If something does turn out to be wrong, the fallback
is to revert the entire chain ([1]-[5]) and rework in the development branch
instead.

v2: Pin cgrp across the deferred destroy work with explicit
    cgroup_get()/cgroup_put() around queue_work() and the work_fn. v1
    wasn't actually broken (ordered cgroup_offline_wq + queue_work order
    in cgroup_task_dead() saved it) but the explicit ref removes the
    dependency on those non-obvious invariants. Also note the
    pre-existing cgroup_apply_control_disable() race in the description;
    a follow-up will defer kill_css_finish() there.

Fixes: 1b164b876c36 ("cgroup: Wait for dying tasks to leave on rmdir")
Cc: stable@vger.kernel.org # v7.0+
Reported-and-tested-by: Martin Pitt <martin@piware.de>
Link: https://lore.kernel.org/all/afHNg2VX2jy9bW7y@piware.de/
Link: https://lore.kernel.org/all/35e0670adb4abeab13da2c321582af9f@kernel.org/
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

memory: tegra: Add Tegra114 EMC driver

Introduce driver for the External Memory Controller (EMC) found in
Tegra114 SoC. It controls the external DRAM on the board. The purpose of
this driver is to program memory timing for external memory on the EMC
clock rate change.

Signed-off-by: Svyatoslav Ryhel <clamor95@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Link: https://patch.msgid.link/20260427070312.81679-5-clamor95@gmail.com
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>

memory: tegra: Implement EMEM regs and ICC ops for Tegra114

Prepare Internal Memory Controller for introduction of External Memory
Controller.

Signed-off-by: Svyatoslav Ryhel <clamor95@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Link: https://patch.msgid.link/20260427070312.81679-3-clamor95@gmail.com
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>

Merge branch 'for-v7.2/tegra114-mc-bindings' into mem-ctrl-next

dt-bindings: memory: Document Tegra114 External Memory Controller

Include Tegra114 support into existing Tegra124 EMC schema with the most
notable difference being the amount of EMC timings and a few SoC unique
entries.

Signed-off-by: Svyatoslav Ryhel <clamor95@gmail.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260427070312.81679-4-clamor95@gmail.com
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>

dt-bindings: memory: Document Tegra114 Memory Controller

Add Tegra114 support into existing Tegra124 MC schema with the most
notable difference in the amount of EMEM timings.

Each memory client has unique hardware ID, add these IDs.

Signed-off-by: Svyatoslav Ryhel <clamor95@gmail.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260427070312.81679-2-clamor95@gmail.com
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>

kunit: config: KUNIT_DEBUGFS should depend on DEBUG_FS

CONFIG_KUNIT_DEBUGFS is totally useless without debugfs, so it should
depend on CONFIG_DEBUG_FS.

Link: https://lore.kernel.org/r/20260425034155.53913-2-david@davidgow.net
Fixes: e2219db280e3 ("kunit: add debugfs /sys/kernel/debug/kunit/<suite>/results display")
Signed-off-by: David Gow <david@davidgow.net>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

kunit: config: Enable KUNIT_DEBUGFS by default

The KUNIT_DEBUGFS option is currently enabled based on the value of
KUNIT_ALL_TESTS, but it really doesn't have anything to do with the set of
enabled tests, so just enable it by default anyway. In particular, this
shouldn't be only visible if KUNIT_ALL_TESTS is set, which is quite
confusing.

Link: https://lore.kernel.org/r/20260425034155.53913-1-david@davidgow.net
Fixes: beaed42c427d ("kunit: default KUNIT_* fragments to KUNIT_ALL_TESTS")
Signed-off-by: David Gow <david@davidgow.net>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

memory: tegra: Add Tegra238 MC support

Add Memory Controller driver support for Tegra238 SOC, including:
- MC client definitions with Tegra238-specific stream IDs
- Reuse of Tegra234 ICC operations for bandwidth management via BPMP-FW
- Device tree compatible string "nvidia,tegra238-mc"

Export tegra234_mc_icc_ops so it can be shared with the Tegra238 MC
driver, as both SoCs use the same ICC aggregation and bandwidth
management logic.

Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20260427073419.567360-3-amhetre@nvidia.com
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>