David Thompson [Thu, 28 May 2026 16:50:17 +0000 (16:50 +0000)]
net: lan743x: avoid netdev-based logging before netdev registration
This patch updates the lan743x driver to prevent the use of netdev-based
logging APIs (such as netdev_dbg) before the network device has been
successfully registered. Using netdev-based logging prior to registration
results in log messages referencing "(unnamed net_device) (uninitialized)",
which can be confusing and less informative.
The driver must use netif_msg_ APIs and device-based logging (e.g. dev_dbg)
until netdev registration is complete. This ensures log entries are
associated with the correct device context and improves log clarity. After
registration, netdev-based logging APIs can be used safely.
net: wwan: t7xx: Add delay between MD and SAP suspend
SAP (Service Access Point) suspend occasionally times out with error
-110 (ETIMEDOUT), followed by modem port errors and complete modem
failure requiring a system reboot to recover.
Error symptoms:
mtk_t7xx 0000:72:00.0: [PM] SAP suspend error: -110
mtk_t7xx 0000:72:00.0: can't suspend (...returned -110)
mtk_t7xx 0000:07:00.0: Failed to send skb: -22
mtk_t7xx 0000:07:00.0: Write error on MBIM port, -22
The modem firmware needs time after receiving the MD (modem) suspend
request to complete internal operations before it is ready to accept
the SAP suspend request. Without this delay, if runtime PM attempts
to suspend while the firmware is busy, the SAP suspend command times
out, leaving the modem in an unrecoverable state.
Root cause and userspace interaction:
ModemManager 1.24+ includes changes that reduce the likelihood of this
issue by ensuring the modem is in a low-power state before the kernel
attempts runtime suspend. However, the kernel driver should not depend
on specific userspace behavior or ModemManager versions. Older versions
(1.20-1.22) are still widely deployed, and the kernel should be robust
regardless of userspace implementation details.
There appears to be no hardware status register or other mechanism
available to query whether the firmware is ready for SAP suspend.
A delay between the two suspend requests is the most reliable solution
found through testing.
Add a 50ms delay between MD suspend and SAP suspend. This gives the
firmware adequate time to complete internal operations without adding
significant latency to the suspend path. This makes the driver robust
across all ModemManager versions and system conditions.
Testing: 96+ hours of continuous operation with ModemManager 1.20.2
and Fibocom FM350-GL modem. Zero SAP suspend timeouts observed across
2000+ successful suspend/resume cycles. Previously failed within
24 hours with 100% reproducibility.
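The ordering described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct suspend_ops`, `struct fake_modem` and its `busy_ms` field are hypothetical names that do not come from the driver, and the delay is injected as a callback so the sketch stays deterministic (a kernel driver would sleep instead). Only the 50 ms value and the -110 (ETIMEDOUT) error come from the commit message.

```c
#include <errno.h>
#include <stddef.h>

#define MD_TO_SAP_DELAY_MS 50u /* delay value from the commit message */

/* Hypothetical callbacks standing in for the firmware requests. */
struct suspend_ops {
	int (*md_suspend)(void *ctx);
	int (*sap_suspend)(void *ctx);
	void (*delay_ms)(void *ctx, unsigned int ms);
};

/* Suspend MD first, wait, then suspend SAP. */
static int suspend_md_then_sap(const struct suspend_ops *ops, void *ctx)
{
	int ret = ops->md_suspend(ctx);

	if (ret)
		return ret;
	ops->delay_ms(ctx, MD_TO_SAP_DELAY_MS);
	return ops->sap_suspend(ctx);
}

/* Fake modem: SAP suspend times out while the firmware is still busy. */
struct fake_modem {
	unsigned int now_ms;          /* simulated clock */
	unsigned int md_suspended_at; /* time of the MD suspend request */
	unsigned int busy_ms;         /* hypothetical firmware busy time */
};

static int fake_md_suspend(void *ctx)
{
	struct fake_modem *m = ctx;

	m->md_suspended_at = m->now_ms;
	return 0;
}

static void fake_delay_ms(void *ctx, unsigned int ms)
{
	struct fake_modem *m = ctx;

	m->now_ms += ms; /* advance the simulated clock instead of sleeping */
}

static int fake_sap_suspend(void *ctx)
{
	struct fake_modem *m = ctx;

	if (m->now_ms - m->md_suspended_at < m->busy_ms)
		return -ETIMEDOUT;
	return 0;
}

static const struct suspend_ops fake_ops = {
	.md_suspend = fake_md_suspend,
	.sap_suspend = fake_sap_suspend,
	.delay_ms = fake_delay_ms,
};
```

With a fake firmware that stays busy for 30 ms, `suspend_md_then_sap(&fake_ops, &modem)` succeeds; a fake that stays busy longer than the delay still times out, which is why the delay value had to be found through testing.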
Petr Wozniak [Wed, 27 May 2026 05:39:09 +0000 (07:39 +0200)]
net: phy: sfp: probe for RollBall I2C-to-MDIO bridge in mdio-i2c
The "OEM"/"SFP-10G-T" quirk entry in sfp_fixup_rollball_cc()
unconditionally forces MDIO_I2C_ROLLBALL for all modules matching that
vendor/part-number combination. This works for modules that genuinely
implement a RollBall I2C-to-MDIO bridge, but silently breaks modules
that share the same EEPROM strings without having such a bridge.
The Realtek RTL8261BE-CG is one such module: a pure copper 10G SFP+
media converter with no I2C-to-MDIO bridge. Its EEPROM reports
vendor="OEM", part="SFP-10G-T-I", and -- critically -- Vendor OUI
00:00:00, making OUI-based differentiation impossible. With
MDIO_I2C_ROLLBALL forced, the module silently ACKs the unlock password
write, the MDIO bus is created, but no PHY responds; the SFP state
machine cycles through the RollBall PHY-probe retry window before
reporting no PHY.
Move the probe into i2c_mii_init_rollball() in mdio-i2c.c, where the
RollBall protocol constants are already defined. After sending the
unlock password, issue a CMD_READ and poll for CMD_DONE up to 200 ms
(10 x 20 ms, matching the existing rollball poll tolerance). A genuine
RollBall bridge asserts CMD_DONE within that window; modules without a
bridge never do, so i2c_mii_init_rollball() returns -ENODEV.
mdio_i2c_alloc() propagates -ENODEV to the caller to signal that no
bridge is present and PHY probing should be skipped.
sfp_sm_add_mdio_bus() catches -ENODEV and transitions
sfp->mdio_protocol to MDIO_I2C_NONE so the rest of the state machine
skips PHY probing for this module.
Any I2C-level error (NACK, timeout) during the probe is also treated as
-ENODEV: if the module does not respond at I2C address 0x51 at all,
there is certainly no RollBall bridge there, and SFP initialization
should not abort.
The probe writes are safe with respect to SFP EEPROM integrity: only
modules explicitly listed in the quirk table enter this path, and the
RollBall password unlock write to 0x51 was already issued by
i2c_mii_init_rollball() before the probe for all such modules. Any
module without a device at 0x51 NACKs the transfer and is treated as
-ENODEV.
Add "OEM"/"SFP-10G-T-I" to the quirk table so RTL8261BE modules enter
the probe path; genuine RollBall modules continue to work as before.
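The probe loop described above (poll for CMD_DONE up to 10 x 20 ms, treat any I2C error as "no bridge") can be sketched in userspace C as follows. The names `struct probe_io`, `rollball_probe()` and `struct fake_module` are hypothetical, and whether the real code sleeps before or after each read is an assumption; only the poll budget and the -ENODEV result come from the commit message.

```c
#include <errno.h>
#include <stddef.h>

#define PROBE_POLL_TRIES   10 /* 10 x 20 ms, per the commit message */
#define PROBE_POLL_STEP_MS 20

/* Hypothetical I/O hooks so the sketch runs without hardware. */
struct probe_io {
	/* 1: CMD_DONE seen, 0: not yet, negative errno: I2C-level error */
	int (*cmd_done)(void *ctx);
	void (*sleep_ms)(void *ctx, unsigned int ms);
};

/* Returns 0 if a bridge answered within the window, -ENODEV otherwise. */
static int rollball_probe(const struct probe_io *io, void *ctx)
{
	for (int i = 0; i < PROBE_POLL_TRIES; i++) {
		int ret;

		io->sleep_ms(ctx, PROBE_POLL_STEP_MS);
		ret = io->cmd_done(ctx);
		if (ret < 0)
			return -ENODEV; /* NACK or timeout: no bridge here */
		if (ret > 0)
			return 0;       /* genuine bridge */
	}
	return -ENODEV;                 /* never asserted CMD_DONE */
}

/* Fake module used only for illustration. */
struct fake_module {
	unsigned int now_ms; /* simulated clock */
	int done_after_ms;   /* negative: CMD_DONE is never asserted */
	int i2c_err;         /* non-zero: every read fails with this errno */
};

static int fake_cmd_done(void *ctx)
{
	struct fake_module *m = ctx;

	if (m->i2c_err)
		return m->i2c_err;
	return m->done_after_ms >= 0 &&
	       m->now_ms >= (unsigned int)m->done_after_ms;
}

static void fake_sleep_ms(void *ctx, unsigned int ms)
{
	struct fake_module *m = ctx;

	m->now_ms += ms;
}

static const struct probe_io fake_io = {
	.cmd_done = fake_cmd_done,
	.sleep_ms = fake_sleep_ms,
};
```

A caller in the style of sfp_sm_add_mdio_bus() would then treat -ENODEV as "no MDIO bus" rather than as a fatal error.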
Jakub Kicinski [Tue, 2 Jun 2026 02:11:17 +0000 (19:11 -0700)]
Merge branch 'mv88e6xxx-serdes-on-mv88e6321'
Fidan Aliyeva says:
====================
mv88e6xxx: SERDES on mv88e6321
This patch series adds support for the SERDES feature of the mv88e6321
variant of the Marvell mv88e6xxx series. The mv88e6321 has 2 ports that
support high-speed SERDES, but the driver lacks support for them.
The mv88e6321 has a similar architecture to the mv88e6352, making it
possible to reuse its pcs functions. That's why the patch series consists of
2 parts:
1. Refactor the serdes functions and pcs_init of mv88e6352 to be more
generic (patches 1-2).
2. Add the SERDES support for mv88e6321 reusing 6352's pcs functions
The final code has been tested on mv88e6321 ethernet device directly by ip
ping tests, performance tests and also verifying the switch's expected
register values.
Referred document: 88E6321/88E6320 Functional Specification
====================
Fidan Aliyeva [Thu, 28 May 2026 21:03:10 +0000 (23:03 +0200)]
mv88e6xxx: Add SERDES Support for mv88e6321
Add serdes and pcs_ops functions for mv88e6321. On the mv88e6321,
2 ports support serdes functionality: port 0 and port 1. These ports are
serdes-only ports.
Changes:
1. Add a function that returns the lane address for the port based on
cmode.
2. Reuse mv88e6352's serdes_get_regs* and pcs_init functions for mv88e6321.
Tested on mv88e6321 switch port 0.
Co-developed-by: Thomas Eckerman <thomas.eckerman.ext@ericsson.com> Signed-off-by: Thomas Eckerman <thomas.eckerman.ext@ericsson.com> Signed-off-by: Fidan Aliyeva <fidan.aliyeva.ext@ericsson.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260528210310.1365858-4-fidan.aliyeva.ext@ericsson.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fidan Aliyeva [Thu, 28 May 2026 21:03:09 +0000 (23:03 +0200)]
mv88e6xxx: Refactor 6352's serdes functions
Changes:
1. Replace the serdes check done via mv88e6352_g2_scratch_port_has_serdes in
the mv88e6352_pcs_init function with a call to mv88e6xxx_serdes_get_lane,
making it more generic.
2. Replace serdes checks in mv88e6352_serdes_get_* functions with
mv88e6xxx_serdes_get_lane making them more generic.
3. Add lane argument to mv88e6352_serdes_read so it can be reused later for
6321.
Co-developed-by: Thomas Eckerman <thomas.eckerman.ext@ericsson.com> Signed-off-by: Thomas Eckerman <thomas.eckerman.ext@ericsson.com> Signed-off-by: Fidan Aliyeva <fidan.aliyeva.ext@ericsson.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260528210310.1365858-3-fidan.aliyeva.ext@ericsson.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fidan Aliyeva [Thu, 28 May 2026 21:03:08 +0000 (23:03 +0200)]
mv88e6xxx: Add mv88e6352_serdes_get_lane
Changes:
1. Add mv88e6352_serdes_get_lane function which checks if the port
supports SERDES by calling mv88e6352_g2_scratch_port_has_serdes. Then
returns the address of the SERDES lane.
2. Add this function as .serdes_get_lane member to all the chip
versions which use mv88e6352_pcs_init.
Co-developed-by: Thomas Eckerman <thomas.eckerman.ext@ericsson.com> Signed-off-by: Thomas Eckerman <thomas.eckerman.ext@ericsson.com> Signed-off-by: Fidan Aliyeva <fidan.aliyeva.ext@ericsson.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260528210310.1365858-2-fidan.aliyeva.ext@ericsson.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The Realtek Otto switch platform consists of four different series
- RTL838x aka maple : 28 port 1G Switches
- RTL839x aka cypress : 52 port 1G Switches
- RTL930x aka longan : 28 port 1G/2.5G/10G Switches
- RTL931x aka mango : 56 port 1G/2.5G/10G Switches
After establishing basic groundwork for multi device support, this series
harmonizes the command handling of the MDIO driver. It is the second step
to allow easier integration of the non RTL930x SoCs into this driver.
====================
net: mdio: realtek-rtl9300: use command runner for read_c22()
Convert the final missing read_c22() path to the new read enabled command
runner. Do it the same way as other implementations.
- bus calls otto_emdio_read_c22()
- this hands over to SoC specific otto_emdio_9300_read_c22()
- finally the registers are filled and the runner issued
With this cleanup remove the obsolete helper otto_emdio_wait_ready()
net: mdio: realtek-rtl9300: use command runner for read_c45()
Convert the read_c45() path to the new command runner. This needs the
additional helper otto_emdio_read_cmd() that can issue the command runner
and process a read operation. It is basically nothing more than
- run the command
- read the command result through the I/O register
With this in place convert the read_c45() like the already existing write
C22/C45 implementation.
- bus calls otto_emdio_read_c45()
- this hands over to SoC specific otto_emdio_9300_read_c45()
- the registers are filled
- the otto_emdio_read_cmd() is issued
- that calls the command runner
net: mdio: realtek-rtl9300: provide generic command runner
The current bus read/write commands for C22/C45 are RTL930x specific.
Avoid duplicating those 200 lines of code for the RTL838x, RTL839x and
RTL931x targets. Instead provide a generic command runner that is SoC
independent. The implementation works as follows:
The runner will take a prepared list of the four MDIO registers. It will
feed the data into the registers. This generic write to all registers
(or to say "a little bit too much") is no issue. The hardware looks at
the to be executed command and will only take the pieces of data that
are really required. No side effects have been observed on any of the
four SoCs during the time this mechanism exists in downstream OpenWrt.
The last fed register is the C22/command register. This will be enriched
with the proper command flags from the caller. The hardware issues the
command and the runner will wait for its finalization.
Besides feeding all registers, the runner emulates the behaviour of
the old code as closely as possible:
- check defensively for a running command in advance
- Before this commit the driver had different MMIO timeout values.
1000s for command preparation, 100us after writes and 1000us after
reads. The new version uses a consistent 1000us timeout for all
of these.
- return -ENXIO in case of hardware failure (fail bit)
As a first consumer of this runner convert the write_c45() function.
This is realized in a multi stage approach
- a generic otto_emdio_write_c45() will be called by the bus
- this will forward the request to the device specific writer. In this
case otto_emdio_9300_write_c45().
- There the command data is filled in and the additional helper
otto_emdio_write_cmd() will be called
- That adds the write flag and issues the generic command runner.
With all the above mentioned in place, there is not much left to do in
otto_emdio_9300_write_c45(). It just fills the register fields and
calls the write helper with the right command bits.
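The runner described above can be modelled in userspace C against a fake register file. This is a sketch under assumptions, not the driver code: the register index, the bit layout (`CMD_EXEC`, `CMD_FAIL`, `CMD_WRITE`), the poll budget and the -ETIMEDOUT return are all hypothetical, as are the names `struct fake_mdio`, `run_cmd()` and `write_cmd()`. Only the overall flow (defensive idle check, feed all four registers with the command register last, wait, return -ENXIO on the fail bit) follows the commit message.

```c
#include <errno.h>
#include <stdint.h>

#define NUM_REGS   4
#define CMD_REG    3          /* assumption: command register is fed last */
#define CMD_EXEC   (1u << 0)  /* hypothetical bit layout */
#define CMD_FAIL   (1u << 1)
#define CMD_WRITE  (1u << 2)
#define POLL_LIMIT 1000       /* stands in for the 1000us timeout */

/* Fake hardware: EXEC clears after busy_polls reads of the command reg. */
struct fake_mdio {
	uint32_t regs[NUM_REGS];
	int busy_polls;
	int fail; /* non-zero: raise the fail bit on completion */
};

static uint32_t reg_read(struct fake_mdio *hw, int idx)
{
	if (idx == CMD_REG && (hw->regs[CMD_REG] & CMD_EXEC)) {
		if (hw->busy_polls > 0) {
			hw->busy_polls--;
		} else {
			hw->regs[CMD_REG] &= ~CMD_EXEC;
			if (hw->fail)
				hw->regs[CMD_REG] |= CMD_FAIL;
		}
	}
	return hw->regs[idx];
}

static void reg_write(struct fake_mdio *hw, int idx, uint32_t val)
{
	hw->regs[idx] = val;
}

static int wait_idle(struct fake_mdio *hw)
{
	for (int i = 0; i < POLL_LIMIT; i++)
		if (!(reg_read(hw, CMD_REG) & CMD_EXEC))
			return 0;
	return -ETIMEDOUT;
}

/* Generic runner: feed all registers, command register last. */
static int run_cmd(struct fake_mdio *hw, const uint32_t vals[NUM_REGS],
		   uint32_t flags)
{
	int ret = wait_idle(hw); /* defensive check for a running command */

	if (ret)
		return ret;
	for (int i = 0; i < NUM_REGS; i++) {
		uint32_t v = vals[i];

		if (i == CMD_REG)
			v |= flags | CMD_EXEC; /* enrich with command flags */
		reg_write(hw, i, v);
	}
	ret = wait_idle(hw);
	if (ret)
		return ret;
	return (reg_read(hw, CMD_REG) & CMD_FAIL) ? -ENXIO : 0;
}

/* Write helper: adds the write flag and issues the runner. */
static int write_cmd(struct fake_mdio *hw, const uint32_t vals[NUM_REGS])
{
	return run_cmd(hw, vals, CMD_WRITE);
}
```

A SoC-specific writer then only has to fill the four values and call `write_cmd()`, which mirrors the layering of otto_emdio_9300_write_c45() and otto_emdio_write_cmd() described above.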
Costa Shulyupin [Sun, 31 May 2026 13:48:36 +0000 (16:48 +0300)]
net: Remove orphaned ax25_ptr references
The AX.25 subsystem was removed in commit dd8d4bc28ad7
("net: remove ax25 and amateur radio (hamradio) subsystem"),
which removed the ax25_ptr field from struct net_device but
left behind the kdoc comment and documentation.
Neal Cardwell [Sun, 31 May 2026 18:35:57 +0000 (11:35 -0700)]
tcp_bbr: fix SPDX-License-Identifier to be GPL-2.0 OR BSD-3-Clause
Since TCP BBR congestion control was introduced in
commit 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
it has always been offered as "Dual BSD/GPL":
MODULE_LICENSE("Dual BSD/GPL");
A GPL-2.0-only SPDX header was erroneously added in the recent
commit 2ed4b46b4fc7 ("net: Add SPDX ids to some source files").
This commit revises the tcp_bbr.c SPDX-License-Identifier to note that
this file is licensed as "GPL-2.0 OR BSD-3-Clause".
Fixes: 2ed4b46b4fc7 ("net: Add SPDX ids to some source files") Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Van Jacobson <vanj@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Tim Bird <tim.bird@sony.com> Link: https://patch.msgid.link/20260531183558.2337381-1-ncardwell.sw@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rosen Penev [Tue, 26 May 2026 20:22:47 +0000 (13:22 -0700)]
net: ibm: emac: Reserve VLAN header in MJS limit
The IBM EMAC programs its Maximum Jumbo Size (MJS) drop
threshold from ndev->mtu directly. The hardware sizes the threshold
against the L2 frame minus the ethernet header, but does not
discount the 802.1Q tag, so a frame carrying a VLAN tag and a full
1500-byte payload exceeds MJS by exactly 4 bytes and is dropped.
This is normally hidden because JPSM (and therefore the MJS check)
only engages when the MTU is raised above ETH_DATA_LEN. With the
qca8k DSA tagger the conduit MTU is bumped by QCA_HDR_LEN to 1502
during dsa_conduit_setup(), which is enough to enable JPSM and
expose the off-by-VLAN-tag in the limit.
Pad MJS by VLAN_HLEN so a VLAN-tagged full-MTU frame passes.
Reported on Meraki MX60 (qca8k switch): tagged VLAN
traffic drops at 1500-byte payload, while 1496 bytes works
and untagged 1500 bytes works.
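The arithmetic behind the report can be checked with a small userspace sketch. The helper names are hypothetical and the model is a simplification: it assumes the hardware compares the frame length without the Ethernet header, but including any 802.1Q tag, against MJS, and drops frames that exceed it, as the description above states.

```c
#include <stdbool.h>

#define VLAN_HLEN 4 /* 802.1Q tag length in bytes */

/* Length the hardware compares against MJS (Ethernet header excluded). */
static unsigned int mjs_checked_len(unsigned int l3_len, bool vlan_tagged)
{
	return l3_len + (vlan_tagged ? VLAN_HLEN : 0);
}

/* Old behaviour: threshold taken from the MTU directly. */
static unsigned int mjs_limit_old(unsigned int mtu)
{
	return mtu;
}

/* Fixed behaviour: reserve room for one VLAN tag. */
static unsigned int mjs_limit_new(unsigned int mtu)
{
	return mtu + VLAN_HLEN;
}

static bool frame_passes(unsigned int limit, unsigned int l3_len,
			 bool vlan_tagged)
{
	return mjs_checked_len(l3_len, vlan_tagged) <= limit;
}
```

With the conduit MTU of 1502 from the qca8k case, a tagged frame carrying 1500 bytes of payload plus the 2-byte DSA header is checked as 1506 bytes: it fails against `mjs_limit_old(1502)` and passes against `mjs_limit_new(1502)`, while tagged 1496-byte and untagged 1500-byte payloads pass either way.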
drivers/net/ethernet/microsoft/mana/mana_en.c: 17bfe0a8c014e ("net: mana: Add NULL guards in teardown path to prevent panic on attach failure") d07efe5a6e641 ("net: mana: Use per-queue allocation for tx_qp to reduce allocation size")
Linus Torvalds [Fri, 29 May 2026 22:46:40 +0000 (15:46 -0700)]
Merge tag 'net-7.1-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull more networking fixes from Jakub Kicinski:
"Quick follow up, nothing super urgent here. Main reason I'm sending
this out is because the IPsec and Bluetooth PRs did not make it
yesterday. I don't want to have to send you all of this + whatever
comes next week, for rc7. The fixes under "Previous releases -
regressions" are for real user-reported regressions from v7.0.
Previous releases - regressions:
- Revert "ipv6: preserve insertion order for same-scope addresses"
- xfrm: move policy_bydst RCU sync, a fix which added a sync RCU on
netns exit got backported to stable and was causing serious
accumulation of dying netns's for real workloads
- pcs-mtk-lynxi: fix bpi-r3 serdes configuration
Previous releases - always broken:
- usual grab bag of race, locking and leak fixes for Bluetooth
- handful of page handling fixes for IPsec"
* tag 'net-7.1-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (36 commits)
wireguard: send: append trailer after expanding head
Revert "ipv6: preserve insertion order for same-scope addresses"
net: skbuff: fix pskb_carve leaking zcopy pages
ipv6: fix possible infinite loop in fib6_select_path()
ipv6: fix possible infinite loop in rt6_fill_node()
bpf: sockmap: fix tail fragment offset in bpf_msg_push_data
vsock/virtio: bind uarg before filling zerocopy skb
Revert "esp: fix page frag reference leak on skb_to_sgvec failure"
net: pcs: pcs-mtk-lynxi: fix bpi-r3 serdes configuration
sctp: fix race between sctp_wait_for_connect and peeloff
net: mana: Skip redundant detach on already-detached port
net: mana: Add NULL guards in teardown path to prevent panic on attach failure
Bluetooth: hci_sync: Reset device counters in hci_dev_close_sync()
Bluetooth: hci_sync: Set HCI_CMD_DRAIN_WORKQUEUE during device close
Bluetooth: hci_core: Rework hci_dev_do_reset() to use hci_sync functions
Bluetooth: ISO: serialize iso_sock_clear_timer with socket lock
Bluetooth: ISO: fix UAF in iso_recv_frame
Bluetooth: L2CAP: Fix possible crash on l2cap_ecred_conn_rsp
Bluetooth: l2cap: clear chan->ident on ECRED reconfiguration success
Bluetooth: hci_qca: Use 100 ms SSR delay for rampatch and NVM loading
...
Linus Torvalds [Fri, 29 May 2026 22:17:53 +0000 (15:17 -0700)]
Merge tag 'clang-fixes-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/nathan/linux
Pull clang build fix from Nathan Chancellor:
"A small fix to disable -Wattribute-alias for clang in the few places
it is already disabled for GCC, now that tip of tree clang has
implemented -Wattribute-alias as GCC has"
* tag 'clang-fixes-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/nathan/linux:
Disable -Wattribute-alias for clang-23 and newer
Linus Torvalds [Fri, 29 May 2026 20:47:55 +0000 (13:47 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"arm64:
- Restore CONFIG_PKVM_DISABLE_STAGE2_ON_PANIC to its former glory by
making sure the config symbol is correctly spelled out in the code
- Don't reset the AArch32 view of the PMU counters to zero when the
guest is writing to them
- Fix an assorted collection of memory leaks in the newly added
tracing code
- Fix the capping of ZCR_EL2 which could be used in an unsanitised
way by an L2 guest
x86:
- Include the kernel's linux/mman.h in KVM selftests to ensure
MADV_COLLAPSE is defined, as older libc versions may not provide
it.
- Include execinfo.h if and only if KVM selftests are building
against glibc, and provide a test_dump_stack() for non-glibc
builds.
- Silence an annoying RCU splat on (even non-KVM-related) panics.
The splat is technically legit, but in practice not an issue. To
have a race, you would need to unload the KVM modules at exactly
the time a panic happens; and speaking of incredibly rare races,
taking the locks risks introducing a deadlock if the module unload
code took the lock on a CPU that has been halted. Which seems
possibly more likely than the RCU grace period issue, so just shut
it up. This code used to be in KVM but is now outside it; but the
x86 maintainers haven't picked it up, so here we are.
- Rate-limit global clock updates once again (but without delayed
work), as KVM was subtly relying on the old rate-limiting for NPT
correction to guard against "update storms" when running without a
master clock on systems with overcommitted CPUs.
- Fix a brown paper bag goof where KVM checked if ERAPS is "dirty"
instead of marking it dirty when emulating INVPCID.
- Flush the TLB when transitioning from xAVIC => x2AVIC to ensure the
CPU TLB doesn't contain AVIC-tagged entries for the APIC base GPA.
- The top 10 commits fix buffer overflow (and potential TOC/TOU)
flaws in the page state change protocol for encrypted VMs. AI
models find it quite easily given it was reported three times, but
aren't as good at writing a comprehensive fix. There's more to
clean up in the area, which will come in 7.2"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
KVM: SEV: Use READ_ONCE() when reading entries/indices from PSC buffer
KVM: SEV: Check PSC request indices against the actual size of the buffer
KVM: SEV: Don't explicitly pass PSC buffer to snp_begin_psc()
KVM: SEV: WARN if KVM attempts to setup scratch area with min_len==0
KVM: SEV: Compute the correct max length of the in-GHCB scratch area
KVM: SEV: Use the size of the PSC header as the minimum size for PSC requests
KVM: SEV: Ignore Port I/O requests of length '0'
KVM: SEV: Reject MMIO requests larger than 8 bytes with GHCB v2+
KVM: SEV: Ignore MMIO requests of length '0'
KVM: SEV: Require in-GHCB scratch area if GHCB v2+ is in use
KVM: arm64: Correctly cap ZCR_EL2 provided by a guest hypervisor
KVM: arm64: Fix memory leak in hyp_trace_unload()
KVM: arm64: Fix rollback in hyp_trace_buffer_share_hyp()
KVM: arm64: Fix meta-page unsharing in pKVM hyp tracing
KVM: arm64: PMU: Preserve AArch32 counter low bits
KVM: SVM: Flush the current TLB when transitioning from xAVIC => x2AVIC
KVM: x86: Fix ERAPS RAP clear on INVPCID single-context invalidation
KVM: arm64: Fix CONFIG_PKVM_DISABLE_STAGE2_ON_PANIC
KVM: selftests: Guard execinfo.h inclusion for non-glibc builds
KVM: x86: Rate-limit global clock updates on vCPU load
...
wireguard: send: append trailer after expanding head
With how this is currently written, we add the trailer, zero it out, and
then add the header space on. If that header space requires a
reallocation + copy, the zeros in the trailer aren't copied, because the
skb len hasn't actually been expanded yet to cover that. Instead add the
padding at the end of the process rather than at the beginning.
Revert "ipv6: preserve insertion order for same-scope addresses"
Chris Adams reported that preserving insertion order for same-scope
addresses is causing SSH connections to be dropped after stopping a VM
while running NetworkManager.
NetworkManager caches the IPv6 address configuration. When an RA arrives,
it determines the list of addresses to configure and checks if the
addresses are already in the right order in the kernel. If they aren't,
NetworkManager removes and re-adds them to achieve the desired order.
As the order changes, NetworkManager is confused and reconfigures the
addresses on every update. In addition, this would also affect cloud
tooling that relies on IPv6 address order to identify primary and
secondary addresses.
1) xfrm: route MIGRATE notifications to caller's netns
Thread the caller's netns through km_migrate() so that
MIGRATE notifications go to the issuing netns, fixing both the
init_net listener leak and MOBIKE notifications inside
non-init netns. From Maoyi Xie.
2) xfrm: ipcomp: Free destination pages on acomp errors
Move the out_free_req label up so that allocated destination
pages are released on decompression errors, not only on success.
From Herbert Xu.
3) xfrm: Check for underflow in xfrm_state_mtu
Reject configurations that cause xfrm_state_mtu() to underflow,
preventing a negative TFCPAD value from becoming a memset size
that triggers an out-of-bounds write of several terabytes.
From David Ahern.
4) xfrm: ah: use skb_to_full_sk in async output callbacks
Convert the possibly-incomplete skb->sk to a full socket pointer
in async AH callbacks so that a request_sock or timewait_sock
never reaches xfrm_output_resume() downstream consumers.
From Michael Bommarito.
5) Add and revert: esp: fix page frag reference leak on skb_to_sgvec failure
The patch does not fix the issue completely.
6) xfrm: esp: restore combined single-frag length gate
Check the aligned post-trailer combined length against a page limit
in the fast path, preventing skb_page_frag_refill() from falling
back to a page too small for the destination scatterlist.
From Jingguo Tan.
7) xfrm: iptfs: reset runtime state when cloning SAs
Reinitialise the clone's mode_data runtime objects before
publishing it, preventing queued skbs from being freed with
list state copied from the original SA when migration fails.
From Shaomin Chen.
8) xfrm: move policy_bydst RCU sync from per-netns .exit to .pre_exit
Flush policy tables and drain the workqueue in a .pre_exit handler
so that cleanup_net() pays one RCU grace period per batch instead
of one per namespace, fixing stalls at high CLONE_NEWNET rates.
From Usama Arif.
9) xfrm: input: hold netns during deferred transport reinjection
Take a netns reference when queueing deferred transport reinjection
work and drop it after the callback completes, keeping the skb->cb
net pointer valid until the deferred work runs.
From Zhengchuan Liang.
* tag 'ipsec-2026-05-29' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
Revert "esp: fix page frag reference leak on skb_to_sgvec failure"
xfrm: input: hold netns during deferred transport reinjection
xfrm: move policy_bydst RCU sync from per-netns .exit to .pre_exit
xfrm: iptfs: reset runtime state when cloning SAs
xfrm: esp: restore combined single-frag length gate
esp: fix page frag reference leak on skb_to_sgvec failure
xfrm: ah: use skb_to_full_sk in async output callbacks
xfrm: Check for underflow in xfrm_state_mtu
xfrm: ipcomp: Free destination pages on acomp errors
xfrm: route MIGRATE notifications to caller's netns
====================
Pavel Begunkov [Thu, 28 May 2026 18:43:53 +0000 (19:43 +0100)]
net: skbuff: fix pskb_carve leaking zcopy pages
When SKBFL_MANAGED_FRAG_REFS is set, frag pages are not refcounted but
their lifetime is controlled by the attached ubuf_info. To make a copy
of the skb_shared_info, we either should clear the flag and reference
the frags, or keep the flag and have frags unreferenced.
pskb_carve_inside_header() and pskb_carve_inside_nonlinear() don't
follow the rule and thus can leak page references. Let's clear
SKBFL_MANAGED_FRAG_REFS from the original skb to fix it. It's the
simplest way to address it, but there are more performant ways to do
that if it ever becomes a problem.
Jiayuan Chen [Wed, 27 May 2026 05:31:31 +0000 (13:31 +0800)]
ipv6: fix possible infinite loop in fib6_select_path()
Found while auditing the same pattern Sashiko reported in
rt6_fill_node() [1]. Apply the same fix as
commit f8d8ce1b515a ("ipv6: fix possible infinite loop in fib6_info_uses_dev()").
Writers holding tb6_lock can list_del_rcu(&first->fib6_siblings)
without waiting for RCU readers; first->fib6_siblings.next then
still points into the old ring and this softirq-side walker never
reaches &first->fib6_siblings as its terminator. fib6_purge_rt()
always WRITE_ONCE()s first->fib6_nsiblings to 0 before
list_del_rcu(), so an inside-loop check is a reliable detach signal.
Fixes: d9ccb18f83ea ("ipv6: Fix soft lockups in fib6_select_path under high next hop churn") Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260527053133.180695-2-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiayuan Chen [Wed, 27 May 2026 05:31:30 +0000 (13:31 +0800)]
ipv6: fix possible infinite loop in rt6_fill_node()
Sashiko reported this issue [1]. Apply the same fix as
commit f8d8ce1b515a ("ipv6: fix possible infinite loop in fib6_info_uses_dev()").
Writers holding tb6_lock can list_del_rcu(&rt->fib6_siblings)
without waiting for RCU readers; rt->fib6_siblings.next then still
points into the old ring and this softirq-side walker never reaches
&rt->fib6_siblings, causing a CPU stall. fib6_del_route() always
WRITE_ONCE()s rt->fib6_nsiblings to 0 before list_del_rcu(), so an
inside-loop check is a reliable detach signal.
Fixes: d9ccb18f83ea ("ipv6: Fix soft lockups in fib6_select_path under high next hop churn") Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260527053133.180695-1-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
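The failure mode shared by the two ipv6 fixes above can be reproduced with a single-threaded userspace model. Everything here is a simplified assumption: `struct route`, `detach()` and `count_siblings()` are hypothetical, the ring is singly linked, the volatile read only stands in for READ_ONCE(), and returning -1 on detection is an illustrative choice rather than what the kernel walkers do.

```c
#include <stddef.h>

struct route {
	struct route *next;     /* ring of siblings */
	unsigned int nsiblings; /* written to 0 before the node is unlinked */
};

/*
 * Unlink @victim the way a reader sees it after list_del_rcu(): the
 * ring skips the victim, but victim->next still points into the ring.
 */
static void detach(struct route *victim)
{
	struct route *prev = victim;

	while (prev->next != victim)
		prev = prev->next;
	victim->nsiblings = 0; /* cleared first, as the writers do */
	prev->next = victim->next;
}

/*
 * Count the siblings of @first. Returns -1 if @first turns out to be
 * detached, in which case @first can never be reached as the loop
 * terminator and the walk must stop.
 */
static int count_siblings(const struct route *first)
{
	int n = 0;

	for (const struct route *it = first->next; it != first;
	     it = it->next) {
		/* inside-loop detach check, stand-in for READ_ONCE() */
		if (!*(const volatile unsigned int *)&first->nsiblings)
			return -1;
		n++;
	}
	return n;
}
```

Without the check inside the loop, walking from a detached node would circle the remaining ring forever, because the detached node is no longer part of it.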
Yuqi Xu [Wed, 27 May 2026 03:48:15 +0000 (11:48 +0800)]
bpf: sockmap: fix tail fragment offset in bpf_msg_push_data
When bpf_msg_push_data() inserts data in the middle of a scatterlist
entry, it splits the original entry into a left fragment and a right
fragment.
The right fragment offset is page-local, but the code advances it with
`start`, which is the message-global insertion point. For inserts into a
non-first SG entry, this over-advances the offset and leaves the split
layout inconsistent.
Advance the right fragment offset by the fragment-local delta,
`start - offset`, which matches the length removed from the front of the
original entry.
Fixes: 6fff607e2f14 ("bpf: sk_msg program helper bpf_msg_push_data") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Zhengchuan Liang <zcliangcn@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Yuqi Xu <xuyq21@lenovo.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Link: https://patch.msgid.link/8b129d10566aa3eb43f61a8f9757bcf51707d324.1779636774.git.xuyq21@lenovo.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
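The offset arithmetic of the split can be illustrated with a minimal userspace sketch. `struct sg_entry` and `split_entry()` are hypothetical simplifications; `entry_start` plays the role of the message-global offset of the entry's first byte (called `offset` in the commit message), while `sg_entry.offset` is the page-local offset.

```c
#include <assert.h>

/* Simplified scatterlist entry: page-local offset plus length. */
struct sg_entry {
	unsigned int offset;
	unsigned int length;
};

/*
 * Split @orig at the message-global position @start. @entry_start is
 * the message-global offset of the first byte covered by @orig.
 */
static void split_entry(const struct sg_entry *orig,
			unsigned int entry_start, unsigned int start,
			struct sg_entry *left, struct sg_entry *right)
{
	unsigned int delta;

	assert(start >= entry_start);
	delta = start - entry_start; /* fragment-local distance */
	assert(delta <= orig->length);

	left->offset = orig->offset;
	left->length = delta;
	/* advance by the local delta, not by the global @start */
	right->offset = orig->offset + delta;
	right->length = orig->length - delta;
}
```

For an entry at page offset 100 that begins at message offset 4096, an insert at message offset 4106 yields a right fragment at page offset 110. Advancing by `start` instead would give 4206, which is the over-advance the fix removes.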
Jingguo Tan [Wed, 27 May 2026 02:33:01 +0000 (10:33 +0800)]
vsock/virtio: bind uarg before filling zerocopy skb
virtio_transport_send_pkt_info() allocates or reuses the zerocopy uarg
before entering the send loop, but virtio_transport_alloc_skb() still
fills the skb before it inherits that uarg. When fixed-buffer vectored
zerocopy hits MAX_SKB_FRAGS, io_sg_from_iter() may partially attach
managed frags and return -EMSGSIZE. The rollback path calls kfree_skb()
to free an skb that carries SKBFL_MANAGED_FRAG_REFS but no uarg, so
skb_release_data() falls through to ordinary frag unref.
Pass the uarg into virtio_transport_alloc_skb() and bind it immediately
before virtio_transport_fill_skb(). This keeps control or no-payload skbs
untouched while ensuring success and rollback share one lifetime rule.
Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support") Signed-off-by: Lin Ma <malin89@huawei.com> Signed-off-by: Rongzhen Cui <cuirongzhen@huawei.com> Signed-off-by: Jingguo Tan <tanjingguo@huawei.com> Acked-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20260527023301.1075581-1-malin89@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
KVM: SEV: Use READ_ONCE() when reading entries/indices from PSC buffer
Use READ_ONCE() when reading entries/indices from the guest-accessible
Page State Change buffer to defend against TOCTOU bugs.
Don't bother with READ_ONCE()/WRITE_ONCE() for cases where KVM is writing
(and not consuming the result!), as the guest isn't supposed to touch the
buffer while it's being processed. I.e. using READ_ONCE() is all about
protecting against misbehaving guests.
Fixes: 9b54e248d264 ("KVM: SEV: Add support to handle Page State Change VMGEXIT") Cc: stable@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260501202250.2115252-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
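The snapshot-then-validate pattern behind this change can be sketched in userspace C. The header layout is a simplified assumption, `READ_ONCE_U16()` is a local stand-in for the kernel's READ_ONCE(), and the validation rule in `psc_snapshot()` is illustrative rather than KVM's actual check.

```c
#include <stdint.h>

/* Local stand-in for READ_ONCE(): force exactly one load of @x. */
#define READ_ONCE_U16(x) (*(const volatile uint16_t *)&(x))

/* Simplified header of a buffer the guest can modify at any time. */
struct psc_hdr {
	uint16_t cur_entry;
	uint16_t end_entry;
};

/*
 * Read each index once, validate the snapshot, and hand only the
 * snapshot to the caller. Re-reading @shared after validation would
 * reopen the time-of-check/time-of-use window.
 */
static int psc_snapshot(const struct psc_hdr *shared, uint16_t max_entries,
			uint16_t *cur, uint16_t *end)
{
	uint16_t c = READ_ONCE_U16(shared->cur_entry);
	uint16_t e = READ_ONCE_U16(shared->end_entry);

	if (c > e || e >= max_entries)
		return -1;
	*cur = c;
	*end = e;
	return 0;
}
```

Once `psc_snapshot()` has returned, later guest writes to the shared header no longer affect the values the host works with.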
KVM: SEV: Check PSC request indices against the actual size of the buffer
When processing Page State Change (PSC) requests, validate the PSC buffer
against the effective size of the scratch area, which could be less than
the maximum size if the guest provided a pointer that isn't exactly at the
start of the GHCB shared buffer.
Fixes: 9b54e248d264 ("KVM: SEV: Add support to handle Page State Change VMGEXIT") Cc: stable@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260501202250.2115252-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM: SEV: Don't explicitly pass PSC buffer to snp_begin_psc()
Stop explicitly passing the PSC buffer to snp_begin_psc(): it *must*
be the scratch area. This will allow fixing a variety of bugs without
further complicating the code.
No functional change intended.
Cc: stable@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260501202250.2115252-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM: SEV: WARN if KVM attempts to setup scratch area with min_len==0
Now that all paths in KVM properly validate the length needed for the
scratch area, and are guaranteed to pass in a non-zero length, WARN if KVM
attempts to configure the scratch area with min_len==0 to guard against
future bugs.
Cc: stable@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260501202250.2115252-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM: SEV: Compute the correct max length of the in-GHCB scratch area
When setting the length of the GHCB scratch area, and the area is in the
GHCB shared buffer, set the effective length of the scratch area to the max
possible size given the start of the guest-provided pointer, and the end of
the shared buffer.
The code was "fine" when first introduced, as KVM doesn't consult the
length of the buffer when emulating MMIO, because the passed in @len always
specifies the *max* size required. But for PSC requests, the incoming @len
is just the minimum length (to process the header), and KVM needs to know
the full size of the scratch area to avoid buffer overflows (spoiler alert).
Opportunistically rename @len => @min_len to better reflect its role.
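A minimal sketch of the length computation described above, assuming the shared buffer is given as a start address and size. The function name and parameters are hypothetical, and returning 0 for "invalid" is a choice made for the example, not KVM's error handling.

```c
#include <stdint.h>

/*
 * Return the usable scratch length, or 0 if the guest pointer does not leave
 * at least min_len bytes inside [buf_start, buf_start + buf_size).
 */
static uint64_t scratch_effective_len(uint64_t buf_start, uint64_t buf_size,
				      uint64_t scratch, uint64_t min_len)
{
	uint64_t offset, max_len;

	if (min_len == 0 || scratch < buf_start)
		return 0;

	offset = scratch - buf_start;
	if (offset >= buf_size)
		return 0;

	/* Bytes from the guest-provided pointer to the end of the buffer. */
	max_len = buf_size - offset;

	return max_len >= min_len ? max_len : 0;
}
```

The point of the sketch is that the effective length shrinks as the guest pointer moves away from the start of the buffer, so callers that need more than min_len must consult the returned value.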
Fixes: 9b54e248d264 ("KVM: SEV: Add support to handle Page State Change VMGEXIT") Cc: stable@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260501202250.2115252-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM: SEV: Use the size of the PSC header as the minimum size for PSC requests
When handling a Page State Change (PSC) #VMGEXIT use the size of the PSC
header as the minimum size for the scratch area. Per the GHCB spec, PSC
requests do NOT provide the length, i.e. using control->exit_info_2 for the
length is completely made up behavior. The existing code "works", e.g.
even though Linux-as-a-guest always passes '0', because KVM doesn't do
anything with the length when the request is in the GHCB's shared buffer.
Use the header as the min length. Once the header is retrieved, KVM can
use the specified indices to compute the full size of the request.
Fixes: 9b54e248d264 ("KVM: SEV: Add support to handle Page State Change VMGEXIT") Cc: stable@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260501202250.2115252-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Explicitly ignore Port I/O requests of length '0' (or count '0'), so that
setting up the software scratch area (and other code) doesn't have to
worry about underflowing the length, and to allow for WARNing on trying
to configure the scratch area with len==0.
Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT") Cc: stable@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260501202250.2115252-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Explicitly ignore MMIO requests of length '0', so that setting up the
software scratch area (and other code) doesn't have to worry about
underflowing the length, and to allow for special casing '0' in the
future.
Fixes: 8f423a80d299 ("KVM: SVM: Support MMIO for an SEV-ES guest") Cc: stable@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260501202250.2115252-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Michael Roth [Fri, 1 May 2026 20:22:26 +0000 (13:22 -0700)]
KVM: SEV: Require in-GHCB scratch area if GHCB v2+ is in use
As per the GHCB spec, when using GHCB v2+ require the software scratch area
to reside in the GHCB's shared buffer. Note, things like Page State Change
(PSC) requests _rely_ on this behavior, as the guest can't provide a length
when making the request, i.e. the size of the guest payload is bounded by
the size of the shared buffer.
Failure to force usage of the GHCB, and a slew of other flaws, lets a
malicious SNP guest corrupt host kernel heap memory, and leak host heap
layout information.
setup_vmgexit_scratch() allocates a buffer via kvzalloc(exit_info_2),
where exit_info_2 is guest-controlled. With exit_info_2=24, this yields
a 24-byte allocation in kmalloc-cg-32 (32-byte slab objects). The buffer
holds an 8-byte psc_hdr followed by 8-byte psc_entry structs, so only
entries[0] and entries[1] are in-bounds.
snp_begin_psc() validates end_entry against VMGEXIT_PSC_MAX_COUNT (253)
but NOT against the actual buffer size:
	idx_end = hdr->end_entry;
	if (idx_end >= VMGEXIT_PSC_MAX_COUNT) {   // checks 253, not buffer
		snp_complete_psc(svm, ...);
		return 1;
	}
	for (idx = idx_start; idx <= idx_end; idx++) {
		entry_start = entries[idx];       // OOB when idx >= 2
The guest sets end_entry=10+, causing the host to iterate entries[2+]
which are OOB into adjacent slab objects. For each OOB entry:
- The host reads 8 bytes (OOB READ / info leak oracle)
- If the data passes PSC validation, __snp_complete_one_psc() writes
cur_page = 1 or 512 into the entry (OOB WRITE, sev.c:3806)
- If validation fails, the error response reveals whether adjacent
memory is zero vs non-zero (information disclosure to guest)
The guest controls allocation size (exit_info_2), entry range
(cur_entry/end_entry), and can fire unlimited VMGEXITs to repeatedly
hit different slab positions.
By exploiting the variety of bugs, a malicious SEV-SNP guest can:
- OOB read adjacent kmalloc-cg-32 objects (heap layout disclosure)
- OOB write cur_page bits into adjacent objects (heap corruption)
- Trigger use-after-free conditions across VMGEXITs
E.g. with KASAN enabled, a single insmod of the PoC guest module
produces 73 KASAN reports:
BUG: KASAN: slab-out-of-bounds in snp_begin_psc+0x126/0x890
Read of size 8 at addr ffff888219ffb5e0 by task qemu-system-x86/2199
BUG: KASAN: slab-out-of-bounds in snp_begin_psc+0x468/0x890
Write of size 8 at addr ffff888351566648 by task qemu-system-x86/2199
The buggy address belongs to the object at ffff888XXXXXXXXX
which belongs to the cache kmalloc-cg-32 of size 32
The buggy address is located N bytes to the right of
allocated 32-byte region [ffff888XXXXXXXXX, ffff888XXXXXXXXX)
Linus Torvalds [Fri, 29 May 2026 17:36:57 +0000 (10:36 -0700)]
Merge tag 'io_uring-7.1-20260529' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring fix from Jens Axboe:
"Just a single fix for a regression introduced in this cycle, where
we should ensure the node is visible before the entry is added to
the tctx list"
* tag 'io_uring-7.1-20260529' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring/tctx: set ->io_uring before publishing the tctx node
Paolo Bonzini [Fri, 29 May 2026 17:28:16 +0000 (19:28 +0200)]
Merge tag 'kvm-x86-fixes-7.1-rc6' of https://github.com/kvm-x86/linux into HEAD
KVM x86 fixes for 7.1-rcN
- Include the kernel's linux/mman.h in KVM selftests to ensure MADV_COLLAPSE
is defined, as older libc versions may not provide it.
- Include execinfo.h if and only if KVM selftests are building against glibc,
and provide a test_dump_stack() for non-glibc builds.
- Fudge around an RCU splat in the emergency reboot code that is technically
a legitimate flaw, but in practice is a non-issue and fixing the flaw, e.g.
by adding locking, would incur meaningful risk, i.e. do more harm than good.
- Rate-limit global clock updates once again (but without delayed work), as
KVM was subtly relying on the old rate-limiting for NPT correction to guard
against "update storms" when running without a master clock on systems with
overcommitted CPUs.
- Fix a brown paper bag goof where KVM checked if ERAPS is "dirty" instead of
marking it dirty when emulating INVPCID.
- Flush the TLB when transitioning from xAVIC => x2AVIC to ensure the CPU TLB
doesn't contain AVIC-tagged entries for the APIC base GPA.
Linus Torvalds [Fri, 29 May 2026 17:04:09 +0000 (10:04 -0700)]
Merge tag 'cxl-fixes-7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull Compute Express Link (CXL) fixes from Dave Jiang:
- cxl/test: update mock dev array before calling platform_device_add()
* tag 'cxl-fixes-7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/test: Update mock dev array before calling platform_device_add()
* tag 'iommu-fixes-v7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
MAINTAINERS: Add my employer to my entries
MAINTAINERS: Add Vasant Hegde to reviewers of AMD IOMMU
iommu, debugobjects: avoid gcc-16.1 section mismatch warnings
iommu/vt-d: Simplify calculate_psi_aligned_address()
Linus Torvalds [Fri, 29 May 2026 15:55:41 +0000 (08:55 -0700)]
Merge tag 'sound-7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of recent small fixes and quirks.
We still see a bit more changes than wished, but most of them are
device-specific ones that are pretty safe to apply, while a core fix
is a typical UAF fix for PCM core that was recently caught by fuzzer;
so overall nothing looks really worrisome.
Core:
- Fix a UAF in PCM OSS proc interface
HD-audio:
- Fix memory leaks in CS35L56 driver
- Various device-specific quirks for Realtek and CS420x codecs
USB-audio:
- Quirk for TAE1160 USB Audio
- Fix for Scarlett2 Gen4 direct monitor gain
ASoC:
- Fixes for QCom q6asm-dai, Intel bytcht_es8316, and simple-mux codec
FireWire:
- Fix for Motu DSP event queue protection"
* tag 'sound-7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ASoC: codecs: simple-mux: Fix enum control bounds check
ALSA: usb-audio: Add iface reset and delay quirk for TAE1160 USB Audio
ALSA: hda/cs420x: Add CS4208 fixup for iMac16,1
ALSA: hda/realtek: add quirk for HP Dragonfly Folio G3 2-in-1
ALSA: hda/realtek: Fix speaker output on ASUS ROG Strix G615LP
ASoC: qcom: q6asm-dai: use pointer type with kzalloc_obj()
ASoC: qcom: q6asm-dai: remove unnecessary braces
ASoC: qcom: q6asm-dai: fix error handling in prepare and set_params
ASoC: qcom: q6asm-dai: close stream only when running
ASoC: qcom: q6asm-dai: do not set stream state in event and trigger callbacks
ASoC: Intel: bytcht_es8316: Fix MCLK leak on init errors
ALSA: hda/realtek: Limit mic boost on Positivo DN140
ALSA: scarlett2: Fix 2i2 Gen 4 direct monitor gain on firmware 2417
ALSA: pcm: oss: Fix setup list UAF on proc write error
ALSA: hda: cs35l56: Fix system name string leaks
ALSA: hda/realtek: Add HDA_CODEC_QUIRK for Lenovo Yoga Slim 7 14AGP11
ALSA: hda/realtek: Fix incorrect comment for ALC299_FIXUP_PREDATOR_SPK
ALSA: firewire-motu: Protect register DSP event queue positions
Mark Brown [Thu, 28 May 2026 23:01:44 +0000 (00:01 +0100)]
KVM: arm64: Correctly cap ZCR_EL2 provided by a guest hypervisor
ZCR_EL2 can be updated by a VHE guest hypervisor either using ZCR_EL2
(which traps) or ZCR_EL1 (which does not trap). KVM handles both in
different ways:
- on ZCR_EL2 trap, ZCR_EL2.LEN is immediately capped at the VM's own
VL limit. This has the potential to break existing SW that relies
on the full LEN field being stateful.
- on ZCR_EL1 access, we do absolutely nothing.
On restoring the SVE context for an L2 guest, we directly restore the
guest hypervisor's view of ZCR_EL2 into the physical ZCR_EL2. If the
guest's view of the register was updated using the ZCR_EL2 accessor,
the value has already been sanitised (with the caveat mentioned above).
But if the guest used ZCR_EL1, the raw value is written into the HW,
and the L2 guest can now access VLs that it shouldn't.
Fix all the above by moving the VL capping to the restore points,
ensuring that:
- the HW is always programmed with a capped value, irrespective of
the accessor being used,
- the ZCR_EL2.LEN field is always completely stateful, irrespective
of the accessor being used.
Additionally, move ZCR_EL2 to be a sanitised register, ensuring that
only the LEN field is actually stateful. This requires some creative
construction of the RES0 mask, as the sysreg generation script does
not yet generate RAZ/WI fields.
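The split between the stored (stateful) value and the value programmed into hardware can be sketched as below. This is an illustration only: the helper names are hypothetical, and the only architectural fact assumed is that ZCR_ELx.LEN occupies bits [3:0].

```c
#include <stdint.h>

#define ZCR_LEN_MASK 0xfULL	/* ZCR_ELx.LEN occupies bits [3:0] */

/* Sanitise a guest write: only the LEN field is kept, and kept in full. */
static uint64_t zcr_sanitise(uint64_t guest_write)
{
	return guest_write & ZCR_LEN_MASK;
}

/* Value to program at restore time: stored LEN clamped to the VM's limit. */
static uint64_t zcr_hw_value(uint64_t stored_zcr, uint64_t vm_max_len)
{
	uint64_t len = stored_zcr & ZCR_LEN_MASK;

	return len > vm_max_len ? vm_max_len : len;
}
```

Because the clamp is applied only when computing the hardware value, the guest hypervisor reads back whatever LEN it wrote, regardless of which accessor it used.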
Jakub Kicinski [Fri, 29 May 2026 01:10:05 +0000 (18:10 -0700)]
Merge branch 'docs-page_pool-tweaks-and-updates'
Jakub Kicinski says:
====================
docs: page_pool: tweaks and updates
I'm hoping to start feeding our docs into the AI review tools, instead
of maintaining a separate repo with review prompts. To experiment with
that we have to refresh the docs a little bit.
This set exclusively focuses on the page pool API. First patch is
a straightforward fix for information which is now out of date.
Second one attempts to clarify the NAPI linking requirements.
Third drops the dedicated section about the stats; the document
is primarily developer-facing and the stats should require no
development effort in most cases. Last but not least minor
API cleanup.
====================
Jakub Kicinski [Tue, 26 May 2026 15:57:22 +0000 (08:57 -0700)]
net: make page_pool_get_stats() void
The kdoc for page_pool_get_stats() is missing a Returns: statement.
Looking at this function, I have no idea what is the purpose of
the bool it returns. My guess was that maybe the static inline
stub returns false if CONFIG_PAGE_POOL_STATS=n but such static
inline helper doesn't exist at all. All callers pass a pointer
to a struct on the stack. Make this function void.
Jakub Kicinski [Tue, 26 May 2026 15:57:21 +0000 (08:57 -0700)]
docs: page_pool: drop the mention of the legacy stats API
The Netlink support for querying page pool stats has been
proven out in production, let's remove the mention of the
helper meant for dumping page pool stats into ethtool -S
from the docs.
Call out in the kdoc that this API is deprecated.
Some drivers may not be able to use the Netlink API
(if page pool is shared across netdevs). So the old API
is not _completely_ dead. But we shouldn't advertise it.
Jakub Kicinski [Tue, 26 May 2026 15:57:19 +0000 (08:57 -0700)]
docs: net: page_pool: drop reference to removed PP_FLAG_PAGE_FRAG
The flag was removed in commit 09d96ee5674a ("page_pool: remove
PP_FLAG_PAGE_FRAG"), but the documentation still mentions it when
describing fragment usage. Drop the stale reference; the fragment
API does not require any opt-in flag.
Commit 8871389da151 introduces common PCS DTS properties, which causes
rx=normal,tx=normal polarity to be written to the switch's
SGMSYS_QPHY_WRAP_CTRL register. That register is initialized with the TX
bit set, so the change inverts the polarity compared to before.
It looks like the MT7531 has TX polarity inverted in hardware and sets the
TX bit by default to restore the normal polarity.
The MT7531 datasheet quite clearly states:
Register 000050EC QPHY_WRAP_CTRL -- QPHY wrapper control
Reset value: 0x00000501
BIT 1 RX_BIT_POLARITY -- RX bit polarity control
1'b0: normal
1'b1: inverted
BIT 0 TX_BIT_POLARITY -- TX bit polarity control (TX default inversed
in MT7531)
1'b0: normal
1'b1: inverted
Until now, the register write only happened when the mediatek,pnswap
property was set, which could not be done for the switch because the
fw-node parameter was always NULL in the switch driver's
mtk_pcs_lynxi_create() call.
Do not configure the switch side, as was the case before.
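To make the bit arithmetic concrete, here is a small sketch using the reset value and bit positions quoted from the datasheet. The macro and function names are made up for the example and are not the driver's.

```c
#include <stdbool.h>
#include <stdint.h>

#define QPHY_WRAP_CTRL_RESET	0x00000501u	/* reset value quoted above */
#define RX_BIT_POLARITY		(1u << 1)
#define TX_BIT_POLARITY		(1u << 0)

/* Build the register value for the requested polarities, keeping other bits. */
static uint32_t qphy_set_polarity(uint32_t val, bool rx_invert, bool tx_invert)
{
	val &= ~(RX_BIT_POLARITY | TX_BIT_POLARITY);
	if (rx_invert)
		val |= RX_BIT_POLARITY;
	if (tx_invert)
		val |= TX_BIT_POLARITY;
	return val;
}
```

Writing rx=normal,tx=normal on top of the reset value clears bit 0 and yields 0x500, which differs from the 0x501 default and therefore flips the TX polarity on the wire.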
Fixes: 8871389da151 ("net: pcs: pcs-mtk-lynxi: deprecate "mediatek,pnswap"") Signed-off-by: Frank Wunderlich <frank-w@public-files.de> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20260526153239.30194-1-linux@fw-web.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
Remove unused support for crypto tfm cloning
This series is targeting net-next because it depends on
"net/tcp: Remove tcp_sigpool". So far no commits in cryptodev conflict
with this, so I suggest that this be taken through net-next for 7.2.
This series removes support for transformation cloning from the crypto
API. Now that the TCP-AO and TCP-MD5 code no longer uses it, it no
longer has a user. And it's unlikely that a new one will appear, as the
library API solves the problem in a much simpler and more efficient way.
This feature also regressed performance for all crypto API users, since
it changed crypto transformation objects into reference-counted objects.
That added expensive atomic operations. The refcount is reverted by
this series, thus fixing the performance regression.
A subset of this was previously sent in
https://lore.kernel.org/r/20260307224341.5644-1-ebiggers@kernel.org
Compared to that version, this version is a bit more comprehensive.
====================
Eric Biggers [Fri, 22 May 2026 05:30:28 +0000 (00:30 -0500)]
crypto: api - Fold crypto_alloc_tfmmem() into crypto_create_tfm_node()
Fold crypto_alloc_tfmmem() into its only remaining caller,
crypto_create_tfm_node(). Previously crypto_alloc_tfmmem() was called
by crypto_clone_tfm(), but crypto_clone_tfm() was removed.
This rolls back the refactoring that was done in commit 3c3a24cb0ae4
("crypto: api - Add crypto_clone_tfm").
Eric Biggers [Fri, 22 May 2026 05:30:27 +0000 (00:30 -0500)]
crypto: api - Fold __crypto_alloc_tfmgfp() into __crypto_alloc_tfm()
This reverts commit fa3b3565f3ac ("crypto: api - Add
__crypto_alloc_tfmgfp").
Fold __crypto_alloc_tfmgfp() into its only remaining caller,
__crypto_alloc_tfm(). Previously __crypto_alloc_tfmgfp() was called by
crypto_clone_cipher(), but crypto_clone_cipher() was removed.
Eric Biggers [Fri, 22 May 2026 05:30:26 +0000 (00:30 -0500)]
crypto: api - Remove per-tfm refcount
This reverts commit ae131f4970f0 ("crypto: api - Add crypto_tfm_get").
The refcount in struct crypto_tfm was added solely to support
crypto_clone_tfm(). Before then it was a simple non-refcounted object.
Since crypto_clone_tfm() has been removed, remove the refcount as well.
Note that this eliminates an expensive atomic operation from every tfm
freeing operation. So this revert doesn't just remove unused code, but
it also fixes a performance regression.
Eric Biggers [Fri, 22 May 2026 05:30:23 +0000 (00:30 -0500)]
crypto: hash - Remove support for cloning hash tfms
Hash transformation cloning no longer has a user, and there's a good
chance no new one will appear because the library API solves the problem
in a much simpler and more efficient way. Remove support for it.
Note that no tests need to be removed, as this feature had no tests.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://patch.msgid.link/20260522053028.91165-2-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 29 May 2026 00:05:23 +0000 (17:05 -0700)]
Merge tag 'wireless-next-2026-05-28' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Johannes Berg says:
====================
Mostly driver updates:
- iwlwifi
- more UHR support
- NAN (multicast, schedule improvements, multi-station)
- cleanups, etc.
- ath12k
- thermal throttling/cooling device support
- 6 GHz incumbent interference detection
- channel 177 in 5 GHz
- hwsim: S1G fixes
- mac80211: NAN channel handling improvements
* tag 'wireless-next-2026-05-28' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (143 commits)
wifi: cfg80211: use strscpy in cfg80211_wext_giwname
wifi: mac80211: fix channel evacuation logic
wifi: mac80211: refactor ieee80211_nan_try_evacuate
wifi: mac80211: add an option to filter out a channel in combinations check
wifi: mac80211_hwsim: add debug messages for link changes
wifi: nl80211: re-check wiphy netns in testmode and vendor dump continuations
wifi: mac80211_hwsim: modernise S1G channel list
wifi: mac80211_hwsim: don't run RC update on new STA on S1G vif
wifi: mwifiex: remove an unnecessary check
wifi: mac80211: add KUnit coverage for negotiated TTLM parser
wifi: ath12k: fix error unwind on arch_init() failure in PCI probe
wifi: iwlwifi: mld: fix indentation in iwl_mld_fill_supp_rates()
wifi: iwlwifi: transport: add memory read under NIC access
wifi: iwlwifi: dbg: remove unused 'range_len' arg from dump
wifi: iwlwifi: fw: separate out old-style dump code
wifi: iwlwifi: fw: dbg: always use non-tracing PRPH access
wifi: iwlwifi: fw: separate ini dump allocation
wifi: iwlwifi: fw: move struct iwl_fw_ini_dump_entry to dbg.c
wifi: iwlwifi: clean up location format/BW encoding
wifi: iwlwifi: Add names for Killer BE1735x and BE1730x
...
====================
Jakub Kicinski [Fri, 29 May 2026 00:02:54 +0000 (17:02 -0700)]
Merge tag 'for-net-2026-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- hci_core: Rework hci_dev_do_reset() to use hci_sync functions
- hci_conn: Fix memory leak in hci_le_big_terminate()
- hci_sync: Set HCI_CMD_DRAIN_WORKQUEUE during device close
- hci_sync: Reset device counters in hci_dev_close_sync()
- hci_sync: fix UAF in hci_le_create_cis_sync
- L2CAP: Fix possible crash on l2cap_ecred_conn_rsp
- L2CAP: fix chan ref leak in l2cap_chan_timeout() on !conn
- L2CAP: use chan timer to close channels in cleanup_listen()
- L2CAP: clear chan->ident on ECRED reconfiguration success
- ISO: fix UAF in iso_recv_frame
- ISO: serialize iso_sock_clear_timer with socket lock
- HIDP: fix missing length checks in hidp_input_report()
- 6lowpan: check skb_clone() return value in send_mcast_pkt()
- btusb: Allow firmware re-download when version matches
- hci_qca: Use 100 ms SSR delay for rampatch and NVM loading
* tag 'for-net-2026-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: hci_sync: Reset device counters in hci_dev_close_sync()
Bluetooth: hci_sync: Set HCI_CMD_DRAIN_WORKQUEUE during device close
Bluetooth: hci_core: Rework hci_dev_do_reset() to use hci_sync functions
Bluetooth: ISO: serialize iso_sock_clear_timer with socket lock
Bluetooth: ISO: fix UAF in iso_recv_frame
Bluetooth: L2CAP: Fix possible crash on l2cap_ecred_conn_rsp
Bluetooth: l2cap: clear chan->ident on ECRED reconfiguration success
Bluetooth: hci_qca: Use 100 ms SSR delay for rampatch and NVM loading
Bluetooth: hci_sync: fix UAF in hci_le_create_cis_sync
Bluetooth: 6lowpan: check skb_clone() return value in send_mcast_pkt()
Bluetooth: btusb: Allow firmware re-download when version matches
Bluetooth: HIDP: fix missing length checks in hidp_input_report()
Bluetooth: L2CAP: use chan timer to close channels in cleanup_listen()
Bluetooth: L2CAP: fix chan ref leak in l2cap_chan_timeout() on !conn
Bluetooth: hci_conn: Fix memory leak in hci_le_big_terminate()
====================
====================
selftests: mptcp: reduce bufferbloat and cleanup
Bufferbloat is baaaad, even in our selftests: let's kill it (or at least
reduce it). By doing that, the tests (seem to) have a more stable
transfer, and are then less unstable. That's what patches 1-2 are doing,
and they can be backported up to 5.10.
Patch 3 is not related: a small fix in the selftests to remove temp
files that were not deleted in some conditions, since v5.13.
====================
Geliang Tang [Wed, 27 May 2026 12:11:36 +0000 (22:11 +1000)]
selftests: mptcp: sockopt: set EXIT trap earlier
Set the EXIT trap for cleanup immediately after creating temporary file
variables, before init and make_file, to ensure cleanup runs on any
failure or interruption during the early setup phase.
Avoid using a fixed limit, no matter the setup. This was causing too
high bufferbloat in some situations, e.g. with a low bandwidth and very
low delay because the default limit was too high for this case.
Instead, use more appropriate limits. Note that unbalanced bandwidth
modes seem to require slightly higher limits to cope with the different
bursts.
Netem is used to apply a rate limit, and its 'limit' option is per
packet.
Disable GSO on both sides to work with packets of a specific size. That
increases the number of packets, but stabilises the throughput. As a
consequence, limits are more adapted, and the bufferbloat is reduced.
Zhenghang Xiao [Wed, 27 May 2026 03:24:11 +0000 (11:24 +0800)]
sctp: fix race between sctp_wait_for_connect and peeloff
sctp_wait_for_connect() drops and re-acquires the socket lock while
waiting for the association to reach ESTABLISHED state. During this
window, another thread can peeloff the association to a new socket via
getsockopt(SCTP_SOCKOPT_PEELOFF), changing asoc->base.sk. After
re-acquiring the old socket lock, sctp_wait_for_connect() returns
success without noticing the migration — the caller then accesses
the association under the wrong lock in sctp_datamsg_from_user().
Add the same sk != asoc->base.sk check that sctp_wait_for_sndbuf()
already has, returning an error if the association was migrated while
we slept.
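The shape of the check can be sketched in plain C with stand-in types. Everything here is hypothetical scaffolding (struct names, the helper, and the choice of -EPIPE as the error); only the comparison of the sleeping socket against the association's current owner comes from the description above.

```c
#include <errno.h>

/* Hypothetical stand-ins for the socket and association objects. */
struct sock_stub {
	int id;
};

struct assoc_stub {
	struct sock_stub *sk;	/* current owner; changes on peeloff */
};

/* Run after the socket lock has been re-acquired following the sleep. */
static int check_owner_after_wait(const struct sock_stub *sk,
				  const struct assoc_stub *asoc)
{
	if (asoc->sk != sk)
		return -EPIPE;	/* association migrated; error code is an assumption */
	return 0;
}
```

A caller that sees the error must stop using the association, since the lock it holds no longer protects it.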
Fixes: 668c9beb9020 ("sctp: implement assign_number for sctp_stream_interleave") Signed-off-by: Zhenghang Xiao <kipreyyy@gmail.com> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/20260527032411.60959-1-kipreyyy@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
net: mana: Fix NULL dereferences during teardown after attach failure
When mana_attach() fails (e.g. during queue allocation), the error
cleanup frees apc->tx_qp and apc->rxqs and sets them to NULL. Multiple
subsequent teardown paths can then dereference these NULL pointers,
causing kernel panics.
Patch 1 adds NULL guards in the low-level teardown functions
(mana_fence_rqs, mana_destroy_vport, mana_dealloc_queues) so they are
safe to call regardless of queue initialization state. This covers all
callers: mana_remove(), mana_change_mtu() recovery, and internal error
paths in mana_alloc_queues().
Patch 2 adds an early exit in mana_detach() for already-detached ports,
making it safe for non-close callers. This allows the queue reset
handler to safely retry mana_attach() without redundant teardown.
====================
Dipayaan Roy [Mon, 25 May 2026 08:08:25 +0000 (01:08 -0700)]
net: mana: Skip redundant detach on already-detached port
When mana_per_port_queue_reset_work_handler() runs after a previous
detach succeeded but attach failed, the port is left in a detached
state with apc->tx_qp and apc->rxqs already freed. Calling
mana_detach() again unconditionally leads to NULL pointer dereferences
during queue teardown.
Add an early exit in mana_detach() when the port is already in
detached state (!netif_device_present) for non-close callers, making
it safe to call idempotently. This allows the queue reset handler and
other recovery paths to simply retry mana_attach() without redundant
teardown.
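A toy model of the early exit, assuming a per-port "present" flag that stands in for netif_device_present(). The struct, the from_close parameter, and the teardown counter are all hypothetical and exist only to show the idempotent behaviour.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-port context. */
struct port_stub {
	bool present;		/* models netif_device_present() */
	int teardown_calls;	/* counts how often queues were torn down */
};

static int port_detach(struct port_stub *p, bool from_close)
{
	/* Already detached and not a close request: nothing left to tear down. */
	if (!from_close && !p->present)
		return 0;

	p->present = false;
	p->teardown_calls++;
	return 0;
}
```

Calling port_detach() twice in a row from a recovery path tears the queues down once, so a retry of the attach step does not have to track whether a previous detach already ran.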
Fixes: 3b194343c250 ("net: mana: Implement ndo_tx_timeout and serialize queue resets per port.") Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Link: https://patch.msgid.link/20260525081129.1230035-3-dipayanroy@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dipayaan Roy [Mon, 25 May 2026 08:08:24 +0000 (01:08 -0700)]
net: mana: Add NULL guards in teardown path to prevent panic on attach failure
When queue allocation fails partway through, the error cleanup frees
and NULLs apc->tx_qp and apc->rxqs. Multiple teardown paths such as
mana_remove(), mana_change_mtu() recovery, and internal error handling
in mana_alloc_queues() can subsequently call into functions that
dereference these pointers without NULL checks:
- mana_chn_setxdp() dereferences apc->rxqs[0], causing a NULL pointer
dereference panic (CR2: 0000000000000000 at mana_chn_setxdp+0x26).
- mana_destroy_vport() iterates apc->rxqs without a NULL check.
- mana_fence_rqs() iterates apc->rxqs without a NULL check.
- mana_dealloc_queues() iterates apc->tx_qp without a NULL check.
Add NULL guards for apc->rxqs in mana_fence_rqs(),
mana_destroy_vport(), and before the mana_chn_setxdp() call. Add a
NULL guard for apc->tx_qp in mana_dealloc_queues() to skip TX queue
draining when TX queues were never allocated or already freed.
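The guard pattern can be sketched as follows, with hypothetical stand-in structs rather than the driver's real types. The per-element NULL check is an extra assumption of the sketch, not something the description above mentions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver's RX queue and port context. */
struct rxq_stub {
	int fenced;
};

struct port_ctx_stub {
	struct rxq_stub **rxqs;		/* NULL after a failed attach */
	unsigned int num_queues;
};

/* Fence every allocated RX queue; returns how many were fenced. */
static unsigned int fence_rqs(struct port_ctx_stub *apc)
{
	unsigned int i, n = 0;

	if (!apc->rxqs)
		return 0;	/* queues never allocated or already freed */

	for (i = 0; i < apc->num_queues; i++) {
		if (!apc->rxqs[i])
			continue;
		apc->rxqs[i]->fenced = 1;
		n++;
	}
	return n;
}
```

With the guard in place the function is safe to call from any teardown path, whether or not queue allocation ever completed.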
Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Link: https://patch.msgid.link/20260525081129.1230035-2-dipayanroy@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xuan Zhuo [Wed, 27 May 2026 12:09:36 +0000 (20:09 +0800)]
net: remove SIOCSHWTSTAMP and SIOCGHWTSTAMP from ndo_eth_ioctl comment
Since commit 4ee58e1e5680 ("net: promote SIOCSHWTSTAMP and SIOCGHWTSTAMP
ioctls to dedicated handlers"), SIOCSHWTSTAMP and SIOCGHWTSTAMP are no
longer dispatched through dev_eth_ioctl() / ndo_eth_ioctl(). They are
now handled by their own dedicated functions dev_set_hwtstamp() and
dev_get_hwtstamp() in the ioctl path.
However, the comment describing ndo_eth_ioctl in netdevice.h still
lists these two ioctls, which is misleading for driver developers who
may incorrectly assume they need to handle hardware timestamping
commands in their ndo_eth_ioctl implementation.
Remove the stale references from the comment to accurately reflect that
ndo_eth_ioctl only handles SIOCGMIIPHY, SIOCGMIIREG and SIOCSMIIREG.
Jakub Kicinski [Wed, 27 May 2026 16:25:22 +0000 (09:25 -0700)]
net: ethtool: don't take rtnl_lock for global string dump
ETHTOOL_MSG_STRSET_GET is the only op which sets allow_nodev_do.
When no device is provided it dumps static tables, there's no
need to hold rtnl_lock for this.
Not taking rtnl_lock is a minor win in itself so I think this
patch stands on its own merits. Later on it will be useful
to do locking only in paths which have access to a netdev,
so that we can decide which locks to take per-netdev.
Revert "vsock/virtio: fix skb overhead overflow on 32-bit builds"
This reverts commit 4157501b9a8f ("vsock/virtio: fix skb overhead
overflow on 32-bit builds"). The fix was semantically correct (although
it would have been better to use mul_u32_u32(), as David pointed out),
but in practice we are estimating the memory used to allocate the SKBs,
and this will never cause a 32-bit variable to overflow on a 32-bit
system, since the memory would have run out long before that. On 64-bit,
SKB_TRUESIZE() already evaluates to size_t, so the multiplication is
already in 64-bit arithmetic without the cast.
Let's revert this to avoid unnecessary 64-bit multiplies on the
per-packet receive path on 32-bit systems.
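For readers unfamiliar with the arithmetic in question, the sketch below contrasts a 32x32->32 multiply, which wraps, with a widening 32x32->64 multiply of the kind mul_u32_u32() provides. The function names are made up for the example and it is plain userspace C, not the vsock code.

```c
#include <stdint.h>

/* 32x32 -> 32 multiply: the product wraps modulo 2^32. */
static uint32_t mul_wrap32(uint32_t a, uint32_t b)
{
	return a * b;
}

/* 32x32 -> 64 multiply: widen one operand first so no bits are lost. */
static uint64_t mul_wide(uint32_t a, uint32_t b)
{
	return (uint64_t)a * b;
}
```

The wrap only matters once the product exceeds 4 GiB, which is the basis for the argument above that an estimate of SKB memory on a 32-bit system cannot reach it in practice.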
Reported-by: David Laight <david.laight.linux@gmail.com> Closes: https://lore.kernel.org/netdev/20260523173557.5cc4f4f6@pumpkin Suggested-by: "Michael S. Tsirkin" <mst@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: David Laight <david.laight.linux@gmail.com> Link: https://patch.msgid.link/20260527171046.130211-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
docs: net: updates for old and cobwebbed docs
I'm hoping to start feeding our docs into the AI review tools, instead
of maintaining a separate repo with review prompts. To experiment with
that we have to refresh the docs a little bit.
A read thru our current docs makes one slightly question the value
of including them in reviews. But directionally, I feel, it's probably
still right. I'm hoping the Rx Checksum section about not dropping packets
for example to be impactful. I don't think the current AI agents or
review docs include this guidance.
====================
Jakub Kicinski [Tue, 26 May 2026 16:01:49 +0000 (09:01 -0700)]
docs: net: add Rx notes to the checksum guide
The Rx checksum processing gives people pause. The two main questions
in my experience are:
- what to do with bad IPv4 checksum; and
- what to do with packets with bad checksum.
Folks often feel the urge to drop the latter, to "avoid overloading
the host".
Jakub Kicinski [Tue, 26 May 2026 16:01:48 +0000 (09:01 -0700)]
docs: net: fix minor issues with checksum offloads
Update the checksum offload documentation to match current code:
- SCTP CRC32c offload requires NETIF_F_SCTP_CRC, not ordinary IP
checksum offload
- NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM are restricted legacy
features; new devices should use NETIF_F_HW_CSUM
- GRE LCO is handled by the shared gre_build_header() helper used by
both IPv4 and IPv6 GRE
- VXLAN_F_REMCSUM_TX is a VXLAN configuration flag, not a field of
struct vxlan_rdst
Jakub Kicinski [Tue, 26 May 2026 16:01:47 +0000 (09:01 -0700)]
docs: net: refresh netdev feature guidance
Update netdev feature documentation for current locking rules and
feature semantics. Clarify hw_features updates and netdev_update_features()
locking, keep the NETIF_F_NEVER_CHANGE rule with the VLAN challenged
exception, fix the HSR duplication wording, and document netdev->netmem_tx
as a device flag rather than a feature bit.
Split the list of basic feature sets from the "extra" ones like
vlan_features. A bunch of the newer fields weren't documented and
having them all together would be confusing.
Jakub Kicinski [Tue, 26 May 2026 16:01:46 +0000 (09:01 -0700)]
docs: net: fix minor issues with the NAPI guide
Update the NAPI documentation to match current API behavior:
- repeated napi_disable() calls hang waiting for ownership, rather
than deadlock
- NAPI IDs are exposed through SO_INCOMING_NAPI_ID and netdev Netlink
- epoll uses the maxevents parameter spelling
- add that drivers holding the netdev instance lock may need _locked()
variants
Jakub Kicinski [Tue, 26 May 2026 16:01:44 +0000 (09:01 -0700)]
docs: net: statistics: fix kernel-internal stats list
Update the kernel-internal ethtool stats list to match current code:
- spell the entries as "struct ethtool_*_stats", not as functions
- list the full set of structures, not only pause and fec
- mention that fields are pre-initialized to ETHTOOL_STAT_NOT_SET by
ethtool_stats_init() and drivers should leave unsupported fields at
that value rather than zeroing them
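A minimal userspace sketch of the "leave unsupported fields alone" rule, assuming a device that only counts received pause frames. All demo_* names are hypothetical; DEMO_STAT_NOT_SET models the ETHTOOL_STAT_NOT_SET sentinel as an all-ones u64, and demo_stats_init() models what ethtool_stats_init() is described as doing:

```c
#include <stdint.h>

/* Models ETHTOOL_STAT_NOT_SET as an all-ones 64-bit value (assumption). */
#define DEMO_STAT_NOT_SET (~0ULL)

/* Hypothetical two-field stats structure, all members are u64. */
struct demo_pause_stats {
	uint64_t tx_pause_frames;
	uint64_t rx_pause_frames;
};

/* Core-side pre-initialization: every field starts as "not set". */
void demo_stats_init(uint64_t *stats, unsigned int n)
{
	while (n--)
		stats[n] = DEMO_STAT_NOT_SET;
}

/*
 * Driver-side callback for hardware that only counts Rx pause frames.
 * The Tx field is deliberately not touched; writing 0 there would
 * claim "zero frames sent" instead of "not supported".
 */
void demo_get_pause_stats(struct demo_pause_stats *s, uint64_t hw_rx)
{
	s->rx_pause_frames = hw_rx;
}

/* Reporting side: a field is only shown if the driver filled it in. */
int demo_stat_is_reported(uint64_t v)
{
	return v != DEMO_STAT_NOT_SET;
}
```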
Jakub Kicinski [Tue, 26 May 2026 16:01:43 +0000 (09:01 -0700)]
docs: net: fix minor issues with driver guide
Update the driver documentation TX queue example to match current APIs:
- use the ring-local tx_ring_mask field in drv_tx_avail()
- stop the selected netdev_queue with netif_tx_stop_queue() instead of
stopping queue 0 with netif_stop_queue()
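The two points above can be modelled in userspace roughly as follows. This is a sketch, not the documentation's example: struct demo_ring and both functions are hypothetical, and the "stopped" flag merely stands in for calling netif_tx_stop_queue() on the netdev_queue that owns the ring:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-queue TX ring; the size is a power of two. */
struct demo_ring {
	uint32_t prod;         /* free-running producer index */
	uint32_t cons;         /* free-running consumer index */
	uint32_t tx_ring_mask; /* ring size - 1, kept per ring */
	bool stopped;          /* stands in for this queue's stopped state */
};

/* Free descriptors, computed from the ring-local mask. */
uint32_t demo_tx_avail(const struct demo_ring *r)
{
	uint32_t used = r->prod - r->cons; /* unsigned wrap is intended */

	return r->tx_ring_mask - (used & r->tx_ring_mask);
}

/* Returns true if the descriptors were queued on this ring. */
bool demo_xmit(struct demo_ring *r, uint32_t descs_needed)
{
	if (demo_tx_avail(r) < descs_needed) {
		/* Stop only the queue that ran out of space, not queue 0. */
		r->stopped = true;
		return false;
	}
	r->prod += descs_needed;
	return true;
}
```

Because the indices are free-running, the subtraction stays correct across a 32-bit wrap of prod.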
Jakub Kicinski [Tue, 26 May 2026 16:01:42 +0000 (09:01 -0700)]
docs: net: netdevices: small fixes and clarifications
A handful of unrelated nits:
- free_netdevice() does not exist; replace two stray references
with free_netdev().
- The simple-driver probe example fell through into err_undo after
register_netdev() success; add return 0 for clarity.
- Clarify the netdev_priv() paragraph: "(netdev_priv())" was easy
to misread as the thing that needs explicit freeing; spell out
that it refers to extra pointers stored in the device private
struct.
- ndo_setup_tc synchronization note: TC_SETUP_BLOCK / TC_SETUP_FT
actually run under block->cb_lock, not "NFT locks", and rtnl_lock
may or may not be held depending on path.
- ->lltx guidance reads as very outdated; it's not really deprecated.
I suspect people may have been trying to use it for HW drivers
in the past but I can't think of such a case in the last decade.
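The probe fall-through nit is the classic goto-unwind pattern. A userspace sketch under stated assumptions: struct demo_dev and demo_register() are hypothetical stand-ins for the netdev and register_netdev(), and the unwind step only clears a flag:

```c
#include <errno.h>

/* Hypothetical device state tracked by the sketch. */
struct demo_dev {
	int allocated;
	int registered;
};

/* Stand-in for register_netdev(); fails on request. */
static int demo_register(struct demo_dev *d, int fail)
{
	if (fail)
		return -ENODEV;
	d->registered = 1;
	return 0;
}

int demo_probe(struct demo_dev *d, int fail_register)
{
	int err;

	d->allocated = 1;

	err = demo_register(d, fail_register);
	if (err)
		goto err_undo;

	/* Without this return, the success path runs the unwind below. */
	return 0;

err_undo:
	d->allocated = 0;
	return err;
}
```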
net/sched/sch_netem.c
  a2f6ed7b4873 ("net/sched: netem: add per-impairment extended statistics")
  9552b11e3eda ("net/sched: fix packet loop on netem when duplicate is on")
Adjacent changes:
drivers/dpll/zl3073x/core.c
  c1224569cef0 ("dpll: zl3073x: make frequency monitor a per-device attribute")
  54e65df8cf18 ("dpll: zl3073x: report FFO as DPLL vs input reference offset")
net/iucv/af_iucv.c
  347fdd4df85f ("af_iucv: convert to getsockopt_iter")
  3589d20a666c ("net/iucv: fix locking in .getsockopt")
Linus Torvalds [Thu, 28 May 2026 20:45:10 +0000 (13:45 -0700)]
Merge tag 'acpi-7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI support fixes from Rafael Wysocki:
"Fix three issues in the ACPI button driver: a possible crash due to a
button press after unloading the driver (introduced during the 6.15
development cycle), function keys breakage on Toshiba Tecra X40 due to
missing ACPI events (introduced during the 7.0 development cycle), and
a probe rollback path item that was mistakenly left out during a
recent update"
* tag 'acpi-7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: button: Add missing device class clearing on probe failures
ACPI: button: Enable wakeup GPEs for ACPI buttons at probe time
ACPI: button: Fix ACPI GPE handler leak during removal
Linus Torvalds [Thu, 28 May 2026 20:13:48 +0000 (13:13 -0700)]
Merge tag 'net-7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"This is again significantly bigger than the same point into the
previous cycle, but at least smaller than last week.
I'm not aware of any pending regression for the current cycle.
Including fixes from netfilter.
Current release - regressions:
- netfilter: walk fib6_siblings under RCU
Previous releases - regressions:
- netlink: fix sending unassigned nsid after assigned one
- bridge: fix sleep in atomic context in netlink path
- eth: tun: free page on short-frame rejection in tun_xdp_one()
Previous releases - always broken:
- skbuff: fix missing zerocopy reference in pskb_carve helpers
- handshake: drain pending requests at net namespace exit
- ethtool:
- rss: avoid modifying the RSS context response
- module: avoid leaking a netdev ref on module flash errors
- coalesce: cap profile updates at NET_DIM_PARAMS_NUM_PROFILES
- netfilter: fix dst corruption in same register operation
- nfc: hci: fix out-of-bounds read in HCP header parsing
- ipv6: exthdrs: refresh nh pointer after ipv6_hop_jumbo()
- eth:
- vti: use ip6_tnl.net in vti6_changelink().
- vxlan: do not reuse cached ip_hdr() value after
skb_tunnel_check_pmtu()"
* tag 'net-7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits)
dpll: zl3073x: make frequency monitor a per-device attribute
dpll: zl3073x: use __dpll_device_change_ntf() and remove change_work
dpll: export __dpll_device_change_ntf() for use under dpll_lock
net/handshake: Drain pending requests at net namespace exit
net/handshake: Verify file-reference balance in submit paths
net/handshake: Close the submit-side sock_hold race
net/handshake: hand off the pinned file reference to accept_doit
net/handshake: Take a long-lived file reference at submit
net/handshake: Pass negative errno through handshake_complete()
nvme-tcp: store negative errno in queue->tls_err
net/handshake: Use spin_lock_bh for hn_lock
net: skbuff: fix missing zerocopy reference in pskb_carve helpers
net: hibmcge: move dma_rmb() after dma_sync_single_for_cpu() in RX path
net: hibmcge: disable Relaxed Ordering to fix RX packet corruption
selftests/tc-testing: Add netem test case exercising loops
selftests/tc-testing: Add mirred test cases exercising loops
net/sched: act_mirred: Fix return code in early mirred redirect error paths
net/sched: act_mirred: Fix blockcast recursion bypass leading to stack overflow
net/sched: Fix ethx:ingress -> ethy:egress -> ethx:ingress mirred loop
net/sched: fix packet loop on netem when duplicate is on
...
Linus Torvalds [Thu, 28 May 2026 19:36:39 +0000 (12:36 -0700)]
Merge tag 'gpio-fixes-for-v7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix interrupt handling in gpio-mxc
- fix scoped_guard() usage in gpio-adnp
- don't accept partial writes in gpio-virtuser debugfs interface as
they can't really work correctly
- fix resource leaks in gpio-rockchip
- fix locking issues in remove path in shared GPIO management
- undo the vote of a GPIO shared proxy virtual device on GPIO release
* tag 'gpio-fixes-for-v7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: rockchip: teardown bugs and resource leaks
gpio: rockchip: convert bank->clk to devm_clk_get_enabled()
gpio: virtuser: Fix uninitialized data bug in gpio_virtuser_direction_do_write()
gpio: shared: fix lockdep false positive by removing unneeded lock
gpio: shared: fix deadlock on shared proxy's parent removal
gpio: adnp: fix flow control regression caused by scoped_guard()
gpio: shared: undo the vote of the proxy on GPIO free
gpio: mxc: fix irq_high handling
Linus Torvalds [Thu, 28 May 2026 18:45:41 +0000 (11:45 -0700)]
security/keys: fix missed RCU read section on lookup
Nicholas Carlini reports that the keyring code calls assoc_array_find()
in find_key_to_update() without holding the RCU read lock, while the
assoc_array_gc() code really is designed around removing the node from
the tree and then freeing it after an RCU grace-period.
The regular key handling doesn't see this because holding the keyring
semaphore hides any lifetime issues, but the persistent key handling
uses a different model.
Instead of extending the keyring locking, just do the simple RCU locking
that the assoc_array was designed for.
Reported-by: Nicholas Carlini <npc@anthropic.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: James Morris <jmorris@namei.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lee Jones [Wed, 27 May 2026 16:05:26 +0000 (17:05 +0100)]
HID: wacom: Fix OOB write in wacom_hid_set_device_mode()
wacom_hid_set_device_mode() currently assumes that the HID_DG_INPUTMODE
usage is always located in the first field (field[0]) of the feature report.
However, a device can specify HID_DG_INPUTMODE in a different field.
If HID_DG_INPUTMODE is in a field other than the first one and the first
field has a report_count smaller than the usage_index of HID_DG_INPUTMODE,
this leads to an out-of-bounds write to r->field[0]->value.
Fix this by storing the field index of HID_DG_INPUTMODE in 'struct
hid_data' during feature mapping. In wacom_hid_set_device_mode(), use
this stored field index to access the correct field and add bounds
checks to ensure both the field index and the value index are within
valid ranges before writing.
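The double bounds check can be sketched in userspace as below. This is an illustration, not the patch: struct demo_field and struct demo_report are simplified stand-ins for the HID field and report structures, and demo_set_mode() is a hypothetical helper taking the stored field index and the usage index as plain arguments:

```c
#include <errno.h>
#include <stdint.h>

/* Simplified stand-in for a HID field: a value array and its length. */
struct demo_field {
	unsigned int report_count;
	int32_t *value;
};

/* Simplified stand-in for a HID feature report. */
struct demo_report {
	unsigned int maxfield;
	struct demo_field **field;
};

/*
 * Write the mode into the field that actually carries the usage,
 * rejecting any index that falls outside the report or the field.
 */
int demo_set_mode(struct demo_report *r, unsigned int field_index,
		  unsigned int value_index, int32_t mode)
{
	struct demo_field *f;

	if (field_index >= r->maxfield)
		return -EINVAL;

	f = r->field[field_index];
	if (value_index >= f->report_count)
		return -EINVAL;

	f->value[value_index] = mode;
	return 0;
}
```

With a one-element first field and the usage at index 2 of the second field, the old assumption (always field 0) would write past the first field's value array; here that combination is rejected instead.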
Cc: stable@vger.kernel.org
Fixes: 5ae6e89f7409 ("HID: wacom: implement the finger part of the HID generic handling")
Tested-by: Ping Cheng <ping.cheng@wacom.com>
Reviewed-by: Ping Cheng <ping.cheng@wacom.com>
Signed-off-by: Lee Jones <lee@kernel.org>
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Clang recently added support for -Wattribute-alias [1], which results in
the same warnings that necessitated commit bee20031772a ("disable
-Wattribute-alias warning for SYSCALL_DEFINEx()") for GCC.
kernel/time/itimer.c:325:1: error: alias and aliasee have different types 'long (unsigned int)' and 'long (typeof (__builtin_choose_expr((__builtin_types_compatible_p(typeof ((unsigned int)0), typeof (0LL)) || __builtin_types_compatible_p(typeof ((unsigned int)0), typeof (0ULL))), 0LL, 0L)))' (aka 'long (long)') [-Werror,-Wattribute-alias]
325 | SYSCALL_DEFINE1(alarm, unsigned int, seconds)
| ^
include/linux/syscalls.h:225:36: note: expanded from macro 'SYSCALL_DEFINE1'
225 | #define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__)
| ^
include/linux/syscalls.h:236:2: note: expanded from macro 'SYSCALL_DEFINEx'
236 | __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
| ^
include/linux/syscalls.h:251:18: note: expanded from macro '__SYSCALL_DEFINEx'
251 | __attribute__((alias(__stringify(__se_sys##name)))); \
| ^
kernel/time/itimer.c:325:1: note: aliasee is declared here
include/linux/syscalls.h:225:36: note: expanded from macro 'SYSCALL_DEFINE1'
225 | #define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__)
| ^
include/linux/syscalls.h:236:2: note: expanded from macro 'SYSCALL_DEFINEx'
236 | __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
| ^
include/linux/syscalls.h:255:18: note: expanded from macro '__SYSCALL_DEFINEx'
255 | asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
| ^
<scratch space>:16:1: note: expanded from here
16 | __se_sys_alarm
| ^
Disable the warnings in the same way for clang-23 and newer. Disable the
warning about unknown warning options to avoid breaking the build for
versions of clang-23 that do not have -Wattribute-alias, such as ones
deployed by vendors like Android or CI systems or when bisecting LLVM
between llvmorg-23-init and release/23.x.
Marco Scardovi [Tue, 26 May 2026 17:02:46 +0000 (19:02 +0200)]
gpio: rockchip: teardown bugs and resource leaks
Address several teardown issues and resource leaks in the driver's remove
path and error handling:
1. Debounce clock reference leak: The debounce clock (bank->db_clk) is
obtained using of_clk_get() which increments the clock's reference
count, but clk_put() is never called. Register a devm action to
cleanly release it on unbind. Note that of_clk_get(..., 1) remains
necessary over devm_clk_get() because the DT binding does not define
clock-names, precluding name-based lookup.
2. Unregistered chained IRQ handler: The chained IRQ handler is not
disconnected in remove(). If a stray interrupt fires after the driver
is removed, the kernel attempts to execute a stale handler, leading
to a panic. Fix this by clearing the handler in remove().
3. IRQ domain leak: The linear IRQ domain and its generic chips are
allocated manually during probe but never removed. Remove the IRQ
domain during driver teardown to free the associated generic chips
and mappings.
Fixes: 936ee2675eee ("gpio/rockchip: add driver for rockchip gpio")
Assisted-by: Antigravity:gemini-3.5-flash
Signed-off-by: Marco Scardovi <scardracs@disroot.org>
Link: https://patch.msgid.link/20260526171050.12785-3-scardracs@disroot.org
[Bartosz: don't emit an error message on devres allocation failure]
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Marco Scardovi [Tue, 26 May 2026 17:02:45 +0000 (19:02 +0200)]
gpio: rockchip: convert bank->clk to devm_clk_get_enabled()
The bank->clk was previously obtained via of_clk_get() and manually
prepared/enabled. However, it was missing a corresponding clk_put() in
both the error paths and the remove function, leading to a reference leak.
Convert the allocation to devm_clk_get_enabled(), which also properly
propagates failures from clk_prepare_enable() that were previously ignored.
The GPIO bank device uses the same OF node as the previous of_clk_get()
call, so devm_clk_get_enabled(dev, NULL) correctly resolves the same
clock provider entry.
Fix the reference leak and simplify the code by removing the manual
clk_disable_unprepare() calls in the probe error paths and in the
remove function.
Fixes: 936ee2675eee ("gpio/rockchip: add driver for rockchip gpio")
Assisted-by: Antigravity:gemini-3.5-flash
Signed-off-by: Marco Scardovi <scardracs@disroot.org>
Link: https://patch.msgid.link/20260526171050.12785-2-scardracs@disroot.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Dan Carpenter [Mon, 25 May 2026 07:15:16 +0000 (10:15 +0300)]
gpio: virtuser: Fix uninitialized data bug in gpio_virtuser_direction_do_write()
If *ppos is non-zero (user-space write split over multiple calls to
write()) then simple_write_to_buffer() won't initialize the start of the
buffer. Really, non-zero values for *ppos aren't going to work at all.
Check for that and return -EINVAL at the start of the function.
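A userspace sketch of the check, under stated assumptions: demo_write_to_buffer() is a simplified model of simple_write_to_buffer() (plain memcpy instead of a user copy), and demo_direction_write() is a hypothetical handler that only understands "input" and "output", not the driver's real parser:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Simplified model of simple_write_to_buffer(): data lands at
 * to + *ppos, so bytes before *ppos are never written.
 */
static long demo_write_to_buffer(char *to, size_t available,
				 long long *ppos, const char *from,
				 size_t count)
{
	long long pos = *ppos;

	if (pos < 0)
		return -EINVAL;
	if ((size_t)pos >= available || count == 0)
		return 0;
	if (count > available - (size_t)pos)
		count = available - (size_t)pos;
	memcpy(to + pos, from, count);
	*ppos = pos + (long long)count;
	return (long)count;
}

/* Hypothetical write handler: 0 means input, 1 means output. */
long demo_direction_write(const char *user, size_t count,
			  long long *ppos, int *dir_out)
{
	char buf[32]; /* on-stack and not zeroed, as in the bug report */
	long ret;

	/* A split write would leave buf[0..*ppos) uninitialized. */
	if (*ppos)
		return -EINVAL;

	ret = demo_write_to_buffer(buf, sizeof(buf) - 1, ppos, user, count);
	if (ret < 0)
		return ret;
	buf[ret] = '\0';

	if (strcmp(buf, "input") == 0)
		*dir_out = 0;
	else if (strcmp(buf, "output") == 0)
		*dir_out = 1;
	else
		return -EINVAL;

	return ret;
}
```

A second write() on the same file position arrives with a non-zero *ppos and is rejected up front, before any parsing of the partially filled buffer.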
Fixes: 91581c4b3f29 ("gpio: virtuser: new virtual testing driver for the GPIO API")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Link: https://patch.msgid.link/ahP3BJWWy-m_qI0X@stanley.mountain
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>