Revert commit 70523f335734 ("Revert "x86/smp: Eliminate
mwait_play_dead_cpuid_hint()"") to reapply the changes from commit
96040f7273e2 ("x86/smp: Eliminate mwait_play_dead_cpuid_hint()")
reverted by it.
Previously, these changes caused idle power to rise on systems booting
with "nosmt" in the kernel command line because they effectively caused
"dead" SMT siblings to remain in idle state C1 after executing the HLT
instruction, which prevented the processor from reaching package idle
states deeper than PC2 going forward.
Now, the "dead" SMT siblings are rescanned after initializing a proper
cpuidle driver for the processor (either intel_idle or ACPI idle), at
which point they are able to enter a sufficiently deep idle state
in native_play_dead() via cpuidle, so the code changes in question can
be reapplied.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Link: https://patch.msgid.link/7813065.EvYhyI6sBW@rjwysocki.net
ACPI: processor: Rescan "dead" SMT siblings during initialization
Make acpi_processor_driver_init() call arch_cpu_rescan_dead_smt_siblings(),
via a new wrapper function called acpi_idle_rescan_dead_smt_siblings(),
after successfully initializing the driver, to allow the "dead" SMT
siblings to go into deep idle states, which is necessary for the
processor to be able to reach deep package C-states (like PC10) going
forward, so that power can be reduced sufficiently in suspend-to-idle,
among other things.
However, do it only if the ACPI idle driver is the current cpuidle
driver (otherwise it is assumed that another cpuidle driver will take
care of this) and avoid doing it on architectures other than x86.
intel_idle: Rescan "dead" SMT siblings during initialization
Make intel_idle_init() call arch_cpu_rescan_dead_smt_siblings() after
successfully registering intel_idle as the cpuidle driver so as to
allow the "dead" SMT siblings (if any) to go into deep idle states.
This is necessary for the processor to be able to reach deep package
C-states (like PC10) going forward which is requisite for reducing
power sufficiently in suspend-to-idle, among other things.
Move the inner part of the arch_resume_nosmt() code into a separate
function called arch_cpu_rescan_dead_smt_siblings(), so it can be
used in other places where "dead" SMT siblings may need to be taken
online and offline again in order to get into deep idle states.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Link: https://patch.msgid.link/3361688.44csPzL39Z@rjwysocki.net
[ rjw: Prevent build issues with CONFIG_SMP unset ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
intel_idle: Use subsys_initcall_sync() for initialization
It is not necessary to wait until the device_initcall() stage with
intel_idle initialization. All of its dependencies are met after
all subsys_initcall()s have run, so subsys_initcall_sync() can be
used for initializing it.
It is also better to ensure that intel_idle will always initialize
before the ACPI processor driver that uses module_init() for its
initialization.
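The ordering argument above can be sketched with a small user-space model.
This is only an illustration, assuming the usual built-in initcall level
order from include/linux/init.h and that module_init() in built-in code
maps to device_initcall(); the helper name runs_before() is made up for
this sketch.

```python
# Simplified model of the built-in initcall level order (earliest first).
INITCALL_LEVELS = [
    "pure", "core", "core_sync", "postcore", "postcore_sync",
    "arch", "arch_sync", "subsys", "subsys_sync", "fs", "fs_sync",
    "rootfs", "device", "device_sync", "late", "late_sync",
]

# Assumption for this sketch: built-in module_init() == device_initcall().
INTEL_IDLE_LEVEL = "subsys_sync"
ACPI_PROCESSOR_LEVEL = "device"


def runs_before(level_a: str, level_b: str) -> bool:
    """Return True if initcalls at level_a run before those at level_b."""
    return INITCALL_LEVELS.index(level_a) < INITCALL_LEVELS.index(level_b)
```

With this model, runs_before(INTEL_IDLE_LEVEL, ACPI_PROCESSOR_LEVEL) holds,
and every subsys_initcall() has already run by the time the "subsys_sync"
level is reached.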
Linus Torvalds [Thu, 5 Jun 2025 19:47:12 +0000 (12:47 -0700)]
Merge tag 'pm-6.16-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"Fix three issues introduced into device suspend/resume error paths in
the PM core by some of the recent updates.
First off, replace list_splice() with list_splice_init() in three
places in device suspend error paths to avoid attempting to use an
uninitialized list head going forward.
Second, rearrange device_resume() to avoid leaking the
power.is_suspended device PM flag to the next system suspend/resume
cycle where it can confuse rolling back after an error or early
wakeup.
Finally, add synchronization to dpm_async_resume_children() to avoid
resetting the async state mistakenly for devices whose resume
callbacks have already been queued up for asynchronous execution in
the given device resume phase, which fortunately can happen only if
the preceding system suspend transition has been aborted"
* tag 'pm-6.16-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: sleep: Add locking to dpm_async_resume_children()
PM: sleep: Fix power.is_suspended cleanup for direct-complete devices
PM: sleep: Fix list splicing in device suspend error paths
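The list splicing fix can be illustrated with a user-space model of the
kernel's circular doubly linked lists. This is a sketch written for this
note, not the kernel code: the ListHead class is hypothetical, and only
the pointer updates mirror what list_splice() and list_splice_init() are
documented to do.

```python
class ListHead:
    """Minimal model of a kernel list head (circular, doubly linked)."""

    def __init__(self):
        self.next = self
        self.prev = self


def list_empty(head):
    return head.next is head


def list_add_tail(new, head):
    """Insert new just before head, i.e. at the tail of the list."""
    prev = head.prev
    new.prev, new.next = prev, head
    prev.next = new
    head.prev = new


def list_splice(src, dst):
    """Move all entries of src to the front of dst; src itself is NOT reset."""
    if list_empty(src):
        return
    first, last, at = src.next, src.prev, dst.next
    first.prev = dst
    dst.next = first
    last.next = at
    at.prev = last


def list_splice_init(src, dst):
    """Like list_splice(), but reinitialize src so it is safely empty."""
    list_splice(src, dst)
    src.next = src.prev = src
```

After list_splice(), src still points at entries that now belong to dst,
so any later walk of src operates on a stale head. list_splice_init()
leaves src as a valid empty list, which is what the error paths need.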
- fix net_devmem_bind_dmabuf() stub when DEVMEM not compiled
- eth: airoha: fixes for config / accel in bridge mode
Previous releases - regressions:
- Bluetooth: hci_qca: move the SoC type check to the right place, fix
GPIO integration
- prevent a NULL deref in rtnl_create_link() after locking changes
- fix udp gso skb_segment after pull from frag_list
- hv_netvsc: fix potential deadlock in netvsc_vf_setxdp()
Previous releases - always broken:
- netfilter:
- nf_nat: also check reverse tuple to obtain clashing entry
- nf_set_pipapo_avx2: fix initial map fill (zeroing)
- fix the helper for incremental update of packet checksums after
modifying the IP address, used by ILA and BPF
- eth:
- stmmac: prevent div by 0 when clock rate is misconfigured
- ice: fix Tx scheduler handling of XDP and changing queue count
- eth: fix support for the RGMII interface when delays configured"
* tag 'net-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (76 commits)
calipso: unlock rcu before returning -EAFNOSUPPORT
seg6: Fix validation of nexthop addresses
net: prevent a NULL deref in rtnl_create_link()
net: annotate data-races around cleanup_net_task
selftests: drv-net: tso: make bkg() wait for socat to quit
selftests: drv-net: tso: fix the GRE device name
selftests: drv-net: add configs for the TSO test
wireguard: device: enable threaded NAPI
netlink: specs: rt-link: decode ip6gre
netlink: specs: rt-link: add missing byte-order properties
net: wwan: mhi_wwan_mbim: use correct mux_id for multiplexing
wifi: cfg80211/mac80211: correctly parse S1G beacon optional elements
net: dsa: b53: do not touch DLL_IQQD on bcm53115
net: dsa: b53: allow RGMII for bcm63xx RGMII ports
net: dsa: b53: do not configure bcm63xx's IMP port interface
net: dsa: b53: do not enable RGMII delay on bcm63xx
net: dsa: b53: do not enable EEE on bcm63xx
net: ti: icssg-prueth: Fix swapped TX stats for MII interfaces.
selftests: netfilter: nft_nat.sh: add test for reverse clash with nat
netfilter: nf_nat: also check reverse tuple to obtain clashing entry
...
Linus Torvalds [Thu, 5 Jun 2025 18:45:33 +0000 (11:45 -0700)]
Merge tag 'uml-for-linux-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux
Pull UML updates from Johannes Berg:
"The only really new thing is the long-standing seccomp work
(originally from 2021!). Even if it still isn't enabled by default due
to security concerns, it can still be used, e.g. for tests.
- remove obsolete network transports
- remove PCI IO port support
- start adding seccomp-based process handling instead of ptrace"
* tag 'uml-for-linux-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (29 commits)
um: remove "extern" from implementation of sigchld_handler
um: fix unused variable warning
um: fix SECCOMP 32bit xstate register restore
um: pass FD for memory operations when needed
um: Add SECCOMP support detection and initialization
um: Implement kernel side of SECCOMP based process handling
um: Track userspace children dying in SECCOMP mode
um: Add helper functions to get/set state for SECCOMP
um: Add stub side of SECCOMP/futex based process handling
um: Move faultinfo extraction into userspace routine
um: vector: Use mac_pton() for MAC address parsing
um: vector: Clean up and modernize log messages
um: chan_kern: use raw spinlock for irqs_to_free_lock
MAINTAINERS: remove obsolete file entry in TUN/TAP DRIVER
um: Fix tgkill compile error on old host OSes
um: stop using PCI port I/O
um: Remove legacy network transport infrastructure
um: vector: Eliminate the dependency on uml_net
um: Remove obsolete legacy network transports
um/asm: Replace "REP; NOP" with PAUSE mnemonic
...
Linus Torvalds [Thu, 5 Jun 2025 18:39:17 +0000 (11:39 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"We've got a couple of build fixes when using LLD, a missing TLB
invalidation and a workaround for broken firmware on SoCs with CPUs
that implement MPAM:
- Disable problematic linker assertions for broken versions of LLD
- Work around sporadic link failure with LLD and various randconfig
builds
- Fix missing invalidation in the TLB batching code when reclaim
races with mprotect() and friends
- Add a command-line override for MPAM to allow booting on systems
with broken firmware"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Add override for MPAM
arm64/mm: Close theoretical race where stale TLB entry remains valid
arm64: Work around convergence issue with LLD linker
arm64: Disable LLD linker ASSERT()s for the time being
Linus Torvalds [Thu, 5 Jun 2025 18:33:09 +0000 (11:33 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux
Pull ARM fixes from Russell King:
- Fix arch_memremap_can_ram_remap() which incorrectly passed a PFN to
memblock_is_map_memory rather than the actual address.
- Disallow kernel mode NEON when IRQs are disabled
Explanation:
"To avoid having to preserve/restore kernel mode NEON state when
such a softirq is taken softirqs are now disabled when using the
NEON from task context."
should explain that it's nested kernel mode.
In other words, softirqs from user mode are fine, because the context
will be preserved. softirqs from kernel mode may be from a context
that has already saved the user NEON state, and thus we would need to
preserve the NEON state for the parent kernel mode context, and this
we don't allow.
The problem occurs when the kernel context disables hard IRQs, and
then uses NEON. When it's finished, and restores the userspace NEON
state, we call local_bh_enable() with hard IRQs disabled, which
causes a warning.
This commit addresses that by disallowing the use of NEON with hard
IRQs disabled.
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
ARM: 9446/1: Disallow kernel mode NEON when IRQs are disabled
ARM: 9447/1: arm/memremap: fix arch_memremap_can_ram_remap()
Linus Torvalds [Thu, 5 Jun 2025 15:54:47 +0000 (08:54 -0700)]
Merge tag 'rtc-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"There are two new drivers this cycle. There is also support for a
negative offset for RTCs that have been shipped with a date set using
an epoch that is before 1970. This unfortunately happens with some
products that ship with a vendor kernel and an out of tree driver.
Core:
- support negative offsets for RTCs that have shipped with an epoch
earlier than 1970
New drivers:
- NXP S32G2/S32G3
- Sophgo CV1800
Drivers:
- loongson: fix missing alarm notifications for ACPI
- m41t80: kickstart oscillator upon failure
- mt6359: mt6357 support
- pcf8563: fix wrong alarm register
- sh: cleanups"
* tag 'rtc-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (39 commits)
rtc: mt6359: Add mt6357 support
rtc: test: Test date conversion for dates starting in 1900
rtc: test: Also test time and wday outcome of rtc_time64_to_tm()
rtc: test: Emit the seconds-since-1970 value instead of days-since-1970
rtc: Fix offset calculation for .start_secs < 0
rtc: Make rtc_time64_to_tm() support dates before 1970
rtc: pcf8563: fix wrong alarm register
rtc: rzn1: support input frequencies other than 32768Hz
rtc: rzn1: Disable controller before initialization
dt-bindings: rtc: rzn1: add optional second clock
rtc: m41t80: reduce verbosity
rtc: m41t80: kickstart ocillator upon failure
rtc: s32g: add NXP S32G2/S32G3 SoC support
dt-bindings: rtc: add schema for NXP S32G2/S32G3 SoCs
dt-bindings: at91rm9260-rtt: add microchip,sama7d65-rtt
dt-bindings: rtc: at91rm9200: add microchip,sama7d65-rtc
rtc: loongson: Add missing alarm notifications for ACPI RTC events
rtc: sophgo: add rtc support for Sophgo CV1800 SoC
rtc: stm32: drop unused module alias
rtc: s3c: drop unused module alias
...
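Supporting dates before 1970 mostly comes down to how a negative
seconds count is split into days and seconds-of-day. The sketch below
is a user-space illustration of that arithmetic, not the code from
rtc_time64_to_tm(); both function names are invented for this note.

```python
SECS_PER_DAY = 86400


def floor_split(time64):
    """Split seconds-since-1970 into (days, secs) with 0 <= secs < 86400."""
    return divmod(time64, SECS_PER_DAY)


def c_style_split(time64):
    """Same split with truncate-toward-zero division, as C integer division does."""
    days = abs(time64) // SECS_PER_DAY
    if time64 < 0:
        days = -days
    secs = time64 - days * SECS_PER_DAY
    return days, secs
```

For one second before the epoch, floor_split(-1) gives day -1 at 23:59:59,
while c_style_split(-1) gives day 0 with a negative seconds-of-day, which
cannot be turned into a valid time of day without extra correction.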
Linus Torvalds [Thu, 5 Jun 2025 15:49:30 +0000 (08:49 -0700)]
Merge tag 'dmaengine-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine updates from Vinod Koul:
"A fairly small update for the dmaengine subsystem. This has a new ARM
dmaengine driver, a couple of new device supports and a few driver
changes:
New support:
- Renesas RZ/V2H(P) dma support for r9a09g057
- Arm DMA-350 driver
- Tegra Tegra264 ADMA support
Linus Torvalds [Thu, 5 Jun 2025 15:20:21 +0000 (08:20 -0700)]
Merge tag 'phy-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy
Pull phy updates from Vinod Koul:
"As usual, this features a couple of new drivers, a bunch of new device
support, some changes to the Freescale and Rockchip drivers, and a
couple of YAML binding conversions.
New Support:
- Qualcomm IPQ5424 qusb2 support, IPQ5018 uniphy-pcie driver
- Rockchip usb2 support for RK3562, RK3036 usb2 phy support
- Samsung exynos2200 eusb2 phy support and driver refactoring for
this support, exynos7870 USBDRD support
- Mediatek MT7988 xs-phy support
- Broadcom BCM74110 usb phy support
- Renesas RZ/V2H(P) usb2 phy support
Updates:
- Freescale phy rate calculation updates, i.MX95 tuning support
- Better error handling for amlogic pcie phy
- Rockchip color depth configuration and management support
- Yaml binding conversion for RK3399 Type-C and PCIe Phy"
* tag 'phy-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: (77 commits)
phy: tegra: p2u: Broaden architecture dependency
phy: rockchip: inno-usb2: Add usb2 phy support for rk3562
dt-bindings: phy: rockchip,inno-usb2phy: add rk3562
phy: rockchip: inno-usb2: add phy definition for rk3036
dt-bindings: phy: rockchip,inno-usb2phy: add rk3036 compatible
phy: freescale: fsl-samsung-hdmi: Improve LUT search for best clock
phy: freescale: fsl-samsung-hdmi: Refactor finding PHY settings
phy: freescale: fsl-samsung-hdmi: Rename phy_clk_round_rate
phy: renesas: phy-rcar-gen3-usb2: Add USB2.0 PHY support for RZ/V2H(P)
phy: renesas: phy-rcar-gen3-usb2: Sort compatible entries by SoC part number
dt-bindings: phy: renesas,usb2-phy: Document RZ/V2H(P) SoC
dt-bindings: phy: renesas,usb2-phy: Add clock constraint for RZ/G2L family
phy: exynos5-usbdrd: support Exynos USBDRD 3.2 4nm controller
phy: phy-snps-eusb2: add support for exynos2200
phy: phy-snps-eusb2: refactor reference clock init
phy: phy-snps-eusb2: make reset control optional
phy: phy-snps-eusb2: make repeater optional
phy: phy-snps-eusb2: split phy init code
phy: phy-snps-eusb2: refactor constructs names
phy: move phy-qcom-snps-eusb2 out of its vendor sub-directory
...
Linus Torvalds [Thu, 5 Jun 2025 15:07:24 +0000 (08:07 -0700)]
Merge tag 'soundwire-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire
Pull soundwire updates from Vinod Koul:
"A couple of small core changes and an Intel driver change:
- sdw_assign_device_num() logic simplification, using internal slave
id for irqs and optimizing computing of port params in specific
stream states
- Intel driver updates for ACE3+ microphone privacy status reporting
and enabling the status in HDA Intel driver"
* tag 'soundwire-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
soundwire: only compute port params in specific stream states
ASoC: SOF: Intel: hda: Set the mic_privacy flag for soundwire with ACE3+
soundwire: intel: Add awareness of ACE3+ microphone privacy
soundwire: bus: Add internal slave ID and use for IRQs
soundwire: bus: Simplify sdw_assign_device_num()
Ido Schimmel [Wed, 4 Jun 2025 11:32:52 +0000 (14:32 +0300)]
seg6: Fix validation of nexthop addresses
The kernel currently validates that the length of the provided nexthop
address does not exceed the specified length. This can lead to the
kernel reading uninitialized memory if user space provided a shorter
length than the specified one.
Fix by validating that the provided length exactly matches the specified
one.
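The difference between the old and the new check can be sketched in a
few lines. This is a user-space model of the length comparison only; the
function names are hypothetical and the real validation lives in the
seg6local netlink parsing code.

```python
IPV4_ADDR_LEN = 4
IPV6_ADDR_LEN = 16


def nexthop_len_ok_old(provided_len, expected_len):
    """Old behaviour: only reject attributes that are too long."""
    return provided_len <= expected_len


def nexthop_len_ok_fixed(provided_len, expected_len):
    """Fixed behaviour: the attribute must be exactly the expected size."""
    return provided_len == expected_len
```

A 4-byte attribute passed where a 16-byte IPv6 nexthop is expected
satisfies the old check, and the remaining 12 bytes would then be read
from memory user space never supplied. The exact-match check rejects it.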
Fixes: d1df6fd8a1d2 ("ipv6: sr: define core operations for seg6local lightweight tunnel")
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250604113252.371528-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 4 Jun 2025 10:58:15 +0000 (10:58 +0000)]
net: prevent a NULL deref in rtnl_create_link()
At the time rtnl_create_link() is running, dev->netdev_ops is NULL,
we must not use netdev_lock_ops() or risk a NULL deref if
CONFIG_NET_SHAPER is defined.
Jakub Kicinski [Wed, 4 Jun 2025 01:20:55 +0000 (18:20 -0700)]
selftests: drv-net: tso: make bkg() wait for socat to quit
Commit 846742f7e32f ("selftests: drv-net: add a warning for
bkg + shell + terminate") added a warning for bkg() used
with terminate=True. The tso test was missed as we didn't
have it running anywhere in NIPA. Add exit_wait=True, to avoid:
# Warning: combining shell and terminate is risky!
# SIGTERM may not reach the child on zsh/ksh!
Jakub Kicinski [Wed, 4 Jun 2025 01:20:31 +0000 (18:20 -0700)]
selftests: drv-net: tso: fix the GRE device name
The device type for IPv4 GRE is "gre" not "ipgre",
unlike for IPv6 which uses "ip6gre".
Not sure how I missed this when writing the test, perhaps
because all HW I have access to is on an IPv6-only network.
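As a rough illustration of the naming difference, the mapping can be
written down as follows. The helper names, the device name and the
addresses in the usage note are made up for this sketch and are not
taken from tso.py.

```python
def gre_link_kind(ipver):
    """Return the link type for a GRE tunnel over IPv4 ("4") or IPv6 ("6")."""
    kinds = {"4": "gre", "6": "ip6gre"}
    return kinds[ipver]


def gre_add_cmd(name, ipver, local, remote):
    """Build an 'ip link add' argument list for a GRE tunnel device."""
    return ["ip", "link", "add", name, "type", gre_link_kind(ipver),
            "local", local, "remote", remote]
```

For example, gre_add_cmd("gre-ksft", "4", "192.0.2.1", "192.0.2.2") uses
"type gre", whereas the IPv6 variant uses "type ip6gre".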
Fixes: 0d0f4174f6c8 ("selftests: drv-net: add a simple TSO test")
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250604012031.891242-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 4 Jun 2025 00:16:52 +0000 (17:16 -0700)]
selftests: drv-net: add configs for the TSO test
Add missing config options for the tso.py test, specifically
to make sure the kernel is built with vxlan and gre tunnels.
I noticed this while adding a TSO-capable QEMU device to the CI.
Previously we only ran virtio tests, and virtio doesn't report LSO
stats on the QEMU we have.
Jakub Kicinski [Thu, 5 Jun 2025 14:59:31 +0000 (07:59 -0700)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
iavf: get rid of the crit lock
Przemek Kitszel says:
Fix some deadlocks in iavf, and make it less error prone for the future.
Patch 1 is simple and independent from the rest.
Patches 2, 3, 4 are strictly a refactor, but it enables the last patch
to be much smaller.
(Technically, Jake gave his RB tags without knowing I would send it to -net.)
Patch 5 just adds annotations; this also helps prove the last patch to be correct.
Patch 6 removes the crit lock, with its unusual try_lock()s.
I have more refactoring for scheduling done for -next, to be sent soon.
There is a simple test:
add VF; decrease number of queues; remove VF
that was way too hard to pass without this series :)
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
iavf: get rid of the crit lock
iavf: sprinkle netdev_assert_locked() annotations
iavf: extract iavf_watchdog_step() out of iavf_watchdog_task()
iavf: simplify watchdog_task in terms of adminq task scheduling
iavf: centralize watchdog requeueing itself
iavf: iavf_suspend(): take RTNL before netdev_lock()
====================
Mirco Barone [Thu, 5 Jun 2025 12:06:16 +0000 (14:06 +0200)]
wireguard: device: enable threaded NAPI
Enable threaded NAPI by default for WireGuard devices in response to low
performance behavior that we observed when multiple tunnels (and thus
multiple wg devices) are deployed on a single host. This affects any
kind of multi-tunnel deployment, regardless of whether the tunnels share
the same endpoints or not (i.e., a VPN concentrator type of gateway
would also be affected).
The problem is caused by the fact that, in case of a traffic surge that
involves multiple tunnels at the same time, the polling of the NAPI
instance of all these wg devices tends to converge onto the same core,
causing underutilization of the CPU and bottlenecking performance.
This happens because NAPI polling is hosted by default in softirq
context, but the WireGuard driver only raises this softirq after the rx
peer queue has been drained, which doesn't happen during high traffic.
In this case, the softirq already active on a core is reused instead of
raising a new one.
As a result, once two or more tunnel softirqs have been scheduled on
the same core, they remain pinned there until the surge ends.
In our experiments, this almost always leads to all tunnel NAPIs being
handled on a single core shortly after a surge begins, limiting
scalability to less than 3× the performance of a single tunnel, despite
plenty of unused CPU cores being available.
The proposed mitigation is to enable threaded NAPI for all WireGuard
devices. This moves the NAPI polling context to a dedicated per-device
kernel thread, allowing the scheduler to balance the load across all
available cores.
On our 32-core gateways, enabling threaded NAPI yields a ~4× performance
improvement with 16 tunnels, increasing throughput from ~13 Gbps to
~48 Gbps. Meanwhile, CPU usage on the receiver (which is the bottleneck)
jumps from 20% to 100%.
We have found no performance regressions in any scenario we tested.
Single-tunnel throughput remains unchanged.
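Whether a given device ends up with threaded NAPI can be checked from
user space through the per-device "threaded" attribute in sysfs. The
helper below is a small sketch for doing that; the function name and the
"wg0" interface name in the usage note are assumptions of this example,
and sysfs_root is only a parameter so the helper can be pointed at a
test directory.

```python
from pathlib import Path


def napi_threaded(ifname, sysfs_root="/sys"):
    """Return True if <sysfs_root>/class/net/<ifname>/threaded is non-zero."""
    path = Path(sysfs_root) / "class" / "net" / ifname / "threaded"
    return path.read_text().strip() != "0"
```

On a kernel with this change, napi_threaded("wg0") would be expected to
return True for a WireGuard device without any manual configuration.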
Paolo Abeni [Thu, 5 Jun 2025 11:37:02 +0000 (13:37 +0200)]
Merge tag 'nf-25-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Zero out the remainder in nft_pipapo AVX2 implementation, otherwise
next lookup could bogusly report a mismatch. This is followed by two
patches to update nft_pipapo selftests to cover for the previous bug.
From Florian Westphal.
2) Check for reverse tuple too in case of esoteric NAT collisions for
UDP traffic and extend selftest coverage. Also from Florian.
netfilter pull request 25-06-05
* tag 'nf-25-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
selftests: netfilter: nft_nat.sh: add test for reverse clash with nat
netfilter: nf_nat: also check reverse tuple to obtain clashing entry
selftests: netfilter: nft_concat_range.sh: add datapath check for map fill bug
selftests: netfilter: nft_concat_range.sh: prefer per element counters for testing
netfilter: nf_set_pipapo_avx2: fix initial map fill
====================
Adding GRE tunnels to the .config for driver tests caused
some unhappiness in YNL, as it can't decode all the link
attrs on the system. Add ip6gre support to fix the tests.
This is similar to commit 6ffdbb93a59c ("netlink: specs:
rt_link: decode ip6tnl, vti and vti6 link attrs").
====================
Jakub Kicinski [Tue, 3 Jun 2025 13:53:57 +0000 (06:53 -0700)]
netlink: specs: rt-link: decode ip6gre
Driver tests now require GRE tunnels. While we don't configure
them with YNL, YNL will complain when it sees link types it
doesn't recognize. Teach it to decode ip6gre tunnels. The attrs
are largely the same as IPv4 GRE.
Correct the type of encap-limit, but note that this attr is
only used in ip6gre, so the mistake didn't matter until now.
Fixes: 0d0f4174f6c8 ("selftests: drv-net: add a simple TSO test")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250603135357.502626-3-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
A number of fields in the ip tunnels are lacking the big-endian
designation. I suspect this is not intentional, as decoding
the ports with the right endianness seems objectively beneficial.
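The effect of a missing byte-order property is easy to see on a 16-bit
port value. The sketch below uses only Python's struct module and is
not YNL code; decode_port() is a name made up for this illustration.

```python
import struct


def decode_port(payload, big_endian):
    """Decode a 2-byte port attribute as big-endian or little-endian."""
    fmt = ">H" if big_endian else "<H"
    return struct.unpack(fmt, payload)[0]
```

The two bytes 0x12 0xb5 decode to 4789 when treated as big-endian
(network order), but to 46354 when decoded as little-endian, which is
what happens on a little-endian host if the spec lacks the designation.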
Fixes: 6ffdbb93a59c ("netlink: specs: rt_link: decode ip6tnl, vti and vti6 link attrs")
Fixes: 077b6022d24b ("doc/netlink/specs: Add sub-message type to rt_link family")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250603135357.502626-2-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 5 Jun 2025 10:37:10 +0000 (12:37 +0200)]
Merge tag 'ovpn-net-20250603' of https://github.com/OpenVPN/ovpn-net-next
Antonio Quartulli says:
====================
In this batch you can find the following bug fixes:
Patch 1: when releasing a UDP socket we were wrongly invoking
setup_udp_tunnel_sock() with an empty config. This was not
properly shutting down the UDP encap state.
With this patch we simply undo what was done during setup.
Patch 2: ovpn was holding a reference to a 'struct socket'
without increasing its reference counter. This was intended
and worked as expected until we hit a race condition where
user space tries to close the socket while kernel space is
also releasing it. In this case the (struct socket *)->sk
member would disappear under our feet leading to a null-ptr-deref.
This patch fixes this issue by having struct ovpn_socket hold
a reference directly to the sk member while also increasing
its reference counter.
Patch 3: in case of errors along the TCP RX path (softirq)
we want to immediately delete the peer, but this operation may
sleep. With this patch we move the peer deletion to a scheduled
worker.
Patch 4 and 5 are instead fixing minor issues in the ovpn
kselftests.
* tag 'ovpn-net-20250603' of https://github.com/OpenVPN/ovpn-net-next:
selftest/net/ovpn: fix missing file
selftest/net/ovpn: fix TCP socket creation
ovpn: avoid sleep in atomic context in TCP RX error path
ovpn: ensure sk is still valid during cleanup
ovpn: properly deconfigure UDP-tunnel
====================
Daniele Palmas [Tue, 3 Jun 2025 09:12:04 +0000 (11:12 +0200)]
net: wwan: mhi_wwan_mbim: use correct mux_id for multiplexing
Recent Qualcomm chipsets like SDX72/75 require MBIM sessionId mapping
to muxId in the range (0x70-0x8F) for the PCIe tethered use.
This has been partially addressed by the referenced commit, mapping
the default data call to muxId = 112, but the multiplexed data calls
scenario was not properly considered, mapping sessionId = 1 to muxId
1, while it should have been 113.
Fix this by moving the session_id assignment logic to mhi_mbim_newlink,
in order to map sessionId = n to muxId = n + WDS_BIND_MUX_DATA_PORT_MUX_ID.
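The intended mapping can be written out as a short sketch. The value 112
(0x70) for WDS_BIND_MUX_DATA_PORT_MUX_ID is inferred here from the
"default data call to muxId = 112" statement above, and the range check
is an addition of this example rather than something the driver is known
to do.

```python
# Assumed value, inferred from the default data call mapping (112 == 0x70).
WDS_BIND_MUX_DATA_PORT_MUX_ID = 112

MUX_ID_MIN = 0x70
MUX_ID_MAX = 0x8F


def session_to_mux_id(session_id):
    """Map MBIM sessionId n to muxId n + WDS_BIND_MUX_DATA_PORT_MUX_ID."""
    mux_id = session_id + WDS_BIND_MUX_DATA_PORT_MUX_ID
    if not MUX_ID_MIN <= mux_id <= MUX_ID_MAX:
        raise ValueError(f"sessionId {session_id} maps outside 0x70-0x8F")
    return mux_id
```

With this mapping, sessionId 0 yields muxId 112 and sessionId 1 yields
113, instead of the muxId 1 that the multiplexed case used before.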
Fixes: 65bc58c3dcad ("net: wwan: mhi: make default data link id configurable")
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Reviewed-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
Link: https://patch.msgid.link/20250603091204.2802840-1-dnlplm@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lachlan Hodges [Tue, 3 Jun 2025 05:35:38 +0000 (15:35 +1000)]
wifi: cfg80211/mac80211: correctly parse S1G beacon optional elements
S1G beacons are not traditional beacons but a type of extension frame.
Extension frames contain the frame control and duration fields, followed
by zero or more optional fields before the frame body. These optional
fields are distinct from the variable length elements.
The presence of optional fields is indicated in the frame control field.
To correctly locate the elements offset, the frame control must be parsed
to identify which optional fields are present. Currently, mac80211 parses
S1G beacons based on fixed assumptions about the frame layout, without
inspecting the frame control field. This can result in incorrect offsets
to the "variable" portion of the frame.
Properly parse S1G beacon frames by using the field lengths defined in
IEEE 802.11-2024, section 9.3.4.3, ensuring that the elements offset is
calculated accurately.
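The offset calculation described above can be sketched as follows. All
constants here are assumptions of this example and should be checked
against IEEE 802.11-2024, section 9.3.4.3, and the kernel headers: the
frame control flag bits, the 15-byte mandatory part (frame control,
duration, source address, timestamp, change sequence) and the optional
field lengths of 3, 4 and 1 bytes.

```python
# Assumed frame control flag bits for the optional S1G beacon fields.
S1G_BCN_NEXT_TBTT = 0x0100
S1G_BCN_CSSID = 0x0200
S1G_BCN_ANO = 0x0400

# Assumed mandatory part: fc (2) + duration (2) + SA (6) + timestamp (4)
# + change sequence (1).
S1G_BEACON_FIXED_LEN = 2 + 2 + 6 + 4 + 1

# Assumed optional field lengths in bytes, in on-air order.
OPTIONAL_FIELDS = (
    (S1G_BCN_NEXT_TBTT, 3),
    (S1G_BCN_CSSID, 4),
    (S1G_BCN_ANO, 1),
)


def s1g_elements_offset(frame_control):
    """Return the byte offset of the variable elements in an S1G beacon."""
    offset = S1G_BEACON_FIXED_LEN
    for flag, length in OPTIONAL_FIELDS:
        if frame_control & flag:
            offset += length
    return offset
```

A parser that always assumes the same layout is only right for one
combination of flags; deriving the offset from the frame control field
handles every combination.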
Fixes: 9eaffe5078ca ("cfg80211: convert S1G beacon to scan results")
Fixes: cd418ba63f0c ("mac80211: convert S1G beacon to scan results")
Signed-off-by: Lachlan Hodges <lachlan.hodges@morsemicro.com>
Link: https://patch.msgid.link/20250603053538.468562-1-lachlan.hodges@morsemicro.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Benjamin Berg [Thu, 5 Jun 2025 05:03:24 +0000 (07:03 +0200)]
um: fix unused variable warning
The code was updated to access the PID of the userspace stub process in
a different way, making the local cpu variable obsolete. Remove it.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506050008.AwXLNxQX-lkp@intel.com/
Fixes: 406d17c6c370 ("um: Implement kernel side of SECCOMP based process handling")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20250605050325.1077208-1-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
RGMII ports on BCM63xx were not really working, especially with PHYs
that support EEE and are capable of configuring their own RGMII delays.
So let's make them work, and fix additional minor rgmii related issues
found while working on it.
With a BCM96328BU-P300:
Before:
[ 3.580000] b53-switch 10700000.switch GbE3 (uninitialized): validation of rgmii with support 0000000,00000000,00000000,000062ff and advertisement 0000000,00000000,00000000,000062ff failed: -EINVAL
[ 3.600000] b53-switch 10700000.switch GbE3 (uninitialized): failed to connect to PHY: -EINVAL
[ 3.610000] b53-switch 10700000.switch GbE3 (uninitialized): error -22 setting up PHY for tree 0, switch 0, port 4
[ 3.620000] b53-switch 10700000.switch GbE1 (uninitialized): validation of rgmii with support 0000000,00000000,00000000,000062ff and advertisement 0000000,00000000,00000000,000062ff failed: -EINVAL
[ 3.640000] b53-switch 10700000.switch GbE1 (uninitialized): failed to connect to PHY: -EINVAL
[ 3.650000] b53-switch 10700000.switch GbE1 (uninitialized): error -22 setting up PHY for tree 0, switch 0, port 5
[ 3.660000] b53-switch 10700000.switch GbE4 (uninitialized): validation of rgmii with support 0000000,00000000,00000000,000062ff and advertisement 0000000,00000000,00000000,000062ff failed: -EINVAL
[ 3.680000] b53-switch 10700000.switch GbE4 (uninitialized): failed to connect to PHY: -EINVAL
[ 3.690000] b53-switch 10700000.switch GbE4 (uninitialized): error -22 setting up PHY for tree 0, switch 0, port 6
[ 3.700000] b53-switch 10700000.switch GbE5 (uninitialized): validation of rgmii with support 0000000,00000000,00000000,000062ff and advertisement 0000000,00000000,00000000,000062ff failed: -EINVAL
[ 3.720000] b53-switch 10700000.switch GbE5 (uninitialized): failed to connect to PHY: -EINVAL
[ 3.730000] b53-switch 10700000.switch GbE5 (uninitialized): error -22 setting up PHY for tree 0, switch 0, port 7
Jonas Gorski [Mon, 2 Jun 2025 19:39:53 +0000 (21:39 +0200)]
net: dsa: b53: do not touch DLL_IQQD on bcm53115
According to OpenMDK, bit 2 of the RGMII register has a different
meaning for BCM53115 [1]:
"DLL_IQQD 1: In the IDDQ mode, power is down0: Normal function
mode"
Configuring RGMII delay works without setting this bit, so let's keep it
at the default. For other chips, we always set it, so not clearing it
is not an issue.
One would assume BCM53118 works the same, but OpenMDK is not quite sure
what this bit actually means [2]:
"BYPASS_IMP_2NS_DEL #1: In the IDDQ mode, power is down#0: Normal
function mode1: Bypass dll65_2ns_del IP0: Use
dll65_2ns_del IP"
Jonas Gorski [Mon, 2 Jun 2025 19:39:52 +0000 (21:39 +0200)]
net: dsa: b53: allow RGMII for bcm63xx RGMII ports
Add RGMII to supported interfaces for BCM63xx RGMII ports so they can be
actually used in RGMII mode.
Without this, phylink will fail to configure them:
[ 3.580000] b53-switch 10700000.switch GbE3 (uninitialized): validation of rgmii with support 0000000,00000000,00000000,000062ff and advertisement 0000000,00000000,00000000,000062ff failed: -EINVAL
[ 3.600000] b53-switch 10700000.switch GbE3 (uninitialized): failed to connect to PHY: -EINVAL
[ 3.610000] b53-switch 10700000.switch GbE3 (uninitialized): error -22 setting up PHY for tree 0, switch 0, port 4
Fixes: ce3bf94871f7 ("net: dsa: b53: add support for BCM63xx RGMIIs")
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Link: https://patch.msgid.link/20250602193953.1010487-5-jonas.gorski@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jonas Gorski [Mon, 2 Jun 2025 19:39:49 +0000 (21:39 +0200)]
net: dsa: b53: do not enable EEE on bcm63xx
BCM63xx internal switches do not support EEE, but provide multiple RGMII
ports where external PHYs may be connected. If one of these PHYs is
EEE-capable, we may try to enable EEE for the MACs, which then hangs the
system on access of the (non-existent) EEE registers.
Fix this by checking if the switch actually supports EEE before
attempting to configure it.
Fixes: 22256b0afb12 ("net: dsa: b53: Move EEE functions to b53")
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Tested-by: Álvaro Fernández Rojas <noltari@gmail.com>
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Link: https://patch.msgid.link/20250602193953.1010487-2-jonas.gorski@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
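A minimal sketch of the guard described in this commit message, assuming a simplified device structure. `struct sw_dev`, `has_eee` and `sw_enable_eee()` are hypothetical names for illustration and do not reflect the actual b53 code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified view of a switch device. */
struct sw_dev {
	bool has_eee;           /* false for chips without EEE registers */
	unsigned int eee_ports; /* bitmap of ports with EEE enabled */
};

/*
 * Enable EEE on a port only if the switch has EEE registers at all.
 * Returning early means the register access is never attempted.
 */
static int sw_enable_eee(struct sw_dev *dev, unsigned int port)
{
	if (!dev->has_eee)
		return -EOPNOTSUPP;

	dev->eee_ports |= 1u << port; /* stands in for the register write */
	return 0;
}
```

The capability check belongs to the switch, not the PHY: an EEE-capable external PHY alone must not be enough to reach the register access.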
Meghana Malladi [Tue, 3 Jun 2025 05:29:04 +0000 (10:59 +0530)]
net: ti: icssg-prueth: Fix swapped TX stats for MII interfaces.
In MII mode, Tx lines are swapped for port0 and port1, which means
Tx port0 receives data from PRU1 and the Tx port1 receives data from
PRU0. This is an expected hardware behavior and reading the Tx stats
needs to be handled accordingly in the driver. Update the driver to
read Tx stats from the PRU1 for port0 and PRU0 for port1.
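The swap can be sketched as a small mapping function. The enum and function names are hypothetical, and the straight-through mapping for non-MII modes is an assumption that the text does not state.

```c
/* Hypothetical mode names; only the MII swap itself is from the text. */
enum phy_mode { MODE_MII, MODE_RGMII };

/*
 * Return the PRU index (0 or 1) whose TX counters belong to `port`.
 * In MII mode the TX lines are crossed, so port0 reads PRU1 and
 * port1 reads PRU0. Other modes are assumed to map straight through.
 */
static int tx_stats_pru_for_port(enum phy_mode mode, int port)
{
	if (mode == MODE_MII)
		return port ^ 1;

	return port;
}
```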
Florian Westphal [Fri, 30 May 2025 10:34:03 +0000 (12:34 +0200)]
selftests: netfilter: nft_nat.sh: add test for reverse clash with nat
This will fail without the previous bug fix because we erroneously
believe that the clashing entry went away.
However, the clash exists in the opposite direction due to an
existing nat mapping:
PASS: IP statless for ns2-LgTIuS
ERROR: failed to test udp ns1-x4iyOW to ns2-LgTIuS with dnat rule step 2, result: ""
This is partially adapted from test instructions from the below
ubuntu tracker.
Florian Westphal [Fri, 30 May 2025 10:34:02 +0000 (12:34 +0200)]
netfilter: nf_nat: also check reverse tuple to obtain clashing entry
The logic added in the blamed commit was supposed to only omit nat source
port allocation if neither the existing nor the new entry are subject to
NAT.
However, it's not enough to look up the conntrack based on the proposed
tuple; we must also check the reverse direction.
Otherwise there are esoteric cases where the collision is in the reverse
direction because that colliding connection has a port rewrite, but the
new entry doesn't. In this case, we only check the new entry and then
erroneously conclude that no clash exists anymore.
The existing (udp) tuple is:
a:p -> b:P, with nat translation to s:P, i.e. pure daddr rewrite,
reverse tuple in conntrack table is s:P -> a:p.
When another UDP packet is sent directly to s, i.e. a:p->s:P, this is
correctly detected as a colliding entry: tuple is taken by existing reply
tuple in reverse direction.
But the colliding conntrack is only searched for with unreversed
direction, and we can't find such entry matching a:p->s:P.
The incorrect conclusion is that the clashing entry has timed out and
that no port address translation is required.
Such conntrack will then be discarded at nf_confirm time because the
proposed reverse direction clashes with an existing mapping in the
conntrack table.
Search for the reverse tuple too; this then checks the NAT bits of
the colliding entry and triggers port reallocation.
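The scenario can be reproduced with a toy model. This is a sketch, not the nf_nat code: `struct tuple`, `struct conn`, `table_find()` and `find_clash()` are hypothetical, hosts are single characters, and the table is a plain array in which an entry is reachable through the tuple of either direction.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified tuple: 'a', 'b', 's' stand in for host addresses. */
struct tuple {
	char src, dst;
	int sport, dport;
};

/* A tracked connection: original direction, reply direction, NAT flag. */
struct conn {
	struct tuple orig, reply;
	bool nat;
};

static bool tuple_equal(const struct tuple *x, const struct tuple *y)
{
	return x->src == y->src && x->dst == y->dst &&
	       x->sport == y->sport && x->dport == y->dport;
}

static struct tuple tuple_invert(const struct tuple *t)
{
	struct tuple r = { t->dst, t->src, t->dport, t->sport };
	return r;
}

/* Find a connection that has `t` as its tuple in either direction. */
static const struct conn *table_find(const struct conn *tbl, size_t n,
				     const struct tuple *t)
{
	for (size_t i = 0; i < n; i++)
		if (tuple_equal(&tbl[i].orig, t) ||
		    tuple_equal(&tbl[i].reply, t))
			return &tbl[i];
	return NULL;
}

/* Look up the clashing entry; if not found, retry with the reverse tuple. */
static const struct conn *find_clash(const struct conn *tbl, size_t n,
				     const struct tuple *proposed)
{
	const struct conn *c = table_find(tbl, n, proposed);

	if (!c) {
		struct tuple rev = tuple_invert(proposed);
		c = table_find(tbl, n, &rev);
	}
	return c;
}
```

With an existing entry a:p -> b:P whose reply tuple is s:P -> a:p, a lookup for the proposed tuple a:p -> s:P alone finds nothing, while the inverted tuple s:P -> a:p hits the existing reply tuple, so the NAT state of the colliding entry can be inspected.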
A follow-up patch extends the nft_nat.sh selftest to cover this scenario.
The IPS_SEQ_ADJUST change is also a bug fix:
Instead of checking for SEQ_ADJ this tested for SEEN_REPLY and ASSURED
by accident -- _BIT is only for use with the test_bit() API.
This bug has little consequence in practice, because the sequence number
adjustments are only useful for TCP which doesn't support clash resolution.
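A short worked example shows why the bit number behaved like a SEEN_REPLY|ASSURED test. The constants are believed to mirror the bit positions in the kernel's conntrack status enum and should be checked against the header; the two helper functions are hypothetical.

```c
/* Bit positions as assumed from the kernel's conntrack status enum. */
enum {
	IPS_SEEN_REPLY_BIT = 1,
	IPS_SEEN_REPLY     = 1 << IPS_SEEN_REPLY_BIT,
	IPS_ASSURED_BIT    = 2,
	IPS_ASSURED        = 1 << IPS_ASSURED_BIT,
	IPS_SEQ_ADJUST_BIT = 6,
	IPS_SEQ_ADJUST     = 1 << IPS_SEQ_ADJUST_BIT,
};

/* Buggy form: uses the bit number (6 == 0b110) as if it were a mask. */
static int seq_adj_check_buggy(unsigned long status)
{
	return (status & IPS_SEQ_ADJUST_BIT) != 0;
}

/* Intended form: uses the mask (1 << 6). */
static int seq_adj_check_fixed(unsigned long status)
{
	return (status & IPS_SEQ_ADJUST) != 0;
}
```

Since 6 equals (1 << 1) | (1 << 2), the buggy check fires for entries with SEEN_REPLY or ASSURED set and ignores the actual SEQ_ADJUST bit.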
The existing test case (conntrack_reverse_clash.sh) exercises a race
condition path (parallel conntrack creation on different CPUs), so
the colliding entries have neither SEEN_REPLY nor ASSURED set.
Thanks to Yafang Shao and Shaun Brady for an initial investigation
of this bug.
Fixes: d8f84a9bc7c4 ("netfilter: nf_nat: don't try nat source port reallocation for reverse dir clash")
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1795
Reported-by: Yafang Shao <laoar.shao@gmail.com>
Reported-by: Shaun Brady <brady.1345@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Tested-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Fri, 23 May 2025 12:20:46 +0000 (14:20 +0200)]
selftests: netfilter: nft_concat_range.sh: add datapath check for map fill bug
commit 0935ee6032df ("selftests: netfilter: add test case for recent mismatch bug")
added a regression check for incorrect initial fill of the result map
that was fixed with 791a615b7ad2 ("netfilter: nf_set_pipapo: fix initial map fill").
The test used 'nft get element', i.e., control plane checks for
match/nomatch results.
The control plane however doesn't use the avx2 version, so we need to
send+match packets.
As the additional packet match/nomatch is slow, don't do this for
every element added/removed: add maybe_send_(no)match helpers and
use them.
Florian Westphal [Fri, 23 May 2025 12:20:45 +0000 (14:20 +0200)]
selftests: netfilter: nft_concat_range.sh: prefer per element counters for testing
The selftest uses the following rule:
... @test counter name "test"
Then sends a packet, then checks if the named counter did increment or
not.
This is fine for the 'no-match' test case: If anything matches, the
counter increments and the test fails as expected.
But for the 'should match' test cases this isn't optimal.
Consider buggy matching, where the packet matches entry x, but it
should have matched entry y.
In that case the test would erroneously pass.
Rework the selftest to use per-element counters to avoid this.
After sending a packet that should have matched entry x, query the
relevant element via 'nft reset element' and check that its counter
had incremented.
The 'nomatch' case isn't altered: no entry should match, so the named
counter must be 0. Changing it to the per-element counter would let
the test pass if another entry matches.
The downside of this change is a slight increase in test run-time by
a few seconds.
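The difference between the two checks can be modelled in a few lines. This is a toy model and not the selftest itself (which is a shell script driving nft); all names are hypothetical.

```c
#include <stdbool.h>

enum { ENTRY_X, ENTRY_Y, NUM_ENTRIES };

/* Toy set: one shared named counter plus one counter per element. */
struct toy_set {
	unsigned int named_counter;
	unsigned int elem_counter[NUM_ENTRIES];
};

/* Record that a packet matched entry `hit`. */
static void toy_match(struct toy_set *s, int hit)
{
	s->named_counter++;
	s->elem_counter[hit]++;
}

/* Old check: passes whenever anything matched. */
static bool check_named(const struct toy_set *s)
{
	return s->named_counter > 0;
}

/* New check: passes only if the expected element matched. */
static bool check_element(const struct toy_set *s, int expected)
{
	return s->elem_counter[expected] > 0;
}
```

If a buggy lookup hits entry x for a packet that should match entry y, `check_named()` still succeeds, while `check_element()` for y fails and exposes the bug.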
Linus Torvalds [Thu, 5 Jun 2025 04:18:37 +0000 (21:18 -0700)]
Merge tag 'rust-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Pull Rust updates from Miguel Ojeda:
"Toolchain and infrastructure:
- KUnit '#[test]'s:
- Support KUnit-mapped 'assert!' macros.
The support that landed last cycle was very basic, and the
'assert!' macros panicked since they were the standard library
ones. Now, they are mapped to the KUnit ones in a similar way to
how it is done for doctests, reusing the infrastructure there.
# my_first_test: ASSERTION FAILED at rust/kernel/lib.rs:251
Expected 42 == 43 to be true, but is false
# my_first_test.speed: normal
not ok 1 my_first_test
- Support tests with checked 'Result' return types.
The return value of test functions that return a 'Result' will
be checked, thus one can now easily catch errors when e.g. using
the '?' operator in tests.
With this, a failing test like:
#[test]
fn my_test() -> Result {
f()?;
Ok(())
}
will report:
# my_test: ASSERTION FAILED at rust/kernel/lib.rs:321
Expected is_test_result_ok(my_test()) to be true, but is false
# my_test.speed: normal
not ok 1 my_test
- Add 'kunit_tests' to the prelude.
- Clarify the remaining language unstable features in use.
- Compile 'core' with edition 2024 for Rust >= 1.87.
- Workaround 'bindgen' issue with forward references to 'enum' types.
- objtool: relax slice condition to cover more 'noreturn' functions.
- Use absolute paths in macros referencing 'core' and 'kernel'
crates.
- Skip '-mno-fdpic' flag for bindgen in GCC 32-bit arm builds.
- Clean some 'doc_markdown' lint hits -- we may enable it later on.
'kernel' crate:
- 'alloc' module:
- 'Box': support for type coercion, e.g. 'Box<T>' to 'Box<dyn U>'
if 'T' implements 'U'.
- 'Vec': implement new methods (prerequisites for nova-core and
binder): 'truncate', 'resize', 'clear', 'pop',
'push_within_capacity' (with new error type 'PushError'),
'drain_all', 'retain', 'remove' (with new error type
'RemoveError'), 'insert_within_capacity' (with new error type
'InsertError').
In addition, simplify 'push' using 'spare_capacity_mut', split
'set_len' into 'inc_len' and 'dec_len', add type invariant 'len
<= capacity' and simplify 'truncate' using 'dec_len'.
- 'time' module:
- Morph the Rust hrtimer subsystem into the Rust timekeeping
subsystem, covering delay, sleep, timekeeping, timers. This new
subsystem has all the relevant timekeeping C maintainers listed
in the entry.
- Replace 'Ktime' with 'Delta' and 'Instant' types to represent a
duration of time and a point in time.
- Temporarily add 'Ktime' to 'hrtimer' module to allow 'hrtimer'
to delay converting to 'Instant' and 'Delta'.
- 'xarray' module:
- Add a Rust abstraction for the 'xarray' data structure. This
abstraction allows Rust code to leverage the 'xarray' to store
types that implement 'ForeignOwnable'. This support is a
dependency for memory backing feature of the Rust null block
driver, which is waiting to be merged.
- Set up an entry in 'MAINTAINERS' for the XArray Rust support.
Patches will go to the new Rust XArray tree and then via the
Rust subsystem tree for now.
- Allow 'ForeignOwnable' to carry information about the pointed-to
type. This helps asserting alignment requirements for the
pointer passed to the foreign language.
- 'container_of!': retain pointer mut-ness and add a compile-time
check of the type of the first parameter ('$field_ptr').
- Support optional message in 'static_assert!'.
- Add C FFI types (e.g. 'c_int') to the prelude.
- 'str' module: simplify KUnit tests 'format!' macro, convert
'rusttest' tests into KUnit, take advantage of the '-> Result'
support in KUnit '#[test]'s.
- 'list' module: add examples for 'List', fix path of
'assert_pinned!' (so far unused macro rule).
- 'workqueue' module: remove 'HasWork::OFFSET'.
- 'page' module: add 'inline' attribute.
'macros' crate:
- 'module' macro: place 'cleanup_module()' in '.exit.text' section.
'pin-init' crate:
- Add 'Wrapper<T>' trait for creating pin-initializers for wrapper
structs with a structurally pinned value such as 'UnsafeCell<T>' or
'MaybeUninit<T>'.
- Add 'MaybeZeroable' derive macro to try to derive 'Zeroable', but
not error if not all fields implement it. This is needed to derive
'Zeroable' for all bindgen-generated structs.
- Add 'unsafe fn cast_[pin_]init()' functions to unsafely change the
initialized type of an initializer. These are utilized by the
'Wrapper<T>' implementations.
- Add support for visibility in 'Zeroable' derive macro.
- Add support for 'union's in 'Zeroable' derive macro.
- Upstream dev news: streamline CI, fix some bugs. Add new workflows
to check if the user-space version and the one in the kernel tree
have diverged. Use the issues tab [1] to track them, which should
help folks report and diagnose issues w.r.t. 'pin-init' better.
Linus Torvalds [Thu, 5 Jun 2025 02:23:37 +0000 (19:23 -0700)]
Merge tag '6.16-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server updates from Steve French:
"Four smb3 server fixes:
- Fix for special character handling when mounting with "posix"
- Fix for mounts from Mac for fs that don't provide unique inode
numbers
- Two cleanup patches (e.g. for crypto calls)"
* tag '6.16-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: allow a filename to contain special characters on SMB3.1.1 posix extension
ksmbd: provide zero as a unique ID to the Mac client
ksmbd: remove unnecessary softdep on crc32
ksmbd: use SHA-256 library API instead of crypto_shash API
Linus Torvalds [Thu, 5 Jun 2025 02:14:24 +0000 (19:14 -0700)]
Merge tag 'bcachefs-2025-06-04' of git://evilpiepirate.org/bcachefs
Pull more bcachefs updates from Kent Overstreet:
"More bcachefs updates:
- More stack usage improvements (~600 bytes)
- Define CLASS()es for some commonly used types, and convert most
rcu_read_lock() uses to the new lock guards
- New introspection:
- Superblock error counters are now available in sysfs:
previously, they were only visible with 'show-super', which
doesn't provide a live view
- New tracepoint, error_throw(), which is called any time we
return an error and start to unwind
- Repair
- check_fix_ptrs() can now repair btree node roots
- We can now repair when we've somehow ended up with the journal
using a superblock bucket
- Revert some leftovers from the aborted directory i_size feature,
and add repair code: some userspace programs (e.g. sshfs) were
getting confused
It seems in 6.15 there's a bug where i_nlink on the vfs inode has been
getting incorrectly set to 0, with some unfortunate results;
list_journal analysis showed bch2_inode_rm() being called (by
bch2_evict_inode()) when it clearly should not have been.
- bch2_inode_rm() now runs "should we be deleting this inode?" checks
that were previously only run when deleting unlinked inodes in
recovery
- check_subvol() was treating a dangling subvol (pointing to a
missing root inode) like a dangling dirent, and deleting it. This
was the really unfortunate one: check_subvol() will now recreate
the root inode if necessary
This took longer to debug than it should have, and we lost several
filesystems unnecessarily, because users have been ignoring the
release notes and blindly running 'fsck -y'. Debugging required
reconstructing what happened through analyzing the journal, when
ideally someone would have noticed 'hey, fsck is asking me if I want
to repair this: it usually doesn't, maybe I should run this in dry run
mode and check what's going on?'
As a reminder, fsck errors are being marked as autofix once we've
verified, in real world usage, that they're working correctly; blindly
running 'fsck -y' on an experimental filesystem is playing with fire
Up to this incident we've had an excellent track record of not losing
data, so let's try to learn from this one
This is a community effort, I wouldn't be able to get this done
without the help of all the people QAing and providing excellent bug
reports and feedback based on real world usage. But please don't
ignore advice and expect me to pick up the pieces
If an error isn't marked as autofix, and it /is/ happening in the
wild, that's also something I need to know about so we can check it
out and add it to the autofix list if repair looks good. I haven't
been getting those reports, and I should be; since we don't have any
sort of telemetry yet I am absolutely dependent on user reports
Now I'll be spending the weekend working on new repair code to see if
I can get a filesystem back for a user who didn't have backups"
* tag 'bcachefs-2025-06-04' of git://evilpiepirate.org/bcachefs: (69 commits)
bcachefs: add cond_resched() to handle_overwrites()
bcachefs: Make journal read log message a bit quieter
bcachefs: Fix subvol to missing root repair
bcachefs: Run may_delete_deleted_inode() checks in bch2_inode_rm()
bcachefs: delete dead code from may_delete_deleted_inode()
bcachefs: Add flags to subvolume_to_text()
bcachefs: Fix oops in btree_node_seq_matches()
bcachefs: Fix dirent_casefold_mismatch repair
bcachefs: Fix bch2_fsck_rename_dirent() for casefold
bcachefs: Redo bch2_dirent_init_name()
bcachefs: Fix -Wc23-extensions in bch2_check_dirents()
bcachefs: Run check_dirents second time if required
bcachefs: Run snapshot deletion out of system_long_wq
bcachefs: Make check_key_has_snapshot safer
bcachefs: BCH_RECOVERY_PASS_NO_RATELIMIT
bcachefs: bch2_require_recovery_pass()
bcachefs: bch_err_throw()
bcachefs: Repair code for directory i_size
bcachefs: Kill un-reverted directory i_size code
bcachefs: Delete redundant fsck_err()
...
Kent Overstreet [Tue, 3 Jun 2025 13:31:58 +0000 (09:31 -0400)]
bcachefs: Make journal read log message a bit quieter
Users seem to be assuming that the 'dropped unflushed entries' message
at the end of journal read indicates some sort of problem, when it does
not - we expect there to be entries in the journal that weren't
committed; it's purely informational so that we can correlate journal
sequence numbers elsewhere when debugging.
Shorten the log message a bit to hopefully make this clearer.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 2 Jun 2025 23:48:27 +0000 (19:48 -0400)]
bcachefs: Fix subvol to missing root repair
We had a bug where the root inode of a subvolume was erroneously deleted:
bch2_evict_inode() called bch2_inode_rm(), meaning the VFS inode's
i_nlink was somehow set to 0 when it shouldn't have - the inode in the
btree indicated it clearly was not unlinked.
This has been addressed with additional safety checks in
bch2_inode_rm() - pulling in the safety checks we already were doing
when deleting unlinked inodes in recovery - but the really disastrous
bug was in check_subvols(), which on finding a dangling subvol (subvol
with a missing root inode) would delete the subvolume.
I assume this bug dates from early check_directory_structure() code,
which originally handled subvolumes and normal paths - the idea being
that still live contents of the subvolume would get reattached
somewhere.
But that's incorrect, and disastrously so; deleting a subvolume triggers
deleting the snapshot ID it points to, deleting the entire contents.
The correct way to repair is to recreate the root inode if it's missing;
then any contents will get reattached under that subvolume's lost+found.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 2 Jun 2025 13:26:20 +0000 (09:26 -0400)]
bcachefs: Fix oops in btree_node_seq_matches()
btree_update_nodes_written() needs to wait on in-flight writes to old
nodes before marking them as freed. But it has no reason to pin those
old nodes in memory, so some trickiness ensues.
The update we're completing deleted references to those nodes from the
btree, so we know if they've been evicted they can't be pulled back in.
We just have to check if the nodes we have pointers to are still those
old nodes, and haven't been reused.
To do that we check the node's "sequence number" (actually a random 64
bit cookie), but that lives in the node's data buffer. 'struct btree'
can't be freed until filesystem shutdown (as they're quite small), but
the data buffers can be freed or swapped around.
Commit 1f88c3567495, which was fixing a kmsan warning, assumed that we
could safely do this locklessly with just a READ_ONCE() - if we've got a
non-null ptr it would be safe to read from.
But that's not true if the data buffer is a vmalloc allocation, so we
need to restore the locking that commit deleted (or alternatively RCU
free those data buffers, but there's no other reason for that).
Fixes: 1f88c3567495 ("bcachefs: Fix a KMSAN splat in btree_update_nodes_written()")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
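A userspace sketch of the idea, assuming a toy spinlock in place of the real btree node lock. `struct node`, `struct node_data` and the helpers are hypothetical and only illustrate reading the cookie while the data buffer is protected from being freed or swapped.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: a node header that outlives its data buffer. */
struct node_data {
	uint64_t seq;
};

struct node {
	atomic_flag lock;       /* toy spinlock protecting `data` */
	struct node_data *data; /* may be freed or swapped while unlocked */
};

static void node_lock(struct node *n)
{
	while (atomic_flag_test_and_set_explicit(&n->lock,
						 memory_order_acquire))
		;
}

static void node_unlock(struct node *n)
{
	atomic_flag_clear_explicit(&n->lock, memory_order_release);
}

/* Compare the cookie only while holding the lock that guards `data`. */
static bool node_seq_matches(struct node *n, uint64_t seq)
{
	bool ret;

	node_lock(n);
	ret = n->data && n->data->seq == seq;
	node_unlock(n);
	return ret;
}
```

In this model, anything that frees or replaces `data` is expected to take the same lock, which is what makes the dereference safe; a bare non-null check gives no such guarantee.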
Kent Overstreet [Sat, 31 May 2025 04:11:52 +0000 (00:11 -0400)]
bcachefs: Fix dirent_casefold_mismatch repair
Instead of simply recreating a mis-casefolded dirent, use the str_hash
repair code, which will rename it if necessary - the dirent might have
been created again with the correct casefolding.
Factor out bch2_str_hash_repair_key() from
__bch2_str_hash_check_key() for the new path to use, and export
bch2_dirent_create_key() as well.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Fix -Wc23-extensions in bch2_check_dirents()
Clang warns (or errors with CONFIG_WERROR=y):
fs/bcachefs/fsck.c:2325:2: error: label followed by a declaration is a C23 extension [-Werror,-Wc23-extensions]
2325 | int ret = bch2_trans_run(c,
| ^
On clang-17 and older, this is an unconditional error:
fs/bcachefs/fsck.c:2325:2: error: expected expression
2325 | int ret = bch2_trans_run(c,
| ^
Move the declaration of ret to the top of the function to resolve both
ways this issue manifests.
Fixes: c72def523799 ("bcachefs: Run check_dirents second time if required")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
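The shape of the fix can be illustrated with a small standalone function. `run_passes()` is hypothetical and unrelated to the fsck code; only the ordering of declaration and label matters.

```c
/* Hypothetical function; only the label/declaration ordering matters. */
static int run_passes(int max_passes)
{
	int ret;      /* declared at the top, not right after the label */
	int pass = 0;

again:
	ret = ++pass; /* a label followed by a statement is always valid */
	if (ret < max_passes)
		goto again;

	return ret;
}
```

Writing `again: int ret = ...;` instead would place a declaration directly after the label, which is the construct the compiler output above complains about.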
Linus Torvalds [Wed, 4 Jun 2025 18:26:17 +0000 (11:26 -0700)]
Merge tag 'pci-v6.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Print the actual delay time in pci_bridge_wait_for_secondary_bus()
instead of assuming it was 1000ms (Wilfred Mallawa)
- Revert 'iommu/amd: Prevent binding other PCI drivers to IOMMU PCI
devices', which broke resume from system sleep on AMD platforms and
has been fixed by other commits (Lukas Wunner)
Resource management:
- Remove mtip32xx use of pcim_iounmap_regions(), which is deprecated
and unnecessary (Philipp Stanner)
- Remove pcim_iounmap_regions() and pcim_request_region_exclusive()
and related flags since all uses have been removed (Philipp
Stanner)
- Rework devres 'request' functions so they are no longer 'hybrid',
i.e., their behavior no longer depends on whether
pcim_enable_device or pci_enable_device() was used, and remove
related code (Philipp Stanner)
- Warn (not BUG()) about failure to assign optional resources (Ilpo
Järvinen)
Error handling:
- Log the DPC Error Source ID only when it's actually valid (when
ERR_FATAL or ERR_NONFATAL was received from a downstream device)
and decode into bus/device/function (Bjorn Helgaas)
- Determine AER log level once and save it so all related messages
use the same level (Karolina Stolarek)
- Use KERN_WARNING, not KERN_ERR, when logging PCIe Correctable
Errors (Karolina Stolarek)
- Ratelimit PCIe Correctable and Non-Fatal error logging, with sysfs
controls on interval and burst count, to avoid flooding logs and
RCU stall warnings (Jon Pan-Doh)
Power management:
- Increment PM usage counter when probing reset methods so we don't
try to read config space of a powered-off device (Alex Williamson)
- Set all devices to D0 during enumeration to ensure ACPI opregion is
connected via _REG (Mario Limonciello)
Power control:
- Rename pwrctrl Kconfig symbols from 'PWRCTL' to 'PWRCTRL' to match
the filename paths. Retain old deprecated symbols for
compatibility, except for the pwrctrl slot driver
(PCI_PWRCTRL_SLOT) (Johan Hovold)
- When unregistering pwrctrl, cancel outstanding rescan work before
cleaning up data structures to avoid use-after-free issues (Brian
Norris)
Bandwidth control:
- Simplify link bandwidth controller by replacing the count of Link
Bandwidth Management Status (LBMS) events with a PCI_LINK_LBMS_SEEN
flag (Ilpo Järvinen)
- Update the Link Speed after retraining, since the Link Speed may
have changed (Ilpo Järvinen)
PCIe native device hotplug:
- Ignore Presence Detect Changed caused by DPC.
pciehp already ignores Link Down/Up events caused by DPC, but on
slots using in-band presence detect, DPC causes a spurious Presence
Detect Changed event (Lukas Wunner)
- Ignore Link Down/Up caused by Secondary Bus Reset.
On hotplug ports using in-band presence detect, the reset causes a
Presence Detect Changed event, which mistakenly caused teardown and
re-enumeration of the device. Drivers may need to annotate code
that resets their device (Lukas Wunner)
Virtualization:
- Add an ACS quirk for Loongson Root Ports that don't advertise ACS
but don't allow peer-to-peer transactions between Root Ports; the
quirk allows each Root Port to be in a separate IOMMU group (Huacai
Chen)
Endpoint framework:
- For fixed-size BARs, retain both the actual size and the possibly
larger size allocated to accommodate iATU alignment requirements
(Jerome Brunet)
- Simplify ctrl/SPAD space allocation and avoid allocating more space
than needed (Jerome Brunet)
- Correct MSI-X PBA offset calculations for DesignWare and Cadence
endpoint controllers (Niklas Cassel)
- Align the return value (number of interrupts) encoding for
pci_epc_get_msi()/pci_epc_ops::get_msi() and
pci_epc_get_msix()/pci_epc_ops::get_msix() (Niklas Cassel)
- Align the nr_irqs parameter encoding for
pci_epc_set_msi()/pci_epc_ops::set_msi() and
pci_epc_set_msix()/pci_epc_ops::set_msix() (Niklas Cassel)
Common host controller library:
- Convert pci-host-common to a library so platforms that don't need
native host controller drivers don't need to include these helper
functions (Manivannan Sadhasivam)
Apple PCIe controller driver:
- Extract ECAM bridge creation helper from pci_host_common_probe() to
separate driver-specific things like MSI from PCI things (Marc
Zyngier)
- Dynamically allocate RID-to-SID bitmap to prepare for SoCs with
varying capabilities (Marc Zyngier)
- Skip ports disabled in DT when setting up ports (Janne Grunau)
- Add t6020 compatible string (Alyssa Rosenzweig)
- Add T602x PCIe support (Hector Martin)
- Directly set/clear INTx mask bits because T602x dropped the
accessors that could do this without locking (Marc Zyngier)
- Move port PHY registers to their own reg items to accommodate
T602x, which moves them around; retain default offsets for existing
DTs that lack phy%d entries with the reg offsets (Hector Martin)
- Stop polling for core refclk, which doesn't work on T602x and the
bootloader has already done anyway (Hector Martin)
- Use gpiod_set_value_cansleep() when asserting PERST# in probe
because we're allowed to sleep there (Hector Martin)
Cadence PCIe controller driver:
- Drop a runtime PM 'put' to resolve a runtime atomic count underflow
(Hans Zhang)
- Make the cadence core buildable as a module (Kishon Vijay Abraham I)
- Add cdns_pcie_host_disable() and cdns_pcie_ep_disable() for use by
loadable drivers when they are removed (Siddharth Vadapalli)
Freescale i.MX6 PCIe controller driver:
- Apply link training workaround only on IMX6Q, IMX6SX, IMX6SP
(Richard Zhu)
- Remove redundant dw_pcie_wait_for_link() from
imx_pcie_start_link(); since the DWC core does this, imx6 only
needs it when retraining for a faster link speed (Richard Zhu)
- Toggle i.MX95 core reset to align with PHY powerup (Richard Zhu)
- Set SYS_AUX_PWR_DET to work around i.MX95 ERR051624 erratum: in
some cases, the controller can't exit 'L23 Ready' through Beacon or
PERST# deassertion (Richard Zhu)
- Clear GEN3_ZRXDC_NONCOMPL to work around i.MX95 ERR051586 erratum:
controller can't meet 2.5 GT/s ZRX-DC timing when operating at 8
GT/s, causing timeouts in L1 (Richard Zhu)
- Wait for i.MX95 PLL lock before enabling controller (Richard Zhu)
- Save/restore i.MX95 LUT for suspend/resume (Richard Zhu)
Mobiveil PCIe controller driver:
- Return bool (not int) for link-up check in
mobiveil_pab_ops.link_up() and layerscape-gen4, mobiveil (Hans
Zhang)
NVIDIA Tegra194 PCIe controller driver:
- Create debugfs directory for 'aspm_state_cnt' only when
CONFIG_PCIEASPM is enabled, since there are no other entries (Hans
Zhang)
Qualcomm PCIe controller driver:
- Add OF support for parsing DT 'eq-presets-<N>gts' property for lane
equalization presets (Krishna Chaitanya Chundru)
- Read Maximum Link Width from the Link Capabilities register if DT
lacks 'num-lanes' property (Krishna Chaitanya Chundru)
- Add Physical Layer 64 GT/s Capability ID and register offsets for
8, 32, and 64 GT/s lane equalization registers (Krishna Chaitanya
Chundru)
- Add generic dwc support for configuring lane equalization presets
(Krishna Chaitanya Chundru)
- Add DT and driver support for PCIe on IPQ5018 SoC (Nitheesh Sekar)
Renesas R-Car PCIe controller driver:
- Describe endpoint BAR 4 as being fixed size (Jerome Brunet)
- Document how to obtain R-Car V4H (r8a779g0) controller firmware
(Yoshihiro Shimoda)
Rockchip PCIe controller driver:
- Reorder rockchip_pci_core_rsts because
reset_control_bulk_deassert() deasserts in reverse order, to fix a
link training regression (Jensen Huang)
- Mark RK3399 as being capable of raising INTx interrupts (Niklas
Cassel)
Rockchip DesignWare PCIe controller driver:
- Check only PCIE_LINKUP, not LTSSM status, to determine whether the
link is up (Shawn Lin)
- Increase N_FTS (used in L0s->L0 transitions) and enable ASPM L0s
for Root Complex and Endpoint modes (Shawn Lin)
- Hide the broken ATS Capability in rockchip_pcie_ep_init() instead
of rockchip_pcie_ep_pre_init() so it stays hidden after PERST#
resets non-sticky registers (Shawn Lin)
- Call phy_power_off() before phy_exit() in rockchip_pcie_phy_deinit()
(Diederik de Haas)
Synopsys DesignWare PCIe controller driver:
- Set PORT_LOGIC_LINK_WIDTH to one lane to make initial link training
more robust; this will not affect the intended link width if all
lanes are functional (Wenbin Yao)
- Return bool (not int) for link-up check in dw_pcie_ops.link_up()
and armada8k, dra7xx, dw-rockchip, exynos, histb, keembay,
keystone, kirin, meson, qcom, qcom-ep, rcar_gen4, spear13xx,
tegra194, uniphier, visconti (Hans Zhang)
- Add debugfs support for exposing DWC device-specific PTM context
(Manivannan Sadhasivam)
TI J721E PCIe driver:
- Make j721e buildable as a loadable and removable module (Siddharth
Vadapalli)
- Fix j721e host/endpoint dependencies that result in link failures
in some configs (Arnd Bergmann)
Device tree bindings:
- Add qcom DT binding for 'global' interrupt (PCIe controller and
link-specific events) for ipq8074, ipq8074-gen3, ipq6018, sa8775p,
sc7280, sc8180x, sdm845, sm8150, sm8250, sm8350 (Manivannan
Sadhasivam)
- Add qcom DT binding for 8 MSI SPI interrupts for msm8998, ipq8074,
ipq8074-gen3, ipq6018 (Manivannan Sadhasivam)
- Add dw rockchip DT binding for rk3576 and rk3562 (Kever Yang)
- Correct indentation and style of examples in brcm,stb-pcie,
cdns,cdns-pcie-ep, intel,keembay-pcie-ep, intel,keembay-pcie,
microchip,pcie-host, rcar-pci-ep, rcar-pci-host, xilinx-versal-cpm
(Krzysztof Kozlowski)
- Convert Marvell EBU (dove, kirkwood, armada-370, armada-xp) and
armada8k from text to schema DT bindings (Rob Herring)
- Remove obsolete .txt DT bindings for content that has been moved to
schemas (Rob Herring)
- Add qcom DT binding for MHI registers in IPQ5332, IPQ6018, IPQ8074
and IPQ9574 (Varadarajan Narayanan)
- Convert v3,v360epc-pci from text to DT schema binding (Rob Herring)
- Change microchip,pcie-host DT binding to be 'dma-noncoherent' since
PolarFire may be configured that way (Conor Dooley)
Miscellaneous:
- Drop 'pci' suffix from intel_mid_pci.c filename to match similar
files (Andy Shevchenko)
- All platforms with PCI have an MMU, so add PCI Kconfig dependency
on MMU to simplify build testing and avoid inadvertent build
regressions (Arnd Bergmann)
- Update Krzysztof Wilczyński's email address in MAINTAINERS
(Krzysztof Wilczyński)
- Update Manivannan Sadhasivam's email address in MAINTAINERS
(Manivannan Sadhasivam)"
* tag 'pci-v6.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (147 commits)
MAINTAINERS: Update Manivannan Sadhasivam email address
PCI: j721e: Fix host/endpoint dependencies
PCI: j721e: Add support to build as a loadable module
PCI: cadence-ep: Introduce cdns_pcie_ep_disable() helper for cleanup
PCI: cadence-host: Introduce cdns_pcie_host_disable() helper for cleanup
PCI: cadence: Add support to build pcie-cadence library as a kernel module
MAINTAINERS: Update Krzysztof Wilczyński email address
PCI: Remove unnecessary linesplit in __pci_setup_bridge()
PCI: WARN (not BUG()) when we fail to assign optional resources
PCI: Remove unused pci_printk()
PCI: qcom: Replace PERST# sleep time with proper macro
PCI: dw-rockchip: Replace PERST# sleep time with proper macro
PCI: host-common: Convert to library for host controller drivers
PCI/ERR: Remove misleading TODO regarding kernel panic
PCI: cadence: Remove duplicate message code definitions
PCI: endpoint: Align pci_epc_set_msix(), pci_epc_ops::set_msix() nr_irqs encoding
PCI: endpoint: Align pci_epc_set_msi(), pci_epc_ops::set_msi() nr_irqs encoding
PCI: endpoint: Align pci_epc_get_msix(), pci_epc_ops::get_msix() return value encoding
PCI: endpoint: Align pci_epc_get_msi(), pci_epc_ops::get_msi() return value encoding
PCI: cadence-ep: Correct PBA offset in .set_msix() callback
...
The regulatory domain information was initialized every time the
FW was loaded and the device was restarted. This was unnecessary
because at this stage the wiphy channel information was not set up
yet, so although the regulatory domain was set on the wiphy, the
channel information was not updated.
If a specific MCC was configured during FW initialization, subsequent
updates with the same MCC are ignored, and thus the wiphy channel
information is left not matching the regulatory domain.
This commit moves the regulatory domain initialization to after the
operational firmware is started, i.e., after the wiphy channels were
configured and the regulatory information is needed.
Miri Korenblit [Wed, 4 Jun 2025 03:13:19 +0000 (06:13 +0300)]
wifi: iwlwifi: mld: avoid panic on init failure
In case of an error during init, in_hw_restart will be set, but it will
never get cleared.
Instead, we will retry the init, and then we will act as if we are in a
restart when we actually are not.
Among other things, this causes a NULL pointer dereference when canceling
rx_omi::finished_work, which was never initialized, because we thought
that we were in hw_restart.
Set in_hw_restart to true only if the fw is running, then we know that
FW was loaded successfully and we are not going to the retry loop.
Miri Korenblit [Wed, 4 Jun 2025 03:13:18 +0000 (06:13 +0300)]
wifi: iwlwifi: mvm: fix assert on suspend
After using DEFINE_RAW_FLEX, cmd is a pointer to iwl_rxq_sync_cmd,
and not a variable containing both the command and notification.
Adjust hcmd->data and hcmd->len assignment as well.
Linus Torvalds [Wed, 4 Jun 2025 15:59:59 +0000 (08:59 -0700)]
Merge tag 'slab-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:
- Make kvmalloc() more suitable for callers that need it to succeed,
but without unnecessary overhead by reclaim and compaction to get a
physically contiguous allocation.
Instead fall back to vmalloc() more easily by default, unless
instructed by __GFP_RETRY_MAYFAIL to try kmalloc() harder. This
should allow the removal of an XFS-specific workaround (Michal Hocko)
- Remove potentially excessive warnings due to memory pressure when
allocating structures for per-object allocation profiling metadata
(Usama Arif)
* tag 'slab-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm: slub: only warn once when allocating slab obj extensions fails
mm: kvmalloc: make kmalloc fast path real fast path
Linus Torvalds [Wed, 4 Jun 2025 15:57:22 +0000 (08:57 -0700)]
Merge tag 'spdx-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx
Pull LICENSES update from Greg KH:
"Here is a single patch to the LICENSES/ directory to add the CC0
license that is currently used in the kcpuid x86 tool for one of its
files.
This fixes the error that spdxcheck.py currently has with the kcpuid
file due to a missing LICENSE file for this specific license"
* tag 'spdx-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
LICENSES: add CC0-1.0 license text
Bjorn Helgaas [Wed, 4 Jun 2025 15:50:45 +0000 (10:50 -0500)]
Merge branch 'pci/misc'
- Drop 'pci' suffix from intel_mid_pci.c filename to match similar files
(Andy Shevchenko)
- All platforms with PCI have an MMU, so add PCI Kconfig dependency on MMU
to simplify build testing and avoid inadvertent build regressions (Arnd
Bergmann)
- Update driver path in PCI NVMe function documentation (Rick Wertenbroek)
- Remove unused pci_printk() (Ilpo Järvinen)
- Warn (not BUG()) about failure to assign optional resources (Ilpo
Järvinen)
- Update Krzysztof Wilczyński's email address in MAINTAINERS (Krzysztof
Wilczyński)
- Update Manivannan Sadhasivam's email address in MAINTAINERS (Manivannan
Sadhasivam)
* pci/misc:
MAINTAINERS: Update Manivannan Sadhasivam email address
MAINTAINERS: Update Krzysztof Wilczyński email address
PCI: Remove unnecessary linesplit in __pci_setup_bridge()
PCI: WARN (not BUG()) when we fail to assign optional resources
PCI: Remove unused pci_printk()
Documentation: Fix path for NVMe PCI endpoint target driver
PCI: Add CONFIG_MMU dependency
x86/PCI: Drop 'pci' suffix from intel_mid_pci.c
Bjorn Helgaas [Wed, 4 Jun 2025 15:50:42 +0000 (10:50 -0500)]
Merge branch 'pci/controller/qcom'
- Add OF support for parsing DT 'eq-presets-<N>gts' property for lane
equalization presets (Krishna Chaitanya Chundru)
- Read Maximum Link Width from the Link Capabilities register if DT lacks
'num-lanes' property (Krishna Chaitanya Chundru)
- Add Physical Layer 64 GT/s Capability ID and register offsets for 8, 32,
and 64 GT/s lane equalization registers (Krishna Chaitanya Chundru)
- Add generic dwc support for configuring lane equalization presets
(Krishna Chaitanya Chundru)
- Add DT and driver support for PCIe on IPQ5018 SoC (Nitheesh Sekar)
* pci/controller/qcom:
PCI: qcom: Add support for IPQ5018
dt-bindings: PCI: qcom: Add IPQ5018 SoC
PCI: dwc: Add support for configuring lane equalization presets
PCI: Add lane equalization register offsets
PCI: dwc: Update pci->num_lanes to maximum supported link width
PCI: of: Add of_pci_get_equalization_presets() API
Bjorn Helgaas [Wed, 4 Jun 2025 15:50:40 +0000 (10:50 -0500)]
Merge branch 'pci/controller/imx6'
- Apply link training workaround only on IMX6Q, IMX6SX, IMX6SP (Richard
Zhu)
- Remove redundant dw_pcie_wait_for_link() from imx_pcie_start_link();
since the DWC core does this, imx6 only needs it when retraining for a
faster link speed (Richard Zhu)
- Toggle i.MX95 core reset to align with PHY powerup (Richard Zhu)
- Set SYS_AUX_PWR_DET to work around i.MX95 ERR051624 erratum: in some
cases, the controller can't exit 'L23 Ready' through Beacon or PERST#
deassertion (Richard Zhu)
- Clear GEN3_ZRXDC_NONCOMPL to work around i.MX95 ERR051586 erratum:
controller can't meet 2.5 GT/s ZRX-DC timing when operating at 8 GT/s,
causing timeouts in L1 (Richard Zhu)
- Wait for i.MX95 PLL lock before enabling controller (Richard Zhu)
- Save/restore i.MX95 LUT for suspend/resume (Richard Zhu)
* pci/controller/imx6:
PCI: imx6: Save and restore the LUT setting during suspend/resume for i.MX95 SoC
PCI: imx6: Add PLL lock check for i.MX95 SoC
PCI: imx6: Add workaround for errata ERR051586
PCI: imx6: Add workaround for errata ERR051624
PCI: imx6: Toggle the core reset for i.MX95 PCIe
PCI: imx6: Call dw_pcie_wait_for_link() from start_link() callback only when required
PCI: imx6: Skip link up workaround for newer platforms
Bjorn Helgaas [Wed, 4 Jun 2025 15:50:39 +0000 (10:50 -0500)]
Merge branch 'pci/controller/dwc'
- Set PORT_LOGIC_LINK_WIDTH to one lane to make initial link training more
robust; this will not affect the intended link width if all lanes are
functional (Wenbin Yao)
* pci/controller/dwc:
PCI: dwc: Make link training more robust by setting PORT_LOGIC_LINK_WIDTH to one lane
Bjorn Helgaas [Wed, 4 Jun 2025 15:50:38 +0000 (10:50 -0500)]
Merge branch 'pci/controller/dw-rockchip'
- Check only PCIE_LINKUP, not LTSSM status, to determine whether the link
is up (Shawn Lin)
- Increase N_FTS (used in L0s->L0 transitions) and enable ASPM L0s for Root
Complex and Endpoint modes (Shawn Lin)
- Hide the broken ATS Capability in rockchip_pcie_ep_init() instead of
rockchip_pcie_ep_pre_init() so it stays hidden after PERST# resets
non-sticky registers (Shawn Lin)
- Organize register and bitfield definitions logically (Hans Zhang)
- Use rockchip_pcie_link_up() to check link up instead of open coding, and
use GENMASK() and FIELD_GET() when possible (Hans Zhang)
- Call phy_power_off() before phy_exit() in rockchip_pcie_phy_deinit()
(Diederik de Haas)
- Return bool (not int) for link-up check in dw_pcie_ops.link_up() and
armada8k, dra7xx, dw-rockchip, exynos, histb, keembay, keystone, kirin,
meson, qcom, qcom-ep, rcar_gen4, spear13xx, tegra194, uniphier, visconti
(Hans Zhang)
- Return bool (not int) for link-up check in mobiveil_pab_ops.link_up() and
layerscape-gen4, mobiveil (Hans Zhang)
- Simplify j721e link-up check (Hans Zhang)
- Convert pci-host-common to a library so platforms that don't need native
host controller drivers don't need to include these helper functions
(Manivannan Sadhasivam)
* pci/controller/dw-rockchip:
PCI: qcom: Replace PERST# sleep time with proper macro
PCI: dw-rockchip: Replace PERST# sleep time with proper macro
PCI: host-common: Convert to library for host controller drivers
PCI: cadence: Simplify J721e link status check
PCI: mobiveil: Return bool from link up check
PCI: dwc: Return bool from link up check
PCI: dw-rockchip: Fix PHY function call sequence in rockchip_pcie_phy_deinit()
PCI: dw-rockchip: Use rockchip_pcie_link_up() to check link up instead of open coding
PCI: dw-rockchip: Reorganize register and bitfield definitions
PCI: dw-rockchip: Remove unused PCIE_CLIENT_GENERAL_DEBUG definition
PCI: dw-rockchip: Move rockchip_pcie_ep_hide_broken_ats_cap_rk3588() to dw_pcie_ep_ops::init()
PCI: dw-rockchip: Enable ASPM L0s capability for both RC and EP modes
PCI: dw-rockchip: Remove PCIE_L0S_ENTRY check from rockchip_pcie_link_up()
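The link-up cleanups above (bool return type, GENMASK() and FIELD_GET() instead of open-coded shifts) can be illustrated with a small user-space sketch. The GENMASK32() and FIELD_GET32() macros below are simplified stand-ins for the kernel helpers, and the register layout (FAKE_LINKUP_MASK, FAKE_LINKUP_VAL) and fake_link_up() are hypothetical names invented for this example, not taken from the driver. __builtin_ctz() is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified 32-bit stand-ins for the kernel's GENMASK() and FIELD_GET(). */
#define GENMASK32(h, l)        ((~0u >> (31 - (h))) & (~0u << (l)))
#define FIELD_GET32(mask, reg) (((reg) & (mask)) >> __builtin_ctz(mask))

/* Hypothetical status register layout, for illustration only. */
#define FAKE_LINKUP_MASK GENMASK32(17, 16)
#define FAKE_LINKUP_VAL  0x3u

/* Returns bool rather than int: the caller only needs a yes/no answer. */
static bool fake_link_up(uint32_t status)
{
	return FIELD_GET32(FAKE_LINKUP_MASK, status) == FAKE_LINKUP_VAL;
}
```

With this layout, only a status word with both bits 16 and 17 set reports the link as up; other bits in the word are ignored.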
Bjorn Helgaas [Wed, 4 Jun 2025 15:50:04 +0000 (10:50 -0500)]
Merge branch 'pci/controller/apple'
- Skip ports disabled in DT when setting up ports (Janne Grunau)
- Add t6020 compatible string (Alyssa Rosenzweig)
- Extract ECAM bridge creation helper from pci_host_common_probe() to
separate driver-specific things like MSI from PCI things (Marc Zyngier)
- Dynamically allocate RID-to-SID bitmap to prepare for SoCs with varying
capabilities (Marc Zyngier)
- Directly set/clear INTx mask bits because T602x dropped the accessors
that could do this without locking (Marc Zyngier)
- Move port PHY registers to their own reg items to accommodate T602x,
which moves them around; retain default offsets for existing DTs that
lack phy%d entries with the reg offsets (Hector Martin)
- Stop polling for core refclk, which doesn't work on T602x and which the
bootloader has already taken care of anyway (Hector Martin)
- Use gpiod_set_value_cansleep() when asserting PERST# in probe because
we're allowed to sleep there (Hector Martin)
- Move register offsets into SoC-specific structure (Hector Martin)
- Add T602x PCIe support (Hector Martin)
* pci/controller/apple:
PCI: apple: Add T602x PCIe support
PCI: apple: Abstract register offsets via a SoC-specific structure
PCI: apple: Use gpiod_set_value_cansleep in probe flow
PCI: apple: Drop poll for CORE_RC_PHYIF_STAT_REFCLK
PCI: apple: Move port PHY registers to their own reg items
PCI: apple: Fix missing OF node reference in apple_pcie_setup_port
PCI: apple: Move away from INTMSK{SET,CLR} for INTx and private interrupts
PCI: apple: Dynamically allocate RID-to_SID bitmap
PCI: apple: Move over to standalone probing
PCI: ecam: Allow cfg->priv to be pre-populated from the root port device
PCI: host-generic: Extract an ECAM bridge creation helper from pci_host_common_probe()
dt-bindings: pci: apple,pcie: Add t6020 compatible string
PCI: apple: Set only available ports up
Bjorn Helgaas [Wed, 4 Jun 2025 15:50:03 +0000 (10:50 -0500)]
Merge branch 'pci/endpoint'
- For fixed-size BARs, retain both the actual size and the possibly larger
size allocated to accommodate iATU alignment requirements (Jerome Brunet)
- Simplify ctrl/SPAD space allocation and avoid allocating more space than
needed (Jerome Brunet)
- Correct MSI-X PBA offset calculations for DesignWare and Cadence endpoint
controllers (Niklas Cassel)
- Align the return value (number of interrupts) encoding for
pci_epc_get_msi()/pci_epc_ops::get_msi() and
pci_epc_get_msix()/pci_epc_ops::get_msix() (Niklas Cassel)
- Align the nr_irqs parameter encoding for
pci_epc_set_msi()/pci_epc_ops::set_msi() and
pci_epc_set_msix()/pci_epc_ops::set_msix() (Niklas Cassel)
* pci/endpoint:
PCI: endpoint: Align pci_epc_set_msix(), pci_epc_ops::set_msix() nr_irqs encoding
PCI: endpoint: Align pci_epc_set_msi(), pci_epc_ops::set_msi() nr_irqs encoding
PCI: endpoint: Align pci_epc_get_msix(), pci_epc_ops::get_msix() return value encoding
PCI: endpoint: Align pci_epc_get_msi(), pci_epc_ops::get_msi() return value encoding
PCI: cadence-ep: Correct PBA offset in .set_msix() callback
PCI: dwc: ep: Correct PBA offset in .set_msix() callback
PCI: endpoint: pci-epf-vntb: Simplify ctrl/SPAD space allocation
PCI: endpoint: Retain fixed-size BAR size as well as aligned size
Bjorn Helgaas [Wed, 4 Jun 2025 15:50:03 +0000 (10:50 -0500)]
Merge branch 'pci/virtualization'
- Add an ACS quirk for Loongson Root Ports that don't advertise ACS but
don't allow peer-to-peer transactions between Root Ports; the quirk
allows each Root Port to be in a separate IOMMU group (Huacai Chen)
* pci/virtualization:
PCI: Add ACS quirk for Loongson PCIe
Bjorn Helgaas [Wed, 4 Jun 2025 15:50:02 +0000 (10:50 -0500)]
Merge branch 'pci/pwrctrl'
- Rename pwrctrl Kconfig symbols from 'PWRCTL' to 'PWRCTRL' to match the
filename paths. Retain old deprecated symbols for compatibility, except
for the pwrctrl slot driver (PCI_PWRCTRL_SLOT) (Johan Hovold)
- When unregistering pwrctrl, cancel outstanding rescan work before
cleaning up data structures to avoid use-after-free issues (Brian Norris)
* pci/pwrctrl:
arm64: Kconfig: switch to HAVE_PWRCTRL
wifi: ath12k: switch to PCI_PWRCTRL_PWRSEQ
wifi: ath11k: switch to PCI_PWRCTRL_PWRSEQ
PCI/pwrctrl: Rename pwrctrl Kconfig symbols and slot module
PCI/pwrctrl: Cancel outstanding rescan work when unregistering
Bjorn Helgaas [Wed, 4 Jun 2025 15:50:01 +0000 (10:50 -0500)]
Merge branch 'pci/pm'
- Add pm_runtime_put() cleanup helper for use with __free() to
automatically drop the device usage count when a pointer goes out of
scope (Alex Williamson)
- Increment PM usage counter when probing reset methods so we don't try to
read config space of a powered-off device (Alex Williamson)
- Set all devices to D0 during enumeration to ensure ACPI opregion is
connected via _REG (Mario Limonciello)
* pci/pm:
PCI: Explicitly put devices into D0 when initializing
PCI: Increment PM usage counter when probing reset methods
PM: runtime: Define pm_runtime_put cleanup helper
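The scope-based cleanup idea behind the pm_runtime_put helper can be sketched in user space with the GCC/Clang cleanup attribute, which is the compiler feature the kernel's __free() builds on. Everything below is a hypothetical model: struct fake_dev, fake_pm_get(), fake_pm_put(), AUTO_PM_PUT and probe_reset_methods() are invented for this example and only mimic a usage counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device; only the usage count is modeled. */
struct fake_dev {
	int usage_count;
};

static void fake_pm_get(struct fake_dev *dev) { dev->usage_count++; }
static void fake_pm_put(struct fake_dev *dev) { dev->usage_count--; }

/* Cleanup callback: receives a pointer to the annotated variable. */
static void fake_pm_put_cleanup(struct fake_dev **devp)
{
	if (*devp)
		fake_pm_put(*devp);
}

/* Hypothetical annotation, loosely analogous to __free(pm_runtime_put). */
#define AUTO_PM_PUT __attribute__((cleanup(fake_pm_put_cleanup)))

/* Returns the usage count observed while the device was held. */
static int probe_reset_methods(struct fake_dev *dev)
{
	fake_pm_get(dev);
	struct fake_dev *held AUTO_PM_PUT = dev;

	if (held->usage_count < 1)
		return -1;	/* early return: the put still runs */

	return held->usage_count;	/* put runs when 'held' leaves scope */
}
```

The point of the pattern is that every exit path drops the usage count, so adding an early return later cannot leak a reference.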
Bjorn Helgaas [Wed, 4 Jun 2025 15:49:59 +0000 (10:49 -0500)]
Merge branch 'pci/hotplug'
- Ignore Presence Detect Changed caused by DPC. pciehp already ignores
Link Down/Up events caused by DPC, but on slots using in-band presence
detect, DPC causes a spurious Presence Detect Changed event (Lukas
Wunner)
- Ignore Link Down/Up caused by Secondary Bus Reset. On hotplug ports
using in-band presence detect, the reset causes a Presence Detect Changed
event, which mistakenly caused teardown and re-enumeration of the device.
Drivers may need to annotate code that resets their device (Lukas Wunner)
* pci/hotplug:
PCI: hotplug: Drop superfluous #include directives
PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset
PCI: pciehp: Ignore Presence Detect Changed caused by DPC
Bjorn Helgaas [Wed, 4 Jun 2025 15:49:56 +0000 (10:49 -0500)]
Merge branch 'pci/enumeration'
- Remove pci_fixup_cardbus(), which has no users left (Heiner Kallweit)
- Print the actual delay time in pci_bridge_wait_for_secondary_bus()
instead of assuming it was 1000ms (Wilfred Mallawa)
- Revert 'iommu/amd: Prevent binding other PCI drivers to IOMMU PCI
devices', which broke resume from system sleep on AMD platforms and has
been fixed by other commits (Lukas Wunner)
- Restrict visibility of pci_dev.match_driver since it's no longer used
outside the PCI core (Lukas Wunner)
* pci/enumeration:
PCI: Limit visibility of match_driver flag to PCI core
Revert "iommu/amd: Prevent binding other PCI drivers to IOMMU PCI devices"
PCI: Print the actual delay time in pci_bridge_wait_for_secondary_bus()
PCI: Use PCI_STD_NUM_BARS instead of 6
PCI: Remove pci_fixup_cardbus()
Bjorn Helgaas [Wed, 4 Jun 2025 15:49:50 +0000 (10:49 -0500)]
Merge branch 'pci/devres'
- Remove mtip32xx use of pcim_iounmap_regions(), which is deprecated and
unnecessary (Philipp Stanner)
- Remove pcim_iounmap_regions() and pcim_request_region_exclusive() and
related flags since all uses have been removed (Philipp Stanner)
- Rework devres 'request' functions so they are no longer 'hybrid', i.e.,
their behavior no longer depends on whether pcim_enable_device() or
pci_enable_device() was used, and remove related code (Philipp Stanner)
* pci/devres:
PCI: Remove function pcim_intx() prototype from pci.h
PCI: Remove hybrid-devres usage warnings from kernel-doc
PCI: Remove redundant set of request functions
PCI: Remove exclusive requests flags from _pcim_request_region()
PCI: Remove pcim_request_region_exclusive()
Documentation/driver-api: Update pcim_enable_device()
PCI: Remove hybrid devres nature from request functions
PCI: Remove pcim_iounmap_regions()
mtip32xx: Remove unnecessary pcim_iounmap_regions() calls
Bjorn Helgaas [Wed, 4 Jun 2025 15:49:49 +0000 (10:49 -0500)]
Merge branch 'pci/bwctrl'
- Simplify link bandwidth controller by replacing the count of Link
Bandwidth Management Status (LBMS) events with a PCI_LINK_LBMS_SEEN flag
(Ilpo Järvinen)
- Update the Link Speed after retraining, since the Link Speed may have
changed (Ilpo Järvinen)
* pci/bwctrl:
PCI: Update Link Speed after retraining
PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag
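Replacing an event counter with a "seen" flag can be sketched as follows. This is a non-atomic user-space illustration; struct fake_link, LINK_LBMS_SEEN_BIT and the helper names are hypothetical, and kernel code would normally use atomic bit operations rather than the plain assignments shown here.

```c
#include <assert.h>
#include <stdbool.h>

#define LINK_LBMS_SEEN_BIT 0	/* hypothetical bit position */

/* Hypothetical per-link state; only the flag word is modeled. */
struct fake_link {
	unsigned long priv_flags;
};

/* Called for each LBMS event: repeated events collapse into one bit. */
static void lbms_event(struct fake_link *link)
{
	link->priv_flags |= 1UL << LINK_LBMS_SEEN_BIT;
}

/* Has at least one LBMS event been seen since the last reset? */
static bool lbms_seen(const struct fake_link *link)
{
	return link->priv_flags & (1UL << LINK_LBMS_SEEN_BIT);
}

/* Forget earlier events, e.g. before a new observation window. */
static void lbms_reset(struct fake_link *link)
{
	link->priv_flags &= ~(1UL << LINK_LBMS_SEEN_BIT);
}
```

If callers only ever ask whether any event occurred, a single bit carries the same information as a counter and needs no overflow or reset-to-zero handling.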
Bjorn Helgaas [Wed, 4 Jun 2025 15:49:49 +0000 (10:49 -0500)]
Merge branch 'pci/aer'
- Initialize struct aer_err_info before using it to avoid depending on
stack garbage (Bjorn Helgaas)
- Log the DPC Error Source ID only when it's actually valid (when ERR_FATAL
or ERR_NONFATAL was received from a downstream device) and decode into
bus/device/function (Bjorn Helgaas)
- Consolidate AER Error Source ID in one place for message consistency
(Bjorn Helgaas)
- Update statistics and emit trace events early in AER logging paths,
before any potential ratelimiting (Bjorn Helgaas)
- Determine AER log level once and save it so all related messages use the
same level (Karolina Stolarek)
- Use KERN_WARNING, not KERN_ERR, when logging PCIe Correctable Errors.
- Ratelimit PCIe Correctable and Non-Fatal error logging, with sysfs
controls on interval and burst count, to avoid flooding logs and RCU
stall warnings (Jon Pan-Doh)
* pci/aer:
PCI/ERR: Remove misleading TODO regarding kernel panic
PCI/AER: Add sysfs attributes for log ratelimits
PCI/AER: Add ratelimits to PCI AER Documentation
PCI/AER: Ratelimit correctable and non-fatal error logging
PCI/AER: Simplify add_error_device()
PCI/AER: Convert aer_get_device_error_info(), aer_print_error() to index
PCI/AER: Rename struct aer_stats to aer_info
PCI/AER: Reduce pci_print_aer() correctable error level to KERN_WARNING
PCI/ERR: Add printk level to pcie_print_tlp_log()
PCI/AER: Check log level once and remember it
PCI/AER: Trace error event before ratelimiting
PCI/AER: Update statistics before ratelimiting
PCI/AER: Simplify pci_print_aer()
PCI/AER: Initialize aer_err_info before using it
PCI/AER: Move aer_print_source() earlier in file
PCI/AER: Rename aer_print_port_info() to aer_print_source()
PCI/AER: Extract bus/dev/fn in aer_print_port_info() with PCI_BUS_NUM(), etc
PCI/AER: Consolidate Error Source ID logging in aer_isr_one_error_type()
PCI/AER: Factor COR/UNCOR error handling out from aer_isr_one_error()
PCI/DPC: Log Error Source ID only when valid
PCI/DPC: Initialize aer_err_info before using it
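The ordering described above (update statistics first, then apply an interval/burst ratelimit to the log message) can be modeled with a short user-space sketch. All names here (struct fake_ratelimit, struct fake_aer_stats, fake_ratelimit_ok(), fake_report_cor_error()) are hypothetical, and time is passed in as a plain tick count so the behavior is deterministic; this is not the kernel's ratelimit implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ratelimit state; field names are illustrative only. */
struct fake_ratelimit {
	unsigned long interval;	/* window length in ticks; 0 disables limiting */
	unsigned int burst;	/* messages allowed per window */
	unsigned long begin;	/* start of the current window */
	unsigned int printed;	/* messages emitted in the current window */
};

struct fake_aer_stats {
	unsigned long cor_errors;	/* counts every error, logged or not */
};

/* Allow up to 'burst' messages per 'interval' ticks. */
static bool fake_ratelimit_ok(struct fake_ratelimit *rs, unsigned long now)
{
	if (rs->interval == 0)
		return true;
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;	/* start a new window */
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	return false;
}

/* Returns true if the error would be logged. */
static bool fake_report_cor_error(struct fake_aer_stats *stats,
				  struct fake_ratelimit *rs, unsigned long now)
{
	stats->cor_errors++;	/* statistics first, so suppressed errors still count */
	return fake_ratelimit_ok(rs, now);
}
```

Because the counter is bumped before the ratelimit check, suppressed messages still show up in the statistics, which is the property the "before ratelimiting" commits aim for.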
Steven Rostedt [Wed, 4 Jun 2025 12:51:21 +0000 (08:51 -0400)]
drm/ttm: Fix compile error when CONFIG_SHMEM is not set
When CONFIG_SHMEM is not set, the following link error occurs:
ld: vmlinux.o: in function `ttm_backup_backup_page':
(.text+0x10363bc): undefined reference to `shmem_writeout'
make[3]: *** [/work/build/trace/nobackup/linux.git/scripts/Makefile.vmlinux:91: vmlinux.unstripped] Error 1
This is due to the replacement of the writepage callback with direct
calls to swap_writeout() and shmem_writeout(). The issue is that when
CONFIG_SHMEM is not defined, shmem_writeout() is also not defined.
The function ttm_backup_backup_page() called mapping->a_ops->writepage()
which was then changed to call shmem_writeout() directly.
Even before commit 84798514db50 ("mm: Remove swap_writepage() and
shmem_writepage()"), it didn't make sense to call anything other than
shmem_writeout() as the ttm_backup deals only with shmem folios.
Have DRM_TTM config option select SHMEM to guarantee that
shmem_writeout() is available.
Link: https://lore.kernel.org/all/20250602170500.48713a2b@gandalf.local.home/
Suggested-by: Hugh Dickins <hughd@google.com>
Fixes: 84798514db50 ("mm: Remove swap_writepage() and shmem_writepage()")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alok Tiwari [Mon, 2 Jun 2025 10:34:29 +0000 (03:34 -0700)]
gve: add missing NULL check for gve_alloc_pending_packet() in TX DQO
gve_alloc_pending_packet() can return NULL, but gve_tx_add_skb_dqo()
did not check for this case before dereferencing the returned pointer.
Add a missing NULL check to prevent a potential NULL pointer
dereference when allocation fails.
This improves robustness in low-memory scenarios.
Fixes: a57e5de476be ("gve: DQO: Add TX path")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
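The pattern behind the gve fix, checking an allocator's result before dereferencing it, can be sketched in user space as follows. The pool, the fake_* names and the choice of -ENOMEM as the error code are assumptions made for this example; they are not taken from the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define FAKE_POOL_SIZE 2	/* hypothetical, tiny on purpose */

struct fake_pkt {
	int in_use;
	int num_bufs;
};

struct fake_pool {
	struct fake_pkt slots[FAKE_POOL_SIZE];
};

/* Returns a free packet slot, or NULL when the pool is exhausted. */
static struct fake_pkt *fake_alloc_pending_packet(struct fake_pool *pool)
{
	for (size_t i = 0; i < FAKE_POOL_SIZE; i++) {
		if (!pool->slots[i].in_use) {
			pool->slots[i].in_use = 1;
			return &pool->slots[i];
		}
	}
	return NULL;
}

/* Returns 0 on success or a negative errno value on failure. */
static int fake_tx_add(struct fake_pool *pool)
{
	struct fake_pkt *pkt = fake_alloc_pending_packet(pool);

	if (!pkt)		/* the check that was missing */
		return -ENOMEM;

	pkt->num_bufs = 0;	/* safe: pkt is known to be non-NULL here */
	return 0;
}
```

Once the pool is exhausted the caller gets an error code back instead of crashing on a NULL dereference.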