rtc: cmos: Enable ACPI alarm if advertised in ACPI FADT
If the ACPI_FADT_FIXED_RTC flag is unset, the platform is declaring that
it supports the ACPI RTC fixed event which should be used instead of a
dedicated CMOS RTC IRQ. However, the driver only enables it when
is_hpet_enabled() returns true, which is questionable because there is
no clear connection between enabled HPET and signaling wakeup via the
ACPI RTC fixed event (for instance, the latter can be expected to work
on systems that don't include a functional HPET).
Moreover, since use_hpet_alarm() returns false if use_acpi_alarm is set,
the ACPI RTC fixed event is effectively used instead of the HPET alarm
if the latter is functional, but there is no particular reason why it
could not be used otherwise.
Accordingly, on x86 systems with ACPI, set use_acpi_alarm if
ACPI_FADT_FIXED_RTC is unset without looking at whether or not HPET is
enabled.
Also, do the ACPI FADT check in use_acpi_alarm_quirks() before the DMI
BIOS year checks, which are more expensive and are better skipped
if ACPI_FADT_FIXED_RTC is set.
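The check ordering described above could be sketched roughly as follows. This is a userspace illustration only: struct platform_info, lookup_bios_year(), want_acpi_alarm() and the min_year threshold are hypothetical names, not the driver's actual code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the platform state the driver consults. */
struct platform_info {
    bool fadt_fixed_rtc_set; /* models the ACPI_FADT_FIXED_RTC flag */
    int bios_year;           /* models the DMI BIOS year */
    int dmi_lookups;         /* counts how often the costly lookup ran */
};

/* Models the more expensive DMI BIOS year lookup. */
static int lookup_bios_year(struct platform_info *p)
{
    p->dmi_lookups++;
    return p->bios_year;
}

/*
 * Cheap FADT flag check first; the DMI lookup only runs when the flag
 * is unset. min_year is an assumed threshold for illustration.
 */
static bool want_acpi_alarm(struct platform_info *p, int min_year)
{
    if (p->fadt_fixed_rtc_set)
        return false; /* fixed event not advertised: skip DMI entirely */
    return lookup_bios_year(p) >= min_year;
}
```

With the flag set, the function returns before the DMI lookup is ever reached, which is the saving the reordering is after.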
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Link: https://patch.msgid.link/9618535.CDJkKcVGEf@rafael.j.wysocki
Chen-Yu Tsai [Tue, 24 Mar 2026 09:35:41 +0000 (17:35 +0800)]
PCI: mediatek-gen3: Prevent leaking IRQ domains when IRQ not found
In mtk_pcie_setup_irq(), the IRQ domains are allocated before the
controller's IRQ is fetched. If the latter fails, the function
directly returns an error, without cleaning up the allocated domains.
Hence, reverse the order so that the IRQ domains are allocated after the
controller's IRQ is found.
This was flagged by Sashiko during a review of "[PATCH v6 0/7] PCI:
mediatek-gen3: add power control support".
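As a rough illustration of the reordering in mtk_pcie_setup_irq(), the userspace sketch below looks up the IRQ before allocating anything, so the early error return has nothing to clean up. All names here (sketch_find_irq(), sketch_alloc_domains(), sketch_setup_irq()) are hypothetical and only model the control flow.

```c
#include <stdbool.h>
#include <stdlib.h>

static int live_domains; /* allocations that have not been freed yet */

/* Models allocating the IRQ domains. */
static void *sketch_alloc_domains(void)
{
    void *d = malloc(1);

    if (d)
        live_domains++;
    return d;
}

/* Models fetching the controller's IRQ; negative means "not found". */
static int sketch_find_irq(bool present)
{
    return present ? 42 : -1;
}

/* Fetch the IRQ first, allocate the domains only once it is known. */
static int sketch_setup_irq(bool irq_present, void **domains_out)
{
    int irq = sketch_find_irq(irq_present);

    if (irq < 0)
        return irq; /* nothing allocated yet, so nothing can leak */

    *domains_out = sketch_alloc_domains();
    return *domains_out ? 0 : -1;
}
```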
Merge tag 'microchip-soc-7.1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/at91/linux into soc/arm
Microchip ARM64 SoC updates for v7.1
This update includes:
- use a top-level configuration flag for all Microchip platforms
* tag 'microchip-soc-7.1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/at91/linux:
arm64: Kconfig: provide a top-level switch for Microchip platforms
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Merge tag 'input-for-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
- new IDs for BETOP BTP-KP50B/C and Razer Wolverine V3 Pro added to
xpad controller driver
- another quirk for new TUXEDO InfinityBook added to i8042
- a small fixup for Synaptics RMI4 driver to properly unlock mutex when
encountering an error in F54
- an update to bcm5974 touch controller driver to reliably switch into
wellspring mode
* tag 'input-for-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: xpad - add support for BETOP BTP-KP50B/C controller's wireless mode
Input: xpad - add support for Razer Wolverine V3 Pro
Input: synaptics-rmi4 - fix a locking bug in an error path
Input: i8042 - add TUXEDO InfinityBook Max 16 Gen10 AMD to i8042 quirk table
Input: bcm5974 - recover from failed mode switch
Merge tag 'imx-dt64-7.1' of https://git.kernel.org/pub/scm/linux/kernel/git/frank.li/linux into soc/dt
Krzysztof notes:
1. This might impact users of i.MX8MM SPDIF as compatible is being
replaced.
Frank Li writes:
i.MX arm64 device tree changes for 7.1:
- New Board Support
S32N79-RDB, Variscite DART-MX95, DART-MX91 with Sonata carrier boards,
Verdin iMX95 with multiple carrier boards (Yavia, Mallow, Ivy, Dahlia)
TQMa93xx/MBa93xxLA-MINI, SolidRun i.MX8MP HummingBoard IIoT,
SolidRun i.MX8MM SOM and EVB, SolidRun SolidSense-N8 board
Ka-Ro Electronics tx8m-1610 COM, GOcontroll Moduline IV and Moduline Mini,
NXP FRDM-IMX91S board, i.MX93 Wireless EVK board with Wireless SiP,
NXP i.MX8MP audio board v2.
- USB & Type-C Support
Type-C and USB nodes for imx943, correct power-role for
imx8qxp-mek/imx8qm-mek.
- Audio Enhancements
PDM microphone, bt-sco, and WM8962 sound card support for i.MX952. AONMIX
MQS for i.MX95. Use audio-graph-card2 for imx8dxl-evk. WM8904 audio codec
for imx8mm-var-som.
- Thermal & Cooling
PF09/53 thermal zone, fan node, active cooling on A55, SCMI
sensor/lmm/cpu for imx943/imx94.
- Display Support
Multiple LVDS and parallel display overlays for TQ boards (imx91/imx93).
Parallel display for i.MX93. ontat,kd50g21-40nt-a1 panel for
imx93-9x9-qsb. pixpaper display overlay for i.MX93 FRDM.
- Networking
Multiple queue configuration on eqos for TQMa8MPxL.
MaxLinear PHY support, MCP251xFD CAN controller for imx8mm-var-som.
SDIO WiFi support (imx91-evk, imx8mp-evk, imx943-evk)
- Bluetooth Support
imx943-evk, imx93-14x14-evk, imx95-19x19-evk, imx8mp-evk, imx8mn-evk,
imx8mm-evk.
- Miscellaneous
xspi and MT35XU01G SPI NOR flash for i.MX952.
V2X/ELE mailbox nodes, SCMI misc ctrl-ids for imx94.
eDMA channel reservation for V2X, Cortex M7 support for imx95.
Ethos-U65 NPU and SRAM nodes for imx93.
Wire up DMA IRQ for PCIe for imx8qm-ss-hsio.
- Bug Fixes & Improvements
Complete pinmux for rcwsr12 to fix I2C bus recovery affecting other
modules' pinmux on Layerscape platforms.
Multiple bug fixes for GPIO polarity, IRQ types, pinmux configurations.
GICv3 PPI interrupt CPU mask cleanup across multiple SoCs.
Fixed Ethernet PHY IRQ types on TQ boards.
Fixed UART RTS/CTS muxing issues.
Fixed SD card issues on Kontron boards.
Fixed touch reset configuration.
Removed fallback ethernet-phy-ieee802.3-c22 where appropriate.
Move funnel outside from soc.
TMU sensor ID cleanup.
Change usdhc tuning step for eMMC and SD.
Hexadecimal format, readability improvements, duplicate removal.
* tag 'imx-dt64-7.1' of https://git.kernel.org/pub/scm/linux/kernel/git/frank.li/linux: (139 commits)
arm64: dts: imx8qxp-mek: switch Type-C connector power-role to dual
arm64: dts: imx8qm-mek: switch Type-C connector power-role to dual
arm64: dts: lx2162a-clearfog: set sfp connector leds function and source
arm64: dts: lx2162a-sr-som: add crypto & rtc aliases, model
arm64: dts: lx2160a-cex7: add rtc alias
arm64: dts: lx2160a: complete pinmux for rcwsr12 configuration word
arm64: dts: lx2160a: change zeros to hexadecimal in pinmux nodes
arm64: dts: lx2160a: add sda gpio references for i2c bus recovery
arm64: dts: lx2160a: rename pinmux nodes for readability
arm64: dts: lx2160a: remove duplicate pinmux nodes
arm64: dts: lx2160a: change i2c0 (iic1) pinmux mask to one bit
arm64: dts: lx2160a-cex7/lx2162a-sr-som: fix usd-cd & gpio pinmux
arm64: dts: freescale: imx8mp-moduline-display-106: add typec-power-opmode property
arm64: dts: imx8mp-tqma8mpql: Add DT overlays to explicit list
arm64: dts: imx8mp-evk: Specify ADV7535 register addresses
arm64: dts: imx8dxl-evk: Use audio-graph-card2 for wm8960-2 and wm8960-3
arm64: dts: imx943-evk: Add pf09/53 thermal zone
arm64: dts: imx943-evk: Add fan node and enable active cooling on A55
arm64: dts: imx943-evk: Add nxp,ctrl-ids for scmi_misc
arm64: dts: imx943: Add thermal support
...
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Merge tag 'tegra-for-7.1-arm64-dt' of https://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into soc/dt
arm64: tegra: Device tree changes for v7.1-rc1
Various fixes and new additions across a number of devices. GPIO and PCI
are enabled on Tegra264 and the Jetson AGX Thor Developer Kit, allowing
it to boot via network and mass storage.
* tag 'tegra-for-7.1-arm64-dt' of https://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
arm64: tegra: Add Tegra264 GPIO controllers
arm64: tegra: smaug: Enable SPI-NOR flash
arm64: tegra: Add Jetson AGX Thor Developer Kit support
arm64: tegra: Add PCI controllers on Tegra264
arm64: tegra: Fix RTC aliases
arm64: tegra: Drop redundant clock and reset names for TSEC
arm64: tegra: Fix snps,blen properties
dt-bindings: pci: Document the NVIDIA Tegra264 PCIe controller
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Daniel Lezcano [Thu, 2 Apr 2026 08:44:25 +0000 (10:44 +0200)]
thermal/core: Remove pointless variable when registering a cooling device
The 'id' variable is set to store the ida_alloc() value which is
already stored into cdev->id. It is pointless to use it because
cdev->id can be used instead.
Signed-off-by: Daniel Lezcano <daniel.lezcano@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20260402084426.1360086-1-daniel.lezcano@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
PCI: tegra194: Expose BAR2 (MSI-X) and BAR4 (DMA) as 64-bit BAR_RESERVED
Tegra Endpoint exposes three 64-bit BARs at indices 0, 2, and 4:
- BAR0+BAR1: EPF test/data (programmable 64-bit BAR)
- BAR2+BAR3: MSI-X table (hardware-backed)
- BAR4+BAR5: DMA registers (hardware-backed)
Update tegra_pcie_epc_features so that BAR2 is BAR_RESERVED with
PCI_EPC_BAR_RSVD_MSIX_TBL_RAM (64 KB) & PCI_EPC_BAR_RSVD_MSIX_PBA_RAM
(64 KB) and BAR4 is BAR_RESERVED with PCI_EPC_BAR_RSVD_DMA_CTRL_MMIO (4 KB).
This keeps CONSECUTIVE_BAR_TEST working while allowing the host to use
64-bit BAR2 (MSI-X) and BAR4 (DMA).
PCI: tegra194: Make BAR0 programmable and remove 1MB size limit
The Tegra194/234 Endpoint does not support the Resizable BAR capability,
but BAR0 can be programmed to different sizes via the DBI2 BAR registers
in dw_pcie_ep_set_bar_programmable(). The BAR0 size is set once during
initialization.
Remove the fixed 1MB limit from pci_epc_features so Endpoint function
drivers can configure the BAR0 size they need.
PCI: endpoint: Add reserved region type for MSI-X Table and PBA
Add PCI_EPC_BAR_RSVD_MSIX_TBL_RAM and PCI_EPC_BAR_RSVD_MSIX_PBA_RAM to
enum pci_epc_bar_rsvd_region_type so that Endpoint controllers can
describe hardware-owned MSI-X Table and PBA (Pending Bit Array) regions
behind a BAR_RESERVED BAR.
Richard Zhu [Tue, 24 Mar 2026 02:30:32 +0000 (10:30 +0800)]
dt-bindings: PCI: imx6q-pcie: Fix maxItems of clocks and clock-names
Commit 1352f58d7c8d ("dt-bindings: PCI: pci-imx6: Add external reference
clock input") that added reference clock to the binding was incomplete.
The constraints for "clocks" and "clock-names" still enforce an incorrect
number of items. Update maxItems for both properties to 6 to match the
actual hardware configuration.
Felix Gu [Mon, 23 Mar 2026 17:57:59 +0000 (01:57 +0800)]
PCI: aspeed: Fix IRQ domain leak on platform_get_irq() failure
The aspeed_pcie_probe() function calls aspeed_pcie_init_irq_domain()
which allocates pcie->intx_domain and initializes MSI. However, if
platform_get_irq() fails afterwards, the cleanup action was not yet
registered via devm_add_action_or_reset(), causing the IRQ domain
resources to leak.
Fix this by registering the devm cleanup action immediately after
aspeed_pcie_init_irq_domain() succeeds, before calling
platform_get_irq(). This ensures proper cleanup on any subsequent
failure.
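The idea of registering the cleanup action before the next step that can fail may be sketched in userspace like this. The sketch_devres type and its helpers are a hypothetical, much simplified model of devres; sketch_bind() stands in for the driver core releasing resources after a failed probe.

```c
#include <stddef.h>

#define SKETCH_MAX_ACTIONS 4

struct sketch_action {
    void (*fn)(void *);
    void *arg;
};

struct sketch_devres {
    struct sketch_action actions[SKETCH_MAX_ACTIONS];
    size_t count;
};

/* Register a cleanup action; if that fails, run it right away. */
static int sketch_add_action_or_reset(struct sketch_devres *dr,
                                      void (*fn)(void *), void *arg)
{
    if (dr->count == SKETCH_MAX_ACTIONS) {
        fn(arg);
        return -1;
    }
    dr->actions[dr->count].fn = fn;
    dr->actions[dr->count].arg = arg;
    dr->count++;
    return 0;
}

/* Run registered actions in reverse order. */
static void sketch_release_all(struct sketch_devres *dr)
{
    while (dr->count > 0) {
        dr->count--;
        dr->actions[dr->count].fn(dr->actions[dr->count].arg);
    }
}

static void sketch_free_domain(void *arg)
{
    *(int *)arg = 0; /* mark the domain as released */
}

static int sketch_probe(struct sketch_devres *dr, int *domain_live, int irq)
{
    *domain_live = 1; /* models the IRQ domain init succeeding */

    /* Register cleanup immediately, before anything else can fail. */
    if (sketch_add_action_or_reset(dr, sketch_free_domain, domain_live))
        return -1;

    if (irq < 0) /* models platform_get_irq() failing */
        return irq;
    return 0;
}

/* Models the driver core: on probe failure, release everything. */
static int sketch_bind(struct sketch_devres *dr, int *domain_live, int irq)
{
    int ret = sketch_probe(dr, domain_live, irq);

    if (ret)
        sketch_release_all(dr);
    return ret;
}
```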
The current custom implementation of offsetof() fails UBSAN:
runtime error: member access within null pointer of type 'struct ...'
This means that all its users, including container_of(), free() and
realloc(), fail.
Use __builtin_offsetof() instead which does not have this issue and
has been available since GCC 4 and clang 3.
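A minimal sketch of an offsetof() built on the compiler builtin, with a container_of() layered on top. The macro names my_offsetof and my_container_of and struct node are made up for this illustration and are not the names used in the patched code.

```c
#include <stddef.h>

/* No null-pointer member access is involved, so UBSAN has nothing to flag. */
#define my_offsetof(type, member) __builtin_offsetof(type, member)

/* Recover the enclosing structure from a pointer to one of its members. */
#define my_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - my_offsetof(type, member)))

struct node {
    int key;
    long payload;
};

static struct node *node_from_payload(long *payload)
{
    return my_container_of(payload, struct node, payload);
}
```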
Remove redundant parentheses around the '&' operator to comply with
kernel style guidelines, as reported by checkpatch:
CHECK: Unnecessary parentheses around adapter->securitypriv
Documentation: fix two typos in latest update to the security report howto
In previous patch "Documentation: clarify the mandatory and desirable
info for security reports" I left two typos that I didn't detect in local
checks. One is "get_maintainers.pl" (no 's' in the script name), and the
other one is a missing closing quote after "Reported-by", which didn't
have effect here but I don't know if it can break rendering elsewhere
(e.g. on the public HTML page). Better fix it before it gets merged.
fstatat() contains two open-coded copies of makedev() to handle minor
numbers >= 256. Now that the regular makedev() handles both large minor
and major numbers correctly, use the common function.
statx() returns both 32-bit minor and major numbers. For both of them to
fit into the 'dev_t' in 'struct stat', that needs to be 64 bits wide.
The other uses of 'dev_t' in nolibc are makedev() and friends and
mknod(). makedev() and friends are going to be adapted in an upcoming
commit and mknod() will silently truncate 'dev_t' to 'unsigned int' in
the kernel, similar to other libcs.
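To see why 64 bits are needed, consider the sketch below. It uses a deliberately simple, hypothetical layout (major in the upper 32 bits, minor in the lower 32 bits); this is not claimed to be the encoding nolibc or any other libc actually uses.

```c
#include <stdint.h>

typedef uint64_t sketch_dev_t;

/* Hypothetical layout: major in bits 63:32, minor in bits 31:0. */
static sketch_dev_t sketch_makedev(uint32_t major, uint32_t minor)
{
    return ((sketch_dev_t)major << 32) | minor;
}

static uint32_t sketch_major(sketch_dev_t dev)
{
    return (uint32_t)(dev >> 32);
}

static uint32_t sketch_minor(sketch_dev_t dev)
{
    return (uint32_t)dev;
}

/* Models a 32-bit consumer silently dropping the upper half. */
static uint32_t sketch_truncate(sketch_dev_t dev)
{
    return (uint32_t)dev;
}
```

Two full 32-bit values round-trip through the 64-bit type, while a 32-bit consumer keeps only the lower half, whatever the layout puts there.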
RISC-V: KVM: Cache gstage pgd_levels in struct kvm_gstage
Gstage page-table helpers frequently chase gstage->kvm->arch to
fetch pgd_levels. This adds noise and repeats the same dereference
chain in hot paths.
Add pgd_levels to struct kvm_gstage and initialize it from kvm->arch
when setting up a gstage instance. Introduce kvm_riscv_gstage_init()
to centralize initialization and switch gstage code to use
gstage->pgd_levels.
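The caching pattern can be illustrated with simplified stand-in types. The sketch_* structures and functions below are hypothetical and carry only the fields needed to show the idea; they are not the kernel's definitions.

```c
/* Simplified stand-ins for the structures named above. */
struct sketch_kvm_arch {
    unsigned long pgd_levels;
};

struct sketch_kvm {
    struct sketch_kvm_arch arch;
};

struct sketch_gstage {
    struct sketch_kvm *kvm;
    unsigned long pgd_levels; /* cached copy of kvm->arch.pgd_levels */
};

/* Central init: copy pgd_levels once when the gstage is set up. */
static void sketch_gstage_init(struct sketch_gstage *g, struct sketch_kvm *kvm)
{
    g->kvm = kvm;
    g->pgd_levels = kvm->arch.pgd_levels;
}

/* Before: every helper chased gstage->kvm->arch. */
static unsigned long sketch_levels_chased(const struct sketch_gstage *g)
{
    return g->kvm->arch.pgd_levels;
}

/* After: a single load from the gstage itself. */
static unsigned long sketch_levels_cached(const struct sketch_gstage *g)
{
    return g->pgd_levels;
}
```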
Al Viro [Wed, 21 Jan 2026 23:17:12 +0000 (18:17 -0500)]
get rid of busy-waiting in shrink_dcache_tree()
If shrink_dcache_tree() runs into a potential victim that is already
dying, it must wait for that dentry to go away. To avoid busy-waiting
we need some object to wait on and a way for dentry_unlist() to see that
we need to be notified.
The obvious place for the object to wait on would be on our stack frame.
We will store a pointer to that object (struct completion_list) in victim
dentry; if there's more than one thread wanting to wait for the same
dentry to finish dying, we'll have their instances linked into a list,
with reference in dentry pointing to the head of that list.
* new object - struct completion_list. A pair of struct completion and
pointer to the next instance. That's what shrink_dcache_tree() will wait
on if needed.
* add a new member (->waiters, opaque pointer to struct completion_list)
to struct dentry. It is defined for negative live dentries that are
not in-lookup ones and it will remain NULL for almost all of them.
It does not conflict with ->d_rcu (defined for killed dentries), ->d_alias
(defined for positive dentries, all live) or ->d_in_lookup_hash (defined
for in-lookup dentries, all live negative). That allows to colocate
all four members.
* make sure that all places where dentry enters the state where ->waiters
is defined (live, negative, not-in-lookup) initialize ->waiters to NULL.
* if select_collect2() runs into a dentry that is already dying, have
its caller insert a local instance of struct completion_list into the
head of the list hanging off dentry->waiters and wait for completion.
* if dentry_unlist() sees non-NULL ->waiters, have it carefully walk
through the completion_list instances in that list, calling complete()
for each.
For now struct completion_list is local to fs/dcache.c; it's obviously
dentry-agnostic, and it can be trivially lifted into linux/completion.h
if somebody finds a reason to do so...
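The list mechanics can be sketched in single-threaded userspace C. This is only an illustration under simplifying assumptions: locking and the actual blocking wait are left out, a plain flag stands in for struct completion, and all sketch_* names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_completion {
    bool done; /* stands in for a real completion */
};

/* A completion plus a pointer to the next waiter's instance. */
struct sketch_completion_list {
    struct sketch_completion completion;
    struct sketch_completion_list *next;
};

struct sketch_dentry {
    struct sketch_completion_list *waiters; /* NULL for almost all dentries */
};

/* Waiter side: push an instance (normally on the waiter's stack) at the head. */
static void sketch_add_waiter(struct sketch_dentry *d,
                              struct sketch_completion_list *w)
{
    w->completion.done = false;
    w->next = d->waiters;
    d->waiters = w;
}

/* Unlist side: walk the list and complete every waiter. */
static void sketch_wake_waiters(struct sketch_dentry *d)
{
    struct sketch_completion_list *w = d->waiters;

    d->waiters = NULL;
    while (w) {
        /*
         * Read ->next before completing: once a waiter is woken, its
         * stack frame (and this list node) may be gone.
         */
        struct sketch_completion_list *next = w->next;

        w->completion.done = true;
        w = next;
    }
}
```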
RISC-V: KVM: Support runtime configuration for per-VM's HGATP mode
Introduces one per-VM architecture-specific fields to support runtime
configuration of the G-stage page table format:
- kvm->arch.pgd_levels: the corresponding number of page table levels
for the selected mode.
These fields replace the previous global variables
kvm_riscv_gstage_mode and kvm_riscv_gstage_pgd_levels, enabling different
virtual machines to independently select their G-stage page table format
instead of being forced to share the maximum mode detected by the kernel
at boot time.
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Link: https://lore.kernel.org/r/20260403153019.9916-2-fangyu.yu@linux.alibaba.com
Signed-off-by: Anup Patel <anup@brainfault.org>
Input: xpad - add support for BETOP BTP-KP50B/C controller's wireless mode
The wireless dongles of BETOP's BTP-KP50B and BTP-KP50C controllers both
work as standard Xbox 360 controllers. Add USB device IDs for them to
the xpad driver.
Input: xpad - add support for Razer Wolverine V3 Pro
Add device IDs for the Razer Wolverine V3 Pro controller in both
wired (0x0a57) and wireless 2.4 GHz dongle (0x0a59) modes.
The controller uses the Xbox 360 protocol (vendor-specific class,
subclass 93, protocol 1) on interface 0 with an identical 20-byte
input report layout, so no additional processing is needed.
mshv: Fix infinite fault loop on permission-denied GPA intercepts
Prevent infinite fault loops when guests access memory regions without
proper permissions. Currently, mshv_handle_gpa_intercept() attempts to
remap pages for all faults on movable memory regions, regardless of
whether the access type is permitted. When a guest writes to a read-only
region, the remap succeeds but the region remains read-only, causing
immediate re-fault and spinning the vCPU indefinitely.
Validate intercept access type against region permissions before
attempting remaps. Reject writes to non-writable regions and executes to
non-executable regions early, returning false to let the VMM handle the
intercept appropriately.
This also closes a potential DoS vector where malicious guests could
intentionally trigger these fault loops to consume host resources.
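A minimal sketch of the early permission check, assuming hypothetical permission flags and access types (SKETCH_PERM_*, enum sketch_access); the real intercept handler and its data structures differ.

```c
#include <stdbool.h>

#define SKETCH_PERM_WRITE 0x1u
#define SKETCH_PERM_EXEC  0x2u

enum sketch_access {
    SKETCH_ACCESS_READ,
    SKETCH_ACCESS_WRITE,
    SKETCH_ACCESS_EXEC,
};

/* Is this access type allowed by the region's permissions? */
static bool sketch_access_permitted(unsigned int perms, enum sketch_access acc)
{
    if (acc == SKETCH_ACCESS_WRITE)
        return perms & SKETCH_PERM_WRITE;
    if (acc == SKETCH_ACCESS_EXEC)
        return perms & SKETCH_PERM_EXEC;
    return true;
}

/*
 * Returns false without remapping when the access can never succeed,
 * leaving the intercept to the VMM; otherwise counts one remap attempt.
 */
static bool sketch_handle_intercept(unsigned int perms, enum sketch_access acc,
                                    int *remaps)
{
    if (!sketch_access_permitted(perms, acc))
        return false;
    (*remaps)++; /* models remapping the faulting page */
    return true;
}
```

Rejecting before the remap is what breaks the loop: a write to a read-only region never reaches the remap step, so it cannot re-fault on the same page.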
Fixes: b9a66cd5ccbb ("mshv: Add support for movable memory regions")
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
PCI: hv: Fix double ida_free in hv_pci_probe error path
If hv_pci_probe() fails after storing the domain number in
hbus->bridge->domain_nr, it frees this domain_nr via
pci_bus_release_emul_domain_nr(). However, during cleanup, the bridge
release callback pci_release_host_bridge_dev() also frees the domain_nr,
causing ida_free() to be called on the same ID twice and triggering the
following warning:
ida_free called for id=28971 which is not allocated.
WARNING: lib/idr.c:594 at ida_free+0xdf/0x160, CPU#0: kworker/0:2/198
Call Trace:
pci_bus_release_emul_domain_nr+0x17/0x20
pci_release_host_bridge_dev+0x4b/0x60
device_release+0x3b/0xa0
kobject_put+0x8e/0x220
devm_pci_alloc_host_bridge_release+0xe/0x20
devres_release_all+0x9a/0xd0
device_unbind_cleanup+0x12/0xa0
really_probe+0x1c5/0x3f0
vmbus_add_channel_work+0x135/0x1a0
Fix this by letting the PCI core free the domain_nr and removing
the explicit free from the pci-hyperv driver.
Merge tag 's390-7.0-7' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Fix a memory leak in the zcrypt driver where the AP message buffer
for clear key RSA requests was allocated twice, once by the caller
and again locally, causing the first allocation to never be freed
- Fix the cpum_sf perf sampling rate overflow adjustment to clamp the
recalculated rate to the hardware maximum, preventing exceptions on
heavily loaded systems running with HZ=1000
* tag 's390-7.0-7' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/zcrypt: Fix memory leak with CCA cards used as accelerator
s390/cpum_sf: Cap sampling rate to prevent lsctl exception
Merge tag 'hwmon-for-v7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- Fix temperature sensor for PRIME X670E-PRO WIFI
- occ: Add missing newline, and fix potential division by zero
- pmbus:
- Fix device ID comparison and printing in tps53676_identify()
- Add missing MODULE_IMPORT_NS("PMBUS") for ltc4286
- Check return value of page-select write in pxe1610 probe
- Fix array access with zero-length block tps53679 read
* tag 'hwmon-for-v7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (asus-ec-sensors) Fix T_Sensor for PRIME X670E-PRO WIFI
hwmon: (occ) Fix missing newline in occ_show_extended()
hwmon: (occ) Fix division by zero in occ_show_power_1()
hwmon: (tps53679) Fix device ID comparison and printing in tps53676_identify()
hwmon: (ltc4286) Add missing MODULE_IMPORT_NS("PMBUS")
hwmon: (pxe1610) Check return value of page-select write in probe
hwmon: (tps53679) Fix array access with zero-length block read
Lucas De Marchi [Mon, 30 Mar 2026 13:13:52 +0000 (08:13 -0500)]
module: Simplify warning on positive returns from module_init()
It should now be rare to trigger this warning - it doesn't need to be so
verbose. Make it follow the usual style in the module loading code.
For the same reason, drop the dump_stack().
Suggested-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Lucas De Marchi <demarchi@kernel.org>
Reviewed-by: Aaron Tomlin <atomlin@atomlin.com>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Reviewed-by: Daniel Gomez <da.gomez@samsung.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Lucas De Marchi [Mon, 30 Mar 2026 13:13:51 +0000 (08:13 -0500)]
module: Override -EEXIST module return
The -EEXIST errno is reserved by the module loading functionality. When
userspace calls [f]init_module(), it expects a -EEXIST to mean that the
module is already loaded in the kernel. If module_init() returns it,
that is not true anymore.
Override the error when returning to userspace: it doesn't make sense to
change potentially long error propagation call chains just because it
will end up as the return of module_init().
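The override itself can be pictured as a small filter on the init return value. In this sketch the replacement errno (-EBUSY) and the function name sketch_filter_init_ret() are assumptions for illustration; the text above does not state which value the patch substitutes.

```c
#include <errno.h>

/*
 * Keep every return value as-is, except -EEXIST, which userspace would
 * read as "module already loaded". -EBUSY is an assumed replacement.
 */
static int sketch_filter_init_ret(int ret)
{
    if (ret == -EEXIST)
        return -EBUSY;
    return ret;
}
```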
Closes: https://lore.kernel.org/all/aKLzsAX14ybEjHfJ@orbyte.nwl.cc/
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Aaron Tomlin <atomlin@atomlin.com>
Cc: Petr Pavlu <petr.pavlu@suse.com>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: Phil Sutter <phil@nwl.cc>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Lucas De Marchi <demarchi@kernel.org>
[Sami: Fixed a typo.]
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
====================
dpll: add frequency monitoring feature
This series adds support for monitoring the measured input frequency
of DPLL input pins via the DPLL netlink interface.
Some DPLL devices can measure the actual frequency being received on
input pins. The approach mirrors the existing phase-offset-monitor
feature: a device-level attribute (DPLL_A_FREQUENCY_MONITOR) enables
or disables monitoring, and a per-pin attribute
(DPLL_A_PIN_MEASURED_FREQUENCY) exposes the measured frequency in
millihertz (mHz) when monitoring is enabled.
Patch 1 adds the new attributes to the DPLL netlink spec (dpll.yaml),
the DPLL_PIN_MEASURED_FREQUENCY_DIVIDER constant, regenerates the
auto-generated UAPI header and netlink policy, and updates
Documentation/driver-api/dpll.rst.
Patch 2 adds the callback operations (freq_monitor_get/set for
devices, measured_freq_get for pins) and the corresponding netlink
GET/SET handlers in the DPLL core. The core only invokes
measured_freq_get when the frequency monitor is enabled on the parent
device. The freq_monitor_get callback is required when measured_freq_get
is provided.
Patch 3 implements the feature in the ZL3073x driver by extracting
a common measurement latch helper from the existing FFO update path,
adding a frequency measurement function, and wiring up the new
callbacks.
====================
Ivan Vecera [Thu, 2 Apr 2026 18:40:57 +0000 (20:40 +0200)]
dpll: zl3073x: implement frequency monitoring
Extract common measurement latch logic from zl3073x_ref_ffo_update()
into a new zl3073x_ref_freq_meas_latch() helper and add
zl3073x_ref_freq_meas_update() that uses it to latch and read absolute
input reference frequencies in Hz.
Add meas_freq field to struct zl3073x_ref and the corresponding
zl3073x_ref_meas_freq_get() accessor. The measured frequencies are
updated periodically alongside the existing FFO measurements.
Add freq_monitor boolean to struct zl3073x_dpll and implement the
freq_monitor_set/get device callbacks to enable/disable frequency
monitoring via the DPLL netlink interface.
Implement measured_freq_get pin callback for input pins that returns the
measured input frequency in mHz.
Ivan Vecera [Thu, 2 Apr 2026 18:40:56 +0000 (20:40 +0200)]
dpll: add frequency monitoring callback ops
Add new callback operations for a dpll device:
- freq_monitor_get(..) - to obtain current state of frequency monitor
feature from dpll device,
- freq_monitor_set(..) - to allow feature configuration.
Add new callback operation for a dpll pin:
- measured_freq_get(..) - to obtain the measured frequency in mHz.
Obtain the feature state value using the get callback and provide it to
the user if the device driver implements callbacks. The measured_freq_get
pin callback is only invoked when the frequency monitor is enabled.
The freq_monitor_get device callback is required when measured_freq_get
is provided by the driver.
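The two rules above (freq_monitor_get is required whenever measured_freq_get is provided, and measured_freq_get is only invoked while monitoring is enabled) might be modeled as below. The ops structures, signatures and demo callbacks are simplified and hypothetical; the real DPLL callbacks take more arguments.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical callback tables. */
struct sketch_dev_ops {
    int (*freq_monitor_get)(bool *enabled);
};

struct sketch_pin_ops {
    int (*measured_freq_get)(unsigned long long *mhz);
};

/* Registration-time rule: measured_freq_get requires freq_monitor_get. */
static bool sketch_ops_valid(const struct sketch_dev_ops *dev,
                             const struct sketch_pin_ops *pin)
{
    return !pin->measured_freq_get || dev->freq_monitor_get;
}

/* Report the measured frequency only while the monitor is enabled. */
static int sketch_report_freq(const struct sketch_dev_ops *dev,
                              const struct sketch_pin_ops *pin,
                              unsigned long long *mhz, bool *reported)
{
    bool enabled = false;
    int ret;

    *reported = false;
    if (!pin->measured_freq_get)
        return 0; /* driver does not implement the feature */
    if (!dev->freq_monitor_get)
        return -1; /* violates the registration rule */

    ret = dev->freq_monitor_get(&enabled);
    if (ret || !enabled)
        return ret;

    ret = pin->measured_freq_get(mhz);
    if (!ret)
        *reported = true;
    return ret;
}

/* Demo callbacks used to exercise the sketch. */
static bool demo_monitor_enabled;

static int demo_monitor_get(bool *enabled)
{
    *enabled = demo_monitor_enabled;
    return 0;
}

static int demo_freq_get(unsigned long long *mhz)
{
    *mhz = 10000000500ULL; /* arbitrary example value in mHz */
    return 0;
}
```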
Ivan Vecera [Thu, 2 Apr 2026 18:40:55 +0000 (20:40 +0200)]
dpll: add frequency monitoring to netlink spec
Add DPLL_A_FREQUENCY_MONITOR device attribute to allow control over
the frequency monitor feature. The attribute uses the existing
dpll_feature_state enum (enable/disable) and is present in both
device-get reply and device-set request.
Add DPLL_A_PIN_MEASURED_FREQUENCY pin attribute to expose the measured
input frequency in millihertz (mHz). The attribute is present in the
pin-get reply. Add DPLL_PIN_MEASURED_FREQUENCY_DIVIDER constant to
allow userspace to extract integer and fractional parts.
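On the userspace side, splitting the value could look like the sketch below. The divider value of 1000 is an assumption derived from the unit (millihertz per hertz); SKETCH_MEASURED_FREQUENCY_DIVIDER is a local stand-in, not the UAPI constant itself.

```c
#include <stdint.h>

/* Assumed value: 1000 mHz per Hz. Local stand-in for the UAPI constant. */
#define SKETCH_MEASURED_FREQUENCY_DIVIDER 1000ULL

/* Integer part of the measured frequency, in Hz. */
static uint64_t sketch_freq_hz(uint64_t measured_mhz)
{
    return measured_mhz / SKETCH_MEASURED_FREQUENCY_DIVIDER;
}

/* Fractional part of the measured frequency, in mHz. */
static uint64_t sketch_freq_frac_mhz(uint64_t measured_mhz)
{
    return measured_mhz % SKETCH_MEASURED_FREQUENCY_DIVIDER;
}
```

Under that assumption, a reported value of 10000000500 splits into 10000000 Hz and 500 mHz.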
The test currently allegedly makes sure that VMRUN causes a #GP if the
vmcb12 GPA is valid but unmappable. However, it calls run_guest() with
the test vmcb12 GPA, and the #GP is produced from VMLOAD, not VMRUN.
Additionally, the underlying logic just changed to match architectural
behavior, and all of VMRUN/VMLOAD/VMSAVE fail emulation if vmcb12 cannot
be mapped. The CPU still injects a #GP if the vmcb12 GPA exceeds
maxphyaddr.
Rework the test to use the KVM_ONE_VCPU_TEST[_SUITE] harness, and
test all of VMRUN/VMLOAD/VMSAVE with both an invalid GPA (-1ULL) causing
a #GP, and a valid but unmappable GPA causing emulation failure. Execute
the instructions directly from L1 instead of run_guest() to make sure
the #GP or emulation failure is produced by the right instruction.
Leave the #VMEXIT with unmappable GPA test case as-is, but wrap it with
a test harness as well.
Opportunistically drop gp_triggered, as the test already checks that
a #GP was injected through a SYNC. Also, use the first unmapped GPA
instead of the maximum legal GPA, as some CPUs inject a #GP for the
maximum legal GPA (likely in a reserved area).
Yosry Ahmed [Mon, 16 Mar 2026 20:27:30 +0000 (20:27 +0000)]
KVM: nSVM: Fail emulation of VMRUN/VMLOAD/VMSAVE if mapping vmcb12 fails
KVM currently injects a #GP if mapping vmcb12 fails when emulating
VMRUN/VMLOAD/VMSAVE. This is not architectural behavior, as #GP should
only be injected if the physical address is not supported or not
aligned. Instead, handle it as an emulation failure, similar to how nVMX
handles failures to read/write guest memory in several emulation paths.
When virtual VMLOAD/VMSAVE is enabled, if vmcb12's GPA is not mapped in
the NPTs, a VMEXIT(#NPF) will be generated, and KVM will install an MMIO
SPTE and emulate the instruction if there is no corresponding memslot.
x86_emulate_insn() will return EMULATION_FAILED as VMLOAD/VMSAVE are not
handled as part of the twobyte_insn cases.
Even though this will also result in an emulation failure, it will only
result in a straight return to userspace if
KVM_CAP_EXIT_ON_EMULATION_FAILURE is set. Otherwise, it would inject #UD
and only exit to userspace if not in guest mode. So the behavior is
slightly different if virtual VMLOAD/VMSAVE is enabled.
Fixes: 3d6368ef580a ("KVM: SVM: Add VMRUN handler")
Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
Link: https://patch.msgid.link/20260316202732.3164936-8-yosry@kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Yosry Ahmed [Mon, 16 Mar 2026 20:27:29 +0000 (20:27 +0000)]
KVM: SVM: Treat mapping failures equally in VMLOAD/VMSAVE emulation
Currently, a #GP is only injected if kvm_vcpu_map() fails with -EINVAL.
But it could also fail with -EFAULT if creating a host mapping failed.
Inject a #GP in all cases, no reason to treat failure modes differently.
Similar to commit 01ddcdc55e09 ("KVM: nSVM: Always inject a #GP if
mapping VMCB12 fails on nested VMRUN"), treat all failures equally.
Yosry Ahmed [Mon, 16 Mar 2026 20:27:28 +0000 (20:27 +0000)]
KVM: SVM: Check EFER.SVME and CPL on #GP intercept of SVM instructions
When KVM intercepts #GP on an SVM instruction from L2, it checks the
legality of RAX, and injects a #GP if RAX is illegal, or otherwise
synthesizes a #VMEXIT to L1. However, checking EFER.SVME and CPL takes
precedence over both the RAX check and the intercept. Call
nested_svm_check_permissions() first to cover both.
Note that if #GP is intercepted on SVM instruction in L1, the intercept
handlers of VMRUN/VMLOAD/VMSAVE already perform these checks.
Note #2, if KVM does not intercept #GP, the check for EFER.SVME is not
done in the correct order, because KVM handles it by intercepting the
instructions when EFER.SVME=0 and injecting #UD. However, a #GP
injected by hardware would happen before the instruction intercept,
leading to #GP taking precedence over #UD from the guest's perspective.
Opportunistically add a FIXME for this.
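The precedence described above for a #GP intercepted from L2 can be written out as a small decision function. This is a sketch only: the sketch_* types are hypothetical, and the particular exception picked for each permission failure (#UD for EFER.SVME=0, #GP for CPL != 0) is an assumption of the illustration.

```c
#include <stdbool.h>

enum sketch_outcome {
    SKETCH_INJECT_UD,
    SKETCH_INJECT_GP,
    SKETCH_VMEXIT_TO_L1,
};

struct sketch_vcpu {
    bool efer_svme;  /* EFER.SVME set? */
    unsigned int cpl;
    bool rax_legal;  /* result of the RAX legality check */
};

/* Permission checks first, then RAX, and only then the #VMEXIT to L1. */
static enum sketch_outcome sketch_gp_from_l2(const struct sketch_vcpu *v)
{
    if (!v->efer_svme)
        return SKETCH_INJECT_UD; /* assumed outcome for EFER.SVME=0 */
    if (v->cpl != 0)
        return SKETCH_INJECT_GP; /* assumed outcome for CPL > 0 */
    if (!v->rax_legal)
        return SKETCH_INJECT_GP;
    return SKETCH_VMEXIT_TO_L1;
}
```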
Fixes: 82a11e9c6fa2 ("KVM: SVM: Add emulation support for #GP triggered by SVM instructions")
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
Link: https://patch.msgid.link/20260316202732.3164936-6-yosry@kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
When #GP is intercepted by KVM, the #GP interception handler checks
whether the GPA in RAX is legal and reinjects the #GP accordingly.
Otherwise, it calls into the appropriate interception handler for
VMRUN/VMLOAD/VMSAVE. The intercept handlers do not check RAX.
However, the intercept handlers need to do the RAX check, because if the
guest has a smaller MAXPHYADDR, RAX could be legal from the hardware
perspective (i.e. CPU does not inject #GP), but not from the vCPU's
perspective. Note that with allow_smaller_maxphyaddr, both NPT and VLS
cannot be used, so VMLOAD/VMSAVE have to be intercepted, and RAX can
always be checked against the vCPU's MAXPHYADDR.
Move the check into the interception handlers for VMRUN/VMLOAD/VMSAVE as
the CPU does not check RAX before the interception. Read RAX using
kvm_register_read() to avoid a false negative on page_address_valid() on
32-bit due to garbage in the higher bits.
Keep the check in the #GP intercept handler in the nested case where
a #VMEXIT is synthesized into L1, as the RAX check is still needed there
and takes precedence over the intercept.
Opportunistically add a FIXME about the #VMEXIT being synthesized into
L1, as it needs to be conditional.
Yosry Ahmed [Mon, 16 Mar 2026 20:27:26 +0000 (20:27 +0000)]
KVM: SVM: Properly check RAX on #GP intercept of SVM instructions
When KVM intercepts #GP on an SVM instruction, it re-injects the #GP if
the instruction was executed with a misaligned RAX. However, a #GP
should also be reinjected if RAX contains an illegal GPA. According to
the APM, one of the #GP conditions is:
rAX referenced a physical address above the maximum
supported physical address.
Replace the PAGE_MASK check with page_address_valid(), which checks both
page-alignment as well as the legality of the GPA based on the vCPU's
MAXPHYADDR. Use kvm_register_read() to read RAX so that bits 63:32 are
dropped when the vCPU is in 32-bit mode, i.e. to avoid a false positive
when checking the validity of the address.
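A rough model of the combined check, assuming 4 KiB pages. sketch_gpa_valid() and sketch_read_rax() are hypothetical stand-ins that only mirror the behavior attributed above to page_address_valid() and kvm_register_read(); they are not the KVM implementations.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL /* assumed 4 KiB pages */

/* Valid means page-aligned and no bits set at or above MAXPHYADDR. */
static bool sketch_gpa_valid(uint64_t gpa, unsigned int maxphyaddr)
{
    if (gpa & (SKETCH_PAGE_SIZE - 1))
        return false; /* misaligned */
    if (maxphyaddr < 64 && (gpa >> maxphyaddr))
        return false; /* beyond the supported physical address width */
    return true;
}

/* Outside 64-bit mode, only the low 32 bits of the register count. */
static uint64_t sketch_read_rax(uint64_t raw, bool is_64bit_mode)
{
    return is_64bit_mode ? raw : (uint32_t)raw;
}
```

Checking the raw 64-bit register in 32-bit mode would wrongly reject an address whose upper half merely holds stale data, which is why the read is truncated before validation.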
Note that this is currently only a problem if KVM is running an L2 guest
and ends up synthesizing a #VMEXIT to L1, as the RAX check takes
precedence over the intercept. Otherwise, if KVM emulates the
instruction, kvm_vcpu_map() should fail on illegal GPAs and inject a #GP
anyway. However, following patches will change the failure behavior of
kvm_vcpu_map(), so make sure the #GP interception handler does this
appropriately.
Opportunistically drop a teaser FIXME about the SVM instructions
handling on #GP belonging in the emulator.
Fixes: 82a11e9c6fa2 ("KVM: SVM: Add emulation support for #GP triggered by SVM instructions") Fixes: d1cba6c92237 ("KVM: x86: nSVM: test eax for 4K alignment for GP errata workaround") Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Yosry Ahmed <yosry@kernel.org> Link: https://patch.msgid.link/20260316202732.3164936-4-yosry@kernel.org
[sean: massage wording with respect to kvm_register_read()] Signed-off-by: Sean Christopherson <seanjc@google.com>
Yosry Ahmed [Mon, 16 Mar 2026 20:27:25 +0000 (20:27 +0000)]
KVM: SVM: Refactor SVM instruction handling on #GP intercept
Instead of returning an opcode from svm_instr_opcode() and then passing
it to emulate_svm_instr(), which uses it to find the corresponding exit
code and intercept handler, return the exit code directly from
svm_instr_opcode(), and rename it to svm_get_decoded_instr_exit_code().
emulate_svm_instr() boils down to synthesizing a #VMEXIT or calling the
intercept handler, so open-code it in gp_interception(), and use
svm_invoke_exit_handler() to call the intercept handler based on
the exit code. This allows for dropping the SVM_INSTR_* enum, and the
const array mapping its values to exit codes and intercept handlers.
In gp_interception(), handle SVM instructions first with an early return,
and invert the is_guest_mode() checks, un-indenting the rest of the code.
Yosry Ahmed [Mon, 16 Mar 2026 20:27:24 +0000 (20:27 +0000)]
KVM: SVM: Properly check RAX in the emulator for SVM instructions
Architecturally, VMRUN/VMLOAD/VMSAVE should generate a #GP if the
physical address in RAX is not supported. check_svme_pa() hardcodes this
to checking that bits 63-48 are not set. This is incorrect on HW
supporting 52 bits of physical address space. Additionally, the emulator
does not check if the address is not aligned, which should also result
in #GP.
Use page_address_valid() which properly checks alignment and the address
legality based on the guest's MAXPHYADDR. Plumb it through
x86_emulate_ops, similar to is_canonical_addr(), to avoid directly
accessing the vCPU object in emulator code.
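
A minimal sketch of the difference between the two checks, assuming a
4KiB page size. Both helper names are hypothetical and are not the
kernel's implementations; sketch_old_check() models the hardcoded
"bits 63-48 must be clear" test and sketch_page_address_valid() models
a check of alignment plus MAXPHYADDR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumption: 4KiB pages */

/* Hardcoded check as described above: only bits 63:48 are examined. */
static bool sketch_old_check(uint64_t rax)
{
	return (rax & 0xffff000000000000ULL) == 0;
}

/* Page-aligned and no bits set at or above the guest's MAXPHYADDR. */
static bool sketch_page_address_valid(uint64_t gpa, unsigned int maxphyaddr)
{
	assert(maxphyaddr >= 32 && maxphyaddr <= 52);
	if (gpa & (SKETCH_PAGE_SIZE - 1))
		return false;
	return (gpa >> maxphyaddr) == 0;
}
```

For example, with a 52-bit MAXPHYADDR the address 1 << 50 is legal but
fails the hardcoded check, while a misaligned address such as 0x1234
passes the hardcoded check and fails the new one.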
Fixes: 01de8b09e606 ("KVM: SVM: Add intercept checks for SVM instructions") Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Yosry Ahmed <yosry@kernel.org> Link: https://patch.msgid.link/20260316202732.3164936-2-yosry@kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com>
net: ethernet: ravb: Suspend and resume the transmission flow
The current driver does not follow the latest datasheet and does not
suspend the flow when stopping DMA and resume it when starting. Update
the driver to do so.
Jakub Kicinski [Fri, 3 Apr 2026 23:02:31 +0000 (16:02 -0700)]
Merge branch 'net-stmmac-fix-tegra234-mgbe-clock'
Jon Hunter says:
====================
net: stmmac: Fix Tegra234 MGBE clock
The name of the PTP ref clock for the Tegra234 MGBE ethernet controller
does not match the generic name in the stmmac platform driver. Despite
this, basic Ethernet is functional on the Tegra234 platforms that use
this driver and as far as I know, we have not tested PTP support with
this driver. Hence, the risk of breaking any functionality is low.
The previous attempt to fix this in the stmmac platform driver, by
supporting the Tegra234 PTP clock name, was rejected [0]. The preference
from the netdev maintainers is to fix this in the DT binding for
Tegra234.
This series fixes this by correcting the device-tree binding to align
with the generic name for the PTP clock. I understand that this is
breaking the ABI for this device, which we should never do, but this
is a last resort for getting this fixed. I am open to any better ideas
to fix this. Please note that we still maintain backward compatibility
in the driver to allow older device-trees to work, but we don't
advertise this via the binding, because I did not see any value in doing
so.
====================
Jon Hunter [Wed, 1 Apr 2026 10:29:40 +0000 (11:29 +0100)]
dt-bindings: net: Fix Tegra234 MGBE PTP clock
The PTP clock for the Tegra234 MGBE device is incorrectly named
'ptp-ref' and should be 'ptp_ref'. This is causing the following
warning to be observed on Tegra234 platforms that use this device:
Although this constitutes an ABI breakage in the binding for this
device, PTP support has clearly never worked and so fix this now
so we can correct the device-tree for this device. Note that the
MGBE driver still supports the legacy 'ptp-ref' clock name and so
older/existing device-trees will still work, but given that this
is not the correct name, there is no point to advertise this in the
binding.
Fixes: 189c2e5c7669 ("dt-bindings: net: Add Tegra234 MGBE") Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Link: https://patch.msgid.link/20260401102941.17466-3-jonathanh@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jon Hunter [Wed, 1 Apr 2026 10:29:39 +0000 (11:29 +0100)]
net: stmmac: Fix PTP ref clock for Tegra234
Since commit 030ce919e114 ("net: stmmac: make sure that ptp_rate is not
0 before configuring timestamping") was added, the following error is
observed on Tegra234:
It turns out that the Tegra234 device-tree binding defines the PTP ref
clock name as 'ptp-ref' and not 'ptp_ref'. The above commit now exposes
this, along with the fact that the PTP clock is not configured correctly.
In order to update device-tree to use the correct 'ptp_ref' name, update
the Tegra MGBE driver to use 'ptp_ref' by default and fall back to the
legacy 'ptp-ref' name if only that clock name is present.
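
The lookup order can be sketched as follows. This is a userspace
illustration only: struct sketch_clk and both helpers are hypothetical
and stand in for whatever clock lookup the driver actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table standing in for the clocks a device tree provides. */
struct sketch_clk {
	const char *name;
	unsigned long rate;
};

/* Look up a clock by name; returns NULL if it is not present. */
static const struct sketch_clk *sketch_clk_get(const struct sketch_clk *clks,
					       size_t n, const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < n; i++)
		if (strcmp(clks[i].name, name) == 0)
			return &clks[i];
	return NULL;
}

/* Prefer the generic 'ptp_ref' name, then fall back to legacy 'ptp-ref'. */
static const struct sketch_clk *sketch_get_ptp_ref(const struct sketch_clk *clks,
						   size_t n)
{
	const struct sketch_clk *clk = sketch_clk_get(clks, n, "ptp_ref");

	if (!clk)
		clk = sketch_clk_get(clks, n, "ptp-ref");
	return clk;
}
```

An older device tree that only provides 'ptp-ref' still resolves, which
is the backward compatibility the message refers to.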
Fixes: d8ca113724e7 ("net: stmmac: tegra: Add MGBE support") Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260401102941.17466-2-jonathanh@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
nfc: s3fwrn5: allocate rx skb before consuming bytes
s3fwrn82_uart_read() reports the number of accepted bytes to the serdev
core. The current code consumes bytes into recv_skb and may already
deliver a complete frame before allocating a fresh receive buffer.
If that alloc_skb() fails, the callback returns 0 even though it has
already consumed bytes, and it leaves recv_skb as NULL for the next
receive callback. That breaks the receive_buf() accounting contract and
can also lead to a NULL dereference on the next skb_put_u8().
Allocate the receive skb lazily before consuming the next byte instead.
If allocation fails, return the number of bytes already accepted.
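
A rough userspace model of the lazy-allocation pattern, under stated
assumptions: frames have a fixed hypothetical length, malloc() stands in
for alloc_skb(), and the fail_alloc field is a test hook that does not
exist in the driver. None of these names come from the s3fwrn5 code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_FRAME_LEN 4	/* hypothetical fixed frame size */

struct sketch_rx {
	unsigned char *buf;	/* current receive buffer, NULL until needed */
	size_t fill;		/* bytes stored in buf */
	size_t frames;		/* completed frames delivered so far */
	int fail_alloc;		/* test hook: force allocation failure */
};

static unsigned char *sketch_alloc(const struct sketch_rx *rx)
{
	return rx->fail_alloc ? NULL : malloc(SKETCH_FRAME_LEN);
}

/* Returns the number of bytes accepted, as a receive callback would. */
static size_t sketch_receive(struct sketch_rx *rx,
			     const unsigned char *data, size_t count)
{
	size_t i;

	assert(rx != NULL && data != NULL);
	for (i = 0; i < count; i++) {
		/* Allocate lazily, before data[i] is consumed. */
		if (!rx->buf) {
			rx->buf = sketch_alloc(rx);
			if (!rx->buf)
				return i;	/* bytes 0..i-1 were accepted */
			rx->fill = 0;
		}
		rx->buf[rx->fill++] = data[i];
		if (rx->fill == SKETCH_FRAME_LEN) {
			rx->frames++;	/* "deliver" the frame */
			free(rx->buf);	/* a driver would hand it off instead */
			rx->buf = NULL;
		}
	}
	return count;
}
```

Because the buffer is obtained before a byte is consumed, an allocation
failure can never leave consumed bytes unreported, and the next call
simply retries the allocation.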
AMD defines Extended Interrupt Local Vector Table (EILVT) registers to allow
for additional interrupt sources. While the APIC registers for those are
unique to AMD, the format of those registers follows the standard LVT
registers. Drop EILVT-specific macros in favor of the standard APIC
LVT macros.
Drop unused APIC_EILVT_NR_AMD_K8 and APIC_EILVT_LVTOFF while at it.
In configurations with multiple tunnel layers and MPLS lwtunnel routing, a
single tunnel hop can increment the counter beyond this limit. This causes
packets to be dropped with the "Dead loop on virtual device" message even
when a routing loop doesn't exist.
Increase IP_TUNNEL_RECURSION_LIMIT from 4 to 5 to handle this use-case.
====================
net: macb: Remove dedicated IRQ handler for WoL
During debugging of a suspend/resume issue, I observed that the macb driver
employs a dedicated IRQ handler for Wake-on-LAN (WoL) support. To my knowledge,
no other Ethernet driver adopts this approach. This implementation unnecessarily
complicates the suspend/resume process without providing any clear benefit.
Instead, we can easily modify the existing IRQ handler to manage WoL events,
avoiding any overhead in the TX/RX hot path.
The net throughput shows no significant difference following these changes.
The following data (net throughput and execution time of macb_interrupt) were
collected from my AMD ZynqMP board using the commands:
taskset -c 1,2,3 iperf3 -c 192.168.3.4 -t 60 -Z -P 3 -R
cat /sys/kernel/debug/tracing/trace_stat/function0
Kevin Hao [Thu, 2 Apr 2026 13:41:25 +0000 (21:41 +0800)]
net: macb: Remove dedicated IRQ handler for WoL
In the current implementation, the suspend/resume path frees the
existing IRQ handler and sets up a dedicated WoL IRQ handler, then
restores the original handler upon resume. This approach is not used
by any other Ethernet driver and unnecessarily complicates the
suspend/resume process. After adjusting the IRQ handler in the previous
patches, we can now handle WoL interrupts without introducing any
overhead in the TX/RX hot path. Therefore, the dedicated WoL IRQ
handler is removed.
I have verified WoL functionality on my AMD ZynqMP board using the
following steps:
root@amd-zynqmp:~# ifconfig end0 192.168.3.3
root@amd-zynqmp:~# ethtool -s end0 wol a
root@amd-zynqmp:~# echo mem >/sys/power/state
Kevin Hao [Thu, 2 Apr 2026 13:41:24 +0000 (21:41 +0800)]
net: macb: Factor out the handling of non-hot IRQ events into a separate function
In the current code, the IRQ handler checks each IRQ event sequentially.
Since most IRQ events are related to TX/RX operations, while other
events occur infrequently, this approach introduces unnecessary overhead
in the hot path for TX/RX processing. This patch reduces such overhead
by extracting the handling of all non-TX/RX events into a new function
and consolidating these events under a new flag. As a result, only a
single check is required to determine whether any non-TX/RX events have
occurred. If such events exist, the handler jumps to the new function.
This optimization reduces four conditional checks to one and prevents
the instruction cache from being polluted with rarely used code in the
hot path.
Kevin Hao [Thu, 2 Apr 2026 13:41:23 +0000 (21:41 +0800)]
net: macb: Introduce macb_queue_isr_clear() helper function
The current implementation includes several occurrences of the
following pattern:
if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
queue_writel(queue, ISR, value);
Introduce a helper function to consolidate these repeated code
segments. No functional changes are made.
Kevin Hao [Thu, 2 Apr 2026 13:41:22 +0000 (21:41 +0800)]
net: macb: Replace open-coded implementation with napi_schedule()
The driver currently duplicates the logic of napi_schedule() primarily
to include additional debug information. However, these debug details
are not essential for a specific driver and can be effectively obtained
through existing tracepoints in the networking core, such as
/sys/kernel/tracing/events/napi/napi_poll. Therefore, this patch
replaces the open-coded implementation with napi_schedule() to
simplify the driver's code.
Danilo Krummrich [Tue, 24 Mar 2026 00:59:14 +0000 (01:59 +0100)]
s390/ap: use generic driver_override infrastructure
When the AP masks are updated via apmask_store() or aqmask_store(),
ap_bus_revise_bindings() is called after ap_attr_mutex has been
released.
This calls __ap_revise_reserved(), which accesses the driver_override
field without holding any lock, racing against a concurrent
driver_override_store() that may free the old string, resulting in a
potential UAF.
Fix this by using the driver-core driver_override infrastructure, which
protects all accesses with an internal spinlock.
Note that unlike most other buses, the AP bus does not check
driver_override in its match() callback; the override is checked in
ap_device_probe() and __ap_revise_reserved() instead.
Also note that we do not enable the driver_override feature of struct
bus_type, as AP - in contrast to most other buses - passes "" to
sysfs_emit() when the driver_override pointer is NULL. Thus, printing
"\n" instead of "(null)\n".
Additionally, AP has a custom counter that is modified in the
corresponding custom driver_override_store().
Fixes: d38a87d7c064 ("s390/ap: Support driver_override for AP queue devices") Tested-by: Holger Dengler <dengler@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Link: https://patch.msgid.link/20260324005919.2408620-11-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Danilo Krummrich [Tue, 24 Mar 2026 00:59:13 +0000 (01:59 +0100)]
s390/cio: use generic driver_override infrastructure
When a driver is probed through __driver_attach(), the bus' match()
callback is called without the device lock held, thus accessing the
driver_override field without a lock, which can cause a UAF.
Fix this by using the driver-core driver_override infrastructure taking
care of proper locking internally.
Note that calling match() from __driver_attach() without the device lock
held is intentional. [1]
Danilo Krummrich [Tue, 24 Mar 2026 00:59:12 +0000 (01:59 +0100)]
vdpa: use generic driver_override infrastructure
When a driver is probed through __driver_attach(), the bus' match()
callback is called without the device lock held, thus accessing the
driver_override field without a lock, which can cause a UAF.
Fix this by using the driver-core driver_override infrastructure taking
care of proper locking internally.
Note that calling match() from __driver_attach() without the device lock
held is intentional. [1]
Yiqi Sun [Thu, 2 Apr 2026 07:04:19 +0000 (15:04 +0800)]
ipv4: icmp: fix null-ptr-deref in icmp_build_probe()
ipv6_stub->ipv6_dev_find() may return ERR_PTR(-EAFNOSUPPORT) when the
IPv6 stack is not active (CONFIG_IPV6=m and not loaded), and passing
this error pointer to dev_hold() will cause a kernel crash with
null-ptr-deref.
Instead, silently discard the request. RFC 8335 does not appear to
define a specific response for the case where an IPv6 interface
identifier is syntactically valid but the implementation cannot perform
the lookup at runtime, and silently dropping the request may be safer than
misreporting "No Such Interface".
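
Why a plain NULL check is not enough can be shown with a userspace
re-creation of the error-pointer idiom. The sketch_* helpers below only
mimic ERR_PTR()/IS_ERR(); the device structure, the lookup function and
the result enum are hypothetical and are not the icmp_build_probe()
code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno in a pointer, in the style of ERR_PTR(). */
static void *sketch_err_ptr(long err)
{
	assert(err < 0 && err >= -SKETCH_MAX_ERRNO);
	return (void *)(intptr_t)err;
}

/* True for pointers in the top SKETCH_MAX_ERRNO addresses, like IS_ERR(). */
static bool sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

struct sketch_dev {
	int refcnt;
};

enum sketch_result { SKETCH_DROP, SKETCH_NO_DEV, SKETCH_HELD };

/* Hypothetical lookup: fails with an error pointer if IPv6 is not loaded. */
static struct sketch_dev *sketch_dev_find(struct sketch_dev *dev, bool ipv6_loaded)
{
	if (!ipv6_loaded)
		return sketch_err_ptr(-EAFNOSUPPORT);
	return dev;
}

static enum sketch_result sketch_probe(struct sketch_dev *dev, bool ipv6_loaded)
{
	struct sketch_dev *found = sketch_dev_find(dev, ipv6_loaded);

	if (sketch_is_err(found))
		return SKETCH_DROP;	/* silently discard the request */
	if (!found)
		return SKETCH_NO_DEV;
	found->refcnt++;		/* only now is it safe to take a reference */
	return SKETCH_HELD;
}
```

An error pointer is non-NULL, so code that only tests for NULL would go
on to dereference it when taking the reference.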
Danilo Krummrich [Tue, 24 Mar 2026 00:59:10 +0000 (01:59 +0100)]
platform/wmi: use generic driver_override infrastructure
When a driver is probed through __driver_attach(), the bus' match()
callback is called without the device lock held, thus accessing the
driver_override field without a lock, which can cause a UAF.
Fix this by using the driver-core driver_override infrastructure taking
care of proper locking internally.
Note that calling match() from __driver_attach() without the device lock
held is intentional. [1]
Danilo Krummrich [Tue, 24 Mar 2026 00:59:09 +0000 (01:59 +0100)]
PCI: use generic driver_override infrastructure
When a driver is probed through __driver_attach(), the bus' match()
callback is called without the device lock held, thus accessing the
driver_override field without a lock, which can cause a UAF.
Fix this by using the driver-core driver_override infrastructure taking
care of proper locking internally.
Note that calling match() from __driver_attach() without the device lock
held is intentional. [1]
Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ Reported-by: Gui-Dong Han <hanguidong02@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789 Fixes: 782a985d7af2 ("PCI: Introduce new device binding path using pci_dev.driver_override") Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Alex Williamson <alex@shazbot.org> Tested-by: Gui-Dong Han <hanguidong02@gmail.com> Reviewed-by: Gui-Dong Han <hanguidong02@gmail.com> Link: https://patch.msgid.link/20260324005919.2408620-6-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Qingfang Deng [Thu, 2 Apr 2026 05:00:50 +0000 (13:00 +0800)]
ppp: update Kconfig help message
Both links of the PPPoE section are no longer valid, and the CVS version
is no longer relevant.
- Replace the TLDP URL with the pppd project homepage.
- Update pppd version requirement for PPPoE.
- Update RP-PPPoE project homepage, and clarify that it's only needed
for server mode.
ipv4: nexthop: allocate skb dynamically in rtm_get_nexthop()
When querying a nexthop object via RTM_GETNEXTHOP, the kernel currently
allocates a fixed-size skb using NLMSG_GOODSIZE. While sufficient for
single nexthops and small Equal-Cost Multi-Path groups, this fixed
allocation fails for large nexthop groups, such as one with 512 nexthops.
Fix this by allocating the size dynamically using nh_nlmsg_size() and
using nlmsg_new(), which is consistent with nexthop_notify() behavior. In
addition, adjust nh_nlmsg_size_grp() so it calculates the size needed
based on flags passed. While at it, also add the size of NHA_FDB for
nexthop group size calculation as it was missing too.
This cannot be reproduced via iproute2 as the group size is currently
limited and the command fails as follows:
addattr_l ERROR: message exceeded bound of 1048
Fixes: 430a049190de ("nexthop: Add support for nexthop groups") Reported-by: Yiming Qian <yimingqian591@gmail.com> Closes: https://lore.kernel.org/netdev/CAL_bE8Li2h4KO+AQFXW4S6Yb_u5X4oSKnkywW+LPFjuErhqELA@mail.gmail.com/ Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260402072613.25262-2-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ipv4: nexthop: avoid duplicate NHA_HW_STATS_ENABLE on nexthop group dump
Currently NHA_HW_STATS_ENABLE is included twice every time a dump of a
nexthop group is performed with NHA_OP_FLAG_DUMP_STATS. As all the stats
querying were moved to nla_put_nh_group_stats(), leave only that
instance of the attribute querying.
Fixes: 5072ae00aea4 ("net: nexthop: Expose nexthop group HW stats to user space") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260402072613.25262-1-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
driver core: make software nodes available earlier
Software nodes are currently initialized in a function registered as
a postcore_initcall(). However, some devices may want to register
software nodes earlier than that (or also in a postcore_initcall() where
they're at the mercy of the link order). Move the initialization to
driver_init(), making software nodes available much earlier and making
their initialization time deterministic.
Suggested-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20260402-nokia770-gpio-swnodes-v5-3-d730db3dd299@oss.qualcomm.com
[ Fix typo in the commit message: "s/merci/mercy/". - Danilo ] Signed-off-by: Danilo Krummrich <dakr@kernel.org>
net: qualcomm: qca_uart: report the consumed byte on RX skb allocation failure
qca_tty_receive() consumes each input byte before checking whether a
completed frame needs a fresh receive skb. When the current byte completes
a frame, the driver delivers that frame and then allocates a new skb for
the next one.
If that allocation fails, the current code returns i even though data[i]
has already been consumed and may already have completed the delivered
frame. Since serdev interprets the return value as the number of accepted
bytes, this under-reports progress by one byte and can replay the final
byte of the completed frame into a fresh parser state on the next call.
Return i + 1 in that failure path so the accepted-byte count matches the
actual receive-state progress.
Fixes: dfc768fbe618 ("net: qualcomm: add QCA7000 UART driver") Cc: stable@vger.kernel.org Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> Reviewed-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260402071207.4036-1-pengpeng@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Oleh Konko [Thu, 2 Apr 2026 09:48:57 +0000 (09:48 +0000)]
tipc: fix bc_ackers underflow on duplicate GRP_ACK_MSG
The GRP_ACK_MSG handler in tipc_group_proto_rcv() currently decrements
bc_ackers on every inbound group ACK, even when the same member has
already acknowledged the current broadcast round.
Because bc_ackers is a u16, a duplicate ACK received after the last
legitimate ACK wraps the counter to 65535. Once wrapped,
tipc_group_bc_cong() keeps reporting congestion and later group
broadcasts on the affected socket stay blocked until the group is
recreated.
Fix this by ignoring duplicate or stale ACKs before touching bc_acked or
bc_ackers. This makes repeated GRP_ACK_MSG handling idempotent and
prevents the underflow path.
Fixes: 2f487712b893 ("tipc: guarantee that group broadcast doesn't bypass group unicast") Cc: stable@vger.kernel.org Signed-off-by: Oleh Konko <security@1seal.org> Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/41a4833f368641218e444fdcff822039.security@1seal.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
rtnetlink: add missing netlink_ns_capable() check for peer netns
rtnl_newlink() lacks a CAP_NET_ADMIN capability check on the peer
network namespace when creating paired devices (veth, vxcan,
netkit). This allows an unprivileged user with a user namespace
to create interfaces in arbitrary network namespaces, including
init_net.
Add a netlink_ns_capable() check for CAP_NET_ADMIN in the peer
namespace before allowing device creation to proceed.
Fixes: 81adee47dfb6 ("net: Support specifying the network namespace upon device creation.") Signed-off-by: Nikolaos Gkarlis <nickgarlis@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260402181432.4126920-1-nickgarlis@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
selftests: drv-net: gro: more test cases
Add a few more test cases for GRO.
First 4 patches are unchanged from v1.
Patches 5 and 6 are new. Willem pointed out that the defines are
duplicated and all these imprecise defines have been annoying me
for a while so I decided to clean them up.
With the defines cleaned up and now more precise patch 7 (was 5)
no longer has to play any games with the MTU for ip6ip6.
The last patch now sends 3 segments as requested.
====================
Jakub Kicinski [Thu, 2 Apr 2026 20:59:59 +0000 (13:59 -0700)]
selftests: drv-net: gro: test ip6ip6
We explicitly test ipip encap. Let's add ip6ip6, too. Having
just ipip seems like favoring IPv4 which we should not do :)
Testing all combinations is left for future work, not sure
it's actually worth it.
Jakub Kicinski [Thu, 2 Apr 2026 20:59:58 +0000 (13:59 -0700)]
selftests: drv-net: gro: make large packet math more precise
When constructing the packets for large_* test cases we use
a static value for packet count and MSS. It works okay for
ipv4 vs ipv6 but the gap between ipv4 and ip6ip6 is going to
be quite significant.
Make the defines calculate the worst-case values; those
are only used for sizing stack arrays. Create helpers for
calculating precise values based on the exact test case.
Jakub Kicinski [Thu, 2 Apr 2026 20:59:57 +0000 (13:59 -0700)]
selftests: drv-net: gro: remove TOTAL_HDR_LEN
Willem points out TOTAL_HDR_LEN is identical to MAX_HDR_LEN.
This seems to have been the case ever since the test was added.
Replace the uses of TOTAL_HDR_LEN with MAX_HDR_LEN, MAX seems
more common for what this value is.
Jakub Kicinski [Thu, 2 Apr 2026 20:59:56 +0000 (13:59 -0700)]
selftests: drv-net: gro: prepare for ip6ip6 support
Try to use already calculated offsets and not depend on the ipip
flag as much. This patch should not change any functionality,
it's just a cleanup to make ip6ip6 support easier.
Jakub Kicinski [Thu, 2 Apr 2026 20:59:55 +0000 (13:59 -0700)]
selftests: drv-net: gro: always wait for FIN in the capacity test
The new capacity/order test exits as soon as it sees the expected
packet sequence. This may allow the "flushing" FIN packet to spill
over to the next test. Let's always wait for the FIN before exiting.
Jakub Kicinski [Thu, 2 Apr 2026 20:59:54 +0000 (13:59 -0700)]
selftests: drv-net: gro: add 1 byte payload test
Small IPv4 packets get padded to 60B, which may break or confuse
some buggy implementations. Add a test to coalesce a 1B payload.
Keep this separate from the lrg_sml test because I suspect some
implementations may not handle this case (treat padded frames
as ineligible for coalescing).
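
The padding arithmetic, as a small sketch. It assumes TCP over IPv4 with
no IP or TCP options and counts the Ethernet frame without the FCS; the
macro and function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

#define SK_ETH_HLEN	14	/* Ethernet header */
#define SK_IPV4_HLEN	20	/* IPv4 header, no options (assumption) */
#define SK_TCP_HLEN	20	/* TCP header, no options (assumption) */
#define SK_ETH_ZLEN	60	/* minimum Ethernet frame length without FCS */

/* Frame length before any padding is applied. */
static size_t sketch_frame_len(size_t payload)
{
	return SK_ETH_HLEN + SK_IPV4_HLEN + SK_TCP_HLEN + payload;
}

/* Number of padding bytes needed to reach the minimum frame length. */
static size_t sketch_pad_len(size_t payload)
{
	size_t len = sketch_frame_len(payload);

	assert(len >= SK_ETH_HLEN);
	return len < SK_ETH_ZLEN ? SK_ETH_ZLEN - len : 0;
}
```

Under these assumptions a 1B payload gives a 55B frame that is padded by
5B, and payloads of 6B or more need no padding.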
Jakub Kicinski [Thu, 2 Apr 2026 20:59:53 +0000 (13:59 -0700)]
selftests: drv-net: gro: add data burst test case
Add a test trying to induce a GRO context timeout followed
by another sequence of packets for the same flow. The second
burst arrives 100ms after the first one so any implementation
(SW or HW) must time out waiting at that point. We expect both
bursts to be aggregated successfully but separately.
Richard Zhu [Tue, 31 Mar 2026 08:52:52 +0000 (16:52 +0800)]
PCI: imx6: Keep Root Port MSI capability with iMSI-RX to work around hardware bug
On NXP i.MX7D, i.MX8MM, and i.MX8MQ chipsets, MSIs from the endpoints won't
be received by the iMSI-RX MSI controller if the Root Port MSI capability
is disabled.
Even though the Root Port MSIs won't be received by the iMSI-RX controller
due to design, these chipsets have some weird hardware bug that prevents
the endpoint MSIs from reaching it when the Root Port MSI capability is
disabled.
Hence, introduce a new flag, 'dw_pcie_rp::keep_rp_msi_en', set it for the
above mentioned SoCs, and always keep the Root Port MSI capability when
this flag is set.
Note that by keeping Root Port MSI capability, Root Port MSIs such as AER,
PME and others won't be received by default. So users need to use
workarounds such as passing 'pcie_pme=nomsi' cmdline param.
Fixes: f5cd8a929c825 ("PCI: dwc: Remove MSI/MSIX capability for Root Port if iMSI-RX is used as MSI controller") Suggested-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
[mani: commit log] Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: fix typos] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20260331085252.1243108-1-hongxing.zhu@nxp.com