Antti Laakso [Thu, 19 Mar 2026 15:50:30 +0000 (17:50 +0200)]
platform/x86: int3472: Match MSI laptop board name
Ensure MSI system is correct by checking board name too.
Signed-off-by: Antti Laakso <antti.laakso@linux.intel.com> Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Jani Nikula [Fri, 8 May 2026 11:12:08 +0000 (14:12 +0300)]
Documentation/gpu: add some tables of contents to large documents
Some of the GPU documentation pages are quite long, with various levels
of details. Add document internal tables of contents to the larger
documents to make them easier to navigate.
The index.rst in the sub-directories have toctrees, which provide
similar overviews.
Fix one missing newline at the end of drm-uapi.rst while at it,
primarily because rst should have it, and secondarily because my editor
rst mode refuses to save the file without it.
Jani Nikula [Fri, 8 May 2026 11:12:07 +0000 (14:12 +0300)]
Documentation/gpu: limit main toctree depth to 2
The main GPU documentation toctree has no limit to the toctree depth,
which means the main GPU index page recursively includes all the
headings in all of GPU documentation in the single table of
contents. This makes getting any kind of overview of the documentation
really difficult.
Limit the main toctree depth to 2 i.e. show at most two levels of
headings.
Hardik Prakash [Tue, 12 May 2026 07:31:38 +0000 (13:01 +0530)]
pinctrl-amd: enable IRQ for WACF2200 touchscreen on Lenovo Yoga 7 14AGP11
On Lenovo Yoga 7 14AGP11 (83TD), the WACF2200 touchscreen controller
is wired via I2C2 (AMDI0010:02) with its interrupt on GPIO pin 157
(confirmed via ACPI _CRS GpioInt decode). After amd_gpio_irq_init()
clears all GPIO interrupts at boot, pin 157 is never re-enabled,
preventing the touchscreen from signalling the driver.
Windows keeps GPIO 157 INTERRUPT_ENABLE (bit 11) and INTERRUPT_MASK
(bit 12) set after initialisation. Add a DMI quirk to restore these
bits after amd_gpio_irq_init() on this hardware.
Add support for the EIO GPIO controller found on
xa2ve3288 silicon.
The EIO GPIO block provides access to multiplexed I/O pins exposed
through the EIO interface. Only bank 0 and bank 1 are connected to
external MIO pins, with 26 GPIOs per bank (52 GPIOs total). This
change extends the Zynq GPIO driver to support the EIO GPIO
variant.
dt-bindings: gpio: Add EIO GPIO compatible to gpio-zynq
EIO (Extended IO) GPIO is a Xilinx IP block that exposes
multiplexed I/O pins through an EIO interface.
The EIO GPIO block has 2 banks with 26 GPIOs each (52 total).
The GPIO width cannot be determined from the hardware registers,
the driver relies on the compatible string to select the correct
bank/pin configuration. A new compatible is therefore required.
The block is currently present on xa2ve3288 silicon.
The compatible string uses version 1.0 matching the IP core version.
Lin He [Sat, 9 May 2026 03:23:02 +0000 (11:23 +0800)]
drm/hisilicon/hibmc: use clock to look up the PLL value
In the past, we use width and height to look up our PLL value.
But actually the actual clock check is also necessnary. There are
some resolutions that width and height same, but its clock different.
Add the clock check when using pll_table to determine the PLL value.
Fixes: da52605eea8f ("drm/hisilicon/hibmc: Add support for display engine") Signed-off-by: Lin He <helin52@huawei.com> Signed-off-by: Yongbang Shi <shiyongbang@huawei.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20260509032302.2057227-5-shiyongbang@huawei.com
Lin He [Sat, 9 May 2026 03:23:01 +0000 (11:23 +0800)]
drm/hisilicon/hibmc: move display contrl config to hibmc_probe()
If there's no VGA output, this encoder modeset won't be called, which
will cause displaying data from GPU being cut off. It's actually a
common display config for DP and VGA, so move the vdac encoder modeset
to driver load stage.
Removed invalid bit configurations from `hibmc_display_ctrl`
Fixes: 5294967f4ae4 ("drm/hisilicon/hibmc: Add support for VDAC") Signed-off-by: Lin He <helin52@huawei.com> Signed-off-by: Yongbang Shi <shiyongbang@huawei.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20260509032302.2057227-4-shiyongbang@huawei.com
Lin He [Sat, 9 May 2026 03:23:00 +0000 (11:23 +0800)]
drm/hisilicon/hibmc: fix no showing when no connectors connected
Our chip support KVM over IP feature, so hibmc driver need to support
displaying without any connectors plugged in. If no connectors are
connected, the vdac connector status should be set to 'connected' to
ensure proper KVM display functionality. Additionally, for
previous-generation products that may lack hardware link support and
thus cannot detect the monitor, the same approach should be applied
to ensure VGA display functionality.
* Add phys_state in the struct of dp and vdac to check physical outputs.
* The 'epoch_counter' of the vdac connector is incremented when the
physical status changes.
For get_modes: using BMC modes for connector if no display is attached to
phys VGA cable, otherwise use EDID modes by drm_connector_helper_get_modes,
because KVM doesn't provide EDID reads.
The polling mechanism for the KMS helper is enabled.
Fixes: 4c962bc929f1 ("drm/hisilicon/hibmc: Add vga connector detect functions") Reported-by: Thomas Zimmermann <tzimmermann@suse.de> Closes: https://lore.kernel.org/all/0eb5c509-2724-4c57-87ad-74e4270d5a5a@suse.de/ Signed-off-by: Lin He <helin52@huawei.com> Signed-off-by: Yongbang Shi <shiyongbang@huawei.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Tested-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20260509032302.2057227-3-shiyongbang@huawei.com
Lin He [Sat, 9 May 2026 03:22:59 +0000 (11:22 +0800)]
drm/hisilicon/hibmc: add updating link cap in DP detect()
In the past, the link cap is updated in link training at encoder enable
stage, but the hibmc_dp_mode_valid() is called before it, which will use
DP link's rate and lanes. So add the hibmc_dp_update_caps() in
hibmc_dp_update_caps() to avoid some potential risks.
Fixes: 607805abfb74 ("drm/hisilicon/hibmc: add dp mode valid check") Signed-off-by: Lin He <helin52@huawei.com> Signed-off-by: Yongbang Shi <shiyongbang@huawei.com> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20260509032302.2057227-2-shiyongbang@huawei.com
drm/xe: Refactor emit_xy_fast_copy and emit_mem_copy functions
To perform copy, based on whether the platform supports service copy
engines, either MEM_COPY or XY_FAST_COPY_BLT instruction is used.
Length of both the instructions is same today and so they use a common
define EMIT_COPY_DW.
This is not true for the future platforms. Implement separate functions
which return the length of the instruction to help in preparing for it.
Implement a function to return the length of the MEM_SET instruction.
This is to prepare for future platforms where the length of MEM_SET
instruction is expected to change.
Implement a function which returns the length of XY_FAST_COLOR_BLT
instruction instead of hardcoding it inside the emit_clear_main_copy.
In future platforms, the length of this instruction is expected to
change and this patch helps in preparing for it.
Shekhar Chauhan [Tue, 12 May 2026 05:55:08 +0000 (11:25 +0530)]
drm/xe/devcoredump: Drop a FIXME in devcoredump
The FIXME says that xe_engine_snapshot_print.. is accessing persistent
driver data, unlike what the FIXME says that it does. Drop the FIXME
since the current code is not going to access the hardware while
dumping.
More details about this patch:
https://patchwork.freedesktop.org/patch/703884/?series=161407&rev=1
The starting two feedbacks make sense and the original patch is wrong
in adding those changes, but the last feedback is the one which
highlights the point.
stddef: Document designated initializer semantics for __TRAILING_OVERLAP()
Document the designated initializer behavior for overlapping storage
between NAME and MEMBERS, and clarify the implications for static
initialization to help avoid unintended overwrites.
Ping-Ke Shih [Wed, 6 May 2026 13:10:00 +0000 (21:10 +0800)]
wifi: rtw89: check skb headroom before adding radiotap
The radiotap headroom is allocated only if IEEE80211_CONF_MONITOR is set.
However, it is potentially racing that SKB allocation without radiotap
headroom but adding radiotap from matched PPDU status of another SKB.
Add a check to avoid the case.
Ping-Ke Shih [Wed, 6 May 2026 13:09:59 +0000 (21:09 +0800)]
wifi: rtw89: phy: support PHY status IE-09 GEN2 for RTL8922D
The format of PHY status IE-10 for RTL8922D is different from earlier
chips. Fortunately only starting bit is different, but the layout is the
same. Get the VHT/HE SIG-A value by corresponding mask accordingly.
The IE-09 format of generation 0 and 1 are totally the same.
Ping-Ke Shih [Wed, 6 May 2026 13:09:58 +0000 (21:09 +0800)]
wifi: rtw89: phy: skip trailing 8-byte zeros of PHY status IE for RTL8922D
Hardware reports a list of PHY status IE. In monitor mode, IE-09 of
PHY status is enabled, and the report contains trailing 8-byte zeros,
causing failed to parse and drop all IE information.
The 8 zeros are recognize as IE type 0, but length of type 0 must be
not 8 (reference to rtw89_phy_gen_def::physt_ie_len[0]).
Check and skip them.
Ping-Ke Shih [Wed, 6 May 2026 13:09:54 +0000 (21:09 +0800)]
wifi: rtw89: fill HE-SU/HE-TB/HE-MU/HE-EXT_SU radiotap
Fill HE radiotap by PHY status IE-09/IE-10 which contains HE SIG-A/SIG-B
respectively.
The IE-10 may contain two content channels (if bandwidth is larger than
40MHz), and starting address of second content channel can be calculated by
length of first content channel containing up to 15 user fields with 8-byte
alignment.
Ping-Ke Shih [Wed, 6 May 2026 13:09:51 +0000 (21:09 +0800)]
wifi: rtw89: phy: enable IE-09/IE-10 PHY status report for monitor mode
The IE-09/IE-10 of PHY status contain SIG-A/SIG-B respectively, so enable
them in monitor mode to have rich information. If the parser detects
length invalid, ignore to reference IE-09/IE-10 to prevent accessing out
of range.
The RTL8922D is generation 2 of PHY status, which doesn't report SIG-B by
IE-10, so not enable it.
Every platform driver can be forced to match a device that doesn't match
its list of device IDs because of device_match_driver_override(), so
platform drivers that rely on the existence of a device's ACPI companion
object need to verify its presence.
Accordingly, add a requisite ACPI_COMPANION() check against NULL to the
wilco_ec event driver.
Fixes: 27d58498f690 ("platform/chrome: wilco_ec: event: Convert to a platform driver") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/2076666.usQuhbGJ8B@rafael.j.wysocki Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
Every platform driver can be forced to match a device that doesn't match
its list of device IDs because of device_match_driver_override(), so
platform drivers that rely on the existence of a device's ACPI companion
object need to verify its presence.
Accordingly, add a requisite ACPI_COMPANION() check against NULL to the
chromeos_tbmc driver.
Fixes: a2676ead257f ("platform/chrome: chromeos_tbmc: Convert to a platform driver") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/1875121.VLH7GnMWUR@rafael.j.wysocki Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
Every platform driver can be forced to match a device that doesn't match
its list of device IDs because of device_match_driver_override(), so
platform drivers that rely on the existence of a device's ACPI companion
object need to verify its presence.
Accordingly, add a requisite ACPI_COMPANION() check against NULL to the
chromeos_privacy_screen driver.
Fixes: d3c2872ae323 ("platform/chrome: Convert ChromeOS privacy-screen driver to platform") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/3357444.5fSG56mABF@rafael.j.wysocki Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
====================
dpll: rework fractional frequency offset reporting
Rework how the fractional frequency offset (FFO) is reported in
the DPLL subsystem.
Both fractional-frequency-offset (PPM) and
fractional-frequency-offset-ppt (PPT) attributes are now present at
the top level of a pin and inside each pin-parent-device nest. They
carry the same measurement at different precisions.
Introduce enum dpll_ffo_type and struct dpll_ffo_param to distinguish
FFO contexts: DPLL_FFO_PORT_RXTX_RATE for the RX vs TX symbol rate
offset at the top level, and DPLL_FFO_PIN_DEVICE for the pin vs
parent DPLL offset in the nest. Drivers declare which types they
support via the supported_ffo bitmask in dpll_pin_ops; the core only
calls ffo_get for opted-in types.
Patch 1 adds the type-safe FFO API, updates the YAML spec, netlink
handling, and documentation, and converts mlx5 and zl3073x drivers.
Patch 2 implements the nested FFO for zl3073x using the
dpll_df_offset_x register with ref_ofst=1, providing 2^-48
resolution. The old per-reference frequency measurement is removed
as it was redundant with measured-frequency.
====================
Ivan Vecera [Mon, 11 May 2026 15:58:16 +0000 (17:58 +0200)]
dpll: zl3073x: report FFO as DPLL vs input reference offset
Replace the per-reference frequency offset measurement (which was
redundant with measured-frequency) with a direct read of the DPLL's
delta frequency offset vs its tracked input reference.
The new implementation uses the dpll_df_offset_x register with
ref_ofst=1 via the dpll_df_read_x semaphore mechanism. This
provides 2^-48 resolution (~3.5 fE) and reports the actual
frequency difference between the DPLL and its active input.
Switch supported_ffo from DPLL_FFO_PORT_RXTX_RATE to
DPLL_FFO_PIN_DEVICE so FFO is reported only in the per-parent
context for the active input pin.
Use atomic64_t for freq_offset to prevent torn reads on 32-bit
architectures between the periodic worker and netlink callbacks.
Rewrite ffo_check to compare the cached df_offset converted to PPT
instead of using the old per-reference measurement. Remove the
ref_ffo_update periodic measurement and the ref ffo field since
they are no longer needed.
Changes v3 -> v4:
- Switch to DPLL_FFO_PIN_DEVICE, remove dpll=NULL guard
- Use atomic64_t for freq_offset (torn read on 32-bit)
Ivan Vecera [Mon, 11 May 2026 15:58:15 +0000 (17:58 +0200)]
dpll: add fractional frequency offset to pin-parent-device
Add both fractional-frequency-offset (PPM) and
fractional-frequency-offset-ppt (PPT) attributes to the
pin-parent-device nested attribute set, alongside the existing
top-level pin attributes. Both carry the same measurement at
different precisions.
Introduce enum dpll_ffo_type and struct dpll_ffo_param to
distinguish FFO contexts: DPLL_FFO_PORT_RXTX_RATE for the RX vs
TX symbol rate offset reported at the top level, and
DPLL_FFO_PIN_DEVICE for the pin vs parent DPLL offset reported
in the pin-parent-device nest.
Add a supported_ffo bitmask to struct dpll_pin_ops so drivers
declare which FFO types they support. The core only calls ffo_get
for types the driver has opted into, eliminating the need for
per-driver NULL pointer guards. Validate at pin registration time
that supported_ffo is not set without an ffo_get callback.
Update mlx5 (DPLL_FFO_PORT_RXTX_RATE) and zl3073x
(DPLL_FFO_PORT_RXTX_RATE) drivers to use the new API.
Add documentation for both FFO types to dpll.rst.
Changes v3 -> v4:
- Replace dpll=NULL overloading with enum dpll_ffo_type and
struct dpll_ffo_param (Jakub Kicinski)
- Add supported_ffo opt-in bitmask in dpll_pin_ops for fail-close
driver validation (Jakub Kicinski)
- Add WARN_ON in dpll_pin_register for supported_ffo without
ffo_get callback
net: ethernet: ravb: Do not check URAM suspension when WoL is active
When updating the driver to match latest datasheet to suspend access to
URAM when suspending DMA transfers a corner-case was missed, URAM access
will not be suspended if WoL is enabled. This lead to the error message
(correctly) being triggered as URAM access is not suspended even tho
it's requested as part of stopping DMA.
Avoid checking if URAM access is suspended and printing the error
message if WoL is enabled when we suspend the system, as we know it will
not be.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Closes: https://lore.kernel.org/all/CAMuHMdWnjV%3DHGE1o08zLhUfTgOSene5fYx1J5GG10mB%2BToq8qg@mail.gmail.com/ Fixes: 353d8e7989b6 ("net: ethernet: ravb: Suspend and resume the transmission flow") Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Sai Krishna <saikrishnag@marvell.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Chenguang Zhao [Mon, 11 May 2026 01:43:43 +0000 (09:43 +0800)]
ethtool: fix ethnl_bitmap32_not_zero() bit interval semantics
ethnl_bitmap32_not_zero() should return true if some bit in [start, end)
is set:
- Fix inverted memchr_inv() sense: return true when the scan finds a
non-zero byte, not when the middle words are all zero.
- Return false for an empty interval (end <= start).
- When end is 32-bit aligned, indices in [start, end) do not include any
bits from map[end_word]; return false after earlier checks found no
non-zero data.
Xiang Mei [Sun, 10 May 2026 22:26:40 +0000 (15:26 -0700)]
net/smc: avoid NULL deref of conn->lnk in smc_msg_event tracepoint
The smc_msg_event tracepoint class, shared by smc_tx_sendmsg and
smc_rx_recvmsg, unconditionally dereferences smc->conn.lnk:
__string(name, smc->conn.lnk->ibname)
conn->lnk is only set for SMC-R; for SMC-D it is NULL. Other code on
these paths already handles this (e.g. !conn->lnk in
SMC_STAT_RMB_TX_SIZE_SMALL()). With the tracepoint enabled, the first
sendmsg()/recvmsg() on an SMC-D socket crashes:
Oops: general protection fault, probably for non-canonical address
KASAN: null-ptr-deref in range [...]
RIP: 0010:strlen+0x1e/0xa0
Call Trace:
trace_event_raw_event_smc_msg_event (net/smc/smc_tracepoint.h:44)
smc_rx_recvmsg (net/smc/smc_rx.c:515)
smc_recvmsg (net/smc/af_smc.c:2859)
__sys_recvfrom (net/socket.c:2315)
__x64_sys_recvfrom (net/socket.c:2326)
do_syscall_64
The faulting address 0x3e0 is offsetof(struct smc_link, ibname),
confirming the NULL ->lnk deref. Enabling the tracepoint requires
root, but the trigger itself is unprivileged: socket(AF_SMC, ...) has
no capability check, and SMC-D negotiation needs no admin step on
s390 or on x86 with the loopback ISM device loaded.
Log an empty device name for SMC-D instead of dereferencing NULL.
Fixes: aff3083f10bf ("net/smc: Introduce tracepoints for tx and rx msg") Reported-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Xiang Mei <xmei5@asu.edu> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Sidraya Jayagond <sidraya@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nicolò Coccia [Sun, 10 May 2026 16:34:13 +0000 (12:34 -0400)]
net/smc: fix sleep-inside-lock in __smc_setsockopt() causing local DoS
A logic flaw in __smc_setsockopt() allows a local unprivileged user to
cause a Denial of Service (DoS) by holding the socket lock indefinitely.
The function __smc_setsockopt() calls copy_from_sockptr() while holding
lock_sock(sk). By passing a userfaultfd-monitored memory page (or
FUSE-backed memory on systems where unprivileged userfaultfd is disabled)
as the optval, an attacker can halt execution during the copy operation,
keeping the lock held.
Combined with asynchronous tear-down operations like shutdown(), this
exhausts the kernel wq (kworkers) and triggers the hung task watchdog.
[ 240.123456] INFO: task kworker/u8:2 blocked for more than 120 seconds.
[ 240.123489] Call Trace:
[ 240.123501] smc_shutdown+...
[ 240.123512] lock_sock_nested+...
This patch moves the user-space copy outside the lock_sock() critical
section to prevent the issue.
Fixes: a6a6fe27bab4 ("net/smc: Dynamic control handshake limitation by socket options") Signed-off-by: Nicolò Coccia <n.coccia96@gmail.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Tested-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: dsa: realtek: rtl8365mb: add support for RTL8367SB
Add chip info entry for the Realtek RTL8367SB switch. This device has
chip ID 0x6367 and version 0x0010. It exposes two external interfaces:
port 6 supports MII, TMII, RMII, RGMII, SGMII and HSGMII, while port 7
supports MII, TMII, RMII and RGMII. Use the existing 8365MB-VC jam table
for initialization.
net: Consistently define pci_device_ids using named initializers
... and PCI device helpers.
The various struct pci_device_id arrays were initialized mostly by one
the PCI_DEVICE macros and then list expressions. The latter isn't easily
readable if you're not into PCI. Using named initializers is more
explicit and thus easier to parse.
Also use PCI_DEVICE* helper macros to assign .vendor, .device,
.subvendor and .subdevice where appropriate and skip explicit
assignments of 0 (which the compiler takes care of).
The secret plan is to make struct pci_device_id::driver_data an
anonymous union (similar to
https://lore.kernel.org/all/cover.1776579304.git.u.kleine-koenig@baylibre.com/)
and that requires named initializers. But it's also a nice cleanup on
its own.
This change doesn't introduce changes to the compiled pci_device_id
arrays. Tested on x86 and arm64.
Reviewed-by: Jijie Shao <shaojijie@huawei.com> Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Petr Machata <petrm@nvidia.com> # for mlxsw Acked-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Forwarded: id:76da4f44d48bdde84580963862bf9616bee5c9e9.1778149923.git.u.kleine-koenig@baylibre.com (v2) Reviewed-by: Michael Grzeschik <mgr@kernel.org> Link: https://patch.msgid.link/20260511090023.1634387-6-u.kleine-koenig@baylibre.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: nfp: Drop PCI class entries with .class_mask = 0
With .class_mask being zero the value of .class doesn't matter because
to check if a pci_device_id entry matches a given device the expression
(id->class ^ dev->class) & id->class_mask
is checked for being zero (see pci_match_one_device()). So drop the
useless and irritating assignment for .class to match what (I think) all
other drivers are doing that don't need to match on .class, i.e. set
both members to zero.
Wei Yang [Sat, 9 May 2026 12:23:58 +0000 (20:23 +0800)]
net: atm: fix skb leak in sigd_send() default branch
The default branch in sigd_send() calls sock_put() and returns -EINVAL
without freeing the skb, while all other exit paths do so. Add the
missing dev_kfree_skb() before sock_put() to fix the leak.
net: phy: intel-xway: add PHY-level statistics via ethtool
Report PCS receive error counts for all supported PEF 7061, 7071, 7072 and
xRX200 PHYs.
Accumulate the vendor-specific PHY_ERRCNT read-clear counter
(SEL=RXERR) in .update_stats() and expose it as both IEEE 802.3
SymbolErrorDuringCarrier and generic rx_errors via
.get_phy_stats().
phy_remove() clears phydev->drv but doesn't call phy_detach(), so the
phy_device stays in the link topology xarray and ethnl_req_get_phydev()
still hands it back. ETHTOOL_MSG_PHY_GET then oopses on:
drvname is already treated as optional by phy_reply_size(),
phy_fill_reply() and phy_cleanup_data(), so just skip the allocation
when there is no driver bound.
Fixes: 9dd2ad5e92b9 ("net: ethtool: phy: Convert the PHY_GET command to generic phy dump") Cc: stable@vger.kernel.org # 6.13.x Signed-off-by: David Carlier <devnexen@gmail.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260509215046.107157-1-devnexen@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Willem de Bruijn [Mon, 11 May 2026 22:19:20 +0000 (18:19 -0400)]
selftests: drv-net: cope with slow env in so_txtime.py test
This test was converted from shell script to drv-net test.
The new version is flaky in dbg builds on the netdev.bots dashboard.
The previous shell script had more protections to avoid these. Added
in commit a7ee79b9c455 ("selftests: net: cope with slow env in
so_txtime.sh test").
Add the same overall protection:
- Suppress so_txtime process failure if KSFT_MACHINE_SLOW
Also relax two timeouts to reduce the number of process failures
themselves
- Increase SO_RCVTIMEO to 2 seconds
- Increase process start-up stabilization to 2 seconds
Delays were experimentally arrived at while running with vng
built with kernel/configs/debug.config
Fixes: 5c6baef3885c ("selftests: drv-net: convert so_txtime to drv-net") Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20260510174219.74aeee6d@kernel.org/ Signed-off-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260511222138.2045551-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Danilo Krummrich [Tue, 12 May 2026 23:49:18 +0000 (01:49 +0200)]
Merge patch series "rust: auxiliary: replace drvdata() with registration data"
Danilo Krummrich <dakr@kernel.org> says:
When drvdata() was introduced in commit 6f61a2637abe ("rust: device: introduce
Device::drvdata()"), its commit message already noted that a direct accessor to
the driver's bus device private data is not commonly required -- bus callbacks
provide access through &self, and other entry points (IRQs, workqueues, IOCTLs,
etc.) carry their own private data.
The sole motivation for drvdata() was inter-driver interaction, e.g. a parent
driver deriving its bus device private data from the child driver via the
auxiliary bus.
However, drvdata() exposes the driver's bus device private data beyond the
driver's own scope. This creates ordering constraints -- drvdata may not be set
yet when the first caller of drvdata() can appear -- and forces the driver's bus
device private data to outlive all registrations that access it; a requirement
that causes unnecessary complications.
Private data should be private to the entity that issues it; bus device private
data belongs to bus callbacks, class device private data to class callbacks, IRQ
private data to the IRQ handler, etc.
This series replaces drvdata() with a dedicated registration_data pointer on
struct auxiliary_device. The parent stores its private data explicitly during
registration; the data is private to the registration and lives as long as the
Registration object.
On teardown, Registration::drop() first triggers auxiliary_device_delete()
(unbinding the child), then frees the registration data. Ordering constraints
are structural -- the child's lifecycle is scoped to the registration by
construction, not by convention.
With no remaining use case for drvdata(), drvdata(), match_type_id(),
set_type_id() and struct driver_type are removed.
This is a prerequisite for [1], which builds on the removal of drvdata() to
enable Higher-Ranked Lifetime Types (HRT) for Rust device drivers.
Zoran Ilievski [Mon, 11 May 2026 06:40:02 +0000 (08:40 +0200)]
net: atlantic: preserve PCI wake-from-D3 on shutdown when WOL enabled
The shutdown handler aq_pci_shutdown() unconditionally calls
pci_wake_from_d3(pdev, false), clearing the PCI PME_En bit even when
wake-on-LAN has been configured. While aq_nic_shutdown() correctly
programs the NIC firmware via aq_nic_set_power() to listen for magic
packets, the PCI subsystem will not propagate the resulting PME wake
event from D3, so the system never wakes after poweroff.
WOL from suspend (S3) is unaffected because aq_suspend_common() does
not touch pci_wake_from_d3() and relies on the PM core's wake
configuration via device_may_wakeup().
This affects all atlantic-supported NICs (AQC107/108/111/112/113);
users have reported that WOL works if the atlantic driver is never
loaded, but breaks once it has run its shutdown path.
Pass the configured WOL state to pci_wake_from_d3() instead of a
literal false, so the PCI PME_En bit is preserved when the user has
armed WOL via ethtool.
Ashutosh Dixit [Thu, 30 Apr 2026 16:14:58 +0000 (09:14 -0700)]
drm/xe/oa: Add val arg to xe_oa_is_valid_config_reg
Add val arg to xe_oa_is_valid_config_reg so that register values can also
be verified, in addition to register address. Value verification is needed
to implement MERTOA Wa_14026779378.
Tejun Heo [Mon, 11 May 2026 23:18:23 +0000 (13:18 -1000)]
sched_ext: Defer sub_kset base put to scx_sched_free_rcu_work
scx_sub_enable_workfn() pins parent->kobj before dropping scx_sched_lock,
but that does not pin parent->sub_kset. Concurrent disable can
kset_unregister and free sub_kset before scx_alloc_and_add_sched()
dereferences it.
Split sub_kset teardown: kobject_del() at disable keeps sysfs removal; defer
kobject_put() to scx_sched_free_rcu_work so the memory survives. A racing
child sees state_in_sysfs=0 with valid memory, sysfs_create_dir() fails, and
the existing exit_kind gate in scx_link_sched() turns it away with -ENOENT.
Tejun Heo [Mon, 11 May 2026 23:18:19 +0000 (13:18 -1000)]
sched_ext: INIT_LIST_HEAD() &sch->all in scx_alloc_and_add_sched()
On scx_link_sched() error paths (parent disabled, hash insert failure),
&sch->all is never added to scx_sched_all. The cleanup path runs
scx_unlink_sched() unconditionally, which calls list_del_rcu(&sch->all) on a
list_head that was never initialized triggering a corruption warning.
Initialize &sch->all.
Fixes: 54be8de4236a ("sched_ext: Factor out scx_link_sched() and scx_unlink_sched()") Signed-off-by: Tejun Heo <tj@kernel.org>
Tejun Heo [Tue, 12 May 2026 20:30:00 +0000 (10:30 -1000)]
sched_ext: Drop NONE early return in scx_disable_and_exit_task()
d3e73a0808dd ("sched_ext: Handle SCX_TASK_NONE in disable/switched_from
paths") skipped the trailing scx_set_task_sched(p, NULL) on NONE tasks.
After scx_fail_parent() parks a task at NONE/sched=parent and the parent
is later freed via queue_rcu_work() during root_disable, the preserved
p->scx.sched dangles - print_scx_info() from sched_show_task() reads
sch->ops.name from freed memory.
Drop the early return. __scx_disable_and_exit_task() already short-
circuits on NONE and the SUB_INIT block was cleared by
scx_fail_parent()'s earlier call, so clearing p->scx.sched is the only
work left - and the one thing the path actually needs.
v2: Extend the SUB_INIT block comment to note that the flag is only
set on the sub-enable path, so it's always clear on the NONE
re-entry (Andrea).
Fixes: d3e73a0808dd ("sched_ext: Handle SCX_TASK_NONE in disable/switched_from paths") Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrea Righi <arighi@nvidia.com>
KVM: x86: Swap the dst and src operand for MOVNTDQA
Swap the MOVNTDQA operands, as MOVNTDQA does NOT in fact have "the same
characteristics as 0F E7 (MOVNTDQ)"; MOVNTDQA loads from memory and stores
to registers, while MOVNTDQ loads from registers and stores to memory.
Per the SDM:
MOVNTDQ - Move packed integer values in xmm1 to m128 using non-temporal
hint.
MOVNTDQA - Move double quadword from m128 to xmm1 using non-temporal hint
if WC memory type.
Reported-by: Josh Eads <josheads@google.com> Fixes: c57d9bafbd0b ("KVM: x86: Add support for emulating MOVNTDQA") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260506213514.2781948-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Sun, 3 May 2026 21:09:17 +0000 (23:09 +0200)]
KVM: x86: use again the flush argument of __link_shadow_page()
Except in the case of parentless nested-TDP pages, mmu_page_zap_pte()
clears the SPTE but leaves the invalid_list empty. In this case, using
kvm_flush_remote_tlbs() as kvm_mmu_remote_flush_or_zap() does is overkill.
Avoid flushing the entirety of the remote TLBs unless the invalid_list
was populated: instead, use a more efficient gfn-targeting flush (if
available) and skip it altogether if the caller guarantees that a TLB
flush is not necessary.
Based-on: <20260503201029.106481-1-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20260503210917.121840-1-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The ICE register region size was originally described incorrectly when
the ICE hardware was first introduced. The same value was later carried
over unchanged when the ICE node was split out from the UFS node into
its own DT entry.
Correct the register size to match the hardware specification.
The DSI PHY CXO clock input is the SoC CXO divided by two. DSI0 already
uses correct one, but DSI1 got copy-paste from SM8650. Wrong clock
parent will cause incorrect DSI1 PHY PLL frequencies to be used making
the DSI panel non-working, although there is no upstream user of DSI1.
Aaron Kling [Mon, 30 Mar 2026 21:50:20 +0000 (16:50 -0500)]
arm64: dts: qcom: sm8550: add cpu OPP table with DDR, LLCC & L3 bandwidths
Add the OPP tables for each CPU clusters (cpu0-1-2, cpu3-4-5-6 & cpu7)
to permit scaling the Last Level Cache Controller (LLCC), DDR and L3 cache
frequency by aggregating bandwidth requests of all CPU core with referenc
to the current OPP they are configured in by the LMH/EPSS hardware.
The effect is a proper caches & DDR frequency scaling when CPU cores
changes frequency.
The OPP tables were built using the downstream memlat ddr, llcc & l3
tables for each cluster types with the actual EPSS cpufreq LUT tables
from running a QCS8550 device.
Also add the OSC L3 Cache controller node.
Also add the interconnect entry for each cpu, with 3 different paths:
- CPU to Last Level Cache Controller (LLCC)
- Last Level Cache Controller (LLCC) to DDR
- L3 Cache from CPU to DDR interface
Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-HDK Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Signed-off-by: Aaron Kling <webgeek1234@gmail.com> Link: https://lore.kernel.org/r/20260330-sm8550-ddr-bw-scaling-v4-1-5020c06983a0@gmail.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Val Packett [Thu, 12 Mar 2026 00:53:37 +0000 (21:53 -0300)]
arm64: dts: qcom: x1-dell-thena: remove i2c20 (battery SMBus) and reserve its pins
i2c20 is used by the battmgr service on the ADSP to communicate with the
SBS interface of the battery. Initializing it from Linux would break the
battmgr functionality when booted in EL2. Mark those pins as reserved.
Fixes: e7733b42111c ("arm64: dts: qcom: Add support for Dell Inspiron 7441 / Latitude 7455") Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Reviewed-by: Abel Vesa <abel.vesa@oss.qualcomm.com> Signed-off-by: Val Packett <val@packett.cool> Link: https://lore.kernel.org/r/20260312005731.12488-2-val@packett.cool Signed-off-by: Bjorn Andersson <andersson@kernel.org>
KVM: selftests: Ensure gmem file sizes are multiple of host page size
When creating a guest_memfd file and associated memslot to validate shared
guest memory, size the file+memslot to the maximum of the host or guest
page size. Attempting to allocate a single guest page will fail if the
host page size is greater than the guest page size, as KVM requires that
the size of memslots and guest_memfd files are a multiple of the host page
size.
For simplicity, verify the entire file can be shared between guest and host,
e.g. instead of trying to validate "partial" mappings.
Fixes: 42188667be38 ("KVM: selftests: Add guest_memfd testcase to fault-in on !mmap()'d memory") Reported-by: Zenghui Yu <zenghui.yu@linux.dev> Closes: https://lore.kernel.org/all/0064952b-048c-455d-ad89-e27e5cb82591@linux.dev Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260512155634.772602-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 12 May 2026 20:19:20 +0000 (22:19 +0200)]
Merge tag 'kvmarm-fixes-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 7.1, take #2
- Add the pKVM side of the workaround for ARM's erratum 4193714, provided
that the EL3 firmware does its part of the job. KVM will refuse to
initialise otherwise.
- Correctly handle 52bit VAs for guest EL2 stage-1 translations when
running under NV with E2H==0.
- Correctly deal with permission faults in guest_memfd memslots.
- Fix the steal-time selftest after the infrastructure was reworked.
- Make sure the host cannot pass a non-sensical clock update to the
EL2 tracing infrastructure.
- Appoint Steffen Eiden as a reviewer in anticipation of the KVM/s390
ability to run arm64 guests, which will inevitably lead to arm64
code being directly used on s390.
- Make sure that EL2 is configured with both exception entry and exit
being Context Synchronization Events.
- Handle the current vcpu being NULL on EL2 panic.
- Fix the selftest_vcpu memcache being empty at the point of donation or
sharing.
- Check that the memcache has enough capacity before engaging on the
share/donate path.
- Fix __deactivate_fgt() to use its parameter rather than a variable
in the macro context.
KVM: nSVM: Never use L0's PAUSE loop exiting while L2 is running
Never use L0's (KVM's) PAUSE loop exiting controls while L2 is running,
and instead always configure vmcb02 according to L1's exact capabilities
and desires.
The purpose of intercepting PAUSE after N attempts is to detect when the
vCPU may be stuck waiting on a lock, so that KVM can schedule in a
different vCPU that may be holding said lock. Barring a very interesting
setup, L1 and L2 do not share locks, and it's extremely unlikely that an
L1 vCPU would hold a spinlock while running L2. I.e. having a vCPU
executing in L1 yield to a vCPU running in L2 will not allow the L1 vCPU
to make forward progress, and vice versa.
While teaching KVM's "on spin" logic to only yield to other vCPUs in L2 is
doable, in all likelihood it would do more harm than good for most setups.
KVM has limited visibility into which L2 "vCPUs" belong to the same VM,
and thus share a locking domain. And even if L2 vCPUs are in the same
VM, KVM has no visilibity into L2 vCPU's that are scheduled out by the
L1 hypervisor.
Furthermore, KVM doesn't actually steal PAUSE exits from L1. If L1 is
intercepting PAUSE, KVM will route PAUSE exits to L1, not L0, as
nested_svm_intercept() gives priority to the vmcb12 intercept. As such,
overriding the count/threshold fields in vmcb02 with vmcb01's values is
nonsensical, as doing so clobbers all the training/learning that has been
done in L1.
Even worse, if L1 is not intercepting PAUSE, i.e. KVM is handling PAUSE
exits, then KVM will adjust the PLE knobs based on L2 behavior, which could
very well be detrimental to L1, e.g. due to essentially poisoning L1 PLE
training with bad data.
And copying the count from vmcb02 to vmcb01 on a nested VM-Exit makes even
less sense, because again, the purpose of PLE is to detect spinning vCPUs.
Whether or not a vCPU is spinning in L2 at the time of a nested VM-Exit
has no relevance as to the behavior of the vCPU when it executes in L1.
The only scenarios where any of this actually works is if at least one
of KVM or L1 is NOT intercepting PAUSE for the guest. Per the original
changelog, those were the only scenarios considered to be supported.
Disabling KVM's use of PLE makes it so the VM is always in a "supported"
mode.
Last, but certainly not least, using KVM's count/threshold instead of the
values provided by L1 is a blatant violation of the SVM architecture.
Fixes: 74fd41ed16fd ("KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE") Cc: Maxim Levitsky <mlevitsk@redhat.com> Tested-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://patch.msgid.link/20260508213321.373309-1-seanjc@google.com/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Aaron Sacks [Tue, 12 May 2026 06:07:42 +0000 (02:07 -0400)]
KVM: Reject wrapped offset in kvm_reset_dirty_gfn()
kvm_reset_dirty_gfn() guards the gfn range with
if (!memslot || (offset + __fls(mask)) >= memslot->npages)
return;
but offset is u64 and the addition is unchecked. The check can be
silently bypassed by a u64 wrap.
The dirty ring backing those entries is MAP_SHARED at
KVM_DIRTY_LOG_PAGE_OFFSET of the vcpu fd, so the VMM can rewrite the
slot and offset fields of any entry between when the kernel pushes
them and when KVM_RESET_DIRTY_RINGS consumes them. On reset,
kvm_dirty_ring_reset() re-reads the values via READ_ONCE() and feeds
them straight back into this check; only the flags handshake is
treated as the handover, the slot/offset payload is taken on trust.
makes the coalescing loop in kvm_dirty_ring_reset() compute
delta = (s64)(0 - 0xffffffffffffffc1) = 63
which falls in [0, BITS_PER_LONG), so it folds entry[i+1] into the
existing mask by setting bit 63. The trailing kvm_reset_dirty_gfn()
call then sees offset = 0xffffffffffffffc1 and __fls(mask) = 63;
the sum is 0 in u64 and the bounds check passes.
That offset propagates into kvm_arch_mmu_enable_log_dirty_pt_masked()
unchanged. On the legacy MMU path -- kvm_memslots_have_rmaps() ==
true, i.e. shadow paging, any VM that has allocated shadow roots, or
a write-tracked slot -- it reaches gfn_to_rmap(), which indexes
slot->arch.rmap[0][] with a near-U64_MAX gfn. That is an
out-of-bounds load of a kvm_rmap_head, followed by a conditional
clear of PT_WRITABLE_MASK in whatever the loaded pointer points at.
The path is reachable from any process holding /dev/kvm.
Range-check offset on its own first, so the addition cannot wrap.
memslot->npages is bounded well below U64_MAX, so once offset <
npages holds, offset + __fls(mask) (with __fls(mask) < BITS_PER_LONG)
stays in range.
Sergio Correia [Tue, 12 May 2026 13:28:59 +0000 (14:28 +0100)]
audit: enforce AUDIT_LOCKED for AUDIT_TRIM and AUDIT_MAKE_EQUIV
AUDIT_ADD_RULE and AUDIT_DEL_RULE correctly check for AUDIT_LOCKED
and return -EPERM, but AUDIT_TRIM and AUDIT_MAKE_EQUIV do not. This
allows a process with CAP_AUDIT_CONTROL to modify directory tree
watches and equivalence mappings even when the audit configuration
has been locked, undermining the purpose of the lock.
Add AUDIT_LOCKED checks to both commands.
Cc: stable@vger.kernel.org Reviewed-by: Ricardo Robaina <rrobaina@redhat.com> Assisted-by: Claude:claude-opus-4-6 Signed-off-by: Sergio Correia <scorreia@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
Sergio Correia [Tue, 12 May 2026 13:28:33 +0000 (14:28 +0100)]
audit: fix incorrect inheritable capability in CAPSET records
__audit_log_capset() records the effective capability set into the
inheritable field due to a copy-paste error. Every CAPSET audit
record therefore reports cap_pi (process inheritable) with the value
of cap_effective instead of cap_inheritable.
This silently corrupts audit data used for compliance and forensic
analysis: an attacker who modifies inheritable capabilities to
prepare for a privilege-escalating exec would have the change masked
in the audit trail.
The bug has been present since the original introduction of CAPSET
audit records in 2008.
Cc: stable@vger.kernel.org Fixes: e68b75a027bb ("When the capset syscall is used it is not possible for audit to record the actual capbilities being added/removed. This patch adds a new record type which emits the target pid and the eff, inh, and perm cap sets.") Reviewed-by: Ricardo Robaina <rrobaina@redhat.com> Assisted-by: Claude:claude-opus-4-6 Signed-off-by: Sergio Correia <scorreia@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
Taniya Das [Mon, 11 May 2026 10:15:43 +0000 (15:45 +0530)]
arm64: dts: qcom: sm8750: Add camera clock controller
The camera clock controller is split into cambistmclk and camcc. The
cambist clock controller handles the mclks and the rest of the clocks of
camera are part of the camcc clock controller.
Add the camcc clock controller device node for SM8750 SoC.
thermal: core: Simplify unregistration of governors
Thermal governors are only unregistered in the thermal_init() error
path and they actually are only deleted from thermal_governor_list.
Put the entire code needed to do that to thermal_unregister_governors()
and rearrange thermal_init() to call that function also when
thermal_register_governors() returns an error.
This allows thermal_unregister_governor() to be dropped
and thermal_register_governor() that is only called by
thermal_register_governors() can be made static __init, so the
headers of these two functions can be dropped from thermal_core.h.
thermal: core: Remove dead code from two functions
Since thermal_register_governor() is only called by
thermal_register_governors() which runs before clearing
thermal_class_unavailable, thermal_tz_list is guaranteed to be
empty when it runs, so the code walking that list in it is
effectively dead.
The same observation applies to the code walking thermal_tz_list
in thermal_unregister_governor() which is also called only if
thermal_class_unavailable hasn't been cleared.
Remove the dead code from both functions mentioned above and rearrange
thermal_register_governor() to reduce indentation level.
====================
bpf: Extend BPF syscall with common attributes support
This patch series builds upon the discussion in
"[PATCH bpf-next v4 0/4] bpf: Improve error reporting for freplace attachment failure" [1].
This patch series introduces support for *common attributes* in the BPF
syscall, providing a unified mechanism for passing shared metadata across
all BPF commands, initially used by BPF_PROG_LOAD, BPF_BTF_LOAD, and
BPF_MAP_CREATE.
The initial set of common attributes includes:
1. 'log_buf': User-provided buffer for storing log output.
2. 'log_size': Size of the provided log buffer.
3. 'log_level': Verbosity level for logging.
4. 'log_true_size': Actual log size reported by kernel.
With this extension, the BPF syscall will be able to return meaningful
error messages (e.g., map creation failures), improving debuggability
and user experience.
Changes:
v13 -> v14:
* Replace __u64 with __aligned_u64 in struct bpf_common_attr in patch #1
(per bot+bpf-ci).
* Add a single line comment for preserving original error in patch #6
(per bot+bpf-ci and Alexei).
* Drop unused label in patch #6 (per bot+bpf-ci).
* v13: https://lore.kernel.org/bpf/20260511152817.89191-1-leon.hwang@linux.dev/
v12 -> v13:
* Rebase on bpf-next tree to resolve code conflict.
* Report log_true_size on success path in patch #6.
* Check size instead of common->log_buf in patch #6 (per bot+bpf-ci).
* v12: https://lore.kernel.org/bpf/20260420141804.27179-1-leon.hwang@linux.dev/
v11 -> v12:
* Drop "log_" prefix in struct bpf_log_attr in patch #3.
* Drop "log_" prefix in struct bpf_log_opts in patch #7.
* Copy log_true_size using copy_to_bpfptr_offset() in patch #3 (per Alexei).
* v11: https://lore.kernel.org/bpf/20260216150445.68278-1-leon.hwang@linux.dev/
v10 -> v11:
* Collect Acked-by from Andrii, thanks.
* Validate whether log_buf, log_size, and log_level are valid by reusing
bpf_verifier_log_attr_valid() in patch #4 (per Andrii).
* v10: https://lore.kernel.org/bpf/20260211151115.78013-1-leon.hwang@linux.dev/
v9 -> v10:
* Collect Acked-by from Andrii, thanks.
* Address comments from Andrii:
* Drop log NULL check in bpf_log_attr_finalize().
* Return -EFAULT early in bpf_log_attr_finalize().
* Validate whether log_buf, log_size, and log_level are set.
* Keep log_buf, log_size, log_level, and user-pointer log_true_size in struct
bpf_log_attr.
* Make prog_load and btf_load work with the new struct bpf_log_attr.
* Add comment to log_true_size of struct bpf_log_opts in libbpf.
* Address comment from Alexei:
* Avoid using BPF_LOG_FIXED as log_level in tests.
* v9: https://lore.kernel.org/bpf/20260202144046.30651-1-leon.hwang@linux.dev/
v8 -> v9:
* Rework reporting 'log_true_size' for prog_load, btf_load, and map_create to
simplify struct bpf_log_attr (per Alexei).
* v8: https://lore.kernel.org/bpf/20260126151409.52072-1-leon.hwang@linux.dev/
v7 -> v8:
* Return 0 when fd < 0 and errno != EFAULT in probe_sys_bpf_ext(), then simplify
probe_bpf_syscall_common_attrs() (per Alexei and Andrii).
* v7: https://lore.kernel.org/bpf/20260123032445.125259-1-leon.hwang@linux.dev/
v6 -> v7:
* Return -errno when fd < 0 and errno != EFAULT in probe_sys_bpf_ext().
* Convert return value of probe_sys_bpf_ext() to bool in
probe_bpf_syscall_common_attrs().
* Address comments from Andrii:
* Drop the comment, and handle fd >= 0 case explicitly in
probe_sys_bpf_ext().
* Return an error when fd >= 0 in probe_sys_bpf_ext().
* v6: https://lore.kernel.org/bpf/20260120152424.40766-1-leon.hwang@linux.dev/
v5 -> v6:
* Address comments from Andrii:
* Update some variables' name.
* Drop unnecessary 'close(fd)' in libbpf.
* Rename FEAT_EXTENDED_SYSCALL to FEAT_BPF_SYSCALL_COMMON_ATTRS with
updated description in libbpf.
* Use EINVAL instead of EUSERS, as EUSERS is not used in bpf yet.
* Rename struct bpf_syscall_common_attr_opts to bpf_log_opts in libbpf.
* Add 'OPTS_SET(log_opts, log_true_size, 0);' in libbpf's 'bpf_map_create()'.
* v5: https://lore.kernel.org/bpf/20260112145616.44195-1-leon.hwang@linux.dev/
v4 -> v5:
* Rework reporting 'log_true_size' for prog_load, btf_load, and map_create
(per Alexei).
* v4: https://lore.kernel.org/bpf/20260106172018.57757-1-leon.hwang@linux.dev/
RFC v3 -> v4:
* Drop RFC.
* Address comments from Andrii:
* Add parentheses in 'sys_bpf_ext()'.
* Avoid creating new fd in 'probe_sys_bpf_ext()'.
* Add a new struct to wrap log fields in libbpf.
* Address comments from Alexei:
* Do not skip writing to user space when log_true_size is zero.
* Do not use 'bool' arguments.
* Drop the adding WARN_ON_ONCE()'s.
* v3: https://lore.kernel.org/bpf/20251002154841.99348-1-leon.hwang@linux.dev/
RFC v2 -> RFC v3:
* Rename probe_sys_bpf_extended to probe_sys_bpf_ext.
* Refactor reporting 'log_true_size' for prog_load.
* Refactor reporting 'btf_log_true_size' for btf_load.
* Add warnings for internal bugs in map_create.
* Check log_true_size in test cases.
* Address comment from Alexei:
* Change kvzalloc/kvfree to kzalloc/kfree.
* Address comments from Andrii:
* Move BPF_COMMON_ATTRS to 'enum bpf_cmd' alongside brief comment.
* Add bpf_check_uarg_tail_zero() for extra checks.
* Rename sys_bpf_extended to sys_bpf_ext.
* Rename sys_bpf_fd_extended to sys_bpf_ext_fd.
* Probe the new feature using NULL and -EFAULT.
* Move probe_sys_bpf_ext to libbpf_internal.h and drop LIBBPF_API.
* Return -EUSERS when log attrs are conflict between bpf_attr and
bpf_common_attr.
* Avoid touching bpf_vlog_init().
* Update the reason messages in map_create.
* Finalize the log using __cleanup().
* Report log size to users.
* Change type of log_buf from '__u64' to 'const char *' and cast type
using ptr_to_u64() in bpf_map_create().
* Do not return -EOPNOTSUPP when kernel doesn't support this feature
in bpf_map_create().
* Add log_level support for map creation for consistency.
* Address comment from Eduard:
* Use common_attrs->log_level instead of BPF_LOG_FIXED.
* v2: https://lore.kernel.org/bpf/20250911163328.93490-1-leon.hwang@linux.dev/
RFC v1 -> RFC v2:
* Fix build error reported by test bot.
* Address comments from Alexei:
* Drop new uapi for freplace.
* Add common attributes support for prog_load and btf_load.
* Add common attributes support for map_create.
* v1: https://lore.kernel.org/bpf/20250728142346.95681-1-leon.hwang@linux.dev/
====================
Leon Hwang [Tue, 12 May 2026 15:31:56 +0000 (23:31 +0800)]
libbpf: Add syscall common attributes support for map_create
With the previous commit adding common attribute support for
BPF_MAP_CREATE, users can now retrieve detailed error messages when map
creation fails via the log_buf field.
Introduce struct bpf_log_opts with the following fields:
log_buf, log_size, log_level, and log_true_size.
Extend bpf_map_create_opts with a new field log_opts, allowing users to
capture and inspect log messages on map creation failures.
Leon Hwang [Tue, 12 May 2026 15:31:55 +0000 (23:31 +0800)]
bpf: Add syscall common attributes support for map_create
Many BPF_MAP_CREATE validation failures currently return -EINVAL without
any explanation to userspace.
Plumb common syscall log attributes into map_create(), create a verifier
log from bpf_common_attr::log_buf/log_size/log_level, and report
map-creation failure reasons through that buffer.
This improves debuggability by allowing userspace to inspect why map
creation failed and read back log_true_size from common attributes.
Leon Hwang [Tue, 12 May 2026 15:31:54 +0000 (23:31 +0800)]
bpf: Add syscall common attributes support for btf_load
BPF_BTF_LOAD can now take log parameters from both union bpf_attr and
struct bpf_common_attr, with the same merge rules as BPF_PROG_LOAD:
- if both sides provide a complete log tuple (buf/size/level) and they
match, use it;
- if only one side provides log parameters, use that one;
- if both sides provide complete tuples but they differ, return -EINVAL.
Leon Hwang [Tue, 12 May 2026 15:31:53 +0000 (23:31 +0800)]
bpf: Add syscall common attributes support for prog_load
BPF_PROG_LOAD can now take log parameters from both union bpf_attr and
struct bpf_common_attr. The merge rules are:
- if both sides provide a complete log tuple (buf/size/level) and they
match, use it;
- if only one side provides log parameters, use that one;
- if both sides provide complete tuples but they differ, return -EINVAL.
Leon Hwang [Tue, 12 May 2026 15:31:52 +0000 (23:31 +0800)]
bpf: Refactor reporting log_true_size for prog_load
The next commit will add support for reporting logs via extended common
attributes, including 'log_true_size'.
To prepare for that, refactor the 'log_true_size' reporting logic by
introducing a new struct bpf_log_attr to encapsulate log-related behavior:
* bpf_log_attr_init(): initialize log fields, which will support
extended common attributes in the next commit.
* bpf_log_attr_finalize(): handle log finalization and write back
'log_true_size' to userspace.
Leon Hwang [Tue, 12 May 2026 15:31:51 +0000 (23:31 +0800)]
libbpf: Add support for extended BPF syscall
To support the extended BPF syscall introduced in the previous commit,
introduce the following internal APIs:
* 'sys_bpf_ext()'
* 'sys_bpf_ext_fd()'
They wrap the raw 'syscall()' interface to support passing extended
attributes.
* 'probe_sys_bpf_ext()'
Check whether current kernel supports the BPF syscall common attributes.
Leon Hwang [Tue, 12 May 2026 15:31:50 +0000 (23:31 +0800)]
bpf: Extend BPF syscall with common attributes support
Add generic BPF syscall support for passing common attributes.
The initial set of common attributes includes:
1. 'log_buf': User-provided buffer for storing logs.
2. 'log_size': Size of the log buffer.
3. 'log_level': Log verbosity level.
4. 'log_true_size': Actual log size reported by kernel.
The common-attribute pointer and its size are passed as the 4th and 5th
syscall arguments. A new command bit, 'BPF_COMMON_ATTRS' ('1 << 16'),
indicates that common attributes are supplied.
This commit adds syscall and uapi plumbing. Command-specific handling is
added in follow-up patches.
Raphael Zimmer [Tue, 12 May 2026 16:16:40 +0000 (18:16 +0200)]
libceph: Fix potential null-ptr-deref in decode_choose_args()
A message of type CEPH_MSG_OSD_MAP contains an OSD map that itself
contains a CRUSH map. When decoding this CRUSH map in crush_decode(), an
array of max_buckets CRUSH buckets is decoded, where some indices may
not refer to actual buckets and are therefore set to NULL. The received
CRUSH map may optionally contain choose_args that get decoded in
decode_choose_args(). When decoding a crush_choose_arg_map, a series of
choose_args for different buckets is decoded, with the bucket_index
being read from the incoming message. It is only checked that the bucket
index does not exceed max_buckets, but not that it doesn't point to an
index with a NULL bucket. If a (potentially corrupted) message contains
a crush_choose_arg_map including such a bucket_index, a null pointer
dereference may occur in the subsequent processing when attempting to
access the bucket with the given index.
This patch fixes the issue by extending the affected check. Now, it is
only attempted to access the bucket if it is not NULL.
Raphael Zimmer [Tue, 12 May 2026 07:29:30 +0000 (09:29 +0200)]
libceph: handle rbtree insertion error in decode_choose_args()
A message of type CEPH_MSG_OSD_MAP contains an OSD map that itself
contains a CRUSH map. The received CRUSH map may optionally contain
choose_args that get decoded in decode_choose_args(). In this function,
num_choose_arg_maps is read from the message, and a corresponding number
of crush_choose_arg_maps gets decoded afterwards. Each
crush_choose_arg_map has a choose_args_index, which serves as the key
when inserting it into the choose_args rbtree of the decoded crush_map.
If a (potentially corrupted) message contains two crush_choose_arg_maps
with the same index, the assertion in insert_choose_arg_map() triggers a
kernel BUG when trying to insert the second crush_choose_arg_map.
This patch fixes the issue by switching to the non-asserting rbtree
insertion function and rejecting the message if the insertion fails.
hwmon: (asus_atk0110) Check ACPI_COMPANION() against NULL
Every platform driver can be forced to match a device that doesn't match
its list of device IDs because of device_match_driver_override(), so
platform drivers that rely on the existence of a device's ACPI companion
object need to verify its presence.
Accordingly, add a requisite ACPI_HANDLE() check against NULL to the
asus_atk0110 hwmon driver.
Fixes: ee1752590733 ("hwmon: (asus_atk0110) Convert ACPI driver to a platform one") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/2261594.irdbgypaU6@rafael.j.wysocki Signed-off-by: Guenter Roeck <linux@roeck-us.net>