Merge tag 'tegra-for-7.1-arm64-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into soc/defconfig
arm64: tegra: Default configuration changes for v7.1-rc1
Drop the various ARCH_TEGRA_*_SOC options from the default configurations
since they are now enabled by default for ARCH_TEGRA.
* tag 'tegra-for-7.1-arm64-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
arm64: tegra: defconfig: Drop redundant ARCH_TEGRA_foo_SOC
Merge tag 'tegra-for-7.1-arm-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into soc/defconfig
ARM: tegra: Default configuration changes for v7.1-rc1
Drop the various ARCH_TEGRA_*_SOC options from the default configurations
since they are now enabled by default for ARCH_TEGRA.
* tag 'tegra-for-7.1-arm-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
ARM: tegra: defconfig: Drop redundant ARCH_TEGRA_foo_SOC
soc/tegra: Make ARCH_TEGRA_SOC_FOO defaults for NVIDIA Tegra
The Talos EVK board does not include a camera sensor
by default. This DTSO has enabled the Arducam 12.3MP
IMX577 Mini Camera Module on the CSI-1 interface.
CSI-1 interface using mclk2 as the MCLK source on this board.
Reviewed-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org> Signed-off-by: Wenmeng Liu <wenmeng.liu@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260305-sm6150_evk-v6-5-38ce4360d5e0@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Mukesh Ojha [Tue, 27 Jan 2026 11:43:50 +0000 (17:13 +0530)]
arm64: dts: qcom: talos: Add EL2 overlay
All the existing variants Talos boards are using Gunyah hypervisor
which means that, so far, Linux-based OS could only boot in EL1 on
those devices. However, it is possible for us to boot Linux at EL2
on these devices [1].
When running under Gunyah, the remote processor firmware IOMMU streams
are controlled by Gunyah. However, without Gunyah, the IOMMU is managed
by the consumer of this DeviceTree. Therefore, describe the firmware
streams for each remote processor.
Add a EL2-specific DT overlay and apply it to Talos IOT variant
devices to create -el2.dtb for each of them alongside "normal" dtb.
Sudarshan Shetty [Tue, 31 Mar 2026 06:01:07 +0000 (11:31 +0530)]
arm64: dts: qcom: talos-evk: Add support for QCS615 talos evk board
Add the device tree for the QCS615-based Talos EVK platform. The
platform is composed of a System-on-Module following the SMARC
standard, and a Carrier Board.
The Carrier Board supports several display configurations, HDMI and
LVDS. Both configurations use the same base hardware, with the display
selection controlled by a DIP switch.
Use a DTBO file, talos-evk-lvds-auo,g133han01.dtso, which defines an
overlay that disables HDMI and adds LVDS. The DTs file talos-evk
can describe the HDMI display configurations.
According to the hardware design and vendor guidance, the WiFi PA
supplies VDD_PA_A and VDD_PA_B only need to be enabled at the same time
as asserting WLAN_EN.
On this platform, WiFi enablement is controlled via the WLAN_EN GPIO
(GPIO84), which also drives the VDD_PA_A and VDD_PA_B power enables.
Remove the VDD_PA_A and VDD_PA_B regulator nodes from the device tree
and rely on WLAN_EN to enable WiFi functionality.
Add talos-evk-usb1-peripheral.dtso overlay to enable USB0 peripheral
(EDL) mode. The base DTS will keep USB0 host-only due to hardware
routing through the EDL DIP switch, and the overlay switches the
configuration for device-mode operation.
The LVDS backlight hardware has been updated to use a simplified
design. The backlight enable signal is now permanently pulled up
to 3.3V and is no longer controlled via GPIO59.
Remove the GPIO59 based backlight configuration from the device
tree, as it is no longer routed to the LVDS interface.
The initial device tree includes support for:
- CPU and memory
- UART
- GPIOs
- Regulators
- PMIC
- Early console
- AT24MAC602 EEPROM
- MCP2515 SPI to CAN
- ADV7535 DSI-to-HDMI bridge
- DisplayPort interface
- SN65DSI84ZXHR DSI-to-LVDS bridge
- Wi-Fi/BT
Sudarshan Shetty [Tue, 31 Mar 2026 06:01:06 +0000 (11:31 +0530)]
arm64: dts: qcom: talos/qcs615-ride: Fix inconsistent USB PHY node naming
The USB PHY nodes has inconsistent labels as 'usb_1_hsphy'
and 'usb_hsphy_2' across talos.dtsi and qcs615-ride.dts.
This patch renames them to follow a consistent naming
scheme.
Merge tag 'omap-for-v7.1/defconfig-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-omap into soc/defconfig
arm: OMAP defconfig updates for v7.1
* tag 'omap-for-v7.1/defconfig-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-omap:
arm: multi_v7_defconfig: Enable more OMAP 3/4 related configs
ARM: multi_v7_defconfig: omap2plus_defconfig: Enable ITE IT66121 driver
Mark Brown [Fri, 6 Mar 2026 17:00:53 +0000 (17:00 +0000)]
arm64/sysreg: Update SMIDR_EL1 to DDI0601 2025-06
Update the definition of SMIDR_EL1 in the sysreg definition to reflect the
information in DD0601 2025-06. This includes somewhat more generic ways of
describing the sharing of SMCUs, more information on supported priorities
and provides additional resolution for describing affinity groups.
Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
====================
libbpf: clarify raw-address single kprobe attach behavior
Today libbpf documents single-kprobe attach through func_name, with an
optional offset. For the PMU-based path, func_name = NULL with an
absolute address in offset already works as well, but that is not
described in the API.
This patchset clarifies this behavior. First commit fixes kprobe
and uprobe attach error handling to use direct error codes. Next adds
kprobe API comments for the raw-address form and rejects it explicitly
for legacy tracefs/debugfs kprobes. Last adds PERF and LINK selftests
for the raw-address form, and checks that LEGACY rejects it.
---
Changes in v7:
- Change selftest line wrapping and assertions
Changes in v6:
- Split the kprobe/uprobe direct error-code fix into a separate patch
Changes in v5:
- Add kprobe API docs, use -EOPNOTSUPP, and switch selftests to LIBBPF_OPTS
Changes in v4:
- Inline raw-address error formatting and remove the probe_target buffer
Changes in v3:
- Drop bpf_kprobe_opts.addr and reuse offset when func_name is NULL
- Make legacy tracefs/debugfs kprobes reject the raw-address form
- Update selftests to cover PERF/LINK raw-address attach and LEGACY reject
Changes in v2:
- Fix line wrapping and indentation
====================
Hoyeon Lee [Wed, 1 Apr 2026 14:29:31 +0000 (23:29 +0900)]
selftests/bpf: Add test for raw-address single kprobe attach
Currently, attach_probe covers manual single-kprobe attaches by
func_name, but not the raw-address form that the PMU-based
single-kprobe path can accept.
This commit adds PERF and LINK raw-address coverage. It resolves
SYS_NANOSLEEP_KPROBE_NAME through kallsyms, passes the absolute address
in bpf_kprobe_opts.offset with func_name = NULL, and verifies that
kprobe and kretprobe are still triggered. It also verifies that LEGACY
rejects the same form.
Hoyeon Lee [Wed, 1 Apr 2026 14:29:30 +0000 (23:29 +0900)]
libbpf: Clarify raw-address single kprobe attach behavior
bpf_program__attach_kprobe_opts() documents single-kprobe attach
through func_name, with an optional offset. For the PMU-based path,
func_name = NULL with an absolute address in offset already works as
well, but that is not described in the API.
This commit clarifies this existing non-legacy behavior. For PMU-based
attach, callers can use func_name = NULL with an absolute address in
offset as the raw-address form. For legacy tracefs/debugfs kprobes,
reject this form explicitly.
Hoyeon Lee [Wed, 1 Apr 2026 14:29:29 +0000 (23:29 +0900)]
libbpf: Use direct error codes for kprobe/uprobe attach
perf_event_open_probe() and perf_event_{k,u}probe_open_legacy() helpers
are returning negative error codes directly on failure. This commit
changes bpf_program__attach_{k,u}probe_opts() to use those return
values directly instead of re-reading possibly changed errno.
Jean Delvare [Wed, 1 Apr 2026 10:23:23 +0000 (12:23 +0200)]
accel: ethosu: Add hardware dependency hint
The Ethos-U NPU is only available on ARM systems, so add a hardware
dependency hint to prevent this driver from being needlessly
included in kernels built for other architectures.
Align bpf_program__clone() with bpf_object_load_prog() by gating
BTF func/line info on FEAT_BTF_FUNC kernel support, and resolve
caller-provided prog_btf_fd before checking obj->btf so that callers
with their own BTF can use clone() even when the object has no BTF
loaded.
While at it, treat func_info and line_info fields as atomic groups
to prevent mismatches between pointer and count from different sources.
Ryan Roberts [Mon, 30 Mar 2026 16:17:04 +0000 (17:17 +0100)]
arm64: mm: Remove pmd_sect() and pud_sect()
The semantics of pXd_leaf() are very similar to pXd_sect(). The only
difference is that pXd_sect() only considers it a section if PTE_VALID
is set, whereas pXd_leaf() permits both "valid" and "present-invalid"
types.
Using pXd_sect() has caused issues now that large leaf entries can be
present-invalid since commit a166563e7ec37 ("arm64: mm: support large
block mapping when rodata=full"), so let's just remove the API and
standardize on pXd_leaf().
There are a few callsites of the form pXd_leaf(READ_ONCE(*pXdp)). This
was previously fine for the pXd_sect() macro because it only evaluated
its argument once. But pXd_leaf() evaluates its argument multiple times.
So let's avoid unintended side effects by reimplementing pXd_leaf() as
an inline function.
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Ryan Roberts [Mon, 30 Mar 2026 16:17:03 +0000 (17:17 +0100)]
arm64: mm: Handle invalid large leaf mappings correctly
It has been possible for a long time to mark ptes in the linear map as
invalid. This is done for secretmem, kfence, realm dma memory un/share,
and others, by simply clearing the PTE_VALID bit. But until commit a166563e7ec37 ("arm64: mm: support large block mapping when
rodata=full") large leaf mappings were never made invalid in this way.
It turns out various parts of the code base are not equipped to handle
invalid large leaf mappings (in the way they are currently encoded) and
I've observed a kernel panic while booting a realm guest on a
BBML2_NOABORT system as a result:
This panic occurs because the swiotlb memory was previously shared to
the host (__set_memory_enc_dec()), which involves transitioning the
(large) leaf mappings to invalid, sharing to the host, then marking the
mappings valid again. But pageattr_p[mu]d_entry() would only update the
entry if it is a section mapping, since otherwise it concluded it must
be a table entry so shouldn't be modified. But p[mu]d_sect() only
returns true if the entry is valid. So the result was that the large
leaf entry was made invalid in the first pass then ignored in the second
pass. It remains invalid until the above code tries to access it and
blows up.
The simple fix would be to update pageattr_pmd_entry() to use
!pmd_table() instead of pmd_sect(). That would solve this problem.
But the ptdump code also suffers from a similar issue. It checks
pmd_leaf() and doesn't call into the arch-specific note_page() machinery
if it returns false. As a result of this, ptdump wasn't even able to
show the invalid large leaf mappings; it looked like they were valid
which made this super fun to debug. the ptdump code is core-mm and
pmd_table() is arm64-specific so we can't use the same trick to solve
that.
But we already support the concept of "present-invalid" for user space
entries. And even better, pmd_leaf() will return true for a leaf mapping
that is marked present-invalid. So let's just use that encoding for
present-invalid kernel mappings too. Then we can use pmd_leaf() where we
previously used pmd_sect() and everything is magically fixed.
Additionally, from inspection kernel_page_present() was broken in a
similar way, so I'm also updating that to use pmd_leaf().
The transitional page tables component was also similarly broken; it
creates a copy of the kernel page tables, making RO leaf mappings RW in
the process. It also makes invalid (but-not-none) pte mappings valid.
But it was not doing this for large leaf mappings. This could have
resulted in crashes at kexec- or hibernate-time. This code is fixed to
flip "present-invalid" mappings back to "present-valid" at all levels.
Finally, I have hardened split_pmd()/split_pud() so that if it is passed
a "present-invalid" leaf, it will maintain that property in the split
leaves, since I wasn't able to convince myself that it would only ever
be called for "present-valid" leaves.
Fixes: a166563e7ec3 ("arm64: mm: support large block mapping when rodata=full") Cc: stable@vger.kernel.org Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Ryan Roberts [Mon, 30 Mar 2026 16:17:02 +0000 (17:17 +0100)]
arm64: mm: Fix rodata=full block mapping support for realm guests
Commit a166563e7ec37 ("arm64: mm: support large block mapping when
rodata=full") enabled the linear map to be mapped by block/cont while
still allowing granular permission changes on BBML2_NOABORT systems by
lazily splitting the live mappings. This mechanism was intended to be
usable by realm guests since they need to dynamically share dma buffers
with the host by "decrypting" them - which for Arm CCA, means marking
them as shared in the page tables.
However, it turns out that the mechanism was failing for realm guests
because realms need to share their dma buffers (via
__set_memory_enc_dec()) much earlier during boot than
split_kernel_leaf_mapping() was able to handle. The report linked below
showed that GIC's ITS was one such user. But during the investigation I
found other callsites that could not meet the
split_kernel_leaf_mapping() constraints.
The problem is that we block map the linear map based on the boot CPU
supporting BBML2_NOABORT, then check that all the other CPUs support it
too when finalizing the caps. If they don't, then we stop_machine() and
split to ptes. For safety, split_kernel_leaf_mapping() previously
wouldn't permit splitting until after the caps were finalized. That
ensured that if any secondary cpus were running that didn't support
BBML2_NOABORT, we wouldn't risk breaking them.
I've fix this problem by reducing the black-out window where we refuse
to split; there are now 2 windows. The first is from T0 until the page
allocator is inititialized. Splitting allocates memory for the page
allocator so it must be in use. The second covers the period between
starting to online the secondary cpus until the system caps are
finalized (this is a very small window).
All of the problematic callers are calling __set_memory_enc_dec() before
the secondary cpus come online, so this solves the problem. However, one
of these callers, swiotlb_update_mem_attributes(), was trying to split
before the page allocator was initialized. So I have moved this call
from arch_mm_preinit() to mem_init(), which solves the ordering issue.
I've added warnings and return an error if any attempt is made to split
in the black-out windows.
Note there are other issues which prevent booting all the way to user
space, which will be fixed in subsequent patches.
Reported-by: Jinjiang Tu <tujinjiang@huawei.com> Closes: https://lore.kernel.org/all/0b2a4ae5-fc51-4d77-b177-b2e9db74f11d@huawei.com/ Fixes: a166563e7ec3 ("arm64: mm: support large block mapping when rodata=full") Cc: stable@vger.kernel.org Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Nicholas Carlini [Tue, 31 Mar 2026 13:25:32 +0000 (15:25 +0200)]
eventpoll: defer struct eventpoll free to RCU grace period
In certain situations, ep_free() in eventpoll.c will kfree the epi->ep
eventpoll struct while it still being used by another concurrent thread.
Defer the kfree() to an RCU callback to prevent UAF.
Fixes: f2e467a48287 ("eventpoll: Fix semi-unbounded recursion") Signed-off-by: Nicholas Carlini <nicholas@carlini.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
Karol Wachowski [Thu, 2 Apr 2026 12:55:26 +0000 (14:55 +0200)]
accel/ivpu: Trigger recovery on TDR with OS scheduling
With OS scheduling mode the driver cannot determine which context
caused the timeout, so context abort cannot be used. Instead of
queuing context_abort_work, directly trigger full device recovery
when a job timeout (TDR) occurs in OS scheduling mode.
Fixes: ade00a6c903f ("accel/ivpu: Perform engine reset instead of device recovery on TDR") Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com> Link: https://patch.msgid.link/20260402125526.845210-1-karol.wachowski@linux.intel.com
Changwoo Min [Thu, 2 Apr 2026 02:31:50 +0000 (11:31 +0900)]
sched_ext: Fix is_bpf_migration_disabled() false negative on non-PREEMPT_RCU
Since commit 8e4f0b1ebcf2 ("bpf: use rcu_read_lock_dont_migrate() for
trampoline.c"), the BPF prolog (__bpf_prog_enter) calls migrate_disable()
only when CONFIG_PREEMPT_RCU is enabled, via rcu_read_lock_dont_migrate().
Without CONFIG_PREEMPT_RCU, the prolog never touches migration_disabled,
so migration_disabled == 1 always means the task is truly
migration-disabled regardless of whether it is the current task.
The old unconditional p == current check was a false negative in this
case, potentially allowing a migration-disabled task to be dispatched to
a remote CPU and triggering scx_error in task_can_run_on_remote_rq().
Only apply the p == current disambiguation when CONFIG_PREEMPT_RCU is
enabled, where the ambiguity with the BPF prolog still exists.
Fixes: 8e4f0b1ebcf2 ("bpf: use rcu_read_lock_dont_migrate() for trampoline.c") Cc: stable@vger.kernel.org # v6.18+ Link: https://lore.kernel.org/lkml/20250821090609.42508-8-dongml2@chinatelecom.cn/ Signed-off-by: Changwoo Min <changwoo@igalia.com> Reviewed-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
Ionut Nechita [Mon, 23 Mar 2026 21:13:43 +0000 (23:13 +0200)]
drm/amd/display: Wire up dcn10_dio_construct() for all pre-DCN401 generations
Description:
- Commit b82f0759346617b2 ("drm/amd/display: Migrate DIO registers access
from hwseq to dio component") moved DIO_MEM_PWR_CTRL register access
behind the new dio abstraction layer but only created the dio object for
DCN 4.01. On all other generations (DCN 10/20/21/201/30/301/302/303/
31/314/315/316/32/321/35/351/36), the dio pointer is NULL, causing the
register write to be silently skipped.
This results in AFMT HDMI memory not being powered on during init_hw,
which can cause HDMI audio failures and display issues on affected
hardware including Renoir/Cezanne (DCN 2.1) APUs that use dcn10_init_hw.
Call dcn10_dio_construct() in each older DCN generation's resource.c
to create the dio object, following the same pattern as DCN 4.01. This
ensures the dio pointer is non-NULL and the mem_pwr_ctrl callback works
through the dio abstraction for all DCN generations.
Fixes: b82f07593466 ("drm/amd/display: Migrate DIO registers access from hwseq to dio component.") Reviewed-by: Ivan Lipski <ivan.lipski@amd.com> Signed-off-by: Ionut Nechita <ionut_n2001@yahoo.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
sched_ext: Fix missing warning in scx_set_task_state() default case
In scx_set_task_state(), the default case was setting the
warn flag, but then returning immediately. This is problematic
because the only purpose of the warn flag is to trigger
WARN_ONCE, but the early return prevented it from ever firing,
leaving invalid task states undetected and untraced.
To fix this, a WARN_ONCE call is now added directly in the
default case.
The fix addresses two aspects:
- Guarantees the invalid task states are properly logged
and traced.
- Provides a distinct warning message
("sched_ext: Invalid task state") specifically for
states outside the defined scx_task_state enum values,
making it easier to distinguish from other transition
warnings.
This ensures proper detection and reporting of invalid states.
Signed-off-by: Samuele Mariotti <smariotti@disroot.org> Signed-off-by: Paolo Valente <paolo.valente@unimore.it> Reviewed-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
ata: libata-transport: use static struct ata_transport_internal to simplify match functions
Both matching functions can make use of static struct
ata_transport_internal. This eliminates the dependency on static
variable ata_scsi_transport_template, and it allows to remove helper
to_ata_internal(). Small drawback is that a forward declaration of
both functions is needed.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Niklas Cassel <cassel@kernel.org>
Miklos Szeredi [Thu, 12 Mar 2026 19:30:08 +0000 (20:30 +0100)]
fuse: support FSCONFIG_SET_FD for "fd" option
This is not only cleaner to use in userspace (no need to sprintf the fd to
a string) but also allows userspace to detect that the devfd can be closed
after the fsconfig call.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Miklos Szeredi [Thu, 12 Mar 2026 11:19:10 +0000 (12:19 +0100)]
fuse: clean up device cloning
- fuse_mutex is not needed for device cloning, because fuse_dev_install()
uses cmpxcg() to set fud->fc, which prevents races between clone/mount
or clone/clone. This makes the logic simpler
- Drop fc->dev_count. This is only used to check in release if the device
is the last clone, but checking list_empty(&fc->devices) is equivalent
after removing the released device from the list. Removing the fuse_dev
before calling fuse_abort_conn() is okay, since the processing and io
lists are now empty for this device.
Both functions are helpers which are used only once. So remove them and
merge their code into libata_transport_init() and libata_transport_exit()
respectively.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Niklas Cassel <cassel@kernel.org>
Miklos Szeredi [Wed, 11 Mar 2026 20:02:41 +0000 (21:02 +0100)]
fuse: create fuse_dev on /dev/fuse open instead of mount
Allocate struct fuse_dev when opening the device. This means that unlike
before, ->private_data is always set to a valid pointer.
The use of USE_DEV_SYNC_INIT magic pointer for the private_data is now
replaced with a simple bool sync_init member.
If sync INIT is not set, I/O on the device returns error before mount.
Keep this behavior by checking for the ->fc member. If fud->fc is set, the
mount has succeeded. Testing this used READ_ONCE(file->private_data) and
smp_mb() to try and provide the necessary semantics. Switch this to
smp_store_release() and smp_load_acquire().
Setting fud->fc is protected by fuse_mutex, this is unchanged.
Will need this later so the /dev/fuse open file reference is not held
during FSCONFIG_CMD_CREATE.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
fuse: fuse_dev_ioctl_clone() should wait for device file to be initialized
Use fuse_get_dev() not __fuse_get_dev() on the old fd, since in the case of
synchronous INIT the caller will want to wait for the device file to be
available for cloning, just like I/O wants to wait instead of returning an
error.
This is a more refined version of the previous patch series fixing
and cleaning up the TSO code.
I'm not sure whether "TSO" or "GSO" should be used to describe this
feature - although it primarily handles TCP, dwmac4 appears to also
be able to handle UDP.
In essence, this series adds a .ndo_features_check() method to handle
whether TSO/GSO can be used for a particular skbuff - checking which
queue the skbuff is destined for and whether that has TBS available
which precludes TSO being enabled on that channel.
I'm also adding a check that the header is smaller than 1024 bytes,
as documented in those sources which have TSO support - this is due
to the hardware buffering the header in "TSO memory" which I guess
is limited to 1KiB. I expect this test never to trigger, but if
the headers ever exceed that size, the hardware will likely fail.
While IPv4 headers are unlikely to be anywhere near this, there is
nothing in the protocol which prevents IPv6 headers up to 64KiB.
As we now have a .ndo_features_check() method, I'm moving the VLAN
insertion for TSO packets into core code by unpublishing the VLAN
insertion features when we use TSO. Another move is for checksumming,
which is required for TSO, but stmmac's requirements for offloading
checksums are more strict - and this seems to be a bug in the TSO
path.
I've changed the hardware initialisation to always enable TSO support
on the channels even if the user requests TSO/GSO to be disabled -
this fixes another issue as pointed out by Jakub in a previous review.
I'm moving the setup of the GSO features, cleaning those up, and
adding a warning if platform glue requests this to be enabled but the
hardware has no support. Hopefully this will never trigger if everyone
got the STMMAC_FLAG_TSO_EN flag correct. Also adding a check for TxPBL
value.
Finally, moving the "TSO supported" message to the new
stmmac_set_gso_features() function so keep all this TSO stuff together.
====================
Documentation states that TxPBL must be >= 4 to allow TSO support, but
the driver doesn't check this. TxPBL comes from the platform glue code
or DT. Add a check with a warning if platform glue code attempts to
enable TSO support with TxPBL too low.
net: stmmac: simplify GSO/TSO test in stmmac_xmit()
The test in stmmac_xmit() to see whether we should pass the skbuff to
stmmac_tso_xmit() is more complex than it needs to be. This test can
be simplified by storing the mask of GSO types that we will pass, and
setting it according to the enabled features.
Note that "tso" is a mis-nomer since commit b776620651a1 ("net:
stmmac: Implement UDP Segmentation Offload"). Also note that this
commit controls both via the TSO feature. We preserve this behaviour
in this commit.
Also, this commit unconditionally accessed skb_shinfo(skb)->gso_type
for all frames, even when skb_is_gso() was false. This access is
eliminated.
net: stmmac: move check for hardware checksum supported
Add a check in .ndo_features_check() to indicate whether hardware
checksum can be performed on the skbuff. Where hardware checksum is
not supported - either because the channel does not support Tx COE
or the skb isn't suitable (stmmac uses a tighter test than
can_checksum_protocol()) we also need to disable TSO, which will be
done by harmonize_features() in net/core/dev.c
This fixes a bug where a channel which has COE disabled may still
receive TSO skbuffs.
net: stmmac: move TSO VLAN tag insertion to core code
stmmac_tso_xmit() checks whether the skbuff is trying to offload
vlan tag insertion to hardware, which from the comment in the code
appears to be buggy when the TSO feature is used.
Rather than stmmac_tso_xmit() inserting the VLAN tag, handle this
in stmmac_features_check() which will then use core net code to
handle this. See net/core/dev.c::validate_xmit_skb()
net: stmmac: fix TSO support when some channels have TBS available
According to the STM32MP25xx manual, which is dwmac v5.3, TBS (time
based scheduling) is not permitted for channels which have hardware
TSO enabled. Intel's commit 5e6038b88a57 ("net: stmmac: fix TSO and
TBS feature enabling during driver open") concurs with this, but it
is incomplete.
This commit avoids enabling TSO support on the channels which have
TBS available, which, as far as the hardware is concerned, means we
do not set the TSE bit in the DMA channel's transmit control register.
However, the net device's features apply to all queues(channels), which
means these channels may still be handed TSO skbs to transmit, and the
driver will pass them to stmmac_tso_xmit(). This will generate the
descriptors for TSO, even though the channel has the TSE bit clear.
Fix this by checking whether the queue(channel) has TBS available,
and if it does, fall back to software GSO support.
netdev features documentation requires that .ndo_fix_features() is
stateless: it shouldn't modify driver state. Yet, stmmac_fix_features()
does exactly that, changing whether GSO frames are processed by the
driver.
Move this code to stmmac_set_features() instead, which is the correct
place for it. We don't need to check whether TSO is supported; this
is already handled via the setup of netdev->hw_features, and we are
guaranteed that if netdev->hw_features indicates that a feature is
not supported, .ndo_set_features() won't be called with it set.
Rather than configuring the channels depending on whether GSO/TSO is
currently enabled by the user, always enable if the hardware has TSO
support and the platform wants TSO to be enabled.
This avoids the channel TSO enable bit being disabled after a resume
when the user has disabled TSO features. This will cause problems when
the user re-enables TSO.
This bug goes back to commit f748be531d70 ("stmmac: support new GMAC4")
Igor Pylypiv [Thu, 2 Apr 2026 16:07:05 +0000 (09:07 -0700)]
ata: libata-eh: Do not retry reset if the device is gone
If a device is hot-unplugged or otherwise disappears during error handling,
ata_eh_reset() may fail with -ENODEV. Currently, the error handler will
continue to retry the reset operation up to max_tries times.
Prevent unnecessary reset retries by exiting the loop early when
ata_do_reset() returns -ENODEV.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Igor Pylypiv <ipylypiv@google.com> Signed-off-by: Niklas Cassel <cassel@kernel.org>
Cross-merge networking fixes after downstream PR (net-7.0-rc7).
Conflicts:
net/vmw_vsock/af_vsock.c b18c83388874 ("vsock: initialize child_ns_mode_locked in vsock_net_init()") 0de607dc4fd8 ("vsock: add G2H fallback for CIDs not owned by H2G transport")
Adjacent changes:
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c ceee35e5674a ("bnxt_en: Refactor some basic ring setup and adjustment logic") 57cdfe0dc70b ("bnxt_en: Resize RSS contexts on channel count change")
drivers/net/wireless/intel/iwlwifi/mld/mac80211.c 4d56037a02bd ("wifi: iwlwifi: mld: block EMLSR during TDLS connections") 687a95d204e7 ("wifi: iwlwifi: mld: correctly set wifi generation data")
drivers/net/wireless/intel/iwlwifi/mvm/fw.c 078df640ef05 ("wifi: iwlwifi: mld: add support for iwl_mcc_allowed_ap_type_cmd v
2") 323156c3541e ("wifi: iwlwifi: mvm: don't send a 6E related command when not supported")
Hans Zhang [Wed, 1 Apr 2026 02:30:48 +0000 (10:30 +0800)]
PCI: dwc: Fix type mismatch for kstrtou32_from_user() return value
kstrtou32_from_user() returns int, but the return value was stored in
a u32 variable 'val', risking sign loss. Use a dedicated int variable
to correctly handle the return code.
Merge tag 'for-7.0-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"One more fix for a potential extent tree corruption due to an
unexpected error value.
When the search for an extent item failed, it under some circumstances
was reported as a success to the caller"
* tag 'for-7.0-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix incorrect return value after changing leaf in lookup_extent_data_ref()
Steven Rostedt [Tue, 31 Mar 2026 20:39:24 +0000 (16:39 -0400)]
tracing: Allow backup to save persistent ring buffer before it starts
When the persistent ring buffer was first introduced, it did not make
sense to start tracing for it on the kernel command line. That's because
if there was a crash, the start of events would invalidate the events from
the previous boot that had the crash.
But now that there's a "backup" instance that can take a snapshot of the
persistent ring buffer when boot starts, it is possible to have the
persistent ring buffer start events at boot up and not lose the old events.
Update the code where the boot events start after all boot time instances
are created. This will allow the backup instance to copy the persistent
ring buffer from the previous boot, and allow the persistent ring buffer
to start tracing new events for the current boot.
The above will create a boot_mapped persistent ring buffer and enabled the
scheduler events. If there's a crash, a "backup" instance will be created
holding the events of the persistent ring buffer from the previous boot,
while the persistent ring buffer will once again start tracing scheduler
events of the current boot.
Now the user doesn't have to remember to start the persistent ring buffer.
It will always have the events started at each boot.
Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: John Stultz <jstultz@google.com> Link: https://patch.msgid.link/20260331163924.6ccb3896@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
thermal/drivers/brcmstb_thermal: Use max to simplify brcmstb_get_temp
Use max() to simplify brcmstb_get_temp() and improve its readability.
Since avs_tmon_code_to_temp() returns an int, change the data type of
the local variable 't' from long to int. No functional changes.
tracing: Remove the backup instance automatically after read
Since the backup instance is readonly, after reading all data via pipe, no
data is left on the instance. Thus it can be removed safely after closing
all files. This also removes it if user resets the ring buffer manually
via 'trace' file.
Since there is no reason to reuse the backup instance, make it readonly
(but erasable). Note that only backup instances are readonly, because
other trace instances will be empty unless it is writable. Only backup
instances have copy entries from the original.
With this change, most of the trace control files are removed from the
backup instance, including eventfs enable/filter etc.
ring-buffer: Enforce read ordering of trace_buffer cpumask and buffers
On CPU hotplug, if it is the first time a trace_buffer sees a CPU, a
ring_buffer_per_cpu will be allocated and its corresponding bit toggled
in the cpumask. Many readers check this cpumask to know if they can
safely read the ring_buffer_per_cpu but they are doing so without memory
ordering and may observe the cpumask bit set while having NULL buffer
pointer.
Enforce the memory read ordering by sending an IPI to all online CPUs.
The hotplug path is a slow-path anyway and it saves us from adding read
barriers in numerous call sites.
Daniel Borkmann [Tue, 31 Mar 2026 22:20:20 +0000 (00:20 +0200)]
selftests/bpf: Add more precision tracking tests for atomics
Add verifier precision tracking tests for BPF atomic fetch operations.
Validate that backtrack_insn correctly propagates precision from the
fetch dst_reg to the stack slot for {fetch_add,xchg,cmpxchg} atomics.
For the first two src_reg gets the old memory value, and for the last
one r0. The fetched register is used for pointer arithmetic to trigger
backtracking. Also add coverage for fetch_{or,and,xor} flavors which
exercises the bitwise atomic fetch variants going through the same
insn->imm & BPF_FETCH check but with different imm values.
Add dual-precision regression tests for fetch_add and cmpxchg where
both the fetched value and a reread of the same stack slot are tracked
for precision. After the atomic operation, the stack slot is STACK_MISC,
so the ldx does not set INSN_F_STACK_ACCESS. These tests verify that
stack precision propagates solely through the atomic fetch's load side.
Add map-based tests for fetch_add and cmpxchg which validate that non-
stack atomic fetch completes precision tracking without falling back
to mark_all_scalars_precise. Lastly, add 32-bit variants for {fetch_add,
cmpxchg} on map values to cover the second valid atomic operand size.
Daniel Borkmann [Tue, 31 Mar 2026 22:20:19 +0000 (00:20 +0200)]
bpf: Fix incorrect pruning due to atomic fetch precision tracking
When backtrack_insn encounters a BPF_STX instruction with BPF_ATOMIC
and BPF_FETCH, the src register (or r0 for BPF_CMPXCHG) also acts as
a destination, thus receiving the old value from the memory location.
The current backtracking logic does not account for this. It treats
atomic fetch operations the same as regular stores where the src
register is only an input. This leads the backtrack_insn to fail to
propagate precision to the stack location, which is then not marked
as precise!
Later, the verifier's path pruning can incorrectly consider two states
equivalent when they differ in terms of stack state. Meaning, two
branches can be treated as equivalent and thus get pruned when they
should not be seen as such.
Fix it as follows: Extend the BPF_LDX handling in backtrack_insn to
also cover atomic fetch operations via is_atomic_fetch_insn() helper.
When the fetch dst register is being tracked for precision, clear it,
and propagate precision over to the stack slot. For non-stack memory,
the precision walk stops at the atomic instruction, same as regular
BPF_LDX. This covers all fetch variants.
Merge tag 'net-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"With fixes from wireless, bluetooth and netfilter included we're back
to each PR carrying 30%+ more fixes than in previous era.
The good news is that so far none of the "extra" fixes are themselves
causing real regressions. Not sure how much comfort that is.
Current release - fix to a fix:
- netdevsim: fix build if SKB_EXTENSIONS=n
- eth: stmmac: skip VLAN restore when VLAN hash ops are missing
Previous releases - regressions:
- wifi: iwlwifi: mvm: don't send a 6E related command when
not supported
Previous releases - always broken:
- some info leak fixes
- add missing clearing of skb->cb[] on ICMP paths from tunnels
- ipv6:
- flowlabel: defer exclusive option free until RCU teardown
- avoid overflows in ip6_datagram_send_ctl()
- mpls: add seqcount to protect platform_labels from OOB access
- bridge: improve safety of parsing ND options
- bluetooth: fix leaks, overflows and races in hci_sync
- netfilter: add more input validation, some to address bugs directly
some to prevent exploits from cooking up broken configurations
- wifi:
- ath: avoid poor performance due to stopping the wrong
aggregation session
- virt_wifi: remove SET_NETDEV_DEV to avoid use-after-free
- eth:
- fec: fix the PTP periodic output sysfs interface
- enetc: safely reinitialize TX BD ring when it has unsent frames"
* tag 'net-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits)
eth: fbnic: Increase FBNIC_QUEUE_SIZE_MIN to 64
ipv6: avoid overflows in ip6_datagram_send_ctl()
net: hsr: fix VLAN add unwind on slave errors
net: hsr: serialize seq_blocks merge across nodes
vsock: initialize child_ns_mode_locked in vsock_net_init()
selftests/tc-testing: add tests for cls_fw and cls_flow on shared blocks
net/sched: cls_flow: fix NULL pointer dereference on shared blocks
net/sched: cls_fw: fix NULL pointer dereference on shared blocks
net/x25: Fix overflow when accumulating packets
net/x25: Fix potential double free of skb
bnxt_en: Restore default stat ctxs for ULP when resource is available
bnxt_en: Don't assume XDP is never enabled in bnxt_init_dflt_ring_mode()
bnxt_en: Refactor some basic ring setup and adjustment logic
net/mlx5: Fix switchdev mode rollback in case of failure
net/mlx5: Avoid "No data available" when FW version queries fail
net/mlx5: lag: Check for LAG device before creating debugfs
net: macb: properly unregister fixed rate clocks
net: macb: fix clk handling on PCI glue driver removal
virtio_net: clamp rss_max_key_size to NETDEV_RSS_KEY_LEN
net/sched: sch_netem: fix out-of-bounds access in packet corruption
...
Merge tag 'iommu-fixes-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu fixes from Joerg Roedel:
- IOMMU-PT related compile breakage in for AMD driver
- IOTLB flushing behavior when unmapped region is larger than requested
due to page-sizes
- Fix IOTLB flush behavior with empty gathers
* tag 'iommu-fixes-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
iommupt/amdv1: mark amdv1pt_install_leaf_entry as __always_inline
iommupt: Fix short gather if the unmap goes into a large mapping
iommu: Do not call drivers for empty gathers
bpf: Reject sleepable kprobe_multi programs at attach time
kprobe.multi programs run in atomic/RCU context and cannot sleep.
However, bpf_kprobe_multi_link_attach() did not validate whether the
program being attached had the sleepable flag set, allowing sleepable
helpers such as bpf_copy_from_user() to be invoked from a non-sleepable
context.
This causes a "sleeping function called from invalid context" splat:
BUG: sleeping function called from invalid context at ./include/linux/uaccess.h:169
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1787, name: sudo
preempt_count: 1, expected: 0
RCU nest depth: 2, expected: 0
Fix this by rejecting sleepable programs early in
bpf_kprobe_multi_link_attach(), before any further processing.
Fixes: 0dcac2725406 ("bpf: Add multi kprobe link") Signed-off-by: Varun R Mallya <varunrmallya@gmail.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Leon Hwang <leon.hwang@linux.dev> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260401191126.440683-1-varunrmallya@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Merge tag 'sound-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"People have been so busy for hunting and we're still getting more
changes than wished for, but it doesn't look too scary; almost all
changes are device-specific small fixes.
I guess it's rather a casual bump, and no more Easter eggs are left
for 7.0 (hopefully)...
- Fixes for the recent regression on ctxfi driver
- Fix missing INIT_LIST_HEAD() for ASoC card_aux_list
- Usual HD- and USB-audio, and ASoC AMD quirk updates
- ASoC fixes for AMD and Intel"
* tag 'sound-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (24 commits)
ASoC: amd: ps: Fix missing leading zeros in subsystem_device SSID log
ALSA: usb-audio: Exclude Scarlett 2i2 1st Gen (8016) from SKIP_IFACE_SETUP
ALSA: hda/realtek: add quirk for Acer Swift SFG14-73
ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7 14IMH9
ASoC: Intel: boards: fix unmet dependency on PINCTRL
ASoC: Intel: ehl_rt5660: Use the correct rtd->dev device in hw_params
ALSA: ctxfi: Don't enumerate SPDIF1 at DAIO initialization
ALSA: hda/realtek: Add quirk for Lenovo Yoga Slim 7 14AKP10
ALSA: hda/realtek: add quirk for HP Laptop 15-fc0xxx
ASoC: ep93xx: Fix unchecked clk_prepare_enable() and add rollback on failure
ASoC: soc-core: call missing INIT_LIST_HEAD() for card_aux_list
ALSA: hda/realtek: Add quirk for Samsung Book2 Pro 360 (NP950QED)
ASoC: amd: yc: Add DMI entry for HP Laptop 15-fc0xxx
ASoC: amd: yc: Add DMI quirk for ASUS Vivobook Pro 16X OLED M7601RM
ALSA: hda/realtek: Add quirk for ASUS ROG Strix SCAR 15
ALSA: usb-audio: Exclude Scarlett Solo 1st Gen from SKIP_IFACE_SETUP
ALSA: caiaq: fix stack out-of-bounds read in init_card
ALSA: ctxfi: Check the error for index mapping
ALSA: ctxfi: Fix missing SPDIFI1 index handling
ALSA: hda/realtek: add quirk for HP Victus 15-fb0xxx
...
Merge tag 'auxdisplay-v7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/andy/linux-auxdisplay
Pull auxdisplay fixes from Andy Shevchenko:
- Fix NULL dereference in linedisp_release()
- Fix ht16k33 DT bindings to avoid warnings
- Handle errors in I²C transfers in lcd2s driver
* tag 'auxdisplay-v7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/andy/linux-auxdisplay:
auxdisplay: line-display: fix NULL dereference in linedisp_release
auxdisplay: lcd2s: add error handling for i2c transfers
dt-bindings: auxdisplay: ht16k33: Use unevaluatedProperties to fix common property warning
Philipp Zabel [Thu, 2 Apr 2026 12:30:10 +0000 (14:30 +0200)]
Merge tag 'reset-fixes-for-v7.0-2' into reset/next
Reset controller fixes for v7.0, part 2
* Decouple spacemit K3 reset lines that were incorrectly coupled
together as one, but are in fact separate resets in hardware.
* Fix a double free in the reset_add_gpio_aux_device() error path.
This has already been fixed on reset/next by commit a9b95ce36de4
("reset: gpio: add a devlink between reset-gpio and its consumer").
* Fix the MODULE_AUTHOR string in the rzg2l-usbphy-ctrl driver.
We merge this into reset/next to resolve a conflict between commits a9b95ce36de4 ("reset: gpio: add a devlink between reset-gpio and its
consumer") and fbffb8c7c7bb ("reset: gpio: fix double free in
reset_add_gpio_aux_device() error path").
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
====================
bpf: Migrate bpf_task_work and file dynptr to kmalloc_nolock
Now that kmalloc can be used from NMI context via kmalloc_nolock(),
migrate BPF internal allocations away from bpf_mem_alloc to use the
standard slab allocator.
Use kfree_rcu() for deferred freeing, which waits for a regular RCU
grace period before the memory is reclaimed. Sleepable BPF programs
hold rcu_read_lock_trace but not regular rcu_read_lock, so patch 1
adds explicit rcu_read_lock/unlock around the pointer-to-refcount
window to prevent kfree_rcu from freeing memory while a sleepable
program is still between reading the pointer and acquiring a
reference.
Patch 1 migrates bpf_task_work_ctx from bpf_mem_alloc/bpf_mem_free to
kmalloc_nolock/kfree_rcu.
Patch 2 migrates bpf_dynptr_file_impl from bpf_mem_alloc/bpf_mem_free
to kmalloc_nolock/kfree.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
Changes in v2:
- Switch to scoped_guard in patch 1 (Kumar)
- Remove rcu gp wait in patch 2 (Kumar)
- Defer to irq_work when irqs disabled in patch 1
- use bpf_map_kmalloc_nolock() for bpf_task_work
- use kmalloc_nolock() for file dynptr
- Link to v1: https://lore.kernel.org/all/20260325-kmalloc_special-v1-0-269666afb1ea@meta.com/
====================
Mykyta Yatsenko [Mon, 30 Mar 2026 22:27:57 +0000 (15:27 -0700)]
bpf: Migrate dynptr file to kmalloc_nolock
Replace bpf_mem_alloc/bpf_mem_free with kmalloc_nolock/kfree_nolock for
bpf_dynptr_file_impl, continuing the migration away from bpf_mem_alloc
now that kmalloc can be used from NMI context.
freader_cleanup() runs before kfree_nolock() while the dynptr still
holds exclusive access, so plain kfree_nolock() is safe — no concurrent
readers can access the object.
Mykyta Yatsenko [Mon, 30 Mar 2026 22:27:56 +0000 (15:27 -0700)]
bpf: Migrate bpf_task_work to kmalloc_nolock
Replace bpf_mem_alloc/bpf_mem_free with
kmalloc_nolock/kfree_rcu for bpf_task_work_ctx.
Replace guard(rcu_tasks_trace)() with guard(rcu)() in
bpf_task_work_irq(). The function only accesses ctx struct members
(not map values), so tasks trace protection is not needed - regular
RCU is sufficient since ctx is freed via kfree_rcu. The guard in
bpf_task_work_callback() remains as tasks trace since it accesses map
values from process context.
Sleepable BPF programs hold rcu_read_lock_trace but not
regular rcu_read_lock. Since kfree_rcu
waits for a regular RCU grace period, the ctx memory can be freed
while a sleepable program is still running. Add scoped_guard(rcu)
around the pointer read and refcount tryget in
bpf_task_work_acquire_ctx to close this race window.
Since kfree_rcu uses call_rcu internally which is not safe from
NMI context, defer destruction via irq_work when irqs are disabled.
For the lost-cmpxchg path the ctx was never published, so
kfree_nolock is safe.
MAINTAINERS: amd-pstate: Step down as maintainer, add Prateek as reviewer
Mario Limonciello has led amd-pstate maintenance in recent years and
has done excellent work. The amd-pstate driver is in good hands with
him. I am stepping down as co-maintainer as I move on to other things.
Add K Prateek Nayak as a reviewer. He has been actively contributing
to the driver including preferred-core and ITMT improvements, and has
been helping review amd-pstate patches for a while now.
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Acked-by: K Prateek Nayak <kprateek.nayak@amd.com> Acked-by: Mario Limonciello (AMD) <superm1@kernel.org> Link: https://lore.kernel.org/r/20260402102611.16519-1-gautham.shenoy@amd.com Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
K Prateek Nayak [Mon, 16 Mar 2026 08:18:49 +0000 (08:18 +0000)]
cpufreq: Pass the policy to cpufreq_driver->adjust_perf()
cpufreq_cpu_get() can sleep on PREEMPT_RT in presence of concurrent
writer(s), however amd-pstate depends on fetching the cpudata via the
policy's driver data which necessitates grabbing the reference.
Since schedutil governor can call "cpufreq_driver->update_perf()"
during sched_tick/enqueue/dequeue with rq_lock held and IRQs disabled,
fetching the policy object using the cpufreq_cpu_get() helper in the
scheduler fast-path leads to "BUG: scheduling while atomic" on
PREEMPT_RT [1].
Pass the cached cpufreq policy object in sg_policy to the update_perf()
instead of just the CPU. The CPU can be inferred using "policy->cpu".
The lifetime of cpufreq_policy object outlasts that of the governor and
the cpufreq driver (allocated when the CPU is onlined and only reclaimed
when the CPU is offlined / the CPU device is removed) which makes it
safe to be referenced throughout the governor's lifetime.
K Prateek Nayak [Mon, 16 Mar 2026 08:18:48 +0000 (08:18 +0000)]
cpufreq/amd-pstate: Pass the policy to amd_pstate_update()
All callers of amd_pstate_update() already have a reference to the
cpufreq_policy object.
Pass the entire policy object and grab the cpudata using
"policy->driver_data" instead of passing the cpudata and unnecessarily
grabbing another read-side reference to the cpufreq policy object when
it is already available in the caller.
No functional changes intended.
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Link: https://lore.kernel.org/r/20260316081849.19368-2-kprateek.nayak@amd.com Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
====================
bpf: Fix abuse of kprobe_write_ctx via freplace
The potential issue of kprobe_write_ctx+freplace was mentioned in
"bpf: Disallow !kprobe_write_ctx progs tail-calling kprobe_write_ctx progs" [1].
It is true issue, that the test in patch #2 verifies that kprobe_write_ctx=false
kprobe progs can be abused to modify struct pt_regs via kprobe_write_ctx=true
freplace progs.
When struct pt_regs is modified, bpf_prog_test_run_opts() gets -EFAULT instead
of 0.
Changes:
v2 -> v3:
* Add comment to the rejection of kprobe_write_ctx (per Jiri).
* Use libbpf_get_error() instead of errno in test (per Jiri).
* Collect Acked-by tags from Jiri and Song, thanks.
v2: https://lore.kernel.org/bpf/20260326141718.17731-1-leon.hwang@linux.dev/
v1 -> v2:
* Drop patch #1 in v1, as it wasn't an issue (per Toke).
* Check kprobe_write_ctx value at attach time instead of at load time, to
prevent attaching kprobe_write_ctx=true freplace progs on
kprobe_write_ctx=false kprobe progs (per Gemini/sashiko).
* Move kprobe_write_ctx test code to attach_probe.c and kprobe_write_ctx.c.
v1: https://lore.kernel.org/bpf/20260324150444.68166-1-leon.hwang@linux.dev/
====================
Leon Hwang [Tue, 31 Mar 2026 14:53:53 +0000 (22:53 +0800)]
selftests/bpf: Add test to verify the fix of kprobe_write_ctx abuse
Add a test to verify the issue: kprobe_write_ctx can be abused to modify
struct pt_regs of kernel functions via kprobe_write_ctx=true freplace
progs.
Without the fix, the issue is verified:
kprobe_write_ctx=true freplace prog is allowed to attach to
kprobe_write_ctx=false kprobe prog. Then, the first arg of
bpf_fentry_test1 will be set as 0, and bpf_prog_test_run_opts() gets
-EFAULT instead of 0.
With the fix, the issue is rejected at attach time.
Leon Hwang [Tue, 31 Mar 2026 14:53:52 +0000 (22:53 +0800)]
bpf: Fix abuse of kprobe_write_ctx via freplace
uprobe programs are allowed to modify struct pt_regs.
Since the actual program type of uprobe is KPROBE, it can be abused to
modify struct pt_regs via kprobe+freplace when the kprobe attaches to
kernel functions.
For example,
SEC("?kprobe")
int kprobe(struct pt_regs *regs)
{
return 0;
}
freplace_kprobe prog will attach to kprobe prog.
kprobe prog will attach to a kernel function.
Without this patch, when the kernel function runs, its first arg will
always be set as 0 via the freplace_kprobe prog.
To fix the abuse of kprobe_write_ctx=true via kprobe+freplace, disallow
attaching freplace programs on kprobe programs with different
kprobe_write_ctx values.
Fixes: 7384893d970e ("bpf: Allow uprobe program to change context registers") Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Song Liu <song@kernel.org> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260331145353.87606-2-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
cpufreq/amd-pstate: Add support for raw EPP writes
The energy performance preference field of the CPPC request MSR
supports values from 0 to 255, but the strings only offer 4 values.
The other values are useful for tuning the performance of some
workloads.
Add support for writing the raw energy performance preference value
to the sysfs file. If the last value written was an integer then
an integer will be returned. If the last value written was a string
then a string will be returned.
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
cpufreq/amd-pstate: add kernel command line to override dynamic epp
Add `amd_dynamic_epp=enable` and `amd_dynamic_epp=disable` to override
the kernel configuration option `CONFIG_X86_AMD_PSTATE_DYNAMIC_EPP`
locally.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>