Ryan Roberts [Tue, 3 Mar 2026 15:08:39 +0000 (15:08 +0000)]
randomize_kstack: Unify random source across arches
Previously different architectures were using random sources of
differing strength and cost to decide the random kstack offset. A number
of architectures (loongarch, powerpc, s390, x86) were using their
timestamp counter, at whatever the frequency happened to be. Other
arches (arm64, riscv) were using entropy from the crng via
get_random_u16().
There have been concerns that in some cases the timestamp counters may
be too weak, because they can be easily guessed or influenced by user
space. And get_random_u16() has been shown to be too costly for the
level of protection kstack offset randomization provides.
So let's use a common, architecture-agnostic source of entropy: a
per-cpu prng, seeded at boot-time from the crng. This has a few
benefits:
- We can remove choose_random_kstack_offset(); that was only there to
try to make the timestamp counter value a bit harder to influence
from user space [*].
- The architecture code is simplified. All it has to do now is call
add_random_kstack_offset() in the syscall path.
- The strength of the randomness can be reasoned about independently
of the architecture.
- Arches previously using get_random_u16() now have much faster
syscall paths; see the results below.
[*] Additionally, this gets rid of some redundant work on s390 and x86.
Before this patch, those architectures called
choose_random_kstack_offset() under arch_exit_to_user_mode_prepare(),
which is also called for exception returns to userspace which were *not*
syscalls (e.g. regular interrupts). Getting rid of
choose_random_kstack_offset() avoids a small amount of redundant work
for the non-syscall cases.
In some configurations, add_random_kstack_offset() will now call
instrumentable code, so for a couple of arches, I have moved the call a
bit later to the first point where instrumentation is allowed. This
doesn't impact the efficacy of the mechanism.
There have been some claims that a prng may be less strong than the
timestamp counter if not regularly reseeded. But the prng has a period
of about 2^113. So as long as the prng state remains secret, it should
not be possible to guess. If the prng state can be accessed, we have
bigger problems.
Additionally, we are only consuming 6 bits to randomize the stack, so
there are only 64 possible random offsets. I assert that it would be
trivial for an attacker to brute force by repeating their attack and
waiting for the random stack offset to be the desired one. The prng
approach seems entirely proportional to this level of protection.
Performance data are provided below. The baseline is v6.18 with rndstack
on for each respective arch. (I)/(R) indicate statistically significant
improvement/regression. arm64 platform is AWS Graviton3 (m7g.metal).
x86_64 platform is AWS Sapphire Rapids (m7i.24xlarge):
I tested an earlier version of this change on x86 bare metal and it
showed a smaller but still significant improvement. The bare metal
system wasn't available this time around so testing was done in a VM
instance. I'm guessing the cost of rdtsc is higher for VMs.
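The brute-force argument above can be put in numbers. What follows is a
user-space Python sketch, not kernel code: random.Random stands in for
the per-cpu prng and os.urandom() for the crng seed, and the names
CpuPrng, expected_attempts and attempts_until are made up for
illustration. Only the 6-bit width comes from the commit text.

```python
import os
import random

KSTACK_OFFSET_BITS = 6  # from the commit text: only 6 bits are consumed


class CpuPrng:
    """Stand-in for one CPU's prng state, seeded once from a strong source."""

    def __init__(self):
        # Assumption: os.urandom() plays the role of the crng at boot.
        self._rng = random.Random(os.urandom(16))

    def next_offset(self):
        # Keep only the low 6 bits, giving 2**6 = 64 possible offsets.
        return self._rng.getrandbits(32) & ((1 << KSTACK_OFFSET_BITS) - 1)


def expected_attempts(bits):
    """Mean number of tries until one specific offset comes up.

    Each try succeeds with probability 1 / 2**bits, so the number of
    tries is geometrically distributed with mean 2**bits.
    """
    return float(1 << bits)


def attempts_until(prng, wanted):
    """Count draws until the prng happens to produce `wanted`."""
    tries = 1
    while prng.next_offset() != wanted:
        tries += 1
    return tries
```

With 6 bits, expected_attempts(6) is 64, so an attacker who can retry
cheaply needs on the order of 64 attempts on average regardless of how
strong the random source is.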
Ryan Roberts [Tue, 3 Mar 2026 15:08:38 +0000 (15:08 +0000)]
randomize_kstack: Maintain kstack_offset per task
kstack_offset was previously maintained per-cpu, but this caused a
couple of issues. So let's instead make it per-task.
Issue 1: add_random_kstack_offset() and choose_random_kstack_offset()
are expected and required to be called with interrupts and preemption
disabled so that they can manipulate per-cpu state. But arm64, loongarch
and risc-v are calling them with interrupts and preemption enabled. I
don't _think_ this causes any functional issues, but it's certainly
unexpected and could lead to manipulating the wrong cpu's state, which
could cause a minor performance degradation due to bouncing the cache
lines. By maintaining the state per-task those functions can safely be
called in preemptible context.
Issue 2: add_random_kstack_offset() is called before executing the
syscall and expands the stack using a previously chosen random offset.
choose_random_kstack_offset() is called after executing the syscall and
chooses and stores a new random offset for the next syscall. With
per-cpu storage for this offset, an attacker could force cpu migration
during the execution of the syscall and prevent the offset from being
updated for the original cpu such that it is predictable for the next
syscall on that cpu. By maintaining the state per-task, this problem
goes away because the per-task random offset is updated after the
syscall regardless of which cpu it is executing on.
Fixes: 39218ff4c625 ("stack: Optionally randomize kernel stack offset each syscall") Closes: https://lore.kernel.org/all/dd8c37bc-795f-4c7a-9086-69e584d8ab24@arm.com/ Cc: stable@vger.kernel.org Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://patch.msgid.link/20260303150840.3789438-2-ryan.roberts@arm.com Signed-off-by: Kees Cook <kees@kernel.org>
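Issue 2 can be illustrated with a toy model. This is a hypothetical
Python sketch rather than the kernel implementation: offsets are plain
integers, a counter stands in for the random source so that the run is
deterministic, and the function names are invented.

```python
import itertools

# Deterministic stand-in for "choose a new random offset" (assumption:
# the real code draws from a random source, not a counter).
_fresh_offsets = itertools.count(100)


def syscall_per_cpu(cpu_offsets, entry_cpu, exit_cpu):
    """One syscall with per-cpu storage; returns the offset it ran with."""
    used = cpu_offsets[entry_cpu]                 # applied on syscall entry
    cpu_offsets[exit_cpu] = next(_fresh_offsets)  # refreshed on syscall exit
    return used


def syscall_per_task(task):
    """One syscall with per-task storage; returns the offset it ran with."""
    used = task["offset"]                  # applied on syscall entry
    task["offset"] = next(_fresh_offsets)  # refreshed wherever the task exits
    return used


# The attacker always enters on CPU 0 and forces a migration to CPU 1.
cpus = {0: 1, 1: 2}
per_cpu_seen = [syscall_per_cpu(cpus, entry_cpu=0, exit_cpu=1) for _ in range(3)]

task = {"offset": 1}
per_task_seen = [syscall_per_task(task) for _ in range(3)]
```

In this model per_cpu_seen stays at the same value for every syscall
because CPU 0's slot is never rewritten, while per_task_seen changes on
each call.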
Guangshuo Li [Mon, 23 Mar 2026 16:57:30 +0000 (00:57 +0800)]
net: mana: fix use-after-free in add_adev() error path
If auxiliary_device_add() fails, add_adev() jumps to add_fail and calls
auxiliary_device_uninit(adev).
The auxiliary device has its release callback set to adev_release(),
which frees the containing struct mana_adev. Since adev is embedded in
struct mana_adev, the subsequent fall-through to init_fail and access
to adev->id may result in a use-after-free.
Fix this by saving the allocated auxiliary device id in a local
variable before calling auxiliary_device_add(), and use that saved id
in the cleanup path after auxiliary_device_uninit().
Fixes: a69839d4327d ("net: mana: Add support for auxiliary device") Cc: stable@vger.kernel.org Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com> Link: https://patch.msgid.link/20260323165730.945365-1-lgs201920130244@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bobby Eshleman [Tue, 24 Mar 2026 00:08:10 +0000 (17:08 -0700)]
selftests: drv-net: add missing tc config options for netkit tests
The NetDrvContEnv env context uses tc clsact qdiscs and BPF tc filters
for traffic redirection, but the kernel config options are missing from
the selftests config.
Without them, the tc qdisc installation trips on:
CMD: tc qdisc add dev enp1s0 clsact
EXIT: 2
STDERR: Error: Specified qdisc kind is unknown.
net.lib.py.utils.CmdExitFailure: Command failed
Add CONFIG_NET_CLS_ACT and CONFIG_NET_SCH_INGRESS to enable these tc
options.
Jonas Köppeler [Mon, 23 Mar 2026 17:49:20 +0000 (18:49 +0100)]
net_sched: codel: fix stale state for empty flows in fq_codel
When codel_dequeue() finds an empty queue, it resets vars->dropping
but does not reset vars->first_above_time. The reference CoDel
algorithm (Nichols & Jacobson, ACM Queue 2012) resets both:
  dodeque_result codel_queue_t::dodeque(time_t now) {
      ...
      if (r.p == NULL) {
          first_above_time = 0;  // <-- Linux omits this
      }
      ...
  }
Note that codel_should_drop() does reset first_above_time when called
with a NULL skb, but codel_dequeue() returns early before ever calling
codel_should_drop() in the empty-queue case. The post-drop code paths
do reach codel_should_drop(NULL) and correctly reset the timer, so a
dropped packet breaks the cycle -- but the next delivered packet
re-arms first_above_time and the cycle repeats.
For sparse flows such as ICMP ping (one packet every 200ms-1s), the
first packet arms first_above_time, the flow goes empty, and the
second packet arrives after the interval has elapsed and gets dropped.
The pattern repeats, producing sustained loss on flows that are not
actually congested.
Test: veth pair, fq_codel, BQL disabled, 30000 iptables rules in the
consumer namespace (NAPI-64 cycle ~14ms, well above fq_codel's 5ms
target), ping at 5 pps under UDP flood:
Before fix: 26% ping packet loss
After fix: 0% ping packet loss
Fix by resetting first_above_time to zero in the empty-queue path
of codel_dequeue(), matching the reference algorithm.
Fixes: 76e3cc126bb2 ("codel: Controlled Delay AQM") Fixes: d068ca2ae2e6 ("codel: split into multiple files") Co-developed-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jonas Köppeler <j.koeppeler@tu-berlin.de> Reported-by: Chris Arges <carges@cloudflare.com> Tested-by: Jonas Köppeler <j.koeppeler@tu-berlin.de> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/all/20260318134826.1281205-7-hawk@kernel.org/ Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260323174920.253526-1-hawk@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
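The sparse-flow behaviour described above can be reproduced with a small
model. This is a simplified Python sketch, not the kernel code: it keeps
only the sojourn-time check, ignores the backlog/MTU test and the drop
scheduling, handles at most one queued packet per flow, and assumes the
fq_codel defaults of a 5ms target and a 100ms interval. The names Flow,
should_drop, dequeue and run are illustrative.

```python
from collections import deque
from dataclasses import dataclass, field

TARGET_MS = 5      # fq_codel target quoted in the commit text
INTERVAL_MS = 100  # assumption: default fq_codel interval


@dataclass
class Flow:
    queue: deque = field(default_factory=deque)  # enqueue timestamps in ms
    first_above_time: int = 0
    dropping: bool = False


def should_drop(flow, enq_time, now):
    """Simplified codel_should_drop(): sojourn-time check only."""
    if enq_time is None or now - enq_time < TARGET_MS:
        flow.first_above_time = 0
        return False
    if flow.first_above_time == 0:
        flow.first_above_time = now + INTERVAL_MS  # arm the timer
        return False
    return now > flow.first_above_time


def dequeue(flow, now, fixed):
    """Return 'empty', 'sent' or 'dropped' for one dequeue attempt."""
    if not flow.queue:
        flow.dropping = False
        if fixed:
            flow.first_above_time = 0  # the reset added by this commit
        return "empty"
    enq_time = flow.queue.popleft()
    if should_drop(flow, enq_time, now):
        # With nothing else queued, the post-drop path reaches
        # should_drop(NULL), which clears the timer.
        if not flow.queue:
            should_drop(flow, None, now)
        return "dropped"
    return "sent"


def run(fixed, packets=6, gap_ms=200, delay_ms=14):
    """Send one packet every gap_ms; each waits delay_ms before dequeue."""
    flow = Flow()
    results = []
    for i in range(packets):
        enq = i * gap_ms
        flow.queue.append(enq)
        results.append(dequeue(flow, enq + delay_ms, fixed))
        dequeue(flow, enq + delay_ms, fixed)  # scheduler then finds the flow empty
    return results
```

In this model run(fixed=False) alternates between "sent" and "dropped",
while run(fixed=True) delivers every packet. The exact alternation is an
artifact of the simplification; the measured loss in the test above was
26%.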
Sabrina Dubroca [Mon, 23 Mar 2026 15:19:43 +0000 (16:19 +0100)]
rtnetlink: fix leak of SRCU struct in rtnl_link_register
Commit 6b57ff21a310 ("rtnetlink: Protect link_ops by mutex.") swapped
the EEXIST check with the init_srcu_struct, but didn't add cleanup of
the SRCU struct we just allocated in case of error.
net: lan743x: fix duplex configuration in mac_link_up
The driver does not explicitly configure the MAC duplex mode when
bringing the link up. As a result, the MAC may retain a stale duplex
setting from a previous link state, leading to duplex mismatches with
the link partner and degraded network performance.
Update lan743x_phylink_mac_link_up() to set or clear the MAC_CR_DPX_
bit according to the negotiated duplex mode.
This ensures the MAC configuration is consistent with the phylink
resolved state.
Fixes: a5f199a8d8a03 ("net: lan743x: Migrate phylib to phylink") Signed-off-by: Thangaraj Samynathan <thangaraj.s@microchip.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20260323065345.144915-1-thangaraj.s@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
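The "set or clear" step matters because only writing the bit on one
branch would leave a stale value behind. A minimal Python sketch of the
idea follows; the bit position used for MAC_CR_DPX and the helper name
are hypothetical and are not taken from the driver.

```python
MAC_CR_DPX = 1 << 3  # hypothetical bit position, for illustration only


def mac_cr_for_link_up(mac_cr, full_duplex):
    """Return the MAC control value with the duplex bit forced to match the link."""
    if full_duplex:
        return mac_cr | MAC_CR_DPX
    return mac_cr & ~MAC_CR_DPX
```

Because both branches write the bit, the result no longer depends on
whatever duplex setting the register held from a previous link state.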
====================
net: dsa: mxl862xx: MDIO bus integrity and optimization
The MxL862xx firmware offers opt-in CRC validation on the MDIO/MMD
command interface to guard against bit errors on the bus.
The driver has not used this until now.
This series enables CRC protection on both the command registers (CRC-6)
and data payloads (CRC-16), and reduces MDIO bus traffic by bulk-zeroing
registers instead of writing zero-valued words individually.
====================
Daniel Golle [Sun, 22 Mar 2026 13:27:26 +0000 (13:27 +0000)]
net: dsa: mxl862xx: use RST_DATA to skip writing zero words
Issue the firmware's RST_DATA command before writing data payloads that
contain many zero words. RST_DATA zeroes both the firmware's internal
buffer and the MMD data registers in a single command, allowing the
driver to skip individual MDIO writes for zero-valued words. This
reduces bus traffic for the common case where API structs have many
unused or default-zero fields.
The optimization is applied when at least 5 zero words are found in the
payload, roughly the break-even point against the cost of the extra
RST_DATA command round-trip.
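The decision described above can be sketched as follows. This is an
illustrative Python model under stated assumptions, not the driver code:
the payload is a list of 16-bit words, plan_writes is an invented name,
and only the threshold of 5 zero words comes from the commit text.

```python
ZERO_WORD_THRESHOLD = 5  # from the commit text: "at least 5 zero words"


def plan_writes(words):
    """Return (use_rst_data, writes) for a payload of 16-bit words.

    `writes` lists the (index, value) pairs that still need an
    individual MDIO write.
    """
    zero_count = sum(1 for w in words if w == 0)
    if zero_count >= ZERO_WORD_THRESHOLD:
        # RST_DATA zeroes everything first, so zero words can be skipped.
        return True, [(i, w) for i, w in enumerate(words) if w != 0]
    # Too few zero words: the extra command would cost more than it saves.
    return False, list(enumerate(words))
```

For a payload of six zero words followed by one non-zero word, the plan
is one RST_DATA command plus a single word write instead of seven
writes.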
Daniel Golle [Sun, 22 Mar 2026 13:27:20 +0000 (13:27 +0000)]
net: dsa: mxl862xx: add CRC for MDIO communication
Enable the firmware's opt-in CRC validation on the MDIO/MMD command
interface to detect bit errors on the bus. The firmware bundles CRC-6
and CRC-16 under a single enable flag, so both are implemented
together.
CRC-6 protects the ctrl and len_ret command registers using a table-
driven 3GPP algorithm. It is applied to every command exchange
including SET_DATA/GET_DATA batch transfers. With CRC enabled, the
firmware encodes its return value as a signed 11-bit integer within
the CRC-protected register fields, replacing the previous 16-bit
interpretation.
CRC-16 protects the data payload using the kernel's crc16() library.
The driver appends a CRC-16 checksum to outgoing data and verifies the
firmware-appended checksum on responses. The checksum is placed at the
exact byte offset where the struct data ends, correctly handling
packed structs with odd sizes by splitting the checksum across word
boundaries. SET_DATA/GET_DATA sub-commands carry only CRC-6.
Upon detection of a CRC error on either side, all conduit interfaces
are taken down, triggering all user ports to go down as well. This is
the most feasible option: CRC errors are likely either caused by
broken hardware or a symptom of overheating. In either case, trying
to resume normal operation isn't reasonable.
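The odd-size payload case and the 11-bit return value can be sketched in
Python. This is not the driver code. The bitwise crc16() below uses the
reflected polynomial 0xA001, which to my understanding matches the
kernel's crc16() library. The initial CRC value of 0, the little-endian
checksum and word packing, and the placement of the return value in
bits 0..10 are all assumptions made for illustration.

```python
def crc16(crc, data):
    """Bitwise CRC-16 with reflected polynomial 0xA001."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def frame_payload(payload):
    """Append the checksum right after the last data byte, then pack into words.

    Assumptions: initial CRC value 0, little-endian checksum,
    little-endian 16-bit words, zero padding of a trailing half word.
    """
    framed = payload + crc16(0, payload).to_bytes(2, "little")
    if len(framed) % 2:
        framed += b"\x00"
    return [int.from_bytes(framed[i:i + 2], "little")
            for i in range(0, len(framed), 2)]


def decode_ret(field):
    """Sign-extend an 11-bit two's-complement value (assumed in bits 0..10)."""
    value = field & 0x7FF
    return value - 0x800 if value & 0x400 else value
```

For a 3-byte payload, the second word carries the last data byte in its
low half and the first checksum byte in its high half, and the third
word carries the remaining checksum byte: the checksum is split across
a word boundary.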
xietangxin [Thu, 12 Mar 2026 02:54:06 +0000 (10:54 +0800)]
virtio_net: Fix UAF on dst_ops when IFF_XMIT_DST_RELEASE is cleared and napi_tx is false
A UAF issue occurs when the virtio_net driver is configured with napi_tx=N
and the device's IFF_XMIT_DST_RELEASE flag is cleared
(e.g., during the configuration of tc route filter rules).
When IFF_XMIT_DST_RELEASE is removed from the net_device, the network stack
expects the driver to hold the reference to skb->dst until the packet
is fully transmitted and freed. In virtio_net with napi_tx=N,
skbs may remain in the virtio transmit ring for an extended period.
If the network namespace is destroyed while these skbs are still pending,
the corresponding dst_ops structure is freed. When a subsequent packet
is transmitted, free_old_xmit() is triggered to clean up old skbs.
It then calls dst_release() on the stale dst_entry still attached to the skb.
Since the dst_ops (referenced by the dst_entry) has already been freed,
a UAF kernel paging request occurs.
Fix it by adding skb_dst_drop(skb) in start_xmit() to explicitly release
the dst reference before the skb is queued in virtio_net.
config_qdisc_route_filter() {
    tc qdisc del dev $NETDEV root
    tc qdisc add dev $NETDEV root handle 1: prio
    tc filter add dev $NETDEV parent 1:0 \
        protocol ip prio 100 route to 100 flowid 1:1
    ip route add 192.168.1.100/32 dev $NETDEV realm 100
}
test_ns() {
    ip netns add testns
    ip link set $NETDEV netns testns
    ip netns exec testns ifconfig $NETDEV 10.0.32.46/24
    ip netns exec testns ping -c 1 10.0.32.1
    ip netns del testns
}
Gao Xiang [Tue, 24 Mar 2026 15:54:07 +0000 (23:54 +0800)]
erofs: fix .fadvise() for page cache sharing
Currently, .fadvise() doesn't work well if page cache sharing is on
since shared inodes belong to a pseudo fs generated with init_pseudo(),
and sb->s_bdi is the default one &noop_backing_dev_info.
generic_fadvise() then behaves as a no-op because sb->s_bdi is
&noop_backing_dev_info. However, like the bdev fs (which changes
inode_to_bdi() instead), this pseudo fs is actually NOT a pure memfs.
Let's generate a real bdi for erofs_ishare_mnt instead.
Fixes: d86d7817c042 ("erofs: implement .fadvise for page cache share") Reviewed-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Steven Rostedt [Tue, 24 Mar 2026 00:22:12 +0000 (20:22 -0400)]
ring-buffer: Show what clock function is used on timestamp errors
The testing for tracing was triggering a timestamp count issue that was
always off by one. This has been happening for some time but has never
been reported by anyone else. It was finally discovered to be an issue
with the "uptime" (jiffies) clock that happened to be traced and the
internal recursion caused the discrepancy. This would have been much
easier to solve if the clock function being used was displayed when the
error was detected.
The commit f35dbac69421 ("ring-buffer: Fix to update per-subbuf entries of
persistent ring buffer") was a fix and merged upstream. It is needed for
some other work in the ring buffer. The current branch has the remote
buffer code that is shared with the Arm64 subsystem and can't be rebased.
Merge in the upstream commit to allow continuing of the ring buffer work.
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Pengyu Luo [Thu, 26 Feb 2026 12:29:58 +0000 (20:29 +0800)]
drm/msm/dsi/phy: rename DSI_PHY_7NM_QUIRK_PRE_V4_1 to DSI_PHY_7NM_QUIRK_V4_0
The quirk flag DSI_PHY_7NM_QUIRK_PRE_V4_1 is renamed to
DSI_PHY_7NM_QUIRK_V4_0 to better reflect the actual hardware revision
it applies to. (Only SM8150 uses it; its hardware revision is 4.0.)
Maíra Canal [Thu, 12 Mar 2026 21:34:23 +0000 (18:34 -0300)]
clk: bcm: rpi: Manage clock rate in prepare/unprepare callbacks
On current firmware versions, RPI_FIRMWARE_SET_CLOCK_STATE doesn't
actually power off the clock. To achieve meaningful power savings, the
clock rate must be set to the minimum before disabling. This might be
fixed in future firmware releases.
Rather than pushing rate management to clock consumers, handle it
directly in the clock framework's prepare/unprepare callbacks. In
unprepare, set the rate to the minimum before disabling the clock.
In prepare, for clocks marked with `maximize` (currently v3d),
restore the rate to the maximum after enabling.
Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
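The ordering described above can be modelled in a few lines. This is a
toy Python sketch, not the clock driver: the class name FirmwareClock
and the rate values are hypothetical, and only the ordering and the
`maximize` flag come from the commit text.

```python
class FirmwareClock:
    """Toy model of a firmware-managed clock."""

    def __init__(self, min_rate, max_rate, maximize=False):
        self.min_rate = min_rate
        self.max_rate = max_rate
        self.maximize = maximize
        self.rate = min_rate
        self.enabled = False

    def prepare(self):
        self.enabled = True
        if self.maximize:
            # Restore full speed only after the clock is enabled.
            self.rate = self.max_rate

    def unprepare(self):
        # Drop the rate first: disabling alone does not power the clock off.
        self.rate = self.min_rate
        self.enabled = False
```

A clock created with maximize=True returns to max_rate on every
prepare(), while other clocks keep whatever rate their consumer set.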
Dmitry Baryshkov [Mon, 12 Jan 2026 03:23:31 +0000 (05:23 +0200)]
drm/msm/dpu: use full scale alpha in _dpu_crtc_setup_blend_cfg()
Both _dpu_crtc_setup_blend_cfg() and setup_blend_config_alpha()
callbacks embed knowledge about the platform's alpha range (8-bit or
10-bit). Make _dpu_crtc_setup_blend_cfg() use full 16-bit values for
alpha and reduce alpha only in DPU-specific callbacks.
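A sketch of what "reduce alpha only in DPU-specific callbacks" could
look like, written in Python for illustration. The helper name
reduce_alpha is invented, and truncation by right shift is an
assumption: the actual callbacks may reduce the value differently.

```python
FULL_SCALE_BITS = 16  # full-scale alpha runs from 0 to 0xffff


def reduce_alpha(alpha16, hw_bits):
    """Scale a 16-bit alpha down to the hardware's 8-bit or 10-bit range."""
    if not 0 <= alpha16 <= 0xFFFF:
        raise ValueError("alpha must fit in 16 bits")
    return alpha16 >> (FULL_SCALE_BITS - hw_bits)
```

With this split the common code only ever sees 16-bit values, and each
hardware generation applies its own width at the last step, for example
reduce_alpha(0xFFFF, 8) for an 8-bit block and reduce_alpha(0xFFFF, 10)
for a 10-bit one.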
Dmitry Baryshkov [Mon, 12 Jan 2026 03:23:30 +0000 (05:23 +0200)]
drm/msm/dpu: simplify bg_alpha selection
To make fg_alpha / bg_alpha handling more obvious when programming
blending, drop the default setting for the background alpha value
and set it explicitly in all cases.
Create read-only debugfs entries for LOGINIT, LOGRM, and LOGINTR, which
are the three primary printf logging buffers from GSP-RM. LOGPMU will
be added at a later date, as it requires support for its RPC message
first.
This patch uses the `pin_init_scope` feature to create the entries.
`pin_init_scope` solves the lifetime issue over the `DEBUGFS_ROOT`
reference by delaying its acquisition until the time the entry is
actually initialized.
Co-developed-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Timur Tabi <ttabi@nvidia.com> Tested-by: John Hubbard <jhubbard@nvidia.com> Tested-by: Eliot Courtney <ecourtney@nvidia.com> Link: https://patch.msgid.link/20260319212658.2541610-7-ttabi@nvidia.com
[ Rebase onto Coherent<T> changes. - Danilo ] Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Timur Tabi [Thu, 19 Mar 2026 21:26:57 +0000 (16:26 -0500)]
gpu: nova-core: create debugfs root in module init
Create the 'nova_core' root debugfs entry when the driver loads.
Normally, non-const global variables need to be protected by a
mutex. Instead, we use unsafe code, as we know the entry is never
modified after the driver is loaded. This solves the lifetime
issue of the mutex guard, which would otherwise have required the
use of `pin_init_scope`.
Signed-off-by: Timur Tabi <ttabi@nvidia.com> Reviewed-by: Gary Guo <gary@garyguo.net> Reviewed-by: Alexandre Courbot <acourbot@nvidia.com> Tested-by: John Hubbard <jhubbard@nvidia.com> Tested-by: Eliot Courtney <ecourtney@nvidia.com> Link: https://patch.msgid.link/20260319212658.2541610-6-ttabi@nvidia.com Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Timur Tabi [Thu, 19 Mar 2026 21:26:56 +0000 (16:26 -0500)]
gpu: nova-core: Replace module_pci_driver! with explicit module init
Replace the module_pci_driver! macro with an explicit module
initialization using the standard module! macro and InPlaceModule
trait implementation. No functional change intended, with the
exception that the driver now prints a message when loaded.
This change is necessary so that we can create a top-level "nova_core"
debugfs entry when the driver is loaded.
Signed-off-by: Timur Tabi <ttabi@nvidia.com> Reviewed-by: Gary Guo <gary@garyguo.net> Reviewed-by: Alexandre Courbot <acourbot@nvidia.com> Tested-by: John Hubbard <jhubbard@nvidia.com> Tested-by: Eliot Courtney <ecourtney@nvidia.com> Link: https://patch.msgid.link/20260319212658.2541610-5-ttabi@nvidia.com Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Timur Tabi [Thu, 19 Mar 2026 21:26:55 +0000 (16:26 -0500)]
rust: dma: implement BinaryWriter for Coherent<[u8]>
Implement the BinaryWriter trait for Coherent<[u8]>, enabling DMA
coherent allocations to be exposed as readable binary files. The
implementation handles offset tracking and bounds checking, copying data
from the coherent allocation to userspace via write_dma().
Signed-off-by: Timur Tabi <ttabi@nvidia.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Alexandre Courbot <acourbot@nvidia.com> Tested-by: John Hubbard <jhubbard@nvidia.com> Tested-by: Eliot Courtney <ecourtney@nvidia.com> Link: https://patch.msgid.link/20260319212658.2541610-4-ttabi@nvidia.com
[ Rebase onto Coherent<T> changes. - Danilo ] Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Timur Tabi [Thu, 19 Mar 2026 21:26:54 +0000 (16:26 -0500)]
rust: uaccess: add write_dma() for copying from DMA buffers to userspace
Add UserSliceWriter::write_dma() to copy data from a Coherent<[u8]> to
userspace. This provides a safe interface for copying DMA buffer
contents to userspace without requiring callers to work with raw
pointers.
Because write_dma() and write_slice() have common code, factor that code
out into a helper function, write_raw().
The method handles bounds checking and offset calculation internally,
wrapping the unsafe copy_to_user() call.
Signed-off-by: Timur Tabi <ttabi@nvidia.com> Reviewed-by: Alexandre Courbot <acourbot@nvidia.com> Acked-by: Miguel Ojeda <ojeda@kernel.org> Tested-by: John Hubbard <jhubbard@nvidia.com> Tested-by: Eliot Courtney <ecourtney@nvidia.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://patch.msgid.link/20260319212658.2541610-3-ttabi@nvidia.com
[ Rebase onto Coherent<T> changes; remove unnecessary turbofish from
cast(). - Danilo ] Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Vignesh Raman [Tue, 10 Feb 2026 07:11:32 +0000 (12:41 +0530)]
drm/ci: uprev mesa
Uprev mesa to adapt to the latest changes in Mesa CI, such as:
- LAVA overlay-based firmware handling
- Container/job rule separation
- Removal of the python-artifacts job
- Use lava-job-submitter container to submit jobs
- Use of the Alpine container for LAVA jobs
- Various other CI improvements
- Remove bare-metal jobs and disable apq8016 and apq8096 jobs,
as these have been migrated to the Collabora LAVA farm
- Fix issues with rebase with external fixes branch
- Update expectation files
Signed-off-by: Vignesh Raman <vignesh.raman@collabora.com> Reviewed-by: Daniel Stone <daniels@collabora.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Co-developed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Vignesh Raman [Tue, 10 Feb 2026 07:11:31 +0000 (12:41 +0530)]
drm/ci: i915: cml: update runner tag
asus-C436FA-Flip-hatch has fewer devices available in the LAVA lab and
drm-ci uses only 2 DUTs, causing tests to time out. Update drm-ci to
use puff instead of hatch so the tests can run on 5 DUTs.
Also increase parallel count for amly jobs to 3.
Signed-off-by: Vignesh Raman <vignesh.raman@collabora.com> Reviewed-by: Daniel Stone <daniels@collabora.com> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Vignesh Raman [Tue, 10 Feb 2026 07:11:30 +0000 (12:41 +0530)]
drm/ci: reduce sm8350-hdk parallel jobs from 4 to 2
The sm8350-hdk jobs are short: each test takes around 2–3 minutes and
the full job completes in about 10 minutes. Running 4 parallel jobs uses
4 devices at once, which is not needed. Set parallel to 2 to reduce
device usage.
Signed-off-by: Vignesh Raman <vignesh.raman@collabora.com> Reviewed-by: Daniel Stone <daniels@collabora.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
bpf: Fix variable length stack write over spilled pointers
Scrub slots if variable-offset stack write goes over spilled pointers.
Otherwise, is_spilled_reg() may be true while spilled_ptr.type == NOT_INIT,
and a valid program is rejected by check_stack_read_fixed_off()
with an obscure "invalid size of register fill" message.
Timur Tabi [Thu, 19 Mar 2026 21:26:53 +0000 (16:26 -0500)]
rust: device: add device name method
Add a name() method to the `Device` type, which returns a CStr that
contains the device name.
Signed-off-by: Timur Tabi <ttabi@nvidia.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Alexandre Courbot <acourbot@nvidia.com> Reviewed-by: Gary Guo <gary@garyguo.net> Tested-by: John Hubbard <jhubbard@nvidia.com> Tested-by: Eliot Courtney <ecourtney@nvidia.com> Link: https://patch.msgid.link/20260319212658.2541610-2-ttabi@nvidia.com Signed-off-by: Danilo Krummrich <dakr@kernel.org>
avivdaum [Sun, 15 Mar 2026 22:24:04 +0000 (00:24 +0200)]
fat: add KUnit tests for timestamp conversion helpers
Extend fat_test with coverage for FAT timestamp edge cases that are not
currently exercised.
Add tests for fat_time_unix2fat() clamping at the UTC boundaries and
when time_offset pushes an otherwise valid timestamp outside the FAT
date range. Also cover the NULL time_cs path used by existing callers,
and verify fat_truncate_atime() truncation across timezone offsets.
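The clamping behaviour under test can be modelled in user space. The
following Python sketch is not the kernel helper: it assumes that
time_offset is given in minutes and is added to the UTC timestamp, and
that out-of-range values clamp to 1980-01-01 00:00:00 and 2107-12-31
23:59:58. The function name unix2fat and its return shape are
illustrative.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# (date, time) pairs in the on-disk FAT bit layout.
FAT_MIN = ((0 << 9) | (1 << 5) | 1, 0)
FAT_MAX = ((127 << 9) | (12 << 5) | 31, (23 << 11) | (59 << 5) | 29)


def unix2fat(ts, time_offset_min=0):
    """Return (date, time) as 16-bit FAT fields for a Unix timestamp in seconds."""
    local = EPOCH + timedelta(seconds=ts, minutes=time_offset_min)
    if local.year < 1980:
        return FAT_MIN  # before the FAT epoch: clamp to the earliest date
    if local.year > 2107:
        return FAT_MAX  # past the 7-bit year field: clamp to the latest date
    date = ((local.year - 1980) << 9) | (local.month << 5) | local.day
    time = (local.hour << 11) | (local.minute << 5) | (local.second >> 1)
    return date, time
```

A timestamp of exactly 1980-01-01 00:00:00 UTC is valid with a zero
offset, but a time_offset of -60 minutes pushes it into 1979, so the
result clamps to FAT_MIN. That is the "otherwise valid timestamp" case
the new tests cover.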
Shawn Lin [Thu, 12 Mar 2026 01:11:53 +0000 (09:11 +0800)]
arm64: dts: rockchip: Add mphy reset to ufshc node
The mphy reset signal is used to reset the physical adapter. Resetting
other components while leaving the mphy unreset may occasionally prevent
the UFS controller from successfully linking up with the device.
This addresses an intermittent hardware bug where the UFS link fails to
establish under specific timing conditions with certain chips. While
difficult to reproduce initially, this issue was consistently observed in
downstream testing and requires explicit mphy reset control for full
stability.
Anand Moon [Mon, 16 Mar 2026 07:33:55 +0000 (13:03 +0530)]
arm64: dts: rockchip: Enable PCIe CLKREQ# for RK3588 on Rock 5b-5bp-5t series
Add supports-clkreq and the corresponding pinmux configurations for PCIe
ASPM L1 substates on the Rock 5B, 5B+ and 5T.
The supports-clkreq flag informs the PCIe controller that the hardware
routing for the CLKREQ# sideband signal is present. This enables support
for PCIe ASPM (Active State Power Management) L1 substates, allowing for
better power efficiency.
arm64: dts: rockchip: add SD/eMMC aliases for ArmSom Sige5
Provide aliases for the SD and eMMC interfaces, so that the operating
system can assign stable interface names.
On Linux this is only relevant when booting without partition UUID
based root device identification, e.g. when booting without an
initramfs. In that case booting with e.g. root=/dev/mmcblk0p2 is
unreliable without this patch as the device numbers change based
on device probe order.
arm64: dts: rockchip: Add SPDIF nodes to RK3576 device tree
Add support for all six SPDIF transmitters found in the RK3576.
The nodes have been taken over from the BSP kernel and checked
against the TRM (power domain descriptions from chapter 6.3.2,
addresses from "Table 1-1 Address Mapping", interrupt from
"Table 1-3 RK3576 Interrupt Connection List" (TRM numbers are
off by 32 due to SGI/PPI not being numbered separately)). The
TRM lacks a proper clock tree, but fortunately the clocks are quite
obvious for the SPDIF IP.
Note that the RK3576 also has 3 SPDIF receivers, which need their
own binding and are not handled in this patch.
A typical use case for the SPDIF transmitters is audio support for
the DisplayPort (DP) controller. DP requires inserting PCUV control
bits, which requires software support when using I2S. The SPDIF IP
can add it automatically and thus is preferred.
Gray Huang [Tue, 17 Mar 2026 09:07:31 +0000 (17:07 +0800)]
arm64: dts: rockchip: Add Khadas Edge 2L board
The Khadas Edge 2L is a single board computer based on the Rockchip
RK3576 SoC.
Add basic device tree support for this board. Currently, only eMMC
and UART are enabled, allowing the board to boot into a basic Linux
system via the serial console.
Shrikanth Hegde [Mon, 23 Mar 2026 19:36:30 +0000 (01:06 +0530)]
timers: Get this_cpu once while clearing the idle state
Calling smp_processor_id():
- With CONFIG_DEBUG_PREEMPT=y, if preemption/irqs are disabled, it does
not print any warning.
- With CONFIG_DEBUG_PREEMPT=n, it doesn't do anything apart from getting
__smp_processor_id
So with either CONFIG_DEBUG_PREEMPT=y or =n, in a preemption-disabled
section it is better to cache the value. It saves a few cycles. Though
tiny, repeated calls add up.
timer_clear_idle() is called with interrupts disabled. So cache the value
once.
David Carlier [Fri, 20 Mar 2026 07:26:45 +0000 (07:26 +0000)]
bpf: Use RCU-safe iteration in dev_map_redirect_multi() SKB path
The DEVMAP_HASH branch in dev_map_redirect_multi() uses
hlist_for_each_entry_safe() to iterate hash buckets, but this function
runs under RCU protection (called from xdp_do_generic_redirect_map()
in softirq context). Concurrent writers (__dev_map_hash_update_elem,
dev_map_hash_delete_elem) modify the list using RCU primitives
(hlist_add_head_rcu, hlist_del_rcu).
hlist_for_each_entry_safe() performs plain pointer dereferences without
rcu_dereference(), missing the acquire barrier needed to pair with
writers' rcu_assign_pointer(). On weakly-ordered architectures (ARM64,
POWER), a reader can observe a partially-constructed node. It also
defeats CONFIG_PROVE_RCU lockdep validation and KCSAN data-race
detection.
Replace with hlist_for_each_entry_rcu() using rcu_read_lock_bh_held()
as the lockdep condition, consistent with the rcu_dereference_check()
used in the DEVMAP (non-hash) branch of the same functions. Also fix
the same incorrect lockdep_is_held(&dtab->index_lock) condition in
dev_map_enqueue_multi(), where the lock is not held either.
Kit Dallege [Sun, 15 Mar 2026 17:09:41 +0000 (18:09 +0100)]
entry: Add missing kernel-doc for arch_ptrace_report_syscall functions
Document @regs and @step parameters for arch_ptrace_report_syscall_entry()
and arch_ptrace_report_syscall_exit() that were missing from the kernel-doc
comments.
Nylon Chen [Mon, 2 Mar 2026 03:04:34 +0000 (19:04 -0800)]
selftests/futex: Conditionally include libnuma support
Use LIBNUMA_TEST to conditionally add -lnuma to LDLIBS.
Guard numa header includes with #ifdef LIBNUMA_VER_SUFFICIENT
to allow compilation without libnuma installed.
selftests/bpf: verify_pkcs7_sig: Use 'struct module_signature' from the UAPI headers
Now that the UAPI headers provide the required definitions, use those.
Some symbols have been renamed, adapt to those.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: Petr Pavlu <petr.pavlu@suse.com> Reviewed-by: Nicolas Schier <nsc@kernel.org> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
sign-file: use 'struct module_signature' from the UAPI headers
Now that the UAPI headers provide the required definitions, use those.
Some symbols have been renamed, adapt to those.
Also adapt the include path for the custom sign-file rule in the
bpf selftests.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: Petr Pavlu <petr.pavlu@suse.com> Reviewed-by: Nicolas Schier <nsc@kernel.org> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
This header is going to be used from scripts/sign-file.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: Petr Pavlu <petr.pavlu@suse.com> Reviewed-by: Nicolas Schier <nsc@kernel.org> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
This structure definition is used outside the kernel proper.
For example in kmod and the kernel build environment.
To allow reuse, move it to a new UAPI header.
While it is not a true UAPI, it is a common practice to have
non-UAPI interface definitions in the kernel's UAPI headers.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: Petr Pavlu <petr.pavlu@suse.com> Reviewed-by: Nicolas Schier <nsc@kernel.org> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
module: Give MODULE_SIG_STRING a more descriptive name
The purpose of the constant is not entirely clear from its name.
As this constant is going to be exposed in a UAPI header, give it a more
specific name for clarity. As all its users call it 'marker', use that
wording in the constant itself.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
module: Give 'enum pkey_id_type' a more specific name
This enum originates in generic cryptographic code and has a very
generic name. Nowadays it is only used for module signatures.
As this enum is going to be exposed in a UAPI header, give it a more
specific name for clarity and consistency.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Remove the unused enum values. As this enum is used in on-disk data,
preserve the numeric value.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
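As an illustration of preserving the on-disk value while dropping unused enumerators, consider the sketch below. The OLD_ names mirror the order of the existing 'enum pkey_id_type' with a prefix added here to avoid clashes; 'enum sig_id_type' and SIG_ID_PKCS7 are hypothetical names chosen for this example and are not the names used by the patch.

```c
/* Previous layout: values are implicit, so PKCS7 happens to be 2. */
enum old_pkey_id_type {
	OLD_PKEY_ID_PGP,	/* 0, unused */
	OLD_PKEY_ID_X509,	/* 1, unused */
	OLD_PKEY_ID_PKCS7,	/* 2, stored in signed modules */
};

/*
 * Hypothetical trimmed enum: once the unused entries are gone, the
 * remaining one needs an explicit initializer or it would become 0.
 */
enum sig_id_type {
	SIG_ID_PKCS7 = 2,
};

/* Compile-time check that existing signed modules still parse. */
_Static_assert((int)SIG_ID_PKCS7 == (int)OLD_PKEY_ID_PKCS7,
	       "on-disk value must not change");
```

The explicit "= 2" is the whole point: without it, removing the first two enumerators would silently renumber the value that is written into signed modules.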
extract-cert: drop unused definition of PKEY_ID_PKCS7
This definition duplicates a definition from an internal kernel header
which is going to be renamed.
To get rid of an instance of the old name, drop the definition.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Jiaqi Yan [Wed, 4 Feb 2026 21:47:41 +0000 (21:47 +0000)]
fs: hugetlb: simplify remove_inode_hugepages() return type
When remove_inode_hugepages() was introduced in commit c86272287bc6
("hugetlb: create remove_inode_single_folio to remove single file folio")
it used to return a boolean to indicate if it bailed out due to race with
page faults. However, since the race is already solved by [1],
remove_inode_hugepages() doesn't have any path to return false anymore.
Simplify remove_inode_hugepages() return type to void, remove the
unnecessary ret variable, and adjust the call site in
remove_inode_hugepages(). No functional change in this commit.
Altan Hacigumus [Wed, 4 Feb 2026 03:35:53 +0000 (19:35 -0800)]
mm/shrinker: fix refcount leak in shrink_slab_memcg()
When kmem is disabled for memcg, slab-backed shrinkers are skipped.
However, shrink_slab_memcg() doesn't drop the reference acquired via
shrinker_try_get() before continuing.
Add the missing shrinker_put().
Also, since memcg_kmem_online() and shrinker flags cannot change
dynamically, remove the shrinker from the bitmap to avoid unnecessary
future scans.
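The leak pattern and its fix can be sketched in userspace as follows. This is a simplified model, not the kernel code: struct toy_shrinker, toy_try_get(), toy_put() and scan_shrinkers() are hypothetical stand-ins for the shrinker, shrinker_try_get(), shrinker_put() and the loop in shrink_slab_memcg(), and the bitmap clearing is left out.

```c
/* Toy stand-in for a shrinker with a plain integer reference count. */
struct toy_shrinker {
	int refcount;
	int slab_backed;
};

/* Take a reference; fails if the object is already being torn down. */
static int toy_try_get(struct toy_shrinker *s)
{
	if (s->refcount <= 0)
		return 0;
	s->refcount++;
	return 1;
}

static void toy_put(struct toy_shrinker *s)
{
	s->refcount--;
}

/* Returns how many shrinkers were actually scanned. */
static int scan_shrinkers(struct toy_shrinker *v, int n, int kmem_online)
{
	int scanned = 0;

	for (int i = 0; i < n; i++) {
		if (!toy_try_get(&v[i]))
			continue;
		if (!kmem_online && v[i].slab_backed) {
			/* The fix: drop the reference before skipping. */
			toy_put(&v[i]);
			continue;
		}
		scanned++;	/* the real code would run the shrinker here */
		toy_put(&v[i]);
	}
	return scanned;
}
```

Without the toy_put() on the skip path, every pass over a slab-backed shrinker would leave its count one higher than before.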
qinyu [Tue, 3 Feb 2026 09:54:00 +0000 (17:54 +0800)]
mm/damon/ops-common: remove redundant mmu notifier call in pmdp mkold
Currently, mmu_notifier_clear_young() is called immediately after
pmdp_clear_young_notify(), which already calls mmu_notifier_clear_young()
internally. This results in a redundant notifier call.
Replace pmdp_clear_young_notify() with the non-notify variant to avoid the
duplicate call and make the pmdp path consistent with the corresponding
ptep_mkold() code.
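The duplicate call can be modelled with a counter. All names below are hypothetical toy stand-ins: toy_clear_young() models the non-notify variant, toy_clear_young_notify() models pmdp_clear_young_notify(), and toy_notifier_clear_young() models mmu_notifier_clear_young(). The real helpers also combine the notifier's return value with the young bit, which this sketch ignores.

```c
static int notifier_calls;

/* Stand-in for the notifier; only counts how often it is invoked. */
static void toy_notifier_clear_young(void)
{
	notifier_calls++;
}

/* Non-notify variant: clears the young bit, returns the old value. */
static int toy_clear_young(int *young)
{
	int was_young = *young;

	*young = 0;
	return was_young;
}

/* Notify variant: clears the bit and calls the notifier itself. */
static int toy_clear_young_notify(int *young)
{
	int was_young = toy_clear_young(young);

	toy_notifier_clear_young();
	return was_young;
}

/* Before: the notify variant plus an explicit call, two notifications. */
static int mkold_before(int *young)
{
	int was_young = toy_clear_young_notify(young);

	toy_notifier_clear_young();
	return was_young;
}

/* After: the non-notify variant plus one explicit call. */
static int mkold_after(int *young)
{
	int was_young = toy_clear_young(young);

	toy_notifier_clear_young();
	return was_young;
}
```

Both versions clear the bit and return the same value; only the number of notifier invocations differs.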
Shengming Hu [Thu, 29 Jan 2026 14:38:14 +0000 (22:38 +0800)]
mm/page_alloc: avoid overcounting bulk alloc in watermark check
alloc_pages_bulk_noprof() only fills NULL slots and already tracks how
many entries are pre-populated via nr_populated.
The fast watermark check was adding nr_pages unconditionally, which can
overestimate the demand. Use (nr_pages - nr_populated) instead, as an
upper bound on the remaining pages this call can still allocate without
scanning the whole array.
Link: https://lkml.kernel.org/r/tencent_F36C5B5FB4DED98C79D9BDEE1210CD338C06@qq.com
Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
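The arithmetic behind the change can be sketched as below. This is a simplified model under stated assumptions: bulk_watermark_demand() and watermark_ok() are hypothetical helpers, and the real fast watermark check takes more inputs than a single free-page count.

```c
/*
 * Upper bound on pages this call may still allocate: slots that are
 * already populated are never refilled.
 */
static unsigned long bulk_watermark_demand(unsigned long nr_pages,
					   unsigned long nr_populated)
{
	return nr_pages > nr_populated ? nr_pages - nr_populated : 0;
}

/* Simplified check: enough free pages for the watermark plus demand? */
static int watermark_ok(unsigned long free_pages, unsigned long mark,
			unsigned long demand)
{
	return free_pages >= mark + demand;
}
```

For example, with a 16-entry array of which 12 entries are already populated, a zone with 110 free pages and a watermark of 100 fails the check when all 16 pages are counted (116 needed) but passes once only the remaining 4 are counted (104 needed).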
Kairui Song [Mon, 16 Feb 2026 14:58:02 +0000 (22:58 +0800)]
mm, swap: speed up hibernation allocation and writeout
Since commit 0ff67f990bd4 ("mm, swap: remove swap slot cache"),
hibernation has been using the swap slot slow allocation path for
simplicity. This turns out to cause a regression on some devices
because the allocator now rotates clusters too often, leading to slower
allocation and a more random distribution of data.
Fast allocation is not complex, so implement hibernation support as well.
Test result with Samsung SSD 830 Series (SATA II, 3.0 Gbps) shows the
performance is several times better [1]:
6.19: 324 seconds
After this series: 35 seconds
Link: https://lkml.kernel.org/r/20260216-hibernate-perf-v4-1-1ba9f0bf1ec9@tencent.com
Link: https://lore.kernel.org/linux-mm/8b4bdcfa-ce3f-4e23-839f-31367df7c18f@gmx.de/
Signed-off-by: Kairui Song <kasong@tencent.com>
Fixes: 0ff67f990bd4 ("mm, swap: remove swap slot cache")
Reported-by: Carsten Grohmann <mail@carstengrohmann.de>
Closes: https://lore.kernel.org/linux-mm/20260206121151.dea3633d1f0ded7bbf49c22e@linux-foundation.org/
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Guoniu Zhou [Fri, 5 Dec 2025 09:07:46 +0000 (17:07 +0800)]
media: imx8mq-mipi-csi2: Add support for i.MX8ULP
The CSI-2 receiver in i.MX8ULP is almost the same as in i.MX8QXP/QM except
for clocks and resets, so add a compatible string for i.MX8ULP to handle
the difference and reuse the platform data of i.MX8QXP/QM.
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Guoniu Zhou <guoniu.zhou@nxp.com>
Link: https://patch.msgid.link/20251205-csi2_imx8ulp-v10-4-190cdadb20a3@nxp.com
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Guoniu Zhou [Fri, 5 Dec 2025 09:07:45 +0000 (17:07 +0800)]
media: imx8mq-mipi-csi2: Explicitly release reset
Call reset_control_deassert() to explicitly release the reset and make sure
the reset bits are cleared, since platforms like i.MX8ULP can't clear the
reset bits automatically.
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Guoniu Zhou <guoniu.zhou@nxp.com>
Link: https://patch.msgid.link/20251205-csi2_imx8ulp-v10-3-190cdadb20a3@nxp.com
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
The CSI-2 receiver in the i.MX8ULP is almost identical to the version
present in the i.MX8QXP/QM, but the i.MX8ULP CSI-2 controller needs pclk
as the input clock for the APB interface of its Control and Status
Registers (CSR). So add the compatible string fsl,imx8ulp-mipi-csi2 and
increase maxItems of clocks (clock-names) from 3 to 4, while keeping the
same restriction for the existing compatibles.
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Guoniu Zhou <guoniu.zhou@nxp.com>
Link: https://patch.msgid.link/20251205-csi2_imx8ulp-v10-1-190cdadb20a3@nxp.com
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Guoniu Zhou [Wed, 5 Nov 2025 05:55:12 +0000 (13:55 +0800)]
media: nxp: imx8-isi: Add ISI support for i.MX95
The ISI module on i.MX95 supports up to eight channels and four link
sources to obtain the image data for processing in its pipelines. It
can process up to eight image sources at the same time.
Add ISI basic functions support for i.MX95.
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Guoniu Zhou <guoniu.zhou@nxp.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Link: https://patch.msgid.link/20251105-isi_imx95-v3-3-3987533cca1c@nxp.com
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Guoniu Zhou [Wed, 5 Nov 2025 05:55:11 +0000 (13:55 +0800)]
media: nxp: imx8-isi: Keep the default value for BLANK_PXL field
The field BLANK_PXL provides the value of the blank pixel to be inserted
in the image in case an overflow error occurs in the output buffers of
the channel. Its default value is 0xff, so there is no need to set it again.
Besides, the field only exists in the i.MX8QM/XP ISI version. Other versions,
like the i.MX 8M series, removed the field and mark BLANK_PXL as reserved,
since they won't send data to the AXI bus when an overflow error occurs. The
i.MX9 series uses it for other purposes.
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Guoniu Zhou <guoniu.zhou@nxp.com>
Link: https://patch.msgid.link/20251105-isi_imx95-v3-2-3987533cca1c@nxp.com
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
The ISI module on i.MX95 supports up to eight channels and four link
sources to obtain the image data for processing in its pipelines. It
can process up to eight image sources at the same time.
Guoniu Zhou [Thu, 12 Mar 2026 03:12:34 +0000 (11:12 +0800)]
media: nxp: imx8-isi: Reduce minimum queued buffers from 2 to 0
Fix a hang issue when capturing a single frame with applications like cam
in libcamera. It would hang waiting for the driver to complete the buffer,
but streaming never starts because min_queued_buffers was set to 2.
The ISI module uses a ping-pong buffer mechanism that requires two buffers
to be programmed at all times. However, when fewer than 2 user buffers are
available, the driver uses internal discard buffers to fill the remaining
slot(s). Reducing the minimum queued buffers from 2 to 0 allows streaming to
start without any queued buffers.
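The slot-filling behaviour described above can be sketched as follows. This is a toy model and not the driver code: PINGPONG_SLOTS, enum slot_source and program_slots() are hypothetical names introduced for illustration.

```c
#define PINGPONG_SLOTS 2

enum slot_source {
	SLOT_USER_BUFFER,
	SLOT_DISCARD_BUFFER,
};

/*
 * Program both hardware slots. User buffers are used first; any slot
 * left over is backed by an internal discard buffer, so the hardware
 * always has two buffers even when userspace has queued none.
 * Returns the number of discard buffers used.
 */
static int program_slots(int queued_user_buffers,
			 enum slot_source slots[PINGPONG_SLOTS])
{
	int discards = 0;

	for (int i = 0; i < PINGPONG_SLOTS; i++) {
		if (i < queued_user_buffers) {
			slots[i] = SLOT_USER_BUFFER;
		} else {
			slots[i] = SLOT_DISCARD_BUFFER;
			discards++;
		}
	}
	return discards;
}
```

Because both slots can always be filled, nothing in this model requires a minimum number of queued user buffers before streaming starts, which is the situation the single-frame capture case relies on.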
Stefan Klug [Wed, 4 Mar 2026 15:50:25 +0000 (16:50 +0100)]
media: dw100: Merge dw100_device_run and dw100_start
The dw100_start() function is only called from dw100_device_run(). As
both functions are not too big, move the code directly into
dw100_device_run() and drop dw100_start() to improve readability.