Kan Liang [Fri, 27 Mar 2026 05:28:44 +0000 (13:28 +0800)]
perf/x86/msr: Make SMI and PPERF on by default
The MSRs, SMI_COUNT and PPERF, are model-specific MSRs. A very long
CPU ID list is maintained to indicate the supported platforms. With more
and more platforms being introduced, new CPU IDs keep having to be added.
Also, old kernels have to be updated to pick up each new CPU ID.
The MSRs were introduced a long time ago. There is no plan to
change them in the near future. Furthermore, the current code utilizes
rdmsr_safe() to check the availability of the MSRs before using them.
Make them on by default. It should be good enough to only rely on the
rdmsr_safe() to check their availability for both existing and future
platforms.
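The probing idea can be illustrated with a small user-space model. This is only a sketch under stated assumptions: fake_rdmsr_safe() and msr_event_available() are hypothetical stand-ins written for this example, not the kernel's rdmsr_safe() or the driver's code, and the MSR numbers are used purely as illustrative keys.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the hardware: MSR numbers this model "implements". */
static const uint32_t fake_implemented_msrs[] = { 0x34, 0x64e };

/*
 * Hypothetical model of a fault-tolerant MSR read: returns 0 and stores a
 * value on success, or -EIO when the MSR does not exist on this "CPU".
 */
static int fake_rdmsr_safe(uint32_t msr, uint64_t *val)
{
	size_t n = sizeof(fake_implemented_msrs) / sizeof(fake_implemented_msrs[0]);

	for (size_t i = 0; i < n; i++) {
		if (fake_implemented_msrs[i] == msr) {
			*val = 0;
			return 0;
		}
	}
	return -EIO;
}

/* Availability is decided by probing alone, with no CPU model list. */
static bool msr_event_available(uint32_t msr)
{
	uint64_t val;

	return fake_rdmsr_safe(msr, &val) == 0;
}
```

With this shape, a future platform needs no table update: the probe either succeeds or fails on its own.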
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260327052844.818218-1-dapeng1.mi@linux.intel.com
Vincent Guittot [Tue, 31 Mar 2026 16:23:52 +0000 (18:23 +0200)]
sched/fair: Prevent negative lag increase during delayed dequeue
The delayed dequeue feature aims to reduce the negative lag of a dequeued
task while it sleeps, but it can happen that newly enqueued tasks move
the avg vruntime backward and increase its negative lag.
When the delayed dequeued task wakes up, it has more negative lag compared
to being dequeued immediately or to other tasks that have been
dequeued just before these new enqueues.
Ensure that the negative lag of a delayed dequeued task doesn't
increase during its delayed dequeued phase while waiting for its negative
lag to disappear. Similarly, remove any positive lag that the
delayed dequeued task could have gained during this period.
Short slice tasks are particularly impacted in an overloaded system.
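The rule described above amounts to clamping the recomputed lag into the range [old lag, 0]. A minimal sketch, assuming the lag recorded at the start of the delayed dequeue phase is non-positive; clamp_delayed_lag() is a hypothetical helper for illustration and not the scheduler's actual code:

```c
#include <stdint.h>

/*
 * Hypothetical helper: old_lag is the (non-positive) lag recorded when the
 * task entered delayed dequeue, new_lag is the freshly recomputed lag.
 */
static int64_t clamp_delayed_lag(int64_t old_lag, int64_t new_lag)
{
	if (new_lag < old_lag)
		return old_lag;	/* negative lag must not grow */
	if (new_lag > 0)
		return 0;	/* drop positive lag gained in the meantime */
	return new_lag;		/* lag shrank toward zero: keep it */
}
```

For example, a task that entered the phase with a lag of -10 keeps -10 if new enqueues would push it to -20, keeps -5 if its lag shrank to -5, and ends at 0 if it would otherwise have become positive.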
Vincent Guittot [Fri, 27 Mar 2026 13:20:13 +0000 (14:20 +0100)]
sched/fair: Use sched_energy_enabled()
Use helper sched_energy_enabled() everywhere we want to test if EAS is
enabled instead of mixing sched_energy_enabled() and direct call to
static_branch_unlikely().
Add logic to handle migrating a blocked waiter to a remote
cpu where the lock owner is runnable.
Additionally, as the blocked task may not be able to run
on the remote cpu, add logic to handle return migration once
the waiting task is given the mutex.
Because tasks may get migrated to where they cannot run, also
modify the scheduling classes to avoid sched class migrations on
mutex blocked tasks, leaving find_proxy_task() and related logic
to do the migrations and return migrations.
This was split out from the larger proxy patch, and
significantly reworked.
Credits for the original patch go to:
Peter Zijlstra (Intel) <peterz@infradead.org>
Juri Lelli <juri.lelli@redhat.com>
Valentin Schneider <valentin.schneider@arm.com>
Connor O'Brien <connoro@google.com>
John Stultz [Tue, 24 Mar 2026 19:13:24 +0000 (19:13 +0000)]
sched: Move attach_one_task and attach_task helpers to sched.h
The fair scheduler locally introduced attach_one_task() and
attach_task() helpers, but these could be generically useful so
move this code to sched.h so we can use them elsewhere.
One minor tweak was made to utilize guard(rq_lock)(rq) to simplify
the function.
Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://patch.msgid.link/20260324191337.1841376-10-jstultz@google.com
John Stultz [Tue, 24 Mar 2026 19:13:23 +0000 (19:13 +0000)]
sched: Add logic to zap balance callbacks if we pick again
With proxy-exec, a task is selected to run via pick_next_task(),
and then if it is a mutex blocked task, we call find_proxy_task()
to find a runnable owner. If the runnable owner is on another
cpu, we will need to migrate the selected donor task away, after
which we will pick_again and call pick_next_task() to choose
something else.
However, in the first call to pick_next_task(), we may have
had a balance_callback set up by the class scheduler. After we
pick again, it's possible pick_next_task_fair() will be called
which calls sched_balance_newidle() and sched_balance_rq().
This is because if an RT task was originally picked, it will
set up the rq->balance_callback with push_rt_tasks() via
set_next_task_rt().
Once the task is migrated away and we pick again, we haven't
processed any balance callbacks, so rq->balance_callback is not
in the same state as it was the first time pick_next_task was
called.
To handle this, add a zap_balance_callbacks() helper function
which cleans up the balance callbacks without running them. This
should be ok, as we are effectively undoing the state set in
the first call to pick_next_task(), and when we pick again,
the new callback can be configured for the donor task actually
selected.
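The idea of dropping queued callbacks without running them can be sketched with a simplified singly linked list. The types and queue_balance_callback() below are stand-ins invented for this example, and the body of zap_balance_callbacks() is an assumed model of the behavior described above, not the kernel implementation:

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumptions). */
struct callback_head {
	struct callback_head *next;
	void (*func)(void *arg);
};

struct runqueue {
	struct callback_head *balance_callback;
};

/* Queue a callback at the head of the runqueue's list. */
static void queue_balance_callback(struct runqueue *rq, struct callback_head *cb)
{
	cb->next = rq->balance_callback;
	rq->balance_callback = cb;
}

/* Unlink every queued callback without invoking any of them. */
static void zap_balance_callbacks(struct runqueue *rq)
{
	struct callback_head *cb = rq->balance_callback;

	while (cb) {
		struct callback_head *next = cb->next;

		cb->next = NULL;	/* mark as no longer queued */
		cb = next;
	}
	rq->balance_callback = NULL;
}
```

After the zap, the list is in the same empty state it had before the first pick, so the second pick can queue whatever callback fits the task it actually selects.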
John Stultz [Tue, 24 Mar 2026 19:13:22 +0000 (19:13 +0000)]
sched: Add assert_balance_callbacks_empty helper
With proxy-exec utilizing pick-again logic, we can end up having
balance callbacks set by the previous pick_next_task() call left
on the list.
So pull the warning out into a helper function, and make sure we
check it when we pick again.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://patch.msgid.link/20260324191337.1841376-8-jstultz@google.com
John Stultz [Tue, 24 Mar 2026 19:13:21 +0000 (19:13 +0000)]
sched/locking: Add special p->blocked_on==PROXY_WAKING value for proxy return-migration
As we add functionality to proxy execution, we may migrate a
donor task to a runqueue where it can't run due to cpu affinity.
Thus, we must be careful to ensure we return-migrate the task
back to a cpu in its cpumask when it becomes unblocked.
Peter helpfully provided the following example with pictures:
"Suppose we have a ww_mutex cycle:
Where Task-A holds Mutex-1 and tries to acquire Mutex-2, and
where Task-B holds Mutex-2 and tries to acquire Mutex-1.
Then the blocked_on->owner chain will go in circles.
Task-A -> Mutex-2
^ |
| v
Mutex-1 <- Task-B
We need two things:
- find_proxy_task() to stop iterating the circle;
- the woken task to 'unblock' and run, such that it can
back-off and re-try the transaction.
Now, the current code [without this patch] does:
__clear_task_blocked_on();
wake_q_add();
And surely clearing ->blocked_on is sufficient to break the
cycle.
Suppose it is Task-B that is made to back-off, then we have:
Task-A -> Mutex-2 -> Task-B (no further blocked_on)
and it would attempt to run Task-B. Or worse, it could directly
pick Task-B and run it, without ever getting into
find_proxy_task().
Now, here is a problem because Task-B might not be runnable on
the CPU it is currently on; and because !task_is_blocked() we
don't get into the proxy paths, so nobody is going to fix this
up.
Ideally we would have dequeued Task-B alongside of clearing
->blocked_on, but alas, [the lock ordering prevents us from
getting the task_rq_lock() and] spoils things."
Thus we need more than just a binary concept of the task being
blocked on a mutex or not.
So allow setting blocked_on to PROXY_WAKING as a special value
which specifies the task is no longer blocked, but needs to
be evaluated for return migration *before* it can be run.
This will then be used in a later patch to handle proxy
return-migration.
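The three-state idea can be sketched as follows. This is an illustration only: the struct names, the sentinel encoding, and the helpers are assumptions made for this example, and the real PROXY_WAKING value may well be encoded differently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins (assumptions, not the kernel types). */
struct mutex_model { int unused; };

struct task_model {
	struct mutex_model *blocked_on;
};

/* Assumed encoding: a sentinel address that can never be a real mutex. */
static struct mutex_model proxy_waking_sentinel;
#define PROXY_WAKING_MODEL (&proxy_waking_sentinel)

enum blocked_state {
	TASK_NOT_BLOCKED,	/* blocked_on == NULL */
	TASK_BLOCKED_ON_MUTEX,	/* blocked_on points at a real mutex */
	TASK_PROXY_WAKING,	/* unblocked, but needs a return-migration check */
};

static enum blocked_state classify(const struct task_model *p)
{
	if (!p->blocked_on)
		return TASK_NOT_BLOCKED;
	if (p->blocked_on == PROXY_WAKING_MODEL)
		return TASK_PROXY_WAKING;
	return TASK_BLOCKED_ON_MUTEX;
}

/* Only a fully unblocked task may be run without visiting the proxy paths. */
static bool may_run_directly(const struct task_model *p)
{
	return classify(p) == TASK_NOT_BLOCKED;
}
```

The point of the third state is that a woken task still looks "special" to the scheduler, so it gets routed through the return-migration check instead of being run on a CPU outside its affinity mask.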
John Stultz [Tue, 24 Mar 2026 19:13:20 +0000 (19:13 +0000)]
sched: Fix modifying donor->blocked_on without proper locking
Introduce an action enum in find_proxy_task() which allows
us to handle work needed to be done outside the mutex.wait_lock
and task.blocked_lock guard scopes.
This ensures proper locking when we clear the donor's blocked_on
pointer in proxy_deactivate(), and the switch statement will be
useful as we add more cases to handle later in this series.
John Stultz [Tue, 24 Mar 2026 19:13:19 +0000 (19:13 +0000)]
locking: Add task::blocked_lock to serialize blocked_on state
So far, we have been able to utilize the mutex::wait_lock
for serializing the blocked_on state, but when we move to
proxying across runqueues, we will need to add more state
and a way to serialize changes to this state in contexts
where we don't hold the mutex::wait_lock.
So introduce the task::blocked_lock, which nests under the
mutex::wait_lock in the locking order, and rework the locking
to use it.
John Stultz [Tue, 24 Mar 2026 19:13:18 +0000 (19:13 +0000)]
sched: Fix potentially missing balancing with Proxy Exec
K Prateek pointed out that with Proxy Exec, we may have cases
where we context switch in __schedule(), while the donor remains
the same. This could cause balancing issues, since the
put_prev_set_next() logic short-cuts if (prev == next). With
proxy-exec prev is the previous donor, and next is the next
donor. Should the donor remain the same, but different tasks are
picked to actually run, the shortcut will have avoided enqueuing
the sched class balance callback.
So, if we are context switching, add logic to catch the
same-donor case, and trigger the put_prev/set_next calls to
ensure the balance callbacks get enqueued.
Closes: https://lore.kernel.org/lkml/20ea3670-c30a-433b-a07f-c4ff98ae2379@amd.com/
Reported-by: K Prateek Nayak <kprateek.nayak@amd.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260324191337.1841376-4-jstultz@google.com
Peter noted: Compilers are really bad at (as in they utterly refuse)
optimizing (even when marked with __pure) the static branch
things, and will happily emit multiple identical branches in a row.
So pull out the one obvious sched_proxy_exec() branch in
__schedule() and remove some of the 'implicit' ones in that
path.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://patch.msgid.link/20260324191337.1841376-3-jstultz@google.com
John Stultz [Tue, 24 Mar 2026 19:13:16 +0000 (19:13 +0000)]
sched: Make class_schedulers avoid pushing current, and get rid of proxy_tag_curr()
With proxy-execution, the scheduler selects the donor, but for
blocked donors, we end up running the lock owner.
This caused some complexity, because the class schedulers make
sure to remove the task they pick from their pushable task
lists, which prevents the donor from being migrated, but there
wasn't then anything to prevent rq->curr from being migrated
if rq->curr != rq->donor.
This was sort of hacked around by calling proxy_tag_curr() on
the rq->curr task if we were running something other than the
donor. proxy_tag_curr() did a dequeue/enqueue pair on the
rq->curr task, allowing the class schedulers to remove it from
their pushable list.
The dequeue/enqueue pair was wasteful, and additionally K Prateek
highlighted that we didn't properly undo things when we stopped
proxying, leaving the lock owner off the pushable list.
After some alternative approaches were considered, Peter
suggested having the RT/DL classes just avoid migrating
when task_on_cpu().
So rework pick_next_pushable_dl_task() and the rt
pick_next_pushable_task() functions so that they skip over the
first pushable task if it is on_cpu.
Then just drop all of the proxy_tag_curr() logic.
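The "skip the first pushable task if it is on_cpu" rule can be sketched over a plain array. The struct and function below are hypothetical models for illustration; the kernel keeps its pushable tasks in different data structures and the real functions have different signatures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a task on a pushable list (assumption). */
struct pushable_task {
	int prio;
	bool on_cpu;	/* true if the task is currently executing on a CPU */
};

/*
 * tasks[] is assumed to be ordered with the best push candidate first.
 * Returns the candidate to push, or NULL if there is none.
 */
static struct pushable_task *pick_next_pushable(struct pushable_task *tasks, size_t n)
{
	size_t i = 0;

	if (n == 0)
		return NULL;
	if (tasks[0].on_cpu)
		i = 1;	/* never try to migrate the running task */
	return i < n ? &tasks[i] : NULL;
}
```

Because the running task is simply skipped at pick time, nothing has to be dequeued and re-enqueued to hide it from the push logic.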
Fixes: be39617e38e0 ("sched: Fix proxy/current (push,pull)ability")
Closes: https://lore.kernel.org/lkml/e735cae0-2cc9-4bae-b761-fcb082ed3e94@amd.com/
Reported-by: K Prateek Nayak <kprateek.nayak@amd.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260324191337.1841376-2-jstultz@google.com
The newly added serdev code fails to link when serdev is turned off:
arm-linux-gnueabi-ld: drivers/power/sequencing/pwrseq-pcie-m2.o: in function `pwrseq_pcie_m2_remove_serdev':
pwrseq-pcie-m2.c:(.text+0xc8): undefined reference to `serdev_device_remove'
arm-linux-gnueabi-ld: drivers/power/sequencing/pwrseq-pcie-m2.o: in function `pwrseq_m2_pcie_notify':
pwrseq-pcie-m2.c:(.text+0x69c): undefined reference to `of_find_serdev_controller_by_node'
arm-linux-gnueabi-ld: pwrseq-pcie-m2.c:(.text+0x6f8): undefined reference to `serdev_device_alloc'
arm-linux-gnueabi-ld: pwrseq-pcie-m2.c:(.text+0x724): undefined reference to `serdev_device_add'
power: sequencing: pcie-m2: enforce PCI and OF dependencies
The driver fails to build when PCI is disabled:
drivers/power/sequencing/pwrseq-pcie-m2.c: In function 'pwrseq_pcie_m2_register_notifier':
drivers/power/sequencing/pwrseq-pcie-m2.c:368:54: error: 'pci_bus_type' undeclared (first use in this function); did you mean 'pci_pcie_type'?
368 | ret = bus_register_notifier(&pci_bus_type, &ctx->nb);
| ^~~~~~~~~~~~
| pci_pcie_type
Similarly, when CONFIG_OF is disabled:
drivers/power/sequencing/pwrseq-pcie-m2.c: In function 'pwrseq_m2_pcie_create_bt_node':
drivers/power/sequencing/pwrseq-pcie-m2.c:191:9: error: implicit declaration of function 'of_changeset_init' [-Wimplicit-function-declaration]
191 | of_changeset_init(ctx->ocs);
| ^~~~~~~~~~~~~~~~~
Make both dependencies unconditional to prevent compile-testing
in either configuration.
Maciej Strozek [Fri, 3 Apr 2026 08:23:35 +0000 (09:23 +0100)]
ASoC: intel: sof_sdw: Prepare for configuration without a jack
In certain cs42l43 setups, the UAJ function may be removed from ACPI and
left physically unconnected. Prepare the driver for that configuration by
setting a system clock in the speaker path too.
Documentation: clarify the mandatory and desirable info for security reports
A significant part of the effort of the security team consists in begging
reporters for patch proposals, or asking them to provide them in regular
format, and most of the time they're willing to provide this, they just
didn't know that it would help. So let's add a section detailing the
required and desirable contents in a security report to help reporters
write more actionable reports which do not require round trips.
Documentation: explain how to find maintainers addresses for security reports
These days, 80% of the work done by the security team consists in
locating the affected subsystem in a report, running get_maintainer.pl on
it, forwarding the report to these persons and responding to the reporter
with them in Cc. This is a huge and unneeded overhead that we must try to
lower for a better overall efficiency. This patch adds a complete section
explaining how to figure the list of recipients to send the report to.
Documentation: minor updates to the security contacts
This clarifies the fact that the bug reporters must use a valid
e-mail address to send their report, and that the security team
assists developers working on a fix but doesn't always produce
fixes on its own.
Mingzhe Zou [Fri, 3 Apr 2026 04:21:35 +0000 (12:21 +0800)]
bcache: fix uninitialized closure object
In the previous patch ("bcache: fix cached_dev.sb_bio use-after-free and
crash"), we adopted a simple modification suggestion from AI to fix the
use-after-free.
But in actual testing, we found an extreme case where the device is
stopped before calling bch_write_bdev_super().
At this point, struct closure sb_write has not been initialized yet.
For this patch, we ensure that sb_bio has been completed via
sb_write_mutex.
Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
Signed-off-by: Coly Li <colyli@fnnas.com>
Link: https://patch.msgid.link/20260403042135.2221247-1-colyli@fnnas.com
Fixes: fec114a98b87 ("bcache: fix cached_dev.sb_bio use-after-free and crash")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
After analyzing the coredump file, we found that the address of
dc->sb_bio has been freed. We know that cached_dev is only freed when it
is stopped.
Because sb_bio is embedded in struct cached_dev rather than allocated for
each write, if the device is stopped while the superblock is being written,
the freed address will be accessed at endio.
This patch waits for sb_write to complete in cached_dev_free().
It should be noted that we analyzed the cause of the problem, then gave
all the details to QWEN and adopted the modifications it made.
Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
Fixes: cafe563591446 ("bcache: A block layer cache")
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Coly Li <colyli@fnnas.com>
Link: https://patch.msgid.link/20260322134102.480107-1-colyli@fnnas.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Replace sprintf() with sysfs_emit() in sysfs show functions.
sysfs_emit() is preferred for formatting sysfs output because it
provides safer bounds checking.
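The difference from an unbounded sprintf() can be illustrated in user space. The helper below is a hypothetical model written for this example, assuming a 4096-byte output buffer; it is not the kernel's sysfs_emit() and only mimics the idea of never writing past one page.

```c
#include <stdarg.h>
#include <stdio.h>

/* Assumed buffer size for this model; sysfs show buffers are one page. */
#define MODEL_PAGE_SIZE 4096

/*
 * Hypothetical bounded formatter: writes at most MODEL_PAGE_SIZE bytes
 * (including the terminating NUL) and returns the number of characters
 * actually stored, excluding the NUL.
 */
static int emit_bounded(char *buf, const char *fmt, ...)
{
	va_list args;
	int len;

	va_start(args, fmt);
	len = vsnprintf(buf, MODEL_PAGE_SIZE, fmt, args);
	va_end(args);

	if (len < 0)
		return 0;
	if (len >= MODEL_PAGE_SIZE)
		return MODEL_PAGE_SIZE - 1;	/* output was truncated */
	return len;
}
```

An oversized value is truncated to fit the buffer instead of overrunning it, which is the safety property the change is after.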
warning: variables can be used directly in the `format!` string
--> rust/macros/module.rs:112:23
|
112 | let content = format!("{param}:{content}", param = param, content = content);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#uninlined_format_args
= note: `-W clippy::uninlined-format-args` implied by `-W clippy::all`
= help: to override `-W clippy::all` add `#[allow(clippy::uninlined_format_args)]`
help: change this to
|
112 - let content = format!("{param}:{content}", param = param, content = content);
112 + let content = format!("{param}:{content}");
warning: variables can be used directly in the `format!` string
--> rust/macros/module.rs:198:14
|
198 | t => panic!("Unsupported parameter type {}", t),
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#uninlined_format_args
= note: `-W clippy::uninlined-format-args` implied by `-W clippy::all`
= help: to override `-W clippy::all` add `#[allow(clippy::uninlined_format_args)]`
help: change this to
|
198 - t => panic!("Unsupported parameter type {}", t),
198 + t => panic!("Unsupported parameter type {t}"),
|
The reason it only triggers in that version is that the lint was moved
from `pedantic` to `style` in Rust 1.88.0 and then back to `pedantic`
in Rust 1.89.0 [2][3].
In the first case, the suggestion is fair and a pure simplification, thus
we will clean it up separately.
To keep the behavior the same across all versions, and since the lint
does not work for all macros (e.g. custom ones like `pr_info!`), disable
it globally.
Alice Ryhl [Thu, 2 Apr 2026 10:55:34 +0000 (10:55 +0000)]
rust_binder: override crate name to rust_binder
The Rust Binder object file is called rust_binder_main.o because the
name rust_binder.o is used for the result of linking together
rust_binder_main.o with rust_binderfs.o and a few others.
However, the crate name is supposed to be rust_binder without a _main
suffix. Thus, override the crate name accordingly.
Alice Ryhl [Thu, 2 Apr 2026 10:55:33 +0000 (10:55 +0000)]
rust: support overriding crate_name
Currently you cannot filter out the crate-name argument
RUSTFLAGS_REMOVE_stem.o because the Rust filter-out invocation does not
include that particular argument. Since --crate-name is an argument that
can't be passed multiple times, this means that it's currently not
possible to override the crate name. Thus, remove the --crate-name
argument for drivers. This allows them to override the crate name using
the #![crate_name] annotation.
This affects symbol names, but has no effect on the filenames of object
files and other things generated by the build, as we always use --emit
with a fixed output filename.
The --crate-name argument is kept for the crates under rust/ for
simplicity and to avoid changing many of them by adding #![crate_name].
The rust analyzer script is updated to use rustc to obtain the crate
name of the driver crates, which picks up the right name whether it is
configured via #![crate_name] or not. For readability, the logic to
invoke 'rustc' is extracted to its own function.
Note that the crate name in the python script is not actually that
important - the only place where the name actually affects anything is
in the 'deps' array which specifies an index and name for each
dependency, and determines what that dependency is called in *this*
crate. (The same crate may be called different things in each
dependency.) Since driver crates are leaf crates, this doesn't apply and
the rustc invocation only affects the 'display_name' parameter.
Acked-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Jesung Yang <y.jems.n@gmail.com>
Acked-by: Tamir Duberstein <tamird@kernel.org>
Link: https://patch.msgid.link/20260402-binder-crate-name-v4-1-ec3919b87909@google.com
[ Applied Python type hints. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Alice Ryhl [Mon, 23 Feb 2026 10:08:26 +0000 (10:08 +0000)]
tyr: remove impl Send/Sync for TyrData
Now that clk implements Send and Sync, we no longer need to manually
implement these traits for TyrData. Thus remove the implementations.
The comment also mentions the regulator. However, the regulator had the
traits added in commit 9a200cbdb543 ("rust: regulator: implement Send
and Sync for Regulator<T>"), which is already in mainline.
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Daniel Almeida <daniel.almeida@collabora.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20260223-clk-send-sync-v5-2-181bf2f35652@google.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Alice Ryhl [Mon, 23 Feb 2026 10:08:25 +0000 (10:08 +0000)]
rust: clk: implement Send and Sync
These traits are required for drivers to embed the Clk type in their own
data structures because driver data structures are usually required to
be Send. Since the Clk type is thread-safe, implement the relevant
traits.
Reviewed-by: Daniel Almeida <daniel.almeida@collabora.com>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Brian Masney <bmasney@redhat.com> # Active contributor to clk
Link: https://patch.msgid.link/20260223-clk-send-sync-v5-1-181bf2f35652@google.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Given that the GPIOs are successfully found a bit later during boot and
the code is intentionally returning -EPROBE_DEFER when they are not
found, downgrade these messages to debug prints to avoid unnecessary
warnings being observed.
Note that although the 'cannot find GPIO line' warning has not been
observed in this case, it seems reasonable to make this print a debug
print for consistency too.
Dave Airlie [Fri, 3 Apr 2026 09:05:46 +0000 (19:05 +1000)]
Merge tag 'drm-misc-fixes-2026-04-02' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
A refcounting fix for bridges, revert a previous framebuffer
use-after-free fix that turned out to be causing more problems, a hang
fix for qaic, an initialization fix for ast, an error handling fix for
sysfb, and a speculation fix for drm_compat_ioctl.
Dave Airlie [Fri, 3 Apr 2026 08:56:58 +0000 (18:56 +1000)]
Merge tag 'drm-msm-next-2026-04-02' of https://gitlab.freedesktop.org/drm/msm into drm-next
Changes for v7.1
CI:
- Uprev mesa
- Restore CI jobs for Qualcomm APQ8016 and APQ8096 devices
Core:
- Switched to of_get_available_child_by_name()
DPU:
- Fixes for DSC panels
- Fixed brownout because of the frequency / OPP mismatch
- Quad pipe preparation (not enabled yet)
- Switched to virtual planes by default
- Dropped VBIF_NRT support
- Added support for Eliza platform
- Reworked alpha handling
- Switched to correct CWB definitions on Eliza
- Dropped dummy INTF_0 on MSM8953
- Corrected INTFs related to DP-MST
DP:
- Removed debug prints looking into PHY internals
DSI:
- Fixes for DSC panels
- RGB101010 support
- Support for SC8280XP
- Moved PHY bindings from display/ to phy/
GPU:
- Preemption support for x2-85 and a840
- IFPC support for a840
- SKU detection support for x2-85 and a840
- Expose AQE support (VK ray-pipeline)
- Avoid locking in VM_BIND fence signaling path
- Fix to avoid reclaim in GPU snapshot path
- Disallow foreign mapping of _NO_SHARE BOs
- Couple a6xx gpu snapshot fixes
- Various other fixes
HDMI:
- Fixed infoframes programming
MDP5:
- Dropped support for MSM8974v1
- Dropped now unused code for MSM8974 v1 and SDM660 / MSM8998
which does an indirect jump to a location stored in Rx. The
register Rx should have type PTR_TO_INSN. This new type ensures
that the Rx register contains a value (or a range of values)
loaded from a correct jump table – map of type instruction array.
Support indirect jump to all registers in powerpc64 JIT using
the ctr register. Move Rx content to ctr register, then invoke
bctr instruction to branch to address stored in ctr register.
Skip save and restore of TOC as the jump is always within the
program context.
On loading the BPF program, the verifier might adjust/omit some
instructions. The adjusted instruction offset is accounted in the
map containing original instruction -> xlated mapping. This patch
adds ppc64 JIT support to additionally build the xlated->jitted
mapping for every instruction present in instruction array. This
change is needed to enable support for indirect jumps, added in a
subsequent patch.
Invoke bpf_prog_update_insn_ptrs() with offset pair of xlated_offset
and jited_offset. The offset mapping is already available, which is
being used for bpf_prog_fill_jited_linfo() and can be directly used
for bpf_prog_update_insn_ptrs() as well.
Additional details present at:
commit b4ce5923e780 ("bpf, x86: add new map type: instructions array")
Extend JIT support of fsession in powerpc64 trampoline, since
ppc64 and ppc32 share a common trampoline implementation.
Arch-specific helpers handle 64-bit data copy using 32-bit regs.
Need to validate fsession support along with trampoline support.
Implement JIT support for fsession in powerpc64 trampoline.
The trampoline stack now accommodates session cookies and
function metadata in place of function arguments. fentry/fexit
programs consume the corresponding function metadata. This mirrors
existing x86 behavior and enables session cookies on powerpc64.
powerpc64/bpf: Implement JIT support for private stack
Provision the private stack as a per-CPU allocation during
bpf_int_jit_compile(). Align the stack to 16 bytes and place guard
regions at both ends to detect runtime stack overflow and underflow.
Round the private stack size up to the nearest 16-byte boundary.
Make each guard region 16 bytes to preserve the required overall
16-byte alignment. When private stack is set, skip bpf stack size
accounting in kernel stack.
There is no stack pointer in powerpc. Stack referencing during JIT
is done using frame pointer. Frame pointer calculation goes like:
BPF frame pointer = Priv stack allocation start address +
Overflow guard +
Actual stack size defined by verifier
Update BPF_REG_FP to point to the calculated offset within the
allocated private stack buffer. BPF stack accesses now reference
the allocated private stack.
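The layout arithmetic above can be worked through in a few lines. This is a sketch under stated assumptions: the helper names are hypothetical, the 16-byte guard size and 16-byte rounding come from the description, and using the rounded stack size in the frame pointer calculation is an assumption of this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Each guard region is 16 bytes, per the description above. */
#define PRIV_STACK_GUARD_SZ 16u

/* Round n up to the next multiple of 16. */
static size_t round_up_16(size_t n)
{
	return (n + 15u) & ~(size_t)15u;
}

/* Total allocation: guard + rounded stack + guard. */
static size_t priv_stack_alloc_size(size_t stack_depth)
{
	return round_up_16(stack_depth) + 2 * PRIV_STACK_GUARD_SZ;
}

/*
 * Frame pointer = allocation start + overflow guard + stack size.
 * The stack grows down from this address toward the overflow guard.
 */
static uintptr_t priv_frame_pointer(uintptr_t alloc_start, size_t stack_depth)
{
	return alloc_start + PRIV_STACK_GUARD_SZ + round_up_16(stack_depth);
}
```

For a verifier-reported stack depth of 40 bytes, the stack is rounded to 48 bytes, the allocation is 80 bytes, and the frame pointer sits 64 bytes past the start of the allocation, just below the upper guard.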
Dave Airlie [Fri, 3 Apr 2026 08:31:22 +0000 (18:31 +1000)]
Merge tag 'drm-intel-fixes-2026-04-02' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes
- Fix for #12045: Huawei Matebook E (DRR-WXX): Persistent Black Screen on Boot with i915 and Gen11: Modesetting and Backlight Control Malfunction
- Fix for #15826: i915: Raptor Lake-P [UHD Graphics] display flicker/corruption on eDP panel
- Use crtc_state->enhanced_framing properly on ivb/hsw CPU eDP
Boris Brezillon [Fri, 20 Mar 2026 15:19:13 +0000 (16:19 +0100)]
drm/shmem_helper: Make sure PMD entries get the writeable upgrade
Unlike PTEs which are automatically upgraded to writeable entries if
.pfn_mkwrite() returns 0, the PMD upgrades go through .huge_fault(),
and we currently pretend to have handled the make-writeable request
even though we only ever map things read-only. Make sure we pass the
proper "write" info to vmf_insert_pfn_pmd() in that case.
This also means we have to record the mkwrite event in the .huge_fault()
path now. Move the dirty tracking logic to a
drm_gem_shmem_record_mkwrite() helper so it can also be called from
drm_gem_shmem_pfn_mkwrite().
Note that this wasn't a problem before commit 28e3918179aa
("drm/gem-shmem: Track folio accessed/dirty status in mmap"), because
the pgprot were not lowered to read-only before this commit (see the
vma_wants_writenotify() in vma_set_page_prot()).
Fixes: 28e3918179aa ("drm/gem-shmem: Track folio accessed/dirty status in mmap")
Cc: Biju Das <biju.das.jz@bp.renesas.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Tommaso Merciai <tommaso.merciai.xr@bp.renesas.com>
Reviewed-by: Loïc Molinari <loic.molinari@collabora.com>
Tested-by: Biju Das <biju.das.jz@bp.renesas.com>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Tested-by: Tommaso Merciai <tommaso.merciai.xr@bp.renesas.com>
Link: https://patch.msgid.link/20260320151914.586945-1-boris.brezillon@collabora.com
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Cássio Gabriel [Fri, 3 Apr 2026 03:21:34 +0000 (00:21 -0300)]
ALSA: hda: Notify IEC958 Default PCM switch state changes
The "IEC958 Default PCM Playback Switch" control is backed directly by
mout->share_spdif. The share-switch callbacks currently access that state
without serialization, and spdif_share_sw_put() always returns 0, so
normal userspace writes never emit the standard ALSA control value
notification.
snd_hda_multi_out_analog_open() may also clear mout->share_spdif when the
analog PCM capabilities and the SPDIF capabilities no longer intersect.
That fallback is still needed to avoid creating an impossible hw
constraint set, but it changes the mixer backing value without notifying
subscribers.
Protect the share-switch callbacks with spdif_mutex like the other SPDIF
control handlers, return the actual change value from spdif_share_sw_put(),
and notify the cached control when the open path forcibly disables
shared SPDIF mode after dropping spdif_mutex.
This keeps the existing auto-disable behavior while making switch state
changes visible to userspace.
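The "return the actual change value" part follows the usual ALSA control convention that a put callback returns 1 when the value changed and 0 when it did not. A minimal sketch with a hypothetical model type and function; the locking with spdif_mutex described above is deliberately left out of this model:

```c
#include <stdbool.h>

/* Simplified stand-in for the state behind the mixer switch (assumption). */
struct share_switch_model {
	bool share_spdif;
};

/*
 * Hypothetical put handler model: stores the requested value and returns
 * 1 if the stored value changed, 0 otherwise. In the driver this would
 * run with the SPDIF mutex held; that is omitted here.
 */
static int share_sw_put_model(struct share_switch_model *sw, bool requested)
{
	int changed = sw->share_spdif != requested;

	sw->share_spdif = requested;
	return changed;
}
```

A handler that always returns 0 tells the control core that nothing changed, which is why no notification was being emitted for ordinary writes.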
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
- Fix register equivalence for pointers to packet (Alexei Starovoitov)
- Fix incorrect pruning due to atomic fetch precision tracking (Daniel
Borkmann)
- Fix grace period wait for bpf_link-ed tracepoints (Kumar Kartikeya
Dwivedi)
- Fix use-after-free of sockmap's sk->sk_socket (Kuniyuki Iwashima)
- Reject direct access to nullable PTR_TO_BUF pointers (Qi Tang)
- Reject sleepable kprobe_multi programs at attach time (Varun R
Mallya)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Add more precision tracking tests for atomics
bpf: Fix incorrect pruning due to atomic fetch precision tracking
bpf: Reject sleepable kprobe_multi programs at attach time
bpf: reject direct access to nullable PTR_TO_BUF pointers
bpf: sockmap: Fix use-after-free of sk->sk_socket in sk_psock_verdict_data_ready().
bpf: Fix grace period wait for tracepoint bpf_link
bpf: Fix regsafe() for pointers to packet
The jumbo_frm() chain-mode implementation unconditionally computes
len = nopaged_len - bmax;
where nopaged_len = skb_headlen(skb) (linear bytes only) and bmax is
BUF_SIZE_8KiB or BUF_SIZE_2KiB. However, the caller stmmac_xmit()
decides to invoke jumbo_frm() based on skb->len (total length including
page fragments):
When a packet has a small linear portion (nopaged_len <= bmax) but a
large total length due to page fragments (skb->len > bmax), the
subtraction wraps as an unsigned integer, producing a huge len value
(~0xFFFFxxxx). This causes the while (len != 0) loop to execute
hundreds of thousands of iterations, passing skb->data + bmax * i
pointers far beyond the skb buffer to dma_map_single(). On IOMMU-less
SoCs (the typical deployment for stmmac), this maps arbitrary kernel
memory to the DMA engine, constituting a kernel memory disclosure and
potential memory corruption from hardware.
Fix this by introducing a buf_len local variable clamped to
min(nopaged_len, bmax). Computing len = nopaged_len - buf_len is then
always safe: it is zero when the linear portion fits within a single
descriptor, causing the while (len != 0) loop to be skipped naturally,
and the fragment loop in stmmac_xmit() handles page fragments afterward.
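The wrap-around and the clamp can be reproduced in plain userspace C.
The following is only a sketch: TOY_BMAX and both function names are
hypothetical stand-ins, not identifiers from the driver, and it assumes
a 32-bit unsigned int.

```c
#include <assert.h>

/* Hypothetical stand-in for the smaller descriptor limit (BUF_SIZE_2KiB). */
#define TOY_BMAX 2048u

/* Pre-fix arithmetic: wraps around when nopaged_len < bmax. */
static unsigned int toy_len_unclamped(unsigned int nopaged_len, unsigned int bmax)
{
	return nopaged_len - bmax;
}

/* Post-fix arithmetic: clamp the first buffer to min(nopaged_len, bmax). */
static unsigned int toy_len_clamped(unsigned int nopaged_len, unsigned int bmax)
{
	unsigned int buf_len = nopaged_len < bmax ? nopaged_len : bmax;

	assert(buf_len <= nopaged_len);	/* so the subtraction cannot wrap */
	return nopaged_len - buf_len;
}
```

With a 128-byte linear part, the unclamped form yields a value close to
UINT_MAX, while the clamped form yields zero and the loop is skipped.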
David Carlier [Wed, 1 Apr 2026 21:12:18 +0000 (22:12 +0100)]
net: altera-tse: fix skb leak on DMA mapping error in tse_start_xmit()
When dma_map_single() fails in tse_start_xmit(), the function returns
NETDEV_TX_OK without freeing the skb. Since NETDEV_TX_OK tells the
stack the packet was consumed, the skb is never freed, leaking memory
on every DMA mapping failure.
Add dev_kfree_skb_any() before returning to properly free the skb.
Fixes: bbd2190ce96d ("Altera TSE: Add main and header file for Altera Ethernet Driver") Cc: stable@vger.kernel.org Signed-off-by: David Carlier <devnexen@gmail.com> Link: https://patch.msgid.link/20260401211218.279185-1-devnexen@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
Fix invariant violations and improve branch detection
This patchset fixes invariant violations on register bounds. These
invariant violations cause a warning and happen when reg_bounds_sync is
trying to refine register bounds while walking an impossible branch.
This patchset takes this situation as an opportunity to improve
verification performance. That is, the verifier will use the invariant
violations as a signal that a branch cannot be taken and process it as
dead code.
This patchset implements this approach and covers it in selftests with
a new invariant violation case. Some of the logic in reg_bounds_sync
likely duplicates logic in is_scalar_branch_taken. This patchset does
not attempt to remove superfluous logic from is_scalar_branch_taken and
leaves it to a future patchset (e.g. once syzbot has confirmed that all
invariant violations are fixed).
In the future, there is also a potential opportunity to simplify
existing logic by merging reg_bounds_sync and range_bounds_violation
(have reg_bounds_sync error out on invariant violation). That is
however not needed to fix invariant violations, which we focus on in
this patchset.
Changes in v3:
- Rename and refactor the helper functions checking for tnum-related
invariant violations (Mykyta).
- Small changes to comment style in verifier changes and new selftest
(Mykyta).
- Rebased.
Changes in v2:
- Moved tmp registers to env in preparatory commit (Eduard).
- Updated reg_bounds_sync to bail out in case of ill-formed
registers, thus avoiding one set of invariant violation checks in
simulate_both_branches_taken (Eduard).
- Drop the Fixes tag to avoid misleading backporters (Shung-Hsi).
- Improve wording of commit descriptions (Shung-Hsi, Hari).
- Fix error in code comments (AI bot).
- Rebased.
====================
Paul Chaignon [Thu, 2 Apr 2026 15:12:48 +0000 (17:12 +0200)]
selftests/bpf: Remove invariant violation flags
With the changes to the verifier in previous commits, we're not
expecting any invariant violations anymore. We should therefore always
enable BPF_F_TEST_REG_INVARIANTS to fail on invariant violations. Turns
out that's already the case and we've been explicitly setting this flag
in selftests when it wasn't necessary. This commit removes those flags
from selftests, which should hopefully make it clearer that it's always
enabled.
Paul Chaignon [Thu, 2 Apr 2026 15:11:41 +0000 (17:11 +0200)]
selftests/bpf: Cover invariant violation case from syzbot
This patch adds a selftest for the change in the previous patch. The
selftest is derived from a syzbot reproducer from [1] (among the 22
reproducers on that page, only 4 still reproduced on the latest bpf tree,
all being small variants of the same invariant violation).
The test case failure without the previous patch is shown below.
R5 and R7 are prepared such that their tnums intersection results in a
known constant but that constant isn't within R7's u32 bounds.
is_branch_taken isn't able to detect this case today, so the verifier
walks the impossible fallthrough branch. After regs_refine_cond_op and
reg_bounds_sync refine R5 on the assumption that the branch is taken,
the impossibility becomes apparent and results in an invariant violation
for R5: umin32 is greater than umax32.
The previous patch fixes this by using regs_refine_cond_op and
reg_bounds_sync in is_branch_taken to detect the impossible branch. The
fallthrough branch is therefore correctly detected as dead code.
bpf: Simulate branches to prune based on range violations
This patch fixes the invariant violations that can happen after we
refine ranges & tnum based on an incorrectly-detected branch condition.
For example, the branch is always true, but we miss it in
is_branch_taken; we then refine based on the branch being false and end
up with incoherent ranges (e.g. umax < umin).
To avoid this, we can simulate the refinement on both branches. More
specifically, this patch simulates both branches taken using
regs_refine_cond_op and reg_bounds_sync. If the resulting register
states are ill-formed on one of the branches, is_branch_taken can mark
that branch as "never taken".
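As a rough illustration of the idea, the toy model below tracks only an
unsigned 32-bit range and a single "r < k" comparison. It is not the
verifier code: struct toy_reg and all toy_* functions are hypothetical,
and the real register state also carries tnums and signed bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy register state: one unsigned 32-bit range, no tnum or signed bounds. */
struct toy_reg {
	uint32_t umin;
	uint32_t umax;
};

/* A state is ill-formed when its range is empty. */
static bool toy_range_violation(const struct toy_reg *r)
{
	return r->umin > r->umax;
}

/* Refine a copy of r assuming "r < k" holds (taken) or does not (!taken). */
static struct toy_reg toy_refine_lt(struct toy_reg r, uint32_t k, bool taken)
{
	if (taken) {
		if (k == 0) {
			/* No unsigned value is below 0: force an empty range. */
			r.umin = 1;
			r.umax = 0;
		} else if (r.umax > k - 1) {
			r.umax = k - 1;
		}
	} else if (r.umin < k) {
		r.umin = k;
	}
	return r;
}

/* Returns 1 if "r < k" is always taken, 0 if never taken, -1 if unknown. */
static int toy_branch_taken_lt(uint32_t umin, uint32_t umax, uint32_t k)
{
	struct toy_reg r = { .umin = umin, .umax = umax };
	struct toy_reg t, f;

	assert(!toy_range_violation(&r));	/* input must be well-formed */
	t = toy_refine_lt(r, k, true);
	f = toy_refine_lt(r, k, false);

	if (toy_range_violation(&t))
		return 0;
	if (toy_range_violation(&f))
		return 1;
	return -1;
}
```

For a register in [10, 20], comparing against 5 makes the taken side
ill-formed, comparing against 30 makes the fallthrough side ill-formed,
and comparing against 15 leaves both sides possible.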
On a more formal note, we can deduce a branch is not taken when
regs_refine_cond_op or reg_bounds_sync returns an ill-formed state
because the branch operators are sound (verified with Agni [1]).
Soundness means that the verifier is guaranteed to produce sound
outputs on the taken branches. On the non-taken branch (explored
because of imprecision in the bounds), the verifier is free to produce
any output. We use ill-formedness as a signal that the branch is dead
and prune that branch.
This patch moves the refinement logic for both branches from
reg_set_min_max to their own function, simulate_both_branches_taken,
which is called from is_scalar_branch_taken. As a result,
reg_set_min_max now only runs sanity checks and has been renamed to
reg_bounds_sanity_check_branches to reflect that.
We have had five patches fixing specific cases of invariant violations
in the past, all added with selftests:
- commit fbc7aef517d8 ("bpf: Fix u32/s32 bounds when ranges cross
min/max boundary")
- commit efc11a667878 ("bpf: Improve bounds when tnum has a single
possible value")
- commit f41345f47fb2 ("bpf: Use tnums for JEQ/JNE is_branch_taken
logic")
- commit 00bf8d0c6c9b ("bpf: Improve bounds when s64 crosses sign
boundary")
- commit 6279846b9b25 ("bpf: Forget ranges when refining tnum after
JSET")
To confirm that this patch addresses all invariant violations, we have
also reverted those five commits and verified that their related
selftests don't cause any invariant violation warnings anymore. Those
selftests still fail but only because of misdetected branches or
less-precise bounds than expected. This demonstrates that the current
patch is enough to avoid the invariant violation warning AND that the
previous five patches are still useful to improve branch detection.
In addition to the selftests, this change was also tested with the
Cilium complexity test suite: all programs were successfully loaded and
it didn't change the number of processed instructions.
bpf: Exit early if reg_bounds_sync gets invalid inputs
In the subsequent commit, to prune dead branches we will rely on
detecting ill-formed ranges using range_bounds_violations()
(e.g., umin > umax) after refining register bounds using
regs_refine_cond_op().
However, reg_bounds_sync() can sometimes "repair" ill-formed bounds,
potentially masking a violation that was produced by
regs_refine_cond_op().
This commit modifies reg_bounds_sync() to exit early if an invariant
violation is already present in the input.
This ensures ill-formed reg_states remain ill-formed after
reg_bounds_sync(), allowing simulate_both_branches_taken() to correctly
identify dead branches with a single check to range_bounds_violation().
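A minimal sketch of why the early exit matters, using a deliberately
simplified state. The struct, the "is_const" deduction and all toy_*
names are assumptions made for illustration and do not mirror the real
reg_bounds_sync() logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_state {
	uint32_t umin;
	uint32_t umax;
	bool is_const;		/* toy stand-in for "tnum has a single value" */
	uint32_t const_val;
};

static bool toy_bounds_violation(const struct toy_state *s)
{
	return s->umin > s->umax;
}

static void toy_bounds_sync(struct toy_state *s)
{
	/* Early exit: never touch a state that is already ill-formed. */
	if (toy_bounds_violation(s))
		return;

	/* Deduction step that would otherwise overwrite (and hide) bad bounds. */
	if (s->is_const) {
		s->umin = s->const_val;
		s->umax = s->const_val;
	}
	assert(!toy_bounds_violation(s));
}

/* Returns true if a violation present before the sync is still visible after. */
static bool toy_violation_survives_sync(uint32_t umin, uint32_t umax,
					uint32_t const_val)
{
	struct toy_state s = { umin, umax, true, const_val };

	toy_bounds_sync(&s);
	return toy_bounds_violation(&s);
}
```

Without the early return, the constant deduction in this toy would
rewrite umin and umax and the caller could no longer see that the input
was ill-formed.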
Suggested-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/73127d628841c59cb7423d6bdcd204bf90bcdc80.1775142354.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Paul Chaignon [Thu, 2 Apr 2026 15:09:15 +0000 (17:09 +0200)]
bpf: Use bpf_verifier_env buffers for reg_set_min_max
In a subsequent patch, the regs_refine_cond_op and reg_bounds_sync
functions will be called in is_branch_taken instead of reg_set_min_max,
to simulate each branch's outcome. Since they will run before we branch
out, these two functions will need to work on temporary registers for
the two branches.
This refactoring patch prepares for that change, by introducing the
temporary registers on bpf_verifier_env and using them in
reg_set_min_max.
This change also allows us to save one fake_reg slot as we don't need to
allocate an additional temporary buffer in case of a BPF_K condition.
Finally, you may notice that this patch removes the check for
"false_reg1 == false_reg2" in reg_set_min_max. That check was introduced
in commit d43ad9da8052 ("bpf: Skip bounds adjustment for conditional
jumps on same scalar register") to avoid an invariant violation. Given
that "env->false_reg1 == env->false_reg2" doesn't make sense and
invariant violations are addressed in a subsequent commit, this patch
just removes the check.
This is the first of four series adding SR-IOV V2 support to the enic
driver for Cisco VIC 14xx/15xx adapters.
The existing V1 SR-IOV implementation has VFs that interact directly
with the VIC firmware, leaving the PF driver with no visibility or
control over VF behavior. V2 introduces a PF-mediated model where VFs
communicate with the PF through a mailbox over a dedicated admin
channel. This brings enic in line with the standard Linux SR-IOV
model, enabling full PF management of VFs via ip link (MAC, VLAN,
link state, spoofchk, trust, and per-VF statistics).
This preparatory series adds detection and resource helper code with
no functional change to existing driver behavior:
- Extend BAR resource discovery for admin channel resources
- Register the V2 VF PCI device ID
- Detect VF type (V1/V2/usNIC) from SR-IOV PCI capability
- Make enic_dev_enable/disable ref-counted for shared use by data
path and admin channel
- Add type-aware resource allocation for admin WQ/RQ/CQ/INTR
- Detect presence of admin channel resources at probe time
Tested on VIC 14xx and 15xx series adapters with V2 VFs under KVM
(sriov_numvfs, VF passthrough, ip link VF configuration, VF traffic).
Based in part on initial work by Christian Benvenuti.
====================
Check for the presence of admin channel BAR resources
(RES_TYPE_ADMIN_WQ, ADMIN_RQ, ADMIN_CQ, SRIOV_INTR) during resource
discovery. Set has_admin_channel when all four are available.
Use ARRAY_SIZE(enic->admin_cq) for the admin CQ count check since the
driver allocates two admin CQs (one for WQ completions, one for RQ
completions) and both must be backed by hardware resources.
Add admin WQ, RQ, CQ and INTR fields to struct enic for use by the
upcoming admin channel open/close paths.
enic: add type-aware alloc for WQ, RQ, CQ and INTR resources
The existing vnic_wq_alloc(), vnic_rq_alloc(), vnic_cq_alloc() and
vnic_intr_alloc() hardcode data-path resource types (RES_TYPE_WQ,
RES_TYPE_RQ, RES_TYPE_CQ, RES_TYPE_INTR_CTRL). The upcoming admin
channel uses different BAR resource types (RES_TYPE_ADMIN_WQ/RQ/CQ,
RES_TYPE_SRIOV_INTR) for its queues.
Add _with_type() variants that accept an explicit resource type
parameter. Refactor the original functions as thin wrappers that
pass the default data-path type. No functional change.
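The wrapper pattern described above might look roughly like the
following. This is a sketch only: the toy_* types and functions are
hypothetical, and the real vnic_*_alloc() signatures take more
parameters than shown here.

```c
#include <assert.h>

/* Hypothetical subset of the resource types named in the commit. */
enum toy_res_type {
	TOY_RES_TYPE_WQ,
	TOY_RES_TYPE_ADMIN_WQ,
};

struct toy_wq {
	enum toy_res_type type;
	unsigned int index;
};

/* New variant: the caller picks the BAR resource type explicitly. */
static int toy_wq_alloc_with_type(struct toy_wq *wq, unsigned int index,
				  enum toy_res_type type)
{
	assert(wq);
	wq->type = type;
	wq->index = index;
	return 0;
}

/* Original entry point, now a thin wrapper with the data-path default. */
static int toy_wq_alloc(struct toy_wq *wq, unsigned int index)
{
	return toy_wq_alloc_with_type(wq, index, TOY_RES_TYPE_WQ);
}

/* Helper for the example: which type does the legacy call end up using? */
static enum toy_res_type toy_default_wq_type(void)
{
	struct toy_wq wq;

	toy_wq_alloc(&wq, 0);
	return wq.type;
}
```

Existing callers keep calling the original function and see the same
behavior, while admin channel code can pass its own type.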
Both the data path (ndo_open/ndo_stop) and the upcoming admin channel
need to enable and disable the vNIC device independently. Without
reference counting, closing the admin channel while the netdev is up
would inadvertently disable the entire device.
Add an enable_count to struct enic, protected by the existing
devcmd_lock. enic_dev_enable() issues CMD_ENABLE_WAIT only on the
first caller (0 -> 1 transition), and enic_dev_disable() issues
CMD_DISABLE only when the last caller releases (1 -> 0 transition).
Also check the return value of enic_dev_enable() in enic_open() and
fail the open if the firmware enable command fails. Without this check,
a failed enable leaves enable_count at zero while the interface appears
up, which can cause a later admin channel enable/disable cycle to
incorrectly disable the hardware under the active data path.
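The counting scheme can be sketched in userspace C as below. The
struct and function names are hypothetical, locking is omitted, and
ignoring an unbalanced disable is an assumption of this sketch rather
than something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_enic {
	int enable_count;	/* the driver guards this with devcmd_lock */
	bool hw_enabled;	/* models the firmware-side enable state */
	bool fail_enable;	/* knob to model a failing enable command */
};

static int toy_dev_enable(struct toy_enic *e)
{
	assert(e->enable_count >= 0);
	if (e->enable_count == 0) {
		if (e->fail_enable)
			return -1;	/* count stays at zero on failure */
		e->hw_enabled = true;	/* stands in for CMD_ENABLE_WAIT */
	}
	e->enable_count++;
	return 0;
}

static void toy_dev_disable(struct toy_enic *e)
{
	if (e->enable_count == 0)
		return;			/* unbalanced disable: ignore */
	if (--e->enable_count == 0)
		e->hw_enabled = false;	/* stands in for CMD_DISABLE */
}

/* Data path up, then one admin channel enable/disable cycle. */
static bool toy_hw_enabled_after_admin_cycle(void)
{
	struct toy_enic e = { 0 };

	if (toy_dev_enable(&e))		/* ndo_open */
		return false;
	if (toy_dev_enable(&e))		/* admin channel open */
		return false;
	toy_dev_disable(&e);		/* admin channel close */
	return e.hw_enabled;
}
```

Because the data path still holds a reference, the admin channel's
disable only drops the count from 2 to 1 and the hardware stays enabled.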
Read the VF device ID from the SR-IOV PCI capability at probe time to
determine whether the PF is configured for V1, USNIC, or V2 virtual
functions. Store the result in enic->vf_type for use by subsequent
SR-IOV operations.
The VF type is a firmware-configured property (set via UCSM, CIMC,
Intersight, etc.) that is immutable from the driver's perspective. Only
PFs are probed for this capability; VFs and dynamic vnics skip
detection.
Register the V2 VF PCI device ID (0x02b7) so the driver binds to V2
virtual functions created via sriov_configure. Update enic_is_sriov_vf()
to recognize V2 VFs alongside the existing V1 type.
enic: extend resource discovery for SR-IOV admin channel
VIC firmware exposes admin channel resources (WQ, RQ, CQ) for PF-VF
communication when SR-IOV is active. Add the corresponding resource
type definitions and teach the discovery and access functions to
handle them.
Qingfang Deng [Wed, 1 Apr 2026 02:28:39 +0000 (10:28 +0800)]
MAINTAINERS: orphan PPP over Ethernet driver
We haven't seen any activity from Michal Ostrowski for quite a long time.
The last commit from him is fb64bb560e18 ("PPPoE: Fix flush/close
races."), which was in 2009. Email to mostrows@earthlink.net also
bounces.
====================
net: phy: microchip: add downshift support for LAN88xx
Add standard ETHTOOL_PHY_DOWNSHIFT tunable support for the Microchip
LAN88xx PHY, following the same pattern used by Marvell and other PHY
drivers.
Ethernet cables with faulty or missing pairs (specifically C and D)
can successfully auto-negotiate 1000BASE-T but fail to establish a
stable link. The LAN88xx PHY supports automatic downshift to
100BASE-TX after a configurable number of failed attempts (2-5).
Patch 1 adds the get/set tunable implementation.
Patch 2 enables downshift by default with a count of 2. The setting is
stored in the driver's private data so that user changes via ethtool are
preserved across suspend/resume cycles.
Based on an earlier downstream implementation by Phil Elwell.
Tested on Raspberry Pi 3B+ (LAN7515/LAN88xx).
====================
net: phy: microchip: enable downshift by default on LAN88xx
Enable auto-downshift from 1000BASE-T to 100BASE-TX after 2 failed
auto-negotiation attempts by default. This ensures that links with
faulty or missing cable pairs (C and D) fall back to 100Mbps without
requiring userspace configuration.
The downshift count is stored in the driver's private data and applied
in config_init, so user changes via ethtool are preserved across
suspend/resume cycles.
Users can override or disable downshift at runtime:
net: phy: microchip: add downshift tunable support for LAN88xx
Implement the standard ETHTOOL_PHY_DOWNSHIFT tunable for the LAN88xx
PHY. This allows runtime configuration of the auto-downshift feature
via ethtool:
ethtool --set-phy-tunable eth0 downshift on count 3
The LAN88xx PHY supports downshifting from 1000BASE-T to 100BASE-TX
after 2-5 failed auto-negotiation attempts. Valid count values are
2, 3, 4 and 5.
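A set handler would need to validate the requested count. The check
below is a sketch under assumptions: the function name is hypothetical,
and treating 0 as "disable" follows the generic ethtool downshift
convention rather than anything stated in this commit message.

```c
#include <assert.h>
#include <errno.h>

#define TOY_DOWNSHIFT_DISABLE	0	/* assumed "downshift off" value */

/* Returns 0 if the count is acceptable, -EINVAL otherwise. */
static int toy_check_downshift(unsigned int count)
{
	if (count == TOY_DOWNSHIFT_DISABLE)
		return 0;
	if (count < 2 || count > 5)
		return -EINVAL;
	return 0;
}

/* Example checks: 0 disables, 2..5 are accepted, anything else is rejected. */
void toy_downshift_demo(void)
{
	assert(toy_check_downshift(0) == 0);
	assert(toy_check_downshift(2) == 0);
	assert(toy_check_downshift(6) == -EINVAL);
}
```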
This is based on an earlier downstream implementation by Phil Elwell.
Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20260401123848.696766-2-nb@tipi-net.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Wagner [Wed, 1 Apr 2026 11:49:31 +0000 (12:49 +0100)]
net: phy: bcm84881: add LED framework support for BCM84891/BCM84892
Expose LED1 and LED2 pins via the PHY LED framework. Each pin has a
source mask (MASK_LOW + MASK_EXT registers) selecting which hardware
events light it, plus a CTL field in the shared 0xA83B register
(RMW; LED4 is firmware-controlled per the datasheet).
Hardware can offload per-speed link triggers (1000/2500/5000/10000),
RX/TX activity, and force-on. LINK_100 is accepted only alongside
LINK_1000: source bit 4 lights at both speeds and 100-alone isn't
representable, so the unrepresentable case falls to software.
The chip has five LED pins; only LED1/LED2 are exposed here as those
are the only ones characterized on tested hardware. LED4 is firmware-
controlled regardless of strap configuration.
Tested on TRENDnet TEG-S750 (LED1/LED2 wired to an antiparallel
bicolor LED): brightness_set via sysfs; netdev trigger offloaded=1
with amber lit at 100M/1G/2.5G and green lit at 10G via respective
link_* modes; LED off immediately on cable unplug with no software
involvement.
Giovanni Cabiddu [Sat, 28 Mar 2026 22:29:47 +0000 (22:29 +0000)]
crypto: qat - add support for zstd
Add support for the ZSTD algorithm for QAT GEN4, GEN5 and GEN6 via the
acomp API.
For GEN4 and GEN5, compression is performed in hardware using LZ4s, a
QAT-specific variant of LZ4. The compressed output is post-processed to
generate ZSTD sequences, and the ZSTD library is then used to produce
the final ZSTD stream via zstd_compress_sequences_and_literals(). Only
inputs between 8 KB and 512 KB are offloaded to the device. The minimum
size restriction will be relaxed once polling support is added. The
maximum size is limited by the use of pre-allocated per-CPU scratch
buffers. On these generations, only compression is offloaded to hardware;
decompression always falls back to software.
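The size window amounts to a simple range check before offloading. In
the sketch below the macro and function names are hypothetical, and
treating both limits as inclusive is an assumption, since the text only
says "between".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MIN_OFFLOAD_LEN	(8u * 1024u)	/* 8 KB */
#define TOY_MAX_OFFLOAD_LEN	(512u * 1024u)	/* 512 KB */

/* True if a GEN4/GEN5 request of this size would go to the device. */
static bool toy_should_offload(size_t len)
{
	return len >= TOY_MIN_OFFLOAD_LEN && len <= TOY_MAX_OFFLOAD_LEN;
}

/* Example checks for sizes below, inside and above the window. */
void toy_offload_demo(void)
{
	assert(!toy_should_offload(4096));
	assert(toy_should_offload(64 * 1024));
	assert(!toy_should_offload(1024 * 1024));
}
```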
For GEN6, both compression and decompression are offloaded to the
accelerator, which natively supports the ZSTD algorithm. There is no
limit on the input buffer size supported. However, since GEN6 is limited
to a history size of 64 KB, decompression of frames compressed with a
larger history falls back to software.
Since GEN2 devices do not support ZSTD or LZ4s, add a mechanism that
prevents selecting GEN2 compression instances for ZSTD or LZ4s when a
GEN2 plug-in card is present on a system with an embedded GEN4, GEN5 or
GEN6 device.
In addition, modify the algorithm registration logic to allow
registering the correct implementation, i.e. LZ4s based for GEN4 and
GEN5 or native ZSTD for GEN6.
Co-developed-by: Suman Kumar Chakraborty <suman.kumar.chakraborty@intel.com> Signed-off-by: Suman Kumar Chakraborty <suman.kumar.chakraborty@intel.com> Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Laurent M Coquerel <laurent.m.coquerel@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Giovanni Cabiddu [Sat, 28 Mar 2026 22:29:46 +0000 (22:29 +0000)]
crypto: qat - use swab32 macro
Replace __builtin_bswap32() with swab32 in icp_qat_hw_20_comp.h to fix
the following build errors on architectures without native byte-swap
support:
alpha-linux-ld: drivers/crypto/intel/qat/qat_common/adf_gen4_hw_data.o: in function `adf_gen4_build_decomp_block':
drivers/crypto/intel/qat/qat_common/icp_qat_hw_20_comp.h:141:(.text+0xeec): undefined reference to `__bswapsi2'
alpha-linux-ld: drivers/crypto/intel/qat/qat_common/icp_qat_hw_20_comp.h:141:(.text+0xef8): undefined reference to `__bswapsi2'
alpha-linux-ld: drivers/crypto/intel/qat/qat_common/adf_gen4_hw_data.o: in function `adf_gen4_build_comp_block':
drivers/crypto/intel/qat/qat_common/icp_qat_hw_20_comp.h:57:(.text+0xf64): undefined reference to `__bswapsi2'
alpha-linux-ld: drivers/crypto/intel/qat/qat_common/icp_qat_hw_20_comp.h:57:(.text+0xf7c): undefined reference to `__bswapsi2'
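For context, a byte swap written with plain shifts and masks needs no
out-of-line helper such as __bswapsi2. The function below is only an
illustration with a hypothetical name; it is not claimed to be how the
kernel's swab32 macro is implemented.

```c
#include <assert.h>
#include <stdint.h>

/* Byte-swap a 32-bit value using only shifts and masks. */
static uint32_t toy_swab32(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) |
	       ((x & 0x0000ff00u) << 8) |
	       ((x & 0x00ff0000u) >> 8) |
	       ((x & 0xff000000u) >> 24);
}

/* Swapping twice must give back the original value. */
static uint32_t toy_swab32_roundtrip(uint32_t x)
{
	uint32_t y = toy_swab32(toy_swab32(x));

	assert(y == x);
	return y;
}
```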
Fixes: 5b14b2b307e4 ("crypto: qat - enable deflate for QAT GEN4") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202603290259.Ig9kDOmI-lkp@intel.com/ Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Thorsten Blum [Sat, 28 Mar 2026 10:20:44 +0000 (11:20 +0100)]
crypto: img-hash - use list_first_entry_or_null to simplify digest
Use list_first_entry_or_null() to simplify img_hash_digest() and remove
the now-unused local 'struct img_hash_dev *' variables. Use 'ctx->hdev'
when calling img_hash_handle_queue() instead of 'tctx->hdev'.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Eric Biggers [Fri, 27 Mar 2026 23:08:18 +0000 (16:08 -0700)]
crypto: cryptomgr - Select algorithm types only when CRYPTO_SELFTESTS
Enabling any template selects CRYPTO_MANAGER, which causes
CRYPTO_MANAGER2 to enable itself, which selects every algorithm type
option. However, pulling in all algorithm types is needed only when the
self-tests are enabled. So condition the selections accordingly.
To make this possible, also add the missing selections to various
symbols that were relying on transitive selections via CRYPTO_MANAGER.
Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Paul Louvel [Fri, 27 Mar 2026 09:24:18 +0000 (10:24 +0100)]
crypto: aspeed - Use memcpy_from_sglist() in aspeed_ahash_dma_prepare()
Replace scatterwalk_map_and_copy() with memcpy_from_sglist() in
aspeed_ahash_dma_prepare(). The latter provides a simpler interface
without requiring a direction parameter, making the code easier to
read and less error-prone.
No functional change intended.
Signed-off-by: Paul Louvel <paul.louvel@bootlin.com> Reviewed-by: Neal Liu <neal_liu@aspeedtech.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Eric Biggers [Thu, 26 Mar 2026 00:15:07 +0000 (17:15 -0700)]
crypto: rng - Don't pull in DRBG when CRYPTO_FIPS=n
crypto_stdrng_get_bytes() is now always available:
- When CRYPTO_FIPS=n it is an inline function that always calls into
the always-built-in drivers/char/random.c.
- When CRYPTO_FIPS=y it is an inline function that calls into either
random.c or crypto/rng.c, depending on the value of fips_enabled.
The former is again always built-in. The latter is built-in as
well in this case, due to CRYPTO_FIPS=y.
Thus, the CRYPTO_RNG_DEFAULT symbol is no longer needed. Remove it.
This makes it so that CRYPTO_DRBG_MENU (and hence also CRYPTO_DRBG,
CRYPTO_JITTERENTROPY, and CRYPTO_LIB_SHA3) no longer gets unnecessarily
pulled into CRYPTO_FIPS=n kernels. I.e. CRYPTO_FIPS=n kernels are no
longer bloated with code that is relevant only to FIPS certifications.
Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Eric Biggers [Thu, 26 Mar 2026 00:15:06 +0000 (17:15 -0700)]
crypto: fips - Depend on CRYPTO_DRBG=y
Currently, the callers of crypto_stdrng_get_bytes() do 'select
CRYPTO_RNG_DEFAULT', which does 'select CRYPTO_DRBG_MENU'.
However, due to the change in how crypto_stdrng_get_bytes() is
implemented, CRYPTO_DRBG_MENU is now needed only when CRYPTO_FIPS.
But, 'select CRYPTO_DRBG_MENU if CRYPTO_FIPS' would cause a recursive
dependency, since CRYPTO_FIPS 'depends on CRYPTO_DRBG'.
Solve this by just making CRYPTO_FIPS depend on CRYPTO_DRBG=y (rather
than CRYPTO_DRBG i.e. CRYPTO_DRBG=y || CRYPTO_DRBG=m). The distros that
use CRYPTO_FIPS=y already set CRYPTO_DRBG=y anyway, which makes sense.
This makes the CRYPTO_RNG_DEFAULT symbol (and its corresponding
selection of CRYPTO_DRBG_MENU) unnecessary. A later commit removes it.
Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Eric Biggers [Thu, 26 Mar 2026 00:15:05 +0000 (17:15 -0700)]
crypto: rng - Make crypto_stdrng_get_bytes() use normal RNG in non-FIPS mode
"stdrng" is needed only in "FIPS mode". Therefore, make
crypto_stdrng_get_bytes() delegate to either the normal Linux RNG or to
"stdrng", depending on the current mode.
This will eliminate the need to build the SP800-90A DRBG and its
dependencies into CRYPTO_FIPS=n kernels.
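The delegation can be pictured with the userspace sketch below. All
toy_* names are hypothetical, and the two backends are stubs that tag
the buffer with a fixed byte so the chosen path is observable; they do
not produce random data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool toy_fips_enabled;	/* stand-in for the kernel's fips_enabled */

/* Stub for the normal RNG path; tags the buffer instead of randomising it. */
static int toy_normal_rng(unsigned char *buf, size_t len)
{
	memset(buf, 0xA5, len);
	return 0;
}

/* Stub for the "stdrng" (DRBG) path; uses a different tag. */
static int toy_stdrng(unsigned char *buf, size_t len)
{
	memset(buf, 0x5A, len);
	return 0;
}

/* Dispatch on the current mode, as the commit message describes. */
static int toy_stdrng_get_bytes(unsigned char *buf, size_t len)
{
	assert(buf || !len);
	if (toy_fips_enabled)
		return toy_stdrng(buf, len);
	return toy_normal_rng(buf, len);
}

/* Returns the tag byte of whichever backend served the request. */
static unsigned char toy_backend_tag(bool fips)
{
	unsigned char b = 0;

	toy_fips_enabled = fips;
	toy_stdrng_get_bytes(&b, 1);
	return b;
}
```

Callers only see the single helper, so the DRBG path can be compiled
out entirely when FIPS mode is not configured.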
Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Eric Biggers [Thu, 26 Mar 2026 00:15:04 +0000 (17:15 -0700)]
crypto: rng - Unexport "default RNG" symbols
Now that crypto_default_rng, crypto_get_default_rng(), and
crypto_put_default_rng() have no users outside crypto/rng.c itself,
unexport them and make them static.
Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Eric Biggers [Thu, 26 Mar 2026 00:15:03 +0000 (17:15 -0700)]
net: tipc: Use crypto_stdrng_get_bytes()
Replace the sequence of crypto_get_default_rng(),
crypto_rng_get_bytes(), and crypto_put_default_rng() with the equivalent
helper function crypto_stdrng_get_bytes().
Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Eric Biggers [Thu, 26 Mar 2026 00:15:02 +0000 (17:15 -0700)]
crypto: intel/keembay-ocs-ecc - Use crypto_stdrng_get_bytes()
Replace the sequence of crypto_get_default_rng(),
crypto_rng_get_bytes(), and crypto_put_default_rng() with the equivalent
helper function crypto_stdrng_get_bytes().
Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Eric Biggers [Thu, 26 Mar 2026 00:15:01 +0000 (17:15 -0700)]
crypto: hisilicon/hpre - Use crypto_stdrng_get_bytes()
Replace the sequence of crypto_get_default_rng(),
crypto_rng_get_bytes(), and crypto_put_default_rng() with the equivalent
helper function crypto_stdrng_get_bytes().
Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Eric Biggers [Thu, 26 Mar 2026 00:15:00 +0000 (17:15 -0700)]
crypto: geniv - Use crypto_stdrng_get_bytes()
Replace the sequence of crypto_get_default_rng(),
crypto_rng_get_bytes(), and crypto_put_default_rng() with the equivalent
helper function crypto_stdrng_get_bytes().
Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Eric Biggers [Thu, 26 Mar 2026 00:14:59 +0000 (17:14 -0700)]
crypto: ecc - Use crypto_stdrng_get_bytes()
Replace the sequence of crypto_get_default_rng(),
crypto_rng_get_bytes(), and crypto_put_default_rng() with the equivalent
helper function crypto_stdrng_get_bytes().
Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Eric Biggers [Thu, 26 Mar 2026 00:14:58 +0000 (17:14 -0700)]
crypto: dh - Use crypto_stdrng_get_bytes()
Replace the sequence of crypto_get_default_rng(),
crypto_rng_get_bytes(), and crypto_put_default_rng() with the equivalent
helper function crypto_stdrng_get_bytes().
Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
While it may have been intended that callers amortize the cost of
getting and putting the "default RNG" (i.e. "stdrng") over multiple
calls, in practice that optimization is never used. The callers just
want a function that gets random bytes from the "stdrng".
Therefore, add such a function: crypto_stdrng_get_bytes().
Importantly, this decouples the callers from the crypto_rng API. That
allows a later commit to make this function simply call
get_random_bytes_wait() unless the kernel is in "FIPS mode".
Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Giovanni Cabiddu [Tue, 24 Mar 2026 18:29:05 +0000 (18:29 +0000)]
crypto: iaa - fix per-node CPU counter reset in rebalance_wq_table()
The cpu counter used to compute the IAA device index is reset to zero
at the start of each NUMA node iteration. This causes CPUs on every
node to map starting from IAA index 0 instead of continuing from the
previous node's last index. On multi-node systems, this results in all
nodes mapping their CPUs to the same initial set of IAA devices,
leaving higher-indexed devices unused.
Move the cpu counter initialization before the for_each_node_with_cpus()
loop so that the IAA index computation accumulates correctly across all
nodes.
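The effect of the misplaced reset can be shown with a small model. The
names below are hypothetical, and computing the device index as
cpu / cpus_per_iaa is an assumption of this sketch, not a quote of the
driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Example topology: two NUMA nodes with four CPUs each. */
static const int toy_cpus_per_node[] = { 4, 4 };

/* Returns the highest device index that any CPU gets mapped to. */
static int toy_max_iaa_index(const int *cpus_per_node, int nr_nodes,
			     int cpus_per_iaa, bool reset_per_node)
{
	int cpu = 0;	/* fixed code: initialised once, before the node loop */
	int max_idx = -1;

	assert(cpus_per_iaa > 0);
	for (int node = 0; node < nr_nodes; node++) {
		if (reset_per_node)
			cpu = 0;	/* old behaviour: restart on every node */
		for (int i = 0; i < cpus_per_node[node]; i++, cpu++) {
			int idx = cpu / cpus_per_iaa;

			if (idx > max_idx)
				max_idx = idx;
		}
	}
	return max_idx;
}
```

With two CPUs per device, the per-node reset never gets past index 1,
while the hoisted counter reaches index 3 and uses all four devices.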
Fixes: 714ca27e9bf4 ("crypto: iaa - Optimize rebalance_wq_table()") Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>