git.ipfire.org Git - thirdparty/kernel/linux.git/log

Merge branch 'sched/cache'

Merge the cache aware balancer topic branch.

# Conflicts:
# kernel/sched/topology.c

sched/rt: Have RT_PUSH_IPI be default off for non PREEMPT_RT

RT migration is done aggressively. When a CPU schedules out a high
priority RT task for a lower priority task, it will look to see if there are
any RT tasks that are waiting to run on another CPU that is of higher
priority than the task this CPU is about to run. If it finds one, it will
pull that task over to the CPU and allow it to run there instead.

Normally, this pulling is done by looking at the RT overloaded mask (rto)
which contains all the CPUs in the scheduler domain with RT tasks that are
waiting to run due to a higher priority RT task currently running on their
CPU. The CPU that is about to schedule a lower priority task will grab the
rq lock of the overloaded CPU and move the RT task from that CPU's runqueue
to the local one and schedule the higher priority RT task.

This caused issues when a lot of CPUs would schedule a lower priority task
at the same time. They would all try to grab the same runqueue lock of
the CPU with the overloaded RT tasks. Only the first CPU that got in will
get that task. All the others would wait until they got the runqueue lock
and see there's nothing to pull and do nothing. On systems with lots of
CPUs, this caused a large latency (up to 500us), which is beyond what
PREEMPT_RT is meant to allow.

The solution to that was to add the RT_PUSH_IPI logic. When any CPU
wanted to pull a task, instead of grabbing the runqueue lock of the
overloaded CPU, it would start by sending an IPI to the overloaded CPU,
and that IPI handler would have the CPU with the waiting RT task do a push
instead. Then that handler would send an IPI to the next CPU with
overloaded RT tasks, and so on. Note, after the first CPU starts this
process, if another CPU wanted to do a pull, it would see that the process
has already begun and would only increment a counter to have the IPIs
continue again.
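
The hand-off described above can be modeled in user space. This is only a
minimal sketch, not the kernel implementation: struct rto_model and both
helpers are hypothetical names, and the real code walks the rto mask with
actual IPIs instead of counting chains.

```c
#include <stdbool.h>

/* Hypothetical user-space model of the RT_PUSH_IPI hand-off. */
struct rto_model {
	bool loop_active;	/* an IPI chain is walking the overloaded CPUs */
	unsigned int pending;	/* pull requests seen while the chain was active */
	unsigned int chains;	/* number of IPI chains started so far */
};

/*
 * A CPU wants to pull. Only the first requester starts a chain; later
 * requesters just bump the counter and never touch a remote rq lock.
 * Returns true if this caller started the chain.
 */
static bool rto_request_pull(struct rto_model *m)
{
	if (m->loop_active) {
		m->pending++;
		return false;
	}
	m->loop_active = true;
	m->chains++;
	return true;
}

/*
 * The chain reached the last overloaded CPU. If requests arrived in the
 * meantime, go around again; otherwise stop.
 * Returns true if the chain restarts.
 */
static bool rto_chain_done(struct rto_model *m)
{
	if (m->pending) {
		m->pending = 0;
		m->chains++;
		return true;
	}
	m->loop_active = false;
	return false;
}
```

The point of the design is that concurrent pullers cost one counter
increment each, instead of each one contending on the same runqueue lock.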

The RT_PUSH_IPI solved the latency problem with PREEMPT_RT but could cause
a new issue with non PREEMPT_RT. Namely, softirqs run in a threaded
context on PREEMPT_RT but they can run in an interrupt context in non-RT.

If an IPI lands on a CPU that has just woken up multiple RT tasks and the
current CPU is running a non RT or a low priority RT task, instead of
doing a push, it would simply do a schedule on that CPU. But if a softirq
was also executing on this CPU, the schedule would need to wait until the
softirq finished. Until then, the CPU would still be considered overloaded
as there are RT tasks still waiting to run on it.

A live lock occurred on a workload that was doing heavy networking traffic
on a large machine where the softirqs would run 500us out of 750us. And it
would also be waking up RT tasks, causing the RT pull logic to be
constantly executed.

When a softirq triggered on a CPU with RT tasks queued but not running
yet, and the other CPUs would see this CPU as being overloaded, they would
send an IPI over to it. The CPU would notice that the waiting RT tasks are
of higher priority than the currently running task and simply schedule
that CPU instead. But because the softirq was executing, before it could
schedule, it would receive another IPI to do the same. The number of IPIs
would slow down the currently running softirq so much that before it could
return to task context, it would execute another softirq, never
allowing the CPU to schedule. This live locked that CPU.

As RT_PUSH_IPI was created to help PREEMPT_RT, make it default off if
PREEMPT_RT is not enabled.

Fixes: b6366f048e0c ("sched/rt: Use IPI to trigger RT task push migration instead of pulling")
Closes: https://lore.kernel.org/all/20260506235716.2530720-1-tj@kernel.org/
Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260515103740.25ccbed8@gandalf.local.home

sched: Switch rq->next_class on proxy_resched_idle()

K Prateek noticed we weren't setting the rq->next_class in
proxy_resched_idle(), when I was debugging an issue seen with
CONFIG_SCHED_PROXY_EXEC and some of Peter's new patches, and
suggested this fix.

So set rq->next_class when we temporarily switch the donor to
idle, so we don't accidentally call wakeup_preempt_fair()
with idle as the donor.

Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260514234732.3170197-1-jstultz@google.com

sched/fair: Add SIS_UTIL support to select_idle_capacity()

Add to select_idle_capacity() the same SIS_UTIL-controlled idle-scan
mechanism, already used by select_idle_cpu(): when sched_feat(SIS_UTIL)
is enabled and the LLC domain has sched_domain_shared data, derive the
per-attempt scan limit from sd->shared->nr_idle_scan.

That bounds the walk on large LLCs: once nr_idle_scan is exhausted,
return the best CPU seen so far. The early exit is gated on
!has_idle_core so an active idle-core search (SMT with idle cores
reported by test_idle_cores()) isn't cut short before it gets a chance
to find one.
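
The bounded walk can be sketched in user space as follows. This is a
simplified model under stated assumptions: struct cpu_model,
scan_idle_capacity() and the scan_limit parameter are hypothetical, the
limit stands in for a value derived from nr_idle_scan, and the real
function also checks whether the task fits the CPU's capacity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state for the model. */
struct cpu_model {
	bool idle;
	unsigned long capacity;
};

/*
 * Return the index of the highest-capacity idle CPU found, or -1.
 * When no idle-core search is active, stop after inspecting scan_limit
 * CPUs and return the best candidate seen so far.
 */
static int scan_idle_capacity(const struct cpu_model *cpus, size_t nr_cpus,
			      size_t scan_limit, bool has_idle_core)
{
	unsigned long best_cap = 0;
	int best = -1;

	for (size_t i = 0; i < nr_cpus; i++) {
		/* The early exit is skipped while hunting for an idle core. */
		if (!has_idle_core && i >= scan_limit)
			break;
		if (cpus[i].idle && cpus[i].capacity > best_cap) {
			best_cap = cpus[i].capacity;
			best = (int)i;
		}
	}
	return best;
}
```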

Co-developed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://patch.msgid.link/20260509180955.1840064-6-arighi@nvidia.com

sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity

When SD_ASYM_CPUCAPACITY load balancing considers pulling a misfit task,
capacity_of(dst_cpu) can overstate available compute if the SMT sibling is
busy: the core does not deliver its full nominal capacity.

If SMT is active and dst_cpu is not on a fully idle core, skip this
destination so we do not migrate a misfit expecting a capacity upgrade we
cannot actually provide.
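
As a rough illustration of the destination check, assuming a made-up
struct core_model that holds one idle flag per SMT thread (none of these
names exist in the kernel):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bound on threads per core, for this sketch only. */
#define MODEL_MAX_THREADS 8

struct core_model {
	size_t nr_threads;
	bool idle[MODEL_MAX_THREADS];
};

/* True only if every hardware thread of the core is idle. */
static bool core_fully_idle(const struct core_model *core)
{
	for (size_t i = 0; i < core->nr_threads; i++)
		if (!core->idle[i])
			return false;
	return true;
}

/* Can this destination core accept a misfit pull? */
static bool misfit_dst_allowed(bool smt_active, const struct core_model *dst)
{
	if (!smt_active)
		return true;	/* no sibling can take capacity away */
	return core_fully_idle(dst);
}
```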

Reported-by: Felix Abecassis <fabecassis@nvidia.com>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://patch.msgid.link/20260509180955.1840064-5-arighi@nvidia.com

sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection

On systems with asymmetric CPU capacity (e.g., ACPI/CPPC reporting
different per-core frequencies), the wakeup path uses
select_idle_capacity() and prioritizes idle CPUs with higher capacity
for better task placement. However, when those CPUs belong to SMT cores,
their effective capacity can be much lower than the nominal capacity
when the sibling thread is busy: SMT siblings compete for shared
resources, so a "high capacity" CPU that is idle but whose sibling is
busy does not deliver its full capacity. This effective capacity
reduction cannot be modeled by the static capacity value alone.

Introduce SMT awareness in the asym-capacity idle selection policy: when
SMT is active, always prefer fully-idle SMT cores over partially-idle
ones.

Prioritizing fully-idle SMT cores yields better task placement because
the effective capacity of partially-idle SMT cores is reduced; always
preferring fully-idle cores when available leads to more accurate
capacity usage on task wakeup.

On an SMT system with asymmetric CPU capacities (NVIDIA Vera Rubin),
SMT-aware idle selection has been shown to improve throughput by around
15-18% over NO_ASYM mainline and by around 60% over ASYM mainline, for
CPU-bound workloads (NVBLAS) running a number of tasks equal to the
number of SMT cores.
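
The preference order can be expressed as a small comparison function.
This is a sketch only: struct idle_candidate and prefer_candidate() are
hypothetical, and the capacity tie-break is a simplification of what the
wakeup path really evaluates.

```c
#include <stdbool.h>

/* Hypothetical descriptor for an idle CPU found during the scan. */
struct idle_candidate {
	int cpu;
	unsigned long capacity;	/* static (nominal) capacity */
	bool core_fully_idle;	/* all SMT siblings are idle */
};

/*
 * Return true if candidate a should be preferred over candidate b.
 * With SMT active, a fully idle core always wins; capacity only breaks
 * ties between candidates of the same kind.
 */
static bool prefer_candidate(const struct idle_candidate *a,
			     const struct idle_candidate *b, bool smt_active)
{
	if (smt_active && a->core_fully_idle != b->core_fully_idle)
		return a->core_fully_idle;
	return a->capacity > b->capacity;
}
```

With SMT active, a 768-capacity CPU on a fully idle core is chosen over
a 1024-capacity CPU whose sibling is busy; without SMT the plain capacity
order applies.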

Reported-by: Felix Abecassis <fabecassis@nvidia.com>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://patch.msgid.link/20260511142502.3873984-1-arighi@nvidia.com

sched/fair: Attach sched_domain_shared to sd_asym_cpucapacity

On asymmetric CPU capacity systems, the wakeup path uses
select_idle_capacity(), which scans the span of sd_asym_cpucapacity
rather than sd_llc.

The has_idle_cores hint however lives on sd_llc->shared, so the
wakeup-time read of has_idle_cores operates on an LLC-scoped blob while
the actual scan/decision spans the asym domain; nr_busy_cpus also lives
in the same shared sched_domain data, but it's never used in the asym
CPU capacity scenario.

Therefore, move the sched_domain_shared object to sd_asym_cpucapacity
whenever the CPU has a SD_ASYM_CPUCAPACITY_FULL ancestor and that
ancestor is non-overlapping (i.e., not built from SD_NUMA). In that case
the scope of has_idle_cores matches the scope of the wakeup scan.

Fall back to attaching the shared object to sd_llc in three cases:

  1) plain symmetric systems (no SD_ASYM_CPUCAPACITY_FULL anywhere);

  2) CPUs in an exclusive cpuset that carves out a symmetric capacity
     island: has_asym is system-wide but those CPUs have no
     SD_ASYM_CPUCAPACITY_FULL ancestor in their hierarchy and follow
     the symmetric LLC path in select_idle_sibling();

  3) exotic topologies where SD_ASYM_CPUCAPACITY_FULL lands on an
     SD_NUMA-built domain. init_sched_domain_shared() keys the shared
     blob off cpumask_first(span), which on overlapping NUMA domains
     would alias unrelated spans onto the same blob. Keep the shared
     object on the LLC there; select_idle_capacity() gracefully skips
     the has_idle_cores preference when sd->shared is NULL.

While at it, also rename the per-CPU sd_llc_shared to sd_balance_shared,
as it is no longer strictly tied to the LLC.
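
The attach decision described above reduces to two questions per CPU.
A minimal sketch, with a hypothetical enum and helper name and with the
topology walk replaced by two boolean inputs:

```c
#include <stdbool.h>

enum shared_attach { ATTACH_SD_LLC, ATTACH_SD_ASYM_CPUCAPACITY };

/*
 * has_asym_full_ancestor: the CPU's hierarchy contains a domain with
 *                         SD_ASYM_CPUCAPACITY_FULL set.
 * ancestor_is_numa:       that domain was built from SD_NUMA and may
 *                         therefore overlap with other spans.
 */
static enum shared_attach pick_shared_domain(bool has_asym_full_ancestor,
					     bool ancestor_is_numa)
{
	/* Cases 1 and 2: no asym ancestor for this CPU, keep the LLC. */
	if (!has_asym_full_ancestor)
		return ATTACH_SD_LLC;
	/* Case 3: overlapping spans would alias onto one shared object. */
	if (ancestor_is_numa)
		return ATTACH_SD_LLC;
	return ATTACH_SD_ASYM_CPUCAPACITY;
}
```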

Co-developed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://patch.msgid.link/20260516055850.1345932-1-arighi@nvidia.com

sched/fair: Drop redundant RCU read lock in NOHZ kick path

nohz_balancer_kick() is reached from sched_balance_trigger(), which is
called from sched_tick(). sched_tick() runs with IRQs disabled, so the
additional rcu_read_lock/unlock() used around sched_domain accesses in
this path is redundant. Rely on the existing IRQ-disabled context (and
the rcu_dereference_all() checking) instead.

The same applies to set_cpu_sd_state_idle(), called from the idle entry
path with IRQs disabled, and to set_cpu_sd_state_busy(), reachable via
nohz_balance_exit_idle() from two contexts: nohz_balancer_kick() (IRQs
disabled, as above) and sched_cpu_deactivate() (the CPUHP_AP_ACTIVE
teardown, which runs under cpus_write_lock(), so it cannot race with
sched-domain rebuilds). In both cases the rcu_dereference_all()
validation is sufficient.

No functional change intended.

Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://patch.msgid.link/20260509180955.1840064-2-arighi@nvidia.com

sched: Unify SMT active check via sched_smt_active()

The code mixes calls to sched_smt_active() with explicit checks of
sched_smt_present. Replace the explicit checks with sched_smt_active()
for better code maintenance and readability.

Note that this changes update_idle_core slightly: it used to use
static_branch_unlikely and will now use static_branch_likely.

Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Link: https://patch.msgid.link/20260515172456.542799-5-sshegde@linux.ibm.com

sched/fair: Add sched_smt_active check for fastpaths

In fastpaths such as wakeup and load balance, even minimal code additions
can add up. is_core_idle is called during load balance.

Other call sites of is_core_idle check sched_smt_active() first.
Make the same check in should_we_balance.

The remaining accesses to cpu_smt_mask are not in a fastpath.

Note: Remove the stale comment above is_core_idle. The enqueue methods
of fair are no longer close to it.

Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Link: https://patch.msgid.link/20260515172456.542799-4-sshegde@linux.ibm.com

sched: Simplify ifdeffery around cpu_smt_mask

Now that cpu_smt_mask is defined as cpumask_of(cpu) for
CONFIG_SCHED_SMT=n, it is possible to get rid of the ifdeffery.

Effectively,
- This makes sched_smt_present always defined.

- cpumask_weight(cpumask_of(cpu)) == 1, so sched_smt_present_inc/dec
   will never enable sched_smt_present, which is expected.

- Paths that were compile-time eliminated become runtime guarded
   using static keys.

- Defines set_idle_cores, test_idle_cores, etc., which could likely let
   CONFIG_SCHED_SMT=n systems use the same optimizations within the
   LLC at wakeups.

- This will expose the sched_smt_present symbol for CONFIG_SCHED_SMT=n.
   Likely not a concern.

- There is some code bloat for CONFIG_SCHED_SMT=n (NR_CPUS=2048):
   add/remove: 24/18 grow/shrink: 26/28 up/down: 6396/-3188 (3208)
   Total: Before=30629880, After=30633088, chg +0.01%

- No code bloat for CONFIG_SCHED_SMT=y, which is expected.

- Add comments around stop_core_cpuslocked on why ifdefs are not
   removed.

- This leaves the remaining uses of CONFIG_SCHED_SMT mainly in the
   topology building bits, which involve policy-based decisions.

Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://patch.msgid.link/20260515172456.542799-3-sshegde@linux.ibm.com

topology: Introduce cpu_smt_mask for CONFIG_SCHED_SMT=n

Define cpu_smt_mask for CONFIG_SCHED_SMT=n as cpumask_of that
CPU. With that config, the kernel is expected to treat each CPU
as an individual core. Using cpumask_of(cpu) reflects that.

This helps get rid of the ifdeffery that is spread across
the codebase because cpu_smt_mask is defined only for
CONFIG_SCHED_SMT=y.

Note: No arch today defines cpu_smt_mask unconditionally, so defining
cpu_smt_mask here is unlikely to lead to redefinition errors.
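
To illustrate why a single-CPU mask is a natural fallback, here is a
user-space model with a 64-bit word standing in for struct cpumask. All
model_* names are hypothetical, and the contiguous two-threads-per-core
numbering in the SMT case is an assumption made only for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* One bit per CPU; enough for a 64-CPU model. */
typedef uint64_t cpumask_model_t;

static cpumask_model_t model_cpumask_of(unsigned int cpu)
{
	return UINT64_C(1) << cpu;
}

static unsigned int model_cpumask_weight(cpumask_model_t mask)
{
	unsigned int weight = 0;

	for (; mask; mask &= mask - 1)	/* clear the lowest set bit */
		weight++;
	return weight;
}

/*
 * smt_enabled stands in for CONFIG_SCHED_SMT. Without SMT the mask is
 * just the CPU itself; with SMT, siblings are assumed to be an aligned
 * pair of consecutive CPU numbers.
 */
static cpumask_model_t model_cpu_smt_mask(unsigned int cpu, bool smt_enabled)
{
	if (!smt_enabled)
		return model_cpumask_of(cpu);
	return UINT64_C(3) << (cpu & ~1u);
}
```

Callers that iterate over the SMT mask or take its weight then behave
sensibly in both configurations without any #ifdef.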

Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://patch.msgid.link/20260515172456.542799-2-sshegde@linux.ibm.com

sched/fair: Update util_est after updating util_avg during dequeue

util_est_update() must be called after updating util_avg during the dequeue
of a task, and only when the task's dequeue is not delayed.

Move util_est_update() in update_load_avg().

Fixes: b55945c500c5 ("sched: Fix pick_next_task_fair() vs try_to_wake_up() race")
Closes: https://lore.kernel.org/all/20260512124653.305275-1-qyousef@layalina.io/
Reported-by: Qais Yousef <qyousef@layalina.io>
Reviewed-and-tested-by: Qais Yousef <qyousef@layalina.io>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260518102345.268452-1-vincent.guittot@linaro.org

sched/clock: Provide !HAVE_UNSTABLE_SCHED_CLOCK stub for sched_clock_stable()

When CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is disabled, sched_clock() is
already assumed to provide stable semantics, but the public header
doesn't provide a sched_clock_stable() stub for that case.

Add a header stub that always returns true and clean up the duplicate
local stub in ring_buffer.c, so callers can use sched_clock_stable()
unconditionally.
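
The header pattern could look roughly like the sketch below. The exact
signatures in the kernel header are an assumption here, and
clock_description() is a hypothetical caller added for illustration.

```c
#include <stdbool.h>

/*
 * CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is left undefined in this sketch, so
 * the stub branch is the one that gets compiled.
 */
#ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
extern int sched_clock_stable(void);	/* real implementation lives elsewhere */
#else
static inline bool sched_clock_stable(void)
{
	return true;	/* sched_clock() is assumed stable on these configs */
}
#endif

/* A caller no longer needs its own #ifdef or a local stub. */
static const char *clock_description(void)
{
	return sched_clock_stable() ? "stable" : "unstable";
}
```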

Signed-off-by: Yiyang Chen <cyyzero16@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Link: https://patch.msgid.link/56e45338858946cd9581b75c8bd45dd37dba52c5.1778773587.git.cyyzero16@gmail.com

sched/cputime: Drop now-stale mul_u64_u64_div_u64() over-approximation guard

Commit 77baa5bafcbe ("sched/cputime: Fix mul_u64_u64_div_u64() precision
for cputime") added a clamp in cputime_adjust():

    if (unlikely(stime > rtime))
        stime = rtime;

The justification was that mul_u64_u64_div_u64() could over-approximate
on some architectures (notably arm64 and the old 32-bit fallback), so
the mathematically impossible stime > rtime was nevertheless reachable
and would underflow utime = rtime - stime.

That premise no longer holds. Commit b29a62d87cc0 ("mul_u64_u64_div_u64:
make it precise always") replaced the fallback implementation with an
exact 128-bit long division, and the x86_64 inline asm already produced
exact results. The helper now returns the mathematically correct
floor(a*b/d) on every architecture, so stime <= rtime is guaranteed by
stime <= stime + utime and the clamp is dead code.

Remove it along with its stale comment.
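
The bound can be checked with a small user-space model. Since
stime <= stime + utime, the exact quotient stime * rtime / (stime + utime)
is at most rtime, and taking the floor cannot push it above rtime. The
model_* helpers below are hypothetical, and unsigned __int128 is a
GCC/Clang extension used only to get an exact result in this sketch; it
is not how the kernel helper is implemented.

```c
#include <assert.h>
#include <stdint.h>

/* Exact floor(a * b / d) using a 128-bit intermediate. */
static uint64_t model_mul_div(uint64_t a, uint64_t b, uint64_t d)
{
	assert(d != 0);
	return (uint64_t)(((unsigned __int128)a * b) / d);
}

/*
 * Scale stime against rtime in the proportion stime : (stime + utime).
 * Assumes stime + utime is nonzero and does not overflow 64 bits.
 */
static uint64_t model_scale_stime(uint64_t stime, uint64_t utime,
				  uint64_t rtime)
{
	return model_mul_div(stime, rtime, stime + utime);
}
```

For example, stime = 3, utime = 1, rtime = 10 gives floor(30 / 4) = 7,
which leaves utime = rtime - stime = 3 with no underflow.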

Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260514202629.673539-1-nico@fluxnic.net

sched/deadline: Fix replenishment logic for non-deferred servers

Enqueue and replenish non-deferred deadline servers when their runtime is
exhausted and the replenishment timer could not be started because it is
too close to the wake-up instant.

Signed-off-by: Yuri Andriaccio <yurand2000@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260430213835.62217-2-yurand2000@gmail.com

sched/rt: Update default bandwidth for real-time tasks to ONE

Set the default total bandwidth for SCHED_DEADLINE tasks and servers
to ONE. FIFO/RR tasks are already throttled by fair-servers and
ext-servers, and the sysctl_sched_rt_runtime parameter now only
defines the total bw that is allowed to deadline entities.

Signed-off-by: Yuri Andriaccio <yurand2000@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260430213835.62217-22-yurand2000@gmail.com

iommu/arm-smmu-qcom: Fix fastrpc compatible string in ACTLR client match table

The qcom_smmu_actlr_client_of_match table contained "qcom,fastrpc" as
the compatible string for applying ACTLR prefetch settings to FastRPC
devices. However, "qcom,fastrpc" is the compatible string for the parent
rpmsg channel node, which is not an IOMMU client — it carries no
"iommus" property in the device tree and is never attached to an SMMU
context bank.

The actual IOMMU clients are the compute context bank (CB) child nodes,
which use the compatible string "qcom,fastrpc-compute-cb". These nodes
carry the "iommus" property and are probed by fastrpc_cb_driver via
fastrpc_cb_probe(), which sets up the DMA mask and IOMMU mappings for
each FastRPC session. The device tree structure is:

  fastrpc {
      compatible = "qcom,fastrpc";        /* rpmsg channel, no iommus */
      ...
      compute-cb@3 {
          compatible = "qcom,fastrpc-compute-cb";
          iommus = <&apps_smmu 0x1823 0x0>;  /* actual IOMMU client */
      };
  };

Since qcom_smmu_set_actlr_dev() calls of_match_device() against the
device being attached to the SMMU context bank, the "qcom,fastrpc"
entry was never matching any device. As a result, the ACTLR prefetch
settings (PREFETCH_DEEP | CPRE | CMTLB) were silently never applied
for FastRPC compute context banks.

Fix this by replacing "qcom,fastrpc" with "qcom,fastrpc-compute-cb"
in the match table so that the ACTLR settings are correctly applied
to the compute CB devices that are the true IOMMU clients.
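
The mismatch is easy to reproduce in a reduced user-space model. The two
tables below contain only the FastRPC entry and model_match() is a
hypothetical stand-in for the lookup that the driver performs through
of_match_device().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Reduced stand-ins for the match table before and after the fix. */
static const char *const actlr_match_old[] = { "qcom,fastrpc", NULL };
static const char *const actlr_match_new[] = { "qcom,fastrpc-compute-cb", NULL };

/*
 * Simplified model: compare the device's compatible string against a
 * NULL-terminated table and report whether any entry matches.
 */
static bool model_match(const char *const *table, const char *compatible)
{
	for (size_t i = 0; table[i]; i++)
		if (strcmp(table[i], compatible) == 0)
			return true;
	return false;
}
```

A device with compatible "qcom,fastrpc-compute-cb" finds no entry in the
old table, which is why the settings were silently skipped.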

Assisted-by: Claude:claude-sonnet-4-6
Fixes: 3e35c3e725de ("iommu/arm-smmu: Add ACTLR data and support for qcom_smmu_500")
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Shawn Guo <shengchao.guo@oss.qualcomm.com>
Signed-off-by: Bibek Kumar Patro <bibek.patro@oss.qualcomm.com>
Signed-off-by: Will Deacon <will@kernel.org>

platform/x86: alienware-wmi-base: Transition to new WMI API

Transition to the new wmi_buffer based WMI API.

Signed-off-by: Kurt Borja <kuurtb@gmail.com>
Reviewed-by: Armin Wolf <W_Armin@gmx.de>
Link: https://patch.msgid.link/20260429-aw-new-api-v5-1-7702668d04c6@gmail.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

phy: qcom: qmp-usbc: Fix out-of-bounds array access in dp swing config

swing_tbl and pre_emphasis_tbl are 4x4 arrays (valid indices 0-3), but
the boundary check uses "> 4" instead of ">= 4", allowing index 4 to
cause an out-of-bounds access.
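
A stand-alone sketch of the off-by-one follows. The table contents are
placeholders and all names are hypothetical; only the shape of the check
mirrors the description above.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_LEVELS 4	/* valid indices are 0..3 */

/* Placeholder values; the real tables hold PHY settings. */
static const uint8_t model_swing_tbl[MODEL_LEVELS][MODEL_LEVELS] = {
	{ 1, 2, 3, 4 },
	{ 5, 6, 7, 8 },
	{ 9, 10, 11, 12 },
	{ 13, 14, 15, 16 },
};

/* Buggy check: index 4 slips through because 4 > 4 is false. */
static bool levels_ok_old(unsigned int v_level, unsigned int p_level)
{
	return !(v_level > 4 || p_level > 4);
}

/* Fixed check: anything at or above the array dimension is rejected. */
static bool levels_ok_new(unsigned int v_level, unsigned int p_level)
{
	return !(v_level >= MODEL_LEVELS || p_level >= MODEL_LEVELS);
}

/* Returns 0 and stores the entry on success, -1 on an invalid index. */
static int model_lookup_swing(unsigned int v_level, unsigned int p_level,
			      uint8_t *out)
{
	if (!levels_ok_new(v_level, p_level))
		return -1;
	*out = model_swing_tbl[v_level][p_level];
	return 0;
}
```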

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Fixes: 81791c45c8e0 ("phy: qcom: qmp-usbc: Add QCS615 USB/DP PHY config and DP mode support")
Signed-off-by: Xiangxu Yin <xiangxu.yin@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://patch.msgid.link/20260227-master-v1-1-8d91b9407fdb@oss.qualcomm.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>

phy: qcom: qmp-combo: Move pipe_clk on/off to common

Keep the USB pipe clock working when the phy is in DP-only mode, because
the dwc controller still needs it for USB 2.0 over the same Type-C port.

Tested with the BenQ RD280UA monitor which has a downstream-facing port
for data passthrough that's manually switchable between USB 2 and 3,
corresponding to 4-lane and 2-lane DP respectively.

Note: the suspend/resume callbacks were already gating the enable/disable
of this clock only on init_count and not usb_init_count!

Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Val Packett <val@packett.cool>
Link: https://patch.msgid.link/20260304190827.176988-1-val@packett.cool
Signed-off-by: Vinod Koul <vkoul@kernel.org>

Merge branch 'fixes' of into for-next

Reasons:
- lenovo-wmi-* feature work
- an important WMI core fix

MAINTAINERS: Update HiSilicon PMU driver maintainer to Yushan Wang

Replace myself with Yushan Wang who is very familiar with the HiSilicon PMU
drivers.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Jie Zhan <zhanjie9@hisilicon.com>
Acked-by: Yushan Wang <wangyushan12@huawei.com>
Signed-off-by: Will Deacon <will@kernel.org>

Merge branch 'eea-add-basic-driver-framework-for-alibaba-elastic-ethernet-adaptor'

Xuan Zhuo says:

====================
eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor

Add a driver framework for EEA that will be available in the future.

This driver is currently quite minimal, implementing only fundamental
core functionalities. Key features include: I/O queue management via
adminq, basic PCI-layer operations, and essential RX/TX data
communication capabilities. It also supports the creation,
initialization, and management of network devices (netdev). Furthermore,
the ring structures for both I/O queues and adminq have been abstracted
into a simple, unified, and reusable library implementation,
facilitating future extension and maintenance.
====================

Link: https://patch.msgid.link/20260514095138.80680-1-xuanzhuo@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

eea: introduce callback for ndo_get_stats64 and register netdev

This commit adds support for ndo_get_stats64 to provide accurate
interface statistics. With the TX and RX data paths now fully functional,
it is appropriate to register the netdevice and expose the interface to
userspace.

Register the network device via register_netdev, and update the
corresponding unregister_netdev and dev_close routines to ensure
synchronization.

Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Philo Lu <lulie@linux.alibaba.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://patch.msgid.link/20260514095138.80680-9-xuanzhuo@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

eea: introduce ethtool support

Add basic driver framework for the Alibaba Elastic Ethernet Adapter (EEA).

This commit introduces ethtool support.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Philo Lu <lulie@linux.alibaba.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://patch.msgid.link/20260514095138.80680-8-xuanzhuo@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

eea: implement packet transmit logic

Implement the core logic for transmitting packets in the EEA TX path,
including packet preparation and submission to the underlying transport.

Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Philo Lu <lulie@linux.alibaba.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://patch.msgid.link/20260514095138.80680-7-xuanzhuo@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

eea: implement packet receive logic

Implement the core logic for receiving packets in the EEA RX path,
including packet buffering and basic validation.

Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Philo Lu <lulie@linux.alibaba.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://patch.msgid.link/20260514095138.80680-6-xuanzhuo@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

eea: create/destroy rx,tx queues for netdevice open and stop

Add basic driver framework for the Alibaba Elastic Ethernet Adapter (EEA).

This commit introduces the implementation for the netdevice open and
stop.

This commit also introduces HA to restore the device when an error
occurs, but in HA scenarios the driver cannot guarantee that the status
is restored correctly.

Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Philo Lu <lulie@linux.alibaba.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://patch.msgid.link/20260514095138.80680-5-xuanzhuo@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

eea: probe the netdevice and create adminq

Add basic driver framework for the Alibaba Elastic Ethernet Adapter (EEA).

This commit creates the netdevice after PCI probe,
and initializes the admin queue to send commands to the device.

Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Philo Lu <lulie@linux.alibaba.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://patch.msgid.link/20260514095138.80680-4-xuanzhuo@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

eea: introduce ring and descriptor structures

Add basic driver framework for the Alibaba Elastic Ethernet Adapter (EEA).

This commit introduces the ring and descriptor implementations.

These structures and ring APIs are used by the RX, TX, and admin queues.

Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Philo Lu <lulie@linux.alibaba.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://patch.msgid.link/20260514095138.80680-3-xuanzhuo@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

eea: introduce PCI framework

Add basic driver framework for the Alibaba Elastic Ethernet Adapter (EEA).

This commit implements the EEA PCI probe functionality.

Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Philo Lu <lulie@linux.alibaba.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://patch.msgid.link/20260514095138.80680-2-xuanzhuo@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

drm/i915: Fix potential UAF in TTM object purge

TLDR: The bo->ttm object might be changed by calling ttm_bo_validate(),
      move casting it to an i915_tt object later to actually get the right
      pointer.

A user reported hitting the following bug under heavy use on DG2:

[26620.095550] Oops: general protection fault, probably for non-canonical address 0xa56b6b6b6b6b6b8b: 0000 1 SMP NOPTI
[26620.095556] CPU: 2 UID: 0 PID: 631 Comm: Xorg Not tainted 6.18.8 #1 PREEMPT(lazy)
[26620.095558] Hardware name: ASRock B850M Steel Legend WiFi/B850M Steel Legend WiFi, BIOS 3.50 09/18/2025
[26620.095559] RIP: 0010:i915_ttm_purge+0x84/0x100 [i915]
[26620.095604] Code: 00 00 00 48 8d 54 24 10 48 89 e6 48 89 fb e8 83 aa ae ff 85 c0 75 6f 48 83 bb a8 01 00 00 00 74 2c 48 8b 45 78 48 85 c0 74 23 <48> 8b 78 20 48 c7 c2 ff ff ff ff 31 f6 e8 7a 73 e3 e0 48 8b 7d 78
[26620.095605] RSP: 0018:ffffc90005fd7430 EFLAGS: 00010282
[26620.095607] RAX: a56b6b6b6b6b6b6b RBX: ffff8881f46c3dc0 RCX: 0000000000000000
[26620.095608] RDX: 0000000000000000 RSI: 0000000000000246 RDI: 00000000ffffffff
[26620.095609] RBP: ffff888289610f00 R08: 0000000000000001 R09: ffff88823b022000
[26620.095609] R10: ffff888103029b28 R11: ffff8881fc7f3800 R12: ffff88810b6150d0
[26620.095609] R13: ffff888289610f00 R14: 0000000000000000 R15: ffff8881f46c3dc0
[26620.095610] FS: 00007f1004d86900(0000) GS:ffff88901c858000(0000) knlGS:0000000000000000
[26620.095611] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[26620.095611] CR2: 00007f0fdf489000 CR3: 000000035b0c1000 CR4: 0000000000750ef0
[26620.095612] PKRU: 55555554
[26620.095612] Call Trace:
[26620.095615] <TASK>
[26620.095615] i915_ttm_move+0x2b9/0x420 [i915]
[26620.095642] ? ttm_tt_init+0x65/0x80 [ttm]
[26620.095644] ? i915_ttm_tt_create+0xc6/0x150 [i915]
[26620.095667] ttm_bo_handle_move_mem+0xb6/0x160 [ttm]
[26620.095669] ttm_bo_evict+0x100/0x150 [ttm]
[26620.095671] ? preempt_count_add+0x64/0xa0
[26620.095673] ? _raw_spin_lock+0xe/0x30
[26620.095675] ? _raw_spin_unlock+0xd/0x30
[26620.095675] ? i915_gem_object_evictable+0xb7/0xd0 [i915]
[26620.095704] ttm_bo_evict_cb+0x6e/0xd0 [ttm]
[26620.095705] ttm_lru_walk_for_evict+0xa6/0x200 [ttm]
[26620.095708] ttm_bo_alloc_resource+0x185/0x4f0 [ttm]
[26620.095709] ? init_object+0x62/0xd0
[26620.095712] ttm_bo_validate+0x7a/0x180 [ttm]
[26620.095713] ? _raw_spin_unlock_irqrestore+0x16/0x30
[26620.095714] __i915_ttm_get_pages+0xb0/0x170 [i915]
[26620.095737] i915_ttm_get_pages+0x9f/0x150 [i915]
[26620.095759] ? i915_gem_do_execbuffer+0xedc/0x2b40 [i915]
[26620.095786] ? alloc_debug_processing+0xd0/0x100
[26620.095787] ? _raw_spin_unlock_irqrestore+0x16/0x30
[26620.095788] ? i915_vma_instance+0xa0/0x4e0 [i915]
[26620.095822] __i915_gem_object_get_pages+0x2f/0x40 [i915]
[26620.095848] i915_vma_pin_ww+0x706/0x980 [i915]
[26620.095875] ? i915_gem_do_execbuffer+0xedc/0x2b40 [i915]
[26620.095904] eb_validate_vmas+0x170/0xa00 [i915]
[26620.095930] i915_gem_do_execbuffer+0x1201/0x2b40 [i915]
[26620.095953] ? alloc_debug_processing+0xd0/0x100
[26620.095954] ? _raw_spin_unlock_irqrestore+0x16/0x30
[26620.095955] ? i915_gem_execbuffer2_ioctl+0xc9/0x240 [i915]
[26620.095977] ? __wake_up_sync_key+0x32/0x50
[26620.095979] ? i915_gem_execbuffer2_ioctl+0xc9/0x240 [i915]
[26620.096001] ? __slab_alloc.isra.0+0x67/0xc0
[26620.096003] i915_gem_execbuffer2_ioctl+0x11a/0x240 [i915]

Results from decode_stacktrace.sh pointed to a dereference of a file pointer
field of an i915 TTM page vector container associated with an object being
purged on eviction.  That path is taken when the object is marked as no
longer needed.

Code analysis revealed a possibility of the i915 TTM page vector container
being replaced with a new instance inside a function that purges content
of the object, should it be still busy.  That function is called,
indirectly via a more general function that changes the object's placement
and caching policy, before the problematic dereference, but still after
a pointer to the container is captured, rendering the pointer no longer
valid.

Fix the issue by capturing the pointer to the container only after its
potential replacement.
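
The ordering issue described above can be sketched in userspace C. This is only an illustration under stated assumptions, not the driver code: struct base_tt, struct wrapper_tt, struct object, change_placement() and purge_token() are hypothetical names, and filp_token merely stands in for the file pointer field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical types: a base object embedded in a larger container. */
struct base_tt { int pages; };
struct wrapper_tt {
	struct base_tt ttm;
	int filp_token;		/* stands in for the file pointer field */
};
struct object { struct base_tt *tt; };

/* Hypothetical helper: may free the container and install a new one. */
static void change_placement(struct object *obj, int replace)
{
	struct wrapper_tt *fresh;

	if (!replace)
		return;
	fresh = calloc(1, sizeof(*fresh));
	if (!fresh)
		return;		/* keep the old container on failure */
	fresh->filp_token = 2;
	free(container_of(obj->tt, struct wrapper_tt, ttm));
	obj->tt = &fresh->ttm;
}

/* Fixed ordering: derive the container pointer only after the call. */
static int purge_token(struct object *obj, int replace)
{
	struct wrapper_tt *w;

	assert(obj->tt);
	change_placement(obj, replace);
	w = container_of(obj->tt, struct wrapper_tt, ttm);
	return w->filp_token;
}
```

Had the container_of() result been captured before change_placement(), it would refer to freed memory whenever the container was replaced.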

v2:
  - Move the container_of() inside the if block (Sebastian),
  - Simplify the commit description so that it briefly explains why the
    change is necessary (Christian).

Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/work_items/14882
Fixes: 7ae034590ceae ("drm/i915/ttm: add tt shmem backend")
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: stable@vger.kernel.org # v5.17+
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Sebastian Brzezinka <sebastian.brzezinka@intel.com>
Cc: Christian König <christian.koenig@amd.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://lore.kernel.org/r/20260508122612.469227-2-janusz.krzysztofik@linux.intel.com

drm/i915: Skip deprecated selftest

One of the workaround test cases is now deprecated on modern platforms,
so skip it.

Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/work_items/12061
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Krzysztof Karas <krzysztof.karas@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://lore.kernel.org/r/20260515133052.1628281-2-janusz.krzysztofik@linux.intel.com

ARM: dts: stm32: stm32mp15x-mecio1-io: Move expander gpio-line-names to board files

Move the gpio-line-names properties for the I2C GPIO expanders (gpio0
and gpio1) out of the common mecio1-io.dtsi file and into the specific
board dts files.

The layout originally defined in the common include file belonged to the
mecio1r1 (Revision 1) hardware. This layout is moved 1:1 into the
stm32mp153c-mecio1r1.dts file.

The mecio1r0 (Revision 0) hardware utilizes a completely different
pinout for these expanders. A new, accurate mapping reflecting the
Revision 0 schematics is added to stm32mp151c-mecio1r0.dts.

Fixes: 8267753c891c ("ARM: dts: stm32: Add MECIO1 and MECT1S board variants")
Co-developed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David Jander <david@protonic.nl>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20260318105123.819807-8-o.rempel@pengutronix.de
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>

ARM: dts: stm32: stm32mp15x-mecio1-io: Fix expander gpio line typo

Fix a copy-paste error in the GPIO line names for the TCA6416 expander
(gpio@20).

The common mecio1-io include file was originally defined using the
mecio1r1 (Revision 1) hardware layout, but incorrectly labeled pin 13
as "HSIN9_BIAS" instead of the actual "HSIN7_BIAS" present in the
schematics.

Fixes: 8267753c891c ("ARM: dts: stm32: Add MECIO1 and MECT1S board variants")
Co-developed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David Jander <david@protonic.nl>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20260318105123.819807-7-o.rempel@pengutronix.de
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>

ARM: dts: stm32: stm32mp15x-mecio1-io: Move gpio-line-names to board files

Move the gpio-line-names properties out of the common mecio1-io.dtsi file
and into the specific board dts files.

The pinout originally defined in the common include file belonged to the
mecio1r0 (Revision 0) hardware. This is moved 1:1 into the
stm32mp151c-mecio1r0.dts file without any modifications.

A large number of GPIO pins are swapped on the mecio1r1 (Revision 1)
hardware, so a new, board-specific gpio-line-names mapping is added to
stm32mp153c-mecio1r1.dts to reflect those hardware changes.

Fixes: 8267753c891c ("ARM: dts: stm32: Add MECIO1 and MECT1S board variants")
Co-developed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David Jander <david@protonic.nl>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20260318105123.819807-6-o.rempel@pengutronix.de
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>

ARM: dts: stm32: stm32mp15x-mecio1-io: Fix GPIO names typo

The reset pins for the LPOUT lines were incorrectly prefixed with "GPOUT"
instead of "LPOUT" in the gpio-line-names array. Fix these typos so the
pin names consistently match the LPOUT0-4 signals they belong to.

Fixes: 8267753c891c ("ARM: dts: stm32: Add MECIO1 and MECT1S board variants")
Co-developed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David Jander <david@protonic.nl>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20260318105123.819807-5-o.rempel@pengutronix.de
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>

ARM: dts: stm32: stm32mp15x-mecio1-io: Move divergent mecio1 ADC channels to board files

Move the divergent adc1 channel definitions out of the common
mecio1-io.dtsi file and into the specific Revision 0 and Revision 1
board files.

The original common file contained incorrect schematic labels for the
Revision 0 hardware (e.g., labeling ana0 as p24v_hpdcm instead of
ain_aux0) and failed to account for physical signal routing changes
between the board revisions.

Retain only the strictly shared channels in the common include file. Map
the correct channels and schematic labels directly within
stm32mp151c-mecio1r0.dts and stm32mp153c-mecio1r1.dts.

Crucially, ensure that the required 200us sample time follows the
phint1_ain signal to its new physical location on channel 3 for the
Revision 1 hardware.

Fixes: 8267753c891c ("ARM: dts: stm32: Add MECIO1 and MECT1S board variants")
Co-developed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David Jander <david@protonic.nl>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20260318105123.819807-4-o.rempel@pengutronix.de
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>

ARM: dts: stm32: stm32mp15x-mecio1-io: Fix ADC sampling times

Increase the minimum ADC sample times for all configured channels on
ADC1 and ADC2 to ensure measurement accuracy meets specifications.

The default 5us sample time is insufficient for the internal sampling
capacitor to fully charge. Increase the default time to 20us to relax
the input impedance requirements.

Additionally, the phint0_ain and phint1_ain channels require a much
longer sampling period due to their specific circuit design. Increase
their sample times to 200us. Remove stale comments regarding clock
cycles that no longer match the updated timings.

Fixes: 8267753c891c ("ARM: dts: stm32: Add MECIO1 and MECT1S board variants")
Co-developed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David Jander <david@protonic.nl>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20260318105123.819807-3-o.rempel@pengutronix.de
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>

ARM: dts: stm32: stm32mp15x-mecio1-io: Enable internal ADC reference

Switch the ADC reference supply from the general 3.3V rail to the
internal 2.5V VREFBUF regulator. The ADC circuits on this board are
designed for the internal 2.5V reference. Without this change, all ADC
measurement values are incorrect.

Fixes: 8267753c891c ("ARM: dts: stm32: Add MECIO1 and MECT1S board variants")
Co-developed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David Jander <david@protonic.nl>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20260318105123.819807-2-o.rempel@pengutronix.de
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>

ARM: dts: stm32: add board pin documentation stm32mp135f-dk

Relate the devices defined in the device tree to the SoC ports and pins
and labels available on the board.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Link: https://lore.kernel.org/r/20260420204647.1713944-2-u.kleine-koenig@baylibre.com
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>

ARM: dts: stm32: Enable PHY SSC on DH STM32MP13xx DHCOR DHSBC board

Add realtek,rxc-ssc-enable and realtek,sysclk-ssc-enable to both PHY
DT nodes to enable PHY Spread Spectrum on RXC and SYSCLK. CLKOUT is
disabled and therefore does not need SSC enabled.

Signed-off-by: Marek Vasut <marex@nabladev.com>
Link: https://lore.kernel.org/r/20260411130355.19670-1-marex@nabladev.com
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>

iommu/io-pgtable-arm: Use address conversion consistently

Use consistent address conversions in the driver:
- virt_to_phys(): For all virtual to physical address conversions;
  convert __pa users, as we don’t need to rely on its type casting.
- phys_to_virt(): For all physical to virtual address conversions;
  similarly, convert __va users.

This changes nothing functionally. However, it will be useful when
compiling this file for the KVM hypervisor, as it can cleanly
replace virt_to_phys()/phys_to_virt().

Signed-off-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

iommu/io-pgtable-arm: Rework to use the iommu-pages API

Update the io-pgtable-arm allocator to use the iommu-pages API.

Replace the DMA API usage from __arm_lpae_alloc_pages() with
iommu_pages_start_incoherent() and from __arm_lpae_free_pages() with
iommu_pages_free_incoherent().

Since the iommu-pages API relies on metadata stored in the struct page
during iommu_alloc_pages_node_sz(), it cannot be used safely with memory
allocated via the custom cfg->alloc (which may not be backed by pages).
So, isolate that logic and keep it as is.

Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

iommu/io-pgtable-arm: Use consistent sizes for page allocation and freeing

At the moment we use alloc_size to allocate memory, but there is a
logical error: the error and free paths use just size, which might
be smaller.
We also use size for the DMA-API operations, which is OK but confusing.

Instead of this error-prone handling, just set size to alloc_size
and use it everywhere.

Signed-off-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

drm: of: forbid bridge-only calls to drm_of_find_panel_or_bridge()

Up to now drm_of_find_panel_or_bridge() could be called with a bridge
pointer only, a panel pointer only, or both bridge and panel pointers.
However, the logic to handle all three cases is somewhat complex to read.

Now all bridge-only callers have been converted to
of_drm_get_bridge_by_endpoint(), which is simpler and handles bridge
refcounting. So forbid new bridge-only users by mandating a non-NULL panel
pointer in the docs and in the sanity checks along with a warning.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Link: https://patch.msgid.link/20260511-drm-bridge-alloc-getput-panel_or_bridge-v6-11-f61c9e498b3f@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>

drm: zynqmp_dp: switch to of_drm_get_bridge_by_endpoint()

This driver calls drm_of_find_panel_or_bridge() with a NULL pointer in the
@panel parameter, thus using a reduced feature set of that function.
Replace this call with the simpler of_drm_get_bridge_by_endpoint().

Since of_drm_get_bridge_by_endpoint() increases the refcount of the
returned bridge, ensure it is put on removal. To achieve this, instead of
adding an explicit drm_bridge_put(), migrate to the bridge::next_bridge
pointer which is automatically put when the bridge is eventually freed.

Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Link: https://patch.msgid.link/20260511-drm-bridge-alloc-getput-panel_or_bridge-v6-10-f61c9e498b3f@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>

drm/bridge: lt8713sx: switch to of_drm_get_bridge_by_endpoint()

This driver calls drm_of_find_panel_or_bridge() with a NULL pointer in the
@panel parameter, thus using a reduced feature set of that function.
Replace this call with the simpler of_drm_get_bridge_by_endpoint().

Since of_drm_get_bridge_by_endpoint() increases the refcount of the
returned bridge, ensure it is put on removal. To achieve this, instead of
adding an explicit drm_bridge_put(), migrate to the bridge::next_bridge
pointer which is automatically put when the bridge is eventually freed.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://patch.msgid.link/20260511-drm-bridge-alloc-getput-panel_or_bridge-v6-9-f61c9e498b3f@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>

drm/bridge: adv7511: switch to of_drm_get_bridge_by_endpoint()

This driver calls drm_of_find_panel_or_bridge() with a NULL pointer in the
@panel parameter, thus using a reduced feature set of that function.
Replace this call with the simpler of_drm_get_bridge_by_endpoint().

Since of_drm_get_bridge_by_endpoint() increases the refcount of the
returned bridge, ensure it is put on removal. To achieve this, instead of
adding an explicit drm_bridge_put(), migrate to the bridge::next_bridge
pointer which is automatically put when the bridge is eventually freed.

Tested-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Link: https://patch.msgid.link/20260511-drm-bridge-alloc-getput-panel_or_bridge-v6-8-f61c9e498b3f@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>

drm/bridge: lt9611: switch to of_drm_get_bridge_by_endpoint()

This driver calls drm_of_find_panel_or_bridge() with a NULL pointer in the
@panel parameter, thus using a reduced feature set of that function.
Replace this call with the simpler of_drm_get_bridge_by_endpoint().

Since of_drm_get_bridge_by_endpoint() increases the refcount of the
returned bridge, ensure it is put on removal. To achieve this, instead of
adding an explicit drm_bridge_put(), migrate to the bridge::next_bridge
pointer which is automatically put when the bridge is eventually freed.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://patch.msgid.link/20260511-drm-bridge-alloc-getput-panel_or_bridge-v6-7-f61c9e498b3f@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>

drm/bridge: lontium-lt9611uxc: switch to of_drm_get_bridge_by_endpoint()

This driver calls drm_of_find_panel_or_bridge() with a NULL pointer in the
@panel parameter, thus using a reduced feature set of that function.
Replace this call with the simpler of_drm_get_bridge_by_endpoint().

Since of_drm_get_bridge_by_endpoint() increases the refcount of the
returned bridge, ensure it is put on removal. To achieve this, instead of
adding an explicit drm_bridge_put(), migrate to the bridge::next_bridge
pointer which is automatically put when the bridge is eventually freed.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://patch.msgid.link/20260511-drm-bridge-alloc-getput-panel_or_bridge-v6-6-f61c9e498b3f@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>

drm/bridge: chrontel-ch7033: switch to of_drm_get_bridge_by_endpoint()

This driver calls drm_of_find_panel_or_bridge() with a NULL pointer in the
@panel parameter, thus using a reduced feature set of that function.
Replace this call with the simpler of_drm_get_bridge_by_endpoint().

Since of_drm_get_bridge_by_endpoint() increases the refcount of the
returned bridge, ensure it is put on removal. To achieve this, instead of
adding an explicit drm_bridge_put(), migrate to the bridge::next_bridge
pointer which is automatically put when the bridge is eventually freed.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://patch.msgid.link/20260511-drm-bridge-alloc-getput-panel_or_bridge-v6-5-f61c9e498b3f@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>

drm/hisilicon/kirin: switch to of_drm_get_bridge_by_endpoint()

This driver calls drm_of_find_panel_or_bridge() with a NULL pointer in the
@panel parameter, thus using a reduced feature set of that function.
Replace this call with the simpler of_drm_get_bridge_by_endpoint().

Since of_drm_get_bridge_by_endpoint() increases the refcount of the
returned bridge, ensure it is put on removal. Here the bridge pointer is
only stored in a temporary variable, so a cleanup action is enough.

Reviewed-by: Louis Chauvet <louis.chauvet@bootlin.com>
Link: https://patch.msgid.link/20260511-drm-bridge-alloc-getput-panel_or_bridge-v6-4-f61c9e498b3f@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>

drm/msm/hdmi: switch to of_drm_get_bridge_by_endpoint()

This driver calls drm_of_find_panel_or_bridge() with a NULL pointer in the
@panel parameter, thus using a reduced feature set of that function.
Replace this call with the simpler of_drm_get_bridge_by_endpoint().

Since of_drm_get_bridge_by_endpoint() increases the refcount of the
returned bridge, ensure it is put on removal.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://patch.msgid.link/20260511-drm-bridge-alloc-getput-panel_or_bridge-v6-3-f61c9e498b3f@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>

drm/bridge: add of_drm_get_bridge_by_endpoint()

drm_of_find_panel_or_bridge() is widely used, but many callers pass NULL
into the @panel or the @bridge arguments, thus making a very partial usage
of this rather complex function.

Besides, the bridge returned in @bridge is not refcounted, which will make
this API unsafe once DRM bridge hotplug is introduced.

Solve both issues for the cases of calls to drm_of_find_panel_or_bridge()
with a NULL @panel pointer by adding a new function that only looks for
bridges (and is thus much simpler) and increments the refcount of the
returned bridge.

The new function is identical to drm_of_find_panel_or_bridge() except it:

- handles bridge refcounting: uses of_drm_find_and_get_bridge() instead of
   of_drm_find_bridge() internally to return a refcounted bridge
- is simpler to use: just takes no @panel parameter, returns the pointer
   in the return value instead of a double pointer argument
- has a simpler implementation: it is equal to
   drm_of_find_panel_or_bridge() after removing the code that becomes dead
   when @panel == NULL

Also add this function to drm_bridge.c and not drm_of.c because it returns
bridges only.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Link: https://patch.msgid.link/20260511-drm-bridge-alloc-getput-panel_or_bridge-v6-2-f61c9e498b3f@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>

drm/bridge: drm_bridge_put(): ignore ERR_PTR

Most functions returning a struct drm_bridge pointer currently return a
valid pointer or NULL, but this restricts their ability to return an error
code as an ERR_PTR describing the error kind.

In preparation to have new APIs that can return a struct drm_bridge pointer
holding an ERR_PTR (and for those which already do) make drm_bridge_put()
ignore ERR_PTR values, just like it ignores NULL pointers.

This will avoid annoying error checking in many places and the risk of
missing error checks.
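
A minimal userspace sketch of the idea, assuming the usual kernel convention that the top 4095 addresses encode error values. The names err_ptr(), is_err_or_null(), struct bridge and bridge_put() are hypothetical stand-ins, not the DRM implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace stand-ins for the kernel's ERR_PTR()/IS_ERR_OR_NULL(). */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline int is_err_or_null(const void *p)
{
	return !p || (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical refcounted object standing in for a bridge. */
struct bridge { int refcount; };

/* The put helper silently ignores both NULL and error pointers. */
static void bridge_put(struct bridge *b)
{
	if (is_err_or_null(b))
		return;
	assert(b->refcount > 0);
	b->refcount--;
}
```

With this shape, a caller that stores the result of a getter returning an error pointer can run its cleanup path unconditionally, without first checking whether the stored pointer is valid.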

Suggested-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Link: https://lore.kernel.org/all/20260318152533.GA633439@killaraus.ideasonboard.com/
Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/all/omlnswxukeqgnatzdvooaashgkfcacjevkvbkm6xt33itgua2k@jcmzll2w6kdq/
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Link: https://patch.msgid.link/20260511-drm-bridge-alloc-getput-panel_or_bridge-v6-1-f61c9e498b3f@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>

drm/xe/memirq: Drop cached iosys_map for MEMIRQ status

Since addition of the MSI-X support, we mostly rely on the offset
calculations done by XE_MEMIRQ_STATUS_OFFSET. We don't use this
separate map pointing to the first status page anymore.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Ilia Levi <ilia.levi@intel.com>
Link: https://patch.msgid.link/20260518192547.600-10-michal.wajdeczko@intel.com

drm/xe/memirq: Drop cached iosys_map for MEMIRQ mask

It is used occasionally and iosys_map_wr() helper takes an offset
parameter anyway. There is no extra benefit to keep a separate map.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Ilia Levi <ilia.levi@intel.com>
Link: https://patch.msgid.link/20260518192547.600-9-michal.wajdeczko@intel.com

drm/xe/memirq: Dump all source pages if MSI-X

When using MSI-X, engines report their source/status on separate
MEMIRQ pages, so we need to dump additional source pages, not just
the first one.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Ilia Levi <ilia.levi@intel.com>
Reviewed-by: Ilia Levi <ilia.levi@intel.com>
Link: https://patch.msgid.link/20260518192547.600-8-michal.wajdeczko@intel.com

drm/xe/memirq: Update diagnostic message

Instead of printing static offset values, print the number of allocated
pages, the actual GGTT addresses of the page zero source and status,
and the address of the common mask vector.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Ilia Levi <ilia.levi@intel.com>
Link: https://patch.msgid.link/20260518192547.600-7-michal.wajdeczko@intel.com

drm/xe/memirq: Reduce buffer size

When using MSI-X, we don't have to allocate the largest possible
buffer to accommodate all potential engine instances. Loop through
the available engines, find the highest engine instance and reduce
the buffer size to avoid wasting memory.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Ilia Levi <ilia.levi@intel.com>
Reviewed-by: Ilia Levi <ilia.levi@intel.com>
Link: https://patch.msgid.link/20260518192547.600-6-michal.wajdeczko@intel.com

drm/xe/memirq: Use IRQ page from HW engine definition

We can now drop repeated calculations of the actual IRQ page used
by the engines from our memory based interrupt handler and other
functions.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Ilia Levi <ilia.levi@intel.com>
Link: https://patch.msgid.link/20260518192547.600-5-michal.wajdeczko@intel.com

drm/xe/memirq: Update GuC initialization and IRQ handler

Introduce and use a simple macro to calculate the exact location of
the status vector to avoid inline calculation. Fix the type of the
GuC source and status MEMIRQ addresses.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Ilia Levi <ilia.levi@intel.com>
Link: https://patch.msgid.link/20260518192547.600-4-michal.wajdeczko@intel.com

drm/xe/memirq: Make page layout macros private

There is no need to expose the macros describing memory-based
interrupts page layouts in the .h file as we only use them in
the private code. Move them to the .c file near the kernel-doc.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Ilia Levi <ilia.levi@intel.com>
Link: https://patch.msgid.link/20260518192547.600-3-michal.wajdeczko@intel.com

drm/xe: Add IRQ page to HW engine definition

For each HW engine definition, we already make changes to the IRQ
offset, as required when using MSI-X, but we leave actual MEMIRQ
page selection to the MEMIRQ handler, repeated on every interrupt.

As a preparation step to simplify the MEMIRQ handler, store the
MEMIRQ page number as part of the HW engine definition.

Suggested-by: Ilia Levi <ilia.levi@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Ilia Levi <ilia.levi@intel.com>
Reviewed-by: Ilia Levi <ilia.levi@intel.com>
Link: https://patch.msgid.link/20260518192547.600-2-michal.wajdeczko@intel.com

iommu/amd: Adhere to IVINFO[VASIZE] for address limits

ACPI IVRS IVHD’s IVINFO field reports the maximum virtual address
size (VASIZE) supported by the IOMMU. The AMD IOMMU driver currently
caps this with pagetable level reported by EFR[HATS] when configuring
paging domains (hw_max_vasz_lg2). On systems where firmware or VM
advertises smaller or different limits, the driver may over-advertise
capabilities and create domains outside the hardware’s actual bounds.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Ankit Soni <Ankit.Soni@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

iommu_pt: add kunit config for 32-bit VA (amdv1_cfg_1)

Add test coverage for small VAs (32‑bit) starting at level 2 by enabling
the AMDv1 KUnit configuration. This limits level expansion because the
starting level can accommodate only the maximum virtual address requested.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Ankit Soni <Ankit.Soni@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

iommu_pt: support small VA for AMDv1

When the hardware or a VM requests a small VA limit, the generic
page-table code clears PT_FEAT_DYNAMIC_TOP. This later causes domain
initialization to fail with -EOPNOTSUPP.

Remove the clearing so init succeeds when the VA fits in the starting
level and no top-level growth is needed.

Signed-off-by: Ankit Soni <Ankit.Soni@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

iommu_pt: Fix pgsize_bitmap calculation in get_info for smaller vasz's

To properly enforce the domain VA limit, clamp pgsize_bitmap using the
requested max_vasz_lg2 in get_info().
Apply the same VA limit as get_info() in the kunit possible_sizes test so
assertions stay consistent with the domain bitmap.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Ankit Soni <Ankit.Soni@amd.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

iommu/riscv: Add NAPOT range invalidation support

Use the RISC-V IOMMU Address Range Invalidation extension
(capabilities.S, spec section 9.3) to invalidate an IOVA range with
a single IOTINVAL.VMA command using NAPOT-encoded addressing.

One iommu_iotlb_gather maps to one NAPOT invalidation command. The
smallest power-of-two aligned range covering the gather is used since
over-invalidation is always safe.
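
The "smallest power-of-two aligned range covering the gather" can be computed as in the sketch below. It is an illustration only: struct napot_range and napot_cover() are hypothetical names, the gather is assumed to be an inclusive [start, last] range, and the actual NAPOT encoding of the command's address field is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: an aligned block of 2^size_lg2 bytes at base. */
struct napot_range {
	uint64_t base;
	unsigned int size_lg2;	/* 64 means the whole address space */
};

/*
 * Find the smallest naturally aligned power-of-two block that contains
 * the inclusive range [start, last]. min_lg2 is the smallest block size
 * to consider (12 for a 4k page).
 */
static struct napot_range napot_cover(uint64_t start, uint64_t last,
				      unsigned int min_lg2)
{
	uint64_t diff = start ^ last;
	unsigned int lg2 = min_lg2;
	struct napot_range r;

	assert(start <= last && min_lg2 < 64);

	/* Grow until start and last share all address bits above lg2. */
	while (lg2 < 64 && (diff >> lg2) != 0)
		lg2++;

	r.size_lg2 = lg2;
	r.base = lg2 < 64 ? start & ~((UINT64_C(1) << lg2) - 1) : 0;
	return r;
}
```

For example, an 8k gather at 0x3000..0x4fff straddles a 32k boundary bit, so it is covered by the 32k block at address 0. The block is larger than the gather, which is acceptable because over-invalidation is always safe.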

S and NL seem to be orthogonal in the spec, so if NL is not
supported then global invalidation is probably always going to happen
as wiping a large range without a table change is not common.

Reviewed-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

iommu/riscv: Include the dword number in RISCV_IOMMU_CMD macros

The command queue entry format is 128 bits. Follow the pattern of the
other drivers and encode the 64 bit dword number in the macro
itself. RISC-V further has similarly named macros that are not field
layout macros, but field content macros which won't get a new number.

Overall this makes the code easier to understand and makes it easier to
check for errors such as using the wrong macro in the wrong spot.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Tested-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

iommu/riscv: Add RISCV_IOMMU_CAPABILITIES_NL

Non-leaf invalidation allows the single invalidate command to also
clear the walk cache. If NL is available, set the NL bit if the
gather indicates tables have been changed. The stride is already
calculated properly.

Reviewed-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

iommu/riscv: Compute best stride for single invalidation

Replace the per-page IOTLB invalidation loop with stride-based
invalidation that uses the level bitmaps from iommu_iotlb_gather.

Pre-calculate the invalidation information before running over the
bonds loop as it is the same for every entry.

The lowest set bit in the PT_FEAT_DETAILED_GATHER bitmaps indicates
the stride. This design ignores the SVNAPOT contiguous pages on the
assumption that they still have to be individually invalidated like
ARM requires, though it is not clear from the spec.

Replace the 2M cutoff for global invalidation with a 512 command
limit. This is the same for a 4k stride and now scales with the
stride size.
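
The stride selection and the 512 command limit can be sketched as follows. This is a sketch under assumptions, not the driver code: plan_invalidation() and struct inval_plan are hypothetical, bit N of the bitmap is assumed to mean "leaf entries exist at level N", level 0 is assumed to be a 4k page, and each level is assumed to add 9 address bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SINGLE_CMDS 512u	/* command limit named in the text */
#define BASE_PAGE_LG2 12u	/* assumption: 4k base page */
#define BITS_PER_LEVEL 9u	/* assumption: 9 address bits per level */

struct inval_plan {
	bool global;		/* fall back to a global invalidation */
	unsigned int stride_lg2;
	uint64_t num_cmds;	/* single invalidations otherwise needed */
};

static struct inval_plan plan_invalidation(uint64_t start, uint64_t last,
					   unsigned int leaf_levels_bitmap)
{
	struct inval_plan plan = { .stride_lg2 = BASE_PAGE_LG2 };
	unsigned int level = 0;

	assert(start <= last);

	/* Lowest set bit = smallest leaf size; empty bitmap keeps 4k. */
	if (leaf_levels_bitmap) {
		while (!(leaf_levels_bitmap & (1u << level)))
			level++;
		plan.stride_lg2 = BASE_PAGE_LG2 + BITS_PER_LEVEL * level;
	}
	assert(plan.stride_lg2 < 64);

	/* One command per stride-sized block touched by the range. */
	plan.num_cmds = (last >> plan.stride_lg2) -
			(start >> plan.stride_lg2) + 1;
	plan.global = plan.num_cmds > MAX_SINGLE_CMDS;
	return plan;
}
```

With a 4k stride, 512 commands cover exactly 2M, which matches the old cutoff. With a 2M stride the same 512 commands cover 1G, which is how the limit scales with the stride. Whether a range of exactly 512 commands goes global is an assumption of this sketch.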

Reviewed-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

iommu/riscv: Enable PT_FEAT_DETAILED_GATHER and pass gather to iotlb_inval

RISC-V can use the information from PT_FEAT_DETAILED_GATHER to
compute the best stride to generate the single TLB invalidations.

Pass the gather down to the lower functions and create a full-range
gather for the flush-all callback.

Reviewed-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

drm/bridge: megachips: remove bridge when irq request fails

If devm_request_threaded_irq() fails after drm_bridge_add(), remove the
bridge before returning.

Keep drm_bridge_add() rather than devm_drm_bridge_add(): registration is
tied to the STDP4028 device while ge_b850v3_register() may complete from
either I2C probe; devm would not unwind the bridge if the other client's
probe fails.

Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com>
Fixes: fcfa0ddc18ed ("drm/bridge: Drivers for megachips-stdpxxxx-ge-b850v3-fw (LVDS-DP++)")
Cc: stable@vger.kernel.org
Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Tested-by: Ian Ray <ian.ray@gehealthcare.com>
Link: https://patch.msgid.link/20260430195700.80317-1-osama.abdelkader@gmail.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>

iommupt: Add PT_FEAT_DETAILED_GATHER

Generating the ARM SMMUv3 and RISC-V invalidation commands optimally
requires some additional details from iommupt:

- leaf_levels_bitmap is used to compute the ARM Range Invalidation
  Table Top Level hint

- leaf_levels_bitmap is also used to compute the stride when
  generating single invalidations to invalidate once per leaf

- table_levels_bitmap also computes the ARM TTL for future cases when
  there are no leaves

Put these under a feature since only two drivers need to calculate
them.

This is also useful for the coming kunit iotlb invalidation test to
know more about what invalidation is happening.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Tested-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

iommupt: Add struct iommupt_pending_gather

Add a struct to keep track of all the things that are pending to be
merged into the gather. The way gather merging works, the pending
range is checked against the current gather, and the current gather
can be flushed before the pending things are added.

Thus, if new things have to be recorded in the gather they need to be
kept in the pending struct until after the gather is optionally
flushed.

The next patch adds new items to the gather and the pending struct.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Tested-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

iommu: Split the kdoc comment for struct iommu_iotlb_gather

Use in-line member documentation and add some small clarifications to
the members. This is preparation to add more members.

- Note that pgsize is only used by arm-smmuv3

- Note that freelist is only used by iommupt

- Reword queued to emphasize the flush-all behavior

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Tested-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

spi: Merge up fixes

Help the CI by merging up fixes into the development branch.

batman-adv: bla: avoid NULL-ptr deref for claim via dropped interface

Without rtnl_lock held, a hardif might be retrieved as primary interface of
a meshif and then (while this interface is being operated on) get decoupled
from the mesh interface. In this case, the meshif still exists but the
pointer from the primary hardif to the meshif is set to NULL.

The mesh_iface must therefore be checked to be non-NULL before continuing
to send an ARP request using the meshif.

Cc: stable@kernel.org
Fixes: 23721387c409 ("batman-adv: add basic bridge loop avoidance code")
Reported-by: Ido Schimmel <idosch@nvidia.com>
Reported-by: syzbot+9fdcc9f05a98a540b816@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=9fdcc9f05a98a540b816
Signed-off-by: Sven Eckelmann <sven@narfation.org>

iommu/vt-d: Simplify calculate_psi_aligned_address()

This is doing far too much math for the simple task of finding a
power of 2 that fully spans the given range. Use fls directly on
the xor which computes the common binary prefix.
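
As an illustration only, the idea can be sketched in user-space C. This
is not the driver's code: fls64_sketch(), struct aligned_span and
span_for_range() are hypothetical names, and the range is assumed to be
given as inclusive start/last values. The xor of the two ends has its
highest set bit at the first position where they differ, so everything
above that bit is the common prefix.

```c
#include <assert.h>
#include <stdint.h>

/*
 * User-space stand-in for the kernel's fls64(): 1-based index of the
 * highest set bit, or 0 when x is 0.
 */
static unsigned int fls64_sketch(uint64_t x)
{
	unsigned int bits = 0;

	while (x) {
		bits++;
		x >>= 1;
	}
	return bits;
}

/* Hypothetical result type: a 2^order sized, 2^order aligned block. */
struct aligned_span {
	uint64_t base;
	unsigned int order;
};

/* Smallest naturally aligned power-of-2 block covering [start, last]. */
static struct aligned_span span_for_range(uint64_t start, uint64_t last)
{
	struct aligned_span span;
	uint64_t mask;

	assert(start <= last);

	/* Bits at or below the highest differing bit vary inside the block. */
	span.order = fls64_sketch(start ^ last);
	/* Avoid an undefined 64-bit shift when the whole space is covered. */
	mask = span.order >= 64 ? ~0ULL : (1ULL << span.order) - 1;
	span.base = start & ~mask;
	return span;
}
```

With these definitions, span_for_range(5, 6) gives base 4 and order 2,
and span_for_range(3, 4) gives base 0 and order 3.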

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/0-v2-895748900b39+5303-iommupt_inv_vtd_jgg@nvidia.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

drm/bridge: chipone-icn6211: use devm_drm_bridge_add in dsi probe

Use devm_drm_bridge_add() so the bridge is released if probe fails after
registration, and drop drm_bridge_remove() in chipone_dsi_probe.

Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com>
Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Link: https://patch.msgid.link/20260430194944.78119-2-osama.abdelkader@gmail.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>

selftests: net: add tests for PPPoL2TP

Add ping, iperf3, and recursion tests for PPPoL2TP.

Assisted-by: Gemini:gemini-3-flash
Signed-off-by: Qingfang Deng <qingfang.deng@linux.dev>
Link: https://patch.msgid.link/20260514015743.37869-1-qingfang.deng@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

drm/bridge: chipone-icn6211: use devm_drm_bridge_add in i2c probe

Use devm_drm_bridge_add() so the bridge is released if probe
fails after registration, and drop drm_bridge_remove() in chipone_i2c_probe.

Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com>
Fixes: 8dde6f7452a1 ("drm: bridge: icn6211: Add I2C configuration support")
Cc: stable@vger.kernel.org
Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Link: https://patch.msgid.link/20260430194944.78119-1-osama.abdelkader@gmail.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>

selftests: net: test PPPoE packets in gro.sh

Add PPPoE test-cases to the GRO selftest. Only run a subset of
common_tests to avoid changing the hardcoded L3 offsets everywhere.
Add a new "pppoe_sid" test case to verify that packets with different
PPPoE session IDs are correctly identified as separate flows and not
coalesced.

Signed-off-by: Qingfang Deng <qingfang.deng@linux.dev>
Link: https://patch.msgid.link/20260513013400.7467-2-qingfang.deng@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: pppoe: implement GRO/GSO support

Only handles packets where the pppoe header length field matches the exact
packet length. Significantly improves rx throughput.
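
A rough user-space sketch of such a length check, not taken from the
patch: pppoe_len_matches() and PPPOE_HDR_LEN are hypothetical names, and
pkt is assumed to point at the PPPoE header (just after the Ethernet
header) with pkt_len covering everything from there to the end of the
packet. The 16-bit length field is in network byte order and counts the
bytes following the 6-byte PPPoE header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ver/type (1), code (1), session id (2), length (2) */
#define PPPOE_HDR_LEN 6

/*
 * Return 1 when the PPPoE length field equals the number of bytes that
 * follow the PPPoE header, 0 otherwise (too short, padded or truncated).
 */
static int pppoe_len_matches(const uint8_t *pkt, size_t pkt_len)
{
	uint16_t payload_len;

	assert(pkt != NULL);

	if (pkt_len < PPPOE_HDR_LEN)
		return 0;

	/* Length field is big-endian at byte offsets 4 and 5. */
	payload_len = (uint16_t)((pkt[4] << 8) | pkt[5]);
	return (size_t)payload_len == pkt_len - PPPOE_HDR_LEN;
}
```

A frame with trailing padding fails this check, so in this sketch it
would simply not be considered for aggregation.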

When running NAT traffic through a MediaTek MT7621 device from a host
behind PPPoE to a host directly connected via ethernet, the TCP throughput
that the device is able to handle improves from ~130 Mbit/s to ~630 Mbit/s,
using fraglist GRO.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Qingfang Deng <qingfang.deng@linux.dev>
Tested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Link: https://patch.msgid.link/20260513013400.7467-1-qingfang.deng@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

batman-adv: bla: avoid double decrement of bla.num_requests

The bla.num_requests is increased when no request_sent was in progress. And
it is decremented in various places (announcement was received, backbone is
purged, periodic work). But the check whether request_sent is actually set
to a specific state and the following atomic_dec/_inc are not safe: taken
together they are not atomic (TOCTOU), and multiple such code portions can
run concurrently.

At the same time, it is necessary to modify request_sent (state) and
bla.num_requests atomically. Otherwise batadv_bla_send_request() might set
request_sent to 1 and be interrupted. batadv_handle_announce() can then
set request_sent back to 0 and decrement num_requests before
batadv_bla_send_request() has incremented it.

The two operations must therefore be locked. And since state (request_sent)
and wait_periods are only accessed inside this lock, they can be converted
to simpler datatypes. And to avoid that the bla.num_requests is touched by
a parallel running context with a valid backbone_gw reference after
batadv_bla_purge_backbone_gw() ran, a third state "stopped" is required to
correctly signal that a backbone_gw is in the state of being cleaned up.

Cc: stable@kernel.org
Fixes: 23721387c409 ("batman-adv: add basic bridge loop avoidance code")
Signed-off-by: Sven Eckelmann <sven@narfation.org>

batman-adv: bla: fix report_work leak on backbone_gw purge

batadv_bla_purge_backbone_gw() removes stale backbone gateway entries,
but fails to properly handle their associated report_work:

- If report_work is running, the purge must wait for it to finish before
  freeing the backbone_gw, otherwise the worker may access freed memory
  (e.g. bat_priv).
- If report_work is pending, the purge must cancel it and release the
  reference held for that pending work item.

The previous implementation called hlist_for_each_entry_safe() inside a
spin_lock_bh() section, but cancel_work_sync() may sleep and therefore
cannot be called from within a spinlock-protected region.

Restructure the loop to handle one entry per spinlock critical section:
acquire the lock, find the next entry to purge, remove it from the hash
list, then release the lock before calling cancel_work_sync() and
dropping the hash_entry reference. Repeat until no more entries require
purging.
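
The loop shape described above could look roughly like the following
user-space sketch. It is only a model under stated assumptions: struct
entry, struct table, the "locked" flag standing in for spin_lock_bh(),
and cancel_work_may_sleep() standing in for cancel_work_sync() are all
hypothetical, and reference counting is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct entry {
	struct entry *next;
	bool stale;
	bool work_cancelled;
};

struct table {
	struct entry *head;
	bool locked;	/* models whether the spinlock is currently held */
};

static void table_lock(struct table *t)
{
	assert(!t->locked);
	t->locked = true;
}

static void table_unlock(struct table *t)
{
	assert(t->locked);
	t->locked = false;
}

/* Models a call that may sleep: it must never run with the lock held. */
static void cancel_work_may_sleep(struct table *t, struct entry *e)
{
	assert(!t->locked);
	e->work_cancelled = true;
}

/* Unlink and return the first stale entry, or NULL. Caller holds the lock. */
static struct entry *unlink_next_stale(struct table *t)
{
	struct entry **pp;

	assert(t->locked);
	for (pp = &t->head; *pp; pp = &(*pp)->next) {
		if ((*pp)->stale) {
			struct entry *e = *pp;

			*pp = e->next;
			e->next = NULL;
			return e;
		}
	}
	return NULL;
}

/* One entry per critical section; the sleeping call runs unlocked. */
static int purge_stale(struct table *t)
{
	int purged = 0;

	for (;;) {
		struct entry *e;

		table_lock(t);
		e = unlink_next_stale(t);
		table_unlock(t);

		if (!e)
			break;

		cancel_work_may_sleep(t, e);
		purged++;
	}
	return purged;
}
```

Because each entry is already unlinked when the lock is dropped, a
concurrent walker in this model can no longer find it while the
potentially sleeping cancel runs.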

Cc: stable@kernel.org
Fixes: 23721387c409 ("batman-adv: add basic bridge loop avoidance code")
Reviewed-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Sven Eckelmann <sven@narfation.org>

batman-adv: iv: recover OGM scheduling after forward packet error

When batadv_iv_ogm_schedule_buff() fails to allocate and queue a forward
packet for OGM transmission, the work item that drives periodic OGM
scheduling is never re-armed. This silently halts transmission of the
node's own OGMs on the affected interface — only OGMs from other peers
continue to be aggregated and forwarded.

Fix this by tracking whether batadv_iv_ogm_queue_add() (and transitively
batadv_iv_ogm_aggregate_new()) successfully scheduled a forward packet.
When scheduling fails, batadv_iv_ogm_schedule_buff() falls back to queuing
a dedicated recovery work item (reschedule_work) that fires after one
originator interval and calls batadv_iv_ogm_schedule() again.

Cc: stable@kernel.org
Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
Signed-off-by: Sven Eckelmann <sven@narfation.org>

Merge branch 'for-linus' into for-next

Signed-off-by: Takashi Iwai <tiwai@suse.de>

drm/i915/psr: Apply Intel DPCD workaround when SDP on prior line used

There is an Intel specific DPCD address containing a workaround for the
case where the SDP is on the prior line. Apply this workaround according
to the values at that offset.

Fixes: 61e887329e33 ("drm/i915/xelpd: Handle PSR2 SDP indication in the prior scanline")
Cc: <stable@vger.kernel.org> # v5.15+
Signed-off-by: Jouni Högander <jouni.hogander@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260515095756.2799483-4-jouni.hogander@intel.com
(cherry picked from commit c3fe899fbeac86ea4a5ca9dd845b2cbc0da46249)
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>

drm/i915/psr: Read Intel DPCD workaround register

Read Intel DPCD workaround register and store it into
intel_connector->dp.psr_caps. psr_caps was chosen because it currently
contains only the PSR workaround for the PSR2 SDP on prior scanline
implementation.

Signed-off-by: Jouni Högander <jouni.hogander@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260515095756.2799483-3-jouni.hogander@intel.com
(cherry picked from commit c48ff24d0f4ab7ad696b2d35ad64ce7e049c668c)
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>

drm/i915/psr: Add definitions for INTEL_WA_REGISTER_CAPS DPCD register

The eDP specification says:

"If either VSC SDP is unable to be transmitted 100 ns before the SU region,
the Source device may optionally transmit the VSC SDP during the prior
video scan line’s HBlank period. There is an Intel specific drm dp register
currently containing bits related to how TCON can support PSR2 with SDP on
prior line."

Unfortunately many panels have problems implementing this. So there is a
custom Intel specific DPCD register (INTEL_WA_REGISTER_CAPS) to figure out
if this is properly implemented on a panel or if the panel doesn't require
that 100 ns delay before the SU region. Here are the definitions in this
custom DPCD address:

0 = Panel doesn't support SDP on prior line
1 = Panel supports SDP on prior line
2 = Panel doesn't have 100ns requirement
3 = Reserved

Add definitions for this new register and its values to the new header
intel_dpcd.h.
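
For illustration, the four values listed above could be modelled as
below. The enum and function names are hypothetical and are not the
identifiers added by this patch; the value is assumed to be already
extracted from the register, since the bit position is not stated here.

```c
#include <assert.h>

/* Hypothetical names; only the numeric values come from the list above. */
enum wa_sdp_prior_line {
	WA_SDP_PRIOR_LINE_NOT_SUPPORTED = 0,
	WA_SDP_PRIOR_LINE_SUPPORTED = 1,
	WA_SDP_NO_100NS_REQUIREMENT = 2,
	WA_SDP_RESERVED = 3,
};

/* 1 for the three defined meanings, 0 for reserved or out of range. */
static int wa_sdp_is_defined(unsigned int value)
{
	return value <= WA_SDP_NO_100NS_REQUIREMENT;
}

/* Human-readable meaning of an already extracted 2-bit value. */
static const char *wa_sdp_describe(unsigned int value)
{
	assert(value <= WA_SDP_RESERVED);

	switch (value) {
	case WA_SDP_PRIOR_LINE_NOT_SUPPORTED:
		return "Panel doesn't support SDP on prior line";
	case WA_SDP_PRIOR_LINE_SUPPORTED:
		return "Panel supports SDP on prior line";
	case WA_SDP_NO_100NS_REQUIREMENT:
		return "Panel doesn't have 100ns requirement";
	default:
		return "Reserved";
	}
}
```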

v2: add INTEL_DPCD_ prefix to definitions

Bspec: 74741
Signed-off-by: Jouni Högander <jouni.hogander@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260515095756.2799483-2-jouni.hogander@intel.com
(cherry picked from commit 1da1c9294825f08f622c473480d185680c2a3b75)
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>

media: mali-c55: Disable pm_runtime on probe error

When mali_c55_media_frameworks_init() fails, the goto target jumps to
err_free_context_registers, skipping pm_runtime_disable() despite
pm_runtime having already been enabled earlier in the function.

Fix this by adding an err_pm_runtime_disable label and redirecting the
frameworks init failure to it, so pm_runtime is properly unwound on
that error path. The runtime PM status is also set back to suspended
before disabling, to undo the pm_runtime_set_active() from probe.
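
The label ordering can be pictured with a small user-space model. This
is a sketch, not the driver's probe function: struct fake_dev, its
flags and fake_probe() are hypothetical, and boolean fields stand in
for the real allocation and runtime PM calls. Only the two label names
are taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_dev {
	bool regs_allocated;
	bool pm_enabled;
	bool frameworks_up;
	bool fail_before_pm;	/* hypothetical early failure */
	bool fail_frameworks;	/* models the frameworks init failing */
};

static int fake_probe(struct fake_dev *d)
{
	int ret;

	assert(d != NULL);

	d->regs_allocated = true;

	ret = d->fail_before_pm ? -1 : 0;
	if (ret)
		goto err_free_context_registers;

	d->pm_enabled = true;	/* stands in for enabling runtime PM */

	ret = d->fail_frameworks ? -1 : 0;
	if (ret)
		goto err_pm_runtime_disable;	/* not the label below it */

	d->frameworks_up = true;
	return 0;

err_pm_runtime_disable:
	d->pm_enabled = false;	/* undo runtime PM, then fall through */
err_free_context_registers:
	d->regs_allocated = false;
	return ret;
}
```

Each label undoes one step and falls through to the labels for earlier
steps, so a failure jumps to the label matching the last step that
succeeded.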

Cc: stable@vger.kernel.org
Fixes: d5f281f3dd29 ("media: mali-c55: Add Mali-C55 ISP driver")
Signed-off-by: David Carlier <devnexen@gmail.com>
Reviewed-by: Jacopo Mondi <jacopo.mondi+renesas@ideasonboard.com>
Signed-off-by: Jacopo Mondi <jacopo.mondi+renesas@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>

media: mali-c55: Power-off the peripheral in remove()

The Mali C55 driver doesn't depend on PM. For this reason, if pm_runtime
is not compiled in, the peripheral must be powered off manually in the
driver's remove() handler.

Also pm_runtime_enable() is called during probe but mali_c55_remove()
never calls pm_runtime_disable(), leaving the device's runtime PM state
enabled after the driver is unbound.

Manually power off the peripheral in remove() if it has not been
suspended by runtime PM, and disable runtime PM.

Cc: stable@vger.kernel.org
Fixes: d5f281f3dd29 ("media: mali-c55: Add Mali-C55 ISP driver")
Signed-off-by: David Carlier <devnexen@gmail.com>
Reviewed-by: Jacopo Mondi <jacopo.mondi+renesas@ideasonboard.com>
Signed-off-by: Jacopo Mondi <jacopo.mondi+renesas@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>

media: mali-c55: Add missing of_reserved_mem_device_release()

mali_c55_probe() calls of_reserved_mem_device_init() to associate
reserved memory regions with the device. This function allocates a
struct rmem_assigned_device and adds it to a global linked list, which
must be explicitly released via of_reserved_mem_device_release() — there
is no devm variant of this API.

However, neither the probe error paths nor mali_c55_remove() called
of_reserved_mem_device_release(). Any probe failure after the
of_reserved_mem_device_init() call, as well as every normal device
removal, leaked the reserved memory association on the global list.

Fix this by adding an err_release_mem label at the end of the probe
error chain and calling of_reserved_mem_device_release() in
mali_c55_remove(). The remove teardown order is also corrected to call
mali_c55_media_frameworks_deinit() before kfree(), mirroring the probe
init order in reverse.

Cc: stable@vger.kernel.org
Fixes: d5f281f3dd29 ("media: mali-c55: Add Mali-C55 ISP driver")
Signed-off-by: David Carlier <devnexen@gmail.com>
Reviewed-by: Jacopo Mondi <jacopo.mondi+renesas@ideasonboard.com>
Signed-off-by: Jacopo Mondi <jacopo.mondi+renesas@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>

media: mali-c55: Fix possible ERR_PTR in enable_streams

The media_pad_remote_pad_unique() function returns either a valid
pointer or an ERR_PTR() on failure (-ENOTUNIQ if multiple links are
enabled, -ENOLINK if no connected pad is found). The return value
was assigned directly to isp->remote_src and dereferenced in the
next line without checking for errors, which could lead to an
ERR_PTR dereference.

Add proper error checking with IS_ERR() before dereferencing the
pointer. Also set isp->remote_src to NULL on error to maintain
consistency with other error paths in the function.
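
The check-before-dereference pattern can be shown with user-space
stand-ins for the kernel helpers. Everything here is a sketch:
err_ptr(), ptr_err() and is_err() only mimic ERR_PTR(), PTR_ERR() and
IS_ERR(), and struct pad, struct isp, find_remote_pad() and
enable_streams_sketch() are hypothetical simplifications of the real
types and functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Minimal user-space imitations of the kernel's error-pointer helpers. */
static inline void *err_ptr(intptr_t err)
{
	return (void *)err;
}

static inline intptr_t ptr_err(const void *p)
{
	return (intptr_t)p;
}

static inline int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct pad {
	int index;
};

struct isp {
	struct pad *remote_src;
};

/* Returns a valid pad or an encoded error, never NULL. */
static struct pad *find_remote_pad(struct pad *candidate)
{
	return candidate ? candidate : err_ptr(-ENOLINK);
}

static int enable_streams_sketch(struct isp *isp, struct pad *candidate,
				 int *index_out)
{
	struct pad *remote;

	assert(isp != NULL && index_out != NULL);

	remote = find_remote_pad(candidate);
	if (is_err(remote)) {
		/* Never store or dereference the encoded error. */
		isp->remote_src = NULL;
		return (int)ptr_err(remote);
	}

	isp->remote_src = remote;
	*index_out = remote->index;
	return 0;
}
```

On the failure path the function returns the negative errno and leaves
*index_out untouched.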

Cc: stable@vger.kernel.org
Fixes: d5f281f3dd29 ("media: mali-c55: Add Mali-C55 ISP driver")
Signed-off-by: Alper Ak <alperyasinak1@gmail.com>
Reviewed-by: Jacopo Mondi <jacopo.mondi+renesas@ideasonboard.com>
Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com>
Signed-off-by: Jacopo Mondi <jacopo.mondi+renesas@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>

media: mali-c55: core: Remove redundant dev_err()

The platform_get_irq_byname() function already prints an error message
internally upon failure using dev_err_probe(). Therefore, the explicit
dev_err() is redundant and results in duplicate error logs.

Remove the redundant dev_err() call to clean up the error path.

Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com>
Reviewed-by: Jacopo Mondi <jacopo.mondi+renesas@ideasonboard.com>
Signed-off-by: Jacopo Mondi <jacopo.mondi+renesas@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>

media: mali-c55: Initialise dev for tpg/rsz/isp subdevs

The subdevices registered by the Mali-C55 driver do not have their
'struct device *dev' member initialized. This is visible in debug
messages, for example:

"(NULL device *): collect_streams: sub-device 'mali-c55 tpg' does not
support streams"

Fix this by initializing the *dev field for each subdevice registered
by the Mali-C55 driver.

Signed-off-by: jempty.liang <imntjempty@163.com>
Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com>
Reviewed-by: Jacopo Mondi <jacopo.mondi+renesas@ideasonboard.com>
Signed-off-by: Jacopo Mondi <jacopo.mondi+renesas@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>