net: mdio: realtek-rtl9300: Add pages to info structure
The Realtek ethernet MDIO controller has a proprietary paging feature
that is closely aligned with Realtek based PHYs. These PHYs know "pages"
for C22 access. Those can be switched via reads/writes to register 31.
Usually the paged access must be programmed in four steps:
1. read/save page register
2. change "page" register 31
3. read/write data register (on the given page)
4. restore page register
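As a rough illustration of the four software steps, here is a minimal
userspace sketch on a simulated PHY. Everything in it (struct sim_phy,
sim_read(), sim_write(), paged_write(), the tiny page count) is
hypothetical and only models the sequence; it is not the driver code.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_REG   31	/* page select register named in the text */
#define NUM_PAGES  4	/* tiny simulated PHY, not a real page count */
#define NUM_REGS   32	/* C22 register space */

/* Simulated PHY: registers are banked per page, register 31 selects the page. */
struct sim_phy {
	uint16_t page;
	uint16_t regs[NUM_PAGES][NUM_REGS];
};

static uint16_t sim_read(struct sim_phy *phy, int reg)
{
	assert(reg >= 0 && reg < NUM_REGS);
	return reg == PAGE_REG ? phy->page : phy->regs[phy->page][reg];
}

static void sim_write(struct sim_phy *phy, int reg, uint16_t val)
{
	assert(reg >= 0 && reg < NUM_REGS);
	if (reg == PAGE_REG)
		phy->page = val % NUM_PAGES;
	else
		phy->regs[phy->page][reg] = val;
}

/* The four software steps: save page, switch page, access data, restore page. */
static void paged_write(struct sim_phy *phy, uint16_t page, int reg, uint16_t val)
{
	uint16_t saved = sim_read(phy, PAGE_REG);	/* 1. read/save page register */

	sim_write(phy, PAGE_REG, page);			/* 2. change page register 31 */
	sim_write(phy, reg, val);			/* 3. write data register */
	sim_write(phy, PAGE_REG, saved);		/* 4. restore page register */
}
```

Each call to paged_write() costs four bus transactions in this model,
which is the overhead a single hardware request avoids.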
The controller can run all this in hardware with one single request
from the driver. It is given the page, the register and the data
and takes care of all the rest. This reduces CPU load. The number
of supported pages depends on the model. This is either 4096 for low
port count SOCs (up to 28 ports) or 8192 for high port count SOCs
(up to 56 ports).
There is however one special page that passes all C22 commands
directly through to the PHY - without any caching. This so-called raw
page depends on the hardware. It is the number of supported pages
minus 1, i.e. the highest supported page number.
Provide the number of supported pages as a device specific property.
This new "num_pages" aligns with the existing properties and gives
a better insight into the hardware layout than just defining the
number of the raw page. The latter directly derives from that and
can be accessed with the new RAW_PAGE() macro. Make use of it where
needed.
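A minimal sketch of how the raw page can be derived from the page
count. Only "num_pages" and RAW_PAGE() are named above; the struct
name, the macro body and the two example instances are assumptions,
with the raw page taken to be num_pages - 1.

```c
#include <assert.h>

/* Hypothetical layout: only the num_pages property is named in the text. */
struct emdio_info {
	unsigned int num_pages;
};

/* Assumed macro body: the raw page is the last page, num_pages - 1. */
#define RAW_PAGE(info) ((info)->num_pages - 1)

/* Example instances using the two page counts quoted in the text. */
static const struct emdio_info low_port_count = { .num_pages = 4096 };
static const struct emdio_info high_port_count = { .num_pages = 8192 };

/* Helper that refuses an unset page count before deriving the raw page. */
static unsigned int raw_page(const struct emdio_info *info)
{
	assert(info->num_pages > 0);
	return RAW_PAGE(info);
}
```

Under that assumption the raw page is 4095 for the low port count
layout and 8191 for the high port count layout.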
net: mdio: realtek-rtl9300: Add ports to info structure
The ethernet MDIO controller in the Realtek Otto series has a very special
command register style. Instead of working with bus/address it works on
ethernet port numbers. For this the controller is initialized via mapping
registers that tell which port is mapped to which bus/address. Every
request to the driver is then converted as follows:
1. Kernel calls driver with bus/address
2. Driver converts bus/address to port and issues command
3. Hardware maps port back to bus/address
The number of ports is different for each device. Make this configurable
by adding a property to the info structure. Switch the existing usage of
MAX_PORTS to this new property where needed.
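The bus/address to port conversion in step 2 can be pictured with a
small lookup table. This is only a sketch: struct port_map,
bus_addr_to_port() and SKETCH_MAX_PORTS are hypothetical names, and the
arrays stand in for the contents of the mapping registers.

```c
#include <assert.h>

#define SKETCH_MAX_PORTS 56	/* upper bound, hypothetical for this sketch */

struct port_map {
	int num_ports;			/* device specific port count */
	int bus[SKETCH_MAX_PORTS];	/* bus each port is mapped to */
	int addr[SKETCH_MAX_PORTS];	/* PHY address each port is mapped to */
};

/* Convert the kernel's bus/address pair to a switch port, or -1 if unmapped. */
static int bus_addr_to_port(const struct port_map *map, int bus, int addr)
{
	assert(map->num_ports >= 0 && map->num_ports <= SKETCH_MAX_PORTS);

	for (int port = 0; port < map->num_ports; port++)
		if (map->bus[port] == bus && map->addr[port] == addr)
			return port;
	return -1;
}
```

Bounding the loop by num_ports instead of a global maximum is the part
that corresponds to the new per-device property.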
net: mdio: realtek-rtl9300: Add device specific info structure
Device properties of the RTL930x SOCs are hardcoded into the MDIO driver.
This must be relaxed to support additional devices like the RTL838x or
RTL839x. These do not have 4 SMI buses but 1 or 2 instead.
To support multiple devices establish an info structure that contains
individual variations of each series. As a first use case add the number
of buses into this structure and use it where needed.
The Realtek ethernet MDIO driver currently only serves SOCs from the
Realtek RTL930x series. This is only one lineup of the Realtek Otto
switch series that also includes RTL838x, RTL839x and RTL931x devices.
All of these share similar hardware with comparable MMIO access logic
but have individual variations. Important to note:
- Controller works on switch ports instead of buses and addresses.
- Devices incorporate additional MDIO hardware. E.g.
  - an auxiliary MDIO controller for GPIO expanders [1]
  - an MDIO style SerDes controller [2]
To avoid future confusion enhance the driver documentation and
function naming. Make clear what this driver is about and what
parts are generic and what parts are device specific. For this
rename the function and structure prefix as follows:
- for generic functions use otto_emdio_
- for device specific helpers use e.g. otto_emdio_9300_
This prefix naming tries to align with the watchdog timer [3].
It paves the way so that drivers for the other Realtek Otto MDIO
controllers can be added in future commits using the same naming
convention.
Remark 1: The read/write functions are kept device specific for now
because they will only fit the RTL930x SOCs. Renaming will take place
as soon as the I/O handling is generalized.
Remark 2: The driver name "mdio-rtl9300" is kept for now.
When the system transitions from bootloader to kernel, the GPIO is
expected to keep driving high.
However, the Linux kernel first configures the pin direction and then
sets the output value. This may cause a brief low-level glitch on the
GPIO line, which can be problematic for regulator control.
By configuring the output value before switching the pin direction to
output, the glitch can be avoided.
This commit fixes the issue by swapping the configuration order.
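The effect of the ordering can be shown with a small simulation. It is
a sketch under stated assumptions: the pin starts as an input whose
line is held high, and the output latch starts at 0. struct sim_pin and
all helpers are hypothetical and do not model the real registers.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated pin: while it is an input, the line is assumed to be held high. */
struct sim_pin {
	bool output_enabled;
	bool latch;	/* level driven once the pin is an output */
	bool saw_low;	/* set if the line was ever observed low */
};

/* Sample the line after every register write and record any low level. */
static void observe(struct sim_pin *pin)
{
	bool level = pin->output_enabled ? pin->latch : true;

	if (!level)
		pin->saw_low = true;
}

static void set_direction_output(struct sim_pin *pin)
{
	pin->output_enabled = true;
	observe(pin);
}

static void set_value(struct sim_pin *pin, bool value)
{
	pin->latch = value;
	observe(pin);
}

/* Old order: direction first, then value. Returns true if a glitch occurred. */
static bool glitches_direction_first(void)
{
	struct sim_pin pin = { .output_enabled = false, .latch = false };

	set_direction_output(&pin);
	set_value(&pin, true);
	assert(pin.output_enabled && pin.latch);
	return pin.saw_low;
}

/* New order: value first, then direction. Returns true if a glitch occurred. */
static bool glitches_value_first(void)
{
	struct sim_pin pin = { .output_enabled = false, .latch = false };

	set_value(&pin, true);
	set_direction_output(&pin);
	assert(pin.output_enabled && pin.latch);
	return pin.saw_low;
}
```

Both orders end in the same state; only the direction-first order
passes through a moment where the stale latch value is driven.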
Fixes: 6e9be3abb78c ("pinctrl: Add driver support for Amlogic SoCs")
Signed-off-by: Xianwei Zhao <xianwei.zhao@amlogic.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Linus Walleij <linusw@kernel.org>
Add a module containing kunit test cases for GPIO core. The idea is to
use it to test functionalities that can't easily be tested from
user-space with kernel selftests or GPIO character device test suites
provided by the libgpiod package.
For now add test cases that verify software node based lookup and ensure
that a GPIO provider unbinding with active consumers does not cause a
crash.
Tests may want to unregister a platform device as part of the test case
logic. Using the regular platform_device_register() with kunit
assertions may result in a platform device leak or otherwise requires
cumbersome error handling. Provide a function that unregisters a
kunit-managed platform device and drops the release action from the
test's list.
====================
ip6_vti: vti6_changelink and vti6_siocdevprivate netns fixes
1/2 carries forward Eric Dumazet's Reviewed-by. Only the Fixes
tag changes there. 2/2 changes the Fixes tag and adds the
ns_capable hunk.
====================
Maoyi Xie [Thu, 21 May 2026 13:05:55 +0000 (21:05 +0800)]
ip6: vti: Use ip6_tnl.net in vti6_siocdevprivate().
After patch 1/2 in this series, vti6_update() unlinks and relinks
the tunnel through t->net. vti6_siocdevprivate() still uses
dev_net(dev) for the collision lookup. For a tunnel moved through
IFLA_NET_NS_FD, dev_net(dev) is the new netns, not t->net.
SIOCCHGTUNNEL on a migrated tunnel then runs:
net = dev_net(dev) /* migrated netns */
t = vti6_locate(net, &p1, false) /* misses target in t->net */
...
t = netdev_priv(dev)
vti6_update(t, &p1, false) /* mutates t->net's hash */
A caller in the migrated netns picks params that match a tunnel
in the creation netns. The lookup in dev_net(dev) finds nothing.
vti6_update() prepends the migrated tunnel at the head of the
creation netns hash bucket for those params. Later lookups in
the creation netns resolve to the migrated device. xfrm receive
delivers the matched packets through a device the caller controls.
Reachable from an unprivileged user namespace (unshare --user
--map-root-user --net). Cross tenant scope on container hosts.
Switch the SIOCCHGTUNNEL path on a non fallback device to use
t->net for the lookup. The lookup now matches the netns
vti6_update() operates on.
Also add ns_capable(self->net->user_ns, CAP_NET_ADMIN) before
the lookup. The check at the top of the case is against
dev_net(dev)->user_ns, which after migration is the attacker's
netns. A caller there can pick params absent from self->net,
the lookup returns NULL, t becomes self, and vti6_update()
inserts the device into the creation netns hash. The new check
requires CAP_NET_ADMIN in the creation netns user_ns too.
SIOCADDTUNNEL and SIOCCHGTUNNEL on the fallback device keep
dev_net(dev), which equals init_net there.
ip netns add ns1
ip netns add ns2
ip -n ns1 link add vti6_test type vti6 remote ::1 local ::2 key 7
ip -n ns1 link set vti6_test netns ns2
ip -n ns2 link set vti6_test type vti6 remote ::3 local ::4 key 9
ip netns del ns2
ip netns del ns1
[ 132.495484] ------------[ cut here ]------------
[ 132.497609] kernel BUG at net/core/dev.c:12376!
Commit 61220ab34948 ("vti6: Enable namespace changing") dropped
NETIF_F_NETNS_LOCAL from vti6 devices. A vti6 tunnel can then
move through IFLA_NET_NS_FD. After the move dev_net(dev) points
at the new netns while t->net stays at the creation netns.
vti6_changelink() and vti6_update() still use dev_net(dev) and
dev_net(t->dev). They unlink from one per netns hash and relink
into another. The creation netns is left with a stale entry.
cleanup_net() of that netns later walks freed memory.
Reachable from an unprivileged user namespace (unshare --user
--map-root-user --net). Cross tenant scope on container hosts.
hash_pointers= accepts a small set of mode strings, but the parser uses
strncmp() with the length of each valid mode. That accepts values with
trailing garbage, such as hash_pointers=autobots or
hash_pointers=nevermind, as valid aliases for auto and never.
Use strcmp() so that only the documented mode strings are accepted.
Invalid values will continue to fall back to auto through the existing
unknown-mode path.
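The difference between the two comparisons can be reproduced in
userspace with the standard C string functions. The helper names below
are hypothetical; this is a sketch of the matching behavior, not the
parser itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Prefix match: compares only strlen(mode) bytes, so trailing text is ignored. */
static bool mode_matches_prefix(const char *arg, const char *mode)
{
	assert(arg && mode);
	return strncmp(arg, mode, strlen(mode)) == 0;
}

/* Exact match: the whole argument must equal the mode string. */
static bool mode_matches_exact(const char *arg, const char *mode)
{
	assert(arg && mode);
	return strcmp(arg, mode) == 0;
}
```

With these helpers, "autobots" matches "auto" and "nevermind" matches
"never" under the prefix comparison, while the exact comparison
rejects both.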
exec_mmap() installs the new mm and then tears the old one down while
still holding exec_update_lock for writing -- and with cred_guard_mutex
held all the way to setup_new_exec().
Neither lock is needed for this. exec_update_lock only exists to make the
mm swap atomic with the later commit_creds(), so that permission-checking
readers (proc, ptrace, the futex robust list, perf, kcmp, mm_access())
never observe the new mm together with the old credentials. Those readers
all operate on task->mm, i.e. the new mm after the swap; none looks at the
detached old mm, its ->owner or signal->maxrss. cred_guard_mutex guards
credential calculation and is equally irrelevant here.
The cost is real: __mmput() runs exit_mmap() over the entire old address
space and can block in exit_aio() waiting for in-flight AIO, all while
holding exec_update_lock for writing and cred_guard_mutex. For execve() of
a large process this blocks ptrace_attach() and every exec_update_lock
reader for the duration of the teardown.
Stash the old mm in bprm->old_mm and release it from setup_new_exec()
after both locks are dropped. setup_new_exec() still runs before
setup_arg_pages() and the segment mappings, so the old address space is
freed before the new one is populated and peak memory is unchanged. The
ordering constraints are kept: old_mm's mmap_lock is still dropped in
exec_mmap() before mm_update_next_owner() (required since commit
31a78f23bac0 ("mm owner: fix race between swapoff and exit")), and
mm_update_next_owner() still precedes mmput(); both run in the execing
task's context, as mm_update_next_owner() requires.
If exec swaps the mm but fails before setup_new_exec() runs the old mm
would leak, so add a backstop in free_bprm(). The lazy-tlb case
(old_mm == NULL, e.g. kernel_execve()) has no address space to
free and is left in exec_mmap().
Merge patch series "exec: introduce task_exec_state for exec-time metadata"
Christian Brauner (Amutable) <brauner@kernel.org> says:
This series relocates the dumpable mode and the user_namespace
captured at execve() from mm_struct onto a new per-task
task_exec_state structure that stays attached to the task for its
full lifetime.
__ptrace_may_access() and several /proc owner / visibility checks
need to consult two pieces of state for any observable task,
including zombies that have already gone through exit_mm(): the
dumpable mode and the user namespace captured at execve(). Both
live on mm_struct today, which exit_mm() clears from the task long
before the task is reaped.
A reader that races with do_exit() observes task->mm == NULL and
either fails the check or falls back to init_user_ns - which denies
legitimate access to non-dumpable zombies that were running in a
nested user namespace.
mm_struct loses ->user_ns and the dumpability bits in ->flags.
MMF_DUMPABLE_BITS is reserved so MMF_DUMP_FILTER_* layout exposed via
/proc/<pid>/coredump_filter stays stable. task->user_dumpable and its
exit_mm() snapshot are removed.
task_exec_state is the privilege domain established by an execve()
[1]. Within a thread group it is shared via refcount; across thread
groups each task has its own:
- CLONE_VM siblings (thread-group members, io_uring workers)
refcount-share the parent's exec_state.
- Non-CLONE_VM clones (fork(), vfork() without CLONE_VM)
allocate a fresh exec_state inheriting the parent's dumpable
mode and user_ns.
- execve() in the child allocates a fresh instance and installs
it under task_lock + exec_update_lock via
task_exec_state_replace().
- Credential changes (setresuid, capset, ...) and
prctl(PR_SET_DUMPABLE) update dumpability on the current
task's exec_state, i.e. on the thread group's shared instance.
Behavioral change:
Kernel threads that briefly use a user mm via kthread_use_mm() no
longer inherit dumpability from the borrowed mm. Kthreads are not
ptraceable (PF_KTHREAD short-circuits __ptrace_may_access), so this
is observable only via /proc surfaces that a sufficiently privileged
reader can reach.
The dumpable flag captured at execve() is consulted by
__ptrace_may_access() and several /proc owner / visibility checks.
It lives on mm_struct today, which exit_mm() clears from the task
long before the task itself is reaped.
exec_state is anchored to the execve() that established the current
privilege domain. CLONE_VM siblings refcount-share the parent's
exec_state via copy_exec_state(); non-CLONE_VM clones allocate a
fresh exec_state inheriting the parent's dumpable mode and user_ns
reference via task_exec_state_copy(). execve() allocates a fresh
instance (via alloc_task_exec_state() in begin_new_exec()) and
installs it under task_lock + exec_update_lock with
task_exec_state_replace(). init_task uses a static instance.
The dumpable mode now lives on task->exec_state->dumpable.
task->mm->flags no longer carries dumpability; MMF_DUMPABLE_MASK is
removed, but MMF_DUMPABLE_BITS is reserved so MMF_DUMP_FILTER_* bit
positions remain stable for the /proc/<pid>/coredump_filter ABI. The
task->user_dumpable cache bit and its assignment in exit_mm() are
removed; readers go through get_dumpable(task) directly.
coredump_params gains a snapshot field cprm.dumpable, populated from
get_dumpable(current) at vfs_coredump() entry, replacing the previous
__get_dumpable(cprm->mm_flags) consumers in fs/coredump.c and
fs/pidfs.c.
The user namespace recorded at execve() is consulted by
__ptrace_may_access() and by /proc/PID/* owner derivation. Move the
captured user_ns onto task_exec_state, which stays attached to the task
past exit_mm() and across exit_files().
bprm grows a user_ns field staged in bprm_mm_init() with the caller's
user_ns, narrowed by would_dump() to the closest privileged ancestor,
and consumed by exec_mmap() via alloc_task_exec_state(bprm->user_ns).
free_bprm() releases the staging reference.
mm_struct loses ->user_ns entirely. Initializers in init-mm, efi_mm,
and the implicit one in mm_init()/dup_mm()/mm_alloc() are removed;
__mmdrop() drops the matching put_user_ns(). The kthread_use_mm()
WARN_ON_ONCE(!mm->user_ns) is no longer meaningful and goes too.
Introduce struct task_exec_state, a per-task RCU-protected structure
that holds the dumpable mode and the user namespace and stays attached
to the task for its full lifetime.
task_exec_state_rcu() is the canonical reader: asserts RCU or
task_lock is held, WARNs on a NULL state, returns the
rcu_dereference()'d pointer.
Replace the SUID_DUMP_DISABLE/USER/ROOT preprocessor constants with
enum task_dumpable. Numeric values are preserved (kernel.suid_dumpable
sysctl and prctl(PR_SET_DUMPABLE) ABI), so this is a pure rename with
no behavioral change.
Subsequent commits relocate dumpability onto a per-task structure
where the enum type will allow stronger type-checking on the new API.
Weiming Shi [Thu, 21 May 2026 08:12:01 +0000 (01:12 -0700)]
net: team: fix NULL pointer dereference in team_xmit during mode change
__team_change_mode() clears team->ops with memset() before restoring
safe dummy handlers via team_adjust_ops(). A concurrent team_xmit()
running under RCU on another CPU can read team->ops.transmit during
this window and call a NULL function pointer, crashing the kernel.
The race requires a mode change (CAP_NET_ADMIN) concurrent with
transmit on the team device.
The original code assumed that no ports means no traffic, so mode
changes could freely memset()/memcpy() the ops. AF_PACKET with
forced carrier breaks that assumption.
Prevent the race instead of making it safe: replace memset()/memcpy()
with per-field updates that never touch transmit or receive. Those
two handlers are managed solely by team_adjust_ops(), which already
installs dummies when tx_en_port_count == 0 (always true during mode
change since no ports are present). WRITE_ONCE/READ_ONCE prevent
store/load tearing on the handler pointers.
synchronize_net() before exit_op() drains in-flight readers that may
still reference old mode state from before port removal switched the
handlers to dummies.
Fixes: 3d249d4ca7d0 ("net: introduce ethernet teaming device")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://patch.msgid.link/20260521081159.1491563-3-bestswngs@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Zhengchuan Liang [Fri, 22 May 2026 09:31:55 +0000 (17:31 +0800)]
xfrm: input: hold netns during deferred transport reinjection
Transport-mode reinjection stores a struct net pointer in skb->cb and
uses it later from xfrm_trans_reinject(). That pointer must stay valid
until the deferred callback runs.
Take a netns reference when queueing deferred reinjection work and drop
it after the callback completes. Use maybe_get_net() so the queueing
path does not revive a namespace that is already being torn down.
This keeps the existing workqueue design and fixes the netns lifetime
handling in one place for all users of xfrm_trans_queue_net().
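The "do not revive a dying namespace" property boils down to taking a
reference only while the count is still non-zero. The following is a
userspace sketch of that pattern with C11 atomics; struct sketch_net,
sketch_maybe_get() and sketch_put() are hypothetical stand-ins, not the
kernel helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object with a reference count; 0 means teardown has begun. */
struct sketch_net {
	atomic_int refs;
};

/* Take a reference only if the count is still non-zero. */
static bool sketch_maybe_get(struct sketch_net *net)
{
	int old = atomic_load(&net->refs);

	while (old != 0) {
		/* On failure, old is refreshed with the current count and we retry. */
		if (atomic_compare_exchange_weak(&net->refs, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference previously taken with sketch_maybe_get(). */
static void sketch_put(struct sketch_net *net)
{
	int old = atomic_fetch_sub(&net->refs, 1);

	assert(old > 0);
}
```

A queueing path built on this would skip the deferred work when
sketch_maybe_get() fails, and call sketch_put() once the callback has
run.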
Usama Arif [Thu, 21 May 2026 10:29:26 +0000 (03:29 -0700)]
xfrm: move policy_bydst RCU sync from per-netns .exit to .pre_exit
The struct pernet_operations docstring in include/net/net_namespace.h
explicitly warns against blocking RCU primitives in .exit handlers:
Exit methods using blocking RCU primitives, such as
synchronize_rcu(), should be implemented via exit_batch.
[...]
Please, avoid synchronize_rcu() at all, where it's possible.
Note that a combination of pre_exit() and exit() can
be used, since a synchronize_rcu() is guaranteed between
the calls.
xfrm_policy_fini() violates this: it calls synchronize_rcu() before
freeing the policy_bydst hash tables (so no RCU reader is mid-
traversal at free time), but runs from xfrm_net_ops.exit -- once per
namespace -- so a cleanup_net() of N namespaces pays N full RCU
grace periods serially.
Use the documented pre_exit/exit split. Move the policy flush (and
the workqueue drains it depends on) into a new .pre_exit handler;
xfrm_policy_fini() then runs in .exit and frees the hash tables
after the synchronize_rcu_expedited() that cleanup_net() guarantees
between the two phases. This costs O(1) RCU grace periods per batch
instead of O(N).
Observed on Linux 6.18 with a workload doing unshare(CLONE_NEWNET)
at ~13/sec sustained: cleanup_net() and the netns_wq rescuer kthread
both stuck in xfrm_policy_fini()'s synchronize_rcu(), >300k struct
net accumulated in the cleanup queue, Percpu in /proc/meminfo climbed
to 130+ GB on 256-CPU hosts, and memcg OOMs followed. setup_net and
__put_net counts were balanced, ruling out a refcount leak.
Fixes: 069daad4f2ae ("xfrm: Wait for RCU readers during policy netns exit")
Signed-off-by: Usama Arif <usama.arif@linux.dev>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Shaomin Chen [Wed, 20 May 2026 18:07:23 +0000 (02:07 +0800)]
xfrm: iptfs: reset runtime state when cloning SAs
iptfs_clone_state() clones the IPTFS mode data with kmemdup(). This
copies runtime objects which must not be shared with the original SA,
including the embedded sk_buff_head, hrtimers, spinlock, and in-flight
reassembly/reorder state.
If xfrm_state_migrate() fails after clone_state() but before the later
init_state() call has reinitialized those fields, the cloned state can be
destroyed by xfrm_state_gc_task() with list and timer state copied from the
original SA. With queued packets this lets the clone splice and free skbs
owned by the original IPTFS queue, leading to use-after-free and
double-free reports in iptfs_destroy_state() and skb release paths.
Reinitialize the clone's runtime state before publishing it through
x->mode_data. Because clone_state() now publishes a destroyable mode_data
object before init_state(), take the mode callback module reference there.
Avoid taking it again from __iptfs_init_state() for the same object.
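Why a byte-wise copy of embedded runtime objects is dangerous can be
seen with a plain linked-list head. This is a userspace sketch: struct
mode_data, its fields and the list helpers are hypothetical, and
malloc()/memcpy() stand in for kmemdup().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_node *head, struct list_node *node)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Hypothetical mode data: one config field plus an embedded runtime queue. */
struct mode_data {
	int max_queue_size;		/* configuration, safe to copy */
	struct list_node queue;		/* runtime state, must not be shared */
};

/* Byte-wise clone followed by re-initialization of the runtime part. */
static struct mode_data *clone_mode_data(const struct mode_data *orig)
{
	struct mode_data *copy;

	assert(orig);
	copy = malloc(sizeof(*copy));
	if (!copy)
		return NULL;
	memcpy(copy, orig, sizeof(*copy));	/* queue still points into orig here */
	list_init(&copy->queue);		/* detach the clone from orig's entries */
	return copy;
}
```

Without the list_init() call, destroying the clone would walk and free
entries that still belong to the original queue.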
Len Bao [Sat, 16 May 2026 10:57:34 +0000 (10:57 +0000)]
gpiolib: Mark gpio_devt, gpiolib_initialized and gpio_stub_drv as __ro_after_init
The 'gpio_devt' and 'gpiolib_initialized' variables are initialized only
during the init phase in the 'gpiolib_dev_init' function and never
changed. So, mark these as __ro_after_init.
The 'gpio_stub_drv' variable is initialized only in the declaration and
never changed. So, this variable could be 'const', but using the
'driver_register' and 'driver_unregister' functions discards the 'const'
qualifier. Therefore, as an alternative, mark it as a __ro_after_init.
Jouni Högander [Wed, 20 May 2026 10:49:44 +0000 (13:49 +0300)]
drm/i915/psr: Use DC_OFF wake reference to block DC6 on vblank enable
We are observing the following warnings:
*ERROR* power well DC_off state mismatch (refcount 0/enabled 1)
gen9_dc_off_power_well_enabled is considering target state DC_STATE_DISABLE
as DC_OFF power well being enabled. Fix this by using wakeref for the
purpose.
To achieve this we need to modify notification code as well. Currently it
is possible that PSR gets notified vblank enable/disable twice on same
status. This is currently not a problem as it is just triggering call to
intel_display_power_set_target_dc_state with same target state as a
parameter. When using wakeref this becomes a problem due to reference
counting. Fix this by storing the vblank status on the last notification
and using that to ensure there is no more than one notification with the
same vblank status.
v2: ensure there are no subsequent notifications with same status
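The de-duplication of notifications can be sketched as follows. The
struct and function names are hypothetical, and a plain counter stands
in for the DC_OFF wake reference.

```c
#include <assert.h>
#include <stdbool.h>

struct psr_sketch {
	bool vblank_enabled;	/* status seen on the last notification */
	int dc_off_refs;	/* stands in for the DC_OFF wake reference */
};

/* Take or drop the reference only when the vblank status actually changes. */
static void vblank_notify(struct psr_sketch *psr, bool enable)
{
	if (psr->vblank_enabled == enable)
		return;		/* repeated notification: leave the reference alone */

	psr->vblank_enabled = enable;
	if (enable) {
		psr->dc_off_refs++;
	} else {
		assert(psr->dc_off_refs > 0);
		psr->dc_off_refs--;
	}
}
```

Two enable notifications in a row leave the counter at 1 here, so a
single disable brings it back to 0 instead of leaking a reference.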
Jouni Högander [Wed, 20 May 2026 10:49:43 +0000 (13:49 +0300)]
drm/i915/psr: Block DC states on vblank enable when Panel Replay supported
Currently we are blocking DC states only when Panel Replay is enabled on
vblank enable. It may happen that Panel Replay is getting enabled when
vblank is already enabled. Fix this by blocking DC states always if Panel
Replay is supported.
While at it take care of possible dual eDP case by looping all encoders
supporting PSR.
drm/i915/aux: use polling when irqs are unavailable
PTL with physically disconnected display was observed to have 40s longer
execution time when testing xe_fault_injection@xe_guc_mmio_send_recv.
The issue has not been seen when reverting commit 40a9f77a28fa ("Revert
"drm/i915/dp: change aux_ctl reg read to polling read"").
Apparently the configuration suffers from not having AUX enabled when
using interrupts. One probable cause can be xe enabling interrupts too
late: interrupts need memory allocations which currently can't be done
before the display FB takeover is done.
As for now, use polling for AUX in case interrupts are unavailable.
Fixes: 40a9f77a28fa ("Revert "drm/i915/dp: change aux_ctl reg read to polling read"")
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Michał Grzelak <michal.grzelak@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/20260416163744.288107-1-michal.grzelak@intel.com
(cherry picked from commit 05e0550b65cd1604bd515fbc65f522bce4c10a87)
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
Results from decode_stacktrace.sh pointed to dereference of a file pointer
field of a i915 TTM page vector container associated with an object being
purged on eviction. That path is taken when the object is marked as no
longer needed.
Code analysis revealed a possibility of the i915 TTM page vector container
being replaced with a new instance inside a function that purges content
of the object, should it be still busy. That function is called,
indirectly via a more general function that changes the object's placement
and caching policy, before the problematic dereference, but still after
a pointer to the container is captured, rendering the pointer no longer
valid.
Fix the issue by capturing the pointer to the container only after its
potential replacement.
v2: Move the container_of() inside the if block (Sebastian),
- a simplified version of the commit description that explains briefly
why the change is necessary (Christian).
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/work_items/14882
Fixes: 7ae034590ceae ("drm/i915/ttm: add tt shmem backend")
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: stable@vger.kernel.org # v5.17+
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Sebastian Brzezinka <sebastian.brzezinka@intel.com>
Cc: Christian König <christian.koenig@amd.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://lore.kernel.org/r/20260508122612.469227-2-janusz.krzysztofik@linux.intel.com
(cherry picked from commit 4462966a93eb185849b7f174f0d0de53476d00a4)
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
Icenowy Zheng [Mon, 25 May 2026 15:36:18 +0000 (23:36 +0800)]
drm: verisilicon: fix build failure of cursor plane code
The cursor plane patch was stalled for so long that the struct
drm_atomic_state parameter of the atomic modeset hooks has meanwhile
been changed to struct drm_atomic_commit.
Fix this by replacing the parameter's type. All helpers that retrieve
information from this struct are also changed so simply replacing the
type works.
Alexander Stein [Tue, 26 May 2026 06:35:01 +0000 (08:35 +0200)]
gpio: mxc: fix irq_high handling
If port->irq_high is -1 (fsl,imx21-gpio compatible) and gpio_idx is >= 16
enable_irq_wake() is called with -1 which is wrong.
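One way to express the intended selection is a helper that only uses
the high IRQ when it is valid. This is a hypothetical sketch:
wake_irq_for() does not exist in the driver, and treating only
positive values as valid IRQ numbers is an assumption.

```c
#include <assert.h>

/*
 * Pick the IRQ to pass on for a GPIO line. Lines 16 and above use the
 * separate high IRQ, but only if one exists (assumed: valid IRQs are > 0).
 */
static int wake_irq_for(int irq, int irq_high, int gpio_idx)
{
	assert(gpio_idx >= 0);

	if (irq_high > 0 && gpio_idx >= 16)
		return irq_high;
	return irq;
}
```

With irq_high set to -1 the helper falls back to the common IRQ for
every line, so -1 never reaches the caller.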
Fixes: 5f6d1998adeb ("gpio: mxc: release the parent IRQ in runtime suspend")
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260526063504.25916-1-alexander.stein@ew.tq-group.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
kho: fix order calculation for kho_unpreserve_pages()
Commit 91e74fa8b1bc ("kho: make sure preservations do not span multiple
NUMA nodes") made sure preservations from kho_preserve_pages() do not
span multiple NUMA nodes. If they do, the order is reduced and tried
again.
The same logic was not implemented for kho_unpreserve_pages(). This can
result in unpreserve calculating a different order than preserve, and
thus not actually unpreserving the pages.
Fix this by moving the order calculation logic to
__kho_preserve_pages_order() and use it from both preserve and
unpreserve paths.
Move __kho_unpreserve() down to avoid having a forward declaration. Its
users are further down in the file anyway. Also, it results in grouping
for all the page-level preservation and unpreservation functions. This
unfortunately makes the diff hard to read, but the main change in
__kho_unpreserve() is to call __kho_preserve_pages_order() instead of
open-coding the order calculation.
Fixes: 91e74fa8b1bc ("kho: make sure preservations do not span multiple NUMA nodes")
Cc: stable@vger.kernel.org
Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Link: https://patch.msgid.link/20260519133332.2498092-1-pratyush@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
For systems with 16KB pages (e.g. arm64 with CONFIG_ARM64_16K_PAGES=y or
LoongArch), this gives a depth of 4. Since levels are 0 based, with
depth = 4 the effective top level is 3 and the top-level shift is at bit 39.
The order-0 bit sits at bit 50 (KHO_ORDER_0_LOG2 = 64 - PAGE_SHIFT =
50). When inserting or reading a key, the index extracted at the top
level is:
(1 << 50) >> 39 = 2048
2048 is exactly the table size (PAGE_SIZE / sizeof(phys_addr_t) = 2048
for 16KB pages), so it wraps to 0, aliasing the order bit to index 0
and losing it silently.
On the second kernel, kho_radix_decode_key() sees a key without the
order bit, calls fls64() on the wrong bit, computes a wrong order and
thus a garbage physical address. phys_to_page() of that address faults
in kho_preserved_memory_reserve(), causing a kernel panic early in boot.
Fix by adding +1 to the DIV_ROUND_UP numerator so the formula accounts
for the order bit itself, giving depth 5 for 16KB pages. The top-level
shift becomes 50, and (1 << 50) >> 50 = 1, which is nonzero and
unambiguous. For 4KB and 64KB page sizes the depth is unchanged.
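The arithmetic above can be checked with a short userspace sketch. The
depth formula used here is an assumption reconstructed from the numbers
in this message: a leaf bitmap of PAGE_SIZE * 8 bits at level 0, tables
of PAGE_SIZE / 8 entries above it, and the "extra" argument modelling
the +1 in the numerator. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

struct radix_geometry {
	unsigned int order0_bit;	/* 64 - PAGE_SHIFT, as in the text */
	unsigned int table_bits;	/* log2(PAGE_SIZE / 8 bytes per entry) */
	unsigned int bitmap_bits;	/* log2(PAGE_SIZE * 8), assumed leaf bitmap size */
	unsigned int depth;
	unsigned int top_shift;
};

/* extra = 0 models the old formula, extra = 1 the fixed one. */
static struct radix_geometry compute_geometry(unsigned int page_shift,
					      unsigned int extra)
{
	struct radix_geometry g;

	assert(page_shift >= 12 && page_shift <= 16);
	g.order0_bit = 64 - page_shift;
	g.table_bits = page_shift - 3;
	g.bitmap_bits = page_shift + 3;
	g.depth = DIV_ROUND_UP(g.order0_bit - g.bitmap_bits + extra,
			       g.table_bits) + 1;
	/* Level 0 is the bitmap; every table level above adds table_bits. */
	g.top_shift = g.bitmap_bits + (g.depth - 2) * g.table_bits;
	return g;
}

/* Index of the order-0 bit in the top-level table, masked to the table size. */
static uint64_t top_index(const struct radix_geometry *g)
{
	uint64_t raw = (UINT64_C(1) << g->order0_bit) >> g->top_shift;

	return raw & ((UINT64_C(1) << g->table_bits) - 1);
}
```

For a page shift of 14 this reproduces depth 4, shift 39 and a masked
index of 0 before the fix, and depth 5, shift 50 and index 1 after it.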
Link: https://patch.msgid.link/20260509024415.33190-1-dongtai.guo@linux.dev
Fixes: 3f2ad90060f6 ("kho: adopt radix tree for preserved memory tracking")
Tested-by: Kexin Liu <liukexin@kylinos.cn>
Signed-off-by: George Guo <guodongtai@kylinos.cn>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
[rppt: added actual math to the changelog]
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Alice Ryhl [Thu, 7 May 2026 11:14:42 +0000 (11:14 +0000)]
rust: kasan/kbuild: fix rustc-option when cross-compiling
The Makefile version of rustc-option currently checks whether the option
exists for the host target instead of the target actually being compiled
for. It was done this way in commit 46e24a545cdb ("rust: kasan/kbuild:
fix missing flags on first build") to avoid a circular dependency on
target.json. However, because of this, rustc-option currently does not
function when cross-compiling from x86_64 to aarch64 if
CONFIG_SHADOW_CALL_STACK is enabled. This is because KBUILD_RUSTFLAGS
contains -Zfixed-x18 under this configuration. Since that flag does not
exist on the host target, rustc-option runs into a compilation failure
every time, leading to all flags being rejected as unsupported.
To fix this, update rustc-option to pass a --target parameter so that
the host target is not used. For targets using target.json, use a
built-in target that is as close as possible to the target created with
target.json to avoid the circular dependency on target.json.
One scenario where this causes a boot failure:
* Cross-compiled from x86_64 to aarch64.
* With CONFIG_SHADOW_CALL_STACK=y
* With CONFIG_KASAN_SW_TAGS=y
* With CONFIG_KASAN_INLINE=n
Then the resulting kernel image will fail to boot when it first calls
into Rust code with a crash along the lines of "Unable to handle kernel
paging request at virtual address 0ffffffc08541796". This is because the
call threshold is not specified, so rustc will inline kasan operations,
but the kasan shadow offset is not specified, which leads to the inlined
kasan instructions being incorrect.
Note that the -Zsanitizer=kernel-hwaddress parameter itself does not
lead to a rustc-option failure despite being aarch64-specific because
RUSTFLAGS_KASAN has not yet been added to KBUILD_RUSTFLAGS when
rustc-option is evaluated by the kasan Makefile.
Cc: stable@vger.kernel.org
Fixes: 46e24a545cdb ("rust: kasan/kbuild: fix missing flags on first build")
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20260507-rustc-option-cross-v2-1-2f650a49c2b5@google.com
[ Edited slightly:
  - Reset variable to avoid using the environment.
  - Use a simply expanded variable flavor for simplicity.
  - Export variable so that behavior in sub-`make`s is consistent.
  This matches other variables. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Sven Schnelle [Thu, 21 May 2026 10:41:44 +0000 (12:41 +0200)]
s390/tracing: Add s390-tod clock
In order to allow comparing trace timestamps between different
systems or virtual machines on s390, add a s390-tod trace clock.
This clock just uses the returned TOD clock value from stcke
directly.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Heiko Carstens [Tue, 19 May 2026 06:20:42 +0000 (08:20 +0200)]
s390/zcore: Removed unused variables
allmodconfig with clang W=1 points out unused global variables:
drivers/s390/char/zcore.c:49:23: error: variable
'zcore_reipl_file' set but not used [-Werror,-Wunused-but-set-global]
drivers/s390/char/zcore.c:50:23: error: variable
'zcore_hsa_file' set but not used [-Werror,-Wunused-but-set-global]
Remove both of them, since there is no point in keeping them.
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Niklas Schnelle [Thu, 21 May 2026 11:10:44 +0000 (13:10 +0200)]
s390/configs: Enable IOMMUFD and VFIO cdev in defconfigs
Enable IOMMUFD and VFIO cdev such that PCI pass-through to QEMU/KVM can
optionally utilize native IOMMUFD. Note that because the defconfigs do
not enable IOMMUFD_VFIO_CONTAINER the default PCI pass-through using
VFIO with the existing container interface is not affected.
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Dan Carpenter [Mon, 25 May 2026 07:14:42 +0000 (10:14 +0300)]
accel/ivpu: prevent uninitialized data bug in debugfs
simple_write_to_buffer() only initializes data starting from the *pos
offset, so if *pos is non-zero then the first part of the buffer is left
uninitialized. Really, if *pos is non-zero then this code won't work,
so just check for that at the start of the function.
Fixes: 320323d2e545 ("accel/ivpu: Add debugfs interface for setting HWS priority bands")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Link: https://patch.msgid.link/ahP24m6Mii9EDL7Q@stanley.mountain
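The check described in the accel/ivpu message above can be sketched in
userspace C. This is only an illustration: copy_at_offset() is a simplified
stand-in for simple_write_to_buffer(), and demo_write() is a hypothetical
handler, not the ivpu debugfs code.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/*
 * Simplified stand-in for simple_write_to_buffer(): copies into `to`
 * starting at offset *ppos and leaves to[0..*ppos) untouched.
 */
static ssize_t copy_at_offset(void *to, size_t available, long long *ppos,
                              const void *from, size_t count)
{
    long long pos = *ppos;

    if (pos < 0)
        return -EINVAL;
    if ((size_t)pos >= available || count == 0)
        return 0;
    if (count > available - (size_t)pos)
        count = available - (size_t)pos;
    memcpy((char *)to + pos, from, count);
    *ppos = pos + (long long)count;
    return (ssize_t)count;
}

/* Hypothetical write handler that parses a short command string. */
static ssize_t demo_write(const char *user_buf, size_t size, long long *pos,
                          char *out, size_t out_len)
{
    char buf[64];
    ssize_t ret;

    /* Reject offset writes up front so buf[0..*pos) is never read unset. */
    if (*pos != 0)
        return -EINVAL;
    if (size >= sizeof(buf) || size >= out_len)
        return -EINVAL;

    ret = copy_at_offset(buf, sizeof(buf) - 1, pos, user_buf, size);
    if (ret < 0)
        return ret;
    buf[ret] = '\0';
    memcpy(out, buf, (size_t)ret + 1);
    return ret;
}
```

Without the early *pos test, a write at offset 3 would leave buf[0..2]
holding stack garbage that the parser would then read.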
Cássio Gabriel [Mon, 25 May 2026 14:16:09 +0000 (11:16 -0300)]
ALSA: seq: Remove arbitrary prioq insertion limit
The sequencer priority queue insertion path uses a hardcoded traversal
limit of 10000 entries. The value is intended to catch a corrupted list,
but it also becomes a real limit for valid queues.
The event pool limit is per client, while a sequencer queue can be shared
by multiple clients. A queue can therefore legitimately contain more than
10000 events. In that case, inserting an event that has to be placed past
the arbitrary limit fails with -EINVAL.
Use the queue's own cell count as the traversal bound instead. This keeps
the protection against inconsistent list accounting or cyclic lists without
rejecting valid large queues.
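A minimal sketch of the idea, assuming a singly linked list sorted by tick.
The struct and function names are hypothetical and do not mirror the ALSA
sequencer code; the point is only that the traversal budget comes from the
queue's own cell count instead of a fixed constant.

```c
#include <errno.h>
#include <stddef.h>

struct cell {
    unsigned int tick;
    struct cell *next;
};

struct prioq {
    struct cell *head;
    int cells;              /* number of cells the queue believes it holds */
};

/* Insert c in tick order; fail if the walk exceeds the accounted cells. */
static int prioq_insert(struct prioq *q, struct cell *c)
{
    struct cell **link = &q->head;
    int budget = q->cells;  /* at most this many cells can precede c */

    while (*link && (*link)->tick <= c->tick) {
        if (budget-- <= 0)
            return -EINVAL; /* cyclic list or broken accounting */
        link = &(*link)->next;
    }
    c->next = *link;
    *link = c;
    q->cells++;
    return 0;
}
```

A consistent queue of any size is accepted, while a cycle still terminates
after at most q->cells steps.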
Manish Baing [Sat, 23 May 2026 17:32:51 +0000 (17:32 +0000)]
dt-bindings: pwm: stmpe: Drop legacy binding
The st,stmpe-pwm binding is already covered by the MFD schema
Documentation/devicetree/bindings/mfd/st,stmpe.yaml. Remove the
obsolete and redundant text binding file.
pwm: pca9685: Use named initializers for struct i2c_device_id
While less compact, named initializers make it easier to see which
members of the structs are assigned which value without having to look
up the declaration of the struct. They are also more robust against
changes to the struct definition.
This patch doesn't modify the compiled arrays, only their representation
in source form benefits. The former was confirmed with x86 and arm64
builds.
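For illustration, the two initializer styles look like this on a simplified
stand-in struct. demo_i2c_device_id and its member sizes are assumptions for
this sketch, not the kernel's definition.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct i2c_device_id. */
struct demo_i2c_device_id {
    char name[20];
    unsigned long driver_data;
};

/* Positional form: the reader has to know the member order. */
static const struct demo_i2c_device_id ids_positional[] = {
    { "pca9685", 0 },
    { "", 0 },                      /* sentinel */
};

/* Named form: each value is labelled with the member it initializes. */
static const struct demo_i2c_device_id ids_named[] = {
    { .name = "pca9685", .driver_data = 0 },
    { .name = "" },                 /* sentinel, other members are zero */
};

/* Compare two tables member by member; returns 1 if they match. */
static int ids_equal(const struct demo_i2c_device_id *a,
                     const struct demo_i2c_device_id *b, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(a[i].name, b[i].name) != 0 ||
            a[i].driver_data != b[i].driver_data)
            return 0;
    }
    return 1;
}
```

Both tables hold the same values; only the source representation differs.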
pwm: stm32: Make use of mul_u64_u64_div_u64_roundup()
When the driver was converted to the waveform API the need for this
function arose, but at that time this function didn't exist yet. It has
since become available, so switch to the global function and drop the
driver specific implementation.
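What such a helper computes is ceil(a * b / c) without losing the upper bits
of the product. The following userspace sketch shows the arithmetic using
the GCC/Clang unsigned __int128 extension; it is not the kernel
implementation, and it assumes c is non-zero and the result fits in 64 bits.

```c
#include <stdint.h>

/*
 * ceil(a * b / c) with a 128-bit intermediate product.
 * Assumes c != 0 and that the quotient fits in 64 bits.
 */
static uint64_t demo_mul_div_roundup(uint64_t a, uint64_t b, uint64_t c)
{
    unsigned __int128 prod = (unsigned __int128)a * b;

    /* prod + c - 1 cannot overflow 128 bits for 64-bit a, b and c. */
    return (uint64_t)((prod + c - 1) / c);
}
```

For example, demo_mul_div_roundup(10, 3, 4) rounds 7.5 up to 8.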
pwm: Consistently define pci_device_ids using named initializers
The .driver_data members in the various struct pci_device_id arrays were
initialized by list expressions. This isn't easily readable if you're
not into PCI. Using named initializers is more explicit and thus easier
to parse.
The secret plan is to make struct pci_device_id::driver_data an
anonymous union (similar to
https://lore.kernel.org/all/cover.1776579304.git.u.kleine-koenig@baylibre.com/)
and that requires named initializers. But it's also a nice cleanup on
its own.
This change doesn't introduce changes to the compiled pci_device_id
arrays. Tested on x86 and arm64.
Driver for the PWM block in Qualcomm IPQ6018 line of SoCs. Based on
driver from downstream Codeaurora kernel tree. Removed support for older
(V1) variants because I have no access to that hardware.
Tested on IPQ5018 and IPQ6010 based hardware.
Co-developed-by: Baruch Siach <baruch.siach@siklu.com>
Signed-off-by: Baruch Siach <baruch.siach@siklu.com>
Signed-off-by: Devi Priya <quic_devipriy@quicinc.com>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: George Moussalem <george.moussalem@outlook.com>
Link: https://patch.msgid.link/20260406-ipq-pwm-v21-2-6ed1e868e4c2@outlook.com
[ukleinek: Fixed a few nitpicks as agreed on the mailing list]
Signed-off-by: Uwe Kleine-König <ukleinek@kernel.org>
This option was never meant to be used in production because it solely
clears the X86_FEATURE kernel-internal representation of what CPUID bits
it has detected and doesn't do any *proper* feature disablement like
clearing CR4.CET in the user shadow stack case, for example.
So remove its documentation so that it doesn't get used in production
and people don't get silly ideas. It is meant strictly for debugging; if
a chicken bit for properly disabling a feature is warranted, then that
would need proper enablement.
RISC-V: KVM: AIA: Make HGEI number management fully per-CPU
Previously, the number of Hypervisor Guest External Interrupt (HGEI)
lines was stored in a single global variable `kvm_riscv_aia_nr_hgei`
and assumed to be the same for all HARTs. This assumption does not
hold on heterogeneous RISC-V SoCs where different cores may expose
different HGEIE CSR widths.
Introduce an `nr_hgei` field in the per-CPU `struct aia_hgei_control`
and probe the actual supported HGEI count for the current HART in
`kvm_riscv_aia_enable()` using the standard RISC-V CSR probe technique:
    csr_write(CSR_HGEIE, -1UL);
    nr = fls_long(csr_read(CSR_HGEIE));
    if (nr)
        nr--;
All HGEI allocation, free and disable paths (`kvm_riscv_aia_free_hgei()`,
`kvm_riscv_aia_disable()`, etc.) now use the per-CPU value instead of
the global one.
The global `kvm_riscv_aia_nr_hgei` now represents the minimum number
of HGEI lines across HARTs and can be used to check whether HGEI
support is available or not.
This makes KVM AIA robust on big.LITTLE-style asymmetric platforms.
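The probe and the per-HART minimum can be modelled in userspace as follows.
This is a sketch under stated assumptions: struct demo_hart and all demo_*
functions are hypothetical, and the CSR is simulated by a mask in which bit 0
is read-only zero and bits 1..GEILEN are writable, as the RISC-V privileged
specification defines for hgeie.

```c
#include <limits.h>

/* Hypothetical model of one HART's HGEIE CSR. */
struct demo_hart {
    unsigned int geilen;    /* implemented guest external interrupt lines */
    unsigned long hgeie;    /* simulated CSR contents */
};

/* Simulated CSR write: only bits 1..geilen stick. */
static void demo_write_hgeie(struct demo_hart *h, unsigned long val)
{
    unsigned long mask = 0;
    unsigned int bit;

    for (bit = 1; bit <= h->geilen && bit < sizeof(long) * CHAR_BIT; bit++)
        mask |= 1UL << bit;
    h->hgeie = val & mask;
}

/* Position of the highest set bit, counting from 1; 0 if none is set. */
static int demo_fls_long(unsigned long x)
{
    int pos = 0;

    while (x) {
        pos++;
        x >>= 1;
    }
    return pos;
}

/* Write all ones, read back, and derive the number of HGEI lines. */
static int demo_probe_nr_hgei(struct demo_hart *h)
{
    int nr;

    demo_write_hgeie(h, -1UL);
    nr = demo_fls_long(h->hgeie);
    if (nr)
        nr--;               /* bit 0 never counts as a line */
    return nr;
}

/* Minimum across HARTs, usable as a global "is HGEI available" value. */
static int demo_min_nr_hgei(struct demo_hart *harts, int n)
{
    int i, min = -1;

    for (i = 0; i < n; i++) {
        int nr = demo_probe_nr_hgei(&harts[i]);

        if (min < 0 || nr < min)
            min = nr;
    }
    return min < 0 ? 0 : min;
}
```

With one HART exposing 3 lines and another exposing 7, each probe returns its
own count and the minimum is 3.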
irqchip/riscv-imsic: Add nr_guest_files in per-HART local config
Add nr_guest_files in per-HART local config to represent the number of
guest files available on a particular HART whereas the nr_guest_files
in the global config represents the number of guest files available
across all HARTs.
This allows KVM RISC-V to use nr_guest_files from per-HART local
config for asymmetric big.LITTLE systems.
Mayuresh Chitale [Mon, 25 May 2026 09:59:28 +0000 (15:29 +0530)]
RISC-V: KVM: Fix ebreak self test failure
The ebreak self test enables and disables guest debugging as part of the
test. However, the KVM_SET_GUEST_DEBUG ioctl doesn't actually do that.
Fix it by calling kvm_riscv_vcpu_config_guest_debug().
Dave Airlie [Tue, 26 May 2026 00:54:30 +0000 (10:54 +1000)]
Merge tag 'exynos-drm-next-for-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-next
New feature and cleanup for Exynos fbdev
- Move fbdev emulation to DRM client buffers
. Reuses standard ADDFB2/GEM paths and simplifies cleanup.
- Use DRM format helpers for geometry and size
. Applies 4CC-based format/pitch/size calculation with stronger checks and PAGE_SIZE alignment.
. Sets screen_size and fix.smem_len from actual allocated size.
Exynos DRM internal cleanup
- Adopt DRM core DMA tracking and drop redundant code
. Removes private DMA tracking, exynos_drm_gem_prime_import(), and obsolete iommu_dma_init_domain() stub.
- Reduce duplication and tighten local scope
. Replaces MAX_FB_BUFFER with DRM_FORMAT_MAX_PLANES.
. Drops redundant exynos_drm_gem.size and internalizes local-only helpers.
Bug fix for Exynos fbdev behavior
- Fix screen_buffer offset handling
. Keeps screen_buffer at framebuffer base and avoids applying scanout offset.
. Includes Fixes and stable Cc for backporting.
Luka Gejak [Sat, 23 May 2026 13:03:30 +0000 (15:03 +0200)]
net: hsr: fix potential OOB access in supervision frame handling
Ensure the entire TLV header is linearized before access by adding
sizeof(struct hsr_sup_tlv) to the pskb_may_pull() calls. Without this,
a truncated frame could cause an out-of-bounds access.
Fixes: eafaa88b3eb7 ("net: hsr: Add support for redbox supervision frames")
Signed-off-by: Luka Gejak <luka.gejak@linux.dev>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Link: https://patch.msgid.link/20260523130330.61880-1-luka.gejak@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
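The length check in the hsr fix above amounts to the following, shown here
as a userspace sketch. demo_may_pull() is a stand-in for pskb_may_pull() on
a flat buffer, and struct demo_sup_tlv is an assumed two-byte type/length
header modelled on struct hsr_sup_tlv.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed two-byte TLV header. */
struct demo_sup_tlv {
    uint8_t type;
    uint8_t length;
};

/* Stand-in for pskb_may_pull(): true if the first len bytes are readable. */
static bool demo_may_pull(size_t frame_len, size_t len)
{
    return len <= frame_len;
}

/* Return the TLV type at tlv_off, or -1 if the header is truncated. */
static int demo_read_tlv_type(const uint8_t *frame, size_t frame_len,
                              size_t tlv_off)
{
    /* Require the whole header, not just the bytes in front of it. */
    if (!demo_may_pull(frame_len, tlv_off + sizeof(struct demo_sup_tlv)))
        return -1;
    return frame[tlv_off];
}
```

Checking only tlv_off against the frame length would let a frame that ends
inside the header through.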
Len Bao [Sat, 23 May 2026 15:07:35 +0000 (15:07 +0000)]
eth: dpaa2: constify dpaa2_ethtool_stats and dpaa2_ethtool_extras
The 'dpaa2_ethtool_stats' and 'dpaa2_ethtool_extras' structures are
initialized in their declarations and never changed. So, constify them
to reduce the attack surface.
Rosen Penev [Thu, 21 May 2026 21:59:08 +0000 (14:59 -0700)]
net: ibm: emac: Use napi_gro_receive() for Rx packets
emac_poll_rx() already runs in NAPI context and TAH-equipped EMACs set
CHECKSUM_UNNECESSARY on verified frames, which lets GRO coalesce TCP
segments without a software checksum on the merge path. Replace the
per-poll rx_list batched with netif_receive_skb_list() with direct
napi_gro_receive() calls so the stack can merge segments into super-skbs
and skip a full traversal per packet -- a meaningful win on the slow
4xx-class CPUs this driver targets.
Small routing speed improvement tested on a Cisco Meraki MX60W:
Patch 1 reduces stack usage in mlx5e_pcie_cong_get_thresh_config()
by reusing a single union devlink_param_value across four
devl_param_driverinit_value_get() calls (instead of
union devlink_param_value val[4] on the stack) and assigning each
vu16 into mlx5e_pcie_cong_thresh, so the helper stays under the
frame-size warning limit as the union grows.
Patch 2 changes devlink_nl_param_value_put() and
devlink_nl_param_value_fill_one() to pass union devlink_param_value
by pointer instead of by value. Passing two copies of the union
by value in the param netlink path consumes over 500 bytes of argument
stack and risks CONFIG_FRAME_WARN as the union grows beyond its
historical size.
====================
Picking a couple of uncontroversial changes from the series
since it's making very slow progress.
Ratheesh Kannoth [Thu, 21 May 2026 09:52:57 +0000 (15:22 +0530)]
devlink: pass param values by pointer
union devlink_param_value grows substantially once U64 array
parameters are added to devlink (from 32 bytes to over 264 bytes).
devlink_nl_param_value_fill_one() and devlink_nl_param_value_put()
copy the union by value in several places. Passing two instances as
value arguments alone consumes over 528 bytes of stack; combined with
deeper call chains the parameter stack can approach 800 bytes and trip
CONFIG_FRAME_WARN more easily.
Switch internal helpers and exported driver APIs to pass pointers to
union devlink_param_value rather than passing the union by value.
Reviewed-by: Petr Machata <petrm@nvidia.com> # for mlxsw
Acked-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Arthur Kiyanovski <akiyano@amazon.com> #for ena
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260521095303.2395584-4-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
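The stack cost can be illustrated with a stand-in union. The member list and
the array length below are assumptions chosen to give a 264-byte union; they
are not the devlink definition, and demo_values_equal() is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Illustrative union; layout is an assumption, not devlink's. */
union demo_param_value {
    uint8_t vu8;
    uint16_t vu16;
    uint32_t vu32;
    bool vbool;
    char vstr[32];
    uint64_t vu64_arr[33];
};

/*
 * Takes both values by pointer: the caller passes two pointer-sized
 * arguments instead of two full copies of the union.
 */
static bool demo_values_equal(const union demo_param_value *a,
                              const union demo_param_value *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}
```

With this layout two by-value arguments would copy 2 * 264 = 528 bytes,
whereas two pointers take 16 bytes on a 64-bit target.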
Ratheesh Kannoth [Thu, 21 May 2026 09:52:56 +0000 (15:22 +0530)]
net/mlx5e: Reduce stack use reading PCIe congestion thresholds
union devlink_param_value grew when U64 array parameters were added.
Keeping union devlink_param_value val[4] in
mlx5e_pcie_cong_get_thresh_config() exceeded the compiler's
-Wframe-larger-than limit.
Reuse one union: call devl_param_driverinit_value_get() once per
MLX5 PCIe congestion threshold and assign each vu16 to the
corresponding mlx5e_pcie_cong_thresh member.
Cross-subsystem Changes:
- Add common TMDS character rate constants to video/hdmi and use those
in bridge drivers.
Core Changes:
- Fix leak in drm_syncobj_find_fence.
- Fix OOB reads related to DP-MST.
- Create drm_get_bridge_by_endpoint and convert drivers to use it in
preparation of hotplug.
Driver Changes:
- Assorted bugfixes and cleanups to accel/ethosu, imagination, virtio,
rockchip.
- Expandable device heap support to amdxdna, bridge/chipone-icn6211.
- Add Surface Pro 12 panels.
- Convert ite-it6211 to use drm hdmi audio helpers.
Jakub Kicinski [Mon, 25 May 2026 20:48:20 +0000 (13:48 -0700)]
Merge branch 'net-mlx5-add-satellite-pf-support'
Tariq Toukan says:
====================
net/mlx5: Add satellite PF support
A satellite PF is a new SmartNIC configuration that adds another
physical function on the DPU that is not an eswitch manager and not a
page manager. The satellite PF can have its own SFs and can be passed
through to a VM on the DPU, providing an isolated function for users who
should not have access to the privileged ECPF. The ECPF handles the
satellite PF and the host PF in a similar way, using the same management
framework.
This series adds support for satellite PFs (SPFs) in the mlx5 eswitch.
SPFs are discovered through the v1 response layout of the
query_esw_functions command, introduced in the previous infrastructure
preparation series.
The first four patches discover satellite PFs, allocate eswitch vports
for them and their SFs, and extend the SF hardware table to manage SPF
SF entries.
The next five patches expose PF numbers from firmware, map SF
controllers to their pfnum, register devlink ports with proper
attributes, and register SF resource on satellite PF ports.
The final four patches add devlink port state management, FDB peer miss
rules, dedicated page accounting, and SF resource registration for
satellite PF vports.
This series builds on the eswitch infrastructure preparation series
previously submitted.
====================
Moshe Shemesh [Thu, 21 May 2026 11:08:43 +0000 (14:08 +0300)]
net/mlx5: Add SPF function type for page management
Add MLX5_SPF to enum mlx5_func_type so SPFs get their own page counter,
and add the corresponding WARN check at page cleanup. Wait for SPF pages
to be reclaimed during ECPF teardown, alongside the existing host PF and
VF page waits.
SPF page requests are always identified by vhca_id, so the legacy
func_id_to_type() path is not reached for satellite PFs.
Moshe Shemesh [Thu, 21 May 2026 11:08:41 +0000 (14:08 +0300)]
net/mlx5: Support state get/set for satellite PF ports
Extend mlx5_devlink_pf_port_fn_state_get() to support satellite PF
vports by querying their vhca_state from the query_esw_functions output
using the vport's vhca_id.
Extend mlx5_devlink_pf_port_fn_state_set() to support satellite PFs by
using the generic mlx5_esw_pf_enable/disable_hca() functions.
Moshe Shemesh [Thu, 21 May 2026 11:08:39 +0000 (14:08 +0300)]
net/mlx5: Register devlink ports for satellite PFs
Include satellite PFs in mlx5_eswitch_is_pf_vf_vport() so they receive
the standard PF/VF devlink port operations. Update
mlx5_esw_devlink_port_supported() and devlink port attribute setup to
register SPF devlink ports with controller number and PF number.
Add mlx5_esw_spf_vport_to_idx() to look up the SPF array index by vport
number, and mlx5_esw_is_spf_vport() boolean wrapper to identify
satellite PF vports.
Moshe Shemesh [Thu, 21 May 2026 11:08:38 +0000 (14:08 +0300)]
net/mlx5: Map SF controller to pfnum for satellite PFs
SF devlink port creation and registration used the ECPF's PCI function
as pfnum. Extend this to support satellite PF controllers by introducing
mlx5_esw_sf_controller_to_pfnum() that maps a controller number to the
corresponding PF number, and use it in SF port attribute setup and SF
creation validation.
Reorder the checks in mlx5_devlink_sf_port_new() so that
mlx5_sf_table_supported() runs before attribute validation, since the
new helper requires the eswitch to be initialized.
Moshe Shemesh [Thu, 21 May 2026 11:08:37 +0000 (14:08 +0300)]
net/mlx5: Expose PF number from query_esw_functions
Extract pci_device_function from the query_esw_functions output for both
the host PF and satellite PFs, storing it alongside the existing
host_number field.
Add mlx5_esw_get_hpf_pf_num() helper that returns the host PF's actual
PCI device function when the new query format is supported, falling back
to PCI_FUNC(dev->pdev->devfn) for older firmware. Use it in devlink port
attribute setup so that host PF and VF devlink ports report the correct
PF number rather than the ECPF's own PCI function number.
Moshe Shemesh [Thu, 21 May 2026 11:08:36 +0000 (14:08 +0300)]
net/mlx5: Support SPF SFs in SF hardware table
Convert the SF hardware table from a fixed-size hwc array to a
dynamically allocated one, supporting satellite PF (SPF) SFs alongside
local and external host SFs. Initialize hwc entries for each SPF using
its host_number as controller. Rename MLX5_SF_HWC_EXTERNAL to
MLX5_SF_HWC_EXT_HOST and add MLX5_SF_HWC_FIRST_SPF for clarity.
Moshe Shemesh [Thu, 21 May 2026 11:08:35 +0000 (14:08 +0300)]
net/mlx5: Initialize satellite PF SF vports
Extend satellite PF (SPF) initialization to allocate SF vports for each
SPF. For each discovered SPF, query its SF capabilities, allocate SF
vports, and store the host_number for controller identification.
Add accessor APIs mlx5_esw_get_num_spfs(),
mlx5_esw_spf_get_host_number(), mlx5_esw_sf_max_spf_functions(), and
mlx5_esw_has_spf_sfs() for use by the SF hardware table in a subsequent
patch. Also extend mlx5_esw_offloads_controller_valid() to accept SPF
controllers in addition to the host PF controller.
Moshe Shemesh [Thu, 21 May 2026 11:08:34 +0000 (14:08 +0300)]
net/mlx5: Initialize host PF host number earlier
Move host_number from esw->offloads to esw->esw_funcs as hpf_host_number
and initialize it during vports_init instead of offloads_enable. This
makes the host PF host number available earlier in the initialization
sequence, which is required for upcoming SF hardware table support for
satellite PFs.
Add a mlx5_esw_get_hpf_host_number() accessor to retrieve the stored
host number.
Moshe Shemesh [Thu, 21 May 2026 11:08:33 +0000 (14:08 +0300)]
net/mlx5: Introduce generic helper for PF SFs info
Introduce mlx5_esw_sf_max_pf_functions() that queries a PF's max_num_sf
and sf_base_id using mlx5_vport_get_other_func_general_cap(), which
supports both function_id and vhca_id based addressing.
Refactor mlx5_esw_sf_max_hpf_functions() into a thin wrapper that adds
the host PF precondition checks and calls the new generic helper. Remove
mlx5_query_hca_cap_host_pf() as it is not used anymore.
This prepares for querying SF info of satellite PFs.
Moshe Shemesh [Thu, 21 May 2026 11:08:32 +0000 (14:08 +0300)]
net/mlx5: Add satellite PF vport support
Discover satellite PFs from query_esw_functions output and allocate
eswitch vports for them. For each satellite PF, create a vport via the
CREATE_ESW_VPORT command using its vhca_id and allocate it in the
eswitch vport table.
When enabling switchdev mode, the ECPF acting as the eswitch manager
activates each satellite PF with enable_hca, loads its vport and adds
a representor. Since satellite PF devlink ports are registered in a
later patch, guard mlx5_esw_offloads_devlink_port() against vports
with no devlink port to avoid NULL dereference during representor
attach.
Costa Shulyupin [Fri, 15 May 2026 18:01:57 +0000 (21:01 +0300)]
docs: sysctl/net: Remove ax25, netrom, rose entries
These networking subsystems were removed in commit dd8d4bc28ad7
("net: remove ax25 and amateur radio (hamradio) subsystem"),
but the sysctl directory table still listed them.
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20260515180200.1490926-1-costa.shul@redhat.com>
Daniel Pereira [Fri, 15 May 2026 18:21:58 +0000 (15:21 -0300)]
docs: pt_BR: update minimal software requirements in changes.rst
Update the Brazilian Portuguese translation of changes.rst to align with
the latest English version.
Key changes include:
- Updated minimum versions for Rust (1.85.0), bindgen (0.71.1), and
pahole (1.22).
- Fixed ReST syntax for internal references (:ref:) and external links.
- Corrected formatting for tool names and config options using inline
code backticks.
- Synchronized technical descriptions for udev, kmod, and NFS-utils.
v2:
- Fix alignment in the minimal software requirements table that broke the build.
- Fix Sphinx footnote syntax.
Signed-off-by: Daniel Pereira <danielmaraboo@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20260515182200.654324-1-danielmaraboo@gmail.com>
Akiyoshi Kurita [Wed, 13 May 2026 13:11:11 +0000 (22:11 +0900)]
docs/ja_JP: translate more of submitting-patches.rst (no-mime)
Translate the "No MIME, no links, no compression, no attachments.
Just plain text" and "Respond to review comments" sections in
Documentation/translations/ja_JP/process/submitting-patches.rst.
Keep the wording close to the English text and wrap lines to match
the style used in the surrounding Japanese translation.
Fixes: a03ef333fbd6 ("Documentation: security-bugs: explain what is and is not a security bug")
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Acked-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <da8ee1e8b4e99261ec11544c4e1a4f81316ae965.1779032501.git.baruch@tkos.co.il>
Daniel Pereira [Tue, 19 May 2026 14:00:33 +0000 (11:00 -0300)]
docs: pt_BR: Translate process/kernel-docs.rst into Portuguese
Translate Documentation/process/kernel-docs.rst into Portuguese (pt_BR)
and update the main index.
The content was adapted following the RST formatting rules and the
appropriate technical terminology for Brazilian Portuguese.
Signed-off-by: Daniel Pereira <danielmaraboo@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20260519140035.1031694-1-danielmaraboo@gmail.com>
docs: submitting-patches: Clarify that "reviewer" is a person
The common understanding of the word "reviewer" is a person performing
review work [1]. Tools are not persons, and thus cannot be reviewers in
this sense. Tools also cannot make statements and cannot take
responsibility for the review.
Our docs already clearly mark that "Reviewed-by" must come from a
person:
- "By offering my Reviewed-by: tag, I state that:"
Usage of first person "I" and word "state"
- "A Reviewed-by tag is *a statement of opinion* that the patch is an
appropriate modification of the kernel without any remaining serious"
Only a person can make a statement of opinion.
- "Any interested reviewer (who has done the work) can offer a
Reviewed-by"
A person can offer a tag, so the above does not grant a tool
permission to offer one.
However this might not be enough, so let's clarify that only a person
with a known identity can state the "Reviewer's statement of oversight".
Link: https://en.wiktionary.org/wiki/reviewer
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20260520154846.162170-2-krzysztof.kozlowski@oss.qualcomm.com>
Dan Carpenter [Thu, 21 May 2026 12:49:36 +0000 (15:49 +0300)]
net: lan966x: cleanup error handling in lan966x_fdma_rx_alloc_page_pool()
This code works, but there are a few things to tidy up:
1. No need for an unlikely() because IS_ERR() already has an unlikely()
built in.
2. No need to use PTR_ERR_OR_ZERO() because it's not an error pointer.
3. Use the returned error code directly instead of groveling in
rx->page_pool to find it.
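The error-pointer convention behind points 1 and 2 can be modelled in
userspace as below. The demo_* helpers are simplified stand-ins for
ERR_PTR(), PTR_ERR() and IS_ERR(), and demo_alloc_page_pool() is a
hypothetical function, not the lan966x code. __builtin_expect is a GCC/Clang
builtin.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *demo_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* Recover the errno value from an error pointer. */
static inline long demo_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/* The branch hint lives here, so callers need no extra unlikely(). */
static inline int demo_is_err(const void *p)
{
    return __builtin_expect(
        (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO, 0);
}

struct demo_rx {
    void *page_pool;
};

/* Hypothetical allocation step: `created` is what a create call returned. */
static int demo_alloc_page_pool(struct demo_rx *rx, void *created)
{
    if (demo_is_err(created))
        return (int)demo_ptr_err(created);  /* known to be an error here */

    rx->page_pool = created;
    return 0;
}
```

Inside the error branch the pointer is already known to be an error, so a
plain conversion is enough and an "or zero" variant adds nothing.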
octeontx2-af: validate body pcifunc in rvu_mbox_handler_rep_event_notify
rvu_mbox_handler_rep_event_notify() in drivers/net/ethernet/marvell/
octeontx2/af/rvu_rep.c queues a sender-controlled REP_EVENT_NOTIFY
request body verbatim, and rvu_rep_up_notify() then forwards
event->pcifunc (the nested body field, distinct from the
AF-normalised header pcifunc) into rvu_get_pfvf(), rvu_get_pf() and
the AF->PF mailbox device index without any bounds check.
A VF attached to a PF that has been put into switchdev
representor mode reaches this path: the VF mailbox handler
otx2_pfvf_mbox_handler() forwards every message id including
MBOX_MSG_REP_EVENT_NOTIFY to AF without an allowlist, and the AF
dispatcher rewrites only msg->pcifunc, leaving struct
rep_event::pcifunc attacker-controlled. The sibling
rvu_mbox_handler_esw_cfg() refuses requests whose header pcifunc
is not rvu->rep_pcifunc; this handler has no equivalent gate.
An out-of-range body pcifunc selects an &rvu->pf[]/&rvu->hwvf[]
element past the allocated array and, for RVU_EVENT_MAC_ADDR_CHANGE,
turns into a six-byte attacker-chosen OOB ether_addr_copy() target
inside the queued worker; KASAN reports a slab-out-of-bounds write
in rvu_rep_wq_handler.
Reject malformed requests at the handler entry by gating on
is_pf_func_valid(), which is already the canonical PF/VF range check
in this driver; expose it via rvu.h so callers in rvu_rep.c can use
it instead of open-coding the same range arithmetic.
Fixes: b8fea84a0468 ("octeontx2-pf: Add support to sync link state between representor and VFs")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Link: https://patch.msgid.link/20260520154157.1439319-1-michael.bommarito@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
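A range gate of the kind described in the octeontx2-af message can be
sketched as follows. Everything here is hypothetical: the bit layout (PF
index above a 10-bit function number), the demo_* structures and the handler
are assumptions for illustration, not the driver's is_pf_func_valid().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed pcifunc layout for this sketch. */
#define DEMO_PF_SHIFT   10
#define DEMO_PF_MASK    0x3F
#define DEMO_FUNC_MASK  0x3FF

struct demo_pf {
    unsigned int num_vfs;
};

struct demo_rvu {
    unsigned int total_pfs;
    const struct demo_pf *pf;   /* array of total_pfs entries */
};

/* True if pcifunc names an existing PF or one of its VFs. */
static bool demo_pcifunc_valid(const struct demo_rvu *rvu, uint16_t pcifunc)
{
    unsigned int pf = (pcifunc >> DEMO_PF_SHIFT) & DEMO_PF_MASK;
    unsigned int func = pcifunc & DEMO_FUNC_MASK;

    if (pf >= rvu->total_pfs)
        return false;
    /* func 0 is the PF itself; func n addresses VF n - 1. */
    return func <= rvu->pf[pf].num_vfs;
}

/* Hypothetical handler entry: validate the body field before queueing. */
static int demo_rep_event_notify(const struct demo_rvu *rvu,
                                 uint16_t body_pcifunc)
{
    if (!demo_pcifunc_valid(rvu, body_pcifunc))
        return -EINVAL;
    return 0;                   /* safe to index the PF/VF arrays later */
}
```

Rejecting the request at entry means the deferred worker never sees an
index outside the allocated arrays.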
Linus Torvalds [Mon, 25 May 2026 19:45:40 +0000 (12:45 -0700)]
Merge tag 'for-7.1/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fix from Mikulas Patocka:
- fix crashes in dm-vdo if GFP_NOWAIT allocation fails
* tag 'for-7.1/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm vdo: use GFP_NOIO for blkdev_issue_zeroout on format path
Tejun Heo [Fri, 22 May 2026 17:06:01 +0000 (07:06 -1000)]
sched_ext: Convert ops.set_cmask() to arena-resident cmask
ops_cid.set_cmask() expects a cmask. The kernel couldn't write into the
arena, so it translated cpumask -> cmask in kernel memory and passed the
result as a trusted pointer. The BPF cmask helpers all operate on arena
cmasks though, so the BPF side had to word-by-word probe-read the kernel
cmask into an arena cmask via cmask_copy_from_kernel() before any helper
could touch it. It works, but is clumsy.
With direct kernel-side arena access now in place, build the cmask in the
arena. The kernel writes to it through the kern_va side of the dual mapping.
BPF directly dereferences it via an __arena pointer like any other arena
struct.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>