git.ipfire.org Git - thirdparty/kernel/linux.git/log

]> git.ipfire.org Git - thirdparty/kernel/linux.git/log

projects / thirdparty / kernel / linux.git / log

summary | shortlog | log | commit | commitdiff | tree
first ⋅ prev ⋅ next

commit | commitdiff | tree

Sibi Sankar [Fri, 13 Mar 2026 12:08:13 +0000 (17:38 +0530)]

arm64: dts: qcom: glymur: Add ADSP and CDSP for Glymur SoC

Add remoteproc PAS loader for ADSP and CDSP with its fastrpc nodes.

Signed-off-by: Sibi Sankar <sibi.sankar@oss.qualcomm.com>
Reviewed-by: Abel Vesa <abel.vesa@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260313120814.1312410-5-sibi.sankar@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>

commit | commitdiff | tree

Guangshuo Li [Thu, 7 May 2026 10:06:03 +0000 (18:06 +0800)]

drm/bridge: imx8qxp-pxl2dpi: avoid ERR_PTR with device_node cleanup

imx8qxp_pxl2dpi_get_available_ep_from_port() returns ERR_PTR()
on errors. imx8qxp_pxl2dpi_find_next_bridge() stores its return
value in a __free(device_node) variable before checking IS_ERR().
When the function returns on the error path, the cleanup action calls
of_node_put() on the ERR_PTR() value.

Do not let a device_node cleanup variable hold error pointers. Change
imx8qxp_pxl2dpi_get_available_ep_from_port() to return an int and pass
the endpoint node through an output argument. Initialize the output
argument to NULL so callers hold either NULL on error paths or a valid
device_node pointer on successful path.

Fixes: ceea3f7806a10 ("drm/bridge: imx8qxp-pxl2dpi: simplify put of device_node pointers")
Cc: stable@vger.kernel.org
Reviewed-by: Liu Ying <victor.liu@nxp.com>
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Link: https://patch.msgid.link/20260507100604.667731-1-lgs201920130244@gmail.com
Signed-off-by: Liu Ying <victor.liu@nxp.com>

commit | commitdiff | tree

Tao Cui [Fri, 8 May 2026 12:54:12 +0000 (20:54 +0800)]

net: ethtool: fix missing closing paren in rings_reply_size()

sizeof(u32) on the _RINGS_CQE_SIZE line is missing its closing
parenthesis, causing nla_total_size() to absorb the subsequent
_TX_PUSH and _RX_PUSH entries.

The resulting size estimate happens to be numerically identical
due to NLA alignment, so not treating this as a real fix.
But the nesting is wrong and misleading.

Signed-off-by: Tao Cui <cuitao@kylinos.cn>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260508125412.189804-1-cuitao@kylinos.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Tue, 12 May 2026 01:28:11 +0000 (18:28 -0700)]

Merge branch 'net-sched-prepare-lockless-qdisc-dumps'

Eric Dumazet says:

====================
net/sched: prepare lockless qdisc dumps

Goal is to no longer acquire RTNL in qdisc dumps.

This series annotate data-races, and change mq and mq_prio to
no longer acquire children qdisc spinlocks.
====================

Link: https://patch.msgid.link/20260510091455.4039245-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Eric Dumazet [Sun, 10 May 2026 09:14:55 +0000 (09:14 +0000)]

net/sched: mq_prio: no longer acquire qdisc spinlocks in mqprio_dump_class_stats()

Prepare mqprio_dump_class_stats() for RTNL avoidance.

Use RCU instead of RTNL, and no longer acquire each children spinlock.

As a bonus we no longer have to release/acquire d->lock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://patch.msgid.link/20260510091455.4039245-9-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Eric Dumazet [Sun, 10 May 2026 09:14:54 +0000 (09:14 +0000)]

net/sched: mq_prio: no longer acquire qdisc spinlocks in mqprio_dump()

Prepare mqprio_dump() for RTNL avoidance.

Use RCU instead of RTNL, and no longer acquire each children spinlock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://patch.msgid.link/20260510091455.4039245-8-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Eric Dumazet [Sun, 10 May 2026 09:14:53 +0000 (09:14 +0000)]

net/sched: mq: no longer acquire qdisc spinlocks in dump operations

Prepare mq_dump_common() for RTNL avoidance.

Use RCU instead of RTNL, and no longer acquire each children spinlock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://patch.msgid.link/20260510091455.4039245-7-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Eric Dumazet [Sun, 10 May 2026 09:14:52 +0000 (09:14 +0000)]

net/sched: add const qualifiers to gnet_stats helpers

In preparation of lockless qdisc dumps, add const qualifiers to:

- gnet_stats_add_basic()
- gnet_stats_copy_basic()
- gnet_stats_copy_basic_hw()
- gnet_stats_copy_queue()
- gnet_stats_read_basic()
- ___gnet_stats_copy_basic()
- qdisc_qstats_copy()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://patch.msgid.link/20260510091455.4039245-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Eric Dumazet [Sun, 10 May 2026 09:14:51 +0000 (09:14 +0000)]

net/sched: add qdisc_qlen_lockless() helper

Used in contexts were qdisc spinlock is not held.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://patch.msgid.link/20260510091455.4039245-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Eric Dumazet [Sun, 10 May 2026 09:14:50 +0000 (09:14 +0000)]

net/sched: annotate data-races around sch->qstats.backlog

Add qstats_backlog_sub() and qstats_backlog_add() helpers
and use them instead of open-coding them.

These helpers use WRITE_ONCE() to prevent store-tearing.

Also use WRITE_ONCE() in fq_reset() and qdisc_reset()
when sch->qstats.backlog is cleared.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://patch.msgid.link/20260510091455.4039245-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Eric Dumazet [Sun, 10 May 2026 09:14:49 +0000 (09:14 +0000)]

net/sched: add qdisc_qlen_inc() and qdisc_qlen_dec()

Helpers to increment or decrement sch->q.qlen, with appropriate
WRITE_ONCE() to prevent store tearing.

Add other WRITE_ONCE() when sch->q.qlen is changed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://patch.msgid.link/20260510091455.4039245-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Eric Dumazet [Sun, 10 May 2026 09:14:48 +0000 (09:14 +0000)]

net/sched: add READ_ONCE() in gnet_stats_add_queue[_cpu]

Stats are read locklessly, add READ_ONCE() to prevent load-tearing.

Write side will be handled in separate patches.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://patch.msgid.link/20260510091455.4039245-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Richard Fitzgerald [Mon, 11 May 2026 11:42:39 +0000 (12:42 +0100)]

ASoC: cs35l56: Check for successful runtime-resume in cs35l56_dsp_work()

In cs35l56_dsp_work() check that the request for runtime-resume was
successful instead of assuming that it can't fail.

We may as well do this using the new PM_RUNTIME_ACQUIRE*() macros and
remove the manual pm_runtime_put_autosuspend() and associated gotos.

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://patch.msgid.link/20260511114239.44970-1-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>

commit | commitdiff | tree

Mario Limonciello [Mon, 11 May 2026 15:36:36 +0000 (10:36 -0500)]

ASoC: SOF: amd: Fix error code handling in psp_send_cmd()

The smn_read_register() helper returns negative error codes on failure
or the register value on success. When used with read_poll_timeout(),
the return value is stored in the 'data' variable.

Currently 'data' is declared as u32, which causes negative error codes
to be cast to large positive values. This makes the condition 'data > 0'
incorrectly treat errors as success.

Fix by changing 'data' from u32 to int, matching the pattern used in
psp_mbox_ready() which correctly handles the same helper function.

Reported-by: Dan Carpenter <error27@gmail.com>
Closes: https://lore.kernel.org/linux-sound/agGES8vWrLOrBu28@stanley.mountain/
Fixes: f120cf33d232 ("ASoC: SOF: amd: Use AMD_NODE")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://patch.msgid.link/20260511153638.724810-1-mario.limonciello@amd.com
Signed-off-by: Mark Brown <broonie@kernel.org>

commit | commitdiff | tree

Evgenii Burenchev [Thu, 7 May 2026 14:55:17 +0000 (17:55 +0300)]

qed: fix division by zero in qed_init_wfq_param when all vports are configured

In qed_init_wfq_param(), variable non_requested_count can become zero
when the number of vports with the configured flag set (including the
current vport being configured) equals total num_vports. This happens
when configuring the last unconfigured vport or when re-configuring
an already configured vport.

The function then calculates left_rate_per_vp = total_left_rate /
non_requested_count, which causes division by zero.

Fix this by skipping the division when non_requested_count is zero.
In that case, there is no remaining bandwidth to distribute, so just
record the configuration for the current vport and return success.

Fixes: bcd197c81f63 ("qed: Add vport WFQ configuration APIs")
Signed-off-by: Evgenii Burenchev <evg28bur@yandex.ru>
Link: https://patch.msgid.link/20260507145520.23106-1-evg28bur@yandex.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Tue, 12 May 2026 01:05:25 +0000 (18:05 -0700)]

Merge branch 'net-phy-motorcomm-add-acpi-_dsd-property-support'

chunzhi.lin says:

====================
net: phy: motorcomm: add ACPI _DSD property support

This series makes the Motorcomm PHY driver parse firmware properties via
device_property_*() so the same property set can be provided by either
Devicetree or ACPI _DSD.

Patch 1 switches drivers/net/phy/motorcomm.c from of_property_*() to
device_property_*() on &phydev->mdio.dev.

Patch 2 documents Motorcomm yt8xxx PHY ACPI _DSD properties under
Documentation/firmware-guide/acpi/dsd and links the new document from
the ACPI index.
====================

Link: https://patch.msgid.link/20260507040221.3679454-1-linchunzhi0@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

chunzhi.lin [Thu, 7 May 2026 04:02:21 +0000 (12:02 +0800)]

docs: acpi: dsd: add Motorcomm yt8xxx PHY properties

Document Motorcomm yt8xxx PHY ACPI _DSD properties consumed by
the motorcomm PHY driver.

Describe property placement, UUID usage, and reference the DT binding
for value constraints and defaults.

Signed-off-by: chunzhi.lin <linchunzhi0@gmail.com>
Changes in v2:
- Keep dsd/ entries sorted in Documentation/firmware-guide/acpi/index.rst

Link: https://patch.msgid.link/20260507040221.3679454-3-linchunzhi0@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

chunzhi.lin [Thu, 7 May 2026 04:02:20 +0000 (12:02 +0800)]

net: phy: motorcomm: use device properties for firmware tuning

The Motorcomm PHY driver reads optional firmware properties via
of_property_read_*() from phydev->mdio.dev.of_node. This works for
Device Tree based systems, but causes ACPI platforms to ignore the same
properties when they are supplied through _DSD.

As a result, ACPI-described Motorcomm PHY devices fall back to default
settings instead of applying firmware-provided tuning such as
rx/tx internal delay, drive strength, clock output frequency, and
optional boolean controls like auto-sleep-disabled,
keep-pll-enabled, and tx clock inversion.

Switch these lookups to device_property_read_*() so the driver uses the
generic firmware node interface and can consume the same property names
from either Device Tree or ACPI.

This keeps the existing DT behavior unchanged while allowing ACPI
platforms to honor PHY configuration from firmware.

We have completed testing on Sophgo RISC-V architecture server SD3-10.
This server has a 64-core Thead C920 CPU whose DWMAC is connected to
Motorcomm's PHY YT8531. This server supports UEFI boot and it would like
to use the ACPI table.

Signed-off-by: chunzhi.lin <linchunzhi0@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260507040221.3679454-2-linchunzhi0@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Taniya Das [Fri, 10 Apr 2026 03:49:04 +0000 (09:19 +0530)]

arm64: dts: qcom: Add support for MM clock controllers for Glymur

Add the device nodes for the multimedia clock controllers videocc, gpucc
and gxclkctl.

Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Taniya Das <taniya.das@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260410-glymur_mmcc_dt_config_v2-v3-1-acce9d106e72@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>

commit | commitdiff | tree

Davide Caratti [Fri, 8 May 2026 17:05:10 +0000 (19:05 +0200)]

net/sched: dualpi2: initialize timer earlier in dualpi2_init()

'pi2_timer' needs to be initialized in all error paths of dualpi2_init():
otherwise, a failure in qdisc_create_dflt() causes the following crash in
dualpi2_destroy():

  # tc qdisc add dev crash0 handle 1: root dualpi2
  BUG: kernel NULL pointer dereference, address: 0000000000000010
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: Oops: 0000 [#1] SMP PTI
  CPU: 4 UID: 0 PID: 471 Comm: tc Tainted: G            E       7.1.0-rc1-virtme #2 PREEMPT(full)
  Tainted: [E]=UNSIGNED_MODULE
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  RIP: 0010:hrtimer_active+0x39/0x60
  Code: f9 eb 23 0f b6 41 38 3c 01 0f 87 87 64 c0 ff 83 e0 01 75 33 48 39 4a 50 74 28 44 3b 42 10 75 06 48 3b 51 30 74 21 48 8b 51 30 <44> 8b 42 10 41 f6 c0 01 74 cf f3 90 44 8b 42 10 41 f6 c0 01 74 c3
  RSP: 0018:ffffd0db80b93620 EFLAGS: 00010282
  RAX: ffffffffc0400320 RBX: ffff8cf24a4c86b8 RCX: ffff8cf24a4c86b8
  RDX: 0000000000000000 RSI: ffff8cf2429c2ab0 RDI: ffff8cf24a4c86b8
  RBP: 00000000fffffff4 R08: 0000000000000003 R09: 0000000000000000
  R10: 0000000000000001 R11: ffff8cf24a39c500 R12: ffff8cf24822c000
  R13: ffffd0db80b936c0 R14: ffffffffc02cf360 R15: 00000000ffffffff
  FS:  00007fbc01706580(0000) GS:ffff8cf2dc759000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000010 CR3: 0000000008e02003 CR4: 0000000000172ef0
  Call Trace:
   <TASK>
   hrtimer_cancel+0x15/0x40
   dualpi2_destroy+0x20/0x40 [sch_dualpi2]
   qdisc_create+0x230/0x570
   tc_modify_qdisc+0x716/0xc10
   rtnetlink_rcv_msg+0x188/0x780
   netlink_rcv_skb+0xcd/0x150
   netlink_unicast+0x1ba/0x290
   netlink_sendmsg+0x242/0x4d0
   ____sys_sendmsg+0x39e/0x3e0
   ___sys_sendmsg+0xe1/0x130
   __sys_sendmsg+0xad/0x110
   do_syscall_64+0x14f/0xf80
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
  RIP: 0033:0x7fbc0188b08e
  Code: 4d 89 d8 e8 94 bd 00 00 4c 8b 5d f8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 11 c9 c3 0f 1f 80 00 00 00 00 48 8b 45 10 0f 05 <c9> c3 83 e2 39 83 fa 08 75 e7 e8 03 ff ff ff 0f 1f 00 f3 0f 1e fa
  RSP: 002b:00007fff593260e0 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbc0188b08e
  RDX: 0000000000000000 RSI: 00007fff59326190 RDI: 0000000000000003
  RBP: 00007fff593260f0 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000202 R12: 000055f06124f260
  R13: 0000000069fca043 R14: 000055f061255640 R15: 000055f06124d3f8
   </TASK>
  Modules linked in: sch_dualpi2(E)
  CR2: 0000000000000010

[1] https://lore.kernel.org/netdev/2e78e01c504c633ebdff18d041833cf2e079a3a4.1607020450.git.dcaratti@redhat.com/
[2] https://lore.kernel.org/netdev/20200725201707.16909-1-xiyou.wangcong@gmail.com/

v2:
- rebased on top of latest net.git

Fixes: 320d031ad6e4 ("sched: Struct definition and parsing of dualpi2 qdisc")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/1faca91179702b31da5d87653e1e036543e32722.1778259798.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Konrad Dybcio [Fri, 10 Apr 2026 10:05:13 +0000 (12:05 +0200)]

soc: qcom: socinfo: Add PMICs that ship with Glymur

Add the missing REVID_PERPH_SUBTYPE<->name mappings for PMICs that ship
with Glymur.

Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260410-topic-glymur_pmics-v1-1-26bdbb577000@oss.qualcomm.com
[bjorn: Dropped 92,93,96,97,98 as these have already been applied]
Signed-off-by: Bjorn Andersson <andersson@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Tue, 12 May 2026 01:01:39 +0000 (18:01 -0700)]

Merge branch 'mptcp-pm-in-kernel-increase-limits'

Matthieu Baerts says:

====================
mptcp: pm: in-kernel: increase limits

Allow switching from 8 to 64 for the maximum number of subflows and
accepted ADD_ADDR, and from 8 to 255 for the number of MPTCP endpoints.

The previous limit of 8 subflows makes sense in most cases. Using more
subflows will very likely *not* improve the situation, and could even
decrease the performances. But there are no technical limitations nor
performance impact to raise this limit, so let's do it: this will allow
people with very specific use-cases, and researchers to easily create
more subflows, and measure the performance impact by themselves.

- Patches 1-2: increase subflows and accepted ADD_ADDR limits.

- Patches 3-4: increase endpoints limit.

- Patches 5-7: validate the new limits: 64 subflows, 255 endpoints.

- Patch 8: selftests: use send()/recv() instead of sendto()/recvfrom().
====================

Link: https://patch.msgid.link/20260508-net-next-mptcp-pm-inc-limits-v1-0-c84e3fdf9b6a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Matthieu Baerts (NGI0) [Fri, 8 May 2026 15:40:53 +0000 (17:40 +0200)]

selftests: mptcp: pm: use simpler send/recv forms

Instead of sendto() and recvfrom() which the NL address that was already
provided before.

Just simpler and easier to read without the to/from variants.

While at it, fix a checkpatch warning by removing multiple assignments.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260508-net-next-mptcp-pm-inc-limits-v1-8-c84e3fdf9b6a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Matthieu Baerts (NGI0) [Fri, 8 May 2026 15:40:52 +0000 (17:40 +0200)]

selftests: mptcp: pm: validate new limits

These limits have been recently updated, from 8 to:

- 64 for the subflows and accepted add_addr

- 255 for the MPTCP endpoints

These modifications validate the new limits, but are also compatible
with the previous ones, to be able to continue to validate stable kernel
using the last version of the selftests. That's why new variables are
now used instead of hard-coded values.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260508-net-next-mptcp-pm-inc-limits-v1-7-c84e3fdf9b6a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Matthieu Baerts (NGI0) [Fri, 8 May 2026 15:40:51 +0000 (17:40 +0200)]

selftests: mptcp: join: validate 8x8 subflows

The limits have been recently increased, it is required to validate that
having 64 subflows is allowed.

Here, both the client and the server have 8 network interfaces. The
server has 8 endpoints marked as 'signal' to announce all its v4
addresses. The client also has 8 endpoints, but marked as 'subflow' and
'fullmesh' in order to create 8 subflows to each address announced by
the server. This means 63 additional subflows will be created after the
initial one.

If it is not possible to increase the limits to 64, it means an older
kernel version is being used, and the test is skipped.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260508-net-next-mptcp-pm-inc-limits-v1-6-c84e3fdf9b6a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Matthieu Baerts (NGI0) [Fri, 8 May 2026 15:40:50 +0000 (17:40 +0200)]

selftests: mptcp: join: allow changing ifaces nr per test

By default, 4 network interfaces are created per subtest in a dedicated
net namespace. Each netns has a dedicated pair of v4 and v6 addresses.
Future tests will need more.

Simply always creating more network interfaces per test will increase
the execution time for all other tests, for no other benefits. So now it
is possible to change this number only when needed, by setting ifaces_nr
when calling 'reset' and 'init_shapers', e.g.

ifaces_nr=8 reset "Subtest title"
ifaces_nr=8 init_shapers

Note that it might also be interesting to decrease the default value to
2 to reduce the setup time, especially when a debug kernel config is
being used.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260508-net-next-mptcp-pm-inc-limits-v1-5-c84e3fdf9b6a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Matthieu Baerts (NGI0) [Fri, 8 May 2026 15:40:49 +0000 (17:40 +0200)]

mptcp: pm: in-kernel: increase endpoints limit

The endpoints are managed in a list which was limited to 8 entries.

This limit can be too small in some cases: by having the same limit as
the number of subflows, it might not allow creating all expected
subflows when having a mix of v4 and v6 addresses that can all use MPTCP
on v4/v6 only networks.

While increasing the limit above the new subflows one, why not using the
technical limit: 255. Indeed, the endpoint will each have an ID that
will be used on the wire, limited to u8, and the ID 0 is reserved to the
initial subflow.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260508-net-next-mptcp-pm-inc-limits-v1-4-c84e3fdf9b6a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Matthieu Baerts (NGI0) [Fri, 8 May 2026 15:40:48 +0000 (17:40 +0200)]

mptcp: pm: kernel: allow flushing more than 8 endpoints

The mptcp_rm_list structure contains an array of IDs of 8 entries: to be
able to send a RM_ADDR with 8 IDs. This limitation was OK so far because
there could maximum 8 endpoints.

But this is going to change in the next commit. To cope with that, if
one of the arrays is full, the iteration stops, the lists are processed,
then the iteration continues where it previously stopped.

Note that if there are many endpoints to remove, and multiple RM_ADDR to
send, it might be more likely that some of these RM_ADDRs are dropped or
lost. This is a known limitation: RM_ADDR are not retransmitted in
MPTCPv1.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260508-net-next-mptcp-pm-inc-limits-v1-3-c84e3fdf9b6a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Matthieu Baerts (NGI0) [Fri, 8 May 2026 15:40:47 +0000 (17:40 +0200)]

mptcp: pm: in-kernel: increase all limits to 64

This means switching the maximum from 8 to 64 for the number of subflows
and accepted ADD_ADDR.

The previous limit of 8 subflows makes sense in most cases. Using more
subflows will very likely *not* improve the situation, and could even
decrease the performances. But there are no technical limitations nor
performance impact to raise this limit, so let's do it: this will allow
people with very specific use-cases, and researchers to easily create
more subflows, and measure the performance impact by themselves.

The theoretical limit is 255 -- the ID is written in a u8 on the wire --
but 64 is more than enough. With so many subflows, it will be costly to
iterate over all of them when operations are done in bottom half.

Note that the in-kernel PM will continue to create subflows in reply to
ADD_ADDR with a single batch of maximum 8 subflows. Same when adding new
"subflow" endpoints with the fullmesh flag. Increasing those batch
limits would have a memory impact, and it looks fine not to cover these
cases with larger batches for the moment. If more is needed later, the
position of the last subflow from the list could be remembered, and the
list iteration could continue later.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/434
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260508-net-next-mptcp-pm-inc-limits-v1-2-c84e3fdf9b6a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Matthieu Baerts (NGI0) [Fri, 8 May 2026 15:40:46 +0000 (17:40 +0200)]

mptcp: pm: in-kernel: explicitly limit batches to array size

The in-kernel PM can create subflows in reply to ADD_ADDR by batch of
maximum 8 subflows for the moment. Same when adding new "subflow"
endpoints with the fullmesh flag. This limit is linked to the arrays
used during these steps.

There was no explicit limit to the arrays size (8), because the limit of
extra subflows is the same (8). It seems safer to use an explicit limit,
but also these two sizes are going to be different in the next commit.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260508-net-next-mptcp-pm-inc-limits-v1-1-c84e3fdf9b6a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Abel Vesa [Tue, 14 Apr 2026 17:05:51 +0000 (20:05 +0300)]

arm64: dts: qcom: glymur: Drop RPMh CXO clocks from QMP PHYs

On Glymur, all QMP PHYs except the one used by USB SS0 take their
reference clock from the TCSR clock controller. Since these TCSR clocks
already derive from RPMH_CXO_CLK as their sole parent, there is no need
to provide an extra `clkref` clock to the PHY nodes.

Drop the extra RPMh CXO clock inputs and use the TCSR clocks as the PHY
reference clocks instead.

This also fixes the devicetree schema validation, as the bindings do not
allow a separate `clkref` clock.

Fixes: 4eee57dd4df9 ("arm64: dts: qcom: glymur: Add USB related nodes")
Reported-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Reported-by: Rob Herring <robh@kernel.org>
Closes: https://lore.kernel.org/r/20260410145205.GA554754-robh@kernel.org/
Signed-off-by: Abel Vesa <abel.vesa@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260414-dts-glymur-drop-rpmh-cxo-clk-from-qmpphys-v1-1-ab12d77c4aec@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>

commit | commitdiff | tree

Abel Vesa [Wed, 15 Apr 2026 07:52:56 +0000 (10:52 +0300)]

arm64: dts: qcom: glymur: Mark USB SS1 and SS2 as role-switch capable

Like USB SS0, the USB SS1 and SS2 controllers on Glymur also support
USB role switching.

Describe this by adding the 'usb-role-switch' property to both controllers.

Fixes: 4eee57dd4df9 ("arm64: dts: qcom: glymur: Add USB related nodes")
Signed-off-by: Abel Vesa <abel.vesa@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260415-dts-qcom-glymur-usb-role-switch-fix-v1-1-409e1a257f1f@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>

commit | commitdiff | tree

Abel Vesa [Wed, 15 Apr 2026 07:52:57 +0000 (10:52 +0300)]

arm64: dts: qcom: glymur-crd: Drop forced host mode for USB SS0 and SS1

The two USB Type-C ports on Glymur CRD are dual-role capable.

Do not force their controllers into host mode. Drop the explicit
'dr_mode = "host"' properties so they can use their default OTG mode
instead.

Fixes: c8b63029455b ("arm64: dts: qcom: glymur-crd: Enable USB support")
Signed-off-by: Abel Vesa <abel.vesa@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260415-dts-qcom-glymur-usb-role-switch-fix-v1-2-409e1a257f1f@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>

commit | commitdiff | tree

Kuniyuki Iwashima [Fri, 8 May 2026 12:08:46 +0000 (12:08 +0000)]

tcp: Fix out-of-bounds access for twsk in tcp_ao_established_key().

lockdep_sock_is_held() was added in tcp_ao_established_key()
by the cited commit.

It can be called from tcp_v[46]_timewait_ack() with twsk.

Since it does not have sk->sk_lock, the lockdep annotation
results in out-of-bound access.

  $ pahole -C tcp_timewait_sock vmlinux | grep size
   /* size: 288, cachelines: 5, members: 8 */
  $ pahole -C sock vmlinux | grep sk_lock
   socket_lock_t              sk_lock;              /*   440   192 */

Let's not use lockdep_sock_is_held() for TCP_TIME_WAIT.

Fixes: 6b2d11e2d8fc ("net/tcp: Add missing lockdep annotations for TCP-AO hlist traversals")
Reported-by: Damiano Melotti <melotti@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260508120853.4098365-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Breno Leitao [Fri, 8 May 2026 13:55:04 +0000 (06:55 -0700)]

tools/bootconfig: render kernel.* subtree as cmdline string with -C

Add a -C option that finds the "kernel" subtree of a bootconfig file
and prints it as a flat, space-separated cmdline string by calling the
shared xbc_snprint_cmdline() renderer. An empty or absent kernel.*
subtree produces empty output and exits successfully.

This lets the kernel build embed a bootconfig file as a plain cmdline
string at build time, so embedded bootconfig values can reach
parse_early_param() during architecture setup without parsing the
bootconfig at runtime.

The renderer is intentionally limited to the kernel.* subtree: that is
the only thing the kernel build needs to embed; init.* and other
subtrees keep going through the runtime parser.

Example of this new mode:
# cat /tmp/test.bconf
kernel {
foo = bar
baz = "hello world"
arr = 1, 2
}
init.foo = nope

# ./tools/bootconfig/bootconfig -C /tmp/test.bconf
foo=bar baz="hello world" arr=1 arr=2 %

Link: https://lore.kernel.org/all/20260508-bootconfig_using_tools-v1-2-1132219aa773@debian.org/
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

commit | commitdiff | tree

Breno Leitao [Fri, 8 May 2026 13:55:03 +0000 (06:55 -0700)]

bootconfig: move xbc_snprint_cmdline() to lib/bootconfig.c

Move xbc_snprint_cmdline() from init/main.c to lib/bootconfig.c so the
function (and its xbc_namebuf scratch buffer) becomes part of the shared
parser library. tools/bootconfig already compiles lib/bootconfig.c
directly, which lets a follow-up patch reuse the same renderer in the
userspace tool to convert a bootconfig file into a flat cmdline string
at build time.

No functional change.

Link: https://lore.kernel.org/all/20260508-bootconfig_using_tools-v1-1-1132219aa773@debian.org/
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

commit | commitdiff | tree

David Yang [Thu, 7 May 2026 21:40:51 +0000 (05:40 +0800)]

net: mention the convention for .ndo_setup_tc()

qdisc_offload_dump_helper(), originated from commit 602f3baf2218
("net_sch: red: Add offload ability to RED qdisc"), is designed to that

    Whether RED is being offloaded is being determined every time dump
    action is being called because parent change of this qdisc could
    change its offload state but doesn't require any RED function to be
    called.

and returning -EOPNOTSUPP (for dump queries) does not mean "I don't have
any statistics", but "I don't offload this qdisc anymore". At least two
existing drivers did it wrong, so it is worth mentioning.

Signed-off-by: David Yang <mmyangfl@gmail.com>
Link: https://patch.msgid.link/20260507214054.2539790-1-mmyangfl@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Arthur Kiyanovski [Thu, 7 May 2026 00:35:15 +0000 (00:35 +0000)]

net: ena: PHC: Check return code before setting timestamp output

ena_phc_gettimex64() is setting the output parameter regardless
of whether ena_com_phc_get_timestamp() succeeded or failed.

When ena_com_phc_get_timestamp() returns an error, the timestamp
parameter may contain uninitialized stack memory (e.g., when PHC is
disabled or in blocked state) or invalid hardware values. Passing
these to userspace via the PTP ioctl is both a security issue
(information leak) and a correctness bug.

Fix by checking the return code after releasing the lock and only
setting the output timestamp on success.

Fixes: e0ea34158ee8 ("net: ena: Add PHC support in the ENA driver")
Cc: stable@vger.kernel.org
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20260507003518.22554-1-akiyano@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Allison Henderson [Tue, 5 May 2026 23:43:36 +0000 (16:43 -0700)]

net/rds: reset op_nents when zerocopy page pin fails

When iov_iter_get_pages2() fails in rds_message_zcopy_from_user(),
the pinned pages are released with put_page(), and
rm->data.op_mmp_znotifier is cleared. But we fail to properly
clear rm->data.op_nents.

Later when rds_message_purge() is called from rds_sendmsg() the
cleanup loop iterates over the incorrectly non zero number of
op_nents and frees them again.

Fix this by properly resetting op_nents when it should be in
rds_message_zcopy_from_user().

Fixes: 0cebaccef3ac ("rds: zerocopy Tx support.")
Signed-off-by: Allison Henderson <achender@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260505234336.2132721-1-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Kathiravan Thirumoorthy [Mon, 11 May 2026 16:54:54 +0000 (22:24 +0530)]

arm64: dts: qcom: ipq9650: add watchdog node

Add the watchdog device node for IPQ9650 SoC.

Signed-off-by: Kathiravan Thirumoorthy <kathiravan.thirumoorthy@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260511-ipq9650_wdt-v1-1-1948934c1e12@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>

commit | commitdiff | tree

Kathiravan Thirumoorthy [Thu, 7 May 2026 17:08:30 +0000 (22:38 +0530)]

arm64: dts: qcom: add IPQ9650 SoC and rdp488 board support

Add initial device tree support for the Qualcomm IPQ9650 SoC and
rdp488 board.

Signed-off-by: Kathiravan Thirumoorthy <kathiravan.thirumoorthy@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260507-ipq9650_boot_to_shell-v3-4-62742b49c991@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>

commit | commitdiff | tree

Kathiravan Thirumoorthy [Thu, 7 May 2026 17:08:29 +0000 (22:38 +0530)]

dt-bindings: qcom: add IPQ9650 boards

Document the new IPQ9650 SoC/board device tree bindings.

Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Kathiravan Thirumoorthy <kathiravan.thirumoorthy@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260507-ipq9650_boot_to_shell-v3-3-62742b49c991@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>

commit | commitdiff | tree

Bjorn Andersson [Mon, 11 May 2026 23:14:18 +0000 (18:14 -0500)]

Merge branch '20260507-ipq9650_boot_to_shell-v3-1-62742b49c991@oss.qualcomm.com' into arm64-for-7.2

Merge the QCS9650 GCC DeviceTree binding from topic branch, to get
access to clock and reset constants.

commit | commitdiff | tree

Kathiravan Thirumoorthy [Thu, 7 May 2026 17:08:27 +0000 (22:38 +0530)]

dt-bindings: clock: add Qualcomm IPQ9650 GCC

Add binding for the Qualcomm IPQ9650 Global Clock Controller.

Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Kathiravan Thirumoorthy <kathiravan.thirumoorthy@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260507-ipq9650_boot_to_shell-v3-1-62742b49c991@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>

commit | commitdiff | tree

Florian Eckert [Fri, 17 Apr 2026 08:35:45 +0000 (10:35 +0200)]

MAINTAINERS: Remove Chuanhua Lei as PCIe intel-gw maintainer

Chuanhua Lei's email address has been bouncing for months. Remove the entry
and mark the PCI intel-gw driver as orphaned.

Signed-off-by: Florian Eckert <fe@dev.tdt.de>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260417-pcie-intel-gw-v5-1-0a2b933fe04f@dev.tdt.de

commit | commitdiff | tree

Ihor Solodrai [Sat, 9 May 2026 00:57:30 +0000 (17:57 -0700)]

selftests/bpf: Use both hrtimer enqueue helpers in vmlinux test

The vmlinux selftest triggers nanosleep and checks that both kprobe
and fentry programs observe the hrtimer enqueue path.

After the hrtimer_start_expires_user() conversion [1], nanosleep
reaches hrtimer_start_range_ns_user() instead of
hrtimer_start_range_ns(). Hard-coding either symbol makes the test
fail either on bpf tree or on linux-next [2].

Update the test to resolve the target symbol at runtime via
libbpf_find_vmlinux_btf_id(). This is a nice example of how to modify
a BPF program to work on both older and newer kernel revision.

[1] https://lore.kernel.org/all/20260408114952.062400833@kernel.org/
[2] https://github.com/kernel-patches/bpf/actions/runs/25485909958/job/74782902203

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260509005730.250956-1-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Viacheslav Dubeyko [Tue, 5 May 2026 22:00:52 +0000 (15:00 -0700)]

hfsplus: rework hfsplus_readdir() logic

The xfstests' test-case generic/637 fails with error:

FSTYP -- hfsplus
PLATFORM -- Linux/x86_64 hfsplus-testing-0001 6.15.0-rc4+ #8 SMP PREEMPT_DYNAMIC Thu May 1 16:43:22 PDT 2025
MKFS_OPTIONS -- /dev/loop51
MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch

QA output created by 637
entries 7 and 8 have duplicate d_off 8
Found unlinked files in open dir (see xfstests-dev/results//generic/637.full for details)

Debugging of the hfsplus_readdir() logic showed this:

hfsplus: hfsplus_readdir(): 163 ctx->pos 0
hfsplus: hfsplus_readdir(): 189 ctx->pos 1
hfsplus: hfsplus_readdir(): 264 ctx->pos 2, ino 18
hfsplus: hfsplus_readdir(): 264 ctx->pos 3, ino 19
hfsplus: hfsplus_readdir(): 264 ctx->pos 4, ino 28
hfsplus: hfsplus_readdir(): 264 ctx->pos 5, ino 118
hfsplus: hfsplus_readdir(): 264 ctx->pos 6, ino 29
hfsplus: hfsplus_readdir(): 264 ctx->pos 7, ino 30
hfsplus: hfsplus_readdir(): 264 ctx->pos 8, ino 31
hfsplus: hfsplus_readdir(): 304 ctx->pos 8
hfsplus: hfsplus_unlink():420 dir->i_ino 17, inode->i_ino 28
hfsplus: hfsplus_readdir(): 141 ctx->pos 7
hfsplus: hfsplus_readdir(): 264 ctx->pos 7, ino 31
hfsplus: hfsplus_readdir(): 264 ctx->pos 8, ino 32
hfsplus: hfsplus_readdir(): 264 ctx->pos 9, ino 33

It means that hfsplus_readdir() stopped the processing of
folder's items on ctx->pos 8, then, item with ino 28 has
been deleted and hfsplus_readdir() re-started the logic
from ctx->pos 7. As a result, previous and new sets of
folder's items have overlapping values for the case of
d_off 8.

Currently, HFS+ has very complicated and fragile logic
of rd->file->f_pos correction in hfsplus_delete_cat().
This patch removes this logic and it stores the current
pos into hfsplus_readdir_data. Finally, if rd->pos == ctx->pos
then hfsplus_readdir() tries to find the position in
b-tree's node by means of hfsplus_cat_key. This position is
used to re-start the folder's content traversal.

sudo ./check generic/637
FSTYP         -- hfsplus
PLATFORM      -- Linux/x86_64 hfsplus-testing-0001 7.1.0-rc1+ #44 SMP PREEMPT_DYNAMIC Mon May  4 15:58:45 PDT 2026
MKFS_OPTIONS  -- /dev/loop51
MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch

generic/637  22s ...  22s
Ran: generic/637
Passed all 1 tests

Closes: https://github.com/hfs-linux-kernel/hfs-linux-kernel/issues/198
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Link: https://lore.kernel.org/r/20260505220051.2854696-2-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>

commit | commitdiff | tree

Johannes Thumshirn [Wed, 29 Apr 2026 20:58:15 +0000 (22:58 +0200)]

zonefs: handle integer overflow in zonefs_fname_to_fno

In zonefs the file name in one of the two directories corresponds to the
zone number.

Here Alexey reported a possible integer overflow in zonefs_fname_to_fno(),
where the parsing of the zone number from the file name can overflow the
'long' data type.

Add a check for integer overflows and if the fno 'long' did overflow
return -ENOENT.

Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
Fixes: d207794ababe ("zonefs: Dynamically create file inodes when needed")
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>

commit | commitdiff | tree

Tejun Heo [Mon, 11 May 2026 22:43:39 +0000 (12:43 -1000)]

Merge branch 'for-7.1-fixes' into for-7.2

Pull to receive:

9a415cc53711 ("sched_ext: Avoid UAF in scx_root_enable_workfn() init failure path")

Conflicts with for-7.2's scx_task_iter_relock() rework. The fix moves
put_task_struct(p) past scx_error(); for-7.2 still has it at the old
position. Resolved by dropping the old one.

Signed-off-by: Tejun Heo <tj@kernel.org>

commit | commitdiff | tree

Linus Torvalds [Mon, 11 May 2026 22:38:49 +0000 (15:38 -0700)]

Merge tag 'linux_kselftest-kunit-fixes-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kunit fixes from Shuah Khan:
"Fix to decouple KUNIT_DEBUGFS and KUNIT_ALL_TESTS options and fix
  KUNIT_DEBUGFS dependencies so it depends on DEBUG_FS without which it
  will not be useful"

* tag 'linux_kselftest-kunit-fixes-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kunit: config: KUNIT_DEBUGFS should depend on DEBUG_FS
  kunit: config: Enable KUNIT_DEBUGFS by default

commit | commitdiff | tree

Alexei Starovoitov [Mon, 11 May 2026 22:25:24 +0000 (15:25 -0700)]

Merge branch 'selftests-bpf-add-xdp-load-balancer-benchmark'

Puranjay Mohan says:

====================
selftests/bpf: Add XDP load-balancer benchmark

Changelog:
RFC: https://lore.kernel.org/all/20260420111726.2118636-1-puranjay@kernel.org/
Changes in v1:
- Replace bpf_get_cpu_time_counter() with bpf_ktime_get_ns()
- Replace bpf_repeat() with plain for loop and may_goto
- Refactor collect_measurements() to reuse bench_force_done()
- Remove histogram, verbose calibration output, and per-scenario status prints
- Trim run script table to p50/stddev/p99
- Set env.quiet when --machine-readable is passed
- Add || true to run script benchmark invocation for set -e safety
- Add bpf-nop benchmark as timing overhead baseline (patch 3)
- Use named struct for LRU inner map to fix build on older toolchains

This series adds an XDP load-balancer benchmark (based on Katran) to the BPF
selftest bench framework.

Motivation
----------

Existing BPF bench tests measure individual operations (map lookups,
kprobes, ring buffers) in isolation.  Production BPF programs combine
parsing, map lookups, branching, and packet rewriting in a single call
chain.  The performance characteristics of such programs depend on the
interaction of these operations -- register pressure, spills, inlining
decisions, branch layout -- which isolated micro-benchmarks do not
capture.

This benchmark implements a simplified L4 load-balancer modeled after
katran [1].  The BPF program reproduces katran's core datapath:

  L3/L4 parsing -> VIP hash lookup -> per-CPU LRU connection table
  with consistent-hash fallback -> real server selection -> per-VIP
  and per-real stats -> IPIP/IP6IP6 encapsulation

The BPF code exercises hash maps, array-of-maps (per-CPU LRU),
percpu arrays, jhash, bpf_xdp_adjust_head(), bpf_ktime_get_ns(),
and bpf_get_smp_processor_id() in a single pipeline.

This is intended as the first in a series of BPF workload benchmarks
covering other use cases (sched_ext, etc.).

Design
------

A userspace loop calling bpf_prog_test_run_opts(repeat=1) would
measure syscall overhead, not BPF program cost -- the ~4 ns early-exit
paths would be buried under kernel entry/exit.  Using repeat=N is
also unsuitable: the kernel re-runs the same packet without resetting
state between iterations, so the second iteration of an encap scenario
would process an already-encapsulated packet.

Instead, timing is measured inside the BPF program using
bpf_ktime_get_ns().  BENCH_BPF_LOOP() brackets N iterations with
timestamp reads using a plain for loop with may_goto, runs a
caller-supplied reset block between iterations to undo side effects
(e.g. strip encapsulation), and records the elapsed time per batch.
One extra untimed iteration runs afterward for output validation.

Auto-calibration picks a batch size targeting ~10 ms per invocation.
A proportionality sanity check verifies that 2N iterations take ~2x
as long as N.

24 scenarios cover the code-path matrix:

  - Protocol: TCP, UDP
  - Address family: IPv4, IPv6, cross-AF (IPv4-in-IPv6)
  - LRU state: hit, miss (16M flow space), diverse (4K flows), cold
  - Consistent-hash: direct (LRU bypass)
  - TCP flags: SYN (skip LRU, force CH), RST (skip LRU insert)
  - Early exits: unknown VIP, non-IP, ICMP, fragments, IP options

Each scenario validates correctness before benchmarking by comparing
the output packet byte-for-byte against a pre-built expected packet
and checking BPF map counters.

Sample single-scenario output:

  $ sudo ./bench xdp-lb --scenario tcp-v4-lru-hit
  Setting up benchmark 'xdp-lb'...
  Benchmark 'xdp-lb' started.
  tcp-v4-lru-hit: median 74.51 ns/op, stddev 0.11, p99 74.81 (202 samples)

Sample run script output:

  $ ./benchs/run_bench_xdp_lb.sh

  XDP load-balancer benchmark
  ===========================
  +----------------------------------+----------+---------+----------+
  | Single-flow baseline             |      p50 |  stddev |      p99 |
  +----------------------------------+----------+---------+----------+
  | tcp-v4-lru-hit                   |    74.30 |    0.08 |    74.48 |
  | tcp-v4-ch                        |   101.73 |    0.11 |   102.01 |
  | tcp-v6-lru-hit                   |    76.77 |    0.14 |    77.04 |
  | tcp-v6-ch                        |   121.40 |    0.10 |   121.65 |
  | udp-v4-lru-hit                   |   107.42 |    0.22 |   107.90 |
  | udp-v6-lru-hit                   |   110.21 |    0.12 |   110.45 |
  | tcp-v4v6-lru-hit                 |    74.82 |    0.35 |    75.43 |
  +----------------------------------+----------+---------+----------+
  | Diverse flows (4K src addrs)     |      p50 |  stddev |      p99 |
  +----------------------------------+----------+---------+----------+
  | tcp-v4-lru-diverse               |    86.63 |    0.37 |    89.04 |
  | tcp-v4-ch-diverse                |   104.09 |    0.19 |   105.67 |
  | tcp-v6-lru-diverse               |    89.34 |    0.42 |    90.70 |
  | tcp-v6-ch-diverse                |   122.20 |    0.21 |   123.78 |
  | udp-v4-lru-diverse               |   119.37 |    0.58 |   123.10 |
  +----------------------------------+----------+---------+----------+
  | TCP flags                        |      p50 |  stddev |      p99 |
  +----------------------------------+----------+---------+----------+
  | tcp-v4-syn                       |   165.52 |   15.68 |   198.34 |
  | tcp-v4-rst-miss                  |   161.34 |    2.69 |   172.64 |
  +----------------------------------+----------+---------+----------+
  | LRU stress                       |      p50 |  stddev |      p99 |
  +----------------------------------+----------+---------+----------+
  | tcp-v4-lru-miss                  |   440.39 |   35.75 |   550.62 |
  | udp-v4-lru-miss                  |   571.88 |   57.38 |   680.61 |
  | tcp-v4-lru-warmup                |   317.75 |    9.55 |   356.20 |
  +----------------------------------+----------+---------+----------+
  | Early exits                      |      p50 |  stddev |      p99 |
  +----------------------------------+----------+---------+----------+
  | pass-v4-no-vip                   |    18.26 |    0.13 |    18.66 |
  | pass-v6-no-vip                   |    19.08 |    0.01 |    19.10 |
  | pass-v4-icmp                     |     6.81 |    0.02 |     6.86 |
  | pass-non-ip                      |     5.71 |    0.03 |     5.76 |
  | drop-v4-frag                     |     6.09 |    0.01 |     6.10 |
  | drop-v4-options                  |     5.88 |    0.00 |     5.89 |
  | drop-v6-frag                     |     6.00 |    0.03 |     6.04 |
  +----------------------------------+----------+---------+----------+

Patches
-------

Patch 1 adds bench_force_done() to the bench framework so benchmarks
can signal early completion when enough samples have been collected.

Patch 2 adds the shared BPF batch-timing library (BPF-side timing
arrays, BENCH_BPF_LOOP macro, userspace statistics and calibration).

Patch 3 adds a bpf-nop benchmark as a timing overhead baseline and
usage example for the timing library.

Patch 4 adds the common header shared between the BPF program and
userspace (flow_key, vip_definition, real_definition, encap helpers).

Patch 5 adds the XDP load-balancer BPF program.

Patch 6 adds the userspace benchmark driver with 24 scenarios,
packet construction, validation, and bench framework integration.

Patch 7 adds the run script for running all scenarios.

[1] https://github.com/facebookincubator/katran
====================

Link: https://patch.msgid.link/20260427232313.1582588-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Puranjay Mohan [Mon, 27 Apr 2026 23:23:04 +0000 (16:23 -0700)]

selftests/bpf: Add XDP load-balancer benchmark run script

Add a convenience script that runs all 24 XDP load-balancer scenarios
and formats the results as a table with median, stddev, and p99
columns.

./benchs/run_bench_xdp_lb.sh

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260427232313.1582588-8-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Puranjay Mohan [Mon, 27 Apr 2026 23:23:03 +0000 (16:23 -0700)]

selftests/bpf: Add XDP load-balancer benchmark driver

Wire up the userspace side of the XDP load-balancer benchmark.

24 scenarios cover the full code-path matrix: TCP/UDP, IPv4/IPv6,
cross-AF encap, LRU hit/miss/diverse/cold, consistent-hash bypass,
SYN/RST flag handling, and early exits (unknown VIP, non-IP, ICMP,
fragments, IP options).

Before benchmarking each scenario validates correctness: the output
packet is compared byte-for-byte against a pre-built expected packet
and BPF map counters are checked against the expected values.

Usage:
sudo ./bench -a -w3 -p1 xdp-lb --scenario tcp-v4-lru-hit
sudo ./bench xdp-lb --list-scenarios

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260427232313.1582588-7-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Puranjay Mohan [Mon, 27 Apr 2026 23:23:02 +0000 (16:23 -0700)]

selftests/bpf: Add XDP load-balancer BPF program

Add the BPF datapath for the XDP load-balancer benchmark, a
simplified L4 load-balancer inspired by katran.

The pipeline: L3/L4 parse -> VIP lookup -> per-CPU LRU connection
table or consistent-hash fallback -> real server lookup -> per-VIP
and per-real stats -> IPIP/IP6IP6 encapsulation. TCP SYN forces
the consistent-hash path (skipping LRU); TCP RST skips LRU insert
to avoid polluting the table.

process_packet() is marked __noinline so that the BENCH_BPF_LOOP
reset block (which strips encapsulation) operates on valid packet
pointers after bpf_xdp_adjust_head().

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260427232313.1582588-6-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Puranjay Mohan [Mon, 27 Apr 2026 23:23:01 +0000 (16:23 -0700)]

selftests/bpf: Add XDP load-balancer common definitions

Add the shared header for the XDP load-balancer benchmark. This
defines the data structures used by both the BPF program and
userspace: flow_key, vip_definition, real_definition, and the
stats/control structures.

Also provides the encapsulation source-address helpers shared
between the BPF datapath (for encap) and userspace (for building
expected output packets used in validation).

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260427232313.1582588-5-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Puranjay Mohan [Mon, 27 Apr 2026 23:23:00 +0000 (16:23 -0700)]

selftests/bpf: Add bpf-nop benchmark for timing overhead baseline

Add a minimal benchmark that measures the overhead of the batch-timing
infrastructure itself. The BPF program runs an empty BENCH_BPF_LOOP body
(~1.5-2 ns/op), establishing the floor cost that all timing-library
benchmarks include.

[root@virtme-ng tools/testing/selftests/bpf]# sudo ./bench -a -p8 bpf-nop
Setting up benchmark 'bpf-nop'...
Benchmark 'bpf-nop' started.
bpf-nop: median 1.82 ns/op, stddev 0.01, p99 1.86 (1754 samples)

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260427232313.1582588-4-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Puranjay Mohan [Mon, 27 Apr 2026 23:22:59 +0000 (16:22 -0700)]

selftests/bpf: Add BPF batch-timing library

Add a reusable timing library for BPF benchmarks that need to measure
BPF program execution time.

The BPF side (progs/bench_bpf_timing.bpf.h) provides per-CPU sample
arrays and BENCH_BPF_LOOP(), a macro that brackets batch_iters
iterations with bpf_ktime_get_ns() reads and records the elapsed time.
One extra untimed iteration runs afterward for output validation.

The userspace side (benchs/bench_bpf_timing.c) collects samples from
the skeleton BSS, computes percentile statistics, and auto-calibrates
batch_iters to target ~10 ms per batch.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260427232313.1582588-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Puranjay Mohan [Mon, 27 Apr 2026 23:22:58 +0000 (16:22 -0700)]

selftests/bpf: Add bench_force_done() for early benchmark completion

The bench framework waits for duration_sec to elapse before collecting
results. Benchmarks that know exactly how many samples they need can
call bench_force_done() to signal completion early, avoiding wasted
wall-clock time.

Also refactor collect_measurements() to reuse bench_force_done()
instead of open-coding the same mutex/cond_signal sequence.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260427232313.1582588-2-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Tejun Heo [Mon, 11 May 2026 22:05:48 +0000 (12:05 -1000)]

sched_ext: Avoid UAF in scx_root_enable_workfn() init failure path

In scx_root_enable_workfn(), put_task_struct(p) is called before scx_error()
dereferences p->comm and p->pid. If the iterator's reference is the last
drop, the task is freed synchronously and the deref becomes a UAF.

Move put_task_struct() past scx_error().

Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://lore.kernel.org/all/20260511214031.AF5E9C2BCB0@smtp.kernel.org/
Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Tejun Heo <tj@kernel.org>

commit | commitdiff | tree

Jesse Zhang [Fri, 3 Apr 2026 07:58:31 +0000 (15:58 +0800)]

drm/amdgpu/gfx_v12_0: set gfx.rs64_enable from PFP header on GFX12

gfx_v12_0_init_microcode() always loads RS64 CP ucode but never set
adev->gfx.rs64_enable, so it stayed false and code that branches on it
(e.g. MEC pipe reset) used the legacy CP_MEC_CNTL path incorrectly.

Match GFX11: derive RS64 mode from the PFP firmware header (v2.0) via
amdgpu_ucode_hdr_version(). Log at debug when RS64 is enabled.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit b03d53598b0d2048e8fa7303b8d0784768ec4fa6)

commit | commitdiff | tree

Xiang Liu [Thu, 7 May 2026 12:56:15 +0000 (20:56 +0800)]

drm/amd/ras: Fix CPER ring debugfs read overflow

The legacy CPER debugfs reader can reach the payload path without a
valid pointer snapshot. The remaining user byte count is also treated as
the ring occupancy in dwords, so reads past the header can copy more than
requested.

Take the CPER lock before sampling pointers. Resample rptr/wptr for
payload reads, bound the payload copy by available dwords and the
remaining user size, and advance the file position for each dword copied.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1e40ef87ffdc291e05ccdade8b9170cc9c1c4249)

commit | commitdiff | tree

Mikhail Gavrilov [Tue, 5 May 2026 01:05:37 +0000 (09:05 +0800)]

drm/amd/display: Wrap DCN32 phantom-plane allocation in DC_RUN_WITH_PREEMPTION_ENABLED

[Why]
dcn32_validate_bandwidth() wraps dcn32_internal_validate_bw() with
DC_FP_START()/DC_FP_END(). In x86 non-RT, DC_FP_START takes fpregs_lock(),
which disables local softirqs.

The DML1 path through dcn32_enable_phantom_plane() calls kvzalloc() to
allocate ~335 KiB for dc_plane_state. This triggers the vmalloc path,
which calls BUG_ON(in_interrupt()) because it's invoked within the
FPU-enabled (softirq disabled) region, leading to a kernel crash.

[How]
Wrap the dc_state_create_phantom_plane() call with the
DC_RUN_WITH_PREEMPTION_ENABLED() macro to allow preemption during
this memory allocation.

Fixes: 235c67634230 ("drm/amd/display: add DCN32/321 specific files for Display Core")
Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/4470
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Signed-off-by: James Lin <pinglei.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 885ccbef7b94a8b38f69c4211c679021aa27ad11)
Cc: stable@vger.kernel.org

commit | commitdiff | tree

Christian König [Mon, 20 Apr 2026 14:08:35 +0000 (16:08 +0200)]

drm/amdgpu: fix userq hang detection and reset

Fix lock inversions pointed out by Prike and Sunil. The hang detection
timeout *CAN'T* grab locks under which we wait for fences, especially
not the userq_mutex lock.

Then instead of this completely broken handling with the
hang_detect_fence just cancel the work when fences are processed and
re-start if necessary.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1b62077f045ac6ffde7c97005c6659569ac5c1ec)

commit | commitdiff | tree

Christian König [Mon, 20 Apr 2026 13:13:57 +0000 (15:13 +0200)]

drm/amdgpu: remove almost all calls to amdgpu_userq_detect_and_reset_queues

Well the reset handling seems broken on multiple levels.

As first step of fixing this remove most calls to the hang detection.
That function should only be called after we run into a timeout! And *NOT*
as random check spread over the code in multiple places.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 71bea36b54ccfb14cbc90f94267af6369af4e702)

commit | commitdiff | tree

Christian König [Thu, 16 Apr 2026 13:32:11 +0000 (15:32 +0200)]

drm/amdgpu: rework amdgpu_userq_signal_ioctl v3

This one was fortunately not looking so bad as the wait ioctl path, but
there were still a few things which could be fixed/improved:

1. Allocating with GFP_ATOMIC was quite unnecessary, we can do that
   before taking the userq_lock.
2. Use a new mutex as protection for the fence_drv_xa so that we can do
   memory allocations while holding it.
3. Starting the reset timer is unnecessary when the fence is already
   signaled when we create it.
4. Cleanup error handling, avoid trying to free the queue when we don't
   even got one.

v2: fix incorrect usage of xa_find, destroy the new mutex on error
v3: cleanup ref ordering

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1609eb0f81a609d350169839128cecf298c84e7a)

commit | commitdiff | tree

Christian König [Mon, 20 Apr 2026 18:18:43 +0000 (20:18 +0200)]

drm/amdgpu: remove deadlocks from amdgpu_userq_pre_reset

The purpose of a GPU reset is to make sure that fence can be signaled
again and the signal and resume workers can make progress again.

So waiting for the resume worker or any fence in the GPU reset path is
just utterly nonsense.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit fcd5f065eab46993af43442fd77ee8d9eb9c5bdf)

commit | commitdiff | tree

Tejun Heo [Mon, 11 May 2026 21:06:05 +0000 (11:06 -1000)]

sched_ext: Mark !CONFIG_EXT_SUB_SCHED dummy stubs static inline

Mark !CONFIG_EXT_SUB_SCHED dummy stubs static inline to avoid
-Wunused-function in configs without callers. No functional change.

Signed-off-by: Tejun Heo <tj@kernel.org>

commit | commitdiff | tree

Christian Brauner [Mon, 27 Apr 2026 15:51:43 +0000 (17:51 +0200)]

Merge patch series "proc: subset=pid: Relax check of mount visibility"

Alexey Gladkov <legion@kernel.org> says:

When mounting procfs with the subset=pids option, all static files become
unavailable and only the dynamic part with information about pids is accessible.

In this case, there is no point in imposing additional restrictions on the
visibility of the entire filesystem for the mounter. Everything that can be
hidden in procfs is already inaccessible.

Currently, these restrictions prevent procfs from being mounted inside rootless
containers, as almost all container implementations override part of procfs to
hide certain directories. Relaxing these restrictions will allow pidfs to be
used in nested containerization.

* patches from https://patch.msgid.link/cover.1777278334.git.legion@kernel.org:
  docs: proc: add documentation about mount restrictions
  proc: handle subset=pid separately in userns visibility checks
  proc: prevent reconfiguring subset=pid
  proc: subset=pid: Show /proc/self/net only for CAP_NET_ADMIN
  sysfs: remove trivial sysfs_get_tree() wrapper
  fs: move SB_I_USERNS_VISIBLE to FS_USERNS_MOUNT_RESTRICTED
  namespace: record fully visible mounts in list

Link: https://patch.msgid.link/cover.1777278334.git.legion@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>

commit | commitdiff | tree

Alexey Gladkov [Mon, 27 Apr 2026 08:26:08 +0000 (10:26 +0200)]

docs: proc: add documentation about mount restrictions

procfs has a number of mounting restrictions that are not documented
anywhere.

Signed-off-by: Alexey Gladkov <legion@kernel.org>
Link: https://patch.msgid.link/e7cb804df3c1759ee17cf9df1dc4c211d63d7a5f.1777278334.git.legion@kernel.org
Reviewed-by: Aleksa Sarai <aleksa@amutable.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>

commit | commitdiff | tree

Alexey Gladkov [Mon, 27 Apr 2026 08:26:07 +0000 (10:26 +0200)]

proc: handle subset=pid separately in userns visibility checks

When procfs is mounted with subset=pid, only the dynamic process-related
part of the filesystem remains visible. That part cannot be hidden by
overmounts, so checking whether an existing procfs mount is fully
visible does not make sense for this mode.

At the same time, a subset=pid procfs mount must not be used as evidence
that a later procfs mount would not reveal additional information. It
provides a restricted view of procfs, not the full filesystem view.

Mark subset=pid procfs instances as restricted variants. Ignore
restricted variants when looking for an already-visible mount, and allow
new restricted variants without consulting mnt_already_visible().

Signed-off-by: Alexey Gladkov <legion@kernel.org>
Link: https://patch.msgid.link/4d5e760c3d534dd2e05578d119cc408450053a98.1777278334.git.legion@kernel.org
Reviewed-by: Aleksa Sarai <aleksa@amutable.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>

commit | commitdiff | tree

Alexey Gladkov [Mon, 27 Apr 2026 08:26:06 +0000 (10:26 +0200)]

proc: prevent reconfiguring subset=pid

Changing subset=pid on an existing procfs instance is not safe. If a
full procfs mount has entries hidden by overmounts, switching it to
subset=pid would hide the top-level procfs entries from lookup and
readdir while leaving the existing overmounts reachable.

Reject attempts to change the subset=pid state during reconfigure before
applying any other procfs mount options, so a failed reconfigure cannot
partially update the instance.

Signed-off-by: Alexey Gladkov <legion@kernel.org>
Link: https://patch.msgid.link/13295f40f642af5d6e7038d681d43132ad80f7b2.1777278334.git.legion@kernel.org
Reviewed-by: Aleksa Sarai <aleksa@amutable.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>

commit | commitdiff | tree

Christian Brauner [Mon, 27 Apr 2026 14:49:01 +0000 (16:49 +0200)]

Merge patch series "revamp fs/filesystems.c"

Mateusz Guzik <mjguzik@gmail.com> says:

The file is a mess with a hand-rolled linked list in a desperate need of
a clean up.

The code to emit /proc/filesystems is used frequently because libselinux
reads the file, which in turn is linked into numerous frequently used
programs (even ones you would not suspect, like sed!). In order to
combat that pre-gen the string instead of pointer-chasing and printfing
one by-one.

open+read+close cycle single-threaded (ops/s):
before: 442732
after:  1063462 (+140%)

Additionally scalability is also improved thanks to bypassing ref
maintenance on open/close.

open+read+close cycle with 20 processes (ops/s):
before: 606177
after: 3300576 (+444%)

The main bottleneck afterwards is the spurious lockref trip on open.

* patches from https://patch.msgid.link/20260425220844.1763933-1-mjguzik@gmail.com:
  fs: cache the string generated by reading /proc/filesystems
  fs: RCU-ify filesystems list
  proc: allow to mark /proc files permanent outside of fs/proc/

Link: https://patch.msgid.link/20260425220844.1763933-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>

commit | commitdiff | tree

Alexey Gladkov [Mon, 27 Apr 2026 08:26:05 +0000 (10:26 +0200)]

proc: subset=pid: Show /proc/self/net only for CAP_NET_ADMIN

Cache the mounters credentials and allow access to the net directories
contingent of the permissions of the mounter of proc.

Do not show /proc/self/net when proc is mounted with subset=pid option
and the mounter does not have CAP_NET_ADMIN. To avoid inadvertently
allowing access to /proc/<pid>/net, updating mounter credentials is not
supported.

Signed-off-by: Alexey Gladkov <legion@kernel.org>
Link: https://patch.msgid.link/d2466fe9085367f1e24693c437ecb8cff2789660.1777278334.git.legion@kernel.org
Reviewed-by: Aleksa Sarai <aleksa@amutable.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>

commit | commitdiff | tree

Mateusz Guzik [Sat, 25 Apr 2026 22:08:44 +0000 (00:08 +0200)]

fs: cache the string generated by reading /proc/filesystems

It is being read surprisingly often (e.g., by mkdir, ls and even sed!).

This is lock-protected pointer chasing over a linked list to pay for
sprintf for every fs (32 on my boxen).

Instead cache the result.

While here make the file as permanent to avoid spurious ref trips in
procfs.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://patch.msgid.link/20260425220844.1763933-4-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>

commit | commitdiff | tree

Christian Brauner [Mon, 27 Apr 2026 08:26:04 +0000 (10:26 +0200)]

sysfs: remove trivial sysfs_get_tree() wrapper

Now that FS_USERNS_MOUNT_RESTRICTED is a file_system_type flag,
sysfs_get_tree() is a trivial wrapper around kernfs_get_tree() with no
additional logic. Point sysfs_fs_context_ops.get_tree directly at
kernfs_get_tree() and remove the wrapper.

Link: https://patch.msgid.link/e8ac71fc96ad864c8b58fc0a8e5305550c01db25.1777278334.git.legion@kernel.org
Reviewed-by: Aleksa Sarai <aleksa@amutable.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>

commit | commitdiff | tree

Christian Brauner [Sat, 25 Apr 2026 22:08:43 +0000 (00:08 +0200)]

fs: RCU-ify filesystems list

The drivers list was protected by an rwlock; every mount, every open
of /proc/filesystems and the legacy sysfs(2) syscall walked a
hand-rolled singly-linked list under it. /proc/filesystems is
especially hot because libselinux causes programs as mundane as
mkdir, ls and sed to open and read it on every invocation.

Convert the list to an RCU-protected hlist and switch the writer side
to a plain spinlock. Writers keep their existing non-sleeping
section while readers walk under rcu_read_lock() with no lock traffic:

  - register_filesystem()/unregister_filesystem() take
    file_systems_lock, publish via hlist_{add_tail,del_init}_rcu()
    and invalidate the cached /proc/filesystems string.
    unregister_filesystem() keeps its synchronize_rcu() after
    dropping the lock so in-flight readers are drained before the
    module (and its embedded file_system_type) can go away.

  - __get_fs_type(), list_bdev_fs_names() and the
    fs_index()/fs_name()/fs_maxindex() helpers walk the list under
    rcu_read_lock(). fs_name() continues to drop the read-side
    lock after try_module_get() and accesses ->name outside the RCU
    section; the module reference pins the embedded file_system_type
    across the boundary.

struct file_system_type::next becomes struct hlist_node list; no
in-tree caller references the old ->next field outside
fs/filesystems.c.

Link: https://patch.msgid.link/20260425220844.1763933-3-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>

commit | commitdiff | tree

Christian Brauner [Mon, 27 Apr 2026 08:26:03 +0000 (10:26 +0200)]

fs: move SB_I_USERNS_VISIBLE to FS_USERNS_MOUNT_RESTRICTED

Whether a filesystem's mounts need to undergo a visibility check in user
namespaces is a static property of the filesystem type, not a runtime
property of each superblock instance. Both proc and sysfs always set
SB_I_USERNS_VISIBLE on their superblocks unconditionally (sysfs does so
on first creation, and subsequent mounts reuse the same superblock).

Move this flag from sb->s_iflags (SB_I_USERNS_VISIBLE) to
file_system_type->fs_flags (FS_USERNS_MOUNT_RESTRICTED) so the intent
is expressed at the filesystem type level where it belongs.

All check sites are updated to test sb->s_type->fs_flags instead of
sb->s_iflags. The SB_I_NOEXEC and SB_I_NODEV flags remain on the
superblock as they are runtime properties set during fill_super.

Link: https://patch.msgid.link/72887c5b6204dc3adf5a53104f0be6bd8bc4f6cd.1777278334.git.legion@kernel.org
Reviewed-by: Aleksa Sarai <aleksa@amutable.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>

commit | commitdiff | tree

Alexey Dobriyan [Sat, 25 Apr 2026 22:08:42 +0000 (00:08 +0200)]

proc: allow to mark /proc files permanent outside of fs/proc/

Add proc_make_permanent() function to mark PDE as permanent to speed up
open/read/close (one alloc/free and lock/unlock less).

Enable it for built-in code and for compiled-in modules.
This function becomes nop magically in modular code.

Note, note, note!

If built-in code creates and deletes PDEs dynamically (not in init
hook), then proc_make_permanent() must not be used.

It is intended for simple code:

static int __init xxx_module_init(void)
{
g_pde = proc_create_single();
proc_make_permanent(g_pde);
return 0;
}
static void __exit xxx_module_exit(void)
{
remove_proc_entry(g_pde);
}

If module is built-in then exit hook never executed and PDE is
permanent so it is OK to mark it as such.

If module is module then rmmod will yank PDE, but proc_make_permanent()
is nop and core /proc code will do everything right.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Link: https://patch.msgid.link/20260425220844.1763933-2-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>

commit | commitdiff | tree

Christian Brauner [Mon, 27 Apr 2026 08:26:02 +0000 (10:26 +0200)]

namespace: record fully visible mounts in list

Instead of wading through all the mounts in the mount namespace rbtree
to find fully visible procfs and sysfs mounts, be honest about them
being special cruft and record them in a separate per-mount namespace
list.

Link: https://patch.msgid.link/684859a8e0ac929cb89c1fbe16ce15b30c70eb1f.1777278334.git.legion@kernel.org
Reviewed-by: Aleksa Sarai <aleksa@amutable.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>

commit | commitdiff | tree

Mateusz Guzik [Thu, 23 Apr 2026 17:04:31 +0000 (19:04 +0200)]

fs: retire stale lock ordering annotations from inode hash

1. iunique does not take the hash lock as of:
3f19b2ab97a97b41 ("vfs, afs, ext4: Make the inode hash table RCU searchable")
2. s_inode_list_lock is no longer taken under the hash lock as of:
c918f15420e336a9 ("fs: call inode_sb_list_add() outside of inode hash lock")

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://patch.msgid.link/20260423170431.1483370-1-mjguzik@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>

commit | commitdiff | tree

Christian Brauner [Wed, 22 Apr 2026 13:00:32 +0000 (15:00 +0200)]

Merge patch series "assorted ->i_count changes + extension of lockless handling"

Mateusz Guzik <mjguzik@gmail.com> says:

The stock kernel support partial lockless in handling in that iput() can
decrement any value > 1. Any ref acquire however requires the spinlock.

With this patchset ref acquires when the value was already at least 1
also become lockless. That is, only transitions 0->1 and 1->0 take the
lock.

I verified when nfs calls into the hash taking the lock is typically
avoided. Similarly, btrfs likes to igrab() and avoids the lock.
However, I have to fully admit I did not perform any benchmarks. While
cleaning stuff up I noticed lockless operation is almost readily
available so I went for it.

Clean-up wise, the icount_read_once() stuff lines up with
inode_state_read_once(). The prefix is different but I opted to not
change it due to igrab(), ihold() et al.

* patches from https://patch.msgid.link/20260421182538.1215894-1-mjguzik@gmail.com:
  fs: allow lockless ->i_count bumps as long as it does not transition 0->1
  fs: relocate and tidy up ihold()
  fs: add icount_read_once() and stop open-coding ->i_count loads

Link: https://patch.msgid.link/20260421182538.1215894-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>

commit | commitdiff | tree

Mateusz Guzik [Tue, 21 Apr 2026 18:25:38 +0000 (20:25 +0200)]

fs: allow lockless ->i_count bumps as long as it does not transition 0->1

With this change only 0->1 and 1->0 transitions need the lock.

I verified all places which look at the refcount either only care about
it staying 0 (and have the lock enforce it) or don't hold the inode lock
to begin with (making the above change irrelevant to their correcness or
lack thereof).

I also confirmed nfs and btrfs like to call into these a lot and now
avoid the lock in the common case, shaving off some atomics.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://patch.msgid.link/20260421182538.1215894-4-mjguzik@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>

commit | commitdiff | tree

Mateusz Guzik [Tue, 21 Apr 2026 18:25:37 +0000 (20:25 +0200)]

fs: relocate and tidy up ihold()

The placement was illogical, move it next to igrab().

Take this opportunity to add docs and an assert on the refcount. While
its modification remains gated with a WARN_ON, the new assert will also
dump the inode state which might aid debugging.

No functional changes.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://patch.msgid.link/20260421182538.1215894-3-mjguzik@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>

commit | commitdiff | tree

Mateusz Guzik [Tue, 21 Apr 2026 18:25:36 +0000 (20:25 +0200)]

fs: add icount_read_once() and stop open-coding ->i_count loads

Similarly to inode_state_read_once(), it makes the caller spell out
they acknowledge instability of the returned value.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://patch.msgid.link/20260421182538.1215894-2-mjguzik@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>

commit | commitdiff | tree

Matthew Auld [Fri, 8 May 2026 10:26:37 +0000 (11:26 +0100)]

drm/xe/dma-buf: fix UAF with retry loop

Retry doesn't work here, since bo will be freed on error, leading to
UAF. However, now that we do the alloc & init before the attach, we can
now combine this as one unit and have the init do the alloc for us. This
should make the retry safe.

Reported by Sashiko.

v2: Fix up the error unwind (CI)

Closes: https://sashiko.dev/#/patchset/20260506184332.86743-2-matthew.auld%40intel.com
Fixes: eb289a5f6cc6 ("drm/xe: Convert xe_dma_buf.c for exhaustive eviction")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.18+
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20260508102635.149172-4-matthew.auld@intel.com
(cherry picked from commit 479669418253e0f27f8cf5db01a731352ea592e7)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

commit | commitdiff | tree

Matthew Auld [Fri, 8 May 2026 10:26:36 +0000 (11:26 +0100)]

drm/xe/dma-buf: handle empty bo and UAF races

There look to be some nasty races here when triggering the
invalidate_mappings hook:

1) We do xe_bo_alloc() followed by the attach, before the actual full bo
   init step in xe_dma_buf_init_obj(). However the bo is visible on the
   attachments list after the attach.  This is bad since exporter driver,
   say amdgpu, can at any time call back into our invalidate_mappings hook,
   with an empty/bogus bo, leading to potential bugs/crashes.

2) Similar to 1) but here we get a UAF, when the invalidate_mappings
   hook is triggered. For example, we get as far as xe_bo_init_locked()
   but this fails in some way. But here the bo will be freed on error, but
   we still have it attached from dma-buf pov, so if the
   invalidate_mappings is now triggered then the bo we access is gone and
   we trigger UAF and more bugs/crashes.

To fix this, move the attach step until after we actually have a fully
set up buffer object. Note that the bo is not published to userspace
until later, so not sure what the comment "Don't publish the bo
until we have a valid attachment", is referring to.

We have at least two different customers reporting hitting a NULL ptr
deref in evict_flags when importing something from amdgpu, followed by
triggering the evict flow. Hit rate is also pretty low, which would
hint at some kind of race, so something like 1) or 2) might explain
this.

v2:
  - Shuffle the order of the ops slightly (no functional change)
  - Improve the comment to better explain the ordering (Matt B)

Assisted-by: Gemini:gemini-3 #debug
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/7903
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/4055
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20260508102635.149172-3-matthew.auld@intel.com
(cherry picked from commit af1f2ad0c59fe4e2f924c526f66e968289d77971)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

commit | commitdiff | tree

Matt Roper [Thu, 7 May 2026 21:00:15 +0000 (14:00 -0700)]

drm/xe: Make decision to use Xe2-style blitter instructions a feature flag

The blitter engines' MEM_COPY and MEM_SET instructions were added as
part of the same hardware change that introduced service copy engines
(i.e., BCS1-BCS8) which is why the driver checks for service copy engine
presence when deciding whether to use these instructions or the older
XY_* instructions.  However when making this decision the driver should
consider which engines are part of the hardware architecture, not which
engines are present/usable on the current device.  For graphics IP
versions that architecturally include service copy engines (i.e.,
everything Xe2 and later, plus PVC's Xe_HPC) we should use MEM_SET and
MEM_COPY even in if all of the service copy engines wind up getting
fused off.  I.e., we need to decide based on whether the platform's
graphics descriptor contains these engines, rather than whether the
usable engine mask contains them.  This logic got broken when
gt->info.__engine_mask was removed, although in practice that mistake
has been harmless so far because there haven't been any hardware
SKUs that fuse off all of the service copy engines yet.

Replace the incorrect has_service_copy_support() function with a GT
feature flag that tracks more accurately whether the new blitter
instructions are usable.  In addition to fixing incorrect logic if all
service copies are fused off, the flag also makes it more obvious what
the calling code is trying to do; previously it wasn't terribly obvious
why "has service copy engines" was being used as the condition for using
different instructions on all copy engine types.

The new feature flag is named 'has_xe2_blt_instructions' because we
expect this flag to be set for all Xe2 and later platforms (i.e.,
everything officially supported by the Xe driver).  Technically there's
also one Xe1-era platform (PVC) that supports these engines/instructions
and will set this flag, but this still seems to be the most clear and
understandable name for the flag.

Fixes: 61549a2ee594 ("drm/xe: Drop __engine_mask")
Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patch.msgid.link/20260507-xe2_copy-v1-1-26506381b821@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit 09b399842907565a64e351fb22da790b4c673ffb)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

commit | commitdiff | tree

Arvind Yadav [Wed, 6 May 2026 13:20:27 +0000 (18:50 +0530)]

drm/xe/madvise: Track purgeability with BO-local counters

xe_bo_recompute_purgeable_state() walks all VMAs of a BO to determine
whether the BO can be made purgeable. This makes VMA create/destroy and
madvise updates O(n) in the number of mappings.

Replace the walk with BO-local counters protected by the BO dma-resv
lock:

  - vma_count tracks the number of VMAs mapping the BO.
  - willneed_count tracks active WILLNEED holders, including WILLNEED
    VMAs and active dma-buf exports for non-imported BOs.

A DONTNEED BO is promoted back to WILLNEED on a 0->1 transition of
willneed_count. A BO is demoted to DONTNEED on a 1->0 transition only
when it still has VMAs, preserving the previous behaviour where a BO
with no mappings keeps its current madvise state.

PURGED remains terminal, preserving the existing "once purged, always
purged" rule.

Fixes: 4f44961eab84 ("drm/xe/vm: Prevent binding of purged buffer objects")
v2:
  - Use early return for imported BOs in all four helpers to avoid
    nesting (Matt B).
  - Group purgeability state into a purgeable sub-struct on struct
    xe_bo (Matt B).
  - Reword xe_bo_willneed_put_locked() kernel-doc to explain that a 1->0
    transition means all remaining active VMAs are DONTNEED (Matt B).

v3:
  - Move DONTNEED/PURGED reject from vma_lock_and_validate() into
    xe_vma_create(), gated on attr->purgeable_state == WILLNEED.
    Fixes vm_bind bypass and partial-unbind rejection on DONTNEED
    BOs (Matt B).
  - Drop .check_purged from MAP and REMAP; keep it for PREFETCH and
    add a comment why (Matt B).
  - Skip BO validation in vma_lock_and_validate() for non-WILLNEED
    VMA remnants so cleanup/remap paths do not repopulate
    DONTNEED/PURGED BOs.

Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260506132027.2556046-1-arvind.yadav@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
(cherry picked from commit 23fb2ea56cb4fa2587bc072b04e4e698687a48e4)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

commit | commitdiff | tree

Linus Walleij [Mon, 11 May 2026 20:33:30 +0000 (22:33 +0200)]

Merge branch 'ib-mux-pinctrl' into devel

commit | commitdiff | tree

Frank Li [Thu, 7 May 2026 15:21:14 +0000 (11:21 -0400)]

mux: describe np parameter in __devm_mux_state_get()

Add a description for the 'np' parameter of __devm_mux_state_get() to fix
build warning.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202605061502.ullLjmtN-lkp@intel.com/
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Linus Walleij <linusw@kernel.org>

commit | commitdiff | tree

Guopeng Zhang [Sat, 9 May 2026 10:20:31 +0000 (18:20 +0800)]

cgroup/cpuset: Reserve DL bandwidth only for root-domain moves

cpuset_can_attach() currently adds the bandwidth of all migrating
SCHED_DEADLINE tasks to sum_migrate_dl_bw. If the source and destination
cpuset effective CPU masks do not overlap, the whole sum is then
reserved in the destination root domain.

set_cpus_allowed_dl(), however, subtracts bandwidth from the source
root domain only when the affinity change really moves the task between
root domains. A DL task can move between cpusets that are still in the
same root domain, so including that task in sum_migrate_dl_bw can reserve
destination bandwidth without a matching source-side subtraction.

Share the root-domain move test with set_cpus_allowed_dl(). Keep
nr_migrate_dl_tasks counting all migrating deadline tasks for cpuset DL
task accounting, but add to sum_migrate_dl_bw only for tasks that need a
root-domain bandwidth move. Keep using the destination cpuset effective
CPU mask and leave the broader can_attach()/attach() transaction model
unchanged.

Fixes: 2ef269ef1ac0 ("cgroup/cpuset: Free DL BW in case can_attach() fails")
Cc: stable@vger.kernel.org # v6.10+
Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
Reviewed-by: Waiman Long <longman@redhat.com>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

commit | commitdiff | tree

Yang Wang [Fri, 8 May 2026 02:31:22 +0000 (10:31 +0800)]

drm/amd/pm: update dpm clock pm attributes for aldebaran (gc 9.4.2)

v1:
Separate DPM clock attribute constraints for Arcturus (9.4.1) and
Aldebaran (9.4.2) ASICs.

- For Aldebaran:
* mclk/socclk: Disable write, only voltage control supported
* fclk/pcie: Mark as unsupported
- Remove 9.4.2 from global pcie check and handle it in ASIC specific case
- Update comments to reflect correct hardware names

v2:
fix some coding logic issue (by asad)

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

commit | commitdiff | tree

Jesse Zhang [Fri, 3 Apr 2026 07:58:31 +0000 (15:58 +0800)]

drm/amdgpu/gfx_v12_0: set gfx.rs64_enable from PFP header on GFX12

gfx_v12_0_init_microcode() always loads RS64 CP ucode but never set
adev->gfx.rs64_enable, so it stayed false and code that branches on it
(e.g. MEC pipe reset) used the legacy CP_MEC_CNTL path incorrectly.

Match GFX11: derive RS64 mode from the PFP firmware header (v2.0) via
amdgpu_ucode_hdr_version(). Log at debug when RS64 is enabled.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

commit | commitdiff | tree

Caden Chien [Tue, 21 Apr 2026 10:02:34 +0000 (18:02 +0800)]

drm/amdgpu/vpe: add vpe v2.0.0 support

This patch adds support for vpe v2.0.0 with new structs and ip functions

Acked-by: Roy Chan <Roy.Chan@amd.com>
Signed-off-by: Caden Chien <chih-wei.chien@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

commit | commitdiff | tree

Caden Chien [Wed, 22 Apr 2026 10:02:25 +0000 (18:02 +0800)]

drm/amdgpu/vpe: add new vpe v2.0.0 register offset and sh/mask

New offset and sh/mask are added for vpe v2.0.0

Acked-by: Roy Chan <Roy.Chan@amd.com>
Signed-off-by: Caden Chien <chih-wei.chien@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

commit | commitdiff | tree

Caden Chien [Tue, 21 Apr 2026 09:28:07 +0000 (17:28 +0800)]

drm/amdgpu/nbio: add doorbell range init for vpe on 7.11.4

A callback function is added to setup doorbell range during vpe hw
queue initialization on nbio 7.11.4.

Signed-off-by: Caden Chien <chih-wei.chien@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

commit | commitdiff | tree

Caden Chien [Tue, 21 Apr 2026 09:26:34 +0000 (17:26 +0800)]

drm/amdgpu/nbio: remove doorbell entry5 for vcn on 7.11.4

S2A doorbell entry 5 on nbio 7.11.4 is used by vpe 2.0

Signed-off-by: Caden Chien <chih-wei.chien@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

commit | commitdiff | tree

Alex Deucher [Tue, 13 Jan 2026 20:25:10 +0000 (15:25 -0500)]

drm/amdgpu: simplify VCN reset helper

Remove the wrapper function.

Reviewed-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

commit | commitdiff | tree

Alex Deucher [Thu, 1 Jan 2026 22:20:18 +0000 (17:20 -0500)]

drm/amdgpu: plumb timedout fence through to force completion

When we do a full adapter reset, if we know the timedout fence
mark the fence with -ETIME rather than -ECANCELED so it
gets properly handled by userspace.

v2: rebase

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

commit | commitdiff | tree

Rodrigo Siqueira [Wed, 7 Jun 2023 21:22:45 +0000 (15:22 -0600)]

drm/amd/display: Add FRL registers for DCN316

Add the required FRL registers for DCN316.

Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

A mirror of Linus' kernel repository