]> git.ipfire.org Git - thirdparty/kernel/linux.git/log
thirdparty/kernel/linux.git
7 weeks agoet131x: Add missing check after DMA map
Thomas Fourier [Wed, 16 Jul 2025 09:47:30 +0000 (11:47 +0200)] 
et131x: Add missing check after DMA map

The DMA map functions can fail and should be tested for errors.
If the mapping fails, unmap and return an error.

Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com>
Acked-by: Mark Einon <mark.einon@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250716094733.28734-2-fourier.thomas@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: ag71xx: Add missing check after DMA map
Thomas Fourier [Wed, 16 Jul 2025 09:57:25 +0000 (11:57 +0200)] 
net: ag71xx: Add missing check after DMA map

The DMA map functions can fail and should be tested for errors.

Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250716095733.37452-3-fourier.thomas@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoselftests/drivers/net: Support ipv6 for napi_id test
Tianyi Cui [Thu, 17 Jul 2025 01:19:13 +0000 (18:19 -0700)] 
selftests/drivers/net: Support ipv6 for napi_id test

Add support for IPv6 environment for napi_id test.

Test Plan:

    ./run_kselftest.sh -t drivers/net:napi_id.py
    TAP version 13
    1..1
    # timeout set to 45
    # selftests: drivers/net: napi_id.py
    # TAP version 13
    # 1..1
    # ok 1 napi_id.test_napi_id
    # # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
    ok 1 selftests: drivers/net: napi_id.py

Signed-off-by: Tianyi Cui <1997cui@gmail.com>
Link: https://patch.msgid.link/20250717011913.1248816-1-1997cui@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoibmvnic: Use ndo_get_stats64 to fix inaccurate SAR reporting
Mingming Cao [Wed, 16 Jul 2025 15:21:15 +0000 (11:21 -0400)] 
ibmvnic: Use ndo_get_stats64 to fix inaccurate SAR reporting

VNIC testing on multi-core Power systems showed SAR stats drift
and packet rate inconsistencies under load.

Implements ndo_get_stats64 to provide safe aggregation of queue-level
atomic64 counters into rtnl_link_stats64 for use by tools like 'ip -s',
'ifconfig', and 'sar'. Switch to ndo_get_stats64 to align SAR reporting
with the standard kernel interface for retrieving netdev stats.

This removes redundant per-adapter stat updates, reduces overhead,
eliminates cacheline bouncing from hot path updates, and improves
the accuracy of reported packet rates.

Signed-off-by: Mingming Cao <mmc@linux.ibm.com>
Reviewed-by: Brian King <bjking1@linux.ibm.com>
Reviewed-by: Dave Marquardt <davemarq@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
----
Changes since v3:
link to v3: https://www.spinics.net/lists/netdev/msg1107999.html
-- keep per queue counters as u64 (this patch) and drop off patch 1 in v3

Link: https://patch.msgid.link/20250716152115.61143-1-mmc@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch 'net-mlx5-misc-changes-2025-07-16'
Jakub Kicinski [Fri, 18 Jul 2025 01:53:11 +0000 (18:53 -0700)] 
Merge branch 'net-mlx5-misc-changes-2025-07-16'

Tariq Toukan says:

====================
net/mlx5: misc changes 2025-07-16

This series contains misc enhancements to the mlx5 driver.

v1: https://lore.kernel.org/1752471585-18053-1-git-send-email-tariqt@nvidia.com
====================

Link: https://patch.msgid.link/1752675472-201445-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet/mlx5e: Properly access RCU protected qdisc_sleeping variable
Leon Romanovsky [Wed, 16 Jul 2025 14:17:49 +0000 (17:17 +0300)] 
net/mlx5e: Properly access RCU protected qdisc_sleeping variable

qdisc_sleeping variable is declared as "struct Qdisc __rcu" and
as such needs proper annotation while accessing it.

Without rtnl_dereference(), the following error is generated by sparse:

drivers/net/ethernet/mellanox/mlx5/core/en/qos.c:377:40: warning:
  incorrect type in initializer (different address spaces)
drivers/net/ethernet/mellanox/mlx5/core/en/qos.c:377:40:    expected
  struct Qdisc *qdisc
drivers/net/ethernet/mellanox/mlx5/core/en/qos.c:377:40:    got struct
  Qdisc [noderef] __rcu *qdisc_sleeping

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/1752675472-201445-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet/mlx5e: fix kdoc warning on eswitch.h
Moshe Shemesh [Wed, 16 Jul 2025 14:17:48 +0000 (17:17 +0300)] 
net/mlx5e: fix kdoc warning on eswitch.h

Fix the following kdoc warning:
git ls-files *.[ch] | egrep drivers/net/ethernet/mellanox/mlx5/core/ |\
xargs scripts/kernel-doc --none
drivers/net/ethernet/mellanox/mlx5/core/eswitch.h:824: warning: cannot
understand function prototype: 'struct mlx5_esw_event_info '

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/1752675472-201445-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet/mlx5: HWS, Enable IPSec hardware offload in legacy mode
Lama Kayal [Wed, 16 Jul 2025 14:17:47 +0000 (17:17 +0300)] 
net/mlx5: HWS, Enable IPSec hardware offload in legacy mode

IPSec hardware offload in legacy mode should not be affected by the
steering mode, hence it should also work properly with hmfs mode.

Remove steering mode validation when calculating the cap for packet
offload, this will also enable the missing cap MLX5_IPSEC_CAP_PRIO
needed for crypto offload.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/1752675472-201445-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge tag 'amd-drm-fixes-6.16-2025-07-17' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Fri, 18 Jul 2025 01:41:10 +0000 (11:41 +1000)] 
Merge tag 'amd-drm-fixes-6.16-2025-07-17' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.16-2025-07-17:

amdgpu:
- Fix a DC memory leak
- DCN 4.0.1 degamma LUT fix
- Fix reset counter handling for soft recovery
- GC 8 fix

radeon:
- Drop console locks when suspending/resuming

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250717171935.642380-1-alexander.deucher@amd.com
7 weeks agonet: pcs: xpcs: mask readl() return value to 16 bits
Jack Ping CHNG [Wed, 16 Jul 2025 03:03:49 +0000 (11:03 +0800)] 
net: pcs: xpcs: mask readl() return value to 16 bits

readl() returns 32-bit value but Clause 22/45 registers are 16-bit wide.
Masking with 0xFFFF avoids using garbage upper bits.

Signed-off-by: Jack Ping CHNG <jchng@maxlinear.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250716030349.3796806-1-jchng@maxlinear.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet/mlx5: Fix an IS_ERR() vs NULL bug in esw_qos_move_node()
Dan Carpenter [Tue, 15 Jul 2025 23:01:30 +0000 (18:01 -0500)] 
net/mlx5: Fix an IS_ERR() vs NULL bug in esw_qos_move_node()

The __esw_qos_alloc_node() function returns NULL on error.  It doesn't
return error pointers.  Update the error checking to match.

Fixes: 96619c485fa6 ("net/mlx5: Add support for setting tc-bw on nodes")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/0ce4ec2a-2b5d-4652-9638-e715a99902a7@sabinyo.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: ethernet: mtk_wed: Fix NULL vs IS_ERR() bug in mtk_wed_get_memory_region()
Dan Carpenter [Tue, 15 Jul 2025 23:01:19 +0000 (18:01 -0500)] 
net: ethernet: mtk_wed: Fix NULL vs IS_ERR() bug in mtk_wed_get_memory_region()

We recently changed this from using devm_ioremap() to using
devm_ioremap_resource() and unfortunately the former returns NULL while
the latter returns error pointers.  The check for errors needs to be
updated as well.

Fixes: e27dba1951ce ("net: Use of_reserved_mem_region_to_resource{_byname}() for "memory-region"")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/87c10dbd-df86-4971-b4f5-40ba02c076fb@sabinyo.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: airoha: Fix a NULL vs IS_ERR() bug in airoha_npu_run_firmware()
Dan Carpenter [Tue, 15 Jul 2025 23:01:10 +0000 (18:01 -0500)] 
net: airoha: Fix a NULL vs IS_ERR() bug in airoha_npu_run_firmware()

The devm_ioremap_resource() function returns error pointers.  It never
returns NULL.  Update the check to match.

Fixes: e27dba1951ce ("net: Use of_reserved_mem_region_to_resource{_byname}() for "memory-region"")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/fc6d194e-6bf5-49ca-bc77-3fdfda62c434@sabinyo.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch 'add-shared-phy-counter-support-for-qca807x-and-qca808x'
Jakub Kicinski [Fri, 18 Jul 2025 01:35:52 +0000 (18:35 -0700)] 
Merge branch 'add-shared-phy-counter-support-for-qca807x-and-qca808x'

Luo Jie says:

====================
Add shared PHY counter support for QCA807x and QCA808x

The implementation of the PHY counter is identical for both QCA808x and
QCA807x series devices. This includes counters for both good and bad CRC
frames in the RX and TX directions, which are active when CRC checking
is enabled.

This patch series introduces PHY counter functions into a shared library,
enabling counter support for the QCA808x and QCA807x families through this
common infrastructure. Additionally, enable CRC checking and configure
automatic clearing of counters after reading within config_init() to ensure
accurate counter recording.

v2: https://lore.kernel.org/20250714-qcom_phy_counter-v2-0-94dde9d9769f@quicinc.com
v1: https://lore.kernel.org/20250709-qcom_phy_counter-v1-0-93a54a029c46@quicinc.com
====================

Link: https://patch.msgid.link/20250715-qcom_phy_counter-v3-0-8b0e460a527b@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: phy: qcom: qca807x: Support PHY counter
Luo Jie [Tue, 15 Jul 2025 11:02:28 +0000 (19:02 +0800)] 
net: phy: qcom: qca807x: Support PHY counter

Within the QCA807X PHY operation's config_init() function, enable CRC
checking for received and transmitted frames and configure counter to
clear after being read to support counter recording. Additionally, add
support for PHY counter operations.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250715-qcom_phy_counter-v3-3-8b0e460a527b@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: phy: qcom: qca808x: Support PHY counter
Luo Jie [Tue, 15 Jul 2025 11:02:27 +0000 (19:02 +0800)] 
net: phy: qcom: qca808x: Support PHY counter

Enable CRC checking for received and transmitted frames, and configure
counters to clear after being read within config_init() for accurate
counter recording. Additionally, add PHY counter operations and integrate
shared functions.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250715-qcom_phy_counter-v3-2-8b0e460a527b@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: phy: qcom: Add PHY counter support
Luo Jie [Tue, 15 Jul 2025 11:02:26 +0000 (19:02 +0800)] 
net: phy: qcom: Add PHY counter support

Add PHY counter functionality to the shared library. The implementation
is identical for the current QCA807X and QCA808X PHYs.

The PHY counter can be configured to perform CRC checking for both received
and transmitted packets. Additionally, the packet counter can be set to
automatically clear after it is read.

The PHY counter includes 32-bit packet counters for both RX (received) and
TX (transmitted) packets, as well as 16-bit counters for recording CRC
error packets for both RX and TX.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250715-qcom_phy_counter-v3-1-8b0e460a527b@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonetdevsim: remove redundant branch
Dennis Chen [Wed, 16 Jul 2025 16:57:50 +0000 (12:57 -0400)] 
netdevsim: remove redundant branch

bool notify is referenced nowhere else in the function except to check
whether or not to call rtnl_offload_xstats_notify(). Remove it and move
the call to the previous branch.

Signed-off-by: Dennis Chen <dechen@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20250716165750.561175-1-dechen@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf...
Jakub Kicinski [Fri, 18 Jul 2025 01:07:37 +0000 (18:07 -0700)] 
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Martin KaFai Lau says:

====================
pull-request: bpf-next 2025-07-17

We've added 13 non-merge commits during the last 20 day(s) which contain
a total of 4 files changed, 712 insertions(+), 84 deletions(-).

The main changes are:

1) Avoid skipping or repeating a sk when using a TCP bpf_iter,
   from Jordan Rife.

2) Clarify the driver requirement on using the XDP metadata,
   from Song Yoong Siang

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  doc: xdp: Clarify driver implementation for XDP Rx metadata
  selftests/bpf: Add tests for bucket resume logic in established sockets
  selftests/bpf: Create iter_tcp_destroy test program
  selftests/bpf: Create established sockets in socket iterator tests
  selftests/bpf: Make ehash buckets configurable in socket iterator tests
  selftests/bpf: Allow for iteration over multiple states
  selftests/bpf: Allow for iteration over multiple ports
  selftests/bpf: Add tests for bucket resume logic in listening sockets
  bpf: tcp: Avoid socket skips and repeats during iteration
  bpf: tcp: Use bpf_tcp_iter_batch_item for bpf_tcp_iter_state batch items
  bpf: tcp: Get rid of st_bucket_done
  bpf: tcp: Make sure iter->batch always contains a full bucket snapshot
  bpf: tcp: Make mem flags configurable through bpf_iter_tcp_realloc_batch
====================

Link: https://patch.msgid.link/20250717191731.4142326-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoperf sched timehist: decode process names of processes in zombie state
Anubhav Shelat [Wed, 16 Jul 2025 20:39:15 +0000 (16:39 -0400)] 
perf sched timehist: decode process names of processes in zombie state

Previously when running perf trace timehist --state, when recording
processes in the zombie state the process name would not be decoded
properly and appears with just the PID:

1140057.412177 [0006]  Mutter Input Th[3139/3104]          0.956      0.019      0.041      S
1140057.412222 [0012]  :1248612[1248612]                   0.000      0.000      0.332      Z
1140057.412275 [0004]  <idle>                              0.052      0.052      0.953      I
1140057.412284 [0008]  <idle>                              0.070      0.070      0.932      I
1140057.412333 [0004]  KMS thread[3126/3104]               0.953      0.112      0.058      S

Now some extra processing has been added to decode the process name:

1140057.412177 [0006]  Mutter Input Th[3139/3104]          0.956      0.019      0.041      S
1140057.412222 [0012]  sleep[1248612]                      0.000      0.000      0.332      Z
1140057.412275 [0004]  <idle>                              0.052      0.052      0.953      I
1140057.412284 [0008]  <idle>                              0.070      0.070      0.932      I
1140057.412333 [0004]  KMS thread[3126/3104]               0.953      0.112      0.058      S

Signed-off-by: Anubhav Shelat <ashelat@redhat.com>
Link: https://lore.kernel.org/r/20250716203914.45772-2-ashelat@redhat.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
7 weeks agoMerge tag 'drm-intel-fixes-2025-07-17' of https://gitlab.freedesktop.org/drm/i915...
Dave Airlie [Thu, 17 Jul 2025 23:59:42 +0000 (09:59 +1000)] 
Merge tag 'drm-intel-fixes-2025-07-17' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

- DP AUX DPCD address fix (Imre)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/aHkQmRhelb4Fzqau@intel.com
7 weeks agoMerge tag 'drm-misc-fixes-2025-07-16' of https://gitlab.freedesktop.org/drm/misc...
Dave Airlie [Thu, 17 Jul 2025 23:42:22 +0000 (09:42 +1000)] 
Merge tag 'drm-misc-fixes-2025-07-16' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

drm-misc-fixes for v6.16 final?:
- nouveau ioctl validation fix.
- panfrost scheduler bug.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/ee784a3a-30b4-489a-8503-b1be3b09268c@linux.intel.com
7 weeks agostring: Group str_has_prefix() and strstarts()
Andy Shevchenko [Fri, 11 Jul 2025 08:55:14 +0000 (11:55 +0300)] 
string: Group str_has_prefix() and strstarts()

The two str_has_prefix() and strstarts() are about the same
with a slight difference on what they return. Group them in
the header.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20250711085514.1294428-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Kees Cook <kees@kernel.org>
7 weeks agofork: reorder function qualifiers for copy_clone_args_from_user
Dishank Jogi [Wed, 16 Jul 2025 09:35:25 +0000 (15:05 +0530)] 
fork: reorder function qualifiers for copy_clone_args_from_user

Change the order of function qualifiers from 'noinline static' to 'static noinline'
in copy_clone_args_from_user for consistency with kernel coding style.

No functional change intended. The goal is to improve readability and
maintain consistent ordering of qualifiers across the codebase.

Signed-off-by: Dishank Jogi <dishank.jogi@siqol.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/r/20250716093525.449994-1-dishank.jogi@siqol.com
Signed-off-by: Kees Cook <kees@kernel.org>
7 weeks agobinfmt_elf: remove the 4k limitation of program header size
Yin Fengwei [Thu, 17 Jul 2025 11:01:08 +0000 (19:01 +0800)] 
binfmt_elf: remove the 4k limitation of program header size

We have assembly code generated by a script. GCC successfully compiles
it. However, the kernel cannot load it on an ARM64 platform with a 4K
page size. In contrast, the same ELF file loads correctly on the same
platform with a 64K page size.

The root cause is the Linux kernel's ELF_MIN_ALIGN limitation on the
program headers of ELF files. The ELF file contains 78 program headers
(the script inserts many holes when generating the assembly code). On
ARM64 with a 4K page size, the ELF_MIN_ALLIGN enforces a maximum of 74
program headers, causing the ELF file to fail. However, with a 64K page
size, the ELF_MIN_ALIGN is relaxed to over 1,184 program headers, allowing
the file to run correctly.

Cook kindly identified[1] that this limitation was introduced in
Linux-0.99.15f without an explanation for its purpose.

The ELF specification does not impose such a restriction on program
headers. Removing the ELF_MIN_ALIGN limitation on program headers to
align with the ELF spec. After removing ELF_MIN_ALIGN limitation,
64K size limitation still exist which should be sufficient.

Suggested-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/linux-mm/202506270854.A729825@keescook/
Signed-off-by: Yin Fengwei <fengwei_yin@linux.alibaba.com>
Link: https://lore.kernel.org/r/20250717110108.55586-1-fengwei_yin@linux.alibaba.com
Signed-off-by: Kees Cook <kees@kernel.org>
7 weeks agoselftests: net: prevent Python from buffering the output
Jakub Kicinski [Wed, 16 Jul 2025 20:57:12 +0000 (13:57 -0700)] 
selftests: net: prevent Python from buffering the output

Make sure Python doesn't buffer the output, otherwise for some
tests we may see false positive timeouts in NIPA. NIPA thinks that
a machine has hung if the test doesn't print anything for 3min.
This is also nice to heave for running the tests manually,
especially in vng.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20250716205712.1787325-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch 'neighbour-convert-rtm_getneigh-to-rcu-and-make-pneigh-rtnl-free'
Jakub Kicinski [Thu, 17 Jul 2025 23:25:23 +0000 (16:25 -0700)] 
Merge branch 'neighbour-convert-rtm_getneigh-to-rcu-and-make-pneigh-rtnl-free'

Kuniyuki Iwashima says:

====================
neighbour: Convert RTM_GETNEIGH to RCU and make pneigh RTNL-free.

This is kind of v3 of the series below [0] but without NEIGHTBL patches.

Patch 1 ~ 4 and 9 come from the series to convert RTM_GETNEIGH to RCU.

Other patches clean up pneigh_lookup() and convert the pneigh code to
RCU + private mutex so that we can easily remove RTNL from RTM_NEWNEIGH
in the later series.

[0]: https://lore.kernel.org/netdev/20250418012727.57033-1-kuniyu@amazon.com/

v2: https://lore.kernel.org/20250712203515.4099110-1-kuniyu@google.com
v1: https://lore.kernel.org/20250711191007.3591938-1-kuniyu@google.com
====================

Link: https://patch.msgid.link/20250716221221.442239-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoneighbour: Update pneigh_entry in pneigh_create().
Kuniyuki Iwashima [Wed, 16 Jul 2025 22:08:20 +0000 (22:08 +0000)] 
neighbour: Update pneigh_entry in pneigh_create().

neigh_add() updates pneigh_entry() found or created by pneigh_create().

This update is serialised by RTNL, but we will remove it.

Let's move the update part to pneigh_create() and make it return errno
instead of a pointer of pneigh_entry.

Now, the pneigh code is RTNL free.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-16-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoneighbour: Protect tbl->phash_buckets[] with a dedicated mutex.
Kuniyuki Iwashima [Wed, 16 Jul 2025 22:08:19 +0000 (22:08 +0000)] 
neighbour: Protect tbl->phash_buckets[] with a dedicated mutex.

tbl->phash_buckets[] is only modified in the slow path by pneigh_create()
and pneigh_delete() under the table lock.

Both of them are called under RTNL, so no extra lock is needed, but we
will remove RTNL from the paths.

pneigh_create() looks up a pneigh_entry, and this part can be lockless,
but it would complicate the logic like

  1. lookup
  2. allocate pengih_entry for GFP_KERNEL
  3. lookup again but under lock
  4. if found, return it after freeing the allocated memory
  5. else, return the new one

Instead, let's add a per-table mutex and run lookup and allocation
under it.

Note that updating pneigh_entry part in neigh_add() is still protected
by RTNL and will be moved to pneigh_create() in the next patch.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-15-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoneighbour: Drop read_lock_bh(&tbl->lock) in pneigh_lookup().
Kuniyuki Iwashima [Wed, 16 Jul 2025 22:08:18 +0000 (22:08 +0000)] 
neighbour: Drop read_lock_bh(&tbl->lock) in pneigh_lookup().

Now, all callers of pneigh_lookup() are under RCU, and the read
lock there is no longer needed.

Let's drop the lock, inline __pneigh_lookup_1() to pneigh_lookup(),
and call it from pneigh_create().

The next patch will remove tbl->lock from pneigh_create().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-14-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoneighbour: Remove __pneigh_lookup().
Kuniyuki Iwashima [Wed, 16 Jul 2025 22:08:17 +0000 (22:08 +0000)] 
neighbour: Remove __pneigh_lookup().

__pneigh_lookup() is the lockless version of pneigh_lookup(),
but its only caller pndisc_is_router() holds the table lock and
reads pneigh_netry.flags.

This is because accessing pneigh_entry after pneigh_lookup() was
illegal unless the caller holds RTNL or the table lock.

Now, pneigh_entry is guaranteed to be alive during the RCU critical
section.

Let's call pneigh_lookup() and use READ_ONCE() for n->flags in
pndisc_is_router() and remove __pneigh_lookup().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-13-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoneighbour: Use rcu_dereference() in pneigh_get_{first,next}().
Kuniyuki Iwashima [Wed, 16 Jul 2025 22:08:16 +0000 (22:08 +0000)] 
neighbour: Use rcu_dereference() in pneigh_get_{first,next}().

Now pneigh_entry is guaranteed to be alive during the
RCU critical section even without holding tbl->lock.

Let's use rcu_dereference() in pneigh_get_{first,next}().

Note that neigh_seq_start() still holds tbl->lock for the
normal neighbour entry.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-12-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoneighbour: Drop read_lock_bh(&tbl->lock) in pneigh_dump_table().
Kuniyuki Iwashima [Wed, 16 Jul 2025 22:08:15 +0000 (22:08 +0000)] 
neighbour: Drop read_lock_bh(&tbl->lock) in pneigh_dump_table().

Now pneigh_entry is guaranteed to be alive during the
RCU critical section even without holding tbl->lock.

Let's drop read_lock_bh(&tbl->lock) and use rcu_dereference()
to iterate tbl->phash_buckets[] in pneigh_dump_table()

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-11-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoneighbour: Convert RTM_GETNEIGH to RCU.
Kuniyuki Iwashima [Wed, 16 Jul 2025 22:08:14 +0000 (22:08 +0000)] 
neighbour: Convert RTM_GETNEIGH to RCU.

Only __dev_get_by_index() is the RTNL dependant in neigh_get().

Let's replace it with dev_get_by_index_rcu() and convert RTM_GETNEIGH
to RCU.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-10-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoneighbour: Annotate access to struct pneigh_entry.{flags,protocol}.
Kuniyuki Iwashima [Wed, 16 Jul 2025 22:08:13 +0000 (22:08 +0000)] 
neighbour: Annotate access to struct pneigh_entry.{flags,protocol}.

We will convert pneigh readers to RCU, and its flags and protocol
will be read locklessly.

Let's annotate the access to the two fields.

Note that all access to pn->permanent is under RTNL (neigh_add()
and pneigh_ifdown_and_unlock()), so WRITE_ONCE() and READ_ONCE()
are not needed.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-9-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoneighbour: Free pneigh_entry after RCU grace period.
Kuniyuki Iwashima [Wed, 16 Jul 2025 22:08:12 +0000 (22:08 +0000)] 
neighbour: Free pneigh_entry after RCU grace period.

We will convert RTM_GETNEIGH to RCU.

neigh_get() looks up pneigh_entry by pneigh_lookup() and passes
it to pneigh_fill_info().

Then, we must ensure that the entry is alive till pneigh_fill_info()
completes, but read_lock_bh(&tbl->lock) in pneigh_lookup() does not
guarantee that.

Also, we will convert all readers of tbl->phash_buckets[] to RCU.

Let's use call_rcu() to free pneigh_entry and update phash_buckets[]
and ->next by rcu_assign_pointer().

pneigh_ifdown_and_unlock() uses list_head to avoid overwriting
->next and moving RCU iterators to another list.

pndisc_destructor() (only IPv6 ndisc uses this) uses a mutex, so it
is not delayed to call_rcu(), where we cannot sleep.  This is fine
because the mcast code works with RCU and ipv6_dev_mc_dec() frees
mcast objects after RCU grace period.

While at it, we change the return type of pneigh_ifdown_and_unlock()
to void.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-8-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoneighbour: Annotate neigh_table.phash_buckets and pneigh_entry.next with __rcu.
Kuniyuki Iwashima [Wed, 16 Jul 2025 22:08:11 +0000 (22:08 +0000)] 
neighbour: Annotate neigh_table.phash_buckets and pneigh_entry.next with __rcu.

The next patch will free pneigh_entry with call_rcu().

Then, we need to annotate neigh_table.phash_buckets[] and
pneigh_entry.next with __rcu.

To make the next patch cleaner, let's annotate the fields in advance.

Currently, all accesses to the fields are under the neigh table lock,
so rcu_dereference_protected() is used with 1 for now, but most of them
(except in pneigh_delete() and pneigh_ifdown_and_unlock()) will be
replaced with rcu_dereference() and rcu_dereference_check().

Note that pneigh_ifdown_and_unlock() changes pneigh_entry.next to a
local list, which is illegal because the RCU iterator could be moved
to another list.  This part will be fixed in the next patch.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-7-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoneighbour: Split pneigh_lookup().
Kuniyuki Iwashima [Wed, 16 Jul 2025 22:08:10 +0000 (22:08 +0000)] 
neighbour: Split pneigh_lookup().

pneigh_lookup() has ASSERT_RTNL() in the middle of the function, which
is confusing.

When called with the last argument, creat, 0, pneigh_lookup() literally
looks up a proxy neighbour entry.  This is the case of the reader path
as the fast path and RTM_GETNEIGH.

pneigh_lookup(), however, creates a pneigh_entry when called with creat 1
from RTM_NEWNEIGH and SIOCSARP, which require RTNL.

Let's split pneigh_lookup() into two functions.

We will convert all the reader paths to RCU, and read_lock_bh(&tbl->lock)
in the new pneigh_lookup() will be dropped.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-6-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoneighbour: Move neigh_find_table() to neigh_get().
Kuniyuki Iwashima [Wed, 16 Jul 2025 22:08:09 +0000 (22:08 +0000)] 
neighbour: Move neigh_find_table() to neigh_get().

neigh_valid_get_req() calls neigh_find_table() to fetch neigh_tables[].

neigh_find_table() uses rcu_dereference_rtnl(), but RTNL actually does
not protect it at all; neigh_table_clear() can be called without RTNL
and only waits for RCU readers by synchronize_rcu().

Fortunately, there is no bug because IPv4 is built-in, IPv6 cannot be
unloaded, and DECNET was removed.

To fetch neigh_tables[] by rcu_dereference() later, let's move
neigh_find_table() from neigh_valid_get_req() to neigh_get().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-5-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoneighbour: Allocate skb in neigh_get().
Kuniyuki Iwashima [Wed, 16 Jul 2025 22:08:08 +0000 (22:08 +0000)] 
neighbour: Allocate skb in neigh_get().

We will remove RTNL for neigh_get() and run it under RCU instead.

neigh_get_reply() and pneigh_get_reply() allocate skb with GFP_KERNEL.

Let's move the allocation before __dev_get_by_index() in neigh_get().

Now, neigh_get_reply() and pneigh_get_reply() are inlined and
rtnl_unicast() is factorised.

We will convert pneigh_lookup() to __pneigh_lookup() later.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-4-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoneighbour: Move two validations from neigh_get() to neigh_valid_get_req().
Kuniyuki Iwashima [Wed, 16 Jul 2025 22:08:07 +0000 (22:08 +0000)] 
neighbour: Move two validations from neigh_get() to neigh_valid_get_req().

We will remove RTNL for neigh_get() and run it under RCU instead.

neigh_get() returns -EINVAL in the following cases:

  * NDA_DST is not specified
  * Both ndm->ndm_ifindex and NTF_PROXY are not specified

These validations do not require RCU.

Let's move them to neigh_valid_get_req().

While at it, the extack string for the first case is replaced with
NL_SET_ERR_ATTR_MISS().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoneighbour: Make neigh_valid_get_req() return ndmsg.
Kuniyuki Iwashima [Wed, 16 Jul 2025 22:08:06 +0000 (22:08 +0000)] 
neighbour: Make neigh_valid_get_req() return ndmsg.

neigh_get() passes 4 local variable pointers to neigh_valid_get_req().

If it returns a pointer of struct ndmsg, we do not need to pass two
of them.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agodrm/mediatek: mtk_dpi: Reorder output formats on MT8195/88
Louis-Alexis Eyraud [Fri, 6 Jun 2025 12:50:12 +0000 (14:50 +0200)] 
drm/mediatek: mtk_dpi: Reorder output formats on MT8195/88

Reorder output format arrays in both MT8195 DPI and DP_INTF block
configuration by decreasing preference order instead of alphanumeric
one, as expected by the atomic_get_output_bus_fmts callback function
of drm_bridge controls, so the RGB ones are used first during the
bus format negotiation process.

Fixes: 20fa6a8fc588 ("drm/mediatek: mtk_dpi: Allow additional output formats on MT8195/88")
Signed-off-by: Louis-Alexis Eyraud <louisalexis.eyraud@collabora.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: CK Hu <ck.hu@mediatek.com>
Link: https://patchwork.kernel.org/project/linux-mediatek/patch/20250606-mtk_dpi-mt8195-fix-wrong-color-v1-1-47988101b798@collabora.com/
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
7 weeks agodrm/mediatek: only announce AFBC if really supported
Icenowy Zheng [Sat, 31 May 2025 12:11:40 +0000 (20:11 +0800)] 
drm/mediatek: only announce AFBC if really supported

Currently even the SoC's OVL does not declare the support of AFBC, AFBC
is still announced to the userspace within the IN_FORMATS blob, which
breaks modern Wayland compositors like KWin Wayland and others.

Gate passing modifiers to drm_universal_plane_init() behind querying the
driver of the hardware block for AFBC support.

Fixes: c410fa9b07c3 ("drm/mediatek: Add AFBC support to Mediatek DRM driver")
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: CK Hu <ck.hu@medaitek.com>
Link: https://patchwork.kernel.org/project/linux-mediatek/patch/20250531121140.387661-1-uwu@icenowy.me/
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
7 weeks agodrm/mediatek: Add wait_event_timeout when disabling plane
Jason-JH Lin [Tue, 24 Jun 2025 11:31:41 +0000 (19:31 +0800)] 
drm/mediatek: Add wait_event_timeout when disabling plane

Our hardware registers are set through GCE, not by the CPU.
DRM might assume the hardware is disabled immediately after calling
atomic_disable() of drm_plane, but it is only truly disabled after the
GCE IRQ is triggered.

Additionally, the cursor plane in DRM uses async_commit, so DRM will
not wait for vblank and will free the buffer immediately after calling
atomic_disable().

To prevent the framebuffer from being freed before the layer disable
settings are configured into the hardware, which can cause an IOMMU
fault error, a wait_event_timeout has been added to wait for the
ddp_cmdq_cb() callback,indicating that the GCE IRQ has been triggered.

Fixes: 2f965be7f900 ("drm/mediatek: apply CMDQ control flow")
Signed-off-by: Jason-JH Lin <jason-jh.lin@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: CK Hu <ck.hu@mediatek.com>
Link: https://patchwork.kernel.org/project/linux-mediatek/patch/20250624113223.443274-1-jason-jh.lin@mediatek.com/
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
7 weeks agoMerge branch 'ethtool-rss-support-rss_set-via-netlink'
Jakub Kicinski [Thu, 17 Jul 2025 23:14:01 +0000 (16:14 -0700)] 
Merge branch 'ethtool-rss-support-rss_set-via-netlink'

Jakub Kicinski says:

====================
ethtool: rss: support RSS_SET via Netlink

Support configuring RSS settings via Netlink.
Creating and removing contexts remains for the following series.

v2: https://lore.kernel.org/20250714222729.743282-1-kuba@kernel.org
v1: https://lore.kernel.org/20250711015303.3688717-1-kuba@kernel.org
====================

Link: https://patch.msgid.link/20250716000331.1378807-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoselftests: drv-net: rss_api: test input-xfrm and hash fields
Jakub Kicinski [Wed, 16 Jul 2025 00:03:31 +0000 (17:03 -0700)] 
selftests: drv-net: rss_api: test input-xfrm and hash fields

Test configuring input-xfrm and hash fields with all the limitations.
Tested on mlx5 (CX6):

  # ./ksft-net-drv/drivers/net/hw/rss_api.py
  TAP version 13
  1..10
  ok 1 rss_api.test_rxfh_nl_set_fail
  ok 2 rss_api.test_rxfh_nl_set_indir
  ok 3 rss_api.test_rxfh_nl_set_indir_ctx
  ok 4 rss_api.test_rxfh_indir_ntf
  ok 5 rss_api.test_rxfh_indir_ctx_ntf
  ok 6 rss_api.test_rxfh_nl_set_key
  ok 7 rss_api.test_rxfh_fields
  ok 8 rss_api.test_rxfh_fields_set
  ok 9 rss_api.test_rxfh_fields_set_xfrm
  ok 10 rss_api.test_rxfh_fields_ntf
  # Totals: pass:10 fail:0 xfail:0 xpass:0 skip:0 error:0

Link: https://patch.msgid.link/20250716000331.1378807-12-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoethtool: rss: support setting flow hashing fields
Jakub Kicinski [Wed, 16 Jul 2025 00:03:30 +0000 (17:03 -0700)] 
ethtool: rss: support setting flow hashing fields

Add support for ETHTOOL_SRXFH (setting hashing fields) in RSS_SET.

The tricky part is dealing with symmetric hashing. In netlink user
can change the hashing fields and symmetric hash in one request,
in IOCTL the two used to be set via different uAPI requests.
Since fields and hash function config are still separate driver
callbacks - changes to the two are not atomic. Keep things simple
and validate the settings against both pre- and post- change ones.
Meaning that we will reject the config request if user tries
to correct the flow fields and set input_xfrm in one request,
or disables input_xfrm and makes flow fields non-symmetric.

We can adjust it later if there's a real need. Starting simple feels
right, and potentially partially applying the settings isn't nice,
either.

Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250716000331.1378807-11-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoethtool: rss: support setting input-xfrm via Netlink
Jakub Kicinski [Wed, 16 Jul 2025 00:03:29 +0000 (17:03 -0700)] 
ethtool: rss: support setting input-xfrm via Netlink

Support configuring symmetric hashing via Netlink.
We have the flow field config prepared as part of SET handling,
so scan it for conflicts instead of querying the driver again.

Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250716000331.1378807-10-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonetlink: specs: define input-xfrm enum in the spec
Jakub Kicinski [Wed, 16 Jul 2025 00:03:28 +0000 (17:03 -0700)] 
netlink: specs: define input-xfrm enum in the spec

Help YNL decode the values for input-xfrm by defining
the possible values in the spec. Don't define "no change"
as it's an IOCTL artifact with no use in Netlink.

With this change on mlx5 input-xfrm gets decoded:

 # ynl --family ethtool --dump rss-get
 [{'header': {'dev-index': 2, 'dev-name': 'eth0'},
   'hfunc': 1,
   'hkey': b'V\xa8\xf9\x9 ...',
   'indir': [0, 1, ... ],
   'input-xfrm': {'sym-or-xor'},                         <<<
   'flow-hash': {'ah4': {'ip-dst', 'ip-src'},
                 'ah6': {'ip-dst', 'ip-src'},
                 'esp4': {'ip-dst', 'ip-src'},
                 'esp6': {'ip-dst', 'ip-src'},
                 'ip4': {'ip-dst', 'ip-src'},
                 'ip6': {'ip-dst', 'ip-src'},
                 'tcp4': {'l4-b-0-1', 'ip-dst', 'l4-b-2-3', 'ip-src'},
                 'tcp6': {'l4-b-0-1', 'ip-dst', 'l4-b-2-3', 'ip-src'},
                 'udp4': {'l4-b-0-1', 'ip-dst', 'l4-b-2-3', 'ip-src'},
                 'udp6': {'l4-b-0-1', 'ip-dst', 'l4-b-2-3', 'ip-src'}}
 }]

Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250716000331.1378807-9-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoselftests: drv-net: rss_api: test setting hashing key via Netlink
Jakub Kicinski [Wed, 16 Jul 2025 00:03:27 +0000 (17:03 -0700)] 
selftests: drv-net: rss_api: test setting hashing key via Netlink

Test setting hashing key via Netlink.

  # ./tools/testing/selftests/drivers/net/hw/rss_api.py
  TAP version 13
  1..7
  ok 1 rss_api.test_rxfh_nl_set_fail
  ok 2 rss_api.test_rxfh_nl_set_indir
  ok 3 rss_api.test_rxfh_nl_set_indir_ctx
  ok 4 rss_api.test_rxfh_indir_ntf
  ok 5 rss_api.test_rxfh_indir_ctx_ntf
  ok 6 rss_api.test_rxfh_nl_set_key
  ok 7 rss_api.test_rxfh_fields
  # Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0

Link: https://patch.msgid.link/20250716000331.1378807-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoethtool: rss: support setting hkey via Netlink
Jakub Kicinski [Wed, 16 Jul 2025 00:03:26 +0000 (17:03 -0700)] 
ethtool: rss: support setting hkey via Netlink

Support setting RSS hashing key via ethtool Netlink.
Use the Netlink policy to make sure user doesn't pass
an empty key, "resetting" the key is not a thing.

Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250716000331.1378807-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoethtool: rss: support setting hfunc via Netlink
Jakub Kicinski [Wed, 16 Jul 2025 00:03:25 +0000 (17:03 -0700)] 
ethtool: rss: support setting hfunc via Netlink

Support setting RSS hash function / algo via ethtool Netlink.
Like IOCTL we don't validate that the function is within the
range known to the kernel. The drivers do a pretty good job
validating the inputs, and the IDs are technically "dynamically
queried" rather than part of uAPI.

Only change should be that in Netlink we don't support user
explicitly passing ETH_RSS_HASH_NO_CHANGE (0), if no change
is requested the attribute should be absent.

The ETH_RSS_HASH_NO_CHANGE is retained in driver-facing
API for consistency (not that I see a strong reason for it).

Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250716000331.1378807-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoselftests: drv-net: rss_api: test setting indirection table via Netlink
Jakub Kicinski [Wed, 16 Jul 2025 00:03:24 +0000 (17:03 -0700)] 
selftests: drv-net: rss_api: test setting indirection table via Netlink

Test setting indirection table via Netlink.

  # ./tools/testing/selftests/drivers/net/hw/rss_api.py
  TAP version 13
  1..6
  ok 1 rss_api.test_rxfh_nl_set_fail
  ok 2 rss_api.test_rxfh_nl_set_indir
  ok 3 rss_api.test_rxfh_nl_set_indir_ctx
  ok 4 rss_api.test_rxfh_indir_ntf
  ok 5 rss_api.test_rxfh_indir_ctx_ntf
  ok 6 rss_api.test_rxfh_fields
  # Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0

Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20250716000331.1378807-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agotools: ynl: support packing binary arrays of scalars
Jakub Kicinski [Wed, 16 Jul 2025 00:03:23 +0000 (17:03 -0700)] 
tools: ynl: support packing binary arrays of scalars

We support decoding a binary type with a scalar subtype already,
add support for sending such arrays to the kernel. While at it
also support using "None" to indicate that the binary attribute
should be empty. I couldn't decide whether empty binary should
be [] or None, but there should be no harm in supporting both.

Link: https://patch.msgid.link/20250716000331.1378807-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoselftests: drv-net: rss_api: factor out checking min queue count
Jakub Kicinski [Wed, 16 Jul 2025 00:03:22 +0000 (17:03 -0700)] 
selftests: drv-net: rss_api: factor out checking min queue count

Multiple tests check min queue count, create a helper.

Link: https://patch.msgid.link/20250716000331.1378807-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoethtool: rss: initial RSS_SET (indirection table handling)
Jakub Kicinski [Wed, 16 Jul 2025 00:03:21 +0000 (17:03 -0700)] 
ethtool: rss: initial RSS_SET (indirection table handling)

Add initial support for RSS_SET, for now only operations on
the indirection table are supported.

Unlike the ioctl don't check if at least one parameter is
being changed. This is how other ethtool-nl ops behave,
so pick the ethtool-nl consistency vs copying ioctl behavior.

There are two special cases here:
 1) resetting the table to defaults;
 2) support for tables of different size.

For (1) I use an empty Netlink attribute (array of size 0).

(2) may require some background. AFAICT a lot of modern devices
allow allocating RSS tables of different sizes. mlx5 can upsize
its tables, bnxt has some "table size calculation", and Intel
folks asked about RSS table sizing in context of resource allocation
in the past. The ethtool IOCTL API has a concept of table size,
but right now the user is expected to provide a table exactly
the size the device requests. Some drivers may change the table
size at runtime (in response to queue count changes) but the
user is not in control of this. What's not great is that all
RSS contexts share the same table size. For example a device
with 128 queues enabled, 16 RSS contexts 8 queues in each will
likely have 256 entry tables for each of the 16 contexts,
while 32 would be more than enough given each context only has
8 queues. To address this the Netlink API should avoid enforcing
table size at the uAPI level, and should allow the user to express
the min table size they expect.

To fully solve (2) we will need more driver plumbing but
at the uAPI level this patch allows the user to specify
a table size smaller than what the device advertises. The device
table size must be a multiple of the user requested table size.
We then replicate the user-provided table to fill the full device
size table. This addresses the "allow the user to express the min
table size" objective, while not enforcing any fixed size.
From Netlink perspective .get_rxfh_indir_size() is now de facto
the "max" table size supported by the device.

We may choose to support table replication in ethtool, too,
when we actually plumb this thru the device APIs.

Initially I was considering moving full pattern generation
to the kernel (which queues to use, at which frequency and
what min sequence length). I don't think this complexity
would buy us much and most if not all devices have pow-2
table sizes, which simplifies the replication a lot.

Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250716000331.1378807-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agodocs: document linked lists
Nicolas Frattaroli [Mon, 14 Jul 2025 08:14:47 +0000 (10:14 +0200)] 
docs: document linked lists

Add example-driven documentation for the kernel's generic linked list
data structure. This includes discussion of situations where linked
lists are likely inappropriate, and references to further reading.

Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Link: https://lore.kernel.org/r/20250714-linked-list-docs-v3-1-56c461580866@collabora.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
7 weeks agoscripts: kdoc: make it backward-compatible with Python 3.7
Mauro Carvalho Chehab [Fri, 11 Jul 2025 07:27:09 +0000 (09:27 +0200)] 
scripts: kdoc: make it backward-compatible with Python 3.7

There was a change at kdoc that ended breaking compatibility
with Python 3.7: str.removesuffix() was introduced on version
3.9.

Restore backward compatibility.

Reported-by: Akira Yokosawa <akiyks@gmail.com>
Closes: https://lore.kernel.org/linux-doc/57be9f77-9a94-4cde-aacb-184cae111506@gmail.com/
Fixes: 27ad33b6b349 ("kernel-doc: Fix symbol matching for dropped suffixes")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/d13058d285838ac2bc04c492e60531c013a8a919.1752218291.git.mchehab+huawei@kernel.org
7 weeks agodocs: kernel-doc: emit warnings for ancient versions of Python
Mauro Carvalho Chehab [Fri, 11 Jul 2025 07:27:08 +0000 (09:27 +0200)] 
docs: kernel-doc: emit warnings for ancient versions of Python

Kernel-doc requires at least version 3.6 to run, as it uses f-string.
Yet, Kernel build currently calls kernel-doc with -none on some places.
Better not to bail out when older versions are found.

Versions of Python prior to 3.7 do not guarantee to remember the insertion
order of dicts; since kernel-doc depends on that guarantee, running with
such older versions could result in output with reordered sections.

Check Python version when called via command line.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/7d7fa3a3aa1fafa0cc9ea29c889de4c7d377dca6.1752218291.git.mchehab+huawei@kernel.org
7 weeks agoDocumentation/rtla: Describe exit status
Costa Shulyupin [Sun, 8 Jun 2025 10:55:30 +0000 (13:55 +0300)] 
Documentation/rtla: Describe exit status

Commit 18682166f61494072d58 ("rtla: Set distinctive exit value for failed
tests") expands exit status making it useful.

Add section 'EXIT STATUS' and required SPDX-License-Identifier
to the documentation.

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
[jc: fixed sphinx error caused by missing blank line]
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20250608105531.758809-2-costa.shul@redhat.com
7 weeks agoDocumentation/rtla: Add include common_appendix.rst
Costa Shulyupin [Sun, 8 Jun 2025 10:44:36 +0000 (13:44 +0300)] 
Documentation/rtla: Add include common_appendix.rst

Add include common_appendix.rst into
Documentation/tools/rtla/rtla-timerlat-hist.rst - the only file of
rtla-*.rst still without common_appendix.rst.

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20250608104437.753708-2-costa.shul@redhat.com
7 weeks agoALSA: hda: Use pci_is_display()
Mario Limonciello [Thu, 17 Jul 2025 17:38:08 +0000 (12:38 -0500)] 
ALSA: hda: Use pci_is_display()

The inline pci_is_display() helper does the same thing.  Use it.

Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Daniel Dadap <ddadap@nvidia.com>
Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch>
Link: https://patch.msgid.link/20250717173812.3633478-6-superm1@kernel.org
7 weeks agoiommu/vt-d: Use pci_is_display()
Mario Limonciello [Thu, 17 Jul 2025 17:38:07 +0000 (12:38 -0500)] 
iommu/vt-d: Use pci_is_display()

The inline pci_is_display() helper does the same thing.  Use it.

Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Daniel Dadap <ddadap@nvidia.com>
Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch>
Link: https://patch.msgid.link/20250717173812.3633478-5-superm1@kernel.org
7 weeks agovga_switcheroo: Use pci_is_display()
Mario Limonciello [Thu, 17 Jul 2025 17:38:06 +0000 (12:38 -0500)] 
vga_switcheroo: Use pci_is_display()

The inline pci_is_display() helper does the same thing.  Use it.

Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Daniel Dadap <ddadap@nvidia.com>
Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch>
Link: https://patch.msgid.link/20250717173812.3633478-4-superm1@kernel.org
7 weeks agovfio/pci: Use pci_is_display()
Mario Limonciello [Thu, 17 Jul 2025 17:38:05 +0000 (12:38 -0500)] 
vfio/pci: Use pci_is_display()

The inline pci_is_display() helper does the same thing.  Use it.

Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Daniel Dadap <ddadap@nvidia.com>
Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://patch.msgid.link/20250717173812.3633478-3-superm1@kernel.org
7 weeks agoPCI: Add pci_is_display() to check if device is a display controller
Mario Limonciello [Thu, 17 Jul 2025 17:38:04 +0000 (12:38 -0500)] 
PCI: Add pci_is_display() to check if device is a display controller

Several places in the kernel do class shifting to match whether a PCI
device is display class.  Add pci_is_display() for those places to use.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Daniel Dadap <ddadap@nvidia.com>
Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch>
Link: https://patch.msgid.link/20250717173812.3633478-2-superm1@kernel.org
7 weeks agodocs: kernel: Clarify printk_ratelimit_burst reset behavior
Breno Leitao [Mon, 14 Jul 2025 12:06:27 +0000 (05:06 -0700)] 
docs: kernel: Clarify printk_ratelimit_burst reset behavior

Add clarification that the printk_ratelimit_burst window resets after
printk_ratelimit seconds have elapsed, allowing another burst of
messages to be sent. This helps users understand that the rate limiting
is not permanent but operates in periodic windows.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20250714-docs_ratelimit-v1-1-51a6d9071f1a@debian.org
7 weeks agosmp: Document preemption and stop_machine() mutual exclusion
Joel Fernandes [Tue, 15 Jul 2025 20:01:51 +0000 (16:01 -0400)] 
smp: Document preemption and stop_machine() mutual exclusion

Recently while revising RCU's cpu online checks, there was some discussion
around how IPIs synchronize with hotplug.

Add comments explaining how preemption disable creates mutual exclusion with
CPU hotplug's stop_machine mechanism. The key insight is that stop_machine()
atomically updates CPU masks and flushes IPIs with interrupts disabled, and
cannot proceed while any CPU (including the IPI sender) has preemption
disabled.

[ Apply peterz feedback. ]

Cc: Andrea Righi <arighi@nvidia.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: rcu@vger.kernel.org
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Co-developed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
7 weeks agostop_machine: Improve kernel-doc function-header comments
Paul E. McKenney [Tue, 8 Jul 2025 22:47:29 +0000 (15:47 -0700)] 
stop_machine: Improve kernel-doc function-header comments

Add more detail to the kernel-doc function-header comments for
stop_machine(), stop_machine_cpuslocked(), and stop_core_cpuslocked().

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
7 weeks agonet/mlx5e: TX, Fix dma unmapping for devmem tx
Dragos Tatulea [Wed, 16 Jul 2025 07:00:42 +0000 (10:00 +0300)] 
net/mlx5e: TX, Fix dma unmapping for devmem tx

net_iovs should have the dma address set to 0 so that
netmem_dma_unmap_page_attrs() correctly skips the unmap. This was
not done in mlx5 when support for devmem tx was added and resulted
in the splat below when the platform iommu was enabled.

This patch addresses the issue by using netmem_dma_unmap_addr_set()
which handles the net_iov case when setting the dma address. A small
refactoring of mlx5e_dma_push() was required to be able to use this API.
The function was split in two versions and each version called
accordingly. Note that netmem_dma_unmap_addr_set() introduces an
additional if case.

Splat:
  WARNING: CPU: 14 PID: 2587 at drivers/iommu/dma-iommu.c:1228 iommu_dma_unmap_page+0x7d/0x90
  Modules linked in: [...]
  Unloaded tainted modules: i10nm_edac(E):1 fjes(E):1
  CPU: 14 UID: 0 PID: 2587 Comm: ncdevmem Tainted: G S          E       6.15.0+ #3 PREEMPT(voluntary)
  Tainted: [S]=CPU_OUT_OF_SPEC, [E]=UNSIGNED_MODULE
  Hardware name: HPE ProLiant DL380 Gen10 Plus/ProLiant DL380 Gen10 Plus, BIOS U46 06/01/2022
  RIP: 0010:iommu_dma_unmap_page+0x7d/0x90
  Code: [...]
  RSP: 0000:ff6b1e3ea0b2fc58 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ff46ef2d0a2340c8 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
  RBP: 0000000000000001 R08: 0000000000000000 R09: ffffffff8827a120
  R10: 0000000000000000 R11: 0000000000000000 R12: 00000000d8000000
  R13: 0000000000000008 R14: 0000000000000001 R15: 0000000000000000
  FS:  00007feb69adf740(0000) GS:ff46ef2c779f1000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007feb69cca000 CR3: 0000000154b97006 CR4: 0000000000773ef0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  PKRU: 55555554
  Call Trace:
   <TASK>
   dma_unmap_page_attrs+0x227/0x250
   mlx5e_poll_tx_cq+0x163/0x510 [mlx5_core]
   mlx5e_napi_poll+0x94/0x720 [mlx5_core]
   __napi_poll+0x28/0x1f0
   net_rx_action+0x33a/0x420
   ? mlx5e_completion_event+0x3d/0x40 [mlx5_core]
   handle_softirqs+0xe8/0x2f0
   __irq_exit_rcu+0xcd/0xf0
   common_interrupt+0x47/0xa0
   asm_common_interrupt+0x26/0x40
  RIP: 0033:0x7feb69cd08ec
  Code: [...]
  RSP: 002b:00007ffc01b8c880 EFLAGS: 00000246
  RAX: 00000000c3a60cf7 RBX: 0000000000045e12 RCX: 000000000000000e
  RDX: 00000000000035b4 RSI: 0000000000000000 RDI: 00007ffc01b8c8c0
  RBP: 00007ffc01b8c8b0 R08: 0000000000000000 R09: 0000000000000064
  R10: 00007ffc01b8c8c0 R11: 0000000000000000 R12: 00007feb69cca000
  R13: 00007ffc01b90e48 R14: 0000000000427e18 R15: 00007feb69d07000
   </TASK>

Cc: Mina Almasry <almasrymina@google.com>
Reported-by: Stanislav Fomichev <stfomichev@gmail.com>
Closes: https://lore.kernel.org/all/aFM6r9kFHeTdj-25@mini-arch/
Fixes: 5a842c288cfa ("net/mlx5e: Add TX support for netmems")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/1752649242-147678-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agobtf: Fix virt_to_phys() on arm64 when mmapping BTF
Lorenz Bauer [Thu, 17 Jul 2025 16:49:49 +0000 (17:49 +0100)] 
btf: Fix virt_to_phys() on arm64 when mmapping BTF

Breno Leitao reports that arm64 emits the following warning
with CONFIG_DEBUG_VIRTUAL:

    [   58.896157] virt_to_phys used for non-linear address: 000000009fea9737
      (__start_BTF+0x0/0x685530)
    [   23.988669] WARNING: CPU: 25 PID: 1442 at arch/arm64/mm/physaddr.c:15
      __virt_to_phys (arch/arm64/mm/physaddr.c:?)

        ...

    [   24.075371] Tainted: [E]=UNSIGNED_MODULE, [N]=TEST
    [   24.080276] Hardware name: Quanta S7GM 20S7GCU0010/S7G MB (CG1), BIOS 3D22
      07/03/2024
    [   24.088295] pstate: 63400009 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
    [   24.098440] pc : __virt_to_phys (arch/arm64/mm/physaddr.c:?)
    [   24.105398] lr : __virt_to_phys (arch/arm64/mm/physaddr.c:?)

...

    [   24.197257] Call trace:
    [   24.199761] __virt_to_phys (arch/arm64/mm/physaddr.c:?) (P)
    [   24.206883] btf_sysfs_vmlinux_mmap (kernel/bpf/sysfs_btf.c:27)
    [   24.214264] sysfs_kf_bin_mmap (fs/sysfs/file.c:179)
    [   24.218536] kernfs_fop_mmap (fs/kernfs/file.c:462)
    [   24.222461] mmap_region (./include/linux/fs.h:? mm/internal.h:167
       mm/vma.c:2405 mm/vma.c:2467 mm/vma.c:2622 mm/vma.c:2692)

It seems that the memory layout on arm64 maps the kernel image in vmalloc space
which is different than x86. This makes virt_to_phys emit the warning.

Fix this by translating the address using __pa_symbol as suggested by
Breno instead.

Reported-by: Breno Leitao <leitao@debian.org>
Closes: https://lore.kernel.org/bpf/g2gqhkunbu43awrofzqb4cs4sxkxg2i4eud6p4qziwrdh67q4g@mtw3d3aqfgmb/
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Tested-by: Breno Leitao <leitao@debian>
Fixes: a539e2a6d51d ("btf: Allow mmap of vmlinux btf")
Link: https://lore.kernel.org/r/20250717-vmlinux-mmap-pa-symbol-v1-1-970be6681158@isovalent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 weeks agoPM: hibernate: Fix up white space that does not follow coding style
Darshan Rathod [Wed, 16 Jul 2025 12:42:16 +0000 (12:42 +0000)] 
PM: hibernate: Fix up white space that does not follow coding style

Fix up white space usage that does not follow the kernel coding style
rules in several places in snapshot.c.

Signed-off-by: Darshan Rathod <darshanrathod475@gmail.com>
Link: https://patch.msgid.link/20250716124216.64329-1-darshanrathod475@gmail.com
[ rjw: New subject and changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
7 weeks agosched_ext: idle: Handle migration-disabled tasks in idle selection
Andrea Righi [Sat, 5 Jul 2025 05:43:51 +0000 (07:43 +0200)] 
sched_ext: idle: Handle migration-disabled tasks in idle selection

When SCX_OPS_ENQ_MIGRATION_DISABLED is enabled, migration-disabled tasks
are also routed to ops.enqueue(). A scheduler may attempt to dispatch
such tasks directly to an idle CPU using the default idle selection
policy via scx_bpf_select_cpu_and() or scx_bpf_select_cpu_dfl().

This scenario must be properly handled by the built-in idle policy to
avoid returning an idle CPU where the target task isn't allowed to run.
Otherwise, it can lead to errors such as:

 EXIT: runtime error (SCX_DSQ_LOCAL[_ON] cannot move migration disabled Chrome_ChildIOT[291646] from CPU 3 to 14)

Prevent this by explicitly handling migration-disabled tasks in the
built-in idle selection logic, maintaining their CPU affinity.

Fixes: a730e3f7a48bc ("sched_ext: idle: Consolidate default idle CPU selection kfuncs")
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
7 weeks agosched_ext: Fix scx_bpf_reenqueue_local() reference
Christian Loehle [Tue, 8 Jul 2025 16:12:51 +0000 (17:12 +0100)] 
sched_ext: Fix scx_bpf_reenqueue_local() reference

The comment mentions bpf_scx_reenqueue_local(), but the function
is provided for the BPF program implementing scx, as such the
naming convention is scx_bpf_reenqueue_local(), fix the comment.

Signed-off-by: Christian Loehle <christian.loehle@arm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
7 weeks agoworkqueue: Use atomic_try_cmpxchg_relaxed() in tryinc_node_nr_active()
Uros Bizjak [Wed, 9 Jul 2025 13:19:03 +0000 (15:19 +0200)] 
workqueue: Use atomic_try_cmpxchg_relaxed() in tryinc_node_nr_active()

Use try_cmpxchg() family of locking primitives instead of
cmpxchg(*ptr, old, new) == old.

The x86 CMPXCHG instruction returns success in the ZF flag, so this
change saves a compare after CMPXCHG (and related move instruction
in front of CMPXCHG).

Also, try_cmpxchg() implicitly assigns old *ptr value to "old" when
CMPXCHG fails. There is no need to re-read the value in the loop.

The generated assembly improves from:

     3f7: 44 8b 0a              mov    (%rdx),%r9d
     3fa: eb 12                 jmp    40e <...>
     3fc: 8d 79 01              lea    0x1(%rcx),%edi
     3ff: 89 c8                 mov    %ecx,%eax
     401: f0 0f b1 7a 04        lock cmpxchg %edi,0x4(%rdx)
     406: 39 c1                 cmp    %eax,%ecx
     408: 0f 84 83 00 00 00     je     491 <...>
     40e: 8b 4a 04              mov    0x4(%rdx),%ecx
     411: 41 39 c9              cmp    %ecx,%r9d
     414: 7f e6                 jg     3fc <...>

to:

    256b: 45 8b 08              mov    (%r8),%r9d
    256e: 41 8b 40 04           mov    0x4(%r8),%eax
    2572: 41 39 c1              cmp    %eax,%r9d
    2575: 7e 10                 jle    2587 <...>
    2577: 8d 78 01              lea    0x1(%rax),%edi
    257a: f0 41 0f b1 78 04     lock cmpxchg %edi,0x4(%r8)
    2580: 75 f0                 jne    2572 <...>

No functional change intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
7 weeks agoselftests/cgroup: fix cpu.max tests
Shashank Balaji [Fri, 4 Jul 2025 11:08:41 +0000 (20:08 +0900)] 
selftests/cgroup: fix cpu.max tests

Current cpu.max tests (both the normal one and the nested one) are broken.

They setup cpu.max with 1000 us quota and the default period (100,000 us).
A cpu hog is run for a duration of 1s as per wall clock time. This corresponds
to 10 periods, hence an expected usage of 10,000 us. We want the measured
usage (as per cpu.stat) to be close to 10,000 us.

Previously, this approximate equality test was done by
`!values_close(usage_usec, expected_usage_usec, 95)`: if the absolute
difference between usage_usec and expected_usage_usec is greater than 95% of
their sum, then we pass. And expected_usage_usec was set to 1,000,000 us.
Mathematically, this translates to the following being true for pass:

|usage - expected_usage| > (usage + expected_usage)*0.95

If usage > expected_usage:
usage - expected_usage > (usage + expected_usage)*0.95
0.05*usage > 1.95*expected_usage
usage > 39*expected_usage = 39s

If usage < expected_usage:
expected_usage - usage > (usage + expected_usage)*0.95
0.05*expected_usage > 1.95*usage
usage < 0.0256*expected_usage = 25,600 us

Combined,

Pass if usage < 25,600 us or > 39 s,

which makes no sense given that all we need is for usage_usec to be close to
10,000 us.

Fix this by explicitly calcuating the expected usage duration based on the
configured quota, default period, and the duration, and compare usage_usec
and expected_usage_usec using values_close() with a 10% error margin.

Also, use snprintf to get the quota string to write to cpu.max instead of
hardcoding the quota, ensuring a single source of truth.

Remove the check comparing user_usec and expected_usage_usec, since on running
this test modified with printfs, it's seen that user_usec and usage_usec can
regularly exceed the theoretical expected_usage_usec:

$ sudo ./test_cpu
user: 10485, usage: 10485, expected: 10000
ok 1 test_cpucg_max
user: 11127, usage: 11127, expected: 10000
ok 2 test_cpucg_max_nested
$ sudo ./test_cpu
user: 10286, usage: 10286, expected: 10000
ok 1 test_cpucg_max
user: 10404, usage: 11271, expected: 10000
ok 2 test_cpucg_max_nested

Hence, a values_close() check of usage_usec and expected_usage_usec is
sufficient.

Fixes: a79906570f9646ae17 ("cgroup: Add test_cpucg_max_nested() testcase")
Fixes: 889ab8113ef1386c57 ("cgroup: Add test_cpucg_max() testcase")
Acked-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
7 weeks agoMerge back earlier cpufreq material for 6.17-rc1
Rafael J. Wysocki [Thu, 17 Jul 2025 18:02:51 +0000 (20:02 +0200)] 
Merge back earlier cpufreq material for 6.17-rc1

7 weeks agoMerge tag 'amd-pstate-v6.17-2025-07-16' of ssh://gitolite.kernel.org/pub/scm/linux...
Rafael J. Wysocki [Thu, 17 Jul 2025 18:02:14 +0000 (20:02 +0200)] 
Merge tag 'amd-pstate-v6.17-2025-07-16' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux

Merge amd-pstate 6.17 content from Mario Limonciello:

"Documentation update"

* tag 'amd-pstate-v6.17-2025-07-16' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux:
  Documentation: amd-pstate:fix minimum performance state label error

7 weeks agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 17 Jul 2025 17:56:56 +0000 (10:56 -0700)] 
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-6.16-rc7).

Conflicts:

Documentation/netlink/specs/ovpn.yaml
  880d43ca9aa4 ("netlink: specs: clean up spaces in brackets")
  af52020fc599 ("ovpn: reject unexpected netlink attributes")

drivers/net/phy/phy_device.c
  a44312d58e78 ("net: phy: Don't register LEDs for genphy")
  f0f2b992d818 ("net: phy: Don't register LEDs for genphy")
https://lore.kernel.org/20250710114926.7ec3a64f@kernel.org

drivers/net/wireless/intel/iwlwifi/fw/regulatory.c
drivers/net/wireless/intel/iwlwifi/mld/regulatory.c
  5fde0fcbd760 ("wifi: iwlwifi: mask reserved bits in chan_state_active_bitmap")
  ea045a0de3b9 ("wifi: iwlwifi: add support for accepting raw DSM tables by firmware")

net/ipv6/mcast.c
  ae3264a25a46 ("ipv6: mcast: Delay put pmc->idev in mld_del_delrec()")
  a8594c956cc9 ("ipv6: mcast: Avoid a duplicate pointer check in mld_del_delrec()")
https://lore.kernel.org/8cc52891-3653-4b03-a45e-05464fe495cf@kernel.org

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoPM: sleep: Rearrange suspend/resume error handling in the core
Rafael J. Wysocki [Wed, 16 Jul 2025 19:31:23 +0000 (21:31 +0200)] 
PM: sleep: Rearrange suspend/resume error handling in the core

Notice that device_suspend_noirq(), device_suspend_late() and
device_suspend() all set async_error on errors, so they don't really
need to return a value.  Accordingly, make them all void and use
async_error in their callers instead of their return values.

Moreover, since async_error is updated concurrently without locking
during asynchronous suspend and resume processing, use READ_ONCE() and
WRITE_ONCE() for accessing it in those places to ensure that all of the
accesses will be carried out as expected.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Saravana Kannan <saravanak@google.com>
Link: https://patch.msgid.link/6198088.lOV4Wx5bFT@rjwysocki.net
7 weeks agoRevert "cgroup_freezer: cgroup_freezing: Check if not frozen"
Chen Ridong [Thu, 17 Jul 2025 08:55:50 +0000 (08:55 +0000)] 
Revert "cgroup_freezer: cgroup_freezing: Check if not frozen"

This reverts commit cff5f49d433fcd0063c8be7dd08fa5bf190c6c37.

Commit cff5f49d433f ("cgroup_freezer: cgroup_freezing: Check if not
frozen") modified the cgroup_freezing() logic to verify that the FROZEN
flag is not set, affecting the return value of the freezing() function,
in order to address a warning in __thaw_task.

A race condition exists that may allow tasks to escape being frozen. The
following scenario demonstrates this issue:

CPU 0 (get_signal path) CPU 1 (freezer.state reader)
try_to_freeze read freezer.state
__refrigerator freezer_read
update_if_frozen
WRITE_ONCE(current->__state, TASK_FROZEN);
...
/* Task is now marked frozen */
/* frozen(task) == true */
/* Assuming other tasks are frozen */
freezer->state |= CGROUP_FROZEN;
/* freezing(current) returns false */
/* because cgroup is frozen (not freezing) */
break out
__set_current_state(TASK_RUNNING);
/* Bug: Task resumes running when it should remain frozen */

The existing !frozen(p) check in __thaw_task makes the
WARN_ON_ONCE(freezing(p)) warning redundant. Removing this warning enables
reverting the commit cff5f49d433f ("cgroup_freezer: cgroup_freezing: Check
if not frozen") to resolve the issue.

The warning has been removed in the previous patch. This patch revert the
commit cff5f49d433f ("cgroup_freezer: cgroup_freezing: Check if not
frozen") to complete the fix.

Fixes: cff5f49d433f ("cgroup_freezer: cgroup_freezing: Check if not frozen")
Reported-by: Zhong Jiawei<zhongjiawei1@huawei.com>
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
7 weeks agosched,freezer: Remove unnecessary warning in __thaw_task
Chen Ridong [Thu, 17 Jul 2025 08:55:49 +0000 (08:55 +0000)] 
sched,freezer: Remove unnecessary warning in __thaw_task

Commit cff5f49d433f ("cgroup_freezer: cgroup_freezing: Check if not
frozen") modified the cgroup_freezing() logic to verify that the FROZEN
flag is not set, affecting the return value of the freezing() function,
in order to address a warning in __thaw_task.

A race condition exists that may allow tasks to escape being frozen. The
following scenario demonstrates this issue:

CPU 0 (get_signal path) CPU 1 (freezer.state reader)
try_to_freeze read freezer.state
__refrigerator freezer_read
update_if_frozen
WRITE_ONCE(current->__state, TASK_FROZEN);
...
/* Task is now marked frozen */
/* frozen(task) == true */
/* Assuming other tasks are frozen */
freezer->state |= CGROUP_FROZEN;
/* freezing(current) returns false */
/* because cgroup is frozen (not freezing) */
break out
__set_current_state(TASK_RUNNING);
/* Bug: Task resumes running when it should remain frozen */

The existing !frozen(p) check in __thaw_task makes the
WARN_ON_ONCE(freezing(p)) warning redundant. Removing this warning enables
reverting commit cff5f49d433f ("cgroup_freezer: cgroup_freezing: Check if
not frozen") to resolve the issue.

This patch removes the warning from __thaw_task. A subsequent patch will
revert commit cff5f49d433f ("cgroup_freezer: cgroup_freezing: Check if
not frozen") to complete the fix.

Reported-by: Zhong Jiawei<zhongjiawei1@huawei.com>
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
7 weeks agoMerge back earlier material related to system sleep
Rafael J. Wysocki [Thu, 17 Jul 2025 17:55:25 +0000 (19:55 +0200)] 
Merge back earlier material related to system sleep

7 weeks agocgroup: llist: avoid memory tears for llist_node
Shakeel Butt [Fri, 4 Jul 2025 18:08:04 +0000 (11:08 -0700)] 
cgroup: llist: avoid memory tears for llist_node

Before the commit 36df6e3dbd7e ("cgroup: make css_rstat_updated nmi
safe"), the struct llist_node is expected to be private to the one
inserting the node to the lockless list or the one removing the node
from the lockless list. After the mentioned commit, the llist_node in
the rstat code is per-cpu shared between the stacked contexts i.e.
process, softirq, hardirq & nmi. It is possible the compiler may tear
the loads or stores of llist_node. Let's avoid that.

KCSAN reported the following race:

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 60 UID: 0 PID: 5425 ... 6.16.0-rc3-next-20250626 #1 NONE
 Tainted: [E]=UNSIGNED_MODULE
 Hardware name: ...
 ==================================================================
 ==================================================================
 BUG: KCSAN: data-race in css_rstat_flush / css_rstat_updated
 write to 0xffffe8fffe1c85f0 of 8 bytes by task 1061 on cpu 1:
  css_rstat_flush+0x1b8/0xeb0
  __mem_cgroup_flush_stats+0x184/0x190
  flush_memcg_stats_dwork+0x22/0x50
  process_one_work+0x335/0x630
  worker_thread+0x5f1/0x8a0
  kthread+0x197/0x340
  ret_from_fork+0xd3/0x110
  ret_from_fork_asm+0x11/0x20
 read to 0xffffe8fffe1c85f0 of 8 bytes by task 3551 on cpu 15:
  css_rstat_updated+0x81/0x180
  mod_memcg_lruvec_state+0x113/0x2d0
  __mod_lruvec_state+0x3d/0x50
  lru_add+0x21e/0x3f0
  folio_batch_move_lru+0x80/0x1b0
  __folio_batch_add_and_move+0xd7/0x160
  folio_add_lru_vma+0x42/0x50
  do_anonymous_page+0x892/0xe90
  __handle_mm_fault+0xfaa/0x1520
  handle_mm_fault+0xdc/0x350
  do_user_addr_fault+0x1dc/0x650
  exc_page_fault+0x5c/0x110
  asm_exc_page_fault+0x22/0x30
 value changed: 0xffffe8fffe18e0d0 -> 0xffffe8fffe1c85f0

$ ./scripts/faddr2line vmlinux css_rstat_flush+0x1b8/0xeb0
css_rstat_flush+0x1b8/0xeb0:
init_llist_node at include/linux/llist.h:86
(inlined by) llist_del_first_init at include/linux/llist.h:308
(inlined by) css_process_update_tree at kernel/cgroup/rstat.c:148
(inlined by) css_rstat_updated_list at kernel/cgroup/rstat.c:258
(inlined by) css_rstat_flush at kernel/cgroup/rstat.c:389

$ ./scripts/faddr2line vmlinux css_rstat_updated+0x81/0x180
css_rstat_updated+0x81/0x180:
css_rstat_updated at kernel/cgroup/rstat.c:90 (discriminator 1)

These are expected race and a simple READ_ONCE/WRITE_ONCE resolves these
reports. However let's add comments to explain the race and the need for
memory barriers if stronger guarantees are needed.

More specifically the rstat updater and the flusher can race and cause a
scenario where the stats updater skips adding the css to the lockless
list but the flusher might not see those updates done by the skipped
updater. This is benign race and the subsequent flusher will flush those
stats and at the moment there aren't any rstat users which are not fine
with this kind of race. However some future user might want more
stricter guarantee, so let's add appropriate comments to ease the job of
future users.

Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Fixes: 36df6e3dbd7e ("cgroup: make css_rstat_updated nmi safe")
Signed-off-by: Tejun Heo <tj@kernel.org>
7 weeks agoMerge tag 'net-6.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 17 Jul 2025 17:04:04 +0000 (10:04 -0700)] 
Merge tag 'net-6.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from Bluetooth, CAN, WiFi and Netfilter.

  More code here than I would have liked. That said, better now than
  next week. Nothing particularly scary stands out. The improvement to
  the OpenVPN input validation is a bit large but better get them in
  before the code makes it to a final release. Some of the changes we
  got from sub-trees could have been split better between the fix and
  -next refactoring, IMHO, that has been communicated.

  We have one known regression in a TI AM65 board not getting link. The
  investigation is going a bit slow, a number of people are on vacation.
  We'll try to wrap it up, but don't think it should hold up the
  release.

  Current release - fix to a fix:

   - Bluetooth: L2CAP: fix attempting to adjust outgoing MTU, it broke
     some headphones and speakers

  Current release - regressions:

   - wifi: ath12k: fix packets received in WBM error ring with REO LUT
     enabled, fix Rx performance regression

   - wifi: iwlwifi:
       - fix crash due to a botched indexing conversion
       - mask reserved bits in chan_state_active_bitmap, avoid FW assert()

  Current release - new code bugs:

   - nf_conntrack: fix crash due to removal of uninitialised entry

   - eth: airoha: fix potential UaF in airoha_npu_get()

  Previous releases - regressions:

   - net: fix segmentation after TCP/UDP fraglist GRO

   - af_packet: fix the SO_SNDTIMEO constraint not taking effect and a
     potential soft lockup waiting for a completion

   - rpl: fix UaF in rpl_do_srh_inline() for sneaky skb geometry

   - virtio-net: fix recursive rtnl_lock() during probe()

   - eth: stmmac: populate entire system_counterval_t in get_time_fn()

   - eth: libwx: fix a number of crashes in the driver Rx path

   - hv_netvsc: prevent IPv6 addrconf after IFF_SLAVE lost that meaning

  Previous releases - always broken:

   - mptcp: fix races in handling connection fallback to pure TCP

   - rxrpc: assorted error handling and race fixes

   - sched: another batch of "security" fixes for qdiscs (QFQ, HTB)

   - tls: always refresh the queue when reading sock, avoid UaF

   - phy: don't register LEDs for genphy, avoid deadlock

   - Bluetooth: btintel: check if controller is ISO capable on
     btintel_classify_pkt_type(), work around FW returning incorrect
     capabilities

  Misc:

   - make OpenVPN Netlink input checking more strict before it makes it
     to a final release

   - wifi: cfg80211: remove scan request n_channels __counted_by, it's
     only yielding false positives"

* tag 'net-6.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (66 commits)
  rxrpc: Fix to use conn aborts for conn-wide failures
  rxrpc: Fix transmission of an abort in response to an abort
  rxrpc: Fix notification vs call-release vs recvmsg
  rxrpc: Fix recv-recv race of completed call
  rxrpc: Fix irq-disabled in local_bh_enable()
  selftests/tc-testing: Test htb_dequeue_tree with deactivation and row emptying
  net/sched: Return NULL when htb_lookup_leaf encounters an empty rbtree
  net: bridge: Do not offload IGMP/MLD messages
  selftests: Add test cases for vlan_filter modification during runtime
  net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime
  tls: always refresh the queue when reading sock
  virtio-net: fix recursived rtnl_lock() during probe()
  net/mlx5: Update the list of the PCI supported devices
  hv_netvsc: Set VF priv_flags to IFF_NO_ADDRCONF before open to prevent IPv6 addrconf
  phonet/pep: Move call to pn_skb_get_dst_sockaddr() earlier in pep_sock_accept()
  Bluetooth: L2CAP: Fix attempting to adjust outgoing MTU
  netfilter: nf_conntrack: fix crash due to removal of uninitialised entry
  net: fix segmentation after TCP/UDP fraglist GRO
  ipv6: mcast: Delay put pmc->idev in mld_del_delrec()
  net: airoha: fix potential use-after-free in airoha_npu_get()
  ...

7 weeks agoMerge tag 'pm-6.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Thu, 17 Jul 2025 16:46:37 +0000 (09:46 -0700)] 
Merge tag 'pm-6.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These address three issues introduced during the current development
  cycle and related to system suspend and hibernation, one triggering
  when asynchronous suspend of devices fails, one possibly affecting
  memory management in the core suspend code error path, and one due to
  duplicate filesystems freezing during system suspend:

   - Fix a deadlock that may occur on asynchronous device suspend
     failures due to missing completion updates in error paths (Rafael
     Wysocki)

   - Drop a misplaced pm_restore_gfp_mask() call, which may cause swap
     to be accessed too early if system suspend fails, from
     suspend_devices_and_enter() (Rafael Wysocki)

   - Remove duplicate filesystems_freeze/thaw() calls, which sometimes
     cause systems to be unable to resume, from enter_state() (Zihuan
     Zhang)"

* tag 'pm-6.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: sleep: Update power.completion for all devices on errors
  PM: suspend: clean up redundant filesystems_freeze/thaw() handling
  PM: suspend: Drop a misplaced pm_restore_gfp_mask() call

7 weeks agodrm/amdgpu: move reset support type checks into the caller
Alex Deucher [Tue, 15 Jul 2025 15:55:05 +0000 (11:55 -0400)] 
drm/amdgpu: move reset support type checks into the caller

Rather than checking in the callbacks, check if the reset
type is supported in the caller.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu/sdma7: re-emit unprocessed state on ring reset
Alex Deucher [Thu, 29 May 2025 17:12:35 +0000 (13:12 -0400)] 
drm/amdgpu/sdma7: re-emit unprocessed state on ring reset

Re-emit the unprocessed state after resetting the queue.

Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu/sdma6: re-emit unprocessed state on ring reset
Alex Deucher [Thu, 29 May 2025 17:11:54 +0000 (13:11 -0400)] 
drm/amdgpu/sdma6: re-emit unprocessed state on ring reset

Re-emit the unprocessed state after resetting the queue.

Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu/sdma5.2: re-emit unprocessed state on ring reset
Alex Deucher [Thu, 26 Jun 2025 13:53:18 +0000 (09:53 -0400)] 
drm/amdgpu/sdma5.2: re-emit unprocessed state on ring reset

Re-emit the unprocessed state after resetting the queue.

Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu/sdma5: re-emit unprocessed state on ring reset
Alex Deucher [Thu, 26 Jun 2025 13:52:55 +0000 (09:52 -0400)] 
drm/amdgpu/sdma5: re-emit unprocessed state on ring reset

Re-emit the unprocessed state after resetting the queue.

Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu/gfx12: re-emit unprocessed state on ring reset
Alex Deucher [Wed, 28 May 2025 02:29:31 +0000 (22:29 -0400)] 
drm/amdgpu/gfx12: re-emit unprocessed state on ring reset

Re-emit the unprocessed state after resetting the queue.
Drop the soft_recovery callbacks as the queue reset replaces
it.

Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu/gfx11: re-emit unprocessed state on ring reset
Alex Deucher [Wed, 28 May 2025 02:05:13 +0000 (22:05 -0400)] 
drm/amdgpu/gfx11: re-emit unprocessed state on ring reset

Re-emit the unprocessed state after resetting the queue.
Drop the soft_recovery callbacks as the queue reset replaces
it.

Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu/gfx10: re-emit unprocessed state on ring reset
Alex Deucher [Fri, 23 May 2025 04:33:04 +0000 (00:33 -0400)] 
drm/amdgpu/gfx10: re-emit unprocessed state on ring reset

Re-emit the unprocessed state after resetting the queue.
Drop the soft_recovery callbacks as the queue reset replaces
it.

Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu/gfx9.4.3: re-emit unprocessed state on kcq reset
Alex Deucher [Wed, 28 May 2025 03:23:53 +0000 (23:23 -0400)] 
drm/amdgpu/gfx9.4.3: re-emit unprocessed state on kcq reset

Re-emit the unprocessed state after resetting the queue.

Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu/gfx9: re-emit unprocessed state on kcq reset
Alex Deucher [Wed, 28 May 2025 03:19:29 +0000 (23:19 -0400)] 
drm/amdgpu/gfx9: re-emit unprocessed state on kcq reset

Re-emit the unprocessed state after resetting the queue.

Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu: Add WARN_ON to the resource clear function
Arunpravin Paneer Selvam [Wed, 16 Jul 2025 07:51:23 +0000 (13:21 +0530)] 
drm/amdgpu: Add WARN_ON to the resource clear function

Set the dirty bit when the memory resource is not cleared
during BO release.

v2(Christian):
  - Drop the cleared flag set to false.
  - Improve the amdgpu_vram_mgr_set_clear_state() function.

v3:
  - Add back the resource clear flag set function call after
    being cleared during eviction (Christian).
  - Modified the patch subject name.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/pm: Use cached metrics data on SMUv13.0.6
Lijo Lazar [Fri, 11 Jul 2025 06:39:06 +0000 (12:09 +0530)] 
drm/amd/pm: Use cached metrics data on SMUv13.0.6

Cached metrics data validity is 1ms on SMUv13.0.6 SOCs. It's not
reasonable for any client to query gpu_metrics at a faster rate and
constantly interrupt PMFW.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/pm: Use cached data for min/max clocks
Lijo Lazar [Fri, 11 Jul 2025 16:18:33 +0000 (21:48 +0530)] 
drm/amd/pm: Use cached data for min/max clocks

If dpm tables are already populated on SMU v13.0.6 SOCs, use the cached
data. Otherwise, fetch values from firmware.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>