]> git.ipfire.org Git - thirdparty/kernel/linux.git/log
thirdparty/kernel/linux.git
8 weeks agoLoongArch: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
Dave Martin [Tue, 1 Jul 2025 13:56:02 +0000 (14:56 +0100)] 
LoongArch: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names

Instead of having the core code guess the note name for each regset,
use USER_REGSET_NOTE_TYPE() to pick the correct name from elf.h.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Akihiko Odaki <akihiko.odaki@daynix.com>
Cc: loongarch@lists.linux.dev
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Link: https://lore.kernel.org/r/20250701135616.29630-10-Dave.Martin@arm.com
Signed-off-by: Kees Cook <kees@kernel.org>
8 weeks agohexagon: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
Dave Martin [Tue, 1 Jul 2025 13:56:01 +0000 (14:56 +0100)] 
hexagon: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names

Instead of having the core code guess the note name for each regset,
use USER_REGSET_NOTE_TYPE() to pick the correct name from elf.h.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Brian Cain <bcain@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Akihiko Odaki <akihiko.odaki@daynix.com>
Cc: linux-hexagon@vger.kernel.org
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Link: https://lore.kernel.org/r/20250701135616.29630-9-Dave.Martin@arm.com
Signed-off-by: Kees Cook <kees@kernel.org>
8 weeks agocsky: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
Dave Martin [Tue, 1 Jul 2025 13:56:00 +0000 (14:56 +0100)] 
csky: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names

Instead of having the core code guess the note name for each regset,
use USER_REGSET_NOTE_TYPE() to pick the correct name from elf.h.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Akihiko Odaki <akihiko.odaki@daynix.com>
Cc: linux-csky@vger.kernel.org
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Link: https://lore.kernel.org/r/20250701135616.29630-8-Dave.Martin@arm.com
Signed-off-by: Kees Cook <kees@kernel.org>
8 weeks agoarm64: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
Dave Martin [Tue, 1 Jul 2025 13:55:59 +0000 (14:55 +0100)] 
arm64: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names

Instead of having the core code guess the note name for each regset,
use USER_REGSET_NOTE_TYPE() to pick the correct name from elf.h.

This does not affect the correctness of switch(note_type) and similar
code, since note type values known to Linux for coredump purposes were
already required to be unique.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Akihiko Odaki <akihiko.odaki@daynix.com>
Cc: linux-arm-kernel@lists.infradead.org
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Link: https://lore.kernel.org/r/20250701135616.29630-7-Dave.Martin@arm.com
Signed-off-by: Kees Cook <kees@kernel.org>
8 weeks agoARM: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
Dave Martin [Tue, 1 Jul 2025 13:55:58 +0000 (14:55 +0100)] 
ARM: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names

Instead of having the core code guess the note name for each regset,
use USER_REGSET_NOTE_TYPE() to pick the correct name from elf.h.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Akihiko Odaki <akihiko.odaki@daynix.com>
Cc: linux-arm-kernel@lists.infradead.org
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Link: https://lore.kernel.org/r/20250701135616.29630-6-Dave.Martin@arm.com
Signed-off-by: Kees Cook <kees@kernel.org>
8 weeks agoARC: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
Dave Martin [Tue, 1 Jul 2025 13:55:57 +0000 (14:55 +0100)] 
ARC: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names

Instead of having the core code guess the note name for each regset,
use USER_REGSET_NOTE_TYPE() to pick the correct name from elf.h.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Akihiko Odaki <akihiko.odaki@daynix.com>
Cc: linux-snps-arc@lists.infradead.org
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Link: https://lore.kernel.org/r/20250701135616.29630-5-Dave.Martin@arm.com
Signed-off-by: Kees Cook <kees@kernel.org>
8 weeks agobinfmt_elf: Dump non-arch notes with strictly matching name and type
Dave Martin [Tue, 1 Jul 2025 13:55:56 +0000 (14:55 +0100)] 
binfmt_elf: Dump non-arch notes with strictly matching name and type

The note names for some arch-independent coredump notes are specified
manually, albeit by referring to the NN_<foo> #define corresponding
to the NT_<foo> #define that specifies the note type.

Now that there are no exceptional cases, refactor fill_note() to pick
the correct NN_ and NT_ macros implcitly for the requested note type.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Link: https://lore.kernel.org/r/20250701135616.29630-4-Dave.Martin@arm.com
Signed-off-by: Kees Cook <kees@kernel.org>
8 weeks agoregset: Add explicit core note name in struct user_regset
Dave Martin [Tue, 1 Jul 2025 13:55:55 +0000 (14:55 +0100)] 
regset: Add explicit core note name in struct user_regset

There is currently hard-coded logic spread around the tree for
determining the note name for regset notes emitted in coredumps.

Now that the names are declared explicitly in <uapi/elf.h>, this can be
simplified.

In preparation for getting rid of the special-case logic, add an
explicit core_note_name field in struct user_regset for specifying the
note name explicitly.  To help avoid mistakes, a convenience macro
USER_REGSET_NOTE_TYPE() is provided to set .core_note_type and
.core_note_name based on the note type.

When dumping core, use the new field to set the note name, if the
regset specifies it.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Akihiko Odaki <akihiko.odaki@daynix.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> # s390
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Link: https://lore.kernel.org/r/20250701135616.29630-3-Dave.Martin@arm.com
Signed-off-by: Kees Cook <kees@kernel.org>
8 weeks agoregset: Fix kerneldoc for struct regset_get() in user_regset
Dave Martin [Tue, 1 Jul 2025 13:55:54 +0000 (14:55 +0100)] 
regset: Fix kerneldoc for struct regset_get() in user_regset

Commit 7717cb9bdd04 ("regset: new method and helpers for it") added a
new interface ->regset_get() for struct user_regset, and commit
1e6986c9db21 ("regset: kill ->get()") got rid of the old interface.

The kerneldoc comment block was never updated to take account of this
change, though.

Update it.

No functional change.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Link: https://lore.kernel.org/r/20250701135616.29630-2-Dave.Martin@arm.com
Signed-off-by: Kees Cook <kees@kernel.org>
8 weeks agolockdep: Speed up lockdep_unregister_key() with expedited RCU synchronization
Breno Leitao [Fri, 21 Mar 2025 09:30:49 +0000 (02:30 -0700)] 
lockdep: Speed up lockdep_unregister_key() with expedited RCU synchronization

lockdep_unregister_key() is called from critical code paths, including
sections where rtnl_lock() is held. For example, when replacing a qdisc
in a network device, network egress traffic is disabled while
__qdisc_destroy() is called for every network queue.

If lockdep is enabled, __qdisc_destroy() calls lockdep_unregister_key(),
which gets blocked waiting for synchronize_rcu() to complete.

For example, a simple tc command to replace a qdisc could take 13
seconds:

  # time /usr/sbin/tc qdisc replace dev eth0 root handle 0x1: mq
    real    0m13.195s
    user    0m0.001s
    sys     0m2.746s

During this time, network egress is completely frozen while waiting for
RCU synchronization.

Use synchronize_rcu_expedited() instead to minimize the impact on
critical operations like network connectivity changes.

This improves 10x the function call to tc, when replacing the qdisc for
a network card.

   # time /usr/sbin/tc qdisc replace dev eth0 root handle 0x1: mq
     real     0m1.789s
     user     0m0.000s
     sys      0m1.613s

[boqun: Fixed the comment and add more information for the temporary
workaround, and add TODO information for hazptr]

Reported-by: Erik Lundgren <elundgren@meta.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20250321-lockdep-v1-1-78b732d195fb@debian.org
8 weeks agolocking/mutex: Remove redundant #ifdefs
Ran Xiaokai [Fri, 4 Jul 2025 01:52:18 +0000 (01:52 +0000)] 
locking/mutex: Remove redundant #ifdefs

hung_task_{set,clear}_blocker() is already guarded by
CONFIG_DETECT_HUNG_TASK_BLOCKER in hung_task.h, So remove
the redudant check of #ifdef.

Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20250704015218.359754-1-ranxiaokai627@163.com
8 weeks agolocking/lockdep: Change 'static const' variables to enum values
Arnd Bergmann [Wed, 9 Apr 2025 12:22:58 +0000 (14:22 +0200)] 
locking/lockdep: Change 'static const' variables to enum values

gcc warns about 'static const' variables even in headers when building
with -Wunused-const-variables enabled:

In file included from kernel/locking/lockdep_proc.c:25:
kernel/locking/lockdep_internals.h:69:28: error: 'LOCKF_USED_IN_IRQ_READ' defined but not used [-Werror=unused-const-variable=]
   69 | static const unsigned long LOCKF_USED_IN_IRQ_READ =
      |                            ^~~~~~~~~~~~~~~~~~~~~~
kernel/locking/lockdep_internals.h:63:28: error: 'LOCKF_ENABLED_IRQ_READ' defined but not used [-Werror=unused-const-variable=]
   63 | static const unsigned long LOCKF_ENABLED_IRQ_READ =
      |                            ^~~~~~~~~~~~~~~~~~~~~~
kernel/locking/lockdep_internals.h:57:28: error: 'LOCKF_USED_IN_IRQ' defined but not used [-Werror=unused-const-variable=]
   57 | static const unsigned long LOCKF_USED_IN_IRQ =
      |                            ^~~~~~~~~~~~~~~~~
kernel/locking/lockdep_internals.h:51:28: error: 'LOCKF_ENABLED_IRQ' defined but not used [-Werror=unused-const-variable=]
   51 | static const unsigned long LOCKF_ENABLED_IRQ =
      |                            ^~~~~~~~~~~~~~~~~

This one is easy to avoid by changing the generated constant definition
into an equivalent enum.

Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20250409122314.2848028-6-arnd@kernel.org
8 weeks agolocking/lockdep: Avoid struct return in lock_stats()
Arnd Bergmann [Tue, 10 Jun 2025 09:29:21 +0000 (11:29 +0200)] 
locking/lockdep: Avoid struct return in lock_stats()

Returning a large structure from the lock_stats() function causes clang
to have multiple copies of it on the stack and copy between them, which
can end up exceeding the frame size warning limit:

kernel/locking/lockdep.c:300:25: error: stack frame size (1464) exceeds limit (1280) in 'lock_stats' [-Werror,-Wframe-larger-than]
  300 | struct lock_class_stats lock_stats(struct lock_class *class)

Change the calling conventions to directly operate on the caller's copy,
which apparently is what gcc does already.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20250610092941.2642847-1-arnd@kernel.org
8 weeks agodt-bindings: interrupt-controller: Convert apm,xgene1-msi to DT schema
Rob Herring (Arm) [Thu, 10 Jul 2025 18:07:55 +0000 (13:07 -0500)] 
dt-bindings: interrupt-controller: Convert apm,xgene1-msi to DT schema

Convert the Applied Micro X-Gene MSI controller binding to DT schema
format. MSI controllers go in interrupt-controller directory so move the
schema there.

Link: https://lore.kernel.org/r/20250710180757.2970583-1-robh@kernel.org
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
8 weeks agoext4: limit the maximum folio order
Zhang Yi [Mon, 7 Jul 2025 14:08:14 +0000 (22:08 +0800)] 
ext4: limit the maximum folio order

In environments with a page size of 64KB, the maximum size of a folio
can reach up to 128MB. Consequently, during the write-back of folios,
the 'rsv_blocks' will be overestimated to 1,577, which can make
pressure on the journal space where the journal is small. This can
easily exceed the limit of a single transaction. Besides, an excessively
large folio is meaningless and will instead increase the overhead of
traversing the bhs within the folio. Therefore, limit the maximum order
of a folio to 2048 filesystem blocks.

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reported-by: Joseph Qi <jiangqi903@gmail.com>
Closes: https://lore.kernel.org/linux-ext4/CA+G9fYsyYQ3ZL4xaSg1-Tt5Evto7Zd+hgNWZEa9cQLbahA1+xg@mail.gmail.com/
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Tested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20250707140814.542883-12-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
8 weeks agodrivers: cpufreq: add Tegra114 support
Svyatoslav Ryhel [Mon, 14 Jul 2025 08:17:11 +0000 (11:17 +0300)] 
drivers: cpufreq: add Tegra114 support

Tegra114 is fully compatible with existing Tegra124 cpufreq driver.

Signed-off-by: Svyatoslav Ryhel <clamor95@gmail.com>
Reviewed-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
8 weeks agorust: cpumask: Replace `MaybeUninit` and `mem::zeroed` with `Opaque` APIs
Ritvik Gupta [Sun, 13 Jul 2025 19:02:44 +0000 (00:32 +0530)] 
rust: cpumask: Replace `MaybeUninit` and `mem::zeroed` with `Opaque` APIs

Replace the following unsafe initializations:
1. `MaybeUninit::uninit().assume_init()` with `Opaque::uninit()`
2. `core::mem::zeroed()` with `Opaque::zeroed()`

Suggested-by: Benno Lossin <lossin@kernel.org>
Link: https://github.com/Rust-for-Linux/linux/issues/1178
Suggested-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/rust-for-linux/CAH5fLgj0OoCn56OkNUmiPQ=RAVa_VmS-yMZ4TNBSpGPNtZ5D0A@mail.gmail.com/
Reviewed-by: Benno Lossin <lossin@kernel.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Ritvik Gupta <ritvikfoss@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
8 weeks agoMerge tag 'for-6.16/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Tue, 15 Jul 2025 02:25:28 +0000 (19:25 -0700)] 
Merge tag 'for-6.16/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fix from Mikulas Patocka:

 - dm-bufio: fix scheduling in atomic

* tag 'for-6.16/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm-bufio: fix sched in atomic context

8 weeks agogfs2: a minor finish_xmote cleanup
Andreas Gruenbacher [Mon, 23 Jun 2025 20:20:55 +0000 (22:20 +0200)] 
gfs2: a minor finish_xmote cleanup

As a minor clean-up to commit 1fc05c8d8426 ("gfs2: cancel timed-out
glock requests"), when a demote request is in progress in
finish_xmote(), there is no point in waking up the glock holder at the
head of the queue because the reply from dlm cannot be on behalf of that
glock holder.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
8 weeks agogfs2: simplify finish_xmote
Andreas Gruenbacher [Tue, 24 Jun 2025 18:41:37 +0000 (20:41 +0200)] 
gfs2: simplify finish_xmote

As a follow-up to commit a431d49243a0 ("gfs2: Fix request cancelation
bug"), it turns out that any call to finish_xmote() is always followed
by a call to run_queue(), either

 * directly when glock_work_func() calls finish_xmote() before calling
   run_queue(), or

 * indirectly when do_xmote() calls finish_xmote() before calling
   gfs2_glock_queue_work(), which queues a call to glock_work_func() in
   work queue context,

so remove the code in finish_xmote() that duplicates the functionality
of run_queue().

In addition, the code this commit removes is missing a check for the
GLF_DEMOTE flag which indicates that no further promotes should be
performed, so if that code didn't get removed, that check would have to
be added.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
8 weeks agogfs2: sanitize the gdlm_ast -> finish_xmote interface
Andreas Gruenbacher [Wed, 25 Jun 2025 12:41:58 +0000 (14:41 +0200)] 
gfs2: sanitize the gdlm_ast -> finish_xmote interface

When gdlm_ast() is called with a non-zero status code, this means that
the requested operation did not succeed and the current lock state
didn't change.  Turn that into a non-zero LM_OUT_* status code (with ret
& ~LM_OUT_ST_MASK != 0) instead of pretending that dlm returned the
current lock state.

That way, we can easily change finish_xmote() to only update
gl->gl_state when the state has actually changed.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
8 weeks agoefi: add API doc entry for ovmf_debug_log
Gerd Hoffmann [Fri, 11 Jul 2025 05:44:46 +0000 (07:44 +0200)] 
efi: add API doc entry for ovmf_debug_log

Document the newly added sysfs ABI for accessing the in-memory debug log
provided by OVMF EFI firmware (when enabled)

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
8 weeks agowifi: rtw88: Fix macid assigned to TDLS station
Bitterblue Smith [Sun, 13 Jul 2025 19:27:32 +0000 (22:27 +0300)] 
wifi: rtw88: Fix macid assigned to TDLS station

When working in station mode, TDLS peers are assigned macid 0, even
though 0 was already assigned to the AP. This causes the connection
with the AP to stop working after the TDLS connection is torn down.

Assign the next available macid to TDLS peers, same as client stations
in AP mode.

Fixes: 902cb7b11f9a ("wifi: rtw88: assign mac_id for vif/sta and update to TX desc")
Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/58648c09-8553-4bcc-a977-9dc9afd63780@gmail.com
8 weeks agowifi: rtw88: enable TX reports for the management queue
Andrey Skvortsov [Fri, 11 Jul 2025 08:47:40 +0000 (11:47 +0300)] 
wifi: rtw88: enable TX reports for the management queue

This is needed for AP mode. Otherwise client sees the network, but
can't connect to it.

REG_FWHW_TXQ_CTRL+1 is set to WLAN_TXQ_RPT_EN (0x1F) in common mac
init function (__rtw8723x_mac_init), but the value was overwritten
from mac table later.

Tables with register values for phy parameters initialization are
copied from vendor driver usually. When table will be regenerated,
manual modifications to it may be lost. To avoid regressions in this
case new callback mac_postinit is introduced, that is called after
parameters from table are set.

Tested on rtl8723cs, that reuses rtw8703b driver.

Signed-off-by: Andrey Skvortsov <andrej.skvortzov@gmail.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20250711084740.3396766-1-andrej.skvortzov@gmail.com
8 weeks agowifi: rtl8xxxu: Fix RX skb size for aggregation disabled
Martin Kaistra [Wed, 9 Jul 2025 12:15:22 +0000 (14:15 +0200)] 
wifi: rtl8xxxu: Fix RX skb size for aggregation disabled

Commit 1e5b3b3fe9e0 ("rtl8xxxu: Adjust RX skb size to include space for
phystats") increased the skb size when aggregation is enabled but decreased
it for the aggregation disabled case.

As a result, if a frame near the maximum size is received,
rtl8xxxu_rx_complete() is called with status -EOVERFLOW and then the
driver starts to malfunction and no further communication is possible.

Restore the skb size in the aggregation disabled case.

Fixes: 1e5b3b3fe9e0 ("rtl8xxxu: Adjust RX skb size to include space for phystats")
Signed-off-by: Martin Kaistra <martin.kaistra@linutronix.de>
Reviewed-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20250709121522.1992366-1-martin.kaistra@linutronix.de
8 weeks agowifi: rtw89: 8852b: implement RFK multi-channel handling and support chanctx up to 2
Zong-Zhe Yang [Thu, 10 Jul 2025 04:24:23 +0000 (12:24 +0800)] 
wifi: rtw89: 8852b: implement RFK multi-channel handling and support chanctx up to 2

To support multiple channels, 2 for this case, RFK requires to record each
calibration result for each channel in different RFK tables, and also needs
to notify FW. Then, when FW runs in MCC (multi-channel concurrency), it can
switch to the corresponding calibration result according to the channel.

Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20250710042423.73617-15-pkshih@realtek.com
8 weeks agowifi: rtw89: 8852b: configure FW version for SCAN_OFFLOAD_EXTRA_OP feature
Zong-Zhe Yang [Thu, 10 Jul 2025 04:24:22 +0000 (12:24 +0800)] 
wifi: rtw89: 8852b: configure FW version for SCAN_OFFLOAD_EXTRA_OP feature

After v0.29.128.0, RTL8852B FW supports SCAN_OFFLOAD_EXTRA_OP feature.
With it, FW can do back-op and TX NULL frames on the second connection.

Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20250710042423.73617-14-pkshih@realtek.com
8 weeks agowifi: rtw89: 8852bt: implement RFK multi-channel handling and support chanctx up...
Zong-Zhe Yang [Thu, 10 Jul 2025 04:24:21 +0000 (12:24 +0800)] 
wifi: rtw89: 8852bt: implement RFK multi-channel handling and support chanctx up to 2

To support multiple channels, 2 for this case, RFK requires to record each
calibration result for each channel in different RFK tables, and also needs
to notify FW. Then, when FW runs in MCC (multi-channel concurrency), it can
switch to the corresponding calibration result according to the channel.

Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20250710042423.73617-13-pkshih@realtek.com
8 weeks agowifi: rtw89: 8852bt: configure FW version for SCAN_OFFLOAD_EXTRA_OP feature
Zong-Zhe Yang [Thu, 10 Jul 2025 04:24:20 +0000 (12:24 +0800)] 
wifi: rtw89: 8852bt: configure FW version for SCAN_OFFLOAD_EXTRA_OP feature

After v0.29.127.0, RTL8852BT FW supports SCAN_OFFLOAD_EXTRA_OP feature.
With it, FW can do back-op and TX NULL frames on the second connection.

Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20250710042423.73617-12-pkshih@realtek.com
8 weeks agowifi: rtw89: tweak tx wake notify matching condition
Chih-Kang Chang [Thu, 10 Jul 2025 04:24:19 +0000 (12:24 +0800)] 
wifi: rtw89: tweak tx wake notify matching condition

8852BT needs to call TX wake notify once entering Leisure Power Save.
8852C needs to call TX wake notify after entering low power mode. Other
AX chips only MGMT packets needs to call TX wake after entering low power
mode. BE chips no need to call TX wake.

Signed-off-by: Chih-Kang Chang <gary.chang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20250710042423.73617-11-pkshih@realtek.com
8 weeks agowifi: rtw89: update SER L2 type default value
Chih-Kang Chang [Thu, 10 Jul 2025 04:24:18 +0000 (12:24 +0800)] 
wifi: rtw89: update SER L2 type default value

8852BT after FW 0.29.127, 8852B after FW 0.29.128 and 8922A after FW
0.35.79, the SER L2 flow determines different L2 types by parameter,
the zero value will trigger FW SER L2 assert.

Signed-off-by: Chih-Kang Chang <gary.chang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20250710042423.73617-10-pkshih@realtek.com
8 weeks agoMerge branch 'tcp-receiver-changes'
Jakub Kicinski [Tue, 15 Jul 2025 01:40:51 +0000 (18:40 -0700)] 
Merge branch 'tcp-receiver-changes'

Eric Dumazet says:

====================
tcp: receiver changes

Before accepting an incoming packet:

- Make sure to not accept a packet beyond advertized RWIN.
  If not, increment a new SNMP counter (LINUX_MIB_BEYOND_WINDOW)

- ooo packets should update rcv_mss and tp->scaling_ratio.

- Make sure to not accept packet beyond sk_rcvbuf limit.

This series includes three associated packetdrill tests.
====================

Link: https://patch.msgid.link/20250711114006.480026-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoselftests/net: packetdrill: add tcp_rcv_toobig.pkt
Eric Dumazet [Fri, 11 Jul 2025 11:40:06 +0000 (11:40 +0000)] 
selftests/net: packetdrill: add tcp_rcv_toobig.pkt

Check that TCP receiver behavior after "tcp: stronger sk_rcvbuf checks"

Too fat packet is dropped unless receive queue is empty.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250711114006.480026-9-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agotcp: stronger sk_rcvbuf checks
Eric Dumazet [Fri, 11 Jul 2025 11:40:05 +0000 (11:40 +0000)] 
tcp: stronger sk_rcvbuf checks

Currently, TCP stack accepts incoming packet if sizes of receive queues
are below sk->sk_rcvbuf limit.

This can cause memory overshoot if the packet is big, like an 1/2 MB
BIG TCP one.

Refine the check to take into account the incoming skb truesize.

Note that we still accept the packet if the receive queue is empty,
to not completely freeze TCP flows in pathological conditions.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250711114006.480026-8-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agotcp: add const to tcp_try_rmem_schedule() and sk_rmem_schedule() skb
Eric Dumazet [Fri, 11 Jul 2025 11:40:04 +0000 (11:40 +0000)] 
tcp: add const to tcp_try_rmem_schedule() and sk_rmem_schedule() skb

These functions to not modify the skb, add a const qualifier.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250711114006.480026-7-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoselftests/net: packetdrill: add tcp_ooo_rcv_mss.pkt
Eric Dumazet [Fri, 11 Jul 2025 11:40:03 +0000 (11:40 +0000)] 
selftests/net: packetdrill: add tcp_ooo_rcv_mss.pkt

We make sure tcpi_rcv_mss and tp->scaling_ratio
are correctly updated if no in-order packet has been received yet.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250711114006.480026-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agotcp: call tcp_measure_rcv_mss() for ooo packets
Eric Dumazet [Fri, 11 Jul 2025 11:40:02 +0000 (11:40 +0000)] 
tcp: call tcp_measure_rcv_mss() for ooo packets

tcp_measure_rcv_mss() is used to update icsk->icsk_ack.rcv_mss
(tcpi_rcv_mss in tcp_info) and tp->scaling_ratio.

Calling it from tcp_data_queue_ofo() makes sure these
fields are updated, and permits a better tuning
of sk->sk_rcvbuf, in the case a new flow receives many ooo
packets.

Fixes: dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250711114006.480026-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoselftests/net: packetdrill: add tcp_rcv_big_endseq.pkt
Eric Dumazet [Fri, 11 Jul 2025 11:40:01 +0000 (11:40 +0000)] 
selftests/net: packetdrill: add tcp_rcv_big_endseq.pkt

This test checks TCP behavior when receiving a packet beyond the window.

It checks the new TcpExtBeyondWindow SNMP counter.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250711114006.480026-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agotcp: add LINUX_MIB_BEYOND_WINDOW
Eric Dumazet [Fri, 11 Jul 2025 11:40:00 +0000 (11:40 +0000)] 
tcp: add LINUX_MIB_BEYOND_WINDOW

Add a new SNMP MIB : LINUX_MIB_BEYOND_WINDOW

Incremented when an incoming packet is received beyond the
receiver window.

nstat -az | grep TcpExtBeyondWindow

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250711114006.480026-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agotcp: do not accept packets beyond window
Eric Dumazet [Fri, 11 Jul 2025 11:39:59 +0000 (11:39 +0000)] 
tcp: do not accept packets beyond window

Currently, TCP accepts incoming packets which might go beyond the
offered RWIN.

Add to tcp_sequence() the validation of packet end sequence.

Add the corresponding check in the fast path.

We relax this new constraint if the receive queue is empty,
to not freeze flows from buggy peers.

Add a new drop reason : SKB_DROP_REASON_TCP_INVALID_END_SEQUENCE.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250711114006.480026-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agowifi: rtw89: introduce fw feature group and redefine CRASH_TRIGGER
Zong-Zhe Yang [Thu, 10 Jul 2025 04:24:17 +0000 (12:24 +0800)] 
wifi: rtw89: introduce fw feature group and redefine CRASH_TRIGGER

Some FW features may have variants on how to deal with in leaf functions,
e.g. H2C commands. However, from SW component point of view, it might not
matter which variant is supported exactly. In some cases, SW component may
just care whether any of the variants is supported or not.

So, introduce a concept of FW feature group which can manage a set of FW
features and can easily be checked if at least one of them is supported.

Since CRASH_TRIGGER will have variants and then matches the case mentioned
above, so redefine CRASH_TRIGGER as a FW feature group and add a variant
of type 0 for original handling.

Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20250710042423.73617-9-pkshih@realtek.com
8 weeks agoefistub: Lower default log level
Aaron Kling [Tue, 8 Jul 2025 07:30:42 +0000 (02:30 -0500)] 
efistub: Lower default log level

Some uefi implementations will write the efistub logs to the display
over a splash image. This is not desirable for debug and info logs, so
lower the default efi log level to exclude them.

Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Aaron Kling <webgeek1234@gmail.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
8 weeks agowifi: rtw89: check LPS H2C command complete by C2H reg instead of done ack
Chih-Kang Chang [Thu, 10 Jul 2025 04:24:16 +0000 (12:24 +0800)] 
wifi: rtw89: check LPS H2C command complete by C2H reg instead of done ack

8852B after FW 0.29.127, 8852BT after FW 0.29.127 and 8922A after FW
0.35.76 driver check LPS H2C command received by FW using C2H reg instead
of done ack.

Signed-off-by: Chih-Kang Chang <gary.chang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20250710042423.73617-8-pkshih@realtek.com
8 weeks agowifi: rtw89: mcc: solve GO's TBTT change and TBTT too close to NoA issue
Chih-Kang Chang [Thu, 10 Jul 2025 04:24:15 +0000 (12:24 +0800)] 
wifi: rtw89: mcc: solve GO's TBTT change and TBTT too close to NoA issue

For some implementation acting as GO under MCC(GO+STA), the GO's TBTT
might change after STA roams to another AP. This could result the new GO
beacon TX at the STA timeslot of GC+STA, causing GC beacon loss.
Therefore, if the GC detects beacon loss, it will pause MCC and remain
on the GO side for 100 TU to detect the new TBTT beacon.

Additionally, some implementation acting as GO under MCC might TX beacon
too close to the NoA period. The GC calculates timeslot pattern the TOB
(time offset behind) or TOA(time offset ahead) less than the minimum
RX beacon time, which leads to beacon loss. Therefore, disable the
beacon filter in this case. Then, if the GO's TBTT changed, the pattern
TOB/TOA greater than the minimum RX beacon time, the beacon filter should
be retriggered during MCC update.

Moreover, if the beacon filter is disabled initially but the GO timeslot
change, causing QoS null data detection fail, also pause MCC to detect new
TBTT beacon.

Signed-off-by: Chih-Kang Chang <gary.chang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20250710042423.73617-7-pkshih@realtek.com
8 weeks agowifi: rtw89: extend HW scan of WiFi 7 chips for extra OP chan when concurrency
Chih-Kang Chang [Thu, 10 Jul 2025 04:24:14 +0000 (12:24 +0800)] 
wifi: rtw89: extend HW scan of WiFi 7 chips for extra OP chan when concurrency

HW scan of WiFi 7 chips supports multiple op channel configurations.
When concurrency, fill two channels info to HW scan to avoid packet
lost.

However, the OP chan timing is arranged by FW, and the actual stay
period depends on AP. It's hard to calculate total scan time for once
NoA in advance. Therefore, change the scan back to GO OP channel every
200TU and TX beacon to ensure GC doesn't beacon loss. Additionally, add
a period NoA with large duration to ensure GC doesn't TX packet during
GO scanning and can still listen beacon at TBTT to update the new NoA
info after scan complete.

Signed-off-by: Chih-Kang Chang <gary.chang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20250710042423.73617-6-pkshih@realtek.com
8 weeks agowifi: rtw89: mcc: when MCC stop forcing to stay at GO role
Chih-Kang Chang [Thu, 10 Jul 2025 04:24:13 +0000 (12:24 +0800)] 
wifi: rtw89: mcc: when MCC stop forcing to stay at GO role

MCC stop might triggered by scan, and need to force to stay at GO role
to keep TX beacon. Also, AX chips need to TX more 3 beacons to ensure
GC can receive once NoA beacon before scan when GC in courtesy mode.
BE chips no needs to TX 3 more beacon because it can TX beacon every
200TU during scan, even GC in courtesy mode can receive beacon every
600TU.

Signed-off-by: Chih-Kang Chang <gary.chang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20250710042423.73617-5-pkshih@realtek.com
8 weeks agowifi: rtw89: mcc: enlarge GO NoA duration to cover channel switching time
Chih-Kang Chang [Thu, 10 Jul 2025 04:24:12 +0000 (12:24 +0800)] 
wifi: rtw89: mcc: enlarge GO NoA duration to cover channel switching time

MCC require time to switch channel when changing timeslot. If GC TX
nulldata 0 while GO is switching channel, GO can't receive it. Therefore,
enlarge the GO NoA duration to cover the channel switching time.

However, the enlarged NoA duration might cause GC's timeslot less than
minimum of RX beacon time. Therefore, adjust strict and anchor pattern
condition to avoid it.

Signed-off-by: Chih-Kang Chang <gary.chang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20250710042423.73617-4-pkshih@realtek.com
8 weeks agowifi: rtw89: add DIG suspend/resume flow when scan and connection
Chih-Kang Chang [Thu, 10 Jul 2025 04:24:11 +0000 (12:24 +0800)] 
wifi: rtw89: add DIG suspend/resume flow when scan and connection

The PD lower bound set after one interface is connected, If second
interface needs to connect, packets might not be detected because the
PD lower bound is too high. Therefore, a DIG suspend/resume flow is
added to decrease the PD lower bound during scanning or connection,
and the original PD level is resumed afterward.

Signed-off-by: Chih-Kang Chang <gary.chang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20250710042423.73617-3-pkshih@realtek.com
8 weeks agowifi: rtw89: mcc: add H2C command to support different PD level in MCC
Chih-Kang Chang [Thu, 10 Jul 2025 04:24:10 +0000 (12:24 +0800)] 
wifi: rtw89: mcc: add H2C command to support different PD level in MCC

Packet detection(PD) lower bound is the threshold for sensing packet,
and it is dynamically calculated based on RSSI. In MCC, the two
interfaces have different RSSI values, so it is necessary to set
different values to ensure packets can be received. Therefore, add
H2C command to let firmware to switch PD lower bound when MCC mode.

Signed-off-by: Chih-Kang Chang <gary.chang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20250710042423.73617-2-pkshih@realtek.com
8 weeks agonet: wangxun: fix LIBWX dependencies again
Arnd Bergmann [Fri, 11 Jul 2025 08:23:34 +0000 (10:23 +0200)] 
net: wangxun: fix LIBWX dependencies again

Two more drivers got added that use LIBWX and cause a build warning

WARNING: unmet direct dependencies detected for LIBWX
  Depends on [m]: NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_WANGXUN [=y] && PTP_1588_CLOCK_OPTIONAL [=m]
  Selected by [y]:
  - NGBEVF [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_WANGXUN [=y] && PCI_MSI [=y]
  Selected by [m]:
  - NGBE [=m] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_WANGXUN [=y] && PCI [=y] && PTP_1588_CLOCK_OPTIONAL [=m]

ld: drivers/net/ethernet/wangxun/libwx/wx_lib.o: in function `wx_clean_tx_irq':
wx_lib.c:(.text+0x5a68): undefined reference to `ptp_schedule_worker'
ld: drivers/net/ethernet/wangxun/libwx/wx_ethtool.o: in function `wx_nway_reset':
wx_ethtool.c:(.text+0x880): undefined reference to `phylink_ethtool_nway_reset'

Add the same dependency on PTP_1588_CLOCK_OPTIONAL to the two driver
using this library module, following the pattern from commit
8fa19c2c69fb ("net: wangxun: fix LIBWX dependencies").

Fixes: 377d180bd71c ("net: wangxun: add txgbevf build")
Fixes: a0008a3658a3 ("net: wangxun: add ngbevf build")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org> # build-tested
Link: https://patch.msgid.link/20250711082339.1372821-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agowifi: rtw89: regd/acpi: support 6 GHz VLP policy via ACPI DSM
Zong-Zhe Yang [Wed, 9 Jul 2025 06:50:06 +0000 (14:50 +0800)] 
wifi: rtw89: regd/acpi: support 6 GHz VLP policy via ACPI DSM

Process ACPI DSM function 11 to get 6 GHz VLP support by country. If
not allowed, return error to block the connection. By default, i.e.
ACPI DSM function is not configured, disallow 6 GHz VLP on country US
and country CA, because some platform-level certifications are needed
in FCC regulation before operating on 6 GHz VLP connection.

Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20250709065006.32028-5-pkshih@realtek.com
8 weeks agowifi: rtw89: regd/acpi: support regulatory rules via ACPI DSM and parse rule of regd_UK
Zong-Zhe Yang [Wed, 9 Jul 2025 06:50:05 +0000 (14:50 +0800)] 
wifi: rtw89: regd/acpi: support regulatory rules via ACPI DSM and parse rule of regd_UK

ACPI DSM function 10 is defined for the enablement for Realtek regulatory
rules. The first rule is whether to allow regd_UK regulatory settings or
not. If not, the strict one, i.e. regd_ETSI, regulatory settings will be
used on country GB.

Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20250709065006.32028-4-pkshih@realtek.com
8 weeks agowifi: rtw89: regd/acpi: update field definition to specific country in UNII-4 conf
Zong-Zhe Yang [Wed, 9 Jul 2025 06:50:04 +0000 (14:50 +0800)] 
wifi: rtw89: regd/acpi: update field definition to specific country in UNII-4 conf

Originally, fields of ACPI DSM function 6 were handled for countries
following specific regulatory.
BIT(0) for countries following FCC regulatory
BIT(1) for countries following IC regulatory

Now, update to the following (one field for one specific country).
BIT(0) for country US
BIT(1) for country CA

Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20250709065006.32028-3-pkshih@realtek.com
8 weeks agowifi: rtw89: regd/acpi: support country CA by BIT(1) in 6 GHz SP conf
Zong-Zhe Yang [Wed, 9 Jul 2025 06:50:03 +0000 (14:50 +0800)] 
wifi: rtw89: regd/acpi: support country CA by BIT(1) in 6 GHz SP conf

ACPI DSM function 7 is used to decide whether 6 GHz Standard Power
(SP) is allowed on given countries. Now, add BIT(1) for country CA.
Besides, for searching country index, replace for-loop with index
getter function.

Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20250709065006.32028-2-pkshih@realtek.com
8 weeks agoscsi: bfa: Double-free fix
jackysliu [Tue, 24 Jun 2025 11:58:24 +0000 (19:58 +0800)] 
scsi: bfa: Double-free fix

When the bfad_im_probe() function fails during initialization, the memory
pointed to by bfad->im is freed without setting bfad->im to NULL.

Subsequently, during driver uninstallation, when the state machine enters
the bfad_sm_stopping state and calls the bfad_im_probe_undo() function,
it attempts to free the memory pointed to by bfad->im again, thereby
triggering a double-free vulnerability.

Set bfad->im to NULL if probing fails.

Signed-off-by: jackysliu <1972843537@qq.com>
Link: https://lore.kernel.org/r/tencent_3BB950D6D2D470976F55FC879206DE0B9A09@qq.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
8 weeks agoAdd support to set NAPI threaded for individual NAPI
Samiullah Khawaja [Thu, 10 Jul 2025 21:12:03 +0000 (21:12 +0000)] 
Add support to set NAPI threaded for individual NAPI

A net device has a threaded sysctl that can be used to enable threaded
NAPI polling on all of the NAPI contexts under that device. Allow
enabling threaded NAPI polling at individual NAPI level using netlink.

Extend the netlink operation `napi-set` and allow setting the threaded
attribute of a NAPI. This will enable the threaded polling on a NAPI
context.

Add a test in `nl_netdev.py` that verifies various cases of threaded
NAPI being set at NAPI and at device level.

Tested
 ./tools/testing/selftests/net/nl_netdev.py
 TAP version 13
 1..7
 ok 1 nl_netdev.empty_check
 ok 2 nl_netdev.lo_check
 ok 3 nl_netdev.page_pool_check
 ok 4 nl_netdev.napi_list_check
 ok 5 nl_netdev.dev_set_threaded
 ok 6 nl_netdev.napi_set_threaded
 ok 7 nl_netdev.nsim_rxq_reset_down
 # Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250710211203.3979655-1-skhawaja@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoMerge patch series "ufs: ufs-qcom: Align programming sequence as per HW spec"
Martin K. Petersen [Tue, 15 Jul 2025 01:00:41 +0000 (21:00 -0400)] 
Merge patch series "ufs: ufs-qcom: Align programming sequence as per HW spec"

Nitin Rawat <quic_nitirawa@quicinc.com> says:

This patch series adds programming support for Qualcomm UFS
to align with Hardware Specification.

In this patch series below changes are taken care.

1. Enable QUnipro Internal Clock Gating
2. Update esi_vec_mask for HW major version >= 6

Link: https://lore.kernel.org/r/20250714075336.2133-1-quic_nitirawa@quicinc.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
8 weeks agoscsi: isci: Fix dma_unmap_sg() nents value
Thomas Fourier [Fri, 27 Jun 2025 14:24:47 +0000 (16:24 +0200)] 
scsi: isci: Fix dma_unmap_sg() nents value

The dma_unmap_sg() functions should be called with the same nents as the
dma_map_sg(), not the value the map function returned.

Fixes: ddcc7e347a89 ("isci: fix dma_unmap_sg usage")
Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com>
Link: https://lore.kernel.org/r/20250627142451.241713-2-fourier.thomas@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
8 weeks agoscsi: mvsas: Fix dma_unmap_sg() nents value
Thomas Fourier [Fri, 27 Jun 2025 13:48:18 +0000 (15:48 +0200)] 
scsi: mvsas: Fix dma_unmap_sg() nents value

The dma_unmap_sg() functions should be called with the same nents as the
dma_map_sg(), not the value the map function returned.

Fixes: b5762948263d ("[SCSI] mvsas: Add Marvell 6440 SAS/SATA driver")
Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com>
Link: https://lore.kernel.org/r/20250627134822.234813-2-fourier.thomas@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
8 weeks agoscsi: elx: efct: Fix dma_unmap_sg() nents value
Thomas Fourier [Fri, 27 Jun 2025 11:41:13 +0000 (13:41 +0200)] 
scsi: elx: efct: Fix dma_unmap_sg() nents value

The dma_unmap_sg() functions should be called with the same nents as the
dma_map_sg(), not the value the map function returned.

Fixes: 692e5d73a811 ("scsi: elx: efct: LIO backend interface routines")
Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com>
Link: https://lore.kernel.org/r/20250627114117.188480-2-fourier.thomas@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
8 weeks agoscsi: scsi_transport_fc: Change to use per-rport devloss_work_q
Ewan D. Milne [Mon, 7 Jul 2025 20:22:25 +0000 (16:22 -0400)] 
scsi: scsi_transport_fc: Change to use per-rport devloss_work_q

Configurations with large numbers of FC rports per host instance are
taking a very long time to complete all devloss work.  Increase potential
parallelism by using a per-rport devloss_work_q for dev_loss_work and
fast_io_fail_work.

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Link: https://lore.kernel.org/r/20250707202225.1203189-1-emilne@redhat.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
8 weeks agoscsi: ufs: exynos: Fix programming of HCI_UTRL_NEXUS_TYPE
André Draszik [Mon, 7 Jul 2025 17:05:27 +0000 (18:05 +0100)] 
scsi: ufs: exynos: Fix programming of HCI_UTRL_NEXUS_TYPE

On Google gs101, the number of UTP transfer request slots (nutrs) is 32,
and in this case the driver ends up programming the UTRL_NEXUS_TYPE
incorrectly as 0.

This is because the left hand side of the shift is 1, which is of type
int, i.e. 31 bits wide. Shifting by more than that width results in
undefined behaviour.

Fix this by switching to the BIT() macro, which applies correct type
casting as required. This ensures the correct value is written to
UTRL_NEXUS_TYPE (0xffffffff on gs101), and it also fixes a UBSAN shift
warning:

    UBSAN: shift-out-of-bounds in drivers/ufs/host/ufs-exynos.c:1113:21
    shift exponent 32 is too large for 32-bit type 'int'

For consistency, apply the same change to the nutmrs / UTMRL_NEXUS_TYPE
write.

Fixes: 55f4b1f73631 ("scsi: ufs: ufs-exynos: Add UFS host support for Exynos SoCs")
Cc: stable@vger.kernel.org
Signed-off-by: André Draszik <andre.draszik@linaro.org>
Link: https://lore.kernel.org/r/20250707-ufs-exynos-shift-v1-1-1418e161ae40@linaro.org
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Peter Griffin <peter.griffin@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
8 weeks agoscsi: core: Fix kernel doc for scsi_track_queue_full()
Bagas Sanjaya [Wed, 2 Jul 2025 03:58:23 +0000 (10:58 +0700)] 
scsi: core: Fix kernel doc for scsi_track_queue_full()

Sphinx reports indentation warning on scsi_track_queue_full() return
values:

Documentation/driver-api/scsi:101: ./drivers/scsi/scsi.c:247: ERROR: Unexpected indentation. [docutils]

Fix the warning by making the return values listing a bullet list.

Fixes: eb44820c28bc ("[SCSI] Add Documentation and integrate into docbook build")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20250702035822.18072-2-bagasdotme@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
8 weeks agoscsi: ibmvscsi_tgt: Fix dma_unmap_sg() nents value
Thomas Fourier [Mon, 30 Jun 2025 11:18:02 +0000 (13:18 +0200)] 
scsi: ibmvscsi_tgt: Fix dma_unmap_sg() nents value

The dma_unmap_sg() functions should be called with the same nents as the
dma_map_sg(), not the value the map function returned.

Fixes: 88a678bbc34c ("ibmvscsis: Initial commit of IBM VSCSI Tgt Driver")
Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com>
Link: https://lore.kernel.org/r/20250630111803.94389-2-fourier.thomas@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
8 weeks agoscsi: ibmvscsi_tgt: Fix typo in comment
Ankit Dange [Sat, 28 Jun 2025 12:53:20 +0000 (18:23 +0530)] 
scsi: ibmvscsi_tgt: Fix typo in comment

Correct the misspelling of "transitition" to "transition" in a comment
in ibmvscsi_tgt.c for clarity.

Signed-off-by: Ankit Dange <ankitdange37@gmail.com>
Link: https://lore.kernel.org/r/20250628125320.295824-1-ankitdange37@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
8 weeks agoMerge patch series "mpi3mr: Few minor bug fixes"
Martin K. Petersen [Tue, 15 Jul 2025 00:56:16 +0000 (20:56 -0400)] 
Merge patch series "mpi3mr: Few minor bug fixes"

Ranjan Kumar <ranjan.kumar@broadcom.com> says:

Few minor fixes of mpi3mr driver.

Link: https://lore.kernel.org/r/20250627194539.48851-1-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
8 weeks agonet: phy: Don't register LEDs for genphy
Sean Anderson [Thu, 10 Jul 2025 20:14:53 +0000 (16:14 -0400)] 
net: phy: Don't register LEDs for genphy

If a PHY has no driver, the genphy driver is probed/removed directly in
phy_attach/detach. If the PHY's ofnode has an "leds" subnode, then the
LEDs will be (un)registered when probing/removing the genphy driver.
This could occur if the leds are for a non-generic driver that isn't
loaded for whatever reason. Synchronously removing the PHY device in
phy_detach leads to the following deadlock:

rtnl_lock()
ndo_close()
    ...
    phy_detach()
        phy_remove()
            phy_leds_unregister()
                led_classdev_unregister()
                    led_trigger_set()
                        netdev_trigger_deactivate()
                            unregister_netdevice_notifier()
                                rtnl_lock()

There is a corresponding deadlock on the open/register side of things
(and that one is reported by lockdep), but it requires a race while this
one is deterministic. Regular drivers do not have this problem since
they are probed asynchronously (without RTNL held).

Generic PHYs do not support LEDs anyway, so don't bother registering
them.

[JakubL this is a net-next version of
 commit f0f2b992d818 ("net: phy: Don't register LEDs for genphy"),
 which uses APIs removed in -next.]

Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Link: https://patch.msgid.link/20250710201454.1280277-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoscsi: mpi3mr: Update driver version to 8.14.0.5.50
Ranjan Kumar [Fri, 27 Jun 2025 19:45:39 +0000 (01:15 +0530)] 
scsi: mpi3mr: Update driver version to 8.14.0.5.50

Updated driver version to 8.14.0.5.50

Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20250627194539.48851-5-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
8 weeks agoscsi: mpi3mr: Serialize admin queue BAR writes on 32-bit systems
Ranjan Kumar [Fri, 27 Jun 2025 19:45:38 +0000 (01:15 +0530)] 
scsi: mpi3mr: Serialize admin queue BAR writes on 32-bit systems

On 32-bit systems, 64-bit BAR writes to admin queue registers are
performed as two 32-bit writes. Without locking, this can cause partial
writes when accessed concurrently.

Updated per-queue spinlocks is used to serialize these writes and prevent
race conditions.

Fixes: 824a156633df ("scsi: mpi3mr: Base driver code")
Cc: stable@vger.kernel.org
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20250627194539.48851-4-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
8 weeks agoscsi: mpi3mr: Drop unnecessary volatile from __iomem pointers
Ranjan Kumar [Fri, 27 Jun 2025 19:45:37 +0000 (01:15 +0530)] 
scsi: mpi3mr: Drop unnecessary volatile from __iomem pointers

The volatile qualifier is redundant for __iomem pointers.

Cleaned up usage in mpi3mr_writeq() and sysif_regs pointer as per
Upstream compliance.

Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20250627194539.48851-3-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
8 weeks agonet: phy: Don't register LEDs for genphy
Sean Anderson [Mon, 7 Jul 2025 19:58:03 +0000 (15:58 -0400)] 
net: phy: Don't register LEDs for genphy

If a PHY has no driver, the genphy driver is probed/removed directly in
phy_attach/detach. If the PHY's ofnode has an "leds" subnode, then the
LEDs will be (un)registered when probing/removing the genphy driver.
This could occur if the leds are for a non-generic driver that isn't
loaded for whatever reason. Synchronously removing the PHY device in
phy_detach leads to the following deadlock:

rtnl_lock()
ndo_close()
    ...
    phy_detach()
        phy_remove()
            phy_leds_unregister()
                led_classdev_unregister()
                    led_trigger_set()
                        netdev_trigger_deactivate()
                            unregister_netdevice_notifier()
                                rtnl_lock()

There is a corresponding deadlock on the open/register side of things
(and that one is reported by lockdep), but it requires a race while this
one is deterministic.

Generic PHYs do not support LEDs anyway, so don't bother registering
them.

Fixes: 01e5b728e9e4 ("net: phy: Add a binding for PHY LEDs")
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Link: https://patch.msgid.link/20250707195803.666097-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoscsi: mpi3mr: Fix race between config read submit and interrupt completion
Ranjan Kumar [Fri, 27 Jun 2025 19:45:36 +0000 (01:15 +0530)] 
scsi: mpi3mr: Fix race between config read submit and interrupt completion

The "is_waiting" flag was updated after calling complete(), which could
lead to a race where the waiting thread wakes up before the flag is
cleared. This may cause a missed wakeup or stale state check.

Reorder the operations to update "is_waiting" before signaling completion
to ensure consistent state.

Fixes: 824a156633df ("scsi: mpi3mr: Base driver code")
Cc: stable@vger.kernel.org
Co-developed-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20250627194539.48851-2-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
8 weeks agonetdevsim: implement peer queue flow control
Breno Leitao [Fri, 11 Jul 2025 17:06:59 +0000 (10:06 -0700)] 
netdevsim: implement peer queue flow control

Add flow control mechanism between paired netdevsim devices to stop the
TX queue during high traffic scenarios. When a receive queue becomes
congested (approaching NSIM_RING_SIZE limit), the corresponding transmit
queue on the peer device is stopped using netif_subqueue_try_stop().

Once the receive queue has sufficient capacity again, the peer's
transmit queue is resumed with netif_tx_wake_queue().

Key changes:
  * Add nsim_stop_peer_tx_queue() to pause peer TX when RX queue is full
  * Add nsim_start_peer_tx_queue() to resume peer TX when RX queue drains
  * Implement queue mapping validation to ensure TX/RX queue count match
  * Wake all queues during device unlinking to prevent stuck queues
  * Use RCU protection when accessing peer device references
  * wake the queues when changing the queue numbers
  * Remove IFF_NO_QUEUE given it will enqueue packets now

The flow control only activates when devices have matching TX/RX queue
counts to ensure proper queue mapping.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250711-netdev_flow_control-v3-1-aa1d5a155762@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoselftests: net: add test for variable PMTU in broadcast routes
Oscar Maes [Thu, 10 Jul 2025 14:27:14 +0000 (16:27 +0200)] 
selftests: net: add test for variable PMTU in broadcast routes

Added a test for variable PMTU in broadcast routes.

This test uses iputils' ping and attempts to send a ping between
two peers, which should result in a regular echo reply.

This test will fail when the receiving peer does not receive the echo
request due to a lack of packet fragmentation.

Signed-off-by: Oscar Maes <oscmaes92@gmail.com>
Link: https://patch.msgid.link/20250710142714.12986-2-oscmaes92@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: ipv4: fix incorrect MTU in broadcast routes
Oscar Maes [Thu, 10 Jul 2025 14:27:13 +0000 (16:27 +0200)] 
net: ipv4: fix incorrect MTU in broadcast routes

Currently, __mkroute_output overrules the MTU value configured for
broadcast routes.

This buggy behaviour can be reproduced with:

ip link set dev eth1 mtu 9000
ip route del broadcast 192.168.0.255 dev eth1 proto kernel scope link src 192.168.0.2
ip route add broadcast 192.168.0.255 dev eth1 proto kernel scope link src 192.168.0.2 mtu 1500

The maximum packet size should be 1500, but it is actually 8000:

ping -b 192.168.0.255 -s 8000

Fix __mkroute_output to allow MTU values to be configured for
for broadcast routes (to support a mixed-MTU local-area-network).

Signed-off-by: Oscar Maes <oscmaes92@gmail.com>
Link: https://patch.msgid.link/20250710142714.12986-1-oscmaes92@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoMerge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox...
Jakub Kicinski [Tue, 15 Jul 2025 00:26:57 +0000 (17:26 -0700)] 
Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Tariq Toukan says:

====================
mlx5-next updates 2025-07-14

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: IFC updates for disabled host PF
  net/mlx5: Expose disciplined_fr_counter through HCA capabilities in mlx5_ifc
  RDMA/mlx5: Fix UMR modifying of mkey page size
  net/mlx5: Expose HCA capability bits for mkey max page size
====================

Link: https://patch.msgid.link/1752481357-34780-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoMerge tag 'linux-can-next-for-6.17-20250711' of git://git.kernel.org/pub/scm/linux...
Jakub Kicinski [Tue, 15 Jul 2025 00:26:20 +0000 (17:26 -0700)] 
Merge tag 'linux-can-next-for-6.17-20250711' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2025-07-11

The first patch is by Geert Uytterhoeven and converts the rcar_can
driver to DEFINE_SIMPLE_DEV_PM_OPS.

The last patch is by Biju Das and removes unused macros from the
rcar_canfd driver.

* tag 'linux-can-next-for-6.17-20250711' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
  can: rcar_canfd: Drop unused macros
  can: rcar_can: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
====================

Link: https://patch.msgid.link/20250711101706.2822687-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet/x25: Remove unused x25_terminate_link()
Dr. David Alan Gilbert [Sat, 12 Jul 2025 20:57:59 +0000 (21:57 +0100)] 
net/x25: Remove unused x25_terminate_link()

x25_terminate_link() has been unused since the last use was removed
in 2020 by:
commit 7eed751b3b2a ("net/x25: handle additional netdev events")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Acked-by: Martin Schiller <ms@dev.tdt.de>
Link: https://patch.msgid.link/20250712205759.278777-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoselftests/tc-testing: Create test cases for adding qdiscs to invalid qdisc parents
Victor Nogueira [Sat, 12 Jul 2025 14:50:35 +0000 (11:50 -0300)] 
selftests/tc-testing: Create test cases for adding qdiscs to invalid qdisc parents

As described in a previous commit [1], Lion's patch [2] revealed an ancient
bug in the qdisc API. Whenever a user tries to add a qdisc to an
invalid parent (not a class, root, or ingress qdisc), the qdisc API will
detect this after qdisc_create is called. Some qdiscs (like fq_codel, pie,
and sfq) call functions (on their init callback) which assume the parent is
valid, so qdisc_create itself may have caused a NULL pointer dereference in
such cases.

This commit creates 3 TDC tests that attempt to add fq_codel, pie and sfq
qdiscs to invalid parents

- Attempts to add an fq_codel qdisc to an hhf qdisc parent
- Attempts to add a pie qdisc to a drr qdisc parent
- Attempts to add an sfq qdisc to an inexistent hfsc classid (which would
  belong to a valid hfsc qdisc)

[1] https://lore.kernel.org/all/20250707210801.372995-1-victor@mojatatu.com/
[2] https://lore.kernel.org/netdev/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com/

Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20250712145035.705156-1-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoselftests: drv-net: add rss_api to the Makefile
Jakub Kicinski [Sat, 12 Jul 2025 01:20:05 +0000 (18:20 -0700)] 
selftests: drv-net: add rss_api to the Makefile

I missed adding rss_api.py to the Makefile. The NIPA Makefile
checking script was scanning for shell scripts only, so it
didn't flag it either.

Fixes: 4d13c6c449af ("selftests: drv-net: test RSS Netlink notifications")
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250712012005.4010263-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: thunderx: Fix format-truncation warning in bgx_acpi_match_id()
Alok Tiwari [Fri, 11 Jul 2025 14:05:30 +0000 (07:05 -0700)] 
net: thunderx: Fix format-truncation warning in bgx_acpi_match_id()

The buffer bgx_sel used in snprintf() was too small to safely hold
the formatted string "BGX%d" for all valid bgx_id values. This caused
a -Wformat-truncation warning with `Werror` enabled during build.

Increase the buffer size from 5 to 7 and use `sizeof(bgx_sel)` in
snprintf() to ensure safety and suppress the warning.

Build warning:
  CC      drivers/net/ethernet/cavium/thunder/thunder_bgx.o
  drivers/net/ethernet/cavium/thunder/thunder_bgx.c: In function
‘bgx_acpi_match_id’:
  drivers/net/ethernet/cavium/thunder/thunder_bgx.c:1434:27: error: ‘%d’
directive output may be truncated writing between 1 and 3 bytes into a
region of size 2 [-Werror=format-truncation=]
    snprintf(bgx_sel, 5, "BGX%d", bgx->bgx_id);
                             ^~
  drivers/net/ethernet/cavium/thunder/thunder_bgx.c:1434:23: note:
directive argument in the range [0, 255]
    snprintf(bgx_sel, 5, "BGX%d", bgx->bgx_id);
                         ^~~~~~~
  drivers/net/ethernet/cavium/thunder/thunder_bgx.c:1434:2: note:
‘snprintf’ output between 5 and 7 bytes into a destination of size 5
    snprintf(bgx_sel, 5, "BGX%d", bgx->bgx_id);

compiler warning due to insufficient snprintf buffer size.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250711140532.2463602-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoMerge branch 'net-fec-add-some-optimizations'
Jakub Kicinski [Tue, 15 Jul 2025 00:14:34 +0000 (17:14 -0700)] 
Merge branch 'net-fec-add-some-optimizations'

Wei Fang says:

====================
net: fec: add some optimizations

Add some optimizations to the fec driver, see each patch for details.

v1: https://lore.kernel.org/20250710090902.1171180-1-wei.fang@nxp.com
====================

Link: https://patch.msgid.link/20250711091639.1374411-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: fec: add fec_set_hw_mac_addr() helper function
Wei Fang [Fri, 11 Jul 2025 09:16:39 +0000 (17:16 +0800)] 
net: fec: add fec_set_hw_mac_addr() helper function

In the current driver, the MAC address is set in both fec_restart() and
fec_set_mac_address(), so a generic helper function fec_set_hw_mac_addr()
is added to set the hardware MAC address to make the code more compact.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250711091639.1374411-4-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: fec: add more macros for bits of FEC_ECR
Wei Fang [Fri, 11 Jul 2025 09:16:38 +0000 (17:16 +0800)] 
net: fec: add more macros for bits of FEC_ECR

There are also some RCR bits that are not defined but are used by the
driver, so add macro definitions for these bits to improve readability
and maintainability.

In addition, although FEC_RCR_HALFDPX has been defined, it is not used
in the driver. According to the description of FEC_RCR[1] in RM, it is
used to disable receive on transmit. Therefore, it is more appropriate
to redefine FEC_RCR[1] as FEC_RCR_DRT.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250711091639.1374411-3-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: fec: use phy_interface_mode_is_rgmii() to check RGMII mode
Wei Fang [Fri, 11 Jul 2025 09:16:37 +0000 (17:16 +0800)] 
net: fec: use phy_interface_mode_is_rgmii() to check RGMII mode

Use the generic helper function phy_interface_mode_is_rgmii() to check
RGMII mode.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250711091639.1374411-2-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agosmc: Fix various oops due to inet_sock type confusion.
Kuniyuki Iwashima [Fri, 11 Jul 2025 06:07:52 +0000 (06:07 +0000)] 
smc: Fix various oops due to inet_sock type confusion.

syzbot reported weird splats [0][1] in cipso_v4_sock_setattr() while
freeing inet_sk(sk)->inet_opt.

The address was freed multiple times even though it was read-only memory.

cipso_v4_sock_setattr() did nothing wrong, and the root cause was type
confusion.

The cited commit made it possible to create smc_sock as an INET socket.

The issue is that struct smc_sock does not have struct inet_sock as the
first member but hijacks AF_INET and AF_INET6 sk_family, which confuses
various places.

In this case, inet_sock.inet_opt was actually smc_sock.clcsk_data_ready(),
which is an address of a function in the text segment.

  $ pahole -C inet_sock vmlinux
  struct inet_sock {
  ...
          struct ip_options_rcu *    inet_opt;             /*   784     8 */

  $ pahole -C smc_sock vmlinux
  struct smc_sock {
  ...
          void                       (*clcsk_data_ready)(struct sock *); /*   784     8 */

The same issue for another field was reported before. [2][3]

At that time, an ugly hack was suggested [4], but it makes both INET
and SMC code error-prone and hard to change.

Also, yet another variant was fixed by a hacky commit 98d4435efcbf3
("net/smc: prevent NULL pointer dereference in txopt_get").

Instead of papering over the root cause by such hacks, we should not
allow non-INET socket to reuse the INET infra.

Let's add inet_sock as the first member of smc_sock.

[0]:
kvfree_call_rcu(): Double-freed call. rcu_head 000000006921da73
WARNING: CPU: 0 PID: 6718 at mm/slab_common.c:1956 kvfree_call_rcu+0x94/0x3f0 mm/slab_common.c:1955
Modules linked in:
CPU: 0 UID: 0 PID: 6718 Comm: syz.0.17 Tainted: G        W           6.16.0-rc4-syzkaller-g7482bb149b9f #0 PREEMPT
Tainted: [W]=WARN
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : kvfree_call_rcu+0x94/0x3f0 mm/slab_common.c:1955
lr : kvfree_call_rcu+0x94/0x3f0 mm/slab_common.c:1955
sp : ffff8000a03a7730
x29: ffff8000a03a7730 x28: 00000000fffffff5 x27: 1fffe000184823d3
x26: dfff800000000000 x25: ffff0000c2411e9e x24: ffff0000dd88da00
x23: ffff8000891ac9a0 x22: 00000000ffffffea x21: ffff8000891ac9a0
x20: ffff8000891ac9a0 x19: ffff80008afc2480 x18: 00000000ffffffff
x17: 0000000000000000 x16: ffff80008ae642c8 x15: ffff700011ede14c
x14: 1ffff00011ede14c x13: 0000000000000004 x12: ffffffffffffffff
x11: ffff700011ede14c x10: 0000000000ff0100 x9 : 5fa3c1ffaf0ff000
x8 : 5fa3c1ffaf0ff000 x7 : 0000000000000001 x6 : 0000000000000001
x5 : ffff8000a03a7078 x4 : ffff80008f766c20 x3 : ffff80008054d360
x2 : 0000000000000000 x1 : 0000000000000201 x0 : 0000000000000000
Call trace:
 kvfree_call_rcu+0x94/0x3f0 mm/slab_common.c:1955 (P)
 cipso_v4_sock_setattr+0x2f0/0x3f4 net/ipv4/cipso_ipv4.c:1914
 netlbl_sock_setattr+0x240/0x334 net/netlabel/netlabel_kapi.c:1000
 smack_netlbl_add+0xa8/0x158 security/smack/smack_lsm.c:2581
 smack_inode_setsecurity+0x378/0x430 security/smack/smack_lsm.c:2912
 security_inode_setsecurity+0x118/0x3c0 security/security.c:2706
 __vfs_setxattr_noperm+0x174/0x5c4 fs/xattr.c:251
 __vfs_setxattr_locked+0x1ec/0x218 fs/xattr.c:295
 vfs_setxattr+0x158/0x2ac fs/xattr.c:321
 do_setxattr fs/xattr.c:636 [inline]
 file_setxattr+0x1b8/0x294 fs/xattr.c:646
 path_setxattrat+0x2ac/0x320 fs/xattr.c:711
 __do_sys_fsetxattr fs/xattr.c:761 [inline]
 __se_sys_fsetxattr fs/xattr.c:758 [inline]
 __arm64_sys_fsetxattr+0xc0/0xdc fs/xattr.c:758
 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
 invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
 el0_svc+0x58/0x180 arch/arm64/kernel/entry-common.c:879
 el0t_64_sync_handler+0x84/0x12c arch/arm64/kernel/entry-common.c:898
 el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600

[1]:
Unable to handle kernel write to read-only memory at virtual address ffff8000891ac9a8
KASAN: probably user-memory-access in range [0x0000000448d64d40-0x0000000448d64d47]
Mem abort info:
  ESR = 0x000000009600004e
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x0e: level 2 permission fault
Data abort info:
  ISV = 0, ISS = 0x0000004e, ISS2 = 0x00000000
  CM = 0, WnR = 1, TnD = 0, TagAccess = 0
  GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000207144000
[ffff8000891ac9a8] pgd=0000000000000000, p4d=100000020f950003, pud=100000020f951003, pmd=0040000201000781
Internal error: Oops: 000000009600004e [#1]  SMP
Modules linked in:
CPU: 0 UID: 0 PID: 6946 Comm: syz.0.69 Not tainted 6.16.0-rc4-syzkaller-g7482bb149b9f #0 PREEMPT
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : kvfree_call_rcu+0x31c/0x3f0 mm/slab_common.c:1971
lr : add_ptr_to_bulk_krc_lock mm/slab_common.c:1838 [inline]
lr : kvfree_call_rcu+0xfc/0x3f0 mm/slab_common.c:1963
sp : ffff8000a28a7730
x29: ffff8000a28a7730 x28: 00000000fffffff5 x27: 1fffe00018b09bb3
x26: 0000000000000001 x25: ffff80008f66e000 x24: ffff00019beaf498
x23: ffff00019beaf4c0 x22: 0000000000000000 x21: ffff8000891ac9a0
x20: ffff8000891ac9a0 x19: 0000000000000000 x18: 00000000ffffffff
x17: ffff800093363000 x16: ffff80008052c6e4 x15: ffff700014514ecc
x14: 1ffff00014514ecc x13: 0000000000000004 x12: ffffffffffffffff
x11: ffff700014514ecc x10: 0000000000000001 x9 : 0000000000000001
x8 : ffff00019beaf7b4 x7 : ffff800080a94154 x6 : 0000000000000000
x5 : ffff8000935efa60 x4 : 0000000000000008 x3 : ffff80008052c7fc
x2 : 0000000000000001 x1 : ffff8000891ac9a0 x0 : 0000000000000001
Call trace:
 kvfree_call_rcu+0x31c/0x3f0 mm/slab_common.c:1967 (P)
 cipso_v4_sock_setattr+0x2f0/0x3f4 net/ipv4/cipso_ipv4.c:1914
 netlbl_sock_setattr+0x240/0x334 net/netlabel/netlabel_kapi.c:1000
 smack_netlbl_add+0xa8/0x158 security/smack/smack_lsm.c:2581
 smack_inode_setsecurity+0x378/0x430 security/smack/smack_lsm.c:2912
 security_inode_setsecurity+0x118/0x3c0 security/security.c:2706
 __vfs_setxattr_noperm+0x174/0x5c4 fs/xattr.c:251
 __vfs_setxattr_locked+0x1ec/0x218 fs/xattr.c:295
 vfs_setxattr+0x158/0x2ac fs/xattr.c:321
 do_setxattr fs/xattr.c:636 [inline]
 file_setxattr+0x1b8/0x294 fs/xattr.c:646
 path_setxattrat+0x2ac/0x320 fs/xattr.c:711
 __do_sys_fsetxattr fs/xattr.c:761 [inline]
 __se_sys_fsetxattr fs/xattr.c:758 [inline]
 __arm64_sys_fsetxattr+0xc0/0xdc fs/xattr.c:758
 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
 invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
 el0_svc+0x58/0x180 arch/arm64/kernel/entry-common.c:879
 el0t_64_sync_handler+0x84/0x12c arch/arm64/kernel/entry-common.c:898
 el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600
Code: aa1f03e2 52800023 97ee1e8d b4000195 (f90006b4)

Fixes: d25a92ccae6b ("net/smc: Introduce IPPROTO_SMC")
Reported-by: syzbot+40bf00346c3fe40f90f2@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/686d9b50.050a0220.1ffab7.0020.GAE@google.com/
Tested-by: syzbot+40bf00346c3fe40f90f2@syzkaller.appspotmail.com
Reported-by: syzbot+f22031fad6cbe52c70e7@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/686da0f3.050a0220.1ffab7.0022.GAE@google.com/
Reported-by: syzbot+271fed3ed6f24600c364@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=271fed3ed6f24600c364 # [2]
Link: https://lore.kernel.org/netdev/99f284be-bf1d-4bc4-a629-77b268522fff@huawei.com/
Link: https://lore.kernel.org/netdev/20250331081003.1503211-1-wangliang74@huawei.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Wang Liang <wangliang74@huawei.com>
Link: https://patch.msgid.link/20250711060808.2977529-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agodev: Pass netdevice_tracker to dev_get_by_flags_rcu().
Kuniyuki Iwashima [Fri, 11 Jul 2025 05:10:59 +0000 (05:10 +0000)] 
dev: Pass netdevice_tracker to dev_get_by_flags_rcu().

This is a follow-up for commit eb1ac9ff6c4a5 ("ipv6: anycast: Don't
hold RTNL for IPV6_JOIN_ANYCAST.").

We should not add a new device lookup API without netdevice_tracker.

Let's pass netdevice_tracker to dev_get_by_flags_rcu() and rename it
with netdev_ prefix to match other newer APIs.

Note that we always use GFP_ATOMIC for netdev_hold() as it's expected
to be called under RCU.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/netdev/20250708184053.102109f6@kernel.org/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250711051120.2866855-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agonet: phy: micrel: Add ksz9131_resume()
Biju Das [Fri, 11 Jul 2025 05:40:21 +0000 (06:40 +0100)] 
net: phy: micrel: Add ksz9131_resume()

The Renesas RZ/G3E SMARC EVK uses KSZ9131RNXC phy. On deep power state,
PHY loses the power and on wakeup the rgmii delays are not reconfigured
causing it to fail.

Replace the callback kszphy_resume()->ksz9131_resume() for reconfiguring
the rgmii_delay when it exits from PM suspend state.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250711054029.48536-1-biju.das.jz@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoio_uring/net: cast min_not_zero() type
Jens Axboe [Mon, 14 Jul 2025 22:36:08 +0000 (16:36 -0600)] 
io_uring/net: cast min_not_zero() type

kernel test robot reports that xtensa complains about different
signedness for a min_not_zero() comparison. Cast the int part to size_t
to avoid this issue.

Fixes: e227c8cdb47b ("io_uring/net: use passed in 'len' in io_recv_buf_select()")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202507150504.zO5FsCPm-lkp@intel.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
8 weeks agoKVM: x86: Reject KVM_SET_TSC_KHZ VM ioctl when vCPUs have been created
Kai Huang [Sun, 13 Jul 2025 22:20:19 +0000 (10:20 +1200)] 
KVM: x86: Reject KVM_SET_TSC_KHZ VM ioctl when vCPUs have been created

Reject the KVM_SET_TSC_KHZ VM ioctl when vCPUs have been created and
update the documentation to reflect it.

The VM scope KVM_SET_TSC_KHZ ioctl is used to set up the default TSC
frequency that all subsequently created vCPUs can use.  It is only
intended to be called before any vCPU is created.  Allowing it to be
called after that only results in confusion but nothing good.

Note this is an ABI change.  But currently in Qemu (the de facto
userspace VMM) only TDX uses this VM ioctl, and it is only called once
before creating any vCPU, therefore the risk of breaking userspace is
pretty low.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
Link: https://lore.kernel.org/r/135a35223ce8d01cea06b6cef30bfe494ec85827.1752444335.git.kai.huang@intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
8 weeks agoscsi: ufs: ufs-qcom: Enable QUnipro Internal Clock Gating
Nitin Rawat [Mon, 14 Jul 2025 07:53:36 +0000 (13:23 +0530)] 
scsi: ufs: ufs-qcom: Enable QUnipro Internal Clock Gating

Enable internal clock gating for Qualcomm UFS host controller by setting
the following attributes to 1 during host controller initialization:

 - DL_VS_CLK_CFG
 - PA_VS_CLK_CFG_REG
 - DME_VS_CORE_CLK_CTRL.DME_HW_CGC_EN

This change is necessary to support the internal clock gating mechanism
in Qualcomm UFS host controller. This is power saving feature and hence
driver can continue to function correctly despite any error in enabling
these feature.

Signed-off-by: Nitin Rawat <quic_nitirawa@quicinc.com>
Link: https://lore.kernel.org/r/20250714075336.2133-4-quic_nitirawa@quicinc.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
8 weeks agoscsi: ufs: core: Add ufshcd_dme_rmw() to modify DME attributes
Nitin Rawat [Mon, 14 Jul 2025 07:53:35 +0000 (13:23 +0530)] 
scsi: ufs: core: Add ufshcd_dme_rmw() to modify DME attributes

Introduce ufshcd_dme_rmw() API to read, modify, and write DME attributes
in UFS host controllers using a mask and value.

Signed-off-by: Nitin Rawat <quic_nitirawa@quicinc.com>
Link: https://lore.kernel.org/r/20250714075336.2133-3-quic_nitirawa@quicinc.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
8 weeks agoscsi: ufs: ufs-qcom: Update esi_vec_mask for HW major version >= 6
Bao D. Nguyen [Mon, 14 Jul 2025 07:53:34 +0000 (13:23 +0530)] 
scsi: ufs: ufs-qcom: Update esi_vec_mask for HW major version >= 6

The MCQ feature and ESI are supported by all Qualcomm UFS controller
versions 6 and above.

Therefore, update the ESI vector mask in the UFS_MEM_CFG3 register for
platforms with major version number of 6 or higher.

Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bao D. Nguyen <quic_nguyenb@quicinc.com>
Signed-off-by: Nitin Rawat <quic_nitirawa@quicinc.com>
Link: https://lore.kernel.org/r/20250714075336.2133-2-quic_nitirawa@quicinc.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
8 weeks agopNFS: Fix disk addr range check in block/scsi layout
Sergey Bashirov [Wed, 2 Jul 2025 13:32:21 +0000 (16:32 +0300)] 
pNFS: Fix disk addr range check in block/scsi layout

At the end of the isect translation, disc_addr represents the physical
disk offset. Thus, end calculated from disk_addr is also a physical disk
offset. Therefore, range checking should be done using map->disk_offset,
not map->start.

Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250702133226.212537-1-sergeybashirov@gmail.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
8 weeks agopNFS: Fix stripe mapping in block/scsi layout
Sergey Bashirov [Tue, 1 Jul 2025 12:21:48 +0000 (15:21 +0300)] 
pNFS: Fix stripe mapping in block/scsi layout

Because of integer division, we need to carefully calculate the
disk offset. Consider the example below for a stripe of 6 volumes,
a chunk size of 4096, and an offset of 70000.

chunk = div_u64(offset, dev->chunk_size) = 70000 / 4096 = 17
offset = chunk * dev->chunk_size = 17 * 4096 = 69632
disk_offset_wrong = div_u64(offset, dev->nr_children) = 69632 / 6 = 11605
disk_chunk = div_u64(chunk, dev->nr_children) = 17 / 6 = 2
disk_offset = disk_chunk * dev->chunk_size = 2 * 4096 = 8192

Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250701122341.199112-1-sergeybashirov@gmail.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
8 weeks agopNFS: Handle RPC size limit for layoutcommits
Sergey Bashirov [Mon, 30 Jun 2025 18:35:29 +0000 (21:35 +0300)] 
pNFS: Handle RPC size limit for layoutcommits

When there are too many block extents for a layoutcommit, they may not
all fit into the maximum-sized RPC. This patch allows the generic pnfs
code to properly handle -ENOSPC returned by the block/scsi layout driver
and trigger additional layoutcommits if necessary.

Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250630183537.196479-5-sergeybashirov@gmail.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
8 weeks agopNFS: Add prepare commit trace to block/scsi layout
Sergey Bashirov [Mon, 30 Jun 2025 18:35:28 +0000 (21:35 +0300)] 
pNFS: Add prepare commit trace to block/scsi layout

Replace dprintk with trace event in ext_tree_prepare_commit() function.

Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250630183537.196479-4-sergeybashirov@gmail.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
8 weeks agopNFS: Fix extent encoding in block/scsi layout
Sergey Bashirov [Mon, 30 Jun 2025 18:35:27 +0000 (21:35 +0300)] 
pNFS: Fix extent encoding in block/scsi layout

The ext_tree_encode_commit() function may be called multiple times for
the same file, layout, and last written byte if the provided buffer is
not large enough to encode all extents in it.

The first problem is that the last written byte field must be zeroed
only on a successful call, otherwise we will lose its actual value and
get an integer overflow on the next encoding attempt.

The second problem is that we can't count and encode in one pass. The
extent state changes during encoding, so if we return -ENOSPC but have
already encoded some extents into a small buffer, they will not be
re-encoded into a new larger buffer on the next try. As a result, the
client never commits these extents to the server.

Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250630183537.196479-3-sergeybashirov@gmail.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
8 weeks agopNFS: Fix uninited ptr deref in block/scsi layout
Sergey Bashirov [Mon, 30 Jun 2025 18:35:26 +0000 (21:35 +0300)] 
pNFS: Fix uninited ptr deref in block/scsi layout

The error occurs on the third attempt to encode extents. When function
ext_tree_prepare_commit() reallocates a larger buffer to retry encoding
extents, the "layoutupdate_pages" page array is initialized only after the
retry loop. But ext_tree_free_commitdata() is called on every iteration
and tries to put pages in the array, thus dereferencing uninitialized
pointers.

An additional problem is that there is no limit on the maximum possible
buffer_size. When there are too many extents, the client may create a
layoutcommit that is larger than the maximum possible RPC size accepted
by the server.

During testing, we observed two typical scenarios. First, one memory page
for extents is enough when we work with small files, append data to the
end of the file, or preallocate extents before writing. But when we fill
a new large file without preallocating, the number of extents can be huge,
and counting the number of written extents in ext_tree_encode_commit()
does not help much. Since this number increases even more between
unlocking and locking of ext_tree, the reallocated buffer may not be
large enough again and again.

Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250630183537.196479-2-sergeybashirov@gmail.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
8 weeks agoNFS: Remove unused function nfs_umount
Dr. David Alan Gilbert [Tue, 18 Feb 2025 21:52:50 +0000 (21:52 +0000)] 
NFS: Remove unused function nfs_umount

nfs_umount() has been unused since 2013's
commit 4580a92d44e2 ("NFS: Use server-recommended security flavor by
default (NFSv3)")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://lore.kernel.org/r/20250218215250.263709-1-linux@treblig.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>