git.ipfire.org Git - thirdparty/kernel/linux.git/log

rtc: Rename lib_test to test_rtc_lib

When compiling the RTC library functions test as a module, the module
has the non-descriptive name "lib_test.ko". Fix this by renaming it to
"test_rtc_lib.ko".

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/r/47019d7f8ced12107b54a372fdf34b1b8f7b6183.1751355848.git.geert@linux-m68k.org
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>

can: netlink: can_changelink(): fix NULL pointer deref of struct can_priv::do_set_mode

Andrei Lalaev reported a NULL pointer deref when a CAN device is
restarted from Bus Off and the driver does not implement the struct
can_priv::do_set_mode callback.

There are 2 code path that call struct can_priv::do_set_mode:
- directly by a manual restart from the user space, via
can_changelink()
- delayed automatic restart after bus off (deactivated by default)

To prevent the NULL pointer deference, refuse a manual restart or
configure the automatic restart delay in can_changelink() and report
the error via extack to user space.

As an additional safety measure let can_restart() return an error if
can_priv::do_set_mode is not set instead of dereferencing it
unchecked.

Reported-by: Andrei Lalaev <andrey.lalaev@gmail.com>
Closes: https://lore.kernel.org/all/20250714175520.307467-1-andrey.lalaev@gmail.com
Fixes: 39549eef3587 ("can: CAN Network device driver and Netlink interface")
Link: https://patch.msgid.link/20250718-fix-nullptr-deref-do_set_mode-v1-1-0b520097bb96@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge tag 'md-6.17-20250722' of https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux into for-6.17/block

Pull MD updates from Yu:

"- call del_gendisk synchronously, from Xiao
- cleanup unused variable, from John
- cleanup workqueue flags, from Ryo
- fix faulty rdev can't be removed during resync, from Qixing"

* tag 'md-6.17-20250722' of https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux:
  md/raid10: fix set but not used variable in sync_request_write()
  md: allow removing faulty rdev during resync
  md/raid5: unset WQ_CPU_INTENSIVE for raid5 unbound workqueue
  md: remove/add redundancy group only in level change
  md: Don't clear MD_CLOSING until mddev is freed
  md: call del_gendisk in control path

Merge tag 'nvme-6.17-2025-07-22' of git://git.infradead.org/nvme into for-6.17/block

Pull NVMe updates from Christoph:

"- try PCIe function level reset on init failure (Keith Busch)
- log TLS handshake failures at error level (Maurizio Lombardi)
- pci-epf: do not complete commands twice if nvmet_req_init() fails
   (Rick Wertenbroek)
- misc cleanups (Alok Tiwari)"

* tag 'nvme-6.17-2025-07-22' of git://git.infradead.org/nvme:
  nvme-pci: try function level reset on init failure
  nvmet: pci-epf: Do not complete commands twice if nvmet_req_init() fails
  nvme-tcp: log TLS handshake failures at error level
  docs: nvme: fix grammar in nvme-pci-endpoint-target.rst
  nvme: fix typo in status code constant for self-test in progress
  nvmet: remove redundant assignment of error code in nvmet_ns_enable()
  nvme: fix incorrect variable in io cqes error message
  nvme: fix multiple spelling and grammar issues in host drivers

ip6_gre: Factor out common ip6gre tunnel match into helper

Extract common ip6gre tunnel match from ip6gre_tunnel_lookup() into new
helper function ip6gre_tunnel_match() to reduce code duplication.

No functional change intended.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250719081551.963670-1-yuehaibing@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

cpufreq: CPPC: Mark driver with NEED_UPDATE_LIMITS flag

AMU counters on certain CPPC-based platforms tend to yield inaccurate
delivered performance measurements on systems that are idle/mostly idle.
This results in an inaccurate frequency being stored by cpufreq in its
policy structure when the CPU is brought online. [1]

Consequently, if the userspace governor tries to set the frequency to a
new value, there is a possibility that it would be the erroneous value
stored earlier. In such a scenario, cpufreq would assume that the
requested frequency has already been set and return early, resulting in
the correct/new frequency request never making it to the hardware.

Since the operating frequency is liable to this sort of inconsistency,
mark the CPPC driver with CPUFREQ_NEED_UPDATE_LIMITS so that it is always
invoked when a target frequency update is requested.

Link: https://lore.kernel.org/linux-pm/20250619000925.415528-3-pmalani@google.com/
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Prashant Malani <pmalani@google.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://patch.msgid.link/20250722055611.130574-2-pmalani@google.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

arm64: Kconfig: Keep selects somewhat alphabetically ordered

Recent patches selecting HAVE_RELIABLE_STACKTRACE and HAVE_LIVEPATCH
added them to the end of the ARM64 Kconfig select list. Move them around
to keep this list nearly alphabetically ordered.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

ACPI/PCI: Remove space before newline

There is an extraneous space before a newline in an acpi_handle_debug()
message. Remove it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250721145952.2601422-1-colin.i.king@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

net/sched: sch_qfq: Avoid triggering might_sleep in atomic context in qfq_delete_class

might_sleep could be trigger in the atomic context in qfq_delete_class.

qfq_destroy_class was moved into atomic context locked
by sch_tree_lock to avoid a race condition bug on
qfq_aggregate. However, might_sleep could be triggered by
qfq_destroy_class, which introduced sleeping in atomic context (path:
qfq_destroy_class->qdisc_put->__qdisc_destroy->lockdep_unregister_key
->might_sleep).

Considering the race is on the qfq_aggregate objects, keeping
qfq_rm_from_agg in the lock but moving the left part out can solve
this issue.

Fixes: 5e28d5a3f774 ("net/sched: sch_qfq: Fix race condition on qfq_aggregate")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Link: https://patch.msgid.link/4a04e0cc-a64b-44e7-9213-2880ed641d77@sabinyo.mountain
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://patch.msgid.link/20250717230128.159766-1-xmei5@asu.edu
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

arm64: signal: Remove ISB when resetting POR_EL0

POR_EL0 is set to its most permissive value before setting up the
signal frame, to ensure that uaccess succeeds regardless of the
signal stack's pkey.

We are now tolerant to spurious POE faults. This means that we do
not strictly need to issue an ISB after updating POR_EL0, even when
followed by uaccess. The question is whether a fault is likely to
happen or not if the ISB is omitted; in this case the answer seems
to be no. If the regular stack is used, then it should already be
accessible. If the alternate signal stack is used, then a special
(inaccessible) pkey may be used - the assumption is that this
situation is very uncommon.

Remove the ISB to speed up the regular path - this should not have
any functional impact regardless of the scenario.

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Link: https://lore.kernel.org/r/20250619160042.2499290-3-kevin.brodsky@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

ALSA: usb-audio: qcom: Adjust mutex unlock order

The mutexes qdev_mutex and chip->mutex are acquired in that order
throughout the driver. To preserve proper lock hierarchy and avoid
potential deadlocks, they must be released in the reverse
order of acquisition.

This change reorders the unlock sequence to first release chip->mutex
followed by qdev_mutex, ensuring consistency with the locking pattern.

[ fixed the code indentations and Fixes tag by tiwai ]

Fixes: 326bbc348298a ("ALSA: usb-audio: qcom: Introduce QC USB SND offloading support")
Signed-off-by: Erick Karanja <karanja99erick@gmail.com>
Link: https://patch.msgid.link/20250721114554.1666104-1-karanja99erick@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

Merge branch 'gve-af_xdp-zero-copy-for-dqo-rda'

Joshua Washington says:

====================
gve: AF_XDP zero-copy for DQO RDA

This patch series adds support for AF_XDP zero-copy in the DQO RDA queue
format.

XSK infrastructure is updated to re-post buffers when adding XSK pools
because XSK umem will be posted directly to the NIC, a departure from
the bounce buffer model used in GQI QPL. A registry of XSK pools is
introduced to prevent the usage of XSK pools when in copy mode.

v1: https://lore.kernel.org/netdev/20250714160451.124671-1-jeroendb@google.com/
====================

Link: https://patch.msgid.link/20250717152839.973004-1-jeroendb@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

gve: implement DQO RX datapath and control path for AF_XDP zero-copy

Add the RX datapath for AF_XDP zero-copy for DQ RDA. The RX path is
quite similar to that of the normal XDP case. Parallel methods are
introduced to properly handle XSKs instead of normal driver buffers.

To properly support posting from XSKs, queues are destroyed and
recreated, as the driver was initially making use of page pool buffers
instead of the XSK pool memory.

Expose support for AF_XDP zero-copy, as the TX and RX datapaths both
exist.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Link: https://patch.msgid.link/20250717152839.973004-6-jeroendb@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

gve: implement DQO TX datapath for AF_XDP zero-copy

In the descriptor clean path, a number of changes need to be made to
accommodate out of order completions and double completions.

The XSK stack can only handle completions being processed in order, as a
single counter is incremented in xsk_tx_completed to sigify how many XSK
descriptors have been completed. Because completions can come back out
of order in DQ, a separate queue of XSK descriptors must be maintained.
This queue keeps the pending packets in the order that they were written
so that the descriptors can be counted in xsk_tx_completed in the same
order.

For double completions, a new pending packet state and type are
introduced. The new type, GVE_TX_PENDING_PACKET_DQO_XSK, plays an
anlogous role to pre-existing _SKB and _XDP_FRAME pending packet types
for XSK descriptors. The new state, GVE_PACKET_STATE_XSK_COMPLETE,
represents packets for which no more completions are expected. This
includes packets which have received a packet completion or reinjection
completion, as well as packets whose reinjection completion timer have
timed out. At this point, such packets can be counted as part of
xsk_tx_completed() and freed.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Link: https://patch.msgid.link/20250717152839.973004-5-jeroendb@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

gve: keep registry of zc xsk pools in netdev_priv

Relying on xsk_get_pool_from_qid for getting whether zero copy is
enabled on a queue is erroneous, as an XSK pool is registered in
xp_assign_dev whether AF_XDP zero-copy is enabled or not. This becomes
problematic when queues are restarted in copy mode, as all RX queues
with XSKs will register a pool, causing the driver to exercise the
zero-copy codepath.

This patch adds a bitmap to keep track of which queues have zero-copy
enabled.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Link: https://patch.msgid.link/20250717152839.973004-4-jeroendb@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

gve: merge xdp and xsk registration

The existence of both of these xdp_rxq and xsk_rxq is redundant. xdp_rxq
can be used in both the zero-copy mode and the copy mode case. XSK pool
memory model registration is prioritized over normal memory model
registration to ensure that memory model registration happens only once
per queue.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Link: https://patch.msgid.link/20250717152839.973004-3-jeroendb@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

gve: deduplicate xdp info and xsk pool registration logic

The XDP registration path currently has a lot of reused logic, leading
changes to the codepaths to be unnecessarily complex. gve_reg_xsk_pool
extracts the logic of registering an XSK pool with a queue into a method
that can be used by both XDP_SETUP_XSK_POOL and gve_reg_xdp_info.
gve_unreg_xdp_info is used to undo XDP info registration in the error
path instead of explicitly unregistering the XDP info, as it is more
complete and idempotent.

This patch will be followed by other changes to the XDP registration
logic, and will simplify those changes due to the use of common methods.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Link: https://patch.msgid.link/20250717152839.973004-2-jeroendb@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests/futex: Fix spelling mistake "Succeffuly" -> "Successfully"

There is a spelling mistake in a ksft_exit_fail_msg() message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250715130627.1907017-1-colin.i.king@gmail.com

selftests/futex: Define SYS_futex on 32-bit architectures with 64-bit time_t

The kernel does not provide sys_futex() on 32-bit architectures that do not
support 32-bit time representations, such as riscv32.

As a result, glibc cannot define SYS_futex, causing compilation failures in
tests that rely on this syscall. Define SYS_futex as SYS_futex_time64 in
such cases to ensure successful compilation and compatibility.

Signed-off-by: Cynthia Huang <cynthia@andestech.com>
Signed-off-by: Ben Zong-You Xie <ben717@andestech.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Link: https://lore.kernel.org/all/20250710103630.3156130-1-ben717@andestech.com

sched/idle: Remove play_idle()

play_idle() is no longer in use, so delete it.

Signed-off-by: Feng Lee <379943137@qq.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/tencent_C3E0BD9B812C27A30FC49F1EA6A4B1352707@qq.com

rust: io: fix broken intra-doc links to `platform::Device`

`platform` is not accessible from here.

Thus fix the intra-doc links by qualifying the paths a bit more.

Fixes: 1d0d4b28513b ("rust: io: mem: add a generic iomem abstraction")
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Link: https://lore.kernel.org/r/20250722085500.1360401-2-ojeda@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>

rust: io: fix broken intra-doc link to missing `flags` module

There is no `mod flags` in this case, unlike others. Instead, they are
associated constants for the `Flags` type.

Thus reword the sentence to fix the broken intra-doc link, providing
an example of constant and linking to it to clarify which ones we are
referring to.

Fixes: 493fc33ec252 ("rust: io: add resource abstraction")
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Link: https://lore.kernel.org/r/20250722085500.1360401-1-ojeda@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>

arch/powerpc: Remove .interp section in vmlinux

When building with CONFIG_RELOCATABLE, there is a .interp section
which contains the name of the expected ELF interpreter:

Contents of section .interp:
c0000000021c1bac 2f757372 2f6c6962 2f6c642e 736f2e31 /usr/lib/ld.so.1
c0000000021c1bbc 00 .

That information is useless and even likely wrong. Remove it.

Link: https://github.com/linuxppc/issues/issues/434
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/eeaf8fd6628a75d19872ab31cf7e7179e2baef5e.1751366959.git.christophe.leroy@csgroup.eu

powerpc: Drop GPL boilerplate text with obsolete FSF address

The FSF does not reside in the Franklin street anymore, so we should not
request the people to write to this address. Fortunately, these header
files already contain a proper SPDX license identifier, so it should be
fine to simply drop all of this license boilerplate code here.

Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250711072553.198777-1-thuth@redhat.com

powerpc: Don't use %pK through printk

In the past %pK was preferable to %p as it would not leak raw pointer
values into the kernel log.
Since commit ad67b74d2469 ("printk: hash addresses printed with %p")
the regular %p has been improved to avoid this issue.
Furthermore, restricted pointers ("%pK") were never meant to be used
through printk(). They can still unintentionally leak raw pointers or
acquire sleeping locks in atomic contexts.

Switch to the regular pointer formatting which is safer and
easier to reason about.

Link: https://lore.kernel.org/lkml/20250113171731-dc10e3c1-da64-4af0-b767-7c7070468023@linutronix.de/
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250718-restricted-pointers-powerpc-v2-1-fd7bddd809f3@linutronix.de

wifi: mac80211: don't require cipher and keylen in gtk rekey

ieee80211_add_gtk_rekey receives a keyconf as an argument, and the
cipher and keylen are taken from there to the new allocated key.
But in rekey, both the cipher and the keylen should be the same as of
the old key, so let ieee80211_add_gtk_rekey find those, so drivers won't
have to fill it in.

Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250721214922.3c5c023bfae9.Ie6594ae2b4b6d5b3d536e642b349046ebfce7a5d@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: nl80211: Set num_sub_specs before looping through sub_specs

The processing of the struct cfg80211_sar_specs::sub_specs flexible
array requires its counter, num_sub_specs, to be assigned before the
loop in nl80211_set_sar_specs(). Leave the final assignment after the
loop in place in case fewer ended up in the array.

Fixes: aa4ec06c455d ("wifi: cfg80211: use __counted_by where appropriate")
Signed-off-by: Kees Cook <kees@kernel.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://patch.msgid.link/20250721183125.work.183-kees@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: Write cnt before copying in ieee80211_copy_rnr_beacon()

While I caught the need for setting cnt early in nl80211_parse_rnr_elems()
in the original annotation of struct cfg80211_rnr_elems with __counted_by,
I missed a similar pattern in ieee80211_copy_rnr_beacon(). Fix this by
moving the cnt assignment to before the loop.

Fixes: 7b6d7087031b ("wifi: cfg80211: Annotate struct cfg80211_rnr_elems with __counted_by")
Signed-off-by: Kees Cook <kees@kernel.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://patch.msgid.link/20250721182521.work.540-kees@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

kselftest/arm64: Handle attempts to disable SM on SME only systems

The ABI for disabling streaming mode via ptrace is to do a write via the
SVE register set. Following the recent round of fixes to the ptrace code
we don't support this operation on systems without SVE, which is detected
as failures by fp-ptrace. Update the program so that it knows that this
operation is not currently supported.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20250718-arm64-fp-ptrace-sme-only-v1-3-3b96dd19a503@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

kselftest/arm64: Fix SVE write data generation for SME only systems

fp-ptrace does not handle SME only systems correctly when generating data,
on SME only systems scenarios where we are not in streaming mode will not
have an expected vector length. This leads to attempts to do memcpy()s of
zero byte arrays which can crash, fix this by skipping generation of SVE
data for cases where we do not expect to have an active vector length.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20250718-arm64-fp-ptrace-sme-only-v1-2-3b96dd19a503@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

kselftest/arm64: Test SME on SME only systems in fp-ptrace

When checking that the vector extensions are supported fp-ptrace
currently only checks for SVE being supported which means that we get
into a confused half configured state for SME only systems. Check for
SME as well.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20250718-arm64-fp-ptrace-sme-only-v1-1-3b96dd19a503@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

kselftest/arm64: Test FPSIMD format data writes via NT_ARM_SVE in fp-ptrace

The NT_ARM_SVE register set supports two data formats, the native SVE one
and an alternative format where we embed a copy of user_fpsimd_data as used
for NT_PRFPREG in the SVE register set. The register data is set as for a
write to NT_PRFPREG and changes in vector length and streaming mode are
handled as for any NT_ARM_SVE write. This has not previously been tested by
fp-ptrace, add coverage of it.

We do not support writes in FPSIMD format for NT_ARM_SSVE so we skip the
test for anything that would leave us in streaming mode.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20250718-arm64-fp-ptrace-sve-fpsimd-v1-1-7ecda32aa297@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

kselftest/arm64: Allow sve-ptrace to run on SME only systems

Currently the sve-ptrace test program only runs if the system supports
SVE but since SME includes streaming SVE the tests it offers are valid
even on a system that only supports SME. Since the tests already have
individual hwcap checks just remove the top level test and rely on those.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20250718-arm64-sve-ptrace-sme-only-v1-1-2a1121e51b1d@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

Merge tag 'ath-next-20250721' of git://git.kernel.org/pub/scm/linux/kernel/git/ath/ath into wireless-next

Jeff Johnson says:
==================
ath.git patches for v6.17

Highlights for some specific drivers include:

ath9k:
Add AHB "of" support

ath11k:
Support device-specific firmware override
Fix potentially reordered access to device memory

ath12k:
Add more Wi-Fi 7 functionality
Add more statistics to DebugFS
Support different memory profiles
Support 802.11 encap/decap offload to firmware
Fix potentially reordered access to device memory

And of course there is the usual set of cleanups and bug fixes across
the entire family of "ath" drivers.
==================

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

s390: Handle KCOV __init vs inline mismatches

When KCOV is enabled all functions get instrumented, unless
the __no_sanitize_coverage attribute is used. To prepare for
__no_sanitize_coverage being applied to __init functions, we have to
handle differences in how GCC's inline optimizations get resolved. For
s390 this exposed a place where the __init annotation was missing but
ended up being "accidentally correct". Fix this cases and force a couple
functions to be inline with __always_inline.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20250717232519.2984886-7-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>

arm: Handle KCOV __init vs inline mismatches

When KCOV is enabled all functions get instrumented, unless
the __no_sanitize_coverage attribute is used. To prepare for
__no_sanitize_coverage being applied to __init functions, we have to
handle differences in how GCC's inline optimizations get resolved. For
arm this exposed several places where __init annotations were missing
but ended up being "accidentally correct". Fix these cases and force
several functions to be inline with __always_inline.

Acked-by: Nishanth Menon <nm@ti.com>
Acked-by: Lee Jones <lee@kernel.org>
Reviewed-by: Nishanth Menon <nm@ti.com>
Link: https://lore.kernel.org/r/20250717232519.2984886-5-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>

mips: Handle KCOV __init vs inline mismatch

When KCOV is enabled all functions get instrumented, unless
the __no_sanitize_coverage attribute is used. To prepare for
__no_sanitize_coverage being applied to __init functions, we
have to handle differences in how GCC's inline optimizations get
resolved. For mips this requires adding the __init annotation on
init_mips_clocksource().

Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Link: https://lore.kernel.org/r/20250717232519.2984886-9-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>

powerpc/mm/book3s64: Move kfence and debug_pagealloc related calls to __init section

Move a few kfence and debug_pagealloc related functions in hash_utils.c
and radix_pgtable.c to __init sections since these are only invoked once
by an __init function during system initialization.

i.e.
- hash_debug_pagealloc_alloc_slots()
- hash_kfence_alloc_pool()
- hash_kfence_map_pool()
The above 3 functions only gets called by __init htab_initialize().

- alloc_kfence_pool()
- map_kfence_pool()
The above 2 functions only gets called by __init radix_init_pgtable()

This should also help fix warning msgs like:

>> WARNING: modpost: vmlinux: section mismatch in reference:
hash_debug_pagealloc_alloc_slots+0xb0 (section: .text) ->
memblock_alloc_try_nid (section: .init.text)

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202504190552.mnFGs5sj-lkp@intel.com/
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20250717232519.2984886-8-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>

configs/hardening: Enable CONFIG_INIT_ON_FREE_DEFAULT_ON

To reduce stale data lifetimes, enable CONFIG_INIT_ON_FREE_DEFAULT_ON as
well. This matches the addition of CONFIG_STACKLEAK=y, which is doing
similar for stack memory.

Link: https://lore.kernel.org/r/20250717232519.2984886-13-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>

configs/hardening: Enable CONFIG_KSTACK_ERASE

Since we can wipe the stack with both Clang and GCC plugins, enable this
for the "hardening.config" for wider testing.

Link: https://lore.kernel.org/r/20250717232519.2984886-12-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>

stackleak: Split KSTACK_ERASE_CFLAGS from GCC_PLUGINS_CFLAGS

In preparation for Clang stack depth tracking for KSTACK_ERASE,
split the stackleak-specific cflags out of GCC_PLUGINS_CFLAGS into
KSTACK_ERASE_CFLAGS.

Link: https://lore.kernel.org/r/20250717232519.2984886-3-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>

stackleak: Rename stackleak_track_stack to __sanitizer_cov_stack_depth

The Clang stack depth tracking implementation has a fixed name for
the stack depth tracking callback, "__sanitizer_cov_stack_depth", so
rename the GCC plugin function to match since the plugin has no external
dependencies on naming.

Link: https://lore.kernel.org/r/20250717232519.2984886-2-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>

stackleak: Rename STACKLEAK to KSTACK_ERASE

In preparation for adding Clang sanitizer coverage stack depth tracking
that can support stack depth callbacks:

- Add the new top-level CONFIG_KSTACK_ERASE option which will be
  implemented either with the stackleak GCC plugin, or with the Clang
  stack depth callback support.
- Rename CONFIG_GCC_PLUGIN_STACKLEAK as needed to CONFIG_KSTACK_ERASE,
  but keep it for anything specific to the GCC plugin itself.
- Rename all exposed "STACKLEAK" names and files to "KSTACK_ERASE" (named
  for what it does rather than what it protects against), but leave as
  many of the internals alone as possible to avoid even more churn.

While here, also split "prev_lowest_stack" into CONFIG_KSTACK_ERASE_METRICS,
since that's the only place it is referenced from.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250717232519.2984886-1-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>

scsi: libiscsi: Initialize iscsi_conn->dd_data only if memory is allocated

In case of an ib_fast_reg_mr allocation failure during iSER setup, the
machine hits a panic because iscsi_conn->dd_data is initialized
unconditionally, even when no memory is allocated (dd_size == 0).  This
leads invalid pointer dereference during connection teardown.

Fix by setting iscsi_conn->dd_data only if memory is actually allocated.

Panic trace:
------------
iser: iser_create_fastreg_desc: Failed to allocate ib_fast_reg_mr err=-12
iser: iser_alloc_rx_descriptors: failed allocating rx descriptors / data buffers
BUG: unable to handle page fault for address: fffffffffffffff8
RIP: 0010:swake_up_locked.part.5+0xa/0x40
Call Trace:
  complete+0x31/0x40
  iscsi_iser_conn_stop+0x88/0xb0 [ib_iser]
  iscsi_stop_conn+0x66/0xc0 [scsi_transport_iscsi]
  iscsi_if_stop_conn+0x14a/0x150 [scsi_transport_iscsi]
  iscsi_if_rx+0x1135/0x1834 [scsi_transport_iscsi]
  ? netlink_lookup+0x12f/0x1b0
  ? netlink_deliver_tap+0x2c/0x200
  netlink_unicast+0x1ab/0x280
  netlink_sendmsg+0x257/0x4f0
  ? _copy_from_user+0x29/0x60
  sock_sendmsg+0x5f/0x70

Signed-off-by: Showrya M N <showrya@chelsio.com>
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Link: https://lore.kernel.org/r/20250627112329.19763-1-showrya@chelsio.com
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

wifi: iwlwifi: mvm/fw: Avoid -Wflex-array-member-not-at-end warnings

-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.

We have a flexible struct iwl_tx_cmd_v6 in the middle of a few structs,
but those don't even need the flexible part.

So, we add iwl_tx_cmd_v6_params, that will contain everything except the
flexible array and use this one for the containing structs.

Also, as part of the refactoring remove unused flex array `payload`.

So, with these changes, fix the following warnings:

drivers/net/wireless/intel/iwlwifi/mld/../fw/api/tdls.h:134:27: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/wireless/intel/iwlwifi/mld/../fw/api/tdls.h:53:27: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/wireless/intel/iwlwifi/mld/../fw/api/tx.h:745:27: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/wireless/intel/iwlwifi/mld/../fw/api/tx.h:764:27: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/wireless/intel/iwlwifi/mvm/../fw/api/tdls.h:134:27: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/wireless/intel/iwlwifi/mvm/../fw/api/tdls.h:53:27: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/wireless/intel/iwlwifi/mvm/../fw/api/tx.h:745:27: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/wireless/intel/iwlwifi/mvm/../fw/api/tx.h:764:27: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://msgid.link/aCUOQ6wdD1jQjO36@kspp
[use iwl_tx_cmd_v6_params as described in the changed commit message]
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250709224608.0785a61b0826.I6da02c2a12a5ed1e6d317045a6995d132850a455@changeid
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>

wifi: iwlwifi: Fix typo "ransport"

There is a spelling mistake of 'ransport' in comments which
should be 'transport'.

Link: https://lore.kernel.org/all/03DFEDFFB5729C96+20250714104736.559226-1-wangyuli@uniontech.com/
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Link: https://patch.msgid.link/8F065DF7EF7EEB89+20250715055828.932160-1-wangyuli@uniontech.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>

wifi: iwlwifi: fix cmd length when sending WOWLAN_TSC_RSC_PARAM

In iwl_mvm_wowlan_config_rsc_tsc() when calling iwl_mvm_send_cmd_pdu()
we are accidentally passing the size of a pointer rather than the size
of the object pointed by it.

Fix the expression in order to pass the approriate object length.

Fixes: 493681d9f95b ("wifi: iwlwifi: remove support of version 4 of iwl_wowlan_rsc_tsc_params_cmd")
Address-Coverity-ID: 1647627 ("Incorrect expression (SIZEOF_MISMATCH)")
Signed-off-by: Antonio Quartulli <antonio@mandelbit.com>
Link: https://patch.msgid.link/20250716201911.700-1-antonio@mandelbit.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>

scsi: scsi_transport_fc: Add comments to describe added 'rport' parameter

Note that there is no executable code altered by this patch.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202507181446.aAoFiDm5-lkp@intel.com/
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Link: https://lore.kernel.org/r/20250721164652.335716-1-emilne@redhat.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

lib/crypto: tests: Annotate worker to be on stack

The following warning traceback is seen if object debugging is enabled
with the new crypto test code.

ODEBUG: object 9000000106237c50 is on stack 9000000106234000, but NOT annotated.
------------[ cut here ]------------
WARNING: lib/debugobjects.c:655 at lookup_object_or_alloc.part.0+0x19c/0x1f4, CPU#0: kunit_try_catch/468
...

This also results in a boot stall when running the code in qemu:loongarch.

Initializing the worker with INIT_WORK_ONSTACK() fixes the problem.

Fixes: 950a81224e8b ("lib/crypto: tests: Add hash-test-template.h and gen-hash-testvecs.py")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250721231917.3182029-1-linux@roeck-us.net
Signed-off-by: Eric Biggers <ebiggers@kernel.org>

Merge branch 'ethtool-rss-support-creating-and-removing-contexts-via-netlink'

Jakub Kicinski says:

====================
ethtool: rss: support creating and removing contexts via Netlink

This series completes support of RSS configuration via Netlink.
All functionality supported by the IOCTL is now supported by
Netlink. Future series (time allowing) will add:
- hashing on the flow label, which started this whole thing;
- pinning the RSS context to a Netlink socket for auto-cleanup.

The first patch is a leftover held back from previous series
to avoid conflicting with Gal's fix.

Next 4 patches refactor existing code to make reusing it for
context creation possible. 2 patches after that add create
and delete commands. Last but not least the test is extended.
====================

Link: https://patch.msgid.link/20250717234343.2328602-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: rss_api: context create and delete tests

Add test cases for creating and deleting contexts.

  TAP version 13
  1..12
  ok 1 rss_api.test_rxfh_nl_set_fail
  ok 2 rss_api.test_rxfh_nl_set_indir
  ok 3 rss_api.test_rxfh_nl_set_indir_ctx
  ok 4 rss_api.test_rxfh_indir_ntf
  ok 5 rss_api.test_rxfh_indir_ctx_ntf
  ok 6 rss_api.test_rxfh_nl_set_key
  ok 7 rss_api.test_rxfh_fields
  ok 8 rss_api.test_rxfh_fields_set
  ok 9 rss_api.test_rxfh_fields_set_xfrm # SKIP no input-xfrm supported
  ok 10 rss_api.test_rxfh_fields_ntf
  ok 11 rss_api.test_rss_ctx_add
  ok 12 rss_api.test_rss_ctx_ntf
  # Totals: pass:11 fail:0 xfail:0 xpass:0 skip:1 error:0

Link: https://patch.msgid.link/20250717234343.2328602-9-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ethtool: rss: support removing contexts via Netlink

Implement removing additional RSS contexts via Netlink.
Technically it'd be possible to shoehorn the delete operation
into ethnl_request_ops-compatible handler. The code ends
up longer than open coded version, and I think we'll need
a custom way of sending notifications at some stage (if we
allow tying the context lifetime to the netlink socket, in
the future).

Link: https://patch.msgid.link/20250717234343.2328602-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ethtool: rss: support creating contexts via Netlink

Support creating contexts via Netlink. Setting flow hashing
fields on the new context is not supported at this stage,
it can be added later.

An empty indirection table is not supported. This is a carry
over from the IOCTL interface where empty indirection table
meant delete. We can repurpose empty indirection table in
Netlink but for now to avoid confusion reject it using the
policy.

Support letting user choose the ID for the new context. This was
not possible in IOCTL since the context ID field for the create
action had to be set to the ETH_RXFH_CONTEXT_ALLOC magic value.

Link: https://patch.msgid.link/20250717234343.2328602-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ethtool: move ethtool_rxfh_ctx_alloc() to common code

Move ethtool_rxfh_ctx_alloc() to common code, Netlink will need it.

Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250717234343.2328602-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ethtool: rss: factor out populating response from context

Similarly to previous change, factor out populating the response.
We will use this after the context was allocated to send a notification
so this time factor out from the additional context handling, rather
than context 0 handling (for request context didn't exist, for response
it does).

Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250717234343.2328602-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ethtool: rss: factor out allocating memory for response

To ease the code reuse for RSS_CREATE we'll want to prepare
struct rss_reply_data for the new context. Unfortunately
we can't depend on the exiting scaffolding because the context
doesn't exist (ctx=NULL) when we start preparing. Factor out
the portion of the context 0 handling responsible for allocation
of request memory, so that we can call it directly.

Link: https://patch.msgid.link/20250717234343.2328602-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ethtool: rejig the RSS notification machinery for more types

In anticipation for CREATE and DELETE notifications - explicitly
pass the notification type to ethtool_rss_notify(), when calling
from the IOCTL code.

Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250717234343.2328602-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ethtool: assert that drivers with sym hash are consistent for RSS contexts

Supporting per-RSS context configuration of hashing fields but
not the hashing algorithm would complicate the code a lot.
We'd need to cross check the config against all RSS contexts.
None of the drivers need this today, so explicitly prevent
new drivers with such skewed capabilities from registering.
If such driver appears it will need to first adjust the checks
in the core.

Link: https://patch.msgid.link/20250717234343.2328602-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mptcp-add-tcp_maxseg-sockopt-support'

Matthieu Baerts says:

====================
mptcp: add TCP_MAXSEG sockopt support

The TCP_MAXSEG socket option was not supported by MPTCP, mainly because
it has never been requested before. But there are still valid use-cases,
e.g. with HAProxy.

- Patch 1 is a small cleanup patch in the MPTCP sockopt file.

- Patch 2 expose some code from TCP, to avoid duplicating it in MPTCP.

- Patch 3 adds TCP_MAXSEG sockopt support in MPTCP.

- Patch 4 is not related to the others, it fixes a typo in a comment.

Note that the new TCP_MAXSEG sockopt support has been validated by a new
packetdrill script on the MPTCP CI:

https://github.com/multipath-tcp/packetdrill/pull/161

v1: https://lore.kernel.org/20250716-net-next-mptcp-tcp_maxseg-v1-0-548d3a5666f6@kernel.org
====================

Link: https://patch.msgid.link/20250719-net-next-mptcp-tcp_maxseg-v2-0-8c910fbc5307@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: fix typo in a comment

This patch fixes the follow spelling mistake in a comment:

greter -> greater

Signed-off-by: moyuanhao <moyuanhao3676@163.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250719-net-next-mptcp-tcp_maxseg-v2-4-8c910fbc5307@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: add TCP_MAXSEG sockopt support

The TCP_MAXSEG socket option is currently not supported by MPTCP, mainly
because it has never been requested before. But there are still valid
use-cases, e.g. with HAProxy.

This patch adds its support in MPTCP by propagating the value to all
subflows. The get part looks at the value on the first subflow, to be as
closed as possible to TCP. Only one value can be returned for the cached
MSS, so this can come only from one subflow.

Similar to mptcp_setsockopt_first_sf_only(), a generic helper
mptcp_setsockopt_all_subflows() is added to set sockopt for each
subflows of the mptcp socket.

Add a new member for struct mptcp_sock to store the TCP_MAXSEG value,
and return this value in getsockopt.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/515
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250719-net-next-mptcp-tcp_maxseg-v2-3-8c910fbc5307@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: add tcp_sock_set_maxseg

Add a helper tcp_sock_set_maxseg() to directly set the TCP_MAXSEG
sockopt from kernel space.

This new helper will be used in the following patch from MPTCP.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250719-net-next-mptcp-tcp_maxseg-v2-2-8c910fbc5307@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: sockopt: drop redundant tcp_getsockopt

tcp_getsockopt() is called twice in mptcp_getsockopt_first_sf_only() in
different conditions, which makes the code a bit redundant.

The first call to tcp_getsockopt() when the first subflow exists can be
replaced by going to a new label "get" before the second call.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250719-net-next-mptcp-tcp_maxseg-v2-1-8c910fbc5307@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

s390/qeth: Make hw_trap sysfs attribute idempotent

Update qeth driver to allow writing an existing value to the "hw_trap"
sysfs attribute. Attempting such a write earlier resulted in -EINVAL.
In other words, make the sysfs attribute idempotent.

After:
    $ cat hw_trap
    disarm
    $ echo disarm > hw_trap
    $

Suggested-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Aswin Karuvally <aswin@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250718141711.1141049-1-wintera@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: qcom: qca807x: Enable WoL support using shared library

The Wake-on-LAN (WoL) functionality for the QCA807x series is identical
to that of the AT8031. WoL support for QCA807x is enabled by utilizing
the at8031_set_wol() function provided in the shared library.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250718-qca807x_wol_support-v1-1-cfe323cbb4e8@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: usb: smsc95xx: add support for ethtool pause parameters

Implement ethtool .get_pauseparam and .set_pauseparam handlers for
configuring flow control on smsc95xx. The driver now supports enabling
or disabling transmit and receive pause frames, with or without
autonegotiation. Pause settings are applied during link-up based on
current PHY state and user configuration.

Previously, the driver used phy_get_pause() during link-up handling,
but lacked initialization and an ethtool interface to configure pause
modes. As a result, flow control support was effectively non-functional.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250718075157.297923-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'clk-microchip-6.17' of https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux into clk-microchip

Microchip clock updates for v6.17

- Fix the PLL output ranges for Microchip SAM9X7, based on the
latest hardware documentation updates

* tag 'clk-microchip-6.17' of https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux:
clk: at91: sam9x7: update pll clk ranges

Merge tag 'thead-clk-for-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/fustini/linux into clk-thead

Pull T-HEAD TH1520 clk driver updates from Drew Fustini:

- Fix the parent data for osc_12m by referencing osc_24m by index.
- Mark essential bus clocks as CLK_IGNORE_UNUSED to fix boot hang
   associated with the PVT sensor.

* tag 'thead-clk-for-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/fustini/linux:
  clk: thead: th1520-ap: Correctly refer the parent of osc_12m
  clk: thead: Mark essential bus clocks as CLK_IGNORE_UNUSED

bpf: Use ERR_CAST instead of ERR_PTR(PTR_ERR(...))

Intel linux test robot reported a warning that ERR_CAST can be used
for error pointer casting instead of more-complicated/rarely-used
ERR_PTR(PTR_ERR(...)) style.

There is no functionality change, but still let us replace two such
instances as it improves consistency and readability.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202507201048.bceHy8zX-lkp@intel.com/
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://patch.msgid.link/20250720164754.3999140-1-yonghong.song@linux.dev

Merge tag 'v6.17-rockchip-clk1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into clk-rockchip

Pull a Rockchip clk driver update from Heiko Stuebner:

- 132MHz PLL rate for Rockchip rk3588

* tag 'v6.17-rockchip-clk1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
clk: rockchip: rk3568: Add PLL rate for 132MHz

Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2025-07-18 (idpf, ice, igc, igbvf, ixgbevf)

For idpf:
Ahmed and Sudheer add support for flow steering via ntuple filters.
Current support is for IPv4 and TCP/UDP only.

Milena adds support for cross timestamping.

Ahmed preserves coalesce settings across resets.

For ice:
Alex adds reporting of 40GbE speed in devlink port split.

Dawid adds support for E835 devices.

Jesse refactors profile ptype processing for cleaner, more readable,
code.

Dave adds a couple of helper functions for LAG to reduce code
duplication.

For igc:
Siang adds support to configure "Default Queue" during runtime using
ethtool's Network Flow Classification (NFC) wildcard rule approach.

For igbvf:
Yuto Ohnuki removes unused fields from igbvf_adapter.

For ixgbevf:
Yuto Ohnuki removes unused fields from ixgbevf_adapter.

* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ixgbevf: remove unused fields from struct ixgbevf_adapter
  igbvf: remove unused fields from struct igbvf_adapter
  igc: Add wildcard rule support to ethtool NFC using Default Queue
  igc: Relocate RSS field definitions to igc_defines.h
  ice: breakout common LAG code into helpers
  ice: convert ice_add_prof() to bitmap
  ice: add E835 device IDs
  ice: add 40G speed to Admin Command GET PORT OPTION
  idpf: preserve coalescing settings across resets
  idpf: add cross timestamping
  idpf: add flow steering support
  virtchnl2: add flow steering support
  virtchnl2: rename enum virtchnl2_cap_rss
====================

Link: https://patch.msgid.link/20250718185118.2042772-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

gve: Fix stuck TX queue for DQ queue format

gve_tx_timeout was calculating missed completions in a way that is only
relevant in the GQ queue format. Additionally, it was attempting to
disable device interrupts, which is not needed in either GQ or DQ queue
formats.

As a result, TX timeouts with the DQ queue format likely would have
triggered early resets without kicking the queue at all.

This patch drops the check for pending work altogether and always kicks
the queue after validating the queue has not seen a TX timeout too
recently.

Cc: stable@vger.kernel.org
Fixes: 87a7f321bb6a ("gve: Recover from queue stall due to missed IRQ")
Co-developed-by: Tim Hostetler <thostet@google.com>
Signed-off-by: Tim Hostetler <thostet@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250717192024.1820931-1-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: usb: cdc-ncm: check for filtering capability

If the decice does not support filtering, filtering
must not be used and all packets delivered for the
upper layers to sort.

Signed-off-by: Oliver Neukum <oneukum@suse.com>
Link: https://patch.msgid.link/20250717120649.2090929-1-oneukum@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: dwmac-renesas-gbeth: Add PM suspend/resume callbacks

Add PM suspend/resume callbacks for RZ/G3E SMARC EVK.

The PM deep entry is executed by pressing the SLEEP button and exit from
entry is by pressing the power button.

Logs:
root@smarc-rzg3e:~# PM: suspend entry (deep)
Filesystems sync: 0.115 seconds
Freezing user space processes
Freezing user space processes completed (elapsed 0.002 seconds)
OOM killer disabled.
Freezing remaining freezable tasks
Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
printk: Suspending console(s) (use no_console_suspend to debug)
NOTICE:  BL2: v2.10.5(release):2.10.5/rz_soc_dev-162-g7148ba838
NOTICE:  BL2: Built : 14:23:58, Jul  5 2025
NOTICE:  BL2: SYS_LSI_MODE: 0x13e06
NOTICE:  BL2: SYS_LSI_DEVID: 0x8679447
NOTICE:  BL2: SYS_LSI_PRR: 0x0
NOTICE:  BL2: Booting BL31
renesas-gbeth 15c30000.ethernet end0: Link is Down
Disabling non-boot CPUs ...
psci: CPU3 killed (polled 0 ms)
psci: CPU2 killed (polled 0 ms)
psci: CPU1 killed (polled 0 ms)
Enabling non-boot CPUs ...
Detected VIPT I-cache on CPU1
GICv3: CPU1: found redistributor 100 region 0:0x0000000014960000
CPU1: Booted secondary processor 0x0000000100 [0x412fd050]
CPU1 is up
Detected VIPT I-cache on CPU2
GICv3: CPU2: found redistributor 200 region 0:0x0000000014980000
CPU2: Booted secondary processor 0x0000000200 [0x412fd050]
CPU2 is up
Detected VIPT I-cache on CPU3
GICv3: CPU3: found redistributor 300 region 0:0x00000000149a0000
CPU3: Booted secondary processor 0x0000000300 [0x412fd050]
CPU3 is up
dwmac4: Master AXI performs fixed burst length
15c30000.ethernet end0: No Safety Features support found
15c30000.ethernet end0: IEEE 1588-2008 Advanced Timestamp supported
15c30000.ethernet end0: configuring for phy/rgmii-id link mode
dwmac4: Master AXI performs fixed burst length
15c40000.ethernet end1: No Safety Features support found
15c40000.ethernet end1: IEEE 1588-2008 Advanced Timestamp supported
15c40000.ethernet end1: configuring for phy/rgmii-id link mode
OOM killer enabled.
Restarting tasks: Starting
Restarting tasks: Done
random: crng reseeded on system resumption
PM: suspend exit

15c30000.ethernet end0: Link is Up - 1Gbps/Full - flow control rx/tx
root@smarc-rzg3e:~# ifconfig end0 192.168.10.7 up
root@smarc-rzg3e:~# ping 192.168.10.1
PING 192.168.10.1 (192.168.10.1) 56(84) bytes of data.
64 bytes from 192.168.10.1: icmp_seq=1 ttl=64 time=2.05 ms
64 bytes from 192.168.10.1: icmp_seq=2 ttl=64 time=0.928 ms

Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Link: https://patch.msgid.link/20250717071109.8213-1-biju.das.jz@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: appletalk: Fix use-after-free in AARP proxy probe

The AARP proxy‐probe routine (aarp_proxy_probe_network) sends a probe,
releases the aarp_lock, sleeps, then re-acquires the lock.  During that
window an expire timer thread (__aarp_expire_timer) can remove and
kfree() the same entry, leading to a use-after-free.

race condition:

         cpu 0                          |            cpu 1
    atalk_sendmsg()                     |   atif_proxy_probe_device()
    aarp_send_ddp()                     |   aarp_proxy_probe_network()
    mod_timer()                         |   lock(aarp_lock) // LOCK!!
    timeout around 200ms                |   alloc(aarp_entry)
    and then call                       |   proxies[hash] = aarp_entry
    aarp_expire_timeout()               |   aarp_send_probe()
                                        |   unlock(aarp_lock) // UNLOCK!!
    lock(aarp_lock) // LOCK!!           |   msleep(100);
    __aarp_expire_timer(&proxies[ct])   |
    free(aarp_entry)                    |
    unlock(aarp_lock) // UNLOCK!!       |
                                        |   lock(aarp_lock) // LOCK!!
                                        |   UAF aarp_entry !!

==================================================================
BUG: KASAN: slab-use-after-free in aarp_proxy_probe_network+0x560/0x630 net/appletalk/aarp.c:493
Read of size 4 at addr ffff8880123aa360 by task repro/13278

CPU: 3 UID: 0 PID: 13278 Comm: repro Not tainted 6.15.2 #3 PREEMPT(full)
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x116/0x1b0 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:408 [inline]
print_report+0xc1/0x630 mm/kasan/report.c:521
kasan_report+0xca/0x100 mm/kasan/report.c:634
aarp_proxy_probe_network+0x560/0x630 net/appletalk/aarp.c:493
atif_proxy_probe_device net/appletalk/ddp.c:332 [inline]
atif_ioctl+0xb58/0x16c0 net/appletalk/ddp.c:857
atalk_ioctl+0x198/0x2f0 net/appletalk/ddp.c:1818
sock_do_ioctl+0xdc/0x260 net/socket.c:1190
sock_ioctl+0x239/0x6a0 net/socket.c:1311
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:906 [inline]
__se_sys_ioctl fs/ioctl.c:892 [inline]
__x64_sys_ioctl+0x194/0x200 fs/ioctl.c:892
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xcb/0x250 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>

Allocated:
aarp_alloc net/appletalk/aarp.c:382 [inline]
aarp_proxy_probe_network+0xd8/0x630 net/appletalk/aarp.c:468
atif_proxy_probe_device net/appletalk/ddp.c:332 [inline]
atif_ioctl+0xb58/0x16c0 net/appletalk/ddp.c:857
atalk_ioctl+0x198/0x2f0 net/appletalk/ddp.c:1818

Freed:
kfree+0x148/0x4d0 mm/slub.c:4841
__aarp_expire net/appletalk/aarp.c:90 [inline]
__aarp_expire_timer net/appletalk/aarp.c:261 [inline]
aarp_expire_timeout+0x480/0x6e0 net/appletalk/aarp.c:317

The buggy address belongs to the object at ffff8880123aa300
which belongs to the cache kmalloc-192 of size 192
The buggy address is located 96 bytes inside of
freed 192-byte region [ffff8880123aa300, ffff8880123aa3c0)

Memory state around the buggy address:
ffff8880123aa200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff8880123aa280: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8880123aa300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                       ^
ffff8880123aa380: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
ffff8880123aa400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kito Xu (veritas501) <hxzene@gmail.com>
Link: https://patch.msgid.link/20250717012843.880423-1-hxzene@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bcmasp: Restore programming of TX map vector register

On ASP versions v2.x we need to program the TX map vector register to
properly exercise end-to-end flow control, otherwise the TX engine can
either lock-up, or cause the hardware calculated checksum to be
wrong/corrupted when multiple back to back packets are being submitted
for transmission. This register defaults to 0, which means no flow
control being applied.

Fixes: e9f31435ee7d ("net: bcmasp: Add support for asp-v3.0")
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250718212242.3447751-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'amd-xgbe-add-hardware-ptp-timestamping'

Raju Rangoju says:

====================
amd-xgbe: add hardware PTP timestamping

Remove the hwptp abstraction and associated callbacks from the
struct xgbe_hw_if {} and move them to separate file after cleanup.

Adds complete support for hardware-based PTP (IEEE 1588)
timestamping to the AMD XGBE driver.
====================

Link: https://patch.msgid.link/20250718185628.4038779-1-Raju.Rangoju@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

amd-xgbe: add hardware PTP timestamping support

Adds complete support for hardware-based PTP (IEEE 1588)
timestamping to the AMD XGBE driver.

- Initialize and configure the MAC PTP registers based on link
speed and reference clock.
- Support both 50MHz and 125MHz PTP reference clocks.
- Update the driver interface and version data to support PTP
clock frequency selection.

Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Link: https://patch.msgid.link/20250718185628.4038779-3-Raju.Rangoju@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

and-xgbe: remove the abstraction for hwptp

Remove the hwptp abstraction and associated callbacks from
the struct xgbe_hw_if {}.

The callback structure was only ever assigned a single function, without
null checks. This cleanup inlines the logic and moves all the hwtstamp
realted code a separate file, improving readability and maintainance.

Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Link: https://patch.msgid.link/20250718185628.4038779-2-Raju.Rangoju@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

btrfs: send: use fallocate for hole punching with send stream v2

Currently holes are sent as writes full of zeroes, which results in
unnecessarily using disk space at the receiving end and increasing the
stream size.

In some cases we avoid sending writes of zeroes, like during a full
send operation where we just skip writes for holes.

But for some cases we fill previous holes with writes of zeroes too, like
in this scenario:

1) We have a file with a hole in the range [2M, 3M), we snapshot the
   subvolume and do a full send. The range [2M, 3M) stays as a hole at
   the receiver since we skip sending write commands full of zeroes;

2) We punch a hole for the range [3M, 4M) in our file, so that now it
   has a 2M hole in the range [2M, 4M), and snapshot the subvolume.
   Now if we do an incremental send, we will send write commands full
   of zeroes for the range [2M, 4M), removing the hole for [2M, 3M) at
   the receiver.

We could improve cases such as this last one by doing additional
comparisons of file extent items (or their absence) between the parent
and send snapshots, but that's a lot of code to add plus additional CPU
and IO costs.

Since the send stream v2 already has a fallocate command and btrfs-progs
implements a callback to execute fallocate since the send stream v2
support was added to it, update the kernel to use fallocate for punching
holes for V2+ streams.

Test coverage is provided by btrfs/284 which is a version of btrfs/007
that exercises send stream v2 instead of v1, using fsstress with random
operations and fssum to verify file contents.

Link: https://github.com/kdave/btrfs-progs/issues/1001
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Merge branch 'selftests-mptcp-connect-cover-alt-modes'

Matthieu Baerts says:

====================
selftests: mptcp: connect: cover alt modes

mptcp_connect.sh can be executed manually with "-m <MODE>" and "-C" to
make sure everything works as expected when using "mmap" and "sendfile"
modes instead of "poll", and with the MPTCP checksum support.

These modes should be validated, but they are not when the selftests are
executed via the kselftest helpers. It means that most CIs validating
these selftests, like NIPA for the net development trees and LKFT for
the stable ones, are not covering these modes.

To fix that, new test programs have been added, simply calling
mptcp_connect.sh with the right parameters.

The first patch can be backported up to v5.6, and the second one up to
v5.14.

v1: https://lore.kernel.org/20250714-net-mptcp-sft-connect-alt-v1-0-bf1c5abbe575@kernel.org
====================

Link: https://patch.msgid.link/20250715-net-mptcp-sft-connect-alt-v2-0-8230ddd82454@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mptcp: connect: also cover checksum

The checksum mode has been added a while ago, but it is only validated
when manually launching mptcp_connect.sh with "-C".

The different CIs were then not validating these MPTCP Connect tests
with checksum enabled. To make sure they do, add a new test program
executing mptcp_connect.sh with the checksum mode.

Fixes: 94d66ba1d8e4 ("selftests: mptcp: enable checksum in mptcp_connect.sh")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250715-net-mptcp-sft-connect-alt-v2-2-8230ddd82454@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mptcp: connect: also cover alt modes

The "mmap" and "sendfile" alternate modes for mptcp_connect.sh/.c are
available from the beginning, but only tested when mptcp_connect.sh is
manually launched with "-m mmap" or "-m sendfile", not via the
kselftests helpers.

The MPTCP CI was manually running "mptcp_connect.sh -m mmap", but not
"-m sendfile". Plus other CIs, especially the ones validating the stable
releases, were not validating these alternate modes.

To make sure these modes are validated by these CIs, add two new test
programs executing mptcp_connect.sh with the alternate modes.

Fixes: 048d19d444be ("mptcp: add basic kselftest for mptcp")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250715-net-mptcp-sft-connect-alt-v2-1-8230ddd82454@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

btrfs: unfold transaction aborts when writing dirty block groups

We have a single transaction abort call that can be due to an error from
one of two calls to update_block_group_item(). Unfold the transaction
abort calls so that if they happen we know which update_block_group_item()
call failed.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: use saner variable type and name to indicate extrefs at add_inode_ref()

We are using a variable named 'log_ref_ver' of type int to indicate if we
are processing an extref item or not, using a value of 1 if so, otherwise
0. This is an odd name and type, so rename it to 'is_extref_item' and
change its type to bool.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: don't skip remaining extrefs if dir not found during log replay

During log replay, at add_inode_ref(), if we have an extref item that
contains multiple extrefs and one of them points to a directory that does
not exist in the subvolume tree, we are supposed to ignore it and process
the remaining extrefs encoded in the extref item, since each extref can
point to a different parent inode. However when that happens we just
return from the function and ignore the remaining extrefs.

The problem has been around since extrefs were introduced, in commit
f186373fef00 ("btrfs: extended inode refs"), but it's hard to hit in
practice because getting extref items encoding multiple extref requires
getting a hash collision when computing the offset of the extref's
key. The offset if computed like this:

key.offset = btrfs_extref_hash(dir_ino, name->name, name->len);

and btrfs_extref_hash() is just a wrapper around crc32c().

Fix this by moving to next iteration of the loop when we don't find
the parent directory that an extref points to.

Fixes: f186373fef00 ("btrfs: extended inode refs")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: don't ignore inode missing when replaying log tree

During log replay, at add_inode_ref(), we return -ENOENT if our current
inode isn't found on the subvolume tree or if a parent directory isn't
found. The error comes from btrfs_iget_logging() <- btrfs_iget() <-
btrfs_read_locked_inode().

The single caller of add_inode_ref(), replay_one_buffer(), ignores an
-ENOENT error because it expects that error to mean only that a parent
directory wasn't found and that is ok.

Before commit 5f61b961599a ("btrfs: fix inode lookup error handling during
log replay") we were converting any error when getting a parent directory
to -ENOENT and any error when getting the current inode to -EIO, so our
caller would fail log replay in case we can't find the current inode.
After that commit however in case the current inode is not found we return
-ENOENT to the caller and therefore it ignores the critical fact that the
current inode was not found in the subvolume tree.

Fix this by converting -ENOENT to 0 when we don't find a parent directory,
returning -ENOENT when we don't find the current inode and making the
caller, replay_one_buffer(), not ignore -ENOENT anymore.

Fixes: 5f61b961599a ("btrfs: fix inode lookup error handling during log replay")
CC: stable@vger.kernel.org # 6.16
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: enable large data folios for data reloc inode

For data reloc inodes, they are a special type of inodes that are not
exposed to user space, and are only utilized during data block groups
relocation.

They do not go under regular read-write operations, but have their file
extents manually created to have the same layout of a block group, then
its content is read from the original block group, and written back to
the new location which is in a new block group.

Previously all the handling was done in page units, and commit
c2832898126f ("btrfs: make relocate_one_page() handle subpage case")
changed the handling to subpage blocks.

On the other hand, data reloc inodes are a perfect match for large data
folios, as each relocation cluster represents one or more data extents
that are contiguous in their logical addresses.

This patch enables large folios for data reloc inodes by:

- Remove the special handling of data reloc inodes when setting folio
  order

- Change relocate_one_folio() to return the file offset of the next
  folio
  Originally it's designed to handle fixed page sized blocks, but with
  large folios, we can handle a large folio, thus we have to return the
  end of the current folio.

- Remove the warning on folio_order()

- Use folio_size() to replace fixed PAGE_SIZE usage

- Use file_offset as iterator inside relocate_file_extent_cluster

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: output more info when btrfs_subpage_assert() failed

The function btrfs_subpage_assert() is a very commonly utilized assert
to make sure the range passed in is correct inside the folio.

And when some code is not properly subpage/large folio compatible
btrfs_subpage_assert() will be the first to be triggered.
E.g. when I incorrectly enabled large folios for data reloc inodes, it
immediately triggered btrfs_subpage_assert().

In that case, outputting all the involved members will be very helpful,
this includes:

- start
- len
- folio position inside the mapping
- folio size

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: reloc: unconditionally invalidate the page cache for each cluster

Commit 9d9ea1e68a05 ("btrfs: subpage: fix relocation potentially
overwriting last page data") fixed a bug when relocating data block
groups for subpage cases.

However for the incoming large folios for data reloc inode, we can hit
the same situation where block size is the same as page size, but the
folio we got is still larger than a block.

In that case, the old subpage specific check is no longer reliable.

Here we have to enhance the handling by:

- Unconditionally invalidate the page cache for the current cluster
  We set the @flush to true so that any dirty folios are properly
  written back first.

  And this time instead of dropping the whole page cache, just drop the
  range covered by the current cluster.

  This will bring some minor performance drop, as for a large folio, the
  heading half will be read twice (read by previous cluster, then
  invalidated, then read again by the current cluster).

  However that is required to support large folios, and this gets rid of
  the kinda tricky manual uptodate flag clearing for each block.

- Remove the special handling of writing back the whole page cache
  filemap_invalidate_inode() handles the write back already, and since
  we're invalidating all pages in the range, we no longer need to
  manually clear the uptodate flags for involved blocks.

  Thus there is no need to manually write back the whole page cache.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: defrag: add flag to force no-compression

Currently the defrag ioctl cannot rewrite the extents without
compression. Add a new flag for that, as setting compression to 0 (or
"no compression") means to do no changes to compression so take what is
the current default, like mount options or properties.

The defrag setting overrides mount or properties. The compression
BTRFS_DEFRAG_DONT_COMPRESS is only used for in-memory operations and
does not need to have a fixed value.

Mount with zstd:9, copy test file from /usr/bin/ (about 260KB):

  $ mount -o compress=zstd:9 /dev/vda /mnt
  $ filefrag -vsb testfile
  filefrag: -b needs a blocksize option, assuming 1024-byte blocks.
  Filesystem type is: 9123683e
  File size of testfile is 297704 (292 blocks of 1024 bytes)
   ext:     logical_offset:        physical_offset: length:   expected: flags:
     0:        0..     127:      13312..     13439:    128:             encoded
     1:      128..     255:      13364..     13491:    128:      13440: encoded
     2:      256..     291:      13424..     13459:     36:      13492: last,encoded,eof
  testfile: 3 extents found

  $ compsize testfile
  Processed 1 file, 3 regular extents (3 refs), 0 inline, 1 fragments.
  Type       Perc     Disk Usage   Uncompressed Referenced
  TOTAL       42%      124K         292K         292K
  zstd        42%      124K         292K         292K

Defrag to uncompressed:

  $ btrfs fi defrag --nocomp testfile
  $ filefrag -vsb testfile
  filefrag: -b needs a blocksize option, assuming 1024-byte blocks.
  Filesystem type is: 9123683e
  File size of testfile is 297704 (292 blocks of 1024 bytes)
   ext:     logical_offset:        physical_offset: length:   expected: flags:
     0:        0..     291:     291840..    292131:    292:             last,eof
  testfile: 1 extent found

  $ compsize testfile
  Processed 1 file, 1 regular extents (1 refs), 0 inline, 1 fragments.
  Type       Perc     Disk Usage   Uncompressed Referenced
  TOTAL      100%      292K         292K         292K
  none       100%      292K         292K         292K

Compress again with LZO:

  $ btrfs fi defrag -clzo testfile
  $ filefrag -vsb testfile
  filefrag: -b needs a blocksize option, assuming 1024-byte blocks.
  Filesystem type is: 9123683e
  File size of testfile is 297704 (292 blocks of 1024 bytes)
   ext:     logical_offset:        physical_offset: length:   expected: flags:
     0:        0..     127:      13312..     13439:    128:             encoded
     1:      128..     255:      13392..     13519:    128:      13440: encoded
     2:      256..     291:      13480..     13515:     36:      13520: last,encoded,eof
  testfile: 3 extents found

  $ compsize testfile
  Processed 1 file, 3 regular extents (3 refs), 0 inline, 1 fragments.
  Type       Perc     Disk Usage   Uncompressed Referenced
  TOTAL       64%      188K         292K         292K
  lzo         64%      188K         292K         292K

Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: fix ssd_spread overallocation

If the ssd_spread mount option is enabled, then we run the so called
clustered allocator for data block groups. In practice, this results in
creating a btrfs_free_cluster which caches a block_group and borrows its
free extents for allocation.

Since the introduction of allocation size classes in 6.1, there has been
a bug in the interaction between that feature and ssd_spread.
find_free_extent() has a number of nested loops. The loop going over the
allocation stages, stored in ffe_ctl->loop and managed by
find_free_extent_update_loop(), the loop over the raid levels, and the
loop over all the block_groups in a space_info. The size class feature
relies on the block_group loop to ensure it gets a chance to see a
block_group of a given size class.  However, the clustered allocator
uses the cached cluster block_group and breaks that loop. Each call to
do_allocation() will really just go back to the same cached block_group.
Normally, this is OK, as the allocation either succeeds and we don't
want to loop any more or it fails, and we clear the cluster and return
its space to the block_group.

But with size classes, the allocation can succeed, then later fail,
outside of do_allocation() due to size class mismatch. That latter
failure is not properly handled due to the highly complex multi loop
logic. The result is a painful loop where we continue to allocate the
same num_bytes from the cluster in a tight loop until it fails and
releases the cluster and lets us try a new block_group. But by then, we
have skipped great swaths of the available block_groups and are likely
to fail to allocate, looping the outer loop. In pathological cases like
the reproducer below, the cached block_group is often the very last one,
in which case we don't perform this tight bg loop but instead rip
through the ffe stages to LOOP_CHUNK_ALLOC and allocate a chunk, which
is now the last one, and we enter the tight inner loop until an
allocation failure. Then allocation succeeds on the final block_group
and if the next allocation is a size mismatch, the exact same thing
happens again.

Triggering this is as easy as mounting with -o ssd_spread and then
running:

  mount -o ssd_spread $dev $mnt
  dd if=/dev/zero of=$mnt/big bs=16M count=1 &>/dev/null
  dd if=/dev/zero of=$mnt/med bs=4M count=1 &>/dev/null
  sync

if you do the two writes + sync in a loop, you can force btrfs to spin
an excessive amount on semi-successful clustered allocations, before
ultimately failing and advancing to the stage where we force a chunk
allocation. This results in 2G of data allocated per iteration, despite
only using ~20M of data. By using a small size classed extent, the inner
loop takes longer and we can spin for longer.

The simplest, shortest term fix to unbreak this is to make the clustered
allocator size_class aware in the dumbest way, where it fails on size
class mismatch. This may hinder the operation of the clustered
allocator, but better hindered than completely broken and terribly
overallocating.

Further re-design improvements are also in the works.

Fixes: 52bb7a2166af ("btrfs: introduce size class to block group allocator")
CC: stable@vger.kernel.org # 6.1+
Reported-by: David Sterba <dsterba@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>

selftests: tc: Add generic erspan_opts matching support for tc-flower

Add test cases to tc_flower.sh to validate generic matching on ERSPAN
options. Both ERSPAN Type II and Type III are covered.

Also add check_tc_erspan_support() to verify whether tc supports
erspan_opts.

Signed-off-by: Li Shuang <shuali@redhat.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/1f354a1afd60f29bbbf02bd60cb52ecfc0b6bd17.1752848172.git.shuali@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

btrfs: zoned: requeue to unused block group list if zone finish failed

btrfs_zone_finish() can fail for several reason. If it is -EAGAIN, we need
to try it again later. So, put the block group to the retry list properly.

Failing to do so will keep the removable block group intact until remount
and can causes unnecessary ENOSPC.

Fixes: 74e91b12b115 ("btrfs: zoned: zone finish unused block group")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: zoned: do not remove unwritten non-data block group

There are some reports of "unable to find chunk map for logical 2147483648
length 16384" error message appears in dmesg. This means some IOs are
occurring after a block group is removed.

When a metadata tree node is cleaned on a zoned setup, we keep that node
still dirty and write it out not to create a write hole. However, this can
make a block group's used bytes == 0 while there is a dirty region left.

Such an unused block group is moved into the unused_bg list and processed
for removal. When the removal succeeds, the block group is removed from the
transaction->dirty_bgs list, so the unused dirty nodes in the block group
are not sent at the transaction commit time. It will be written at some
later time e.g, sync or umount, and causes "unable to find chunk map"
errors.

This can happen relatively easy on SMR whose zone size is 256MB. However,
calling do_zone_finish() on such block group returns -EAGAIN and keep that
block group intact, which is why the issue is hidden until now.

Fixes: afba2bc036b0 ("btrfs: zoned: implement active zone tracking")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: remove btrfs_clear_extent_bits()

It's just a simple wrapper around btrfs_clear_extent_bit() that passes a
NULL for its last argument (a cached extent state record), plus there is
not counter part - we have a btrfs_set_extent_bit() but we do not have a
btrfs_set_extent_bits() (plural version). So just remove it and make all
callers use btrfs_clear_extent_bit() directly.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: use cached state when falling back from NOCoW write to CoW write

We have a cached extent state record from the previous extent locking so
we can use when setting the EXTENT_NORESERVE in the range, allowing the
operation to be faster if the extent io tree is relatively large.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: set EXTENT_NORESERVE before range unlock in btrfs_truncate_block()

Set the EXTENT_NORESERVE bit in the io tree before unlocking the range so
that we can use the cached state and speedup the operation, since the
unlock operation releases the cached state.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: don't print relocation messages from auto reclaim

When BTRFS is doing automatic block-group reclaim, it is spamming the
kernel log messages a lot.

Add a 'verbose' parameter to btrfs_relocate_chunk() and
btrfs_relocate_block_group() to control the verbosity of these log
message. This way the old behaviour of printing log messages on a
user-space initiated balance operation can be kept while excessive log
spamming due to auto reclaim is mitigated.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: remove redundant auto reclaim log message

Remove the log message before reclaiming a chunk in
btrfs_reclaim_bgs_work(). Especially with automatic block-group
reclaiming these messages spam the kernel log.

Note there is also a tracepoint for the same condition to ease debugging.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>