Dave Martin [Tue, 1 Jul 2025 13:55:59 +0000 (14:55 +0100)]
arm64: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
Instead of having the core code guess the note name for each regset,
use USER_REGSET_NOTE_TYPE() to pick the correct name from elf.h.
This does not affect the correctness of switch(note_type) and similar
code, since note type values known to Linux for coredump purposes were
already required to be unique.
Dave Martin [Tue, 1 Jul 2025 13:55:56 +0000 (14:55 +0100)]
binfmt_elf: Dump non-arch notes with strictly matching name and type
The note names for some arch-independent coredump notes are specified
manually, albeit by referring to the NN_<foo> #define corresponding
to the NT_<foo> #define that specifies the note type.
Now that there are no exceptional cases, refactor fill_note() to pick
the correct NN_ and NT_ macros implcitly for the requested note type.
Dave Martin [Tue, 1 Jul 2025 13:55:55 +0000 (14:55 +0100)]
regset: Add explicit core note name in struct user_regset
There is currently hard-coded logic spread around the tree for
determining the note name for regset notes emitted in coredumps.
Now that the names are declared explicitly in <uapi/elf.h>, this can be
simplified.
In preparation for getting rid of the special-case logic, add an
explicit core_note_name field in struct user_regset for specifying the
note name explicitly. To help avoid mistakes, a convenience macro
USER_REGSET_NOTE_TYPE() is provided to set .core_note_type and
.core_note_name based on the note type.
When dumping core, use the new field to set the note name, if the
regset specifies it.
Dave Martin [Tue, 1 Jul 2025 13:55:54 +0000 (14:55 +0100)]
regset: Fix kerneldoc for struct regset_get() in user_regset
Commit 7717cb9bdd04 ("regset: new method and helpers for it") added a
new interface ->regset_get() for struct user_regset, and commit 1e6986c9db21 ("regset: kill ->get()") got rid of the old interface.
The kerneldoc comment block was never updated to take account of this
change, though.
Breno Leitao [Fri, 21 Mar 2025 09:30:49 +0000 (02:30 -0700)]
lockdep: Speed up lockdep_unregister_key() with expedited RCU synchronization
lockdep_unregister_key() is called from critical code paths, including
sections where rtnl_lock() is held. For example, when replacing a qdisc
in a network device, network egress traffic is disabled while
__qdisc_destroy() is called for every network queue.
If lockdep is enabled, __qdisc_destroy() calls lockdep_unregister_key(),
which gets blocked waiting for synchronize_rcu() to complete.
For example, a simple tc command to replace a qdisc could take 13
seconds:
# time /usr/sbin/tc qdisc replace dev eth0 root handle 0x1: mq
real 0m13.195s
user 0m0.001s
sys 0m2.746s
During this time, network egress is completely frozen while waiting for
RCU synchronization.
Use synchronize_rcu_expedited() instead to minimize the impact on
critical operations like network connectivity changes.
This improves 10x the function call to tc, when replacing the qdisc for
a network card.
# time /usr/sbin/tc qdisc replace dev eth0 root handle 0x1: mq
real 0m1.789s
user 0m0.000s
sys 0m1.613s
[boqun: Fixed the comment and add more information for the temporary
workaround, and add TODO information for hazptr]
Reported-by: Erik Lundgren <elundgren@meta.com> Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Link: https://lore.kernel.org/r/20250321-lockdep-v1-1-78b732d195fb@debian.org
locking/lockdep: Change 'static const' variables to enum values
gcc warns about 'static const' variables even in headers when building
with -Wunused-const-variables enabled:
In file included from kernel/locking/lockdep_proc.c:25:
kernel/locking/lockdep_internals.h:69:28: error: 'LOCKF_USED_IN_IRQ_READ' defined but not used [-Werror=unused-const-variable=]
69 | static const unsigned long LOCKF_USED_IN_IRQ_READ =
| ^~~~~~~~~~~~~~~~~~~~~~
kernel/locking/lockdep_internals.h:63:28: error: 'LOCKF_ENABLED_IRQ_READ' defined but not used [-Werror=unused-const-variable=]
63 | static const unsigned long LOCKF_ENABLED_IRQ_READ =
| ^~~~~~~~~~~~~~~~~~~~~~
kernel/locking/lockdep_internals.h:57:28: error: 'LOCKF_USED_IN_IRQ' defined but not used [-Werror=unused-const-variable=]
57 | static const unsigned long LOCKF_USED_IN_IRQ =
| ^~~~~~~~~~~~~~~~~
kernel/locking/lockdep_internals.h:51:28: error: 'LOCKF_ENABLED_IRQ' defined but not used [-Werror=unused-const-variable=]
51 | static const unsigned long LOCKF_ENABLED_IRQ =
| ^~~~~~~~~~~~~~~~~
This one is easy to avoid by changing the generated constant definition
into an equivalent enum.
Arnd Bergmann [Tue, 10 Jun 2025 09:29:21 +0000 (11:29 +0200)]
locking/lockdep: Avoid struct return in lock_stats()
Returning a large structure from the lock_stats() function causes clang
to have multiple copies of it on the stack and copy between them, which
can end up exceeding the frame size warning limit:
dt-bindings: interrupt-controller: Convert apm,xgene1-msi to DT schema
Convert the Applied Micro X-Gene MSI controller binding to DT schema
format. MSI controllers go in interrupt-controller directory so move the
schema there.
Zhang Yi [Mon, 7 Jul 2025 14:08:14 +0000 (22:08 +0800)]
ext4: limit the maximum folio order
In environments with a page size of 64KB, the maximum size of a folio
can reach up to 128MB. Consequently, during the write-back of folios,
the 'rsv_blocks' will be overestimated to 1,577, which can make
pressure on the journal space where the journal is small. This can
easily exceed the limit of a single transaction. Besides, an excessively
large folio is meaningless and will instead increase the overhead of
traversing the bhs within the folio. Therefore, limit the maximum order
of a folio to 2048 filesystem blocks.
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reported-by: Joseph Qi <jiangqi903@gmail.com> Closes: https://lore.kernel.org/linux-ext4/CA+G9fYsyYQ3ZL4xaSg1-Tt5Evto7Zd+hgNWZEa9cQLbahA1+xg@mail.gmail.com/ Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Tested-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250707140814.542883-12-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
rust: cpumask: Replace `MaybeUninit` and `mem::zeroed` with `Opaque` APIs
Replace the following unsafe initializations:
1. `MaybeUninit::uninit().assume_init()` with `Opaque::uninit()`
2. `core::mem::zeroed()` with `Opaque::zeroed()`
As a minor clean-up to commit 1fc05c8d8426 ("gfs2: cancel timed-out
glock requests"), when a demote request is in progress in
finish_xmote(), there is no point in waking up the glock holder at the
head of the queue because the reply from dlm cannot be on behalf of that
glock holder.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Andrew Price <anprice@redhat.com>
As a follow-up to commit a431d49243a0 ("gfs2: Fix request cancelation
bug"), it turns out that any call to finish_xmote() is always followed
by a call to run_queue(), either
* directly when glock_work_func() calls finish_xmote() before calling
run_queue(), or
* indirectly when do_xmote() calls finish_xmote() before calling
gfs2_glock_queue_work(), which queues a call to glock_work_func() in
work queue context,
so remove the code in finish_xmote() that duplicates the functionality
of run_queue().
In addition, the code this commit removes is missing a check for the
GLF_DEMOTE flag which indicates that no further promotes should be
performed, so if that code didn't get removed, that check would have to
be added.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Andrew Price <anprice@redhat.com>
gfs2: sanitize the gdlm_ast -> finish_xmote interface
When gdlm_ast() is called with a non-zero status code, this means that
the requested operation did not succeed and the current lock state
didn't change. Turn that into a non-zero LM_OUT_* status code (with ret
& ~LM_OUT_ST_MASK != 0) instead of pretending that dlm returned the
current lock state.
That way, we can easily change finish_xmote() to only update
gl->gl_state when the state has actually changed.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Andrew Price <anprice@redhat.com>
Bitterblue Smith [Sun, 13 Jul 2025 19:27:32 +0000 (22:27 +0300)]
wifi: rtw88: Fix macid assigned to TDLS station
When working in station mode, TDLS peers are assigned macid 0, even
though 0 was already assigned to the AP. This causes the connection
with the AP to stop working after the TDLS connection is torn down.
Assign the next available macid to TDLS peers, same as client stations
in AP mode.
Fixes: 902cb7b11f9a ("wifi: rtw88: assign mac_id for vif/sta and update to TX desc") Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com> Acked-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Link: https://patch.msgid.link/58648c09-8553-4bcc-a977-9dc9afd63780@gmail.com
wifi: rtw88: enable TX reports for the management queue
This is needed for AP mode. Otherwise client sees the network, but
can't connect to it.
REG_FWHW_TXQ_CTRL+1 is set to WLAN_TXQ_RPT_EN (0x1F) in common mac
init function (__rtw8723x_mac_init), but the value was overwritten
from mac table later.
Tables with register values for phy parameters initialization are
copied from vendor driver usually. When table will be regenerated,
manual modifications to it may be lost. To avoid regressions in this
case new callback mac_postinit is introduced, that is called after
parameters from table are set.
Martin Kaistra [Wed, 9 Jul 2025 12:15:22 +0000 (14:15 +0200)]
wifi: rtl8xxxu: Fix RX skb size for aggregation disabled
Commit 1e5b3b3fe9e0 ("rtl8xxxu: Adjust RX skb size to include space for
phystats") increased the skb size when aggregation is enabled but decreased
it for the aggregation disabled case.
As a result, if a frame near the maximum size is received,
rtl8xxxu_rx_complete() is called with status -EOVERFLOW and then the
driver starts to malfunction and no further communication is possible.
Restore the skb size in the aggregation disabled case.
Fixes: 1e5b3b3fe9e0 ("rtl8xxxu: Adjust RX skb size to include space for phystats") Signed-off-by: Martin Kaistra <martin.kaistra@linutronix.de> Reviewed-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Link: https://patch.msgid.link/20250709121522.1992366-1-martin.kaistra@linutronix.de
Zong-Zhe Yang [Thu, 10 Jul 2025 04:24:23 +0000 (12:24 +0800)]
wifi: rtw89: 8852b: implement RFK multi-channel handling and support chanctx up to 2
To support multiple channels, 2 for this case, RFK requires to record each
calibration result for each channel in different RFK tables, and also needs
to notify FW. Then, when FW runs in MCC (multi-channel concurrency), it can
switch to the corresponding calibration result according to the channel.
Zong-Zhe Yang [Thu, 10 Jul 2025 04:24:21 +0000 (12:24 +0800)]
wifi: rtw89: 8852bt: implement RFK multi-channel handling and support chanctx up to 2
To support multiple channels, 2 for this case, RFK requires to record each
calibration result for each channel in different RFK tables, and also needs
to notify FW. Then, when FW runs in MCC (multi-channel concurrency), it can
switch to the corresponding calibration result according to the channel.
wifi: rtw89: tweak tx wake notify matching condition
8852BT needs to call TX wake notify once entering Leisure Power Save.
8852C needs to call TX wake notify after entering low power mode. Other
AX chips only MGMT packets needs to call TX wake after entering low power
mode. BE chips no need to call TX wake.
8852BT after FW 0.29.127, 8852B after FW 0.29.128 and 8922A after FW
0.35.79, the SER L2 flow determines different L2 types by parameter,
the zero value will trigger FW SER L2 assert.
Eric Dumazet [Fri, 11 Jul 2025 11:40:02 +0000 (11:40 +0000)]
tcp: call tcp_measure_rcv_mss() for ooo packets
tcp_measure_rcv_mss() is used to update icsk->icsk_ack.rcv_mss
(tcpi_rcv_mss in tcp_info) and tp->scaling_ratio.
Calling it from tcp_data_queue_ofo() makes sure these
fields are updated, and permits a better tuning
of sk->sk_rcvbuf, in the case a new flow receives many ooo
packets.
Fixes: dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250711114006.480026-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Zong-Zhe Yang [Thu, 10 Jul 2025 04:24:17 +0000 (12:24 +0800)]
wifi: rtw89: introduce fw feature group and redefine CRASH_TRIGGER
Some FW features may have variants on how to deal with in leaf functions,
e.g. H2C commands. However, from SW component point of view, it might not
matter which variant is supported exactly. In some cases, SW component may
just care whether any of the variants is supported or not.
So, introduce a concept of FW feature group which can manage a set of FW
features and can easily be checked if at least one of them is supported.
Since CRASH_TRIGGER will have variants and then matches the case mentioned
above, so redefine CRASH_TRIGGER as a FW feature group and add a variant
of type 0 for original handling.
Some uefi implementations will write the efistub logs to the display
over a splash image. This is not desirable for debug and info logs, so
lower the default efi log level to exclude them.
wifi: rtw89: check LPS H2C command complete by C2H reg instead of done ack
8852B after FW 0.29.127, 8852BT after FW 0.29.127 and 8922A after FW
0.35.76 driver check LPS H2C command received by FW using C2H reg instead
of done ack.
wifi: rtw89: mcc: solve GO's TBTT change and TBTT too close to NoA issue
For some implementation acting as GO under MCC(GO+STA), the GO's TBTT
might change after STA roams to another AP. This could result the new GO
beacon TX at the STA timeslot of GC+STA, causing GC beacon loss.
Therefore, if the GC detects beacon loss, it will pause MCC and remain
on the GO side for 100 TU to detect the new TBTT beacon.
Additionally, some implementation acting as GO under MCC might TX beacon
too close to the NoA period. The GC calculates timeslot pattern the TOB
(time offset behind) or TOA(time offset ahead) less than the minimum
RX beacon time, which leads to beacon loss. Therefore, disable the
beacon filter in this case. Then, if the GO's TBTT changed, the pattern
TOB/TOA greater than the minimum RX beacon time, the beacon filter should
be retriggered during MCC update.
Moreover, if the beacon filter is disabled initially but the GO timeslot
change, causing QoS null data detection fail, also pause MCC to detect new
TBTT beacon.
wifi: rtw89: extend HW scan of WiFi 7 chips for extra OP chan when concurrency
HW scan of WiFi 7 chips supports multiple op channel configurations.
When concurrency, fill two channels info to HW scan to avoid packet
lost.
However, the OP chan timing is arranged by FW, and the actual stay
period depends on AP. It's hard to calculate total scan time for once
NoA in advance. Therefore, change the scan back to GO OP channel every
200TU and TX beacon to ensure GC doesn't beacon loss. Additionally, add
a period NoA with large duration to ensure GC doesn't TX packet during
GO scanning and can still listen beacon at TBTT to update the new NoA
info after scan complete.
wifi: rtw89: mcc: when MCC stop forcing to stay at GO role
MCC stop might triggered by scan, and need to force to stay at GO role
to keep TX beacon. Also, AX chips need to TX more 3 beacons to ensure
GC can receive once NoA beacon before scan when GC in courtesy mode.
BE chips no needs to TX 3 more beacon because it can TX beacon every
200TU during scan, even GC in courtesy mode can receive beacon every
600TU.
wifi: rtw89: mcc: enlarge GO NoA duration to cover channel switching time
MCC require time to switch channel when changing timeslot. If GC TX
nulldata 0 while GO is switching channel, GO can't receive it. Therefore,
enlarge the GO NoA duration to cover the channel switching time.
However, the enlarged NoA duration might cause GC's timeslot less than
minimum of RX beacon time. Therefore, adjust strict and anchor pattern
condition to avoid it.
wifi: rtw89: add DIG suspend/resume flow when scan and connection
The PD lower bound set after one interface is connected, If second
interface needs to connect, packets might not be detected because the
PD lower bound is too high. Therefore, a DIG suspend/resume flow is
added to decrease the PD lower bound during scanning or connection,
and the original PD level is resumed afterward.
wifi: rtw89: mcc: add H2C command to support different PD level in MCC
Packet detection(PD) lower bound is the threshold for sensing packet,
and it is dynamically calculated based on RSSI. In MCC, the two
interfaces have different RSSI values, so it is necessary to set
different values to ensure packets can be received. Therefore, add
H2C command to let firmware to switch PD lower bound when MCC mode.
ld: drivers/net/ethernet/wangxun/libwx/wx_lib.o: in function `wx_clean_tx_irq':
wx_lib.c:(.text+0x5a68): undefined reference to `ptp_schedule_worker'
ld: drivers/net/ethernet/wangxun/libwx/wx_ethtool.o: in function `wx_nway_reset':
wx_ethtool.c:(.text+0x880): undefined reference to `phylink_ethtool_nway_reset'
Add the same dependency on PTP_1588_CLOCK_OPTIONAL to the two driver
using this library module, following the pattern from commit 8fa19c2c69fb ("net: wangxun: fix LIBWX dependencies").
Zong-Zhe Yang [Wed, 9 Jul 2025 06:50:06 +0000 (14:50 +0800)]
wifi: rtw89: regd/acpi: support 6 GHz VLP policy via ACPI DSM
Process ACPI DSM function 11 to get 6 GHz VLP support by country. If
not allowed, return error to block the connection. By default, i.e.
ACPI DSM function is not configured, disallow 6 GHz VLP on country US
and country CA, because some platform-level certifications are needed
in FCC regulation before operating on 6 GHz VLP connection.
Zong-Zhe Yang [Wed, 9 Jul 2025 06:50:05 +0000 (14:50 +0800)]
wifi: rtw89: regd/acpi: support regulatory rules via ACPI DSM and parse rule of regd_UK
ACPI DSM function 10 is defined for the enablement for Realtek regulatory
rules. The first rule is whether to allow regd_UK regulatory settings or
not. If not, the strict one, i.e. regd_ETSI, regulatory settings will be
used on country GB.
Zong-Zhe Yang [Wed, 9 Jul 2025 06:50:04 +0000 (14:50 +0800)]
wifi: rtw89: regd/acpi: update field definition to specific country in UNII-4 conf
Originally, fields of ACPI DSM function 6 were handled for countries
following specific regulatory.
BIT(0) for countries following FCC regulatory
BIT(1) for countries following IC regulatory
Now, update to the following (one field for one specific country).
BIT(0) for country US
BIT(1) for country CA
Zong-Zhe Yang [Wed, 9 Jul 2025 06:50:03 +0000 (14:50 +0800)]
wifi: rtw89: regd/acpi: support country CA by BIT(1) in 6 GHz SP conf
ACPI DSM function 7 is used to decide whether 6 GHz Standard Power
(SP) is allowed on given countries. Now, add BIT(1) for country CA.
Besides, for searching country index, replace for-loop with index
getter function.
jackysliu [Tue, 24 Jun 2025 11:58:24 +0000 (19:58 +0800)]
scsi: bfa: Double-free fix
When the bfad_im_probe() function fails during initialization, the memory
pointed to by bfad->im is freed without setting bfad->im to NULL.
Subsequently, during driver uninstallation, when the state machine enters
the bfad_sm_stopping state and calls the bfad_im_probe_undo() function,
it attempts to free the memory pointed to by bfad->im again, thereby
triggering a double-free vulnerability.
Add support to set NAPI threaded for individual NAPI
A net device has a threaded sysctl that can be used to enable threaded
NAPI polling on all of the NAPI contexts under that device. Allow
enabling threaded NAPI polling at individual NAPI level using netlink.
Extend the netlink operation `napi-set` and allow setting the threaded
attribute of a NAPI. This will enable the threaded polling on a NAPI
context.
Add a test in `nl_netdev.py` that verifies various cases of threaded
NAPI being set at NAPI and at device level.
Tested
./tools/testing/selftests/net/nl_netdev.py
TAP version 13
1..7
ok 1 nl_netdev.empty_check
ok 2 nl_netdev.lo_check
ok 3 nl_netdev.page_pool_check
ok 4 nl_netdev.napi_list_check
ok 5 nl_netdev.dev_set_threaded
ok 6 nl_netdev.napi_set_threaded
ok 7 nl_netdev.nsim_rxq_reset_down
# Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0
scsi: scsi_transport_fc: Change to use per-rport devloss_work_q
Configurations with large numbers of FC rports per host instance are
taking a very long time to complete all devloss work. Increase potential
parallelism by using a per-rport devloss_work_q for dev_loss_work and
fast_io_fail_work.
André Draszik [Mon, 7 Jul 2025 17:05:27 +0000 (18:05 +0100)]
scsi: ufs: exynos: Fix programming of HCI_UTRL_NEXUS_TYPE
On Google gs101, the number of UTP transfer request slots (nutrs) is 32,
and in this case the driver ends up programming the UTRL_NEXUS_TYPE
incorrectly as 0.
This is because the left hand side of the shift is 1, which is of type
int, i.e. 31 bits wide. Shifting by more than that width results in
undefined behaviour.
Fix this by switching to the BIT() macro, which applies correct type
casting as required. This ensures the correct value is written to
UTRL_NEXUS_TYPE (0xffffffff on gs101), and it also fixes a UBSAN shift
warning:
UBSAN: shift-out-of-bounds in drivers/ufs/host/ufs-exynos.c:1113:21
shift exponent 32 is too large for 32-bit type 'int'
For consistency, apply the same change to the nutmrs / UTMRL_NEXUS_TYPE
write.
Fixes: 55f4b1f73631 ("scsi: ufs: ufs-exynos: Add UFS host support for Exynos SoCs") Cc: stable@vger.kernel.org Signed-off-by: André Draszik <andre.draszik@linaro.org> Link: https://lore.kernel.org/r/20250707-ufs-exynos-shift-v1-1-1418e161ae40@linaro.org Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Peter Griffin <peter.griffin@linaro.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Sean Anderson [Thu, 10 Jul 2025 20:14:53 +0000 (16:14 -0400)]
net: phy: Don't register LEDs for genphy
If a PHY has no driver, the genphy driver is probed/removed directly in
phy_attach/detach. If the PHY's ofnode has an "leds" subnode, then the
LEDs will be (un)registered when probing/removing the genphy driver.
This could occur if the leds are for a non-generic driver that isn't
loaded for whatever reason. Synchronously removing the PHY device in
phy_detach leads to the following deadlock:
There is a corresponding deadlock on the open/register side of things
(and that one is reported by lockdep), but it requires a race while this
one is deterministic. Regular drivers do not have this problem since
they are probed asynchronously (without RTNL held).
Generic PHYs do not support LEDs anyway, so don't bother registering
them.
[JakubL this is a net-next version of
commit f0f2b992d818 ("net: phy: Don't register LEDs for genphy"),
which uses APIs removed in -next.]
Ranjan Kumar [Fri, 27 Jun 2025 19:45:38 +0000 (01:15 +0530)]
scsi: mpi3mr: Serialize admin queue BAR writes on 32-bit systems
On 32-bit systems, 64-bit BAR writes to admin queue registers are
performed as two 32-bit writes. Without locking, this can cause partial
writes when accessed concurrently.
Updated per-queue spinlocks is used to serialize these writes and prevent
race conditions.
Sean Anderson [Mon, 7 Jul 2025 19:58:03 +0000 (15:58 -0400)]
net: phy: Don't register LEDs for genphy
If a PHY has no driver, the genphy driver is probed/removed directly in
phy_attach/detach. If the PHY's ofnode has an "leds" subnode, then the
LEDs will be (un)registered when probing/removing the genphy driver.
This could occur if the leds are for a non-generic driver that isn't
loaded for whatever reason. Synchronously removing the PHY device in
phy_detach leads to the following deadlock:
There is a corresponding deadlock on the open/register side of things
(and that one is reported by lockdep), but it requires a race while this
one is deterministic.
Generic PHYs do not support LEDs anyway, so don't bother registering
them.
Ranjan Kumar [Fri, 27 Jun 2025 19:45:36 +0000 (01:15 +0530)]
scsi: mpi3mr: Fix race between config read submit and interrupt completion
The "is_waiting" flag was updated after calling complete(), which could
lead to a race where the waiting thread wakes up before the flag is
cleared. This may cause a missed wakeup or stale state check.
Reorder the operations to update "is_waiting" before signaling completion
to ensure consistent state.
Fixes: 824a156633df ("scsi: mpi3mr: Base driver code") Cc: stable@vger.kernel.org Co-developed-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com> Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com> Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://lore.kernel.org/r/20250627194539.48851-2-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add flow control mechanism between paired netdevsim devices to stop the
TX queue during high traffic scenarios. When a receive queue becomes
congested (approaching NSIM_RING_SIZE limit), the corresponding transmit
queue on the peer device is stopped using netif_subqueue_try_stop().
Once the receive queue has sufficient capacity again, the peer's
transmit queue is resumed with netif_tx_wake_queue().
Key changes:
* Add nsim_stop_peer_tx_queue() to pause peer TX when RX queue is full
* Add nsim_start_peer_tx_queue() to resume peer TX when RX queue drains
* Implement queue mapping validation to ensure TX/RX queue count match
* Wake all queues during device unlinking to prevent stuck queues
* Use RCU protection when accessing peer device references
* wake the queues when changing the queue numbers
* Remove IFF_NO_QUEUE given it will enqueue packets now
The flow control only activates when devices have matching TX/RX queue
counts to ensure proper queue mapping.
Oscar Maes [Thu, 10 Jul 2025 14:27:13 +0000 (16:27 +0200)]
net: ipv4: fix incorrect MTU in broadcast routes
Currently, __mkroute_output overrules the MTU value configured for
broadcast routes.
This buggy behaviour can be reproduced with:
ip link set dev eth1 mtu 9000
ip route del broadcast 192.168.0.255 dev eth1 proto kernel scope link src 192.168.0.2
ip route add broadcast 192.168.0.255 dev eth1 proto kernel scope link src 192.168.0.2 mtu 1500
The maximum packet size should be 1500, but it is actually 8000:
ping -b 192.168.0.255 -s 8000
Fix __mkroute_output to allow MTU values to be configured for
for broadcast routes (to support a mixed-MTU local-area-network).
Jakub Kicinski [Tue, 15 Jul 2025 00:26:57 +0000 (17:26 -0700)]
Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Tariq Toukan says:
====================
mlx5-next updates 2025-07-14
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: IFC updates for disabled host PF
net/mlx5: Expose disciplined_fr_counter through HCA capabilities in mlx5_ifc
RDMA/mlx5: Fix UMR modifying of mkey page size
net/mlx5: Expose HCA capability bits for mkey max page size
====================
The first patch is by Geert Uytterhoeven and converts the rcar_can
driver to DEFINE_SIMPLE_DEV_PM_OPS.
The last patch is by Biju Das and removes unused macros from the
rcar_canfd driver.
* tag 'linux-can-next-for-6.17-20250711' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
can: rcar_canfd: Drop unused macros
can: rcar_can: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
====================
Victor Nogueira [Sat, 12 Jul 2025 14:50:35 +0000 (11:50 -0300)]
selftests/tc-testing: Create test cases for adding qdiscs to invalid qdisc parents
As described in a previous commit [1], Lion's patch [2] revealed an ancient
bug in the qdisc API. Whenever a user tries to add a qdisc to an
invalid parent (not a class, root, or ingress qdisc), the qdisc API will
detect this after qdisc_create is called. Some qdiscs (like fq_codel, pie,
and sfq) call functions (on their init callback) which assume the parent is
valid, so qdisc_create itself may have caused a NULL pointer dereference in
such cases.
This commit creates 3 TDC tests that attempt to add fq_codel, pie and sfq
qdiscs to invalid parents
- Attempts to add an fq_codel qdisc to an hhf qdisc parent
- Attempts to add a pie qdisc to a drr qdisc parent
- Attempts to add an sfq qdisc to an inexistent hfsc classid (which would
belong to a valid hfsc qdisc)
net: thunderx: Fix format-truncation warning in bgx_acpi_match_id()
The buffer bgx_sel used in snprintf() was too small to safely hold
the formatted string "BGX%d" for all valid bgx_id values. This caused
a -Wformat-truncation warning with `Werror` enabled during build.
Increase the buffer size from 5 to 7 and use `sizeof(bgx_sel)` in
snprintf() to ensure safety and suppress the warning.
Build warning:
CC drivers/net/ethernet/cavium/thunder/thunder_bgx.o
drivers/net/ethernet/cavium/thunder/thunder_bgx.c: In function
‘bgx_acpi_match_id’:
drivers/net/ethernet/cavium/thunder/thunder_bgx.c:1434:27: error: ‘%d’
directive output may be truncated writing between 1 and 3 bytes into a
region of size 2 [-Werror=format-truncation=]
snprintf(bgx_sel, 5, "BGX%d", bgx->bgx_id);
^~
drivers/net/ethernet/cavium/thunder/thunder_bgx.c:1434:23: note:
directive argument in the range [0, 255]
snprintf(bgx_sel, 5, "BGX%d", bgx->bgx_id);
^~~~~~~
drivers/net/ethernet/cavium/thunder/thunder_bgx.c:1434:2: note:
‘snprintf’ output between 5 and 7 bytes into a destination of size 5
snprintf(bgx_sel, 5, "BGX%d", bgx->bgx_id);
compiler warning due to insufficient snprintf buffer size.
net: fec: add fec_set_hw_mac_addr() helper function
In the current driver, the MAC address is set in both fec_restart() and
fec_set_mac_address(), so a generic helper function fec_set_hw_mac_addr()
is added to set the hardware MAC address to make the code more compact.
There are also some RCR bits that are not defined but are used by the
driver, so add macro definitions for these bits to improve readability
and maintainability.
In addition, although FEC_RCR_HALFDPX has been defined, it is not used
in the driver. According to the description of FEC_RCR[1] in RM, it is
used to disable receive on transmit. Therefore, it is more appropriate
to redefine FEC_RCR[1] as FEC_RCR_DRT.
smc: Fix various oops due to inet_sock type confusion.
syzbot reported weird splats [0][1] in cipso_v4_sock_setattr() while
freeing inet_sk(sk)->inet_opt.
The address was freed multiple times even though it was read-only memory.
cipso_v4_sock_setattr() did nothing wrong, and the root cause was type
confusion.
The cited commit made it possible to create smc_sock as an INET socket.
The issue is that struct smc_sock does not have struct inet_sock as the
first member but hijacks AF_INET and AF_INET6 sk_family, which confuses
various places.
In this case, inet_sock.inet_opt was actually smc_sock.clcsk_data_ready(),
which is an address of a function in the text segment.
Biju Das [Fri, 11 Jul 2025 05:40:21 +0000 (06:40 +0100)]
net: phy: micrel: Add ksz9131_resume()
The Renesas RZ/G3E SMARC EVK uses KSZ9131RNXC phy. On deep power state,
PHY loses the power and on wakeup the rgmii delays are not reconfigured
causing it to fail.
Replace the callback kszphy_resume()->ksz9131_resume() for reconfiguring
the rgmii_delay when it exits from PM suspend state.
kernel test robot reports that xtensa complains about different
signedness for a min_not_zero() comparison. Cast the int part to size_t
to avoid this issue.
Fixes: e227c8cdb47b ("io_uring/net: use passed in 'len' in io_recv_buf_select()") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202507150504.zO5FsCPm-lkp@intel.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
Kai Huang [Sun, 13 Jul 2025 22:20:19 +0000 (10:20 +1200)]
KVM: x86: Reject KVM_SET_TSC_KHZ VM ioctl when vCPUs have been created
Reject the KVM_SET_TSC_KHZ VM ioctl when vCPUs have been created and
update the documentation to reflect it.
The VM scope KVM_SET_TSC_KHZ ioctl is used to set up the default TSC
frequency that all subsequently created vCPUs can use. It is only
intended to be called before any vCPU is created. Allowing it to be
called after that only results in confusion but nothing good.
Note this is an ABI change. But currently in Qemu (the de facto
userspace VMM) only TDX uses this VM ioctl, and it is only called once
before creating any vCPU, therefore the risk of breaking userspace is
pretty low.
This change is necessary to support the internal clock gating mechanism
in Qualcomm UFS host controller. This is power saving feature and hence
driver can continue to function correctly despite any error in enabling
these feature.
Bao D. Nguyen [Mon, 14 Jul 2025 07:53:34 +0000 (13:23 +0530)]
scsi: ufs: ufs-qcom: Update esi_vec_mask for HW major version >= 6
The MCQ feature and ESI are supported by all Qualcomm UFS controller
versions 6 and above.
Therefore, update the ESI vector mask in the UFS_MEM_CFG3 register for
platforms with major version number of 6 or higher.
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bao D. Nguyen <quic_nguyenb@quicinc.com> Signed-off-by: Nitin Rawat <quic_nitirawa@quicinc.com> Link: https://lore.kernel.org/r/20250714075336.2133-2-quic_nitirawa@quicinc.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
pNFS: Fix disk addr range check in block/scsi layout
At the end of the isect translation, disc_addr represents the physical
disk offset. Thus, end calculated from disk_addr is also a physical disk
offset. Therefore, range checking should be done using map->disk_offset,
not map->start.
Because of integer division, we need to carefully calculate the
disk offset. Consider the example below for a stripe of 6 volumes,
a chunk size of 4096, and an offset of 70000.
Sergey Bashirov [Mon, 30 Jun 2025 18:35:29 +0000 (21:35 +0300)]
pNFS: Handle RPC size limit for layoutcommits
When there are too many block extents for a layoutcommit, they may not
all fit into the maximum-sized RPC. This patch allows the generic pnfs
code to properly handle -ENOSPC returned by the block/scsi layout driver
and trigger additional layoutcommits if necessary.
Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250630183537.196479-5-sergeybashirov@gmail.com Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Sergey Bashirov [Mon, 30 Jun 2025 18:35:27 +0000 (21:35 +0300)]
pNFS: Fix extent encoding in block/scsi layout
The ext_tree_encode_commit() function may be called multiple times for
the same file, layout, and last written byte if the provided buffer is
not large enough to encode all extents in it.
The first problem is that the last written byte field must be zeroed
only on a successful call, otherwise we will lose its actual value and
get an integer overflow on the next encoding attempt.
The second problem is that we can't count and encode in one pass. The
extent state changes during encoding, so if we return -ENOSPC but have
already encoded some extents into a small buffer, they will not be
re-encoded into a new larger buffer on the next try. As a result, the
client never commits these extents to the server.
Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250630183537.196479-3-sergeybashirov@gmail.com Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Sergey Bashirov [Mon, 30 Jun 2025 18:35:26 +0000 (21:35 +0300)]
pNFS: Fix uninited ptr deref in block/scsi layout
The error occurs on the third attempt to encode extents. When function
ext_tree_prepare_commit() reallocates a larger buffer to retry encoding
extents, the "layoutupdate_pages" page array is initialized only after the
retry loop. But ext_tree_free_commitdata() is called on every iteration
and tries to put pages in the array, thus dereferencing uninitialized
pointers.
An additional problem is that there is no limit on the maximum possible
buffer_size. When there are too many extents, the client may create a
layoutcommit that is larger than the maximum possible RPC size accepted
by the server.
During testing, we observed two typical scenarios. First, one memory page
for extents is enough when we work with small files, append data to the
end of the file, or preallocate extents before writing. But when we fill
a new large file without preallocating, the number of extents can be huge,
and counting the number of written extents in ext_tree_encode_commit()
does not help much. Since this number increases even more between
unlocking and locking of ext_tree, the reallocated buffer may not be
large enough again and again.
Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250630183537.196479-2-sergeybashirov@gmail.com Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>