git.ipfire.org Git - thirdparty/kernel/stable.git/log

erofs: fix file-backed mounts no longer working on EROFS partitions

[ Upstream commit 7893cc12251f6f19e7689a4cf3ba803bddbd8437 ]

Sheng Yong reported [1] that Android APEX images didn't work with commit
072a7c7cdbea ("erofs: don't bother with s_stack_depth increasing for
now") because "EROFS-formatted APEX file images can be stored within an
EROFS-formatted Android system partition."

In response, I sent a quick fat-fingered [PATCH v3] to address the
report.  Unfortunately, the updated condition was incorrect:

         if (erofs_is_fileio_mode(sbi)) {
-            sb->s_stack_depth =
-                file_inode(sbi->dif0.file)->i_sb->s_stack_depth + 1;
-            if (sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) {
-                erofs_err(sb, "maximum fs stacking depth exceeded");
+            inode = file_inode(sbi->dif0.file);
+            if ((inode->i_sb->s_op == &erofs_sops && !sb->s_bdev) ||
+                inode->i_sb->s_stack_depth) {

The condition `!sb->s_bdev` is always true for all file-backed EROFS
mounts, making the check effectively a no-op.

The real fix tested and confirmed by Sheng Yong [2] at that time was
[PATCH v3 RESEND], which correctly ensures the following EROFS^2 setup
works:
    EROFS (on a block device) + EROFS (file-backed mount)

But sadly I screwed it up again by upstreaming the outdated [PATCH v3].

This patch applies the same logic as the delta between the upstream
[PATCH v3] and the real fix [PATCH v3 RESEND].

Reported-by: Sheng Yong <shengyong1@xiaomi.com>
Closes: https://lore.kernel.org/r/3acec686-4020-4609-aee4-5dae7b9b0093@gmail.com [1]
Fixes: 072a7c7cdbea ("erofs: don't bother with s_stack_depth increasing for now")
Link: https://lore.kernel.org/r/243f57b8-246f-47e7-9fb1-27a771e8e9e8@gmail.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

erofs: don't bother with s_stack_depth increasing for now

[ Upstream commit 072a7c7cdbea4f91df854ee2bb216256cd619f2a ]

Previously, commit d53cd891f0e4 ("erofs: limit the level of fs stacking
for file-backed mounts") bumped `s_stack_depth` by one to avoid kernel
stack overflow when stacking an unlimited number of EROFS on top of
each other.

This fix breaks composefs mounts, which need EROFS+ovl^2 sometimes
(and such setups are already used in production for quite a long time).

One way to fix this regression is to bump FILESYSTEM_MAX_STACK_DEPTH
from 2 to 3, but proving that this is safe in general is a high bar.

After a long discussion on GitHub issues [1] about possible solutions,
one conclusion is that there is no need to support nesting file-backed
EROFS mounts on stacked filesystems, because there is always the option
to use loopback devices as a fallback.

As a quick fix for the composefs regression for this cycle, instead of
bumping `s_stack_depth` for file backed EROFS mounts, we disallow
nesting file-backed EROFS over EROFS and over filesystems with
`s_stack_depth` > 0.

This works for all known file-backed mount use cases (composefs,
containerd, and Android APEX for some Android vendors), and the fix is
self-contained.

Essentially, we are allowing one extra unaccounted fs stacking level of
EROFS below stacking filesystems, but EROFS can only be used in the read
path (i.e. overlayfs lower layers), which typically has much lower stack
usage than the write path.

We can consider increasing FILESYSTEM_MAX_STACK_DEPTH later, after more
stack usage analysis or using alternative approaches, such as splitting
the `s_stack_depth` limitation according to different combinations of
stacking.

Fixes: d53cd891f0e4 ("erofs: limit the level of fs stacking for file-backed mounts")
Reported-and-tested-by: Dusty Mabe <dusty@dustymabe.com>
Reported-by: Timothée Ravier <tim@siosm.fr>
Closes: https://github.com/coreos/fedora-coreos-tracker/issues/2087 [1]
Reported-by: "Alekséi Naidénov" <an@digitaltide.io>
Closes: https://lore.kernel.org/r/CAFHtUiYv4+=+JP_-JjARWjo6OwcvBj1wtYN=z0QXwCpec9sXtg@mail.gmail.com
Acked-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Alexander Larsson <alexl@redhat.com>
Reviewed-and-tested-by: Sheng Yong <shengyong1@xiaomi.com>
Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

irqchip/gic-v5: Fix gicv5_its_map_event() ITTE read endianness

[ Upstream commit 1690eeb0cb2bb77096cb6c826b6849ef05013e34 ]

Kbuild bot (through sparse) reported that the ITTE read to carry out
a valid check in gicv5_its_map_event() lacks proper endianness handling.

Add the missing endianess conversion.

Fixes: 57d72196dfc8 ("irqchip/gic-v5: Add GICv5 ITS support")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://patch.msgid.link/20251222102250.435460-1-lpieralisi@kernel.org
Closes: https://lore.kernel.org/oe-kbuild-all/202512131849.30ZRTBeR-lkp@intel.com/
Signed-off-by: Sasha Levin <sashal@kernel.org>

ublk: fix use-after-free in ublk_partition_scan_work

[ Upstream commit f0d385f6689f37a2828c686fb279121df006b4cb ]

A race condition exists between the async partition scan work and device
teardown that can lead to a use-after-free of ub->ub_disk:

1. ublk_ctrl_start_dev() schedules partition_scan_work after add_disk()
2. ublk_stop_dev() calls ublk_stop_dev_unlocked() which does:
   - del_gendisk(ub->ub_disk)
   - ublk_detach_disk() sets ub->ub_disk = NULL
   - put_disk() which may free the disk
3. The worker ublk_partition_scan_work() then dereferences ub->ub_disk
   leading to UAF

Fix this by using ublk_get_disk()/ublk_put_disk() in the worker to hold
a reference to the disk during the partition scan. The spinlock in
ublk_get_disk() synchronizes with ublk_detach_disk() ensuring the worker
either gets a valid reference or sees NULL and exits early.

Also change flush_work() to cancel_work_sync() to avoid running the
partition scan work unnecessarily when the disk is already detached.

Fixes: 7fc4da6a304b ("ublk: scan partition in async way")
Reported-by: Ruikai Peng <ruikai@pwno.io>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>

arp: do not assume dev_hard_header() does not change skb->head

[ Upstream commit c92510f5e3f82ba11c95991824a41e59a9c5ed81 ]

arp_create() is the only dev_hard_header() caller
making assumption about skb->head being unchanged.

A recent commit broke this assumption.

Initialize @arp pointer after dev_hard_header() call.

Fixes: db5b4e39c4e6 ("ip6_gre: make ip6gre_header() robust")
Reported-by: syzbot+58b44a770a1585795351@syzkaller.appspotmail.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260107212250.384552-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

net: enetc: fix build warning when PAGE_SIZE is greater than 128K

[ Upstream commit 4b5bdabb5449b652122e43f507f73789041d4abe ]

The max buffer size of ENETC RX BD is 0xFFFF bytes, so if the PAGE_SIZE
is greater than 128K, ENETC_RXB_DMA_SIZE and ENETC_RXB_DMA_SIZE_XDP will
be greater than 0xFFFF, thus causing a build warning.

This will not cause any practical issues because ENETC is currently only
used on the ARM64 platform, and the max PAGE_SIZE is 64K. So this patch
is only for fixing the build warning that occurs when compiling ENETC
drivers for other platforms.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202601050637.kHEKKOG7-lkp@intel.com/
Fixes: e59bc32df2e9 ("net: enetc: correct the value of ENETC_RXB_TRUESIZE")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260107091204.1980222-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

net: usb: pegasus: fix memory leak in update_eth_regs_async()

[ Upstream commit afa27621a28af317523e0836dad430bec551eb54 ]

When asynchronously writing to the device registers and if usb_submit_urb()
fail, the code fail to release allocated to this point resources.

Fixes: 323b34963d11 ("drivers: net: usb: pegasus: fix control urb submission")
Signed-off-by: Petko Manolov <petkan@nucleusys.com>
Link: https://patch.msgid.link/20260106084821.3746677-1-petko.manolov@konsulko.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

net/sched: sch_qfq: Fix NULL deref when deactivating inactive aggregate in qfq_reset

[ Upstream commit c1d73b1480235731e35c81df70b08f4714a7d095 ]

`qfq_class->leaf_qdisc->q.qlen > 0` does not imply that the class
itself is active.

Two qfq_class objects may point to the same leaf_qdisc. This happens
when:

1. one QFQ qdisc is attached to the dev as the root qdisc, and

2. another QFQ qdisc is temporarily referenced (e.g., via qdisc_get()
/ qdisc_put()) and is pending to be destroyed, as in function
tc_new_tfilter.

When packets are enqueued through the root QFQ qdisc, the shared
leaf_qdisc->q.qlen increases. At the same time, the second QFQ
qdisc triggers qdisc_put and qdisc_destroy: the qdisc enters
qfq_reset() with its own q->q.qlen == 0, but its class's leaf
qdisc->q.qlen > 0. Therefore, the qfq_reset would wrongly deactivate
an inactive aggregate and trigger a null-deref in qfq_deactivate_agg:

[    0.903172] BUG: kernel NULL pointer dereference, address: 0000000000000000
[    0.903571] #PF: supervisor write access in kernel mode
[    0.903860] #PF: error_code(0x0002) - not-present page
[    0.904177] PGD 10299b067 P4D 10299b067 PUD 10299c067 PMD 0
[    0.904502] Oops: Oops: 0002 [#1] SMP NOPTI
[    0.904737] CPU: 0 UID: 0 PID: 135 Comm: exploit Not tainted 6.19.0-rc3+ #2 NONE
[    0.905157] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
[    0.905754] RIP: 0010:qfq_deactivate_agg (include/linux/list.h:992 (discriminator 2) include/linux/list.h:1006 (discriminator 2) net/sched/sch_qfq.c:1367 (discriminator 2) net/sched/sch_qfq.c:1393 (discriminator 2))
[    0.906046] Code: 0f 84 4d 01 00 00 48 89 70 18 8b 4b 10 48 c7 c2 ff ff ff ff 48 8b 78 08 48 d3 e2 48 21 f2 48 2b 13 48 8b 30 48 d3 ea 8b 4b 18 0

Code starting with the faulting instruction
===========================================
   0: 0f 84 4d 01 00 00     je     0x153
   6: 48 89 70 18           mov    %rsi,0x18(%rax)
   a: 8b 4b 10              mov    0x10(%rbx),%ecx
   d: 48 c7 c2 ff ff ff ff mov    $0xffffffffffffffff,%rdx
  14: 48 8b 78 08           mov    0x8(%rax),%rdi
  18: 48 d3 e2              shl    %cl,%rdx
  1b: 48 21 f2              and    %rsi,%rdx
  1e: 48 2b 13              sub    (%rbx),%rdx
  21: 48 8b 30              mov    (%rax),%rsi
  24: 48 d3 ea              shr    %cl,%rdx
  27: 8b 4b 18              mov    0x18(%rbx),%ecx
...
[    0.907095] RSP: 0018:ffffc900004a39a0 EFLAGS: 00010246
[    0.907368] RAX: ffff8881043a0880 RBX: ffff888102953340 RCX: 0000000000000000
[    0.907723] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[    0.908100] RBP: ffff888102952180 R08: 0000000000000000 R09: 0000000000000000
[    0.908451] R10: ffff8881043a0000 R11: 0000000000000000 R12: ffff888102952000
[    0.908804] R13: ffff888102952180 R14: ffff8881043a0ad8 R15: ffff8881043a0880
[    0.909179] FS:  000000002a1a0380(0000) GS:ffff888196d8d000(0000) knlGS:0000000000000000
[    0.909572] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.909857] CR2: 0000000000000000 CR3: 0000000102993002 CR4: 0000000000772ef0
[    0.910247] PKRU: 55555554
[    0.910391] Call Trace:
[    0.910527]  <TASK>
[    0.910638]  qfq_reset_qdisc (net/sched/sch_qfq.c:357 net/sched/sch_qfq.c:1485)
[    0.910826]  qdisc_reset (include/linux/skbuff.h:2195 include/linux/skbuff.h:2501 include/linux/skbuff.h:3424 include/linux/skbuff.h:3430 net/sched/sch_generic.c:1036)
[    0.911040]  __qdisc_destroy (net/sched/sch_generic.c:1076)
[    0.911236]  tc_new_tfilter (net/sched/cls_api.c:2447)
[    0.911447]  rtnetlink_rcv_msg (net/core/rtnetlink.c:6958)
[    0.911663]  ? __pfx_rtnetlink_rcv_msg (net/core/rtnetlink.c:6861)
[    0.911894]  netlink_rcv_skb (net/netlink/af_netlink.c:2550)
[    0.912100]  netlink_unicast (net/netlink/af_netlink.c:1319 net/netlink/af_netlink.c:1344)
[    0.912296]  ? __alloc_skb (net/core/skbuff.c:706)
[    0.912484]  netlink_sendmsg (net/netlink/af_netlink.c:1894)
[    0.912682]  sock_write_iter (net/socket.c:727 (discriminator 1) net/socket.c:742 (discriminator 1) net/socket.c:1195 (discriminator 1))
[    0.912880]  vfs_write (fs/read_write.c:593 fs/read_write.c:686)
[    0.913077]  ksys_write (fs/read_write.c:738)
[    0.913252]  do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1))
[    0.913438]  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131)
[    0.913687] RIP: 0033:0x424c34
[    0.913844] Code: 89 02 48 c7 c0 ff ff ff ff eb bd 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d 2d 44 09 00 00 74 13 b8 01 00 00 00 0f 05 9

Code starting with the faulting instruction
===========================================
   0: 89 02                 mov    %eax,(%rdx)
   2: 48 c7 c0 ff ff ff ff mov    $0xffffffffffffffff,%rax
   9: eb bd                 jmp    0xffffffffffffffc8
   b: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
  12: 00 00 00
  15: 90                    nop
  16: f3 0f 1e fa           endbr64
  1a: 80 3d 2d 44 09 00 00 cmpb   $0x0,0x9442d(%rip)        # 0x9444e
  21: 74 13                 je     0x36
  23: b8 01 00 00 00        mov    $0x1,%eax
  28: 0f 05                 syscall
  2a: 09                    .byte 0x9
[    0.914807] RSP: 002b:00007ffea1938b78 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[    0.915197] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 0000000000424c34
[    0.915556] RDX: 000000000000003c RSI: 000000002af378c0 RDI: 0000000000000003
[    0.915912] RBP: 00007ffea1938bc0 R08: 00000000004b8820 R09: 0000000000000000
[    0.916297] R10: 0000000000000001 R11: 0000000000000202 R12: 00007ffea1938d28
[    0.916652] R13: 00007ffea1938d38 R14: 00000000004b3828 R15: 0000000000000001
[    0.917039]  </TASK>
[    0.917158] Modules linked in:
[    0.917316] CR2: 0000000000000000
[    0.917484] ---[ end trace 0000000000000000 ]---
[    0.917717] RIP: 0010:qfq_deactivate_agg (include/linux/list.h:992 (discriminator 2) include/linux/list.h:1006 (discriminator 2) net/sched/sch_qfq.c:1367 (discriminator 2) net/sched/sch_qfq.c:1393 (discriminator 2))
[    0.917978] Code: 0f 84 4d 01 00 00 48 89 70 18 8b 4b 10 48 c7 c2 ff ff ff ff 48 8b 78 08 48 d3 e2 48 21 f2 48 2b 13 48 8b 30 48 d3 ea 8b 4b 18 0

Code starting with the faulting instruction
===========================================
   0: 0f 84 4d 01 00 00     je     0x153
   6: 48 89 70 18           mov    %rsi,0x18(%rax)
   a: 8b 4b 10              mov    0x10(%rbx),%ecx
   d: 48 c7 c2 ff ff ff ff mov    $0xffffffffffffffff,%rdx
  14: 48 8b 78 08           mov    0x8(%rax),%rdi
  18: 48 d3 e2              shl    %cl,%rdx
  1b: 48 21 f2              and    %rsi,%rdx
  1e: 48 2b 13              sub    (%rbx),%rdx
  21: 48 8b 30              mov    (%rax),%rsi
  24: 48 d3 ea              shr    %cl,%rdx
  27: 8b 4b 18              mov    0x18(%rbx),%ecx
...
[    0.918902] RSP: 0018:ffffc900004a39a0 EFLAGS: 00010246
[    0.919198] RAX: ffff8881043a0880 RBX: ffff888102953340 RCX: 0000000000000000
[    0.919559] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[    0.919908] RBP: ffff888102952180 R08: 0000000000000000 R09: 0000000000000000
[    0.920289] R10: ffff8881043a0000 R11: 0000000000000000 R12: ffff888102952000
[    0.920648] R13: ffff888102952180 R14: ffff8881043a0ad8 R15: ffff8881043a0880
[    0.921014] FS:  000000002a1a0380(0000) GS:ffff888196d8d000(0000) knlGS:0000000000000000
[    0.921424] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.921710] CR2: 0000000000000000 CR3: 0000000102993002 CR4: 0000000000772ef0
[    0.922097] PKRU: 55555554
[    0.922240] Kernel panic - not syncing: Fatal exception
[    0.922590] Kernel Offset: disabled

Fixes: 0545a3037773 ("pkt_sched: QFQ - quick fair queue scheduler")
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Link: https://patch.msgid.link/20260106034100.1780779-1-xmei5@asu.edu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

wifi: mac80211_hwsim: fix typo in frequency notification

[ Upstream commit 333418872bfecf4843f1ded7a4151685dfcf07d5 ]

The NAN notification is for 5745 MHz which corresponds to channel 149
and not 5475 which is not actually a valid channel. This could result in
a NULL pointer dereference in cfg80211_next_nan_dw_notif.

Fixes: a37a6f54439b ("wifi: mac80211_hwsim: Add simulation support for NAN device")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260107143652.7dab2035836f.Iacbaf7bb94ed5c14a0928a625827e4137d8bfede@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

net: airoha: Fix schedule while atomic in airoha_ppe_deinit()

[ Upstream commit 6abcf751bc084804a9e5b3051442e8a2ce67f48a ]

airoha_ppe_deinit() runs airoha_npu_ppe_deinit() in atomic context.
airoha_npu_ppe_deinit routine allocates ppe_data buffer with GFP_KERNEL
flag. Rely on rcu_replace_pointer in airoha_ppe_deinit routine in order
to fix schedule while atomic issue in airoha_npu_ppe_deinit() since we
do not need atomic context there.

Fixes: 00a7678310fe3 ("net: airoha: Introduce flowtable offload support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260105-airoha-fw-ethtool-v2-1-3b32b158cc31@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

HID: quirks: work around VID/PID conflict for appledisplay

[ Upstream commit c7fabe4ad9219866c203164a214c474c95b36bf2 ]

For years I wondered why the Apple Cinema Display driver would not
just work for me. Turns out the hidraw driver instantly takes it
over. Fix by adding appledisplay VID/PIDs to hid_have_special_driver.

Fixes: 069e8a65cd79 ("Driver for Apple Cinema Display")
Signed-off-by: René Rebe <rene@exactco.de>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

sparc/PCI: Correct 64-bit non-pref -> pref BAR resources

[ Upstream commit bdb32359eab94013e80cf7e3d40a3fd4972da93a ]

SPARC T5-2 dts describes some PCI BARs as 64-bit resources without the
pref(etchable) bit (0x83... vs 0xc3... in assigned-addresses) for address
ranges above the 4G threshold. Such resources cannot be placed into a
non-prefetchable PCI bridge window that is capable only of 32-bit
addressing. As such, it looks like the platform is improperly described by
the dts.

The kernel detects this problem (see the IORESOURCE_PREFETCH check in
pci_find_parent_resource()) and fails to assign these BAR resources to the
resource tree due to lack of a compatible bridge window.

Prior to 754babaaf333 ("sparc/PCI: Remove pcibios_enable_device() as they
do nothing extra") SPARC arch code did not test whether device resources
were successfully in the resource tree when enabling a device, effectively
hiding the problem. After removing the arch-specific enable code,
pci_enable_resources() refuses to enable the device when it finds not all
mem resources are assigned, and therefore mpt3sas can't be enabled:

  pci 0001:04:00.0: reg 0x14: [mem 0x801110000000-0x80111000ffff 64bit]
  pci 0001:04:00.0: reg 0x1c: [mem 0x801110040000-0x80111007ffff 64bit]
  pci 0001:04:00.0: BAR 1 [mem 0x801110000000-0x80111000ffff 64bit]: can't claim; no compatible bridge window
  pci 0001:04:00.0: BAR 3 [mem 0x801110040000-0x80111007ffff 64bit]: can't claim; no compatible bridge window
  mpt3sas 0001:04:00.0: BAR 1 [mem size 0x00010000 64bit]: not assigned; can't enable device

For clarity, this filtered log only shows failures for one mpt3sas device
but other devices fail similarly. In the reported case, the end result with
all the failures is an unbootable system.

Things appeared to "work" before 754babaaf333 ("sparc/PCI: Remove
pcibios_enable_device() as they do nothing extra") because the resource
tree is agnostic to whether PCI BAR resources are properly in the tree or
not. So as long as there was a parent resource (e.g. a root bus resource)
that contains the address range, the resource tree code just places
resource request underneath it without any consideration to the
intermediate BAR resource. While it worked, it's incorrect setup still.

Add an OF fixup to set the IORESOURCE_PREFETCH flag for a 64-bit PCI
resource that has the end address above 4G requiring placement into the
prefetchable window. Also log the issue.

Fixes: 754babaaf333 ("sparc/PCI: Remove pcibios_enable_device() as they do nothing extra")
Reported-by: Nathaniel Roach <nroach44@gmail.com>
Closes: https://github.com/sparclinux/issues/issues/22
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Nathaniel Roach <nroach44@gmail.com>
Link: https://patch.msgid.link/20251124170411.3709-1-ilpo.jarvinen@linux.intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>

trace: ftrace_dump_on_oops[] is not exported, make it static

[ Upstream commit 1e2ed4bfd50ace3c4272cfab7e9aa90956fb7ae0 ]

The ftrace_dump_on_oops string is not used outside of trace.c so
make it static to avoid the export warning from sparse:

kernel/trace/trace.c:141:6: warning: symbol 'ftrace_dump_on_oops' was not declared. Should it be static?

Fixes: dd293df6395a2 ("tracing: Move trace sysctls into trace.c")
Link: https://patch.msgid.link/20260106231054.84270-1-ben.dooks@codethink.co.uk
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

block: don't merge bios with different app_tags

[ Upstream commit 6acd4ac5f8f0ec9b946875553e52907700bcfc77 ]

nvme_set_app_tag() uses the app_tag value from the bio_integrity_payload
of the struct request's first bio. This assumes all the request's bios
have the same app_tag. However, it is possible for bios with different
app_tag values to be merged into a single request.
Add a check in blk_integrity_merge_{bio,rq}() to prevent the merging of
bios/requests with different app_tag values if BIP_CHECK_APPTAG is set.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Fixes: 3d8b5a22d404 ("block: add support to pass user meta buffer")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>

net: netdevsim: fix inconsistent carrier state after link/unlink

[ Upstream commit d83dddffe1904e4a576d11a541878850a8e64cd2 ]

This patch fixes the edge case behavior on ifup/ifdown and
linking/unlinking two netdevsim interfaces:

1. unlink two interfaces netdevsim1 and netdevsim2
2. ifdown netdevsim1
3. ifup netdevsim1
4. link two interfaces netdevsim1 and netdevsim2
5. (Now two interfaces are linked in terms of netdevsim peer, but
    carrier state of the two interfaces remains DOWN.)

This inconsistent behavior is caused by the current implementation,
which only cares about the "link, then ifup" order, not "ifup, then
link" order. This patch fixes the inconsistency by calling
netif_carrier_on() when two netdevsim interfaces are linked.

This patch fixes buggy behavior on NetworkManager-based systems which
causes the netdevsim test to fail with the following error:

  # timeout set to 600
  # selftests: drivers/net/netdevsim: peer.sh
  # 2025/12/25 00:54:03 socat[9115] W address is opened in read-write mode but only supports read-only
  # 2025/12/25 00:56:17 socat[9115] W connect(7, AF=2 192.168.1.1:1234, 16): Connection timed out
  # 2025/12/25 00:56:17 socat[9115] E TCP:192.168.1.1:1234: Connection timed out
  # expected 3 bytes, got 0
  # 2025/12/25 00:56:17 socat[9109] W exiting on signal 15
  not ok 13 selftests: drivers/net/netdevsim: peer.sh # exit=1

This patch also solves timeout on TCP Fast Open (TFO) test in
NetworkManager-based systems because it also depends on netdevsim's
carrier consistency.

Fixes: 1a8fed52f7be ("netdevsim: set the carrier when the device goes up")
Signed-off-by: Yohei Kojima <yk@y-koj.net>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/602c9e1ba5bb2ee1997bb38b1d866c9c3b807ae9.1767624906.git.yk@y-koj.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

selftests: drv-net: Bring back tool() to driver __init__s

[ Upstream commit 353cfc0ef3f34ef7fe313ae38dac37f2454a7cf5 ]

The pp_alloc_fail.py test (which doesn't run in NIPA CI?) uses tool, add
back the import.

Resolves:
ImportError: cannot import name 'tool' from 'lib.py'

Fixes: 68a052239fc4 ("selftests: drv-net: update remaining Python init files")
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20260105163319.47619-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

net/sched: act_api: avoid dereferencing ERR_PTR in tcf_idrinfo_destroy

[ Upstream commit adb25a46dc0a43173f5ea5f5f58fc8ba28970c7c ]

syzbot reported a crash in tc_act_in_hw() during netns teardown where
tcf_idrinfo_destroy() passed an ERR_PTR(-EBUSY) value as a tc_action
pointer, leading to an invalid dereference.

Guard against ERR_PTR entries when iterating the action IDR so teardown
does not call tc_act_in_hw() on an error pointer.

Fixes: 84a7d6797e6a ("net/sched: acp_api: no longer acquire RTNL in tc_action_net_exit()")
Link: https://syzkaller.appspot.com/bug?extid=8f1c492ffa4644ff3826
Reported-by: syzbot+8f1c492ffa4644ff3826@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=8f1c492ffa4644ff3826
Signed-off-by: Shivani Gupta <shivani07g@gmail.com>
Link: https://patch.msgid.link/20260105005905.243423-1-shivani07g@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

net: sfp: return the number of written bytes for smbus single byte access

[ Upstream commit 13ff3e724207f579d3c814ee05516fefcb4f32e8 ]

We expect the SFP write accessors to return the number of written bytes.
We fail to do so for single-byte smbus accesses, which may cause errors
when setting a module's high-power state and for some cotsworks modules.

Let's return the amount of written bytes, as expected.

Fixes: 7662abf4db94 ("net: phy: sfp: Add support for SMBus module access")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260105151840.144552-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

udp: call skb_orphan() before skb_attempt_defer_free()

[ Upstream commit e5c8eda39a9fc1547d1398d707aa06c1d080abdd ]

Standard UDP receive path does not use skb->destructor.

But skmsg layer does use it, since it calls skb_set_owner_sk_safe()
from udp_read_skb().

This then triggers this warning in skb_attempt_defer_free():

DEBUG_NET_WARN_ON_ONCE(skb->destructor);

We must call skb_orphan() to fix this issue.

Fixes: 6471658dc66c ("udp: use skb_attempt_defer_free()")
Reported-by: syzbot+3e68572cf2286ce5ebe9@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/695b83bd.050a0220.1c9965.002b.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260105093630.1976085-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

Revert "dsa: mv88e6xxx: make serdes SGMII/Fiber tx amplitude configurable"

[ Upstream commit 7801edc9badd972cb62cf11c0427e70b6dca239d ]

This reverts commit 926eae604403acfa27ba5b072af458e87e634a50, which
never could have produced the intended effect:
https://lore.kernel.org/netdev/AM0PR06MB10396BBF8B568D77556FC46F8F7DEA@AM0PR06MB10396.eurprd06.prod.outlook.com/

The reason why it is broken beyond repair in this form is that the
mv88e6xxx driver outsources its "tx-p2p-microvolt" property to the OF
node of an external Ethernet PHY. This:
(a) does not work if there is no external PHY (chip-to-chip connection,
    or SFP module)
(b) pollutes the OF property namespace / bindings of said external PHY
    ("tx-p2p-microvolt" could have meaning for the Ethernet PHY's SerDes
    interface as well)

We can revisit the idea of making SerDes amplitude configurable once we
have proper bindings for the mv88e6xxx SerDes. Until then, remove the
code that leaves us with unnecessary baggage.

Fixes: 926eae604403 ("dsa: mv88e6xxx: make serdes SGMII/Fiber tx amplitude configurable")
Cc: Holger Brunck <holger.brunck@hitachienergy.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260104093952.486606-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

idpf: fix aux device unplugging when rdma is not supported by vport

[ Upstream commit 4648fb2f2e7210c53b85220ee07d42d1e4bae3f9 ]

If vport flags do not contain VIRTCHNL2_VPORT_ENABLE_RDMA, driver does not
allocate vdev_info for this vport. This leads to kernel NULL pointer
dereference in idpf_idc_vport_dev_down(), which references vdev_info for
every vport regardless.

Check, if vdev_info was ever allocated before unplugging aux device.

Fixes: be91128c579c ("idpf: implement RDMA vport auxiliary dev create, init, and destroy")
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

idpf: cap maximum Rx buffer size

[ Upstream commit 086efe0a1ecc36cffe46640ce12649a4cd3ff171 ]

The HW only supports a maximum Rx buffer size of 16K-128. On systems
using large pages, the libeth logic can configure the buffer size to be
larger than this. The upper bound is PAGE_SIZE while the lower bound is
MTU rounded up to the nearest power of 2. For example, ARM systems with
a 64K page size and an mtu of 9000 will set the Rx buffer size to 16K,
which will cause the config Rx queues message to fail.

Initialize the bufq/fill queue buf_len field to the maximum supported
size. This will trigger the libeth logic to cap the maximum Rx buffer
size by reducing the upper bound.

Fixes: 74d1412ac8f37 ("idpf: use libeth Rx buffer management for payload buffer")
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Acked-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: David Decotigny <ddecotig@google.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

idpf: Fix error handling in idpf_vport_open()

[ Upstream commit 87b8ee64685bc096a087af833d4594b2332bfdb1 ]

Fix error handling to properly cleanup interrupts when
idpf_vport_queue_ids_init() or idpf_rx_bufs_init_all() fail. Jump to
'intr_deinit' instead of 'queues_rel' to ensure interrupts are cleaned up
before releasing other resources.

Fixes: d4d558718266 ("idpf: initialize interrupts and enable vport")
Signed-off-by: Sreedevi Joshi <sreedevi.joshi@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

idpf: Fix RSS LUT NULL ptr issue after soft reset

[ Upstream commit ebecca5b093895da801b3eba1a55b4ec4027d196 ]

During soft reset, the RSS LUT is freed and not restored unless the
interface is up. If an ethtool command that accesses the rss lut is
attempted immediately after reset, it will result in NULL ptr
dereference. Also, there is no need to reset the rss lut if the soft reset
does not involve queue count change.

After soft reset, set the RSS LUT to default values based on the updated
queue count only if the reset was a result of a queue count change and
the LUT was not configured by the user. In all other cases, don't touch
the LUT.

Steps to reproduce:

** Bring the interface down (if up)
ifconfig eth1 down

** update the queue count (eg., 27->20)
ethtool -L eth1 combined 20

** display the RSS LUT
ethtool -x eth1

[82375.558338] BUG: kernel NULL pointer dereference, address: 0000000000000000
[82375.558373] #PF: supervisor read access in kernel mode
[82375.558391] #PF: error_code(0x0000) - not-present page
[82375.558408] PGD 0 P4D 0
[82375.558421] Oops: Oops: 0000 [#1] SMP NOPTI
<snip>
[82375.558516] RIP: 0010:idpf_get_rxfh+0x108/0x150 [idpf]
[82375.558786] Call Trace:
[82375.558793]  <TASK>
[82375.558804]  rss_prepare.isra.0+0x187/0x2a0
[82375.558827]  rss_prepare_data+0x3a/0x50
[82375.558845]  ethnl_default_doit+0x13d/0x3e0
[82375.558863]  genl_family_rcv_msg_doit+0x11f/0x180
[82375.558886]  genl_rcv_msg+0x1ad/0x2b0
[82375.558902]  ? __pfx_ethnl_default_doit+0x10/0x10
[82375.558920]  ? __pfx_genl_rcv_msg+0x10/0x10
[82375.558937]  netlink_rcv_skb+0x58/0x100
[82375.558957]  genl_rcv+0x2c/0x50
[82375.558971]  netlink_unicast+0x289/0x3e0
[82375.558988]  netlink_sendmsg+0x215/0x440
[82375.559005]  __sys_sendto+0x234/0x240
[82375.559555]  __x64_sys_sendto+0x28/0x30
[82375.560068]  x64_sys_call+0x1909/0x1da0
[82375.560576]  do_syscall_64+0x7a/0xfa0
[82375.561076]  ? clear_bhb_loop+0x60/0xb0
[82375.561567]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
<snip>

Fixes: 02cbfba1add5 ("idpf: add ethtool callbacks")
Signed-off-by: Sreedevi Joshi <sreedevi.joshi@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

idpf: Fix RSS LUT configuration on down interfaces

[ Upstream commit 445b49d13787da2fe8d51891ee196e5077feef44 ]

RSS LUT provisioning and queries on a down interface currently return
silently without effect. Users should be able to configure RSS settings
even when the interface is down.

Fix by maintaining RSS configuration changes in the driver's soft copy and
deferring HW programming until the interface comes up.

Fixes: 02cbfba1add5 ("idpf: add ethtool callbacks")
Signed-off-by: Sreedevi Joshi <sreedevi.joshi@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

idpf: Fix RSS LUT NULL pointer crash on early ethtool operations

[ Upstream commit 83f38f210b85676f40ba8586b5a8edae19b56995 ]

The RSS LUT is not initialized until the interface comes up, causing
the following NULL pointer crash when ethtool operations like rxhash on/off
are performed before the interface is brought up for the first time.

Move RSS LUT initialization from ndo_open to vport creation to ensure LUT
is always available. This enables RSS configuration via ethtool before
bringing the interface up. Simplify LUT management by maintaining all
changes in the driver's soft copy and programming zeros to the indirection
table when rxhash is disabled. Defer HW programming until the interface
comes up if it is down during rxhash and LUT configuration changes.

Steps to reproduce:
** Load idpf driver; interfaces will be created
modprobe idpf
** Before bringing the interfaces up, turn rxhash off
ethtool -K eth2 rxhash off

[89408.371875] BUG: kernel NULL pointer dereference, address: 0000000000000000
[89408.371908] #PF: supervisor read access in kernel mode
[89408.371924] #PF: error_code(0x0000) - not-present page
[89408.371940] PGD 0 P4D 0
[89408.371953] Oops: Oops: 0000 [#1] SMP NOPTI
<snip>
[89408.372052] RIP: 0010:memcpy_orig+0x16/0x130
[89408.372310] Call Trace:
[89408.372317]  <TASK>
[89408.372326]  ? idpf_set_features+0xfc/0x180 [idpf]
[89408.372363]  __netdev_update_features+0x295/0xde0
[89408.372384]  ethnl_set_features+0x15e/0x460
[89408.372406]  genl_family_rcv_msg_doit+0x11f/0x180
[89408.372429]  genl_rcv_msg+0x1ad/0x2b0
[89408.372446]  ? __pfx_ethnl_set_features+0x10/0x10
[89408.372465]  ? __pfx_genl_rcv_msg+0x10/0x10
[89408.372482]  netlink_rcv_skb+0x58/0x100
[89408.372502]  genl_rcv+0x2c/0x50
[89408.372516]  netlink_unicast+0x289/0x3e0
[89408.372533]  netlink_sendmsg+0x215/0x440
[89408.372551]  __sys_sendto+0x234/0x240
[89408.372571]  __x64_sys_sendto+0x28/0x30
[89408.372585]  x64_sys_call+0x1909/0x1da0
[89408.372604]  do_syscall_64+0x7a/0xfa0
[89408.373140]  ? clear_bhb_loop+0x60/0xb0
[89408.373647]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[89408.378887]  </TASK>
<snip>

Fixes: a251eee62133 ("idpf: add SRIOV support and other ndo_ops")
Signed-off-by: Sreedevi Joshi <sreedevi.joshi@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

idpf: fix issue with ethtool -n command display

[ Upstream commit 36aae2ea6bd76b8246caa50e34a4f4824f0a3be8 ]

When ethtool -n is executed on an interface to display the flow steering
rules, "rxclass: Unknown flow type" error is generated.

The flow steering list maintained in the driver currently stores only the
location and q_index but other fields of the ethtool_rx_flow_spec are not
stored. This may be enough for the virtchnl command to delete the entry.
However, when the ethtool -n command is used to query the flow steering
rules, the ethtool_rx_flow_spec returned is not complete causing the
error below.

Resolve this by storing the flow spec (fsp) when rules are added and
returning the complete flow spec when rules are queried.

Also, change the return value from EINVAL to ENOENT when flow steering
entry is not found during query by location or when deleting an entry.

Add logic to detect and reject duplicate filter entries at the same
location and change logic to perform upfront validation of all error
conditions before adding flow rules through virtchnl. This avoids the
need for additional virtchnl delete messages when subsequent operations
fail, which was missing in the original upstream code.

Example:
Before the fix:
ethtool -n eth1
2 RX rings available
Total 2 rules

rxclass: Unknown flow type
rxclass: Unknown flow type

After the fix:
ethtool -n eth1
2 RX rings available
Total 2 rules

Filter: 0
        Rule Type: TCP over IPv4
        Src IP addr: 10.0.0.1 mask: 0.0.0.0
        Dest IP addr: 0.0.0.0 mask: 255.255.255.255
        TOS: 0x0 mask: 0xff
        Src port: 0 mask: 0xffff
        Dest port: 0 mask: 0xffff
        Action: Direct to queue 0

Filter: 1
        Rule Type: UDP over IPv4
        Src IP addr: 10.0.0.1 mask: 0.0.0.0
        Dest IP addr: 0.0.0.0 mask: 255.255.255.255
        TOS: 0x0 mask: 0xff
        Src port: 0 mask: 0xffff
        Dest port: 0 mask: 0xffff
        Action: Direct to queue 0

Fixes: ada3e24b84a0 ("idpf: add flow steering support")
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Co-developed-by: Sreedevi Joshi <sreedevi.joshi@intel.com>
Signed-off-by: Sreedevi Joshi <sreedevi.joshi@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

idpf: fix memory leak of flow steer list on rmmod

[ Upstream commit f9841bd28b600526ca4f6713b0ca49bf7bb98452 ]

The flow steering list maintains entries that are added and removed as
ethtool creates and deletes flow steering rules. Module removal with active
entries causes memory leak as the list is not properly cleaned up.

Prevent this by iterating through the remaining entries in the list and
freeing the associated memory during module removal. Add a spinlock
(flow_steer_list_lock) to protect the list access from multiple threads.

Fixes: ada3e24b84a0 ("idpf: add flow steering support")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Sreedevi Joshi <sreedevi.joshi@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

idpf: fix error handling in the init_task on load

[ Upstream commit 4d792219fe6f891b5b557a607ac8a0a14eda6e38 ]

If the init_task fails during a driver load, we end up without vports and
netdevs, effectively failing the entire process. In that state a
subsequent reset will result in a crash as the service task attempts to
access uninitialized resources. Following trace is from an error in the
init_task where the CREATE_VPORT (op 501) is rejected by the FW:

[40922.763136] idpf 0000:83:00.0: Device HW Reset initiated
[40924.449797] idpf 0000:83:00.0: Transaction failed (op 501)
[40958.148190] idpf 0000:83:00.0: HW reset detected
[40958.161202] BUG: kernel NULL pointer dereference, address: 00000000000000a8
...
[40958.168094] Workqueue: idpf-0000:83:00.0-vc_event idpf_vc_event_task [idpf]
[40958.168865] RIP: 0010:idpf_vc_event_task+0x9b/0x350 [idpf]
...
[40958.177932] Call Trace:
[40958.178491]  <TASK>
[40958.179040]  process_one_work+0x226/0x6d0
[40958.179609]  worker_thread+0x19e/0x340
[40958.180158]  ? __pfx_worker_thread+0x10/0x10
[40958.180702]  kthread+0x10f/0x250
[40958.181238]  ? __pfx_kthread+0x10/0x10
[40958.181774]  ret_from_fork+0x251/0x2b0
[40958.182307]  ? __pfx_kthread+0x10/0x10
[40958.182834]  ret_from_fork_asm+0x1a/0x30
[40958.183370]  </TASK>

Fix the error handling in the init_task to make sure the service and
mailbox tasks are disabled if the error happens during load. These are
started in idpf_vc_core_init(), which spawns the init_task and has no way
of knowing if it failed. If the error happens on reset, following
successful driver load, the tasks can still run, as that will allow the
netdevs to attempt recovery through another reset. Stop the PTP callbacks
either way as those will be restarted by the call to idpf_vc_core_init()
during a successful reset.

Fixes: 0fe45467a104 ("idpf: add create vport and netdev configuration")
Reported-by: Vivek Kumar <iamvivekkumar@google.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

idpf: fix memory leak in idpf_vc_core_deinit()

[ Upstream commit e111cbc4adf9f9974eed040aeece7e17460f6bff ]

Make sure to free hw->lan_regs. Reported by kmemleak during reset:

unreferenced object 0xff1b913d02a936c0 (size 96):
  comm "kworker/u258:14", pid 2174, jiffies 4294958305
  hex dump (first 32 bytes):
    00 00 00 c0 a8 ba 2d ff 00 00 00 00 00 00 00 00  ......-.........
    00 00 40 08 00 00 00 00 00 00 25 b3 a8 ba 2d ff  ..@.......%...-.
  backtrace (crc 36063c4f):
    __kmalloc_noprof+0x48f/0x890
    idpf_vc_core_init+0x6ce/0x9b0 [idpf]
    idpf_vc_event_task+0x1fb/0x350 [idpf]
    process_one_work+0x226/0x6d0
    worker_thread+0x19e/0x340
    kthread+0x10f/0x250
    ret_from_fork+0x251/0x2b0
    ret_from_fork_asm+0x1a/0x30

Fixes: 6aa53e861c1a ("idpf: implement get LAN MMIO memory regions")
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Joshua Hay <joshua.a.hay@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

idpf: fix memory leak in idpf_vport_rel()

[ Upstream commit f6242b354605faff263ca45882b148200915a3f6 ]

Free vport->rx_ptype_lkup in idpf_vport_rel() to avoid leaking memory
during a reset. Reported by kmemleak:

unreferenced object 0xff450acac838a000 (size 4096):
  comm "kworker/u258:5", pid 7732, jiffies 4296830044
  hex dump (first 32 bytes):
    00 00 00 00 00 10 00 00 00 10 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00  ................
  backtrace (crc 3da81902):
    __kmalloc_cache_noprof+0x469/0x7a0
    idpf_send_get_rx_ptype_msg+0x90/0x570 [idpf]
    idpf_init_task+0x1ec/0x8d0 [idpf]
    process_one_work+0x226/0x6d0
    worker_thread+0x19e/0x340
    kthread+0x10f/0x250
    ret_from_fork+0x251/0x2b0
    ret_from_fork_asm+0x1a/0x30

Fixes: 0fe45467a104 ("idpf: add create vport and netdev configuration")
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

idpf: detach and close netdevs while handling a reset

[ Upstream commit 2e281e1155fc476c571c0bd2ffbfe28ab829a5c3 ]

Protect the reset path from callbacks by setting the netdevs to detached
state and close any netdevs in UP state until the reset handling has
completed. During a reset, the driver will de-allocate resources for the
vport, and there is no guarantee that those will recover, which is why the
existing vport_ctrl_lock does not provide sufficient protection.

idpf_detach_and_close() is called right before reset handling. If the
reset handling succeeds, the netdevs state is recovered via call to
idpf_attach_and_open(). If the reset handling fails the netdevs remain
down. The detach/down calls are protected with RTNL lock to avoid racing
with callbacks. On the recovery side the attach can be done without
holding the RTNL lock as there are no callbacks expected at that point,
due to detach/close always being done first in that flow.

The previous logic restoring the netdevs state based on the
IDPF_VPORT_UP_REQUESTED flag in the init task is not needed anymore, hence
the removal of idpf_set_vport_state(). The IDPF_VPORT_UP_REQUESTED is
still being used to restore the state of the netdevs following the reset,
but has no use outside of the reset handling flow.

idpf_init_hard_reset() is converted to void, since it was used as such and
there is no error handling being done based on its return value.

Before this change, invoking hard and soft resets simultaneously will
cause the driver to lose the vport state:
ip -br a
<inf> UP
echo 1 > /sys/class/net/ens801f0/device/reset& \
ethtool -L ens801f0 combined 8
ip -br a
<inf> DOWN
ip link set <inf> up
ip -br a
<inf> DOWN

Also in case of a failure in the reset path, the netdev is left
exposed to external callbacks, while vport resources are not
initialized, leading to a crash on subsequent ifup/down:
[408471.398966] idpf 0000:83:00.0: HW reset detected
[408471.411744] idpf 0000:83:00.0: Device HW Reset initiated
[408472.277901] idpf 0000:83:00.0: The driver was unable to contact the device's firmware. Check that the FW is running. Driver state= 0x2
[408508.125551] BUG: kernel NULL pointer dereference, address: 0000000000000078
[408508.126112] #PF: supervisor read access in kernel mode
[408508.126687] #PF: error_code(0x0000) - not-present page
[408508.127256] PGD 2aae2f067 P4D 0
[408508.127824] Oops: Oops: 0000 [#1] SMP NOPTI
...
[408508.130871] RIP: 0010:idpf_stop+0x39/0x70 [idpf]
...
[408508.139193] Call Trace:
[408508.139637]  <TASK>
[408508.140077]  __dev_close_many+0xbb/0x260
[408508.140533]  __dev_change_flags+0x1cf/0x280
[408508.140987]  netif_change_flags+0x26/0x70
[408508.141434]  dev_change_flags+0x3d/0xb0
[408508.141878]  devinet_ioctl+0x460/0x890
[408508.142321]  inet_ioctl+0x18e/0x1d0
[408508.142762]  ? _copy_to_user+0x22/0x70
[408508.143207]  sock_do_ioctl+0x3d/0xe0
[408508.143652]  sock_ioctl+0x10e/0x330
[408508.144091]  ? find_held_lock+0x2b/0x80
[408508.144537]  __x64_sys_ioctl+0x96/0xe0
[408508.144979]  do_syscall_64+0x79/0x3d0
[408508.145415]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[408508.145860] RIP: 0033:0x7f3e0bb4caff

Fixes: 0fe45467a104 ("idpf: add create vport and netdev configuration")
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

idpf: convert vport state to bitmap

[ Upstream commit 8dd72ebc73f37b216410db17340f15e6fb2cdb7b ]

Convert vport state to a bitmap and remove the DOWN state which is
redundant in the existing logic. There are no functional changes aside
from the use of bitwise operations when setting and checking the states.
Removed the double underscore to be consistent with the naming of other
bitmaps in the header and renamed current_state to vport_is_up to match
the meaning of the new variable.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Chittim Madhu <madhu.chittim@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20251125223632.1857532-6-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 2e281e1155fc ("idpf: detach and close netdevs while handling a reset")
Signed-off-by: Sasha Levin <sashal@kernel.org>

idpf: keep the netdev when a reset fails

[ Upstream commit 083029bd8b445595222a3cd14076b880781c1765 ]

During a successful reset the driver would re-allocate vport resources
while keeping the netdevs intact. However, in case of an error in the
init task, the netdev of the failing vport will be unregistered,
effectively removing the network interface:

[  121.211076] idpf 0000:83:00.0: enabling device (0100 -> 0102)
[  121.221976] idpf 0000:83:00.0: Device HW Reset initiated
[  124.161229] idpf 0000:83:00.0 ens801f0: renamed from eth0
[  124.163364] idpf 0000:83:00.0 ens801f0d1: renamed from eth1
[  125.934656] idpf 0000:83:00.0 ens801f0d2: renamed from eth2
[  128.218429] idpf 0000:83:00.0 ens801f0d3: renamed from eth3

ip -br a
ens801f0         UP
ens801f0d1       UP
ens801f0d2       UP
ens801f0d3       UP
echo 1 > /sys/class/net/ens801f0/device/reset

[  145.885537] idpf 0000:83:00.0: resetting
[  145.990280] idpf 0000:83:00.0: reset done
[  146.284766] idpf 0000:83:00.0: HW reset detected
[  146.296610] idpf 0000:83:00.0: Device HW Reset initiated
[  211.556719] idpf 0000:83:00.0: Transaction timed-out (op:526 cookie:7700 vc_op:526 salt:77 timeout:60000ms)
[  272.996705] idpf 0000:83:00.0: Transaction timed-out (op:502 cookie:7800 vc_op:502 salt:78 timeout:60000ms)

ip -br a
ens801f0d1       DOWN
ens801f0d2       DOWN
ens801f0d3       DOWN

Re-shuffle the logic in the error path of the init task to make sure the
netdevs remain intact. This will allow the driver to attempt recovery via
subsequent resets, provided the FW is still functional.

The main change is to make sure that idpf_decfg_netdev() is not called
should the init task fail during a reset. The error handling is
consolidated under unwind_vports, as the removed labels had the same
cleanup logic split depending on the point of failure.

Fixes: ce1b75d0635c ("idpf: add ptypes and MAC filter support")
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

PCI/VGA: Don't assume the only VGA device on a system is `boot_vga`

[ Upstream commit fd390ff144513eb0310c350b1cf5fa8d6ddd0c53 ]

Some systems ship with multiple display class devices but not all
of them are VGA devices. If the "only" VGA device on the system is not
used for displaying the image on the screen marking it as `boot_vga`
because nothing was found is totally wrong.

This behavior actually leads to mistakes of the wrong device being
advertised to userspace and then userspace can make incorrect decisions.

As there is an accurate `boot_display` sysfs file stop lying about
`boot_vga` by assuming if nothing is found it's the right device.

Reported-by: Aaron Erhardt <aer@tuxedocomputers.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220712
Tested-by: Aaron Erhardt <aer@tuxedocomputers.com>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: ad90860bd10ee ("fbcon: Use screen info to find primary device")
Tested-by: Luke D. Jones <luke@ljones.dev>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patch.msgid.link/20260106044638.52906-1-superm1@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>

net: fix memory leak in skb_segment_list for GRO packets

[ Upstream commit 238e03d0466239410b72294b79494e43d4fabe77 ]

When skb_segment_list() is called during packet forwarding, it handles
packets that were aggregated by the GRO engine.

Historically, the segmentation logic in skb_segment_list assumes that
individual segments are split from a parent SKB and may need to carry
their own socket memory accounting. Accordingly, the code transfers
truesize from the parent to the newly created segments.

Prior to commit ed4cccef64c1 ("gro: fix ownership transfer"), this
truesize subtraction in skb_segment_list() was valid because fragments
still carry a reference to the original socket.

However, commit ed4cccef64c1 ("gro: fix ownership transfer") changed
this behavior by ensuring that fraglist entries are explicitly
orphaned (skb->sk = NULL) to prevent illegal orphaning later in the
stack. This change meant that the entire socket memory charge remained
with the head SKB, but the corresponding accounting logic in
skb_segment_list() was never updated.

As a result, the current code unconditionally adds each fragment's
truesize to delta_truesize and subtracts it from the parent SKB. Since
the fragments are no longer charged to the socket, this subtraction
results in an effective under-count of memory when the head is freed.
This causes sk_wmem_alloc to remain non-zero, preventing socket
destruction and leading to a persistent memory leak.

The leak can be observed via KMEMLEAK when tearing down the networking
environment:

unreferenced object 0xffff8881e6eb9100 (size 2048):
  comm "ping", pid 6720, jiffies 4295492526
  backtrace:
    kmem_cache_alloc_noprof+0x5c6/0x800
    sk_prot_alloc+0x5b/0x220
    sk_alloc+0x35/0xa00
    inet6_create.part.0+0x303/0x10d0
    __sock_create+0x248/0x640
    __sys_socket+0x11b/0x1d0

Since skb_segment_list() is exclusively used for SKB_GSO_FRAGLIST
packets constructed by GRO, the truesize adjustment is removed.

The call to skb_release_head_state() must be preserved. As documented in
commit cf673ed0e057 ("net: fix fraglist segmentation reference count
leak"), it is still required to correctly drop references to SKB
extensions that may be overwritten during __copy_skb_header().

Fixes: ed4cccef64c1 ("gro: fix ownership transfer")
Signed-off-by: Mohammad Heib <mheib@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260104213101.352887-1-mheib@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

riscv: pgtable: Cleanup useless VA_USER_XXX definitions

[ Upstream commit 5e5be092ffadcab0093464ccd9e30f0c5cce16b9 ]

These marcos are not used after commit b5b4287accd7 ("riscv: mm: Use
hint address in mmap if available"). Cleanup VA_USER_XXX definitions
in asm/pgtable.h.

Fixes: b5b4287accd7 ("riscv: mm: Use hint address in mmap if available")
Signed-off-by: Guo Ren (Alibaba DAMO Academy) <guoren@kernel.org>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://patch.msgid.link/20251201005850.702569-1-guoren@kernel.org
Signed-off-by: Paul Walmsley <pjw@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

riscv: cpufeature: Fix Zk bundled extension missing Zknh

[ Upstream commit 8632180daf735074a746ce2b3808a8f2c079310e ]

The Zk extension is a bundle consisting of Zkn, Zkr, and Zkt. The Zkn
extension itself is a bundle consisting of Zbkb, Zbkc, Zbkx, Zknd, Zkne,
and Zknh.

The current implementation of riscv_zk_bundled_exts manually listed
the dependencies but missed RISCV_ISA_EXT_ZKNH.

Fix this by introducing a RISCV_ISA_EXT_ZKN macro that lists the Zkn
components and using it in both riscv_zk_bundled_exts and
riscv_zkn_bundled_exts.

This adds the missing Zknh extension to Zk and reduces code duplication.

Fixes: 0d8295ed975b ("riscv: add ISA extension parsing for scalar crypto")
Link: https://patch.msgid.link/20231114141256.126749-4-cleger@rivosinc.com/
Signed-off-by: Guodong Xu <guodong@riscstar.com>
Reviewed-by: Clément Léger <cleger@rivosinc.com>
Link: https://patch.msgid.link/20251223-zk-missing-zknh-v1-1-b627c990ee1a@riscstar.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

net: airoha: Fix npu rx DMA definitions

[ Upstream commit a7fc8c641cab855824c45e5e8877e40fd528b5df ]

Fix typos in npu rx DMA descriptor definitions.

Fixes: b3ef7bdec66fb ("net: airoha: Add airoha_offload.h header")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260102-airoha-npu-dma-rx-def-fixes-v1-1-205fc6bf7d94@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

btrfs: fix NULL pointer dereference in do_abort_log_replay()

[ Upstream commit 530e3d4af566ca44807d79359b90794dea24c4f3 ]

Coverity reported a NULL pointer dereference issue (CID 1666756) in
do_abort_log_replay(). When btrfs_alloc_path() fails in
replay_one_buffer(), wc->subvol_path is NULL, but btrfs_abort_log_replay()
calls do_abort_log_replay() which unconditionally dereferences
wc->subvol_path when attempting to print debug information. Fix this by
adding a NULL check before dereferencing wc->subvol_path in
do_abort_log_replay().

Fixes: 2753e4917624 ("btrfs: dump detailed info and specific messages on log replay failures")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

btrfs: only enforce free space tree if v1 cache is required for bs < ps cases

[ Upstream commit 30bcf4e824aa37d305502f52e1527c7b1eabef3d ]

[BUG]
Since the introduction of btrfs bs < ps support, v1 cache was never on
the plan due to its hard coded PAGE_SIZE usage, and the future plan to
properly deprecate it.

However for bs < ps cases, even if 'nospace_cache,clear_cache' mount
option is specified, it's never respected and free space tree is always
enabled:

  mkfs.btrfs -f -O ^bgt,fst $dev
  mount $dev $mnt -o clear_cache,nospace_cache
  umount $mnt
  btrfs ins dump-super $dev
  ...
  compat_ro_flags 0x3
          ( FREE_SPACE_TREE |
            FREE_SPACE_TREE_VALID )
  ...

This means a different behavior compared to bs >= ps cases.

[CAUSE]
The forcing usage of v2 space cache is done inside
btrfs_set_free_space_cache_settings(), however it never checks if we're
even using space cache but always enabling v2 cache.

[FIX]
Instead unconditionally enable v2 cache, only forcing v2 cache if the
old v1 cache is required.

Now v2 space cache can be properly disabled on bs < ps cases:

  mkfs.btrfs -f -O ^bgt,fst $dev
  mount $dev $mnt -o clear_cache,nospace_cache
  umount $mnt
  btrfs ins dump-super $dev
  ...
  compat_ro_flags 0x0
  ...

Fixes: 9f73f1aef98b ("btrfs: force v2 space cache usage for subpage mount")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

btrfs: release path before initializing extent tree in btrfs_read_locked_inode()

[ Upstream commit 8731f2c50b0b1d2b58ed5b9671ef2c4bdc2f8347 ]

In btrfs_read_locked_inode() we are calling btrfs_init_file_extent_tree()
while holding a path with a read locked leaf from a subvolume tree, and
btrfs_init_file_extent_tree() may do a GFP_KERNEL allocation, which can
trigger reclaim.

This can create a circular lock dependency which lockdep warns about with
the following splat:

   [6.1433] ======================================================
   [6.1574] WARNING: possible circular locking dependency detected
   [6.1583] 6.18.0+ #4 Tainted: G     U
   [6.1591] ------------------------------------------------------
   [6.1599] kswapd0/117 is trying to acquire lock:
   [6.1606] ffff8d9b6333c5b8 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0x39/0x2f0
   [6.1625]
            but task is already holding lock:
   [6.1633] ffffffffa4ab8ce0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x195/0xc60
   [6.1646]
            which lock already depends on the new lock.

   [6.1657]
            the existing dependency chain (in reverse order) is:
   [6.1667]
            -> #2 (fs_reclaim){+.+.}-{0:0}:
   [6.1677]        fs_reclaim_acquire+0x9d/0xd0
   [6.1685]        __kmalloc_cache_noprof+0x59/0x750
   [6.1694]        btrfs_init_file_extent_tree+0x90/0x100
   [6.1702]        btrfs_read_locked_inode+0xc3/0x6b0
   [6.1710]        btrfs_iget+0xbb/0xf0
   [6.1716]        btrfs_lookup_dentry+0x3c5/0x8e0
   [6.1724]        btrfs_lookup+0x12/0x30
   [6.1731]        lookup_open.isra.0+0x1aa/0x6a0
   [6.1739]        path_openat+0x5f7/0xc60
   [6.1746]        do_filp_open+0xd6/0x180
   [6.1753]        do_sys_openat2+0x8b/0xe0
   [6.1760]        __x64_sys_openat+0x54/0xa0
   [6.1768]        do_syscall_64+0x97/0x3e0
   [6.1776]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
   [6.1784]
            -> #1 (btrfs-tree-00){++++}-{3:3}:
   [6.1794]        lock_release+0x127/0x2a0
   [6.1801]        up_read+0x1b/0x30
   [6.1808]        btrfs_search_slot+0x8e0/0xff0
   [6.1817]        btrfs_lookup_inode+0x52/0xd0
   [6.1825]        __btrfs_update_delayed_inode+0x73/0x520
   [6.1833]        btrfs_commit_inode_delayed_inode+0x11a/0x120
   [6.1842]        btrfs_log_inode+0x608/0x1aa0
   [6.1849]        btrfs_log_inode_parent+0x249/0xf80
   [6.1857]        btrfs_log_dentry_safe+0x3e/0x60
   [6.1865]        btrfs_sync_file+0x431/0x690
   [6.1872]        do_fsync+0x39/0x80
   [6.1879]        __x64_sys_fsync+0x13/0x20
   [6.1887]        do_syscall_64+0x97/0x3e0
   [6.1894]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
   [6.1903]
            -> #0 (&delayed_node->mutex){+.+.}-{3:3}:
   [6.1913]        __lock_acquire+0x15e9/0x2820
   [6.1920]        lock_acquire+0xc9/0x2d0
   [6.1927]        __mutex_lock+0xcc/0x10a0
   [6.1934]        __btrfs_release_delayed_node.part.0+0x39/0x2f0
   [6.1944]        btrfs_evict_inode+0x20b/0x4b0
   [6.1952]        evict+0x15a/0x2f0
   [6.1958]        prune_icache_sb+0x91/0xd0
   [6.1966]        super_cache_scan+0x150/0x1d0
   [6.1974]        do_shrink_slab+0x155/0x6f0
   [6.1981]        shrink_slab+0x48e/0x890
   [6.1988]        shrink_one+0x11a/0x1f0
   [6.1995]        shrink_node+0xbfd/0x1320
   [6.1002]        balance_pgdat+0x67f/0xc60
   [6.1321]        kswapd+0x1dc/0x3e0
   [6.1643]        kthread+0xff/0x240
   [6.1965]        ret_from_fork+0x223/0x280
   [6.1287]        ret_from_fork_asm+0x1a/0x30
   [6.1616]
            other info that might help us debug this:

   [6.1561] Chain exists of:
              &delayed_node->mutex --> btrfs-tree-00 --> fs_reclaim

   [6.1503]  Possible unsafe locking scenario:

   [6.1110]        CPU0                    CPU1
   [6.1411]        ----                    ----
   [6.1707]   lock(fs_reclaim);
   [6.1998]                                lock(btrfs-tree-00);
   [6.1291]                                lock(fs_reclaim);
   [6.1581]   lock(&delayed_node->mutex);
   [6.1874]
             *** DEADLOCK ***

   [6.1716] 2 locks held by kswapd0/117:
   [6.1999]  #0: ffffffffa4ab8ce0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x195/0xc60
   [6.1294]  #1: ffff8d998344b0e0 (&type->s_umount_key#40){++++}- {3:3}, at: super_cache_scan+0x37/0x1d0
   [6.1596]
            stack backtrace:
   [6.1183] CPU: 11 UID: 0 PID: 117 Comm: kswapd0 Tainted: G     U 6.18.0+ #4 PREEMPT(lazy)
   [6.1185] Tainted: [U]=USER
   [6.1186] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023
   [6.1187] Call Trace:
   [6.1187]  <TASK>
   [6.1189]  dump_stack_lvl+0x6e/0xa0
   [6.1192]  print_circular_bug.cold+0x17a/0x1c0
   [6.1194]  check_noncircular+0x175/0x190
   [6.1197]  __lock_acquire+0x15e9/0x2820
   [6.1200]  lock_acquire+0xc9/0x2d0
   [6.1201]  ? __btrfs_release_delayed_node.part.0+0x39/0x2f0
   [6.1204]  __mutex_lock+0xcc/0x10a0
   [6.1206]  ? __btrfs_release_delayed_node.part.0+0x39/0x2f0
   [6.1208]  ? __btrfs_release_delayed_node.part.0+0x39/0x2f0
   [6.1211]  ? __btrfs_release_delayed_node.part.0+0x39/0x2f0
   [6.1213]  __btrfs_release_delayed_node.part.0+0x39/0x2f0
   [6.1215]  btrfs_evict_inode+0x20b/0x4b0
   [6.1217]  ? lock_acquire+0xc9/0x2d0
   [6.1220]  evict+0x15a/0x2f0
   [6.1222]  prune_icache_sb+0x91/0xd0
   [6.1224]  super_cache_scan+0x150/0x1d0
   [6.1226]  do_shrink_slab+0x155/0x6f0
   [6.1228]  shrink_slab+0x48e/0x890
   [6.1229]  ? shrink_slab+0x2d2/0x890
   [6.1231]  shrink_one+0x11a/0x1f0
   [6.1234]  shrink_node+0xbfd/0x1320
   [6.1236]  ? shrink_node+0xa2d/0x1320
   [6.1236]  ? shrink_node+0xbd3/0x1320
   [6.1239]  ? balance_pgdat+0x67f/0xc60
   [6.1239]  balance_pgdat+0x67f/0xc60
   [6.1241]  ? finish_task_switch.isra.0+0xc4/0x2a0
   [6.1246]  kswapd+0x1dc/0x3e0
   [6.1247]  ? __pfx_autoremove_wake_function+0x10/0x10
   [6.1249]  ? __pfx_kswapd+0x10/0x10
   [6.1250]  kthread+0xff/0x240
   [6.1251]  ? __pfx_kthread+0x10/0x10
   [6.1253]  ret_from_fork+0x223/0x280
   [6.1255]  ? __pfx_kthread+0x10/0x10
   [6.1257]  ret_from_fork_asm+0x1a/0x30
   [6.1260]  </TASK>

This is because:

1) The fsync task is holding an inode's delayed node mutex (for a
   directory) while calling __btrfs_update_delayed_inode() and that needs
   to do a search on the subvolume's btree (therefore read lock some
   extent buffers);

2) The lookup task, at btrfs_lookup(), triggered reclaim with the
   GFP_KERNEL allocation done by btrfs_init_file_extent_tree() while
   holding a read lock on a subvolume leaf;

3) The reclaim triggered kswapd which is doing inode eviction for the
   directory inode the fsync task is using as an argument to
   btrfs_commit_inode_delayed_inode() - but in that call chain we are
   trying to read lock the same leaf that the lookup task is holding
   while calling btrfs_init_file_extent_tree() and doing the GFP_KERNEL
   allocation.

Fix this by calling btrfs_init_file_extent_tree() after we don't need the
path anymore and release it in btrfs_read_locked_inode().

Reported-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/linux-btrfs/6e55113a22347c3925458a5d840a18401a38b276.camel@linux.intel.com/
Fixes: 8679d2687c35 ("btrfs: initialize inode::file_extent_tree after i_mode has been set")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

vsock: Make accept()ed sockets use custom setsockopt()

[ Upstream commit ce5e612dd411de096aa041b9e9325ba1bec5f9f4 ]

SO_ZEROCOPY handling in vsock_connectible_setsockopt() does not get called
on accept()ed sockets due to a missing flag. Flip it.

Fixes: e0718bd82e27 ("vsock: enable setting SO_ZEROCOPY")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://patch.msgid.link/20251229-vsock-child-sock-custom-sockopt-v2-1-64778d6c4f88@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/amd/pm: force send pcie parmater on navi1x

[ Upstream commit dc8a887de1a7d397ab4131f45676e89565417aa8 ]

v1:
the PMFW didn't initialize the PCIe DPM parameters
and requires the KMD to actively provide these parameters.

v2:
clean & remove unused code logic (lijo)

Fixes: 1a18607c07bb ("drm/amd/pm: override pcie dpm parameters only if it is necessary")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4671
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit b0dbd5db7cf1f81e4aaedd25cb5e72ce369387b2)
Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/amd/pm: fix wrong pcie parameter on navi1x

[ Upstream commit 4f74c2dd970611d3ec3bb0d58215e73af5cd7214 ]

fix wrong pcie dpm parameter on navi1x

Fixes: 1a18607c07bb ("drm/amd/pm: override pcie dpm parameters only if it is necessary")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4671
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Co-developed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5c5189cf4b0cc0a22bac74a40743ee711cff07f8)
Signed-off-by: Sasha Levin <sashal@kernel.org>

perf: Ensure swevent hrtimer is properly destroyed

[ Upstream commit ff5860f5088e9076ebcccf05a6ca709d5935cfa9 ]

With the change to hrtimer_try_to_cancel() in
perf_swevent_cancel_hrtimer() it appears possible for the hrtimer to
still be active by the time the event gets freed.

Make sure the event does a full hrtimer_cancel() on the free path by
installing a perf_event::destroy handler.

Fixes: eb3182ef0405 ("perf/core: Fix system hang caused by cpu-clock usage")
Reported-by: CyberUnicorns <a101e_iotvul@163.com>
Tested-by: CyberUnicorns <a101e_iotvul@163.com>
Debugged-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

inet: frags: drop fraglist conntrack references

[ Upstream commit 2ef02ac38d3c17f34a00c4b267d961a8d4b45d1a ]

Jakub added a warning in nf_conntrack_cleanup_net_list() to make debugging
leaked skbs/conntrack references more obvious.

syzbot reports this as triggering, and I can also reproduce this via
ip_defrag.sh selftest:

conntrack cleanup blocked for 60s
WARNING: net/netfilter/nf_conntrack_core.c:2512
[..]

conntrack clenups gets stuck because there are skbs with still hold nf_conn
references via their frag_list.

   net.core.skb_defer_max=0 makes the hang disappear.

Eric Dumazet points out that skb_release_head_state() doesn't follow the
fraglist.

ip_defrag.sh can only reproduce this problem since
commit 6471658dc66c ("udp: use skb_attempt_defer_free()"), but AFAICS this
problem could happen with TCP as well if pmtu discovery is off.

The relevant problem path for udp is:
1. netns emits fragmented packets
2. nf_defrag_v6_hook reassembles them (in output hook)
3. reassembled skb is tracked (skb owns nf_conn reference)
4. ip6_output refragments
5. refragmented packets also own nf_conn reference (ip6_fragment
   calls ip6_copy_metadata())
6. on input path, nf_defrag_v6_hook skips defragmentation: the
   fragments already have skb->nf_conn attached
7. skbs are reassembled via ipv6_frag_rcv()
8. skb_consume_udp -> skb_attempt_defer_free() -> skb ends up
   in pcpu freelist, but still has nf_conn reference.

Possible solutions:
1 let defrag engine drop nf_conn entry, OR
2 export kick_defer_list_purge() and call it from the conntrack
   netns exit callback, OR
3 add skb_has_frag_list() check to skb_attempt_defer_free()

2 & 3 also solve ip_defrag.sh hang but share same drawback:

Such reassembled skbs, queued to socket, can prevent conntrack module
removal until userspace has consumed the packet. While both tcp and udp
stack do call nf_reset_ct() before placing skb on socket queue, that
function doesn't iterate frag_list skbs.

Therefore drop nf_conn entries when they are placed in defrag queue.
Keep the nf_conn entry of the first (offset 0) skb so that reassembled
skb retains nf_conn entry for sake of TX path.

Note that fixes tag is incorrect; it points to the commit introducing the
'ip_defrag.sh reproducible problem': no need to backport this patch to
every stable kernel.

Reported-by: syzbot+4393c47753b7808dac7d@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/693b0fa7.050a0220.4004e.040d.GAE@google.com/
Fixes: 6471658dc66c ("udp: use skb_attempt_defer_free()")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260102140030.32367-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

virtio_net: fix device mismatch in devm_kzalloc/devm_kfree

[ Upstream commit acb4bc6e1ba34ae1a34a9334a1ce8474c909466e ]

Initial rss_hdr allocation uses virtio_device->device,
but virtnet_set_queues() frees using net_device->device.
This device mismatch causing below devres warning

[ 3788.514041] ------------[ cut here ]------------
[ 3788.514044] WARNING: drivers/base/devres.c:1095 at devm_kfree+0x84/0x98, CPU#16: vdpa/1463
[ 3788.514054] Modules linked in: octep_vdpa virtio_net virtio_vdpa [last unloaded: virtio_vdpa]
[ 3788.514064] CPU: 16 UID: 0 PID: 1463 Comm: vdpa Tainted: G        W           6.18.0 #10 PREEMPT
[ 3788.514067] Tainted: [W]=WARN
[ 3788.514069] Hardware name: Marvell CN106XX board (DT)
[ 3788.514071] pstate: 63400009 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[ 3788.514074] pc : devm_kfree+0x84/0x98
[ 3788.514076] lr : devm_kfree+0x54/0x98
[ 3788.514079] sp : ffff800084e2f220
[ 3788.514080] x29: ffff800084e2f220 x28: ffff0003b2366000 x27: 000000000000003f
[ 3788.514085] x26: 000000000000003f x25: ffff000106f17c10 x24: 0000000000000080
[ 3788.514089] x23: ffff00045bb8ab08 x22: ffff00045bb8a000 x21: 0000000000000018
[ 3788.514093] x20: ffff0004355c3080 x19: ffff00045bb8aa00 x18: 0000000000080000
[ 3788.514098] x17: 0000000000000040 x16: 000000000000001f x15: 000000000007ffff
[ 3788.514102] x14: 0000000000000488 x13: 0000000000000005 x12: 00000000000fffff
[ 3788.514106] x11: ffffffffffffffff x10: 0000000000000005 x9 : ffff800080c8c05c
[ 3788.514110] x8 : ffff800084e2eeb8 x7 : 0000000000000000 x6 : 000000000000003f
[ 3788.514115] x5 : ffff8000831bafe0 x4 : ffff800080c8b010 x3 : ffff0004355c3080
[ 3788.514119] x2 : ffff0004355c3080 x1 : 0000000000000000 x0 : 0000000000000000
[ 3788.514123] Call trace:
[ 3788.514125]  devm_kfree+0x84/0x98 (P)
[ 3788.514129]  virtnet_set_queues+0x134/0x2e8 [virtio_net]
[ 3788.514135]  virtnet_probe+0x9c0/0xe00 [virtio_net]
[ 3788.514139]  virtio_dev_probe+0x1e0/0x338
[ 3788.514144]  really_probe+0xc8/0x3a0
[ 3788.514149]  __driver_probe_device+0x84/0x170
[ 3788.514152]  driver_probe_device+0x44/0x120
[ 3788.514155]  __device_attach_driver+0xc4/0x168
[ 3788.514158]  bus_for_each_drv+0x8c/0xf0
[ 3788.514161]  __device_attach+0xa4/0x1c0
[ 3788.514164]  device_initial_probe+0x1c/0x30
[ 3788.514168]  bus_probe_device+0xb4/0xc0
[ 3788.514170]  device_add+0x614/0x828
[ 3788.514173]  register_virtio_device+0x214/0x258
[ 3788.514175]  virtio_vdpa_probe+0xa0/0x110 [virtio_vdpa]
[ 3788.514179]  vdpa_dev_probe+0xa8/0xd8
[ 3788.514183]  really_probe+0xc8/0x3a0
[ 3788.514186]  __driver_probe_device+0x84/0x170
[ 3788.514189]  driver_probe_device+0x44/0x120
[ 3788.514192]  __device_attach_driver+0xc4/0x168
[ 3788.514195]  bus_for_each_drv+0x8c/0xf0
[ 3788.514197]  __device_attach+0xa4/0x1c0
[ 3788.514200]  device_initial_probe+0x1c/0x30
[ 3788.514203]  bus_probe_device+0xb4/0xc0
[ 3788.514206]  device_add+0x614/0x828
[ 3788.514209]  _vdpa_register_device+0x58/0x88
[ 3788.514211]  octep_vdpa_dev_add+0x104/0x228 [octep_vdpa]
[ 3788.514215]  vdpa_nl_cmd_dev_add_set_doit+0x2d0/0x3c0
[ 3788.514218]  genl_family_rcv_msg_doit+0xe4/0x158
[ 3788.514222]  genl_rcv_msg+0x218/0x298
[ 3788.514225]  netlink_rcv_skb+0x64/0x138
[ 3788.514229]  genl_rcv+0x40/0x60
[ 3788.514233]  netlink_unicast+0x32c/0x3b0
[ 3788.514237]  netlink_sendmsg+0x170/0x3b8
[ 3788.514241]  __sys_sendto+0x12c/0x1c0
[ 3788.514246]  __arm64_sys_sendto+0x30/0x48
[ 3788.514249]  invoke_syscall.constprop.0+0x58/0xf8
[ 3788.514255]  do_el0_svc+0x48/0xd0
[ 3788.514259]  el0_svc+0x48/0x210
[ 3788.514264]  el0t_64_sync_handler+0xa0/0xe8
[ 3788.514268]  el0t_64_sync+0x198/0x1a0
[ 3788.514271] ---[ end trace 0000000000000000 ]---

Fix by using virtio_device->device consistently for
allocation and deallocation

Fixes: 4944be2f5ad8c ("virtio_net: Allocate rss_hdr with devres")
Signed-off-by: Kommula Shiva Shankar <kshankar@marvell.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://patch.msgid.link/20260102101900.692770-1-kshankar@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

bnxt_en: Fix potential data corruption with HW GRO/LRO

[ Upstream commit ffeafa65b2b26df2f5b5a6118d3174f17bd12ec5 ]

Fix the max number of bits passed to find_first_zero_bit() in
bnxt_alloc_agg_idx().  We were incorrectly passing the number of
long words.  find_first_zero_bit() may fail to find a zero bit and
cause a wrong ID to be used.  If the wrong ID is already in use, this
can cause data corruption.  Sometimes an error like this can also be
seen:

bnxt_en 0000:83:00.0 enp131s0np0: TPA end agg_buf 2 != expected agg_bufs 1

Fix it by passing the correct number of bits MAX_TPA_P5.  Use
DECLARE_BITMAP() to more cleanly define the bitmap.  Add a sanity
check to warn if a bit cannot be found and reset the ring [MChan].

Fixes: ec4d8e7cf024 ("bnxt_en: Add TPA ID mapping logic for 57500 chips.")
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Srijit Bose <srijit.bose@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20251231083625.3911652-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

net: wwan: iosm: Fix memory leak in ipc_mux_deinit()

[ Upstream commit 92e6e0a87f6860a4710f9494f8c704d498ae60f8 ]

Commit 1f52d7b62285 ("net: wwan: iosm: Enable M.2 7360 WWAN card support")
allocated memory for pp_qlt in ipc_mux_init() but did not free it in
ipc_mux_deinit(). This results in a memory leak when the driver is
unloaded.

Free the allocated memory in ipc_mux_deinit() to fix the leak.

Fixes: 1f52d7b62285 ("net: wwan: iosm: Enable M.2 7360 WWAN card support")
Co-developed-by: Jianhao Xu <jianhao.xu@seu.edu.cn>
Signed-off-by: Jianhao Xu <jianhao.xu@seu.edu.cn>
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
Link: https://patch.msgid.link/20251230071853.1062223-1-zilin@seu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

net/ena: fix missing lock when update devlink params

[ Upstream commit 8da901ffe497a53fa4ecc3ceed0e6d771586f88e ]

Fix assert lock warning while calling devl_param_driverinit_value_set()
in ena.

WARNING: net/devlink/core.c:261 at devl_assert_locked+0x62/0x90, CPU#0: kworker/0:0/9
CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.19.0-rc2+ #1 PREEMPT(lazy)
Hardware name: Amazon EC2 m8i-flex.4xlarge/, BIOS 1.0 10/16/2017
Workqueue: events work_for_cpu_fn
RIP: 0010:devl_assert_locked+0x62/0x90

Call Trace:
<TASK>
devl_param_driverinit_value_set+0x15/0x1c0
ena_devlink_alloc+0x18c/0x220 [ena]
? __pfx_ena_devlink_alloc+0x10/0x10 [ena]
? trace_hardirqs_on+0x18/0x140
? lockdep_hardirqs_on+0x8c/0x130
? __raw_spin_unlock_irqrestore+0x5d/0x80
? __raw_spin_unlock_irqrestore+0x46/0x80
? devm_ioremap_wc+0x9a/0xd0
ena_probe+0x4d2/0x1b20 [ena]
? __lock_acquire+0x56a/0xbd0
? __pfx_ena_probe+0x10/0x10 [ena]
? local_clock+0x15/0x30
? __lock_release.isra.0+0x1c9/0x340
? mark_held_locks+0x40/0x70
? lockdep_hardirqs_on_prepare.part.0+0x92/0x170
? trace_hardirqs_on+0x18/0x140
? lockdep_hardirqs_on+0x8c/0x130
? __raw_spin_unlock_irqrestore+0x5d/0x80
? __raw_spin_unlock_irqrestore+0x46/0x80
? __pfx_ena_probe+0x10/0x10 [ena]
......
</TASK>

Fixes: 816b52624cf6 ("net: ena: Control PHC enable through devlink")
Signed-off-by: Frank Liang <xiliang@redhat.com>
Reviewed-by: David Arinzon <darinzon@amazon.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20251231145808.6103-1-xiliang@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5e: Dealloc forgotten PSP RX modify header

[ Upstream commit 0462a15d2d1fafd3d48cf3c7c67393e42d03908c ]

The commit which added RX steering rules for PSP forgot to free a modify
header HW object on the cleanup path, which lead to health errors when
reloading the driver and uninitializing the device:

mlx5_core 0000:08:00.0: poll_health:803:(pid 3021): Fatal error 3 detected

Fix that by saving the modify header pointer in the PSP steering struct
and deallocating it after freeing the rule which references it.

Fixes: 9536fbe10c9d ("net/mlx5e: Add PSP steering in local NIC RX")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20251225132717.358820-6-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5e: Don't print error message due to invalid module

[ Upstream commit 144297e2a24e3e54aee1180ec21120ea38822b97 ]

Dumping module EEPROM on newer modules is supported through the netlink
interface only.

Querying with old userspace ethtool (or other tools, such as 'lshw')
which still uses the ioctl interface results in an error message that
could flood dmesg (in addition to the expected error return value).
The original message was added under the assumption that the driver
should be able to handle all module types, but now that such flows are
easily triggered from userspace, it doesn't serve its purpose.

Change the log level of the print in mlx5_query_module_eeprom() to
debug.

Fixes: bb64143eee8c ("net/mlx5e: Add ethtool support for dump module EEPROM")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20251225132717.358820-5-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5e: Don't gate FEC histograms on ppcnt_statistical_group

[ Upstream commit 6c75dc9de40ff91ec2b621b78f6cd9031762067c ]

Currently, the ppcnt_statistical_group capability check
incorrectly gates access to FEC histogram statistics.
This capability applies only to statistical and physical
counter groups, not for histogram data.

Restrict the ppcnt_statistical_group check to the
Physical_Layer_Counters and Physical_Layer_Statistical_Counters
groups.
Histogram statistics access remains gated by the pphcr
capability.

The issue is harmless as of today, as it happens that
ppcnt_statistical_group is set on all existing devices that
have pphcr set.

Fixes: 6b81b8a0b197 ("net/mlx5e: Don't query FEC statistics when FEC is disabled")
Signed-off-by: Alexei Lazar <alazar@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20251225132717.358820-3-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5: Lag, multipath, give priority for routes with smaller network prefix

[ Upstream commit 31057979cdadfee9f934746fd84046b43506ba61 ]

Today multipath offload is controlled by a single route and the route
controlling is selected if it meets one of the following criteria:
        1. No controlling route is set.
        2. New route destination is the same as old one.
        3. New route metric is lower than old route metric.

This can cause unwanted behaviour in case a new route is added
with a smaller network prefix which should get the priority.

Fix this by adding a new criteria to give priority to new route with
a smaller network prefix.

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20251225132717.358820-2-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

netdev: preserve NETIF_F_ALL_FOR_ALL across TSO updates

[ Upstream commit 02d1e1a3f9239cdb3ecf2c6d365fb959d1bf39df ]

Directly increment the TSO features incurs a side effect: it will also
directly clear the flags in NETIF_F_ALL_FOR_ALL on the master device,
which can cause issues such as the inability to enable the nocache copy
feature on the bonding driver.

The fix is to include NETIF_F_ALL_FOR_ALL in the update mask, thereby
preventing it from being cleared.

Fixes: b0ce3508b25e ("bonding: allow TSO being set on bonding master")
Signed-off-by: Di Zhu <zhud@hygon.cn>
Link: https://patch.msgid.link/20251224012224.56185-1-zhud@hygon.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

net: sock: fix hardened usercopy panic in sock_recv_errqueue

[ Upstream commit 2a71a1a8d0ed718b1c7a9ac61f07e5755c47ae20 ]

skbuff_fclone_cache was created without defining a usercopy region,
[1] unlike skbuff_head_cache which properly whitelists the cb[] field.
[2] This causes a usercopy BUG() when CONFIG_HARDENED_USERCOPY is
enabled and the kernel attempts to copy sk_buff.cb data to userspace
via sock_recv_errqueue() -> put_cmsg().

The crash occurs when: 1. TCP allocates an skb using alloc_skb_fclone()
   (from skbuff_fclone_cache) [1]
2. The skb is cloned via skb_clone() using the pre-allocated fclone
[3] 3. The cloned skb is queued to sk_error_queue for timestamp
reporting 4. Userspace reads the error queue via recvmsg(MSG_ERRQUEUE)
5. sock_recv_errqueue() calls put_cmsg() to copy serr->ee from skb->cb
[4] 6. __check_heap_object() fails because skbuff_fclone_cache has no
   usercopy whitelist [5]

When cloned skbs allocated from skbuff_fclone_cache are used in the
socket error queue, accessing the sock_exterr_skb structure in skb->cb
via put_cmsg() triggers a usercopy hardening violation:

[    5.379589] usercopy: Kernel memory exposure attempt detected from SLUB object 'skbuff_fclone_cache' (offset 296, size 16)!
[    5.382796] kernel BUG at mm/usercopy.c:102!
[    5.383923] Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
[    5.384903] CPU: 1 UID: 0 PID: 138 Comm: poc_put_cmsg Not tainted 6.12.57 #7
[    5.384903] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[    5.384903] RIP: 0010:usercopy_abort+0x6c/0x80
[    5.384903] Code: 1a 86 51 48 c7 c2 40 15 1a 86 41 52 48 c7 c7 c0 15 1a 86 48 0f 45 d6 48 c7 c6 80 15 1a 86 48 89 c1 49 0f 45 f3 e8 84 27 88 ff <0f> 0b 490
[    5.384903] RSP: 0018:ffffc900006f77a8 EFLAGS: 00010246
[    5.384903] RAX: 000000000000006f RBX: ffff88800f0ad2a8 RCX: 1ffffffff0f72e74
[    5.384903] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff87b973a0
[    5.384903] RBP: 0000000000000010 R08: 0000000000000000 R09: fffffbfff0f72e74
[    5.384903] R10: 0000000000000003 R11: 79706f6372657375 R12: 0000000000000001
[    5.384903] R13: ffff88800f0ad2b8 R14: ffffea00003c2b40 R15: ffffea00003c2b00
[    5.384903] FS:  0000000011bc4380(0000) GS:ffff8880bf100000(0000) knlGS:0000000000000000
[    5.384903] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.384903] CR2: 000056aa3b8e5fe4 CR3: 000000000ea26004 CR4: 0000000000770ef0
[    5.384903] PKRU: 55555554
[    5.384903] Call Trace:
[    5.384903]  <TASK>
[    5.384903]  __check_heap_object+0x9a/0xd0
[    5.384903]  __check_object_size+0x46c/0x690
[    5.384903]  put_cmsg+0x129/0x5e0
[    5.384903]  sock_recv_errqueue+0x22f/0x380
[    5.384903]  tls_sw_recvmsg+0x7ed/0x1960
[    5.384903]  ? srso_alias_return_thunk+0x5/0xfbef5
[    5.384903]  ? schedule+0x6d/0x270
[    5.384903]  ? srso_alias_return_thunk+0x5/0xfbef5
[    5.384903]  ? mutex_unlock+0x81/0xd0
[    5.384903]  ? __pfx_mutex_unlock+0x10/0x10
[    5.384903]  ? __pfx_tls_sw_recvmsg+0x10/0x10
[    5.384903]  ? _raw_spin_lock_irqsave+0x8f/0xf0
[    5.384903]  ? _raw_read_unlock_irqrestore+0x20/0x40
[    5.384903]  ? srso_alias_return_thunk+0x5/0xfbef5

The crash offset 296 corresponds to skb2->cb within skbuff_fclones:
  - sizeof(struct sk_buff) = 232 - offsetof(struct sk_buff, cb) = 40 -
  offset of skb2.cb in fclones = 232 + 40 = 272 - crash offset 296 =
  272 + 24 (inside sock_exterr_skb.ee)

This patch uses a local stack variable as a bounce buffer to avoid the hardened usercopy check failure.

[1] https://elixir.bootlin.com/linux/v6.12.62/source/net/ipv4/tcp.c#L885
[2] https://elixir.bootlin.com/linux/v6.12.62/source/net/core/skbuff.c#L5104
[3] https://elixir.bootlin.com/linux/v6.12.62/source/net/core/skbuff.c#L5566
[4] https://elixir.bootlin.com/linux/v6.12.62/source/net/core/skbuff.c#L5491
[5] https://elixir.bootlin.com/linux/v6.12.62/source/mm/slub.c#L5719

Fixes: 6d07d1cd300f ("usercopy: Restrict non-usercopy caches to size 0")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251223203534.1392218-2-bestswngs@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

net: phy: mxl-86110: Add power management and soft reset support

[ Upstream commit 62f7edd59964eb588e96fce1ad35a2327ea54424 ]

Implement soft_reset, suspend, and resume callbacks using
genphy_soft_reset(), genphy_suspend(), and genphy_resume()
to fix PHY initialization and power management issues.

The soft_reset callback is needed to properly recover the PHY after an
ifconfig down/up cycle. Without it, the PHY can remain in power-down
state, causing MDIO register access failures during config_init().
The soft reset ensures the PHY is operational before configuration.

The suspend/resume callbacks enable proper power management during
system suspend/resume cycles.

Fixes: b2908a989c59 ("net: phy: add driver for MaxLinear MxL86110 PHY")
Signed-off-by: Stefano Radaelli <stefano.r@variscite.com>
Link: https://patch.msgid.link/20251223120940.407195-1-stefano.r@variscite.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

inet: ping: Fix icmp out counting

[ Upstream commit 4c0856c225b39b1def6c9a6bc56faca79550da13 ]

When the ping program uses an IPPROTO_ICMP socket to send ICMP_ECHO
messages, ICMP_MIB_OUTMSGS is counted twice.

    ping_v4_sendmsg
      ping_v4_push_pending_frames
        ip_push_pending_frames
          ip_finish_skb
            __ip_make_skb
              icmp_out_count(net, icmp_type); // first count
      icmp_out_count(sock_net(sk), user_icmph.type); // second count

However, when the ping program uses an IPPROTO_RAW socket,
ICMP_MIB_OUTMSGS is counted correctly only once.

Therefore, the first count should be removed.

Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Signed-off-by: yuan.gao <yuan.gao@ucloud.cn>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20251224063145.3615282-1-yuan.gao@ucloud.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

net: mscc: ocelot: Fix crash when adding interface under a lag

[ Upstream commit 34f3ff52cb9fa7dbf04f5c734fcc4cb6ed5d1a95 ]

Commit 15faa1f67ab4 ("lan966x: Fix crash when adding interface under a lag")
fixed a similar issue in the lan966x driver caused by a NULL pointer dereference.
The ocelot_set_aggr_pgids() function in the ocelot driver has similar logic
and is susceptible to the same crash.

This issue specifically affects the ocelot_vsc7514.c frontend, which leaves
unused ports as NULL pointers. The felix_vsc9959.c frontend is unaffected as
it uses the DSA framework which registers all ports.

Fix this by checking if the port pointer is valid before accessing it.

Fixes: 528d3f190c98 ("net: mscc: ocelot: drop the use of the "lags" array")
Signed-off-by: Jerry Wu <w.7erry@foxmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/tencent_75EF812B305E26B0869C673DD1160866C90A@qq.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

bridge: fix C-VLAN preservation in 802.1ad vlan_tunnel egress

[ Upstream commit 3128df6be147768fe536986fbb85db1d37806a9f ]

When using an 802.1ad bridge with vlan_tunnel, the C-VLAN tag is
incorrectly stripped from frames during egress processing.

br_handle_egress_vlan_tunnel() uses skb_vlan_pop() to remove the S-VLAN
from hwaccel before VXLAN encapsulation. However, skb_vlan_pop() also
moves any "next" VLAN from the payload into hwaccel:

    /* move next vlan tag to hw accel tag */
    __skb_vlan_pop(skb, &vlan_tci);
    __vlan_hwaccel_put_tag(skb, vlan_proto, vlan_tci);

For QinQ frames where the C-VLAN sits in the payload, this moves it to
hwaccel where it gets lost during VXLAN encapsulation.

Fix by calling __vlan_hwaccel_clear_tag() directly, which clears only
the hwaccel S-VLAN and leaves the payload untouched.

This path is only taken when vlan_tunnel is enabled and tunnel_info
is configured, so 802.1Q bridges are unaffected.

Tested with 802.1ad bridge + VXLAN vlan_tunnel, verified C-VLAN
preserved in VXLAN payload via tcpdump.

Fixes: 11538d039ac6 ("bridge: vlan dst_metadata hooks in ingress and egress paths")
Signed-off-by: Alexandre Knecht <knecht.alexandre@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20251228020057.2788865-1-knecht.alexandre@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

net: marvell: prestera: fix NULL dereference on devlink_alloc() failure

[ Upstream commit a428e0da1248c353557970848994f35fd3f005e2 ]

devlink_alloc() may return NULL on allocation failure, but
prestera_devlink_alloc() unconditionally calls devlink_priv() on
the returned pointer.

This leads to a NULL pointer dereference if devlink allocation fails.
Add a check for a NULL devlink pointer and return NULL early to avoid
the crash.

Fixes: 34dd1710f5a3 ("net: marvell: prestera: Add basic devlink support")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Acked-by: Elad Nachman <enachman@marvell.com>
Link: https://patch.msgid.link/20251230052124.897012-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

netfilter: nf_conncount: update last_gc only when GC has been performed

[ Upstream commit 7811ba452402d58628e68faedf38745b3d485e3c ]

Currently last_gc is being updated everytime a new connection is
tracked, that means that it is updated even if a GC wasn't performed.
With a sufficiently high packet rate, it is possible to always bypass
the GC, causing the list to grow infinitely.

Update the last_gc value only when a GC has been actually performed.

Fixes: d265929930e2 ("netfilter: nf_conncount: reduce unnecessary GC")
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>

netfilter: nf_tables: fix memory leak in nf_tables_newrule()

[ Upstream commit d077e8119ddbb4fca67540f1a52453631a47f221 ]

In nf_tables_newrule(), if nft_use_inc() fails, the function jumps to
the err_release_rule label without freeing the allocated flow, leading
to a memory leak.

Fix this by adding a new label err_destroy_flow and jumping to it when
nft_use_inc() fails. This ensures that the flow is properly released
in this error case.

Fixes: 1689f25924ada ("netfilter: nf_tables: report use refcount overflow")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>

gpio: pca953x: handle short interrupt pulses on PCAL devices

[ Upstream commit 014a17deb41201449f76df2b20c857a9c3294a7c ]

GPIO drivers with latch input support may miss short pulses on input
pins even when input latching is enabled. The generic interrupt logic in
the pca953x driver reports interrupts by comparing the current input
value against the previously sampled one and only signals an event when
a level change is observed between two reads.

For short pulses, the first edge is captured when the input register is
read, but if the signal returns to its previous level before the read,
the second edge is not observed. As a result, successive pulses can
produce identical input values at read time and no level change is
detected, causing interrupts to be missed. Below timing diagram shows
this situation where the top signal is the input pin level and the
bottom signal indicates the latched value.

─────┐     ┌──*───────────────┐     ┌──*─────────────────┐     ┌──*───
     │     │  .               │     │  .                 │     │  .
     │     │  │               │     │  │                 │     │  │
     └──*──┘  │               └──*──┘  │                 └──*──┘  │
Input   │     │                  │     │                    │     │
        ▼     │                  ▼     │                    ▼     │
       IRQ    │                 IRQ    │                   IRQ    │
              .                        .                          .
─────┐        .┌──────────────┐        .┌────────────────┐        .┌──
     │         │              │         │                │         │
     │         │              │         │                │         │
     └────────*┘              └────────*┘                └────────*┘
Latched       │                        │                          │
              ▼                        ▼                          ▼
            READ 0                   READ 0                     READ 0
                                   NO CHANGE                  NO CHANGE

PCAL variants provide an interrupt status register that records which
pins triggered an interrupt, but the status and input registers cannot
be read atomically. The interrupt status is only cleared when the input
port is read, and the input value must also be read to determine the
triggering edge. If another interrupt occurs on a different line after
the status register has been read but before the input register is
sampled, that event will not be reflected in the earlier status
snapshot, so relying solely on the interrupt status register is also
insufficient.

Support for input latching and interrupt status handling was previously
added by [1], but the interrupt status-based logic was reverted by [2]
due to these issues. This patch addresses the original problem by
combining both sources of information. Events indicated by the interrupt
status register are merged with events detected through the existing
level-change logic. As a result:

* short pulses, whose second edges are invisible, are detected via the
  interrupt status register, and
* interrupts that occur between the status and input reads are still
  caught by the generic level-change logic.

This significantly improves robustness on devices that signal interrupts
as short pulses, while avoiding the issues that led to the earlier
reversion. In practice, even if only the first edge of a pulse is
observable, the interrupt is reliably detected.

This fixes missed interrupts from an Ilitek touch controller with its
interrupt line connected to a PCAL6416A, where active-low pulses are
approximately 200 us long.

[1] commit 44896beae605 ("gpio: pca953x: add PCAL9535 interrupt support for Galileo Gen2")
[2] commit d6179f6c6204 ("gpio: pca953x: Improve interrupt support")

Fixes: d6179f6c6204 ("gpio: pca953x: Improve interrupt support")
Signed-off-by: Ernest Van Hoecke <ernest.vanhoecke@toradex.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20251217153050.142057-1-ernestvanhoecke@gmail.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

gpiolib: fix race condition for gdev->srcu

[ Upstream commit a7ac22d53d0990152b108c3f4fe30df45fcb0181 ]

If two drivers were calling gpiochip_add_data_with_key(), one may be
traversing the srcu-protected list in gpio_name_to_desc(), meanwhile
other has just added its gdev in gpiodev_add_to_list_unlocked().
This creates a non-mutexed and non-protected timeframe, when one
instance is dereferencing and using &gdev->srcu, before the other
has initialized it, resulting in crash:

[    4.935481] Unable to handle kernel paging request at virtual address ffff800272bcc000
[    4.943396] Mem abort info:
[    4.943400]   ESR = 0x0000000096000005
[    4.943403]   EC = 0x25: DABT (current EL), IL = 32 bits
[    4.943407]   SET = 0, FnV = 0
[    4.943410]   EA = 0, S1PTW = 0
[    4.943413]   FSC = 0x05: level 1 translation fault
[    4.943416] Data abort info:
[    4.943418]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[    4.946220]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[    4.955261]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[    4.955268] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000038e6c000
[    4.961449] [ffff800272bcc000] pgd=0000000000000000
[    4.969203] , p4d=1000000039739003
[    4.979730] , pud=0000000000000000
[    4.980210] phandle (CPU): 0x0000005e, phandle (BE): 0x5e000000 for node "reset"
[    4.991736] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
...
[    5.121359] pc : __srcu_read_lock+0x44/0x98
[    5.131091] lr : gpio_name_to_desc+0x60/0x1a0
[    5.153671] sp : ffff8000833bb430
[    5.298440]
[    5.298443] Call trace:
[    5.298445]  __srcu_read_lock+0x44/0x98
[    5.309484]  gpio_name_to_desc+0x60/0x1a0
[    5.320692]  gpiochip_add_data_with_key+0x488/0xf00
    5.946419] ---[ end trace 0000000000000000 ]---

Move initialization code for gdev fields before it is added to
gpio_devices, with adjacent initialization code.
Adjust goto statements  to reflect modified order of operations

Fixes: 47d8b4c1d868 ("gpio: add SRCU infrastructure to struct gpio_device")
Reviewed-by: Jakub Lewalski <jakub.lewalski@nokia.com>
Signed-off-by: Paweł Narewski <pawel.narewski@nokia.com>
[Bartosz: fixed a build issue, removed stray newline]
Link: https://lore.kernel.org/r/20251224082641.10769-1-bartosz.golaszewski@oss.qualcomm.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

gpiolib: rename GPIO chip printk macros

[ Upstream commit d4f335b410ddbe3e99f48f8b5ea78a25041274f1 ]

The chip_$level() macros take struct gpio_chip as argument so make it
follow the convention of using the 'gpiochip_' prefix.

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Stable-dep-of: a7ac22d53d09 ("gpiolib: fix race condition for gdev->srcu")
Signed-off-by: Sasha Levin <sashal@kernel.org>

gpiolib: remove unnecessary 'out of memory' messages

[ Upstream commit 0ba6f1ed3808b1f095fbdb490006f0ecd00f52bd ]

We don't need to add additional logs when returning -ENOMEM so remove
unnecessary error messages.

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Stable-dep-of: a7ac22d53d09 ("gpiolib: fix race condition for gdev->srcu")
Signed-off-by: Sasha Levin <sashal@kernel.org>

netfilter: nft_synproxy: avoid possible data-race on update operation

[ Upstream commit 36a3200575642846a96436d503d46544533bb943 ]

During nft_synproxy eval we are reading nf_synproxy_info struct which
can be modified on update operation concurrently. As nf_synproxy_info
struct fits in 32 bits, use READ_ONCE/WRITE_ONCE annotations.

Fixes: ee394f96ad75 ("netfilter: nft_synproxy: add synproxy stateful object support")
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>

netfilter: nft_set_pipapo: fix range overlap detection

[ Upstream commit 7711f4bb4b360d9c0ff84db1c0ec91e385625047 ]

set->klen has to be used, not sizeof().  The latter only compares a
single register but a full check of the entire key is needed.

Example:
table ip t {
        map s {
                typeof iifname . ip saddr : verdict
                flags interval
        }
}

nft add element t s '{ "lo" . 10.0.0.0/24 : drop }' # no error, expected
nft add element t s '{ "lo" . 10.0.0.0/24 : drop }' # no error, expected
nft add element t s '{ "lo" . 10.0.0.0/8 : drop }' # bug: no error

The 3rd 'add element' should be rejected via -ENOTEMPTY, not -EEXIST,
so userspace / nft can report an error to the user.

The latter is only correct for the 2nd case (re-add of existing element).

As-is, userspace is told that the command was successful, but no elements were
added.

After this patch, 3rd command gives:
Error: Could not process rule: File exists
add element t s { "lo" . 127.0.0.0/8 . "lo"  : drop }
                  ^^^^^^^^^^^^^^^^^^^^^^^^^

Fixes: 0eb4b5ee33f2 ("netfilter: nft_set_pipapo: Separate partial and complete overlap cases on insertion")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>

arm64: dts: mba8mx: Fix Ethernet PHY IRQ support

[ Upstream commit 89e87d0dc87eb3654c9ae01afc4a18c1c6d1e523 ]

Ethernet PHY interrupt mode is level triggered. Adjust the mode
accordingly.

Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 70cf622bb16e ("arm64: dts: mba8mx: Add Ethernet PHY IRQ support")
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

arm64: dts: imx8qm-ss-dma: correct the dma channels of lpuart

[ Upstream commit a988caeed9d918452aa0a68de2c6e94d86aa43ba ]

The commit 616effc0272b5 ("arm64: dts: imx8: Fix lpuart DMA channel
order") swap uart rx and tx channel at common imx8-ss-dma.dtsi. But miss
update imx8qm-ss-dma.dtsi.

The commit 5a8e9b022e569 ("arm64: dts: imx8qm-ss-dma: Pass lpuart
dma-names") just simple add dma-names as binding doc requirement.

Correct lpuart0 - lpuart3 dma rx and tx channels, and use defines for
the FSL_EDMA_RX flag.

Fixes: 5a8e9b022e56 ("arm64: dts: imx8qm-ss-dma: Pass lpuart dma-names")
Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

arm64: dts: imx8mp: Fix LAN8740Ai PHY reference clock on DH electronics i.MX8M Plus DHCOM

[ Upstream commit c63749a7ddc59ac6ec0b05abfa0a21af9f2c1d38 ]

Add missing 'clocks' property to LAN8740Ai PHY node, to allow the PHY driver
to manage LAN8740Ai CLKIN reference clock supply. This fixes sporadic link
bouncing caused by interruptions on the PHY reference clock, by letting the
PHY driver manage the reference clock and assure there are no interruptions.

This follows the matching PHY driver recommendation described in commit
bedd8d78aba3 ("net: phy: smsc: LAN8710/20: add phy refclk in support")

Fixes: 8d6712695bc8 ("arm64: dts: imx8mp: Add support for DH electronics i.MX8M Plus DHCOM and PDK2")
Signed-off-by: Marek Vasut <marek.vasut@mailbox.org>
Tested-by: Christoph Niedermaier <cniedermaier@dh-electronics.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

arm64: dts: freescale: tx8p-ml81: fix eqos nvmem-cells

[ Upstream commit cdf4e631eec5ddd49bb625df9fb144d6ecdd6f15 ]

On this SoM eqos is the primary ethernet interface, Ka-Ro fuses the
address for it in eth_mac1, eth_mac2 seems to be left unfused. In their
downstream u-boot they fetch it from eth_mac1 [1][2], by setting alias
of eqos to ethernet0, the driver then fetches the mac address based on
the alias number.

Set eqos to read from eth_mac1 instead of eth_mac2. Also set fec to
point at eth_mac2 as it may be fused later even though it is disabled
by default.

With this changed barebox is now capable of loading the correct address.

Link: https://github.com/karo-electronics/karo-tx-uboot/blob/380543278410bbf04264d80a3bfbe340b8e62439/drivers/net/dwc_eth_qos.c#L1167
Link: https://github.com/karo-electronics/karo-tx-uboot/blob/380543278410bbf04264d80a3bfbe340b8e62439/arch/arm/dts/imx8mp-karo.dtsi#L12
Fixes: bac63d7c5f46 ("arm64: dts: freescale: add Ka-Ro Electronics tx8p-ml81 COM")
Signed-off-by: Maud Spierings <maudspierings@gocontroll.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

arm64: dts: freescale: moduline-display: fix compatible

[ Upstream commit 056c68875122dd342782e5956ed145fe9e059614 ]

The compatibles should include the SoM compatible, this board is based
on the Ka-Ro TX8P-ML81 SoM, so add it to allow using shared code in the
bootloader which uses upstream Linux devicetrees as a base.

Also add the hardware revision to the board compatible to handle
revision specific quirks in the bootloader/userspace.

This is a breaking change, but it is early enough that it can be
corrected without causing any issues.

Fixes: 03f07be54cdc ("arm64: dts: freescale: Add the GOcontroll Moduline Display baseboard")
Signed-off-by: Maud Spierings <maudspierings@gocontroll.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

ARM: dts: imx6q-ba16: fix RTC interrupt level

[ Upstream commit e6a4eedd49ce27c16a80506c66a04707e0ee0116 ]

RTC interrupt level should be set to "LOW". This was revealed by the
introduction of commit:

f181987ef477 ("rtc: m41t80: use IRQ flags obtained from fwnode")

which changed the way IRQ type is obtained.

Fixes: 56c27310c1b4 ("ARM: dts: imx: Add Advantech BA-16 Qseven module")
Signed-off-by: Ian Ray <ian.ray@gehealthcare.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

arm64: dts: add off-on-delay-us for usdhc2 regulator

[ Upstream commit ca643894a37a25713029b36cfe7d1bae515cac08 ]

For SD card, according to the spec requirement, for sd card power reset
operation, it need sd card supply voltage to be lower than 0.5v and keep
over 1ms, otherwise, next time power back the sd card supply voltage to
3.3v, sd card can't support SD3.0 mode again.

To match such requirement on imx8qm-mek board, add 4.8ms delay between
sd power off and power on.

Fixes: 307fd14d4b14 ("arm64: dts: imx: add imx8qm mek support")
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

arm64: dts: imx8qm-mek: correct the light sensor interrupt type to low level

[ Upstream commit e0d8678c2f09dca22e6197321f223fa9a0ca2839 ]

light sensor isl29023 share the interrupt with lsm303arg, but these
two devices use different interrupt type. According to the datasheet
of these two devides, both support low level trigger type, so correct
the interrupt type here to avoid the following error log:

irq: type mismatch, failed to map hwirq-11 for gpio@5d0c0000!

Fixes: 9918092cbb0e ("arm64: dts: imx8qm-mek: add i2c0 and children devices")
Fixes: 1d8a9f043a77 ("arm64: dts: imx8: use defines for interrupts")
Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

crypto: qat - fix duplicate restarting msg during AER error

[ Upstream commit 961ac9d97be72267255f1ed841aabf6694b17454 ]

The restarting message from PF to VF is sent twice during AER error
handling: once from adf_error_detected() and again from
adf_disable_sriov().
This causes userspace subservices to shutdown unexpectedly when they
receive a duplicate restarting message after already being restarted.

Avoid calling adf_pf2vf_notify_restarting() and
adf_pf2vf_wait_for_restarting_complete() from adf_error_detected() so
that the restarting msg is sent only once from PF to VF.

Fixes: 9567d3dc760931 ("crypto: qat - improve aer error reset handling")
Signed-off-by: Harshita Bhilwaria <harshita.bhilwaria@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Ravikumar PM <ravikumar.pm@intel.com>
Reviewed-by: Srikanth Thokala <srikanth.thokala@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>

pinctrl: mediatek: mt8189: restore previous register base name array order

[ Upstream commit fa917d3d570279dc3d699cbd947d0da0fde2e402 ]

In mt8189-pinctrl driver, a previous commit changed the register base
name array (mt8189_pinctrl_register_base_names) entry name and order to
align it with the same name and order as the "mediatek,mt8189-pinctrl"
devicetree bindings. The new order (by ascending register address) now
causes an issue with MT8189 pinctrl configuration.

MT8189 SoC has multiple base addresses for the pin configuration
registers. Several constant data structures, declaring each pin
configuration, are using PIN_FIELD_BASE() macro which i_base parameter
indicates for a given pin the lookup index in the base register address
array of the driver internal data for the configuration register
read/write accesses. But in practice, this parameter is given a
hardcoded numerical value that corresponds to the expected base
register entry index in mt8189_pinctrl_register_base_names array.
Since this array reordering, the i_base index matching is no more
correct.

So, in order to avoid modifying over a thousand of PIN_FIELD_BASE()
calls, restore previous mt8189_pinctrl_register_base_names entry order.

Fixes: 518919276c41 ("pinctrl: mediatek: mt8189: align register base names to dt-bindings ones")
Signed-off-by: Louis-Alexis Eyraud <louisalexis.eyraud@collabora.com>
Signed-off-by: Linus Walleij <linusw@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

netfs: Fix early read unlock of page with EOF in middle

[ Upstream commit 570ad253a3455a520f03c2136af8714bc780186d ]

The read result collection for buffered reads seems to run ahead of the
completion of subrequests under some circumstances, as can be seen in the
following log snippet:

    9p_client_res: client 18446612686390831168 response P9_TREAD tag  0 err 0
    ...
    netfs_sreq: R=00001b55[1] DOWN TERM  f=192 s=0 5fb2/5fb2 s=5 e=0
    ...
    netfs_collect_folio: R=00001b55 ix=00004 r=4000-5000 t=4000/5fb2
    netfs_folio: i=157f3 ix=00004-00004 read-done
    netfs_folio: i=157f3 ix=00004-00004 read-unlock
    netfs_collect_folio: R=00001b55 ix=00005 r=5000-5fb2 t=5000/5fb2
    netfs_folio: i=157f3 ix=00005-00005 read-done
    netfs_folio: i=157f3 ix=00005-00005 read-unlock
    ...
    netfs_collect_stream: R=00001b55[0:] cto=5fb2 frn=ffffffff
    netfs_collect_state: R=00001b55 col=5fb2 cln=6000 n=c
    netfs_collect_stream: R=00001b55[0:] cto=5fb2 frn=ffffffff
    netfs_collect_state: R=00001b55 col=5fb2 cln=6000 n=8
    ...
    netfs_sreq: R=00001b55[2] ZERO SUBMT f=000 s=5fb2 0/4e s=0 e=0
    netfs_sreq: R=00001b55[2] ZERO TERM  f=102 s=5fb2 4e/4e s=5 e=0

The 'cto=5fb2' indicates the collected file pos we've collected results to
so far - but we still have 0x4e more bytes to go - so we shouldn't have
collected folio ix=00005 yet.  The 'ZERO' subreq that clears the tail
happens after we unlock the folio, allowing the application to see the
uncleared tail through mmap.

The problem is that netfs_read_unlock_folios() will unlock a folio in which
the amount of read results collected hits EOF position - but the ZERO
subreq lies beyond that and so happens after.

Fix this by changing the end check to always be the end of the folio and
never the end of the file.

In the future, I should look at clearing to the end of the folio here rather
than adding a ZERO subreq to do this.  On the other hand, the ZERO subreq can
run in parallel with an async READ subreq.  Further, the ZERO subreq may still
be necessary to, say, handle extents in a ceph file that don't have any
backing store and are thus implicitly all zeros.

This can be reproduced by creating a file, the size of which doesn't align
to a page boundary, e.g. 24998 (0x5fb2) bytes and then doing something
like:

    xfs_io -c "mmap -r 0 0x6000" -c "madvise -d 0 0x6000" \
           -c "mread -v 0 0x6000" /xfstest.test/x

The last 0x4e bytes should all be 00, but if the tail hasn't been cleared
yet, you may see rubbish there.  This can be reproduced with kafs by
modifying the kernel to disable the call to netfs_read_subreq_progress()
and to stop afs_issue_read() from doing the async call for NETFS_READAHEAD.
Reproduction can be made easier by inserting an mdelay(100) in
netfs_issue_read() for the ZERO-subreq case.

AFS and CIFS are normally unlikely to show this as they dispatch READ ops
asynchronously, which allows the ZERO-subreq to finish first.  9P's READ op is
completely synchronous, so the ZERO-subreq will always happen after.  It isn't
seen all the time, though, because the collection may be done in a worker
thread.

Reported-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Link: https://lore.kernel.org/r/8622834.T7Z3S40VBb@weasel/
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://patch.msgid.link/938162.1766233900@warthog.procyon.org.uk
Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item")
Tested-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Acked-by: Dominique Martinet <asmadeus@codewreck.org>
Suggested-by: Dominique Martinet <asmadeus@codewreck.org>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: v9fs@lists.linux.dev
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

HID: Intel-thc-hid: Intel-thc: Fix wrong register reading

[ Upstream commit f39006965dd37e7be823dba6ca484adccc7a4dff ]

Correct the read register for the setting of max input size and
interrupt delay.

Fixes: 22da60f0304b ("HID: Intel-thc-hid: Intel-thc: Introduce interrupt delay control")
Fixes: 45e92a093099 ("HID: Intel-thc-hid: Intel-thc: Introduce max input size control")
Signed-off-by: Even Xu <even.xu@intel.com>
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

HID: Intel-thc-hid: Intel-thc: fix dma_unmap_sg() nents value

[ Upstream commit 0e13150c1a13a3a3d6184c24bfd080d5999945d1 ]

The `dma_unmap_sg()` functions should be called with the same nents as the
`dma_map_sg()`, not the value the map function returned.

Save the number of entries in struct thc_dma_configuration.

Fixes: a688404b2e20 ("HID: intel-thc-hid: intel-thc: Add THC DMA interfaces")
Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com>
Reviewed-by: Even Xu <even.xu@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

gpio: it87: balance superio enter/exit calls in error path

[ Upstream commit a05543d6b05ba998fdbb4b383319ae5121bb7407 ]

We always call superio_enter() in it87_gpio_direction_out() but only
call superio_exit() if the call to it87_gpio_set() succeeds. Move the
label to balance the calls in error path as well.

Fixes: ef877a159072 ("gpio: it87: use new line value setter callbacks")
Reported-by: Daniel Gibson <daniel@gibson.sh>
Closes: https://lore.kernel.org/all/bd0a00e3-9b8c-43e8-8772-e67b91f4c71e@gibson.sh/
Link: https://lore.kernel.org/r/20251210055026.23146-1-bartosz.golaszewski@oss.qualcomm.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

gpu: nova-core: select RUST_FW_LOADER_ABSTRACTIONS

[ Upstream commit 3d3352e73a55a4ccf110f8b3419bbe2fbfd8a030 ]

RUST_FW_LOADER_ABSTRACTIONS was depended on by NOVA_CORE, but NOVA_CORE
is selected by DRM_NOVA. This creates a situation where, if DRM_NOVA is
selected, NOVA_CORE gets enabled but not RUST_FW_LOADER_ABSTRACTIONS,
which results in a build error.

Since the firmware loader is an implementation detail of the driver, it
should be enabled along with it, so change the "depends on" to a
"select".

Fixes: 54e6baf123fd ("gpu: nova-core: add initial driver stub")
Closes: https://lore.kernel.org/oe-kbuild-all/202512061721.rxKGnt5q-lkp@intel.com/
Tested-by: Alyssa Ross <hi@alyssa.is>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20251106-b4-select-rust-fw-v3-2-771172257755@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

arm64: dts: ti: k3-am62-lp-sk-nand: Rename pinctrls to fix schema warnings

[ Upstream commit cf5e8adebe77917a4cc95e43e461cdbd857591ce ]

Rename pinctrl nodes to comply with naming conventions required by
pinctrl-single schema.

Fixes: e569152274fec ("arm64: dts: ti: am62-lp-sk: Add overlay for NAND expansion card")
Signed-off-by: Wadim Egorov <w.egorov@phytec.de>
Link: https://patch.msgid.link/20251127122733.2523367-3-w.egorov@phytec.de
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

arm64: dts: ti: k3-am642-phyboard-electra-x27-gpio1-spi1-uart3: Fix schema warnings

[ Upstream commit d876bb9353d87dee0ae620300106e8def189c785 ]

Rename pinctrl nodes to comply with naming conventions required by
pinctrl-single schema. Also, replace invalid integer assignment in
SPI node with a boolean to align with omap-spi schema.

Fixes: 638ab30ce4c6 ("arm64: dts: ti: am64-phyboard-electra: Add DT overlay for X27 connector")
Signed-off-by: Wadim Egorov <w.egorov@phytec.de>
Link: https://patch.msgid.link/20251127122733.2523367-2-w.egorov@phytec.de
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

arm64: dts: ti: k3-am642-phyboard-electra-peb-c-010: Fix icssg-prueth schema warning

[ Upstream commit 05bbe52d0be5637dcd3c880348e3688f7ec64eb7 ]

Reduce length of dma-names and dmas properties for icssg1-ethernet
node to comply with ti,icssg-prueth schema constraints. The previous
entries exceeded the allowed count and triggered dtschema warnings
during validation.

Fixes: e53fbf955ea7 ("arm64: dts: ti: k3-am642-phyboard-electra: Add PEB-C-010 Overlay")
Signed-off-by: Wadim Egorov <w.egorov@phytec.de>
Link: https://patch.msgid.link/20251127122733.2523367-1-w.egorov@phytec.de
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

of: unittest: Fix memory leak in unittest_data_add()

[ Upstream commit 235a1eb8d2dcc49a6cf0a5ee1aa85544a5d0054b ]

In unittest_data_add(), if of_resolve_phandles() fails, the allocated
unittest_data is not freed, leading to a memory leak.

Fix this by using scope-based cleanup helper __free(kfree) for automatic
resource cleanup. This ensures unittest_data is automatically freed when
it goes out of scope in error paths.

For the success path, use retain_and_null_ptr() to transfer ownership
of the memory to the device tree and prevent double freeing.

Fixes: 2eb46da2a760 ("of/selftest: Use the resolver to fixup phandles")
Suggested-by: Rob Herring <robh@kernel.org>
Co-developed-by: Jianhao Xu <jianhao.xu@seu.edu.cn>
Signed-off-by: Jianhao Xu <jianhao.xu@seu.edu.cn>
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Link: https://patch.msgid.link/20251231114915.234638-1-zilin@seu.edu.cn
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

btrfs: fix use-after-free warning in btrfs_get_or_create_delayed_node()

[ Upstream commit 83f59076a1ae6f5c6845d6f7ed3a1a373d883684 ]

Previously, btrfs_get_or_create_delayed_node() set the delayed_node's
refcount before acquiring the root->delayed_nodes lock.
Commit e8513c012de7 ("btrfs: implement ref_tracker for delayed_nodes")
moved refcount_set inside the critical section, which means there is
no longer a memory barrier between setting the refcount and setting
btrfs_inode->delayed_node.

Without that barrier, the stores to node->refs and
btrfs_inode->delayed_node may become visible out of order. Another
thread can then read btrfs_inode->delayed_node and attempt to
increment a refcount that hasn't been set yet, leading to a
refcounting bug and a use-after-free warning.

The fix is to move refcount_set back to where it was to take
advantage of the implicit memory barrier provided by lock
acquisition.

Because the allocations now happen outside of the lock's critical
section, they can use GFP_NOFS instead of GFP_ATOMIC.

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202511262228.6dda231e-lkp@intel.com
Fixes: e8513c012de7 ("btrfs: implement ref_tracker for delayed_nodes")
Tested-by: kernel test robot <oliver.sang@intel.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Leo Martins <loemra.dev@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

btrfs: fix NULL dereference on root when tracing inode eviction

[ Upstream commit f157dd661339fc6f5f2b574fe2429c43bd309534 ]

When evicting an inode the first thing we do is to setup tracing for it,
which implies fetching the root's id. But in btrfs_evict_inode() the
root might be NULL, as implied in the next check that we do in
btrfs_evict_inode().

Hence, we either should set the ->root_objectid to 0 in case the root is
NULL, or we move tracing setup after checking that the root is not
NULL. Setting the rootid to 0 at least gives us the possibility to trace
this call even in the case when the root is NULL, so that's the solution
taken here.

Fixes: 1abe9b8a138c ("Btrfs: add initial tracepoint support for btrfs")
Reported-by: syzbot+d991fea1b4b23b1f6bf8@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d991fea1b4b23b1f6bf8
Signed-off-by: Miquel Sabaté Solà <mssola@mssola.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

btrfs: qgroup: update all parent qgroups when doing quick inherit

[ Upstream commit 68d4b3fa18d72b7f649e83012e7e08f1881f6b75 ]

[BUG]
There is a bug that if a subvolume has multi-level parent qgroups, and
is able to do a quick inherit, only the direct parent qgroup got
updated:

  mkfs.btrfs  -f -O quota $dev
  mount $dev $mnt
  btrfs subv create $mnt/subv1
  btrfs qgroup create 1/100 $mnt
  btrfs qgroup create 2/100 $mnt
  btrfs qgroup assign 1/100 2/100 $mnt
  btrfs qgroup assign 0/256 1/100 $mnt
  btrfs qgroup show -p --sync $mnt

  Qgroupid    Referenced    Exclusive Parent     Path
  --------    ----------    --------- ------     ----
  0/5           16.00KiB     16.00KiB -          <toplevel>
  0/256         16.00KiB     16.00KiB 1/100      subv1
  1/100         16.00KiB     16.00KiB 2/100      2/100<1 member qgroup>
  2/100         16.00KiB     16.00KiB -          <0 member qgroups>

  btrfs subv snap -i 1/100 $mnt/subv1 $mnt/snap1
  btrfs qgroup show -p --sync $mnt

  Qgroupid    Referenced    Exclusive Parent     Path
  --------    ----------    --------- ------     ----
  0/5           16.00KiB     16.00KiB -          <toplevel>
  0/256         16.00KiB     16.00KiB 1/100      subv1
  0/257         16.00KiB     16.00KiB 1/100      snap1
  1/100         32.00KiB     32.00KiB 2/100      2/100<1 member qgroup>
  2/100         16.00KiB     16.00KiB -          <0 member qgroups>
  # Note that 2/100 is not updated, and qgroup numbers are inconsistent

  umount $mnt

[CAUSE]
If the snapshot source subvolume belongs to a parent qgroup, and the new
snapshot target is also added to the new same parent qgroup, we allow a
quick update without marking qgroup inconsistent.

But that quick update only update the parent qgroup, without checking if
there is any more parent qgroups.

[FIX]
Iterate through all parent qgroups during the quick inherit.

Reported-by: Boris Burkov <boris@bur.io>
Fixes: b20fe56cd285 ("btrfs: qgroup: allow quick inherit if snapshot is created and added to the same parent")
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

btrfs: fix qgroup_snapshot_quick_inherit() squota bug

[ Upstream commit 7ee19a59a75e3d5b9ec00499b86af8e2a46fbe86 ]

qgroup_snapshot_quick_inherit() detects conditions where the snapshot
destination would land in the same parent qgroup as the snapshot source
subvolume. In this case we can avoid costly qgroup calculations and just
add the nodesize of the new snapshot to the parent.

However, in the case of squotas this is actually a double count, and
also an undercount for deeper qgroup nestings.

The following annotated script shows the issue:

  btrfs quota enable --simple "$mnt"

  # Create 2-level qgroup hierarchy
  btrfs qgroup create 2/100 "$mnt"  # Q2 (level 2)
  btrfs qgroup create 1/100 "$mnt"  # Q1 (level 1)
  btrfs qgroup assign 1/100 2/100 "$mnt"

  # Create base subvolume
  btrfs subvolume create "$mnt/base" >/dev/null
  base_id=$(btrfs subvolume show "$mnt/base" | grep 'Subvolume ID:' | awk '{print $3}')

  # Create intermediate snapshot and add to Q1
  btrfs subvolume snapshot "$mnt/base" "$mnt/intermediate" >/dev/null
  inter_id=$(btrfs subvolume show "$mnt/intermediate" | grep 'Subvolume ID:' | awk '{print $3}')
  btrfs qgroup assign "0/$inter_id" 1/100 "$mnt"

  # Create working snapshot with --inherit (auto-adds to Q1)
  # src=intermediate (in only Q1)
  # dst=snap (inheriting only into Q1)
  # This double counts the 16k nodesize of the snapshot in Q1, and
  # undercounts it in Q2.
  btrfs subvolume snapshot -i 1/100 "$mnt/intermediate" "$mnt/snap" >/dev/null
  snap_id=$(btrfs subvolume show "$mnt/snap" | grep 'Subvolume ID:' | awk '{print $3}')

  # Fully complete snapshot creation
  sync

  # Delete working snapshot
  # Q1 and Q2 will lose the full snap usage
  btrfs subvolume delete "$mnt/snap" >/dev/null

  # Delete intermediate and remove from Q1
  # Q1 and Q2 will lose the full intermediate usage
  btrfs qgroup remove "0/$inter_id" 1/100 "$mnt"
  btrfs subvolume delete "$mnt/intermediate" >/dev/null

  # Q1 should be at 0, but still has 16k. Q2 is "correct" at 0 (for now...)

  # Trigger cleaner, wait for deletions
  mount -o remount,sync=1 "$mnt"
  btrfs subvolume sync "$mnt" "$snap_id"
  btrfs subvolume sync "$mnt" "$inter_id"

  # Remove Q1 from Q2
  # Frees 16k more from Q2, underflowing it to 16EiB
  btrfs qgroup remove 1/100 2/100 "$mnt"

  # And show the bad state:
  btrfs qgroup show -pc "$mnt"

        Qgroupid    Referenced    Exclusive Parent   Child   Path
        --------    ----------    --------- ------   -----   ----
        0/5           16.00KiB     16.00KiB -        -       <toplevel>
        0/256         16.00KiB     16.00KiB -        -       base
        1/100         16.00KiB     16.00KiB -        -       <0 member qgroups>
        2/100         16.00EiB     16.00EiB -        -       <0 member qgroups>

Fix this by simply not doing this quick inheritance with squotas.

I suspect that it is also wrong in normal qgroups to not recurse up the
qgroup tree in the quick inherit case, though other consistency checks
will likely fix it anyway.

Fixes: b20fe56cd285 ("btrfs: qgroup: allow quick inherit if snapshot is created and added to the same parent")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

scsi: Revert "scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failed"

[ Upstream commit 278712d20bc8ec29d1ad6ef9bdae9000ef2c220c ]

This reverts commit ab2068a6fb84751836a84c26ca72b3beb349619d.

When probing the exp-attached sata device, libsas/libata will issue a
hard reset in sas_probe_sata() -> ata_sas_async_probe(), then a
broadcast event will be received after the disk probe fails, and this
commit causes the probe will be re-executed on the disk, and a faulty
disk may get into an indefinite loop of probe.

Therefore, revert this commit, although it can fix some temporary issues
with disk probe failure.

Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Link: https://patch.msgid.link/20251202065627.140361-1-yangxingui@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

scsi: ufs: core: Fix EH failure after W-LUN resume error

[ Upstream commit b4bb6daf4ac4d4560044ecdd81e93aa2f6acbb06 ]

When a W-LUN resume fails, its parent devices in the SCSI hierarchy,
including the scsi_target, may be runtime suspended. Subsequently, the
error handler in ufshcd_recover_pm_error() fails to set the W-LUN device
back to active because the parent target is not active.  This results in
the following errors:

  google-ufshcd 3c2d0000.ufs: ufshcd_err_handler started; HBA state eh_fatal; ...
  ufs_device_wlun 0:0:0:49488: START_STOP failed for power mode: 1, result 40000
  ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume failed: -5
  ...
  ufs_device_wlun 0:0:0:49488: runtime PM trying to activate child device 0:0:0:49488 but parent (target0:0:0) is not active

Address this by:

1. Ensuring the W-LUN's parent scsi_target is runtime resumed before
    attempting to set the W-LUN to active within
    ufshcd_recover_pm_error().

2. Explicitly checking for power.runtime_error on the HBA and W-LUN
    devices before calling pm_runtime_set_active() to clear the error
    state.

3. Adding pm_runtime_get_sync(hba->dev) in
    ufshcd_err_handling_prepare() to ensure the HBA itself is active
    during error recovery, even if a child device resume failed.

These changes ensure the device power states are managed correctly
during error recovery.

Signed-off-by: Brian Kao <powenkao@google.com>
Tested-by: Brian Kao <powenkao@google.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251112063214.1195761-1-powenkao@google.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

scsi: ipr: Enable/disable IRQD_NO_BALANCING during reset

[ Upstream commit 6ac3484fb13b2fc7f31cfc7f56093e7d0ce646a5 ]

A dynamic remove/add storage adapter test hits EEH on PowerPC:

  EEH: [c00000000004f75c] __eeh_send_failure_event+0x7c/0x160
  EEH: [c000000000048444] eeh_dev_check_failure.part.0+0x254/0x650
  EEH: [c008000001650678] eeh_readl+0x60/0x90 [ipr]
  EEH: [c00800000166746c] ipr_cancel_op+0x2b8/0x524 [ipr]
  EEH: [c008000001656524] ipr_eh_abort+0x6c/0x130 [ipr]
  EEH: [c000000000ab0d20] scmd_eh_abort_handler+0x140/0x440
  EEH: [c00000000017e558] process_one_work+0x298/0x590
  EEH: [c00000000017eef8] worker_thread+0xa8/0x620
  EEH: [c00000000018be34] kthread+0x124/0x130
  EEH: [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64

A PCIe bus trace reveals that a vector of MSI-X is cleared to 0 by
irqbalance daemon. If we disable irqbalance daemon, we won't see the
issue.

With debug enabled in ipr driver:

  [   44.103071] ipr: Entering __ipr_remove
  [   44.103083] ipr: Entering ipr_initiate_ioa_bringdown
  [   44.103091] ipr: Entering ipr_reset_shutdown_ioa
  [   44.103099] ipr: Leaving ipr_reset_shutdown_ioa
  [   44.103105] ipr: Leaving ipr_initiate_ioa_bringdown
  [   44.149918] ipr: Entering ipr_reset_ucode_download
  [   44.149935] ipr: Entering ipr_reset_alert
  [   44.150032] ipr: Entering ipr_reset_start_timer
  [   44.150038] ipr: Leaving ipr_reset_alert
  [   44.244343] scsi 1:2:3:0: alua: Detached
  [   44.254300] ipr: Entering ipr_reset_start_bist
  [   44.254320] ipr: Entering ipr_reset_start_timer
  [   44.254325] ipr: Leaving ipr_reset_start_bist
  [   44.364329] scsi 1:2:4:0: alua: Detached
  [   45.134341] scsi 1:2:5:0: alua: Detached
  [   45.860949] ipr: Entering ipr_reset_shutdown_ioa
  [   45.860962] ipr: Leaving ipr_reset_shutdown_ioa
  [   45.860966] ipr: Entering ipr_reset_alert
  [   45.861028] ipr: Entering ipr_reset_start_timer
  [   45.861035] ipr: Leaving ipr_reset_alert
  [   45.964302] ipr: Entering ipr_reset_start_bist
  [   45.964309] ipr: Entering ipr_reset_start_timer
  [   45.964313] ipr: Leaving ipr_reset_start_bist
  [   46.264301] ipr: Entering ipr_reset_bist_done
  [   46.264309] ipr: Leaving ipr_reset_bist_done

During adapter reset, ipr device driver blocks config space access but
can't block MMIO access for MSI-X entries.  There is very small window:
irqbalance daemon kicks in during adapter reset before ipr driver calls
pci_restore_state(pdev) to restore MSI-X table.

irqbalance daemon reads back all 0 for that MSI-X vector in
__pci_read_msi_msg().

irqbalance daemon:

  msi_domain_set_affinity()
  ->irq_chip_set_affinity_patent()
  ->xive_irq_set_affinity()
  ->irq_chip_compose_msi_msg()
    ->pseries_msi_compose_msg()
    ->__pci_read_msi_msg(): read all 0 since didn't call pci_restore_state
  ->irq_chip_write_msi_msg()
    -> pci_write_msg_msi(): write 0 to the msix vector entry

When ipr driver calls pci_restore_state(pdev) in
ipr_reset_restore_cfg_space(), the MSI-X vector entry has been cleared
by irqbalance daemon in pci_write_msg_msix().

  pci_restore_state()
  ->__pci_restore_msix_state()

Below is the MSI-X table for ipr adapter after irqbalance daemon kicked
in during adapter reset:

  Dump MSIx table: index=0 address_lo=c800 address_hi=10000000 msg_data=0
  Dump MSIx table: index=1 address_lo=c810 address_hi=10000000 msg_data=0
  Dump MSIx table: index=2 address_lo=c820 address_hi=10000000 msg_data=0
  Dump MSIx table: index=3 address_lo=c830 address_hi=10000000 msg_data=0
  Dump MSIx table: index=4 address_lo=c840 address_hi=10000000 msg_data=0
  Dump MSIx table: index=5 address_lo=c850 address_hi=10000000 msg_data=0
  Dump MSIx table: index=6 address_lo=c860 address_hi=10000000 msg_data=0
  Dump MSIx table: index=7 address_lo=c870 address_hi=10000000 msg_data=0
  Dump MSIx table: index=8 address_lo=0 address_hi=0 msg_data=0
  ---------> Hit EEH since msix vector of index=8 are 0
  Dump MSIx table: index=9 address_lo=c890 address_hi=10000000 msg_data=0
  Dump MSIx table: index=10 address_lo=c8a0 address_hi=10000000 msg_data=0
  Dump MSIx table: index=11 address_lo=c8b0 address_hi=10000000 msg_data=0
  Dump MSIx table: index=12 address_lo=c8c0 address_hi=10000000 msg_data=0
  Dump MSIx table: index=13 address_lo=c8d0 address_hi=10000000 msg_data=0
  Dump MSIx table: index=14 address_lo=c8e0 address_hi=10000000 msg_data=0
  Dump MSIx table: index=15 address_lo=c8f0 address_hi=10000000 msg_data=0

  [   46.264312] ipr: Entering ipr_reset_restore_cfg_space
  [   46.267439] ipr: Entering ipr_fail_all_ops
  [   46.267447] ipr: Leaving ipr_fail_all_ops
  [   46.267451] ipr: Leaving ipr_reset_restore_cfg_space
  [   46.267454] ipr: Entering ipr_ioa_bringdown_done
  [   46.267458] ipr: Leaving ipr_ioa_bringdown_done
  [   46.267467] ipr: Entering ipr_worker_thread
  [   46.267470] ipr: Leaving ipr_worker_thread

IRQ balancing is not required during adapter reset.

Enable "IRQ_NO_BALANCING" flag before starting adapter reset and disable
it after calling pci_restore_state(). The irqbalance daemon is disabled
for this short period of time (~2s).

Co-developed-by: Kyle Mahlkuch <Kyle.Mahlkuch@ibm.com>
Signed-off-by: Kyle Mahlkuch <Kyle.Mahlkuch@ibm.com>
Signed-off-by: Wen Xiong <wenxiong@linux.ibm.com>
Link: https://patch.msgid.link/20251028142427.3969819-2-wenxiong@linux.ibm.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

scsi: mpi3mr: Prevent duplicate SAS/SATA device entries in channel 1

[ Upstream commit 4588e65cfd66fc8bbd9969ea730db39b60a36a30 ]

Avoid scanning SAS/SATA devices in channel 1 when SAS transport is
enabled, as the SAS/SATA devices are exposed through channel 0.

Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/stable/20251120071955.463475-1-suganath-prabu.subramani%40broadcom.com
Link: https://patch.msgid.link/20251120071955.463475-1-suganath-prabu.subramani@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

smb/client: fix NT_STATUS_NO_DATA_DETECTED value

[ Upstream commit a1237c203f1757480dc2f3b930608ee00072d3cc ]

This was reported by the KUnit tests in the later patches.

See MS-ERREF 2.3.1 STATUS_NO_DATA_DETECTED. Keep it consistent with the
value in the documentation.

Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

smb/client: fix NT_STATUS_DEVICE_DOOR_OPEN value

[ Upstream commit b2b50fca34da5ec231008edba798ddf92986bd7f ]

This was reported by the KUnit tests in the later patches.

See MS-ERREF 2.3.1 STATUS_DEVICE_DOOR_OPEN. Keep it consistent with the
value in the documentation.

Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

smb/client: fix NT_STATUS_UNABLE_TO_FREE_VM value

[ Upstream commit 9f99caa8950a76f560a90074e3a4b93cfa8b3d84 ]

This was reported by the KUnit tests in the later patches.

See MS-ERREF 2.3.1 STATUS_UNABLE_TO_FREE_VM. Keep it consistent with the
value in the documentation.

Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>