git.ipfire.org Git - thirdparty/linux.git/log

netfilter: bridge: eb_tables: close module init race

sashiko reports for unrelated patch:
Does the core ebtables initialization in ebtables.c suffer from a similar race?
Once nf_register_sockopt() completes, the sockopts are exposed globally.

sockopt has to be registered last, just like in ip/ip6/arptables.

Fixes: 5b53951cfc85 ("netfilter: ebtables: use net_generic infra")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: x_tables: close dangling table module init race

Similar to the previous ebtables patch:
template add exposes the table to userspace, we must do this last to
rnsure the pernet ops are set up (contain the destructors).

Fixes: fdacd57c79b7 ("netfilter: x_tables: never register tables by default")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: ebtables: close dangling table module init race

sashiko reported for a related patch:
In modules like iptable_raw.c, [..], if register_pernet_subsys() fails,
the rollback might call kfree(rawtable_ops) before [..]
During this window, could a concurrent userspace process find the globally
visible template, trigger table_init(), [..]

The table init functions must always register the template last.

Otherwise, set/getsockopt can instantiate a table in a namespace
while the required pernet ops (contain the destructor) isn't available.
This change is also required in x_tables, handled in followup change.

Fixes: 87663c39f898 ("netfilter: ebtables: do not hook tables by default")
Reviewed-by: Tristan Madani <tristan@talencesecurity.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: ebtables: move to two-stage removal scheme

Like previous patches for x_tables, follow same pattern in ebtables.
We can't reuse xt helpers: ebt_table struct layout is incompatible.

table->ops assignment is now done while still holding the ebt mutex
to make sure we never expose partially-filled table struct.

Fixes: 87663c39f898 ("netfilter: ebtables: do not hook tables by default")
Reviewed-by: Tristan Madani <tristan@talencesecurity.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: x_tables: add and use xtables_unregister_table_exit

Previous change added xtables_unregister_table_pre_exit to detach the
table from the packetpath and to unlink it from the active table list.
In case of rmmod, userspace that is doing set/getsockopt for this table
will not be able to re-instantiate the table:
1. The larval table has been removed already
2. existing instantiated table is no longer on the xt pernet table list.

This adds the second stage helper:

unlink the table from the dying list, free the hook ops (if any) and do
the audit notification. It replaces xt_unregister_table().

Fixes: fdacd57c79b7 ("netfilter: x_tables: never register tables by default")
Reported-by: Tristan Madani <tristan@talencesecurity.com>
Reviewed-by: Tristan Madani <tristan@talencesecurity.com>
Closes: https://lore.kernel.org/netfilter-devel/20260429175613.1459342-1-tristmd@gmail.com/
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: x_tables: unregister the templates first

When the module is going away we need to zap the template
first. Else there is a small race window where userspace
could instantiate a new table after the pernet exit function
has removed the current table.

Fixes: fdacd57c79b7 ("netfilter: x_tables: never register tables by default")
Reported-by: Tristan Madani <tristan@talencesecurity.com>
Reviewed-by: Tristan Madani <tristan@talencesecurity.com>
Closes: https://lore.kernel.org/netfilter-devel/20260429175613.1459342-1-tristmd@gmail.com/
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: x_tables: add and use xt_unregister_table_pre_exit

Remove the copypasted variants of _pre_exit and add one single
function in the xtables core. ebtables is not compatible with
x_tables and therefore unchanged.

This is a preparation patch to reduce noise in the followup
bug fixes.

Reviewed-by: Tristan Madani <tristan@talencesecurity.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: x_tables: allocate hook ops while under mutex

arp/ip(6)t_register_table() add the table to the per-netns list via
xt_register_table() before allocating the per-netns hook ops copy
via kmemdup_array().  This leaves a window where the table is
visible in the list with ops=NULL.

If the pernet exit happens runs concurrently the pre_exit callback finds
the table via xt_find_table() and passes the NULL ops pointer to
nf_unregister_net_hooks(), causing a NULL dereference:

  general protection fault in nf_unregister_net_hooks+0xbc/0x150
  RIP: nf_unregister_net_hooks (net/netfilter/core.c:613)
  Call Trace:
    ipt_unregister_table_pre_exit
    iptable_mangle_net_pre_exit
    ops_pre_exit_list
    cleanup_net

Fix by moving the ops allocation into the xtables core so the table is
never in the list without valid ops.  Also ensure the table is no longer
processing packets before its torn down on error unwind.
nf_register_net_hooks might have published at least one hook; call
synchronize_rcu() if there was an error.

audit log register message gets deferred until all operations have
passed, this avoids need to emit another ureg message in case of
error unwinding.

Based on earlier patch by Tristan Madani.

Fixes: f9006acc8dfe5 ("netfilter: arp_tables: pass table pointer via nf_hook_ops")
Fixes: ee177a54413a ("netfilter: ip6_tables: pass table pointer via nf_hook_ops")
Fixes: ae689334225f ("netfilter: ip_tables: pass table pointer via nf_hook_ops")
Link: https://lore.kernel.org/netfilter-devel/20260429175613.1459342-1-tristmd@gmail.com/
Signed-off-by: Tristan Madani <tristan@talencesecurity.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: x_tables: allow initial table replace without emitting audit log message

At the moment we emit the audit log a bit too early, which makes it
necessary to also emit an unregister log in case we have to unwind
errors after possible hook register failure.

Followup patch will be slightly simpler if we can delay the
register message until after the hooks have been wired up.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

hwmon: (ads7871) Fix endianness bug in 16-bit register reads

The ads7871_read_reg16() function relies on spi_w8r16() to read the
16-bit sensor output. The ADS7871 device transmits the Least Significant
Byte (LSB) first.

On Little-Endian architectures, spi_w8r16() correctly reconstructs the
16-bit value. However, on Big-Endian architectures, the byte swapping
causes the first received byte (LSB) to be placed in the most significant
byte of the u16, resulting in corrupted voltage readings.

To fix this, cast the integer result of spi_w8r16() to a restricted
__le16 type and convert it to the host CPU's native byte order using
le16_to_cpu(). Negative error codes returned by the SPI core are caught
and returned prior to the conversion to avoid mangling the error status.

Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://sashiko.dev/#/patchset/20260418034601.90226-1-tabreztalks@gmail.com
Fixes: e0c70b8078629 ("hwmon: add TI ads7871 a/d converter driver")
Suggested-by: David Laight <david.laight.linux@gmail.com>
Signed-off-by: Tabrez Ahmed <tabreztalks@gmail.com>
Link: https://lore.kernel.org/r/20260502020844.110038-2-tabreztalks@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

hwmon: (lm75) Fix configuration register writes.

Sensors configurations are defined by set and clear masks. These
do not follow a consistent "clear mask is a superset of set mask"
rule. This relaxed definition breaks lm75_write_config()

static inline int lm75_write_config(struct lm75_data *data, u16 set_mask,
u16 clr_mask)
{
return regmap_update_bits(data->regmap, LM75_REG_CONF,
clr_mask | LM75_SHUTDOWN, set_mask);
}

Basically all bits from set_mask that are not defined in clr_mask are
dropped. Fix that by enhancing the helper to always combine clr_mask
and set_mask into the mask bits of regmap_update_bits().

Fixes: 6da24a25f766 ("hwmon: (lm75) Hide register size differences in regmap access functions")
Suggested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Markus Stockhausen <markus.stockhausen@gmx.de>
Link: https://lore.kernel.org/r/20260502173207.3567876-3-markus.stockhausen@gmx.de
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

hwmon: (lm75) Fix AS6200 and TMP112 setup and alarm handling

The initialization of the AS6200 has two shortcomings

- The device-add-commit states "Conversion mode: continuous" but the
  the lm75_params structure uses set_mask = 0x94c0. This activates
  single shot mode (bit 15). According to the datasheet "The device
  features a single shot measurement mode if the device is in sleep
  mode (SM=1)". This is quite contradictionary.
- It is the only device that activates polarity active-high (bit 10)

All this is paired with a undefined clear mask bug in function
lm75_write_config() that was introduced with a later refactoring
commit.

[as6200] = {
.config_reg_16bits = true,
.set_mask = 0x94C0,
        -> .clr_mask not defined here
.default_resolution = 12,
...
static inline int lm75_write_config(struct lm75_data *data, u16 set_mask,
    u16 clr_mask)
{
return regmap_update_bits(data->regmap, LM75_REG_CONF,
  clr_mask | LM75_SHUTDOWN, set_mask);
}

regmap_update_bits() requires clr_mask to be a superset of set_mask.
So basically all sensors with "wrong" masks like the AS6200 are not
initialized as intended.

Fix that by

- Change the set_mask to 0xc010 to reflect the current active-low
  setup properly and to drive the sensor in continous mode. This
  takes into account that the config register is little endian and
  the first byte sent to the chip is the LSB.
- Adapt the alarm handling so it can report the alarm correctly
  even if it is high active. This is done by comparing config register
  bit 5 and 10 (translated to 2 and 13).

This commit does not introduce any ABI breakage as the mutliple bugs
effectly drive the AS6200 in standard active-low mode.

Fixes: 4b6358e1fe46 ("hwmon: (lm75) Add AMS AS6200 temperature sensor")
Suggested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Markus Stockhausen <markus.stockhausen@gmx.de>
Link: https://lore.kernel.org/r/20260502173207.3567876-2-markus.stockhausen@gmx.de
[groeck: Update set_mask for as6200 further: As modeled, the upper bits
contain the conversion rate, so the config register needs to be set to
0xc010 instead of 0x10c0 to reflect 8 samples/s and 4 consecutive faults.
Fix the same problem for TMP112.]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

Merge tag 'drm-xe-fixes-2026-05-07' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

UAPI Changes:

Cross-subsystem Changes:

Core Changes:

Driver Changes:
- Add NULL check for media_gt in intel_hdcp_gsc_check_status (Gustavo)
- Fix EAGAIN sign in pf_migration_consume (Shuicheng)
- Fix MMIO access using PF view instead of VF view during migration (Shuicheng)
- Exclude indirect ring state page from ADS engine state size (Satya)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/afw5lsrjE4pStEml@gsse-cloud1.jf.intel.com

Merge tag 'drm-rust-fixes-2026-05-07' of https://gitlab.freedesktop.org/drm/rust/kernel into drm-fixes

DRM Rust fixes for v7.1-rc3

- Fix unsound initialization in drm::Device::new(); if pinned
  initialization of drm::Device::Data fails, make sure
  drm::Device::release() isn't called, so we don't run the data's
  destructor

- Fix missing GEM state cleanup in the init failure case; call
  drm_gem_private_object_fini() if drm_gem_object_init() fails

- Fix wrong ARef import in the DRM shmem GEM helper abstraction

- Replace the nouveau mailing list with the new nova-gpu mailing list
  for both nova-core and nova-drm, and remove unused patchwork entries

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: "Danilo Krummrich" <dakr@kernel.org>
Link: https://patch.msgid.link/DIBZJ40ZC4J3.Y1DLA7JTS2PC@kernel.org

btrfs: fix incorrect i_size after remount caused by KEEP_SIZE prealloc gap

When fallocate() with FALLOC_FL_KEEP_SIZE preallocates an extent past the
current i_size, the file_extent_tree of the inode is updated to cover
that range. However, on the next mount, btrfs_read_locked_inode() only
re-populates file_extent_tree with [0, round_up(i_size, sectorsize)),
losing the marks that belonged to the KEEP_SIZE prealloc extent beyond
i_size.

Later, when a non-KEEP_SIZE fallocate() extends i_size into / past that
old prealloc extent, the reservation loop in btrfs_fallocate() skips
already-prealloc segments and does not call into the path that marks the
file_extent_tree, so a gap remains inside the file_extent_tree across
[old_aligned_i_size, start_of_new_alloc). Then __btrfs_prealloc_file_range()
calls btrfs_inode_safe_disk_i_size_write(), which uses
find_contiguous_extent_bit() starting at offset 0 to derive disk_i_size.
The walk stops at the gap, so disk_i_size ends up smaller than i_size and
gets persisted. After the next mount, the file shows the wrong (smaller)
size.

The following reproducer triggers the problem:

  $ cat test.sh
  MNT=/mnt/sdi
  DEV=/dev/sdi

  mkdir -p $MNT
  mkfs.btrfs -f -O ^no-holes $DEV
  mount $DEV $MNT

  touch $MNT/file1
  # KEEP_SIZE prealloc beyond i_size (i_size stays 0)
  fallocate -n -o 4M -l 4M $MNT/file1
  umount $MNT
  mount $DEV $MNT

  # non-KEEP_SIZE fallocate that overlaps the previous prealloc tail
  # and extends past it
  fallocate -o 7M -l 2M $MNT/file1
  ls -lh $MNT/file1
  umount $MNT
  mount $DEV $MNT
  ls -lh $MNT/file1
  umount $MNT

Running the reproducer gives the following result:

  $ ./test.sh
  (...)
  -rw-rw-r-- 1 root root 9.0M May  4 16:35 /mnt/sdi/file1
  -rw-rw-r-- 1 root root 7.0M May  4 16:35 /mnt/sdi/file1

The size before the second mount is correct (9M), but after the
remount it drops to 7M, i.e. the start of the gap inside file_extent_tree.

Fix this in __btrfs_prealloc_file_range() by marking the entire range
[round_down(old_i_size, sectorsize), round_up(new_i_size, sectorsize))
in file_extent_tree before updating i_size and calling
btrfs_inode_safe_disk_i_size_write(). This ensures the contiguous bit
search starting from 0 is not truncated by a stale gap left behind by a
previous KEEP_SIZE prealloc that was not restored on inode load.

The fix has no effect when the NO_HOLES feature is enabled because
btrfs_inode_safe_disk_i_size_write() and
btrfs_inode_set_file_extent_range()
both take the fast path that directly tracks disk_i_size without
consulting file_extent_tree.

Fixes: 9ddc959e802b ("btrfs: use the file extent tree infrastructure")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Robbie Ko <robbieko@synology.com>
[ Minor updates to the change log ]
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: only release the dirty pages io tree after successful writes

[WARNING]
With extra warning on dirty extent buffers at umount (aka, the next
patch in the series), test case generic/388 can trigger the following
warning about dirty extent buffers at unmount time:

  BTRFS critical (device dm-2 state E): emergency shutdown
  BTRFS error (device dm-2 state E): error while writing out transaction: -30
  BTRFS warning (device dm-2 state E): Skipping commit of aborted transaction.
  BTRFS error (device dm-2 state EA): Transaction 9 aborted (error -30)
  BTRFS: error (device dm-2 state EA) in cleanup_transaction:2068: errno=-30 Readonly filesystem
  BTRFS info (device dm-2 state EA): forced readonly
  BTRFS info (device dm-2 state EA): last unmount of filesystem 4fbf2e15-f941-49a0-bc7c-716315d2777c
  ------------[ cut here ]------------
  WARNING: disk-io.c:3311 at invalidate_and_check_btree_folios+0xfd/0x1ca [btrfs], CPU#8: umount/914368
  CPU: 8 UID: 0 PID: 914368 Comm: umount Tainted: G           OE       7.1.0-rc1-custom+ #372 PREEMPT(full)  2de38db8d1deae71fde295430a0ff3ab98ccf596
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
  RIP: 0010:invalidate_and_check_btree_folios+0xfd/0x1ca [btrfs]
  Call Trace:
   <TASK>
   close_ctree+0x52e/0x574 [btrfs d2f0b1cd330d1287e7a9919d112eadfc0e914efd]
   generic_shutdown_super+0x89/0x1a0
   kill_anon_super+0x16/0x40
   btrfs_kill_super+0x16/0x20 [btrfs d2f0b1cd330d1287e7a9919d112eadfc0e914efd]
   deactivate_locked_super+0x2d/0xb0
   cleanup_mnt+0xdc/0x140
   task_work_run+0x5a/0xa0
   exit_to_user_mode_loop+0x123/0x4b0
   do_syscall_64+0x243/0x7c0
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
   </TASK>
  ---[ end trace 0000000000000000 ]---
  BTRFS warning (device dm-2 state EA): unable to release extent buffer 30539776 owner 9 gen 9 refs 2 flags 0x7
  BTRFS warning (device dm-2 state EA): unable to release extent buffer 30621696 owner 257 gen 9 refs 2 flags 0x7
  BTRFS warning (device dm-2 state EA): unable to release extent buffer 30638080 owner 258 gen 9 refs 2 flags 0x7
  BTRFS warning (device dm-2 state EA): unable to release extent buffer 30654464 owner 7 gen 9 refs 2 flags 0x7
  BTRFS warning (device dm-2 state EA): unable to release extent buffer 30703616 owner 2 gen 9 refs 2 flags 0x7
  BTRFS warning (device dm-2 state EA): unable to release extent buffer 30720000 owner 10 gen 9 refs 2 flags 0x7
  BTRFS warning (device dm-2 state EA): unable to release extent buffer 30736384 owner 4 gen 9 refs 2 flags 0x7
  BTRFS warning (device dm-2 state EA): unable to release extent buffer 30752768 owner 11 gen 9 refs 2 flags 0x7

I'm using a stripped down version, which seems to trigger the warning
more reliably:

  _fsstress_pid=""
  workload()
  {
   dmesg -C
   mkfs.btrfs -f -K $dev > /dev/null
   echo 1 > /sys/kernel/debug/clear_warn_once
   mount $dev $mnt
   $fsstress -w -n 1024 -p 4 -d $mnt &
   _fsstress_pid=$!
   sleep 0
   $godown $mnt
   pkill --echo -PIPE fsstress > /dev/null
   wait $_fsstress_pid
   unset _fsstress_pid
   umount $mnt

   if dmesg | grep -q "WARNING"; then
   fail
   fi
  }

  for (( i = 0; i < $runtime; i++ )); do
   echo "=== $i/$runtime ==="
   workload
  done

[CAUSE]
Inside btrfs_write_and_wait_transaction(), we first try to write all
dirty ebs, then wait for them to finish.

After that we call btrfs_extent_io_tree_release() to free all
extent states from dirty_pages io tree.

However if we hit an error from btrfs_write_marked_extent(), then we
still call btrfs_extent_io_tree_release() to clear that dirty_pages io
tree, which may contain dirty records that we haven't yet submitted.

Furthermore, the later transaction cleanup path will utilize that
dirty_pages io tree to properly cleanup those dirty ebs, but since it's
already empty, no dirty ebs are properly cleaned up, thus will later
trigger the warnings inside invalidate_btree_folios().

[FIX]
Normally such dirty ebs won't cause problems, as when the iput() is
called on the btree inode, the dirty ebs will be forcibly written back,
and since the fs is already in an error status, such writeback will not
reach disk and finish immediately.

But it's still better to get rid of such dirty ebs, if we ended up with
dirty ebs but the fs is not in an error status, then such writeback at
iput() time will be too late, as all workers are already stopped but
writeback will utilize workers, which will lead to NULL pointer
dereferences.

Instead of unconditionally calling btrfs_extent_io_tree_release(), only
call it if btrfs_write_and_wait_transaction() finished successfully, so
that @dirty_pages extent io tree is kept untouched for transaction
cleanup.

CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: tracepoints: fix sleep while in atomic context in btrfs_sync_file()

The trace event btrfs_sync_file() is called in an atomic context (all trace
events are) and its call to dput(), which is needed due to the call to
dget_parent(), can sleep, triggering a kernel splat.

This can be reproduced by enabling the trace event and running btrfs/056
from fstests for example. The splat shown in dmesg is the following:

  [53.919] BUG: sleeping function called from invalid context at fs/dcache.c:970
  [53.947] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 32773, name: xfs_io
  [53.988] preempt_count: 2, expected: 0
  [53.967] RCU nest depth: 0, expected: 0
  [53.943] Preemption disabled at:
  [53.944] [<0000000000000000>] 0x0
  [54.078] CPU: 0 UID: 0 PID: 32773 Comm: xfs_io Tainted: G        W           7.1.0-rc1-btrfs-next-232+ #1 PREEMPT(full)
  [54.070] Tainted: [W]=WARN
  [54.071] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
  [54.072] Call Trace:
  [54.074]  <TASK>
  [54.076]  dump_stack_lvl+0x56/0x80
  [54.079]  __might_resched.cold+0xd6/0x10f
  [54.072]  dput.part.0+0x24/0x110
  [54.078]  trace_event_raw_event_btrfs_sync_file+0x75/0x140 [btrfs]
  [54.089]  btrfs_sync_file+0x1ed/0x530 [btrfs]
  [54.087]  ? __handle_mm_fault+0x8ae/0xed0
  [54.089]  btrfs_do_write_iter+0x172/0x210 [btrfs]
  [54.091]  vfs_write+0x21f/0x450
  [54.094]  __x64_sys_pwrite64+0x8d/0xc0
  [54.096]  ? do_user_addr_fault+0x20c/0x670
  [54.099]  do_syscall_64+0x60/0xf20
  [54.092]  ? clear_bhb_loop+0x60/0xb0
  [54.094]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

So stop using dget_parent() and dput() and access the parent dentry
directly as dentry->d_parent. This is also what ext4 is doing in
its equivalent trace event ext4_sync_file_enter().

Fixes: a85b46db143f ("btrfs: tracepoints: get correct superblock from dentry in event btrfs_sync_file()")
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: always pass __GFP_NOWARN from add_ra_bio_pages()

A build workload newly prints order-0 allocation failures on 7.1-rc1:

    sh: page allocation failure: order:0
    mode:0x14084a(__GFP_HIGHMEM|__GFP_MOVABLE|__GFP_IO|__GFP_KSWAPD_RECLAIM|
                  __GFP_COMP|__GFP_HARDWALL)
    CPU: 27 UID: 1000 PID: 855540 Comm: sh Not tainted 7.1.0-rc1-llvm-00058-gdca922e019dd #1 PREEMPTLAZY
    Call Trace:
     <TASK>
     dump_stack_lvl+0x50/0x70
     warn_alloc+0xeb/0x100
     __alloc_pages_slowpath+0x567/0x5a0
     ? filemap_get_entry+0x11a/0x140
     __alloc_frozen_pages_noprof+0x249/0x2d0
     alloc_pages_mpol+0xe4/0x180
     folio_alloc_noprof+0x80/0xa0
     add_ra_bio_pages+0x13c/0x4b0
     btrfs_submit_compressed_read+0x229/0x300
     submit_one_bio+0x9e/0xe0
     btrfs_readahead+0x185/0x1a0
     [...]

    (lldb) source list -a add_ra_bio_pages+0x13c
    .../vmlinux.unstripped add_ra_bio_pages + 316 at .../fs/btrfs/compression.c:454:8
       451
       452                  folio = filemap_alloc_folio(mapping_gfp_constraint(mapping, constraint_gfp),
       453                                              0, NULL);
    -> 454                  if (!folio)
       455                          break;

I can reproduce this consistently by running a memory hog concurrently
with a buffered writer on a machine with a very large amount of swap.

Commit 7ae37b2c94ed ("btrfs: prevent direct reclaim during compressed
readahead") clearly intended to suppress these warnings. But because the
mask set in the address_space with mapping_set_gfp_mask() doesn't include
__GFP_NOWARN, mapping_gfp_constraint() removes it from constraint_gfp
before it is passed to filemap_alloc_folio().

Fix by refactoring the code to add __GFP_NOWARN after the call to
mapping_gfp_constraint().

Fixes: 7ae37b2c94ed ("btrfs: prevent direct reclaim during compressed readahead")
Signed-off-by: Calvin Owens <calvin@wbinvd.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: fix check_chunk_block_group_mappings() to iterate all chunk maps

[BUG]
A corrupted image with a chunk present in the chunk tree but whose
corresponding block group item is missing from the extent tree can be
mounted successfully, even though check_chunk_block_group_mappings()
is supposed to catch exactly this corruption at mount time.  Once
mounted, running btrfs balance with a usage filter (-dusage=N or
-dusage=min..max) triggers a null-ptr-deref:

  KASAN: null-ptr-deref in range [0x0000000000000070-0x0000000000000077]
    RIP: 0010:chunk_usage_filter fs/btrfs/volumes.c:3874 [inline]
    RIP: 0010:should_balance_chunk fs/btrfs/volumes.c:4018 [inline]
    RIP: 0010:__btrfs_balance fs/btrfs/volumes.c:4172 [inline]
    RIP: 0010:btrfs_balance+0x2024/0x42b0 fs/btrfs/volumes.c:4604

[CAUSE]
The crash occurs because __btrfs_balance() iterates the on-disk chunk
tree, finds the orphaned chunk, calls chunk_usage_filter() (or
chunk_usage_range_filter()), which queries the in-memory block group
cache via btrfs_lookup_block_group().  Since no block group was ever
inserted for this chunk, the lookup returns NULL, and the subsequent
dereference of cache->used crashes.

check_chunk_block_group_mappings() uses btrfs_find_chunk_map() to
iterate the in-memory chunk map (fs_info->mapping_tree):

  map = btrfs_find_chunk_map(fs_info, start, 1);

With @start = 0 and @length = 1, btrfs_find_chunk_map() looks for a
chunk map that *contains* the logical address 0. If no chunk contains
logical address 0, btrfs_find_chunk_map(fs_info, 0, 1) returns NULL
immediately and the loop breaks after the very first iteration,
having checked zero chunks. The entire verification function is therefore
a no-op, and the corrupted image passes the mount-time check undetected.

[FIX]
Replace the btrfs_find_chunk_map() based loop with a direct in-order
walk of fs_info->mapping_tree using rb_first_cached() + rb_next().
This guarantees that every chunk map in the tree is visited regardless
of the logical addresses involved.

No lock is taken around the traversal. This function is called during
mount from btrfs_read_block_groups(), which is invoked from open_ctree()
before any background threads (cleaner, transaction kthread, etc.) are
started. There are therefore no concurrent writers that could modify
mapping_tree at this point. An analogous lockless direct traversal of
mapping_tree already exists in fill_dummy_bgs() in the same file.

Since we walk the rb-tree directly via rb_entry() without going through
btrfs_find_chunk_map(), no reference is taken on each map entry, so the
btrfs_free_chunk_map() calls are also removed.

Signed-off-by: ZhengYuan Huang <gality369@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Merge tag 'drm-intel-fixes-2026-05-06' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

- Re-enable ccs modifiers on dg2 (Juha-Pekka Heikkila)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tursulin@igalia.com>
Link: https://patch.msgid.link/aftSjG1D0-hKISDy@linux

sched_ext: Drop unused scx_find_sub_sched() stub

scx_find_sub_sched()'s only caller, scx_bpf_sub_dispatch(), is gated on
CONFIG_EXT_SUB_SCHED. When CONFIG_EXT_SUB_SCHED=n the caller compiles out
and the stub becomes dead code, tripping -Wunused-function on randconfigs.
Drop the stub.

Fixes: 25037af712eb ("sched_ext: Add rhashtable lookup for sub-schedulers")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/all/202605080556.42PXw8U9-lkp@intel.com/
Signed-off-by: Tejun Heo <tj@kernel.org>

hfs/hfsplus: zero-initialize buffer in hfs_bnode_read

hfs_bnode_read() can return early without writing to the output buffer
when is_bnode_offset_valid() fails or when check_and_correct_requested_
length() corrects the length to zero. Callers such as hfs_bnode_read_
u16() and hfs_bnode_read_u8() pass stack-allocated buffers and use the
result unconditionally, leading to KMSAN uninit-value reports.

Rather than initializing at each individual call site, zero the buffer
at the start of hfs_bnode_read() before any validation checks. This
ensures all callers in both hfs and hfsplus get a deterministic zero
value regardless of which early-return path is taken.

Reported-by: syzbot+217eb327242d08197efb@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=217eb327242d08197efb
Tested-by: syzbot+217eb327242d08197efb@syzkaller.appspotmail.com
Fixes: a431930c9bac ("hfs: fix slab-out-of-bounds in hfs_bnode_read()")
Cc: stable@vger.kernel.org
Signed-off-by: Tristan Madani <tristan@talencesecurity.com>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Link: https://lore.kernel.org/r/20260505111300.3592757-3-tristmd@gmail.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>

hfs/hfsplus: fix u32 overflow in check_and_correct_requested_length

check_and_correct_requested_length() compares (off + len) against
node_size using u32 arithmetic. When the caller passes a large len
value (e.g. from an underflowed subtraction in hfs_brec_remove()),
off + len can wrap past 2^32 and produce a small result, causing the
bounds check to pass when it should fail.

For example, with off=14 and len=0xFFFFFFF2 (underflowed from
data_off - keyoffset - size in hfs_brec_remove), off + len wraps to 6,
which is less than a typical node_size of 512, so the check passes and
the subsequent memmove reads ~4GB past the node buffer.

Fix this by widening the addition to u64 before comparing against
node_size. This prevents the u32 wrap while keeping the logic
straightforward.

Reported-by: syzbot+6df204b70bf3261691c5@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=6df204b70bf3261691c5
Tested-by: syzbot+6df204b70bf3261691c5@syzkaller.appspotmail.com
Reported-by: syzbot+e76bf3d19b85350571ac@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=e76bf3d19b85350571ac
Tested-by: syzbot+e76bf3d19b85350571ac@syzkaller.appspotmail.com
Fixes: a431930c9bac ("hfs: fix slab-out-of-bounds in hfs_bnode_read()")
Cc: stable@vger.kernel.org
Signed-off-by: Tristan Madani <tristan@talencesecurity.com>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Link: https://lore.kernel.org/r/20260505111300.3592757-2-tristmd@gmail.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>

cgroup/cpuset: move PF_EXITING check before __GFP_HARDWALL in cpuset_current_node_allowed()

Since prepare_alloc_pages() unconditionally adds __GFP_HARDWALL for the
fast path when cpusets are enabled, the __GFP_HARDWALL check in
cpuset_current_node_allowed() causes the PF_EXITING escape path to be
skipped on the first allocation attempt. This makes it unreachable in
the common case, so dying tasks can get stuck in direct reclaim or even
trigger OOM while trying to exit, despite being allowed to allocate from
any node.

Move the PF_EXITING check before __GFP_HARDWALL so that dying tasks
can allocate memory from any node to exit quickly, even when cpusets
are enabled.

Also update the function comment to reflect the actual behavior of
prepare_alloc_pages() and the corrected check ordering.

Signed-off-by: Chen Wandun <chenwandun@lixiang.com>
Acked-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

umh: replace use of system_unbound_wq with system_dfl_wq

This patch continues the effort to refactor workqueue APIs, which has begun
with the changes introducing new workqueues and a new alloc_workqueue flag:

   commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
   commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")

The point of the refactoring is to eventually alter the default behavior of
workqueues to become unbound by default so that their workload placement is
optimized by the scheduler.

Before that to happen, workqueue users must be converted to the better named
new workqueues with no intended behaviour changes:

   system_wq -> system_percpu_wq
   system_unbound_wq -> system_dfl_wq

This way the old obsolete workqueues (system_wq, system_unbound_wq) can be
removed in the future.

Cc: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

rapidio: rio: add WQ_PERCPU to alloc_workqueue users

This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:

commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")

The refactoring is going to alter the default behavior of
alloc_workqueue() to be unbound by default.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU. For more details see the Link tag below.

In order to keep alloc_workqueue() behavior identical, explicitly request
WQ_PERCPU.

Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Alexandre Bounine <alex.bou9@gmail.com>
Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

media: ddbridge: add WQ_PERCPU to alloc_workqueue users

This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:

commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")

The refactoring is going to alter the default behavior of
alloc_workqueue() to be unbound by default.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU. For more details see the Link tag below.

In order to keep alloc_workqueue() behavior identical, explicitly request
WQ_PERCPU.

Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

platform: cznic: turris-omnia-mcu: replace use of system_wq with system_percpu_wq

Currently if a user enqueues a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.

This lack of consistency cannot be addressed without refactoring the API.

This patch continues the effort to refactor worqueue APIs, which has begun
with the change introducing new workqueues and a new alloc_workqueue flag:

commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")

Replace system_wq with system_percpu_wq, keeping the same behavior.
The old wq (system_wq) will be kept for a few release cycles.

Cc: Marek Behun <kabel@kernel.org>
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Marek Behún <kabel@kernel.org>

media: synopsys: hdmirx: replace use of system_unbound_wq with system_dfl_wq

This patch continues the effort to refactor workqueue APIs, which has begun
with the changes introducing new workqueues and a new alloc_workqueue flag:

   commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
   commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")

The point of the refactoring is to eventually alter the default behavior of
workqueues to become unbound by default so that their workload placement is
optimized by the scheduler.

Before that to happen, workqueue users must be converted to the better named
new workqueues with no intended behaviour changes:

   system_wq -> system_percpu_wq
   system_unbound_wq -> system_dfl_wq

This way the old obsolete workqueues (system_wq, system_unbound_wq) can be
removed in the future.

Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>

virt: acrn: Add WQ_PERCPU to alloc_workqueue users

This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:

commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")

The refactoring is going to alter the default behavior of
alloc_workqueue() to be unbound by default.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU. For more details see the Link tag below.

In order to keep alloc_workqueue() behavior identical, explicitly request
WQ_PERCPU.

Cc: Fei Li <fei1.li@intel.com>
Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

accel/amdxdna: Add AIE4 work buffer initialization

NPU firmware requires a host-allocated work buffer for hardware contexts.
Allocate a 4 MB host buffer and attach it to device during device init.

Refactor aie2_alloc_msg_buffer() and aie2_free_msg_buffer() into common
helpers by moving them to aie.c and renaming them to
amdxdna_alloc_msg_buffer() and amdxdna_free_msg_buffer(), allowing both
AIE2 and AIE4 to reuse the implementation.

Signed-off-by: Nishad Saraf <nishads@amd.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://lore.kernel.org/all/20260505160936.3917732-7-lizhi.hou@amd.com/

accel/amdxdna: Add AIE4 metadata query support

Add support for querying device metadata on AIE4 via a mailbox message.
Refactor aie2_get_aie_metadata() into a common helper by moving it to
aie.c and renaming it to amdxdna_get_metadata(), allowing both AIE2
and AIE4 to reuse the implementation.

Co-developed-by: Hayden Laccabue <Hayden.Laccabue@amd.com>
Signed-off-by: Hayden Laccabue <Hayden.Laccabue@amd.com>
Signed-off-by: David Zhang <yidong.zhang@amd.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260506161642.141257-1-lizhi.hou@amd.com

accel/amdxdna: Add command doorbell and wait support

Expose the command doorbell register to userspace on a per-hardware
context basis, enabling applications to notify the firmware of pending
commands via doorbell writes.

Introduce DRM_IOCTL_AMDXDNA_WAIT_CMD to allow userspace to wait for
completion of individual commands.

Co-developed-by: Hayden Laccabue <Hayden.Laccabue@amd.com>
Signed-off-by: Hayden Laccabue <Hayden.Laccabue@amd.com>
Signed-off-by: David Zhang <yidong.zhang@amd.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260505160936.3917732-5-lizhi.hou@amd.com

accel/amdxdna: Add AIE4 VF hardware context create and destroy

Implement hardware context creation and destruction for AIE4 VF devices.

Co-developed-by: Hayden Laccabue <Hayden.Laccabue@amd.com>
Signed-off-by: Hayden Laccabue <Hayden.Laccabue@amd.com>
Signed-off-by: David Zhang <yidong.zhang@amd.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260505160936.3917732-4-lizhi.hou@amd.com

accel/amdxdna: Init AIE4 device partition

Send partition creation command to firmware during VF initialization.

Co-developed-by: Hayden Laccabue <Hayden.Laccabue@amd.com>
Signed-off-by: Hayden Laccabue <Hayden.Laccabue@amd.com>
Signed-off-by: David Zhang <yidong.zhang@amd.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260505160936.3917732-3-lizhi.hou@amd.com

accel/amdxdna: Add initial support for AIE4 VF

Add basic device initialization support for AIE4 Virtual Functions (PCI
device IDs 0x17F3 and 0x1B0C).

Co-developed-by: Hayden Laccabue <Hayden.Laccabue@amd.com>
Signed-off-by: Hayden Laccabue <Hayden.Laccabue@amd.com>
Signed-off-by: David Zhang <yidong.zhang@amd.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260505160936.3917732-2-lizhi.hou@amd.com

accel/amdxdna: Fix clflush buffer size

The firmware is told the buffer is req.buf_size bytes. It may read/write
the entire region. If the CPU only flushes a subset, the remaining cache
lines could contain stale data, causing the device to see garbage.

Fixes: 6e87001fe19f ("accel/amdxdna: Adjust size for copy_to_user()")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260507040207.178111-1-lizhi.hou@amd.com

sched_ext: Move scx_error() out of scx_link_sched()'s lock region

scx_link_sched() holds scx_sched_lock. The scx_error() calls inside take the
same lock through scx_claim_exit() and deadlock. Move them out of the guard.

Fixes: 6b4576b09714 ("sched_ext: Reject sub-sched attachment to a disabled parent")
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>

arm64: dts: qcom: agatti: add higher OPP levels

Add additional OPP entries for the Agatti platform to support higher
operating frequencies as specified in the hardware documentation.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Dikshita Agarwal <dikshita.agarwal@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260507-iris-ar50lt-v1-16-d22cccedc3e2@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>

smb: client: validate dacloffset before building DACL pointers

parse_sec_desc(), build_sec_desc(), and the chown path in
id_mode_to_cifs_acl() all add the server-supplied dacloffset to pntsd
before proving a DACL header fits inside the returned security
descriptor.

On 32-bit builds a malicious server can return dacloffset near
U32_MAX, wrap the derived DACL pointer below end_of_acl, and then slip
past the later pointer-based bounds checks. build_sec_desc() and
id_mode_to_cifs_acl() can then dereference DACL fields from the wrapped
pointer in the chmod/chown rewrite paths.

Validate dacloffset numerically before building any DACL pointer and
reuse the same helper at the three DACL entry points.

Fixes: bc3e9dd9d104 ("cifs: Change SIDs in ACEs while transferring file ownership.")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb/client: fix out-of-bounds read in smb2_compound_op()

If a server sends a truncated response but a large OutputBufferLength, and
terminates the EA list early, check_wsl_eas() returns success without
validating that the entire OutputBufferLength fits within iov_len.

Then smb2_compound_op() does:
memcpy(idata->wsl.eas, data[0], size[0]);

Where size[0] is OutputBufferLength. If iov_len is smaller than size[0],
memcpy can read beyond the end of the rsp_iov allocation and leak adjacent
kernel heap memory.

Link: https://lore.kernel.org/linux-cifs/d998240c-aca9-420d-9dbd-f5ba24af19e0@chenxiaosong.com/
Fixes: ea41367b2a60 ("smb: client: introduce SMB2_OP_QUERY_WSL_EA")
Cc: stable@vger.kernel.org
Signed-off-by: Zisen Ye <zisenye@stu.xidian.edu.cn>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb/client: fix out-of-bounds read in symlink_data()

Since smb2_check_message() returns success without length validation for
the symlink error response, in symlink_data() it is possible for
iov->iov_len to be smaller than sizeof(struct smb2_err_rsp). If the buffer
only contains the base SMB2 header (64 bytes), accessing
err->ErrorContextCount (at offset 66) or err->ByteCount later in
symlink_data() will cause an out-of-bounds read.

Link: https://lore.kernel.org/linux-cifs/297d8d9b-adf7-42fd-a1c2-5b1f230032bc@chenxiaosong.com/
Fixes: 76894f3e2f71 ("cifs: improve symlink handling for smb2+")
Cc: Stable@vger.kernel.org
Signed-off-by: Zisen Ye <zisenye@stu.xidian.edu.cn>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb: client: Zero-pad short GSS session keys per MS-SMB2

Per MS-SMB2 section 3.2.5.3, Session.SessionKey is the first 16 bytes
of the GSS cryptographic key, right-padded with zero bytes if the key
is shorter than 16 bytes.

SMB2_auth_kerberos() copies the GSS session key from the cifs.upcall
response using kmemdup(msg->data, msg->sesskey_len, ...) and stores
the GSS-reported length verbatim in ses->auth_key.len. generate_key()
reads SMB2_NTLMV2_SESSKEY_SIZE bytes from this buffer when feeding the
HMAC-SHA256 KDF for signing key derivation. If a GSS mechanism returns
a session key shorter than 16 bytes (e.g. a deprecated single-DES
Kerberos enctype with an 8-byte session key), the KDF call performs an
out-of-bounds slab read and derives keys that do not match the server,
which pads per the spec.

Modern KDCs disable short-key enctypes by default, so this is latent
rather than reachable in production, but it is still a kernel heap
over-read.

Allocate auth_key.response with kzalloc() at a length of
max(msg->sesskey_len, SMB2_NTLMV2_SESSKEY_SIZE), copy the GSS key in,
and rely on kzalloc()'s zero initialization for the spec-mandated
padding. Set ses->auth_key.len to the padded length. Larger GSS keys
(e.g. the 32-byte aes256-cts-hmac-sha1-96 session key) continue to be
stored at their natural length, preserving the FullSessionKey path.

Emit a cifs_dbg(VFS, ...) message when a short key is encountered to
surface deprecated-enctype usage.

NTLMv2 and NTLMSSP code paths produce a 16-byte session key by
construction and are unaffected.

Signed-off-by: Piyush Sachdeva <psachdeva@microsoft.com>
Signed-off-by: Piyush Sachdeva <s.piyush1024@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb: client: Use FullSessionKey for AES-256 encryption key derivation

When Kerberos authentication is used with AES-256 encryption (AES-256-CCM
or AES-256-GCM), the SMB3 encryption and decryption keys must be derived
using the full session key (Session.FullSessionKey) rather than just the
first 16 bytes (Session.SessionKey).

Per MS-SMB2 section 3.2.5.3.1, when Connection.Dialect is "3.1.1" and
Connection.CipherId is AES-256-CCM or AES-256-GCM, Session.FullSessionKey
must be set to the full cryptographic key from the GSS authentication
context. The encryption and decryption key derivation (SMBC2SCipherKey,
SMBS2CCipherKey) must use this FullSessionKey as the KDF input. The
signing key derivation continues to use Session.SessionKey (first 16
bytes) in all cases.

Previously, generate_key() hardcoded SMB2_NTLMV2_SESSKEY_SIZE (16) as the
HMAC-SHA256 key input length for all derivations. When Kerberos with
AES-256 provides a 32-byte session key, the KDF for encryption/decryption
was using only the first 16 bytes, producing keys that did not match the
server's, causing mount failures with sec=krb5 and require_gcm_256=1.

Add a full_key_size parameter to generate_key() and pass the appropriate
size from generate_smb3signingkey():
- Signing: always SMB2_NTLMV2_SESSKEY_SIZE (16 bytes)
- Encryption/Decryption: ses->auth_key.len when AES-256, otherwise 16

Also fix cifs_dump_full_key() to report the actual session key length for
AES-256 instead of hardcoded CIFS_SESS_KEY_SIZE, so that userspace tools
like Wireshark receive the correct key for decryption.

Cc: <stable@vger.kernel.org>
Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Piyush Sachdeva <psachdeva@microsoft.com>
Signed-off-by: Piyush Sachdeva <s.piyush1024@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

drm/dp/mst: fix buffer overflows in sideband chunk accumulation

drm_dp_sideband_append_payload() has three related bugs when processing
device-provided sideband reply data:

1. Zero-length curchunk_len underflow: msg_len is a 6-bit field taken
   directly from the DP sideband header. If a device sends msg_len=0,
   curchunk_len is set to zero. The condition (curchunk_idx >= curchunk_len)
   is immediately true, and curchunk_len-1 wraps to 255 (u8 underflow).
   drm_dp_msg_data_crc4() reads 255 bytes from chunk[48], then memcpy()
   writes 255 bytes into msg[], both far out of bounds.

2. chunk[48] overflow: curchunk_len can reach 63 (6-bit field). chunk[] is
   only 48 bytes. Multi-iteration payload assembly appends 16-byte blocks
   until curchunk_idx reaches curchunk_len, writing up to 15 bytes past
   the end of chunk[] into msg[].

3. msg[256] overflow: each chunk contributes (curchunk_len-1) bytes to
   msg[]. No check ensures curlen + (curchunk_len-1) stays within msg[256],
   so the memcpy can spill into adjacent struct fields.

All three are reachable from any DP MST device that can forge sideband
reply messages on a physical connection.

Fixes: ad7f8a1f9ced ("drm/helper: add Displayport multi-stream helper (v0.6)")
Cc: <stable@vger.kernel.org> # v3.17+
Signed-off-by: Ashutosh Desai <ashutoshdesai993@gmail.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patch.msgid.link/20260410041901.2438960-1-ashutoshdesai993@gmail.com

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-7.1-rc3).

Conflicts:

net/ipv4/igmp.c
  726fa7da2d8c ("ipv4: igmp: get rid of IGMPV3_{QQIC,MRC} and simplify calculation")
  c6bebaa744f7 ("ipv4: igmp: annotate data-races in igmp_heard_query()")
https://lore.kernel.org/a7365e4873340f7a5e30411207de3bf9@kernel.org

Adjacent changes:

net/psp/psp_main.c
  30cb24f97d44 ("psp: strip variable-length PSP header in psp_dev_rcv()")
  c2b22277ad89 ("psp: validate IPv4 header fields in psp_dev_rcv()")

net/sched/sch_fq_codel.c
  f83e07b29246 ("net/sched: sch_fq_codel: annotate data-races from fq_codel_dump_class_stats()")
  3f3aa77ff1c8 ("net/sched: add qstats_cpu_drop_inc() helper")

net/wireless/pmsr.c
  0f3c0a197309 ("wifi: nl80211: fix NL80211_PMSR_FTM_REQ_ATTR_FTMS_PER_BURST usage")
  410aa47fd9d3 ("wifi: cfg80211: allow suppressing FTM result reporting for PD requests")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

drm: Drop HPAGE_PMD_SIZE dependency in dma_iova_try_alloc calls

The phys argument to dma_iova_try_alloc() is used only to compute
the sub-granule offset (phys & (granule - 1)). Since HPAGE_PMD_SIZE
is a power of two larger than any IOMMU granule, this expression
always evaluates to zero.

Replace the ternary expressions with a plain 0, which is what the
API documentation recommends for callers doing PAGE_SIZE-aligned
transfers. This also removes the dependency on HPAGE_PMD_SIZE and
thus on CONFIG_PGTABLE_HAS_HUGE_LEAVES / HAVE_ARCH_TRANSPARENT_HUGEPAGE,
fixing build failures on architectures such as arm32 that lack that
config.

Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Assisted-by: GitHub Copilot:claude-sonnet-4.6 # Documentation
Link: https://lore.kernel.org/intel-xe/c36e7dfb-cf62-4d21-a3b1-f54cb43e0832@infradead.org/
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260507091606.1067973-1-francois.dugast@intel.com
Fixes: 2e0cd372b897 ("drm/pagemap: Use dma-map IOVA alloc, link, and sync API for DRM pagemap")
Fixes: 37ad039fb367 ("drm/gpusvm: Use dma-map IOVA alloc, link, and sync API in GPU SVM")
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Merge tag 'net-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from Netfilter, IPsec, Bluetooth and WiFi.

  Current release - fix to a fix:

   - ipmr: add __rcu to netns_ipv4.mrt, make sure we hold the RCU lock
     in all relevant places

  Current release - new code bugs:

   - fixes for the recently added resizable hash tables

   - ipv6: make sure we default IPv6 tunnel drivers to =m now that IPv6
     itself is built in

   - drv: octeontx2-af: fixes for parser/CAM fixes

  Previous releases - regressions:

   - phy: micrel: fix LAN8814 QSGMII soft reset

   - wifi:
       - cw1200: revert "Fix locking in error paths"
       - ath12k: fix crash on WCN7850, due to adding the same queue
         buffer to a list multiple times

  Previous releases - always broken:

   - number of info leak fixes

   - ipv6: implement limits on extension header parsing

   - wifi: number of fixes for missing bound checks in the drivers

   - Bluetooth: fixes for races and locking issues

   - af_unix:
       - fix an issue between garbage collection and PEEK
       - fix yet another issue with OOB data

   - xfrm: esp: avoid in-place decrypt on shared skb frags

   - netfilter: replace skb_try_make_writable() by skb_ensure_writable()

   - openvswitch: vport: fix race between tunnel creation and linking
     leading to invalid memory accesses (type confusion)

   - drv: amd-xgbe: fix PTP addend overflow causing frozen clock

  Misc:

   - sched/isolation: make HK_TYPE_KTHREAD an alias of HK_TYPE_DOMAIN
     (for relevant IPVS change)"

* tag 'net-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (190 commits)
  net: sparx5: configure serdes for 1000BASE-X in sparx5_port_init()
  net: sparx5: fix wrong chip ids for TSN SKUs
  net: stmmac: dwmac-nuvoton: fix NULL pointer dereference in nvt_set_phy_intf_sel()
  tcp: Fix dst leak in tcp_v6_connect().
  ipmr: Call ipmr_fib_lookup() under RCU.
  net: phy: broadcom: Save PHY counters during suspend
  net/smc: fix missing sk_err when TCP handshake fails
  af_unix: Reject SIOCATMARK on non-stream sockets
  veth: fix OOB txq access in veth_poll() with asymmetric queue counts
  eth: fbnic: fix double-free of PCS on phylink creation failure
  net: ethernet: cortina: Drop half-assembled SKB
  selftests: mptcp: pm: restrict 'unknown' check to pm_nl_ctl
  selftests: mptcp: check output: catch cmd errors
  mptcp: pm: prio: skip closed subflows
  mptcp: pm: ADD_ADDR rtx: return early if no retrans
  mptcp: pm: ADD_ADDR rtx: skip inactive subflows
  mptcp: pm: ADD_ADDR rtx: resched blocked ADD_ADDR quicker
  mptcp: pm: ADD_ADDR rtx: free sk if last
  mptcp: pm: ADD_ADDR rtx: always decrease sk refcount
  mptcp: pm: ADD_ADDR rtx: fix potential data-race
  ...

Input: atmel_mxt_ts - check mem_size before calculating config memory size

In mxt_update_cfg(), the driver calculates the memory size needed to store
the configuration as data->mem_size - cfg.start_ofs. If data->mem_size is
less than or equal to cfg.start_ofs, this calculation will underflow or
result in a zero-size buffer, neither of which is valid for a configuration
update.

Add a check to return -EINVAL if data->mem_size is too small. While at it,
change the types of start_ofs and mem_size in struct mxt_cfg to u16 to
match the device address space.

Assisted-by: Gemini:gemini-3.1-pro
Link: https://patch.msgid.link/20260504185448.4055973-2-dmitry.torokhov@gmail.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

Input: atmel_mxt_ts - fix boundary check in mxt_prepare_cfg_mem

When a configuration file provides an object size that is larger than the
driver's known mxt_obj_size(object), the driver intends to discard the
extra bytes.

The loop iterates using for (i = 0; i < size; i++). Inside the loop, the
condition to skip processing extra bytes is:

if (i > mxt_obj_size(object))
continue;

Since i is a 0-based index, the valid indices for the object are 0 through
mxt_obj_size(object) - 1.

When i == mxt_obj_size(object), the condition evaluates to false, and the
code processes the byte instead of discarding it.

This causes the code to calculate byte_offset = reg + i - cfg->start_ofs
and writes the byte there, overwriting exactly one byte of the adjacent
instance or object.

Update the boundary check to skip extra bytes correctly by using >=.

Fixes: 50a77c658b80 ("Input: atmel_mxt_ts - download device config using firmware loader")
Cc: stable@vger.kernel.org
Assisted-by: Gemini:gemini-3.1-pro
Reviewed-by: Ricardo Ribalda <ribalda@chromium.org>
Link: https://patch.msgid.link/20260504185448.4055973-1-dmitry.torokhov@gmail.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

Input: fm801-gp - simplify initialisation of pci_device_id array

Instead of assigning the pci_device_id members using a list (which is
hard to read as you need to look at the order of the members in that
struct in parallel) use the PCI_VDEVICE() convenience macro to compact
the initialisation while improving readability.

Also drop trailing zeros that the compiler will care about then.

The change doesn't introduce binary changes to the compiled driver,
verified on both ARCH=x86 and ARCH=arm64.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Link: https://patch.msgid.link/20260507160051.3315630-2-u.kleine-koenig@baylibre.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

net: sparx5: configure serdes for 1000BASE-X in sparx5_port_init()

sparx5_port_init() only invokes sparx5_serdes_set() and the associated
shadow-device enable and low-speed device switch for SGMII and QSGMII.
On any port with a high-speed primary device (DEV5G/DEV10G/DEV25G)
configured for 1000BASE-X the serdes is therefore left uninitialized,
the DEV2G5 shadow is never enabled, and the port stays pointed at its
high-speed device rather than the DEV2G5. The PCS1G block looks
healthy in isolation, but no frames reach the link partner.

Add 1000BASE-X to the check so the same three steps run.

Note: the same issue might apply to 2500BASE-X, but that will,
eventually, be addressed in a separate commit.

Reported-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 946e7fd5053a ("net: sparx5: add port module support")
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20260506-misc-fixes-sparx5-lan969x-v2-4-fb236aa96908@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: sparx5: fix wrong chip ids for TSN SKUs

The TSN SKUs in enum spx5_target_chiptype have incorrect IDs:

  SPX5_TARGET_CT_7546TSN    = 0x47546,
  SPX5_TARGET_CT_7549TSN    = 0x47549,
  SPX5_TARGET_CT_7552TSN    = 0x47552,
  SPX5_TARGET_CT_7556TSN    = 0x47556,
  SPX5_TARGET_CT_7558TSN    = 0x47558,

The value read back from the chip is GCB_CHIP_ID_PART_ID, which is a
GENMASK(27, 12) field, i.e. at most 16 bits wide. It can never match
these IDs, so probing a TSN part fails with a "Target not supported"
error.

Fix the enum to use the actual 16-bit part IDs returned by the
hardware: 0x0546, 0x0549, 0x0552, 0x0556 and 0x0558.

Reported-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 3cfa11bac9bb ("net: sparx5: add the basic sparx5 driver")
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20260506-misc-fixes-sparx5-lan969x-v2-3-fb236aa96908@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'sound-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Again a collection of small fixes, mostly for device-specific ones.

  The only big LOC is about the removal of pretty old dead code in
  ab8500 codec driver, while the rest all nice small changes.

  Core / API:
   - Fix race in deferred fasync state checks
   - Fix UMP group filtering in sequencer

  ASoC:
   - cs35l56: fixes for driver cleanup and error paths
   - tas2764/2770: workaround for bogus temperature readings
   - wm_adsp: fixes for firmware unit tests
   - amd-yc: more DMI quirks for laptops
   - Minor fixes for fsl_xcvr and spacemit

  HD-Audio:
   - Mute LED and speaker quirks for HP, Lenovo, and Xiaomi laptops

  USB-audio:
   - New device-specific quirks (Motu, JBL, AlphaTheta, Razer)
   - Fix of MIDI2 playback on resume

  Others:
   - Firewire-tascam control event fix
   - Minor cleanups and fixes for sparc/dbri and pcmtest"

* tag 'sound-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (28 commits)
  ASoC: cs35l56: Destroy workqueue in probe error path
  ASoC: cs35l56: Don't use devres to unregister component
  ALSA: sparc/dbri: add missing fallthrough
  ALSA: core: Serialize deferred fasync state checks
  ALSA: hda/realtek: Add mute LED fixup for HP Pavilion 15-cs1xxx
  ALSA: seq: Fix UMP group 16 filtering
  ASoC: wm_adsp_fw_find_test: Clear searched_fw_files in find-by-index test
  ASoC: wm_adsp_fw_find_test: Redirect wm_adsp_release_firmware_files()
  ASoC: tas2770: Deal with bogus initial temperature value
  ASoC: tas2764: Deal with bogus initial temperature register value
  ALSA: usb-audio: add clock quirk for Motu 1248
  ALSA: usb-audio: midi2: Restart output URBs on resume
  ALSA: hda/realtek: Fix mute and mic-mute LEDs for HP Envy X360 15-fh0xxx
  ALSA: usb-audio: Add quirk flags for JBL Pebbles
  ALSA: firewire-tascam: Do not drop unread control events
  ALSA: usb-audio: Add quirk flags for AlphaTheta EUPHONIA
  ASoC: fsl_xcvr: Fix event generation for cached controls
  ASoC: sdw_utils: avoid the SDCA companion function not supported failure
  ASoC: amd: yc: Add HP OMEN Gaming Laptop 16-ap0xxx product line in quirk table
  ASoC: cs35l56: Fix out-of-bounds in dev_err() in cs35l56_read_onchip_spkid()
  ...

Merge tag 'platform-drivers-x86-v7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Ilpo Järvinen:

- Silence unknown board warning for 8D41 (hp-wmi)

- Fix uninitialized variable in fan RPM handling (lenovo/wmi-other)

- Check min_size also when ACPI does not return an out object (wmi)

* tag 'platform-drivers-x86-v7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: lenovo: wmi-other: Fix uninitialized variable in lwmi_om_hwmon_write()
  platform/x86: hp-wmi: silence unknown board warning for 8D41
  platform/wmi: Fix unchecked min_size in wmidev_invoke_method()

soc: imx8m: Fix match data lookup for soc device

The i.MX8M soc device is registered via platform_device_register_simple(),
so it is not associated with a Device Tree node and the imx8m_soc_driver
has no of_match_table.

As a result, device_get_match_data() always returns NULL when probing
the soc device.

Retrieve the match data directly from the machine compatible using
of_machine_get_match_data(imx8_soc_match), which provides the correct SoC
data.

Fixes: 2524b293a59e5 ("soc: imx8m: don't access of_root directly")
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

Merge tag 'pmdomain-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm

Pull pmdomain fixes from Ulf Hansson:

- Fix detach procedure for virtual devices in genpd

- mediatek: Fix use-after-free in scpsys_get_bus_protection_legacy()

* tag 'pmdomain-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
pmdomain: mediatek: fix use-after-free in scpsys_get_bus_protection_legacy()
pmdomain: core: Fix detach procedure for virtual devices in genpd

net: stmmac: dwmac-nuvoton: fix NULL pointer dereference in nvt_set_phy_intf_sel()

priv->dev was never initialized after devm_kzalloc() allocates the
private data structure. When nvt_set_phy_intf_sel() is later invoked
via the phylink interface_select callback, it calls
nvt_gmac_get_delay(priv->dev, ...) which dereferences the NULL pointer.

Fix this by assigning priv->dev = dev immediately after allocation.

Fixes: 4d7c557f58ef ("net: stmmac: dwmac-nuvoton: Add dwmac glue for Nuvoton MA35 family")
Signed-off-by: Joey Lu <a0987203069@gmail.com>
Link: https://patch.msgid.link/20260506084614.192894-2-a0987203069@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: Fix dst leak in tcp_v6_connect().

If a socket is bound to a wildcard address, tcp_v[46]_connect()
updates it with a non-wildcard address based on the route lookup.

After bhash2 was introduced in the cited commit, we must call
inet_bhash2_update_saddr() to update the bhash2 entry as well.

If inet_bhash2_update_saddr() fails, we must release the refcount
for dst by ip_route_connect() or ip6_dst_lookup_flow().

While tcp_v4_connect() calls ip_rt_put() in the error path,
tcp_v6_connect() does not call dst_release().

Let's call dst_release() when inet_bhash2_update_saddr() fails
in tcp_v6_connect().

Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address")
Reported-by: Damiano Melotti <melotti@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260506070443.1699879-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipmr: Call ipmr_fib_lookup() under RCU.

Yi Lai reported RCU splat in reg_vif_xmit() below. [0]

When CONFIG_IP_MROUTE_MULTIPLE_TABLES=n, ipmr_fib_lookup()
uses rcu_dereference() without explicit rcu_read_lock().

Although rcu_read_lock_bh() is already held by the caller
__dev_queue_xmit(), lockdep requires explicit rcu_read_lock()
for rcu_dereference().

Let's move up rcu_read_lock() in reg_vif_xmit() to
cover ipmr_fib_lookup().

[0]:
WARNING: suspicious RCU usage
7.1.0-rc2-next-20260504-9d0d467c3572 #1 Not tainted
-----------------------------
net/ipv4/ipmr.c:329 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
2 locks held by syz.2.17/1779:
#0: ffffffff87896440 (rcu_read_lock_bh){....}-{1:3}, at: local_bh_disable include/linux/bottom_half.h:20 [inline]
#0: ffffffff87896440 (rcu_read_lock_bh){....}-{1:3}, at: rcu_read_lock_bh include/linux/rcupdate.h:891 [inline]
#0: ffffffff87896440 (rcu_read_lock_bh){....}-{1:3}, at: __dev_queue_xmit+0x239/0x4140 net/core/dev.c:4792
#1: ffff88801a199d18 (_xmit_PIMREG#2){+...}-{3:3}, at: spin_lock include/linux/spinlock.h:342 [inline]
#1: ffff88801a199d18 (_xmit_PIMREG#2){+...}-{3:3}, at: __netif_tx_lock include/linux/netdevice.h:4795 [inline]
#1: ffff88801a199d18 (_xmit_PIMREG#2){+...}-{3:3}, at: __dev_queue_xmit+0x1d5d/0x4140 net/core/dev.c:4865

stack backtrace:
CPU: 1 UID: 0 PID: 1779 Comm: syz.2.17 Not tainted 7.1.0-rc2-next-20260504-9d0d467c3572 #1 PREEMPT(lazy)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x121/0x150 lib/dump_stack.c:120
dump_stack+0x19/0x20 lib/dump_stack.c:129
lockdep_rcu_suspicious+0x15b/0x1f0 kernel/locking/lockdep.c:6878
ipmr_fib_lookup net/ipv4/ipmr.c:329 [inline]
reg_vif_xmit+0x2ee/0x3c0 net/ipv4/ipmr.c:540
__netdev_start_xmit include/linux/netdevice.h:5382 [inline]
netdev_start_xmit include/linux/netdevice.h:5391 [inline]
xmit_one net/core/dev.c:3889 [inline]
dev_hard_start_xmit+0x170/0x700 net/core/dev.c:3905
__dev_queue_xmit+0x1df1/0x4140 net/core/dev.c:4871
dev_queue_xmit include/linux/netdevice.h:3423 [inline]
packet_xmit+0x252/0x370 net/packet/af_packet.c:276
packet_snd net/packet/af_packet.c:3082 [inline]
packet_sendmsg+0x39ad/0x5650 net/packet/af_packet.c:3114
sock_sendmsg_nosec net/socket.c:797 [inline]
__sock_sendmsg net/socket.c:812 [inline]
____sys_sendmsg+0xa21/0xba0 net/socket.c:2716
___sys_sendmsg+0x121/0x1c0 net/socket.c:2770
__sys_sendmsg+0x177/0x220 net/socket.c:2802
__do_sys_sendmsg net/socket.c:2807 [inline]
__se_sys_sendmsg net/socket.c:2805 [inline]
__x64_sys_sendmsg+0x80/0xc0 net/socket.c:2805
x64_sys_call+0x1d9c/0x21c0 arch/x86/include/generated/asm/syscalls_64.h:47
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xc1/0x1020 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f37e563ee5d
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 93 af 1b 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe5caa7fa8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000005c5fa0 RCX: 00007f37e563ee5d
RDX: 0000000000000000 RSI: 00002000000012c0 RDI: 0000000000000004
RBP: 00000000005c5fa0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00000000005c5fac R15: 00000000005c5fa0
</TASK>

Fixes: b3b6babf4751 ("ipmr: Free mr_table after RCU grace period.")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Reported-by: Yi Lai <yi1.lai@intel.com>
Closes: https://lore.kernel.org/netdev/afrY34dLXNUboevf@ly-workstation/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260506065955.1695753-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: broadcom: Save PHY counters during suspend

The PHY counters can be lost if the PHY is reset during suspend. We
need to save the values into the shadow counters or the accounting
will be incorrect over multiple suspend and resume cycles.

Fixes: 820ee17b8d3b ("net: phy: broadcom: Add support code for reading PHY counters")
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20260505173926.2870069-1-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/smc: fix missing sk_err when TCP handshake fails

In smc_connect_work(), when the underlying TCP handshake fails, the error
code (rc) must be propagated to sk_err to ensure userspace can correctly
retrieve the error status via SO_ERROR. Currently, the code only handles
a restricted set of error codes (e.g., EPIPE, ECONNREFUSED). If other
errors occurs, such as EHOSTUNREACH, sk_err remains unset (zero).

This affects applications that rely on SO_ERROR to determine connect
outcome. For example, higher versions of Go's netpoller treats
SO_ERROR == 0 combined with a failed getpeername() as a spurious wakeup
and re-enters epoll_wait(). Under ET mode, no further edge will be
generated since the socket is already in a terminal state, causing the
connect to hang indefinitely or until a user-specified timeout, if one
is set.

Fixes: 50717a37db03 ("net/smc: nonblocking connect rework")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Link: https://patch.msgid.link/20260506014105.27093-1-alibuda@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Reject SIOCATMARK on non-stream sockets

SIOCATMARK reports whether the receive queue is at the urgent mark for
MSG_OOB.

In AF_UNIX, MSG_OOB is supported only for SOCK_STREAM sockets.
SOCK_DGRAM and SOCK_SEQPACKET reject MSG_OOB in sendmsg() and recvmsg(),
so they should not support SIOCATMARK either.

Return -EOPNOTSUPP for non-stream sockets before checking the receive
queue.

Fixes: 314001f0bf92 ("af_unix: Add OOB support")
Cc: stable@kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Suggested-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Jiexun Wang <wangjiexun2025@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260506140825.2987635-1-n05ec@lzu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

gpio: timberdale: Remove platform data header

With no more users, we can remove timb_gpio.h.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Link: https://patch.msgid.link/20260327-gpio-timberdale-swnode-v3-4-9a1bc1b2b124@oss.qualcomm.com
Signed-off-by: Lee Jones <lee@kernel.org>

gpio: timberdale: Use device properties

The top-level MFD driver now passes the device properties to the GPIO
cell via the software node. Use generic device property accessors and
stop using platform data. We can ignore the "ngpios" property here now
as it will be retrieved internally by GPIO core.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Link: https://patch.msgid.link/20260327-gpio-timberdale-swnode-v3-3-9a1bc1b2b124@oss.qualcomm.com
Signed-off-by: Lee Jones <lee@kernel.org>

mfd: timberdale: Set up a software node for the GPIO cell

Using generic device properties instead of custom platform data
structures is preferred due to the resulting unification of the way
properties are accessed in consumer drivers. There's no DT node for the
GPIO cell in this driver but we can create a software node with device
properties and attach it to all the GPIO cells.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Link: https://patch.msgid.link/20260327-gpio-timberdale-swnode-v3-2-9a1bc1b2b124@oss.qualcomm.com
Signed-off-by: Lee Jones <lee@kernel.org>

mfd: timberdale: Move GPIO_NR_PINS into the driver

This symbol is only used inside the Timberdale MFD driver. Move into
the .c file as there's no need for it to be exposed in a header.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20260327-gpio-timberdale-swnode-v3-1-9a1bc1b2b124@oss.qualcomm.com
Signed-off-by: Lee Jones <lee@kernel.org>

ALSA: hda: cs35l41: Put ACPI device on missing physical node

acpi_dev_get_first_match_dev() returns a refcounted ACPI device and
callers must balance it with acpi_dev_put().

cs35l41_hda_read_acpi() stores the returned ACPI device in
cs35l41->dacpi. That reference is normally released by the later
probe cleanup or the remove path, but the NULL-check on
physdev exits before either of those paths can run.

Drop the lookup reference before returning -ENODEV.

Fixes: c34b04cc6178 ("ALSA: hda: cs35l41: Fix NULL pointer dereference in cs35l41_hda_read_acpi()")
Signed-off-by: Shuhao Fu <sfual@cse.ust.hk>
Tested-by: Simon Trimmer <simont@opensource.cirrus.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20260428081238.GA1659932@chcpu16

ALSA: hda: cs35l56: Put ACPI device after setting companion

acpi_dev_get_first_match_dev() returns a refcounted ACPI device and
callers are expected to balance it with acpi_dev_put().

When no companion is already attached, cs35l56_hda_read_acpi() looks
up an ACPI device and sets it with ACPI_COMPANION_SET(), but leaves
the lookup reference held.

ACPI_COMPANION_SET() does not take ownership of that reference, so
drop it with acpi_dev_put() after attaching the companion.

Fixes: 73cfbfa9caea ("ALSA: hda/cs35l56: Add driver for Cirrus Logic CS35L56 amplifier")
Signed-off-by: Shuhao Fu <sfual@cse.ust.hk>
Tested-by: Simon Trimmer <simont@opensource.cirrus.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20260428080139.GA1649104@chcpu16

fs/resctrl: Add monitor property 'mbm_cntr_assign_fixed'

Commit

3b497c3f4f04 ("fs/resctrl: Introduce the interface to display monitoring modes")

introduced CONFIG_RESCTRL_ASSIGN_FIXED but left adding the Kconfig
entry until it was necessary. The counter assignment mode is fixed in
MPAM, even when there are assignable counters, and so addressing this
is needed to support MPAM.

To avoid the burden of another Kconfig entry, replace
CONFIG_RESCTRL_ASSIGN_FIXED with a new property in 'struct resctrl_mon',
'mbm_cntr_assign_fixed' to be set by the architecture.

Do not request the architecture to change the counter assignment mode if it
does not support doing so. Provide insight to user space about why such a
request fails.

Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/20260506082855.3694761-1-ben.horgan@arm.com

veth: fix OOB txq access in veth_poll() with asymmetric queue counts

XDP redirect into a veth device (via bpf_redirect()) calls
veth_xdp_xmit(), which enqueues frames into the peer's ptr_ring using
  smp_processor_id() % peer->real_num_rx_queues
as the ring index.  With an asymmetric veth pair where the peer has
fewer TX queues than RX queues, that index can exceed
peer->real_num_tx_queues.

veth_poll() then resolves peer_txq for the ring via:

  peer_txq = peer_dev ? netdev_get_tx_queue(peer_dev, queue_idx) : NULL;

where queue_idx = rq->xdp_rxq.queue_index.  When queue_idx exceeds
peer_dev->real_num_tx_queues this is an out-of-bounds (OOB) access
into the peer's netdev_queue array, triggering DEBUG_NET_WARN_ON_ONCE
in netdev_get_tx_queue().

The normal ndo_start_xmit path is not affected: the stack clamps
skb->queue_mapping via netdev_cap_txqueue() before invoking
ndo_start_xmit, so rxq in veth_xmit() never exceeds real_num_tx_queues.

Fix veth_poll() by clamping: only dereference peer_txq when queue_idx is
within bounds, otherwise set it to NULL.  The out-of-range rings are fed
exclusively via XDP redirect (veth_xdp_xmit), never via ndo_start_xmit
(veth_xmit), so the peer txq was never stopped and there is nothing to
wake; NULL is the correct fallback.

Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://lore.kernel.org/all/20260502071828.616C3C19425@smtp.kernel.org/
Fixes: dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops")
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://patch.msgid.link/20260505132159.241305-2-hawk@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge drm/drm-next into drm-misc-next

Merge drm-next to bring the drm_atomic_state renaming patch.

Signed-off-by: Maxime Ripard <mripard@kernel.org>

platform/x86: hp-wmi: Add support for Victus 16-r0xxx (8BC2)

The HP Victus 16-r0xxx (board ID: 8BC2) has the same WMI as other Victus
S boards, but requires quirks for correctly switching thermal profile.

Add the DMI board name to victus_s_thermal_profile_boards[] table and
map it to omen_v1_thermal_params.

Testing on board 8BC2 confirmed that platform profile is registered
successfully and fan RPMs are readable and controllable.

Signed-off-by: Haichen Feng <2806891994@qq.com>
Link: https://patch.msgid.link/tencent_8E29805D8DC7B6005244C3433C62DD9DF606@qq.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

drm/panfrost: Fix wait_bo ioctl leaking positive return from dma_resv_wait_timeout()

dma_resv_wait_timeout() returns a positive 'remaining jiffies' value
on success, 0 on timeout, and -errno on failure.

panfrost_ioctl_wait_bo() returns this 'long' result from an int-typed
ioctl handler, so positive values reach userspace as bogus errors.
Explicitly set ret to 0 on the success path.

Fixes: f3ba91228e8e ("drm/panfrost: Add initial panfrost driver")
Cc: stable@vger.kernel.org
Signed-off-by: Gyeyoung Baek <gye976@gmail.com>
Reviewed-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Link: https://patch.msgid.link/fe33f82fded7be1c18e2e0eb2db451d5a738cf39.1776581974.git.gye976@gmail.com
Signed-off-by: Steven Price <steven.price@arm.com>

accel/rocket: Fix prep_bo ioctl leaking positive return from dma_resv_wait_timeout()

dma_resv_wait_timeout() returns a positive 'remaining jiffies' value
on success, 0 on timeout, and -errno on failure.

rocket_ioctl_prep_bo() returns this 'long' result from an int-typed
ioctl handler, so positive values reach userspace as bogus errors.
Explicitly set ret to 0 on the success path.

Fixes: 525ad89dd904 ("accel/rocket: Add IOCTLs for synchronizing memory accesses")
Cc: stable@vger.kernel.org
Signed-off-by: Gyeyoung Baek <gye976@gmail.com>
Reviewed-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Link: https://patch.msgid.link/c0ebf83b345721701b22d8f5bc41c52c0ecf5e16.1776581974.git.gye976@gmail.com
Signed-off-by: Steven Price <steven.price@arm.com>

MAINTAINERS: Add cgbc backlight driver

Add missing backlight driver in CONGATEC BOARD CONTROLLER entry.

Signed-off-by: Thomas Richard <thomas.richard@bootlin.com>
Reviewed-by: Daniel Thompson (RISCstar) <danielt@kernel.org>
Link: https://patch.msgid.link/20260427-backlight-cgbc-remove-x86-dependency-v2-2-da9f2375a34a@bootlin.com
Signed-off-by: Lee Jones <lee@kernel.org>

backlight: cgbc: Remove redundant X86 dependency

The backlight driver depends on the MFD cgbc-core driver, which already
depends on X86. The explicit X86 dependency for the backlight driver is
redundant and can be safely removed.

Signed-off-by: Thomas Richard <thomas.richard@bootlin.com>
Reviewed-by: Daniel Thompson (RISCstar) <danielt@kernel.org>
Link: https://patch.msgid.link/20260427-backlight-cgbc-remove-x86-dependency-v2-1-da9f2375a34a@bootlin.com
Signed-off-by: Lee Jones <lee@kernel.org>

pinctrl: qcom: nord: remove duplicated pin function

The qdss_cti function is initialized twice in the nord_functions array.
Remove the duplicate entry.

Fixes: c24dd0826f06 ("pinctrl: qcom: add the TLMM driver for the Nord platforms")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202605061633.BJLI5voT-lkp@intel.com/
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Linus Walleij <linusw@kernel.org>

drm/panthor: Avoid potential UAF due to memory reclaim

Recent changes to add shrinker support introduced a use after free
vulnerability.
When a BO is evicted from the shrinker callback, all its CPU and GPU
mappings are invalidated. It can happen that another GPU mapping is
created for the BO after the eviction. Because of the new GPU mapping,
BO will be added back to one of the reclaim list but the state of
corresponding vm_bo will not be changed.
If vm_bo remains in evicted state and shrinker callback is invoked
again then the new GPU mapping won't be invalidated. As a result the
backing pages, which were acquired on the creation of new GPU mapping,
can get reclaimed and reused whilst they are still mapped to the GPU.

To prevent the use after free possibility, this commit removes the
evicted check for vm_bo so that all GPU mappings are checked for
invalidation.

v2:
- Update comment and add a newline in
panthor_vm_evict_bo_mappings_locked().

Fixes: fb42964e2a76 ("drm/panthor: Add a GEM shrinker")
Suggested-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Akash Goel <akash.goel@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Link: https://patch.msgid.link/20260413080253.1288157-1-akash.goel@arm.com

KVM: arm64: Pre-check vcpu memcache for host->guest donate

__pkvm_host_donate_guest() flips the host stage-2 PTE for the
donated page to a non-valid annotation via
host_stage2_set_owner_metadata_locked() and then calls
kvm_pgtable_stage2_map() to install the matching guest stage-2
mapping. The map's return value is wrapped in WARN_ON() and
otherwise discarded, asserting that the call cannot fail.

WARN_ON() at nVHE EL2 panics, so this assertion is only correct
if the call genuinely cannot fail. kvm_pgtable_stage2_map() can
fail with -ENOMEM even at PAGE_SIZE granularity: the donate path
verifies PKVM_NOPAGE for the guest IPA before the map, so the
walker must allocate fresh page-table pages from the vcpu
memcache, and the host controls the vcpu memcache via the topup
interface. An under-provisioned donation request would otherwise
turn a recoverable -ENOMEM into a fatal hyp panic.

Bound the worst-case walker allocation alongside the existing
__host_check_page_state_range() / __guest_check_page_state_range()
pre-checks, using the helper introduced for host->guest share. If
the vcpu memcache holds fewer pages than kvm_mmu_cache_min_pages(),
return -ENOMEM before any state mutation.

Fixes: 1e579adca177 ("KVM: arm64: Introduce __pkvm_host_donate_guest()")
Assisted-by: Gemini:gemini-3.1-pro review-prompts
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260501112149.2824881-7-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Pre-check vcpu memcache for host->guest share

__pkvm_host_share_guest() ends with kvm_pgtable_stage2_map() to
install the guest stage-2 mapping, after a forward pass that mutates
the host vmemmap (sets PKVM_PAGE_SHARED_OWNED and increments
host_share_guest_count) for every page in the range. The map's
return value is wrapped in WARN_ON() and otherwise discarded,
asserting that the call cannot fail.

WARN_ON() at nVHE EL2 panics, so this assertion is only correct if
the call genuinely cannot fail. kvm_pgtable_stage2_map() can fail
with -ENOMEM when the stage-2 walker exhausts the caller's
memcache, and the host controls the vcpu memcache via the topup
interface, so an under-provisioned share request would otherwise
turn a recoverable -ENOMEM into a fatal hyp panic.

Bound the worst-case walker allocation in the existing pre-check
pass so that kvm_pgtable_stage2_map() cannot fail at the call
site, using kvm_mmu_cache_min_pages() -- the same bound host EL1
uses for its own stage-2 maps. If the vcpu memcache holds fewer
pages, return -ENOMEM before any state mutation.

Fixes: d0bd3e6570ae ("KVM: arm64: Introduce __pkvm_host_share_guest()")
Assisted-by: Gemini:gemini-3.1-pro review-prompts
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260501112149.2824881-6-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Seed pkvm_ownership_selftest vcpu memcache

The hypercall handlers call pkvm_refill_memcache() to top up the
hyp_vcpu memcache before invoking __pkvm_host_{share,donate}_guest().
pkvm_ownership_selftest invokes those functions directly with a
static selftest_vcpu that has an empty memcache.

Seed selftest_vcpu's memcache from the prepopulated selftest
pages, leaving the remainder for selftest_vm.pool. Required by
the memcache-sufficiency pre-check added in the following
patches.

Assisted-by: Gemini:gemini-3.1-pro review-prompts
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260501112149.2824881-5-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Fix __deactivate_fgt macro parameter typo

__deactivate_fgt() declares its first parameter as "htcxt" but the body
references "hctxt". The parameter is unused; the macro silently captures
"hctxt" from the enclosing scope. Both existing callers
(__deactivate_traps_hfgxtr() and __deactivate_traps_ich_hfgxtr()) happen
to define a local "struct kvm_cpu_context *hctxt", so the macro works
by coincidence.

A future caller without an "hctxt" local in scope, or naming it
differently, would compile but bind to the wrong context. Align the
parameter name with the sibling __activate_fgt() macro.

The "vcpu" parameter remains unused in the body, kept for API symmetry
with __activate_fgt() (which uses it).

Fixes: f5a5a406b4b8 ("KVM: arm64: Propagate and handle Fine-Grained UNDEF bits")
Assisted-by: Gemini:gemini-3.1-pro review-prompts
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260501112149.2824881-4-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Guard against NULL vcpu on VHE hyp panic path

On VHE, __hyp_call_panic() unconditionally calls __deactivate_traps(vcpu)
on the vcpu pointer read from host_ctxt->__hyp_running_vcpu. That pointer
is cleared after every guest exit (and is never set when no guest is
running), so an unexpected EL2 exception landing in _guest_exit_panic,
e.g. via the el2t*_invalid / el2h_irq_invalid vectors - reaches this
function with vcpu == NULL. __deactivate_traps() then dereferences vcpu
via ___deactivate_traps() -> vserror_state_is_nested() -> vcpu_has_nv()
-> vcpu->arch.features, faulting inside the panic handler and obscuring
the original failure.

The nVHE counterpart (hyp_panic() in arch/arm64/kvm/hyp/nvhe/switch.c)
already guards its vcpu-using cleanup with "if (vcpu)"; mirror that
here. sysreg_restore_host_state_vhe() does not depend on vcpu and
continues to run unconditionally, preserving panic forensics. The
trailing panic("...VCPU:%p", vcpu) prints "(null)" safely via printk's
%p handling.

Fixes: 6a0259ed29bb ("KVM: arm64: Remove hyp_panic arguments")
Assisted-by: Gemini:gemini-3.1-pro review-prompts
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260501112149.2824881-3-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Make EL2 exception entry and exit context-synchronization events

SCTLR_EL2.EIS and SCTLR_EL2.EOS control whether exception entry and
exit at EL2 are Context Synchronisation Events (CSEs). Per ARM DDI
0487 M.b D24.2.175 (p. D24-9754):

  - !FEAT_ExS: the bit is RES1, so the entry/exit is unconditionally
    a CSE.
  - FEAT_ExS: the reset value is architecturally UNKNOWN; software
    must set the bit to make the entry/exit a CSE.

INIT_SCTLR_EL2_MMU_ON in arch/arm64/include/asm/sysreg.h sets neither
bit. KVM/arm64 hot paths rely on ERET from EL2 being a CSE, and on
synchronous EL1->EL2 entry being a CSE, to elide explicit ISBs after
MSRs to context-switching system registers (HCR_EL2, ZCR_EL2,
ptrauth keys, etc.). On FEAT_ExS hardware those reliances are not
architecturally backed unless EOS=1 (and, for entry, EIS=1).

Until commit 0a35bd285f43 ("arm64: Convert SCTLR_EL2 to sysreg
infrastructure"), SCTLR_EL2_RES1 was a hand-rolled mask that
included BIT(11) (EOS) and BIT(22) (EIS), so INIT_SCTLR_EL2_MMU_ON
was setting both unconditionally. The conversion made
SCTLR_EL2_RES1 auto-generated; because the sysreg tooling only
models unconditionally-RES1 fields and EIS/EOS are RES1 only when
FEAT_ExS is absent, the auto-generated mask is UL(0). The seven
other bits dropped from the old mask (positions 4, 5, 16, 18, 23,
28, 29) are unconditionally RES1 in the E2H=0 SCTLR_EL2 layout per
DDI 0487 M.b D24.2.175, so dropping them is harmless. EIS and EOS
are the only bits whose semantics changed for FEAT_ExS hardware
and where the kernel relies on the value being 1.

Make the guarantee explicit: include SCTLR_ELx_EIS | SCTLR_ELx_EOS in
INIT_SCTLR_EL2_MMU_ON so that EL2 exception entry and exit are
unconditionally CSEs regardless of whether FEAT_ExS is implemented.
This matches the pairing in arch/arm64/kvm/config.c which treats EIS
and EOS together as RES1 under !FEAT_ExS.

Fixes: 0a35bd285f43 ("arm64: Convert SCTLR_EL2 to sysreg infrastructure")
Reviewed-by: Yuan Yao <yaoyuan@linux.alibaba.com>
Assisted-by: Gemini:gemini-3.1-pro review-prompts
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260501112149.2824881-2-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

platform/x86/intel/tpmi/plr: Prevent fault during unbind

This driver faults when intel vsec driver is unbound from PCI driver
interface. For example:

echo 0000:00:03.1 > /sys/bus/pci/drivers/intel_vsec/unbind

This is caused by accessing plr->dbgfs_dir after vsec_tpmi driver is
removed. Here vsec_tpmi driver is the parent. On unbind, the parent
device remove callback is called first which here will remove debugfs
interface. Hence plr->dbgfs_dir is no longer valid.

Register notifier for TPMI_CORE_EXIT and make this pointer to NULL,
so that debugfs_remove_recursive() is not called with bad plr->dbgfs_dir
pointer.

After notifier is returned the vsec_tpmi driver will call remove debugfs
by calling debugfs_remove_recursive().

Fixes: 811f67c51636 ("platform/x86/intel/tpmi: Add new auxiliary driver for performance limits")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Stable@vger.kernel.org
Link: https://patch.msgid.link/20260430151103.1549733-4-srinivas.pandruvada@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

platform/x86: intel: Add notifiers support

In some cases a driver using services of vsec_tpmi driver requires some
processing before vsec_tpmi exits. For example a children using debugfs
can't use debugfs as this will be deleted by the vsec_tpmi driver.

This is the case when unbind using PCI driver interface. In this case
the remove callback of vsec_tpmi driver is called first, then remove
callback of its children.

Add support of blocking chain notifiers support. Notify on successful probe
and before clean up in the remove callback.

Fixes: 811f67c51636 ("platform/x86/intel/tpmi: Add new auxiliary driver for performance limits")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Stable@vger.kernel.org
Link: https://patch.msgid.link/20260430151103.1549733-3-srinivas.pandruvada@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

platform/x86: intel: Move debugfs register before creating devices

It is possible that the driver handling device is enumerated before
registering debugfs. If the driver wants to access debugfs by calling
tpmi_get_debugfs_dir(), this will return error in this case.

Hence register debugfs before creating devices.

Fixes: 811f67c51636 ("platform/x86/intel/tpmi: Add new auxiliary driver for performance limits")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Stable@vger.kernel.org
Link: https://patch.msgid.link/20260430151103.1549733-2-srinivas.pandruvada@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

firmware: psci: Set pm_set_resume/suspend_via_firmware() for SYSTEM_SUSPEND

PSCI specification defines the SYSTEM_SUSPEND feature which enables the
firmware to implement the suspend to RAM (S2RAM) functionality by
transitioning the system to a deeper low power state. When the system
enters such state, the power to the peripheral devices might be removed. So
the respective device drivers must prepare for the possible removal of the
power by performing actions such as shutting down or resetting the device
in their system suspend callbacks.

The Linux PM framework allows the platform drivers to convey this info to
device drivers by calling the pm_set_suspend_via_firmware() and
pm_set_resume_via_firmware() APIs.

Hence, if the PSCI firmware supports SYSTEM_SUSPEND feature, call the above
mentioned APIs in the psci_system_suspend_begin() and
psci_system_suspend_enter() callbacks.

Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
[mani: reworded the description to be more elaborative]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

ARM: multi_v7_defconfig: Correct QCOM_RPMH and QCOM_RPMHPD

QCOM_RPMH and QCOM_RPMHPD can be build only as modules when
QCOM_COMMAND_DB is module itself.

Fixes: 1c25ca9bb5c5 ("ARM: multi_v7_defconfig: enable more Qualcomm drivers")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

ARM: multi_v7_defconfig: Cleanup redundant options

Drop all redundant entries:
1. SERIAL_BCM63XX is default=y for ARCH_BCMBCA.
2. USB_GPIO_VBUS is conflicting with enabled USB_CONN_GPIO since
   commit 0ce0f9d0785a ("usb: phy: phy-gpio-vbus-usb: Add device tree
   probing").
3. SND_HDA_CODEC_REALTEK_LIB is selected by other codecs,
   SND_HDA_CODEC_ALC269 by default.
4. SND_PXA_SOC_SSP depends on ARCH_PXA which is only for multi v5.
5. SND_SOC_ROCKCHIP is gone since commit cae3cc435db5 ("ASoC: rockchip:
   Standardize ASoC menu") and can be simply dropped without effect.
6. SND_SOC_TLV320AIC32X4 is selected by SND_SOC_TLV320AIC32X4_I2C/SPI.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

ARM: configs: Drop redundant SND_ATMEL_SOC

CONFIG_SND_ATMEL_SOC is gone since commit 4f30f84feb77 ("ASoC: atmel:
Standardize ASoC menu") and can be simply dropped without effect.

No impact on include/generated/autoconf.h.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Acked-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

ARM: configs: Drop redundant I2C_DESIGNWARE_PLATFORM

I2C_DESIGNWARE_PLATFORM is default=y via I2C_DESIGNWARE_CORE, which is
enabled. No impact on include/generated/autoconf.h.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Acked-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

ARM: multi_v7_defconfig: Move entries to match savedefconfig

Only re-shuffle entries to match savedefconfig, without removing any
symbols. This helps in reviewing defconfig when comparing with
savedefconfig and looking for obsolete symbols. No impact on
include/generated/autoconf.h.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'v7.2-qcom-pinctrl-defconfigs' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl into soc/defconfig

This is a split-off of a patch that was applied in two parts
in the pinctrl subsystem augmenting Kconfig and defconfigs in
sequence.

This is just the defconfig changes.

This should be pulled into the SoC defconfigs branch for 7.2
to avoid conflicts in linux-next.

* tag 'v7.2-qcom-pinctrl-defconfigs' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: qcom: Make important drivers default (2)

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

arm64: defconfig: Switch Ethernet drivers to modules

Development of Linux kernel progressed over last 10 years and it is easy
to generate now initramfs for building own kernel, e.g. with Yocto or
mkosi.  Therefore for a few years of reviews on mailing lists, all new
options enabled in arm64 defconfig were with assumption of having
initramfs which can load bare minimum of modules to mount filesystem
from network or disk.  Basically network driver as built-in is not
anymore essential to boot the system, so switch almost all Ethernet
drivers to modules to save on kernel image size.

Similarly 9P network filesystem for QEMU, especially that testing kernel
unuder QEMU does not have any size or build process constraints and can
use initramfs with -initrd argument.  Note that having network drivers
does not break NFS root, because whatever loading method, e.g. TFTP,
which brought the kernel image can bring also the initramfs with network
adapters (and I have been using such method for years for my Samsung
boards).

Notable exceptions / diff explanations:

1. Mark I2C as built-in, used by CONFIG_IGB as a module.

2. CONFIG_BCM4908_ENET and CONFIG_BCMASP appear in the diff, because
   they were default=y (via ARCH_BCMBCA or ARCH_BCM_IPROC).

3. CONFIG_HNS3_HCLGE and CONFIG_HNS3_ENET are removed, because they are
   default=m.

Moving code to modules has positive impact on kernel image size, thus
boot time of all users not using above drivers and ability to flash
fixed-size boot partitions.

Old Image size: 41.2 MiB (Image.gz: 14.8 MiB)
New Image size: 39.1 MiB (Image.gz: 13.8 MiB)

bloat-o-meter of vmlinux:
add/remove: 3/6972 grow/shrink: 6/45 up/down: 66659/-2333659 (-2267000)

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20260429-defconfig-v2-5-e4ed4186028b@oss.qualcomm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

arm64: defconfig: Drop unused Ethernet vendors

Make going through `make menuconfig` easier by disabling the unused
Ethernet vendor menu options, where no actual driver for given vendor is
selected.

Impact checked with comparing old and new autoconf.h:
diff ... | grep '^-' | grep -v CONFIG_NET_VENDOR

Reviewed-by: Linus Walleij <linusw@kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20260429-defconfig-v2-4-e4ed4186028b@oss.qualcomm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

arm64: defconfig: Drop default or selected drivers

Drop all options which are either defaults or selected by other drivers:
1. IMX Pinctrl drivers are defaults for this ARCH.
2. RP1 is selected by MISC_RP1.
3. I2C_DESIGNWARE_PLATFORM is default with I2C_DESIGNWARE_CORE.
4. DRM_SAMSUNG_DSIM is selected by DRM_EXYNOS_DSI.
5. SUN6I, SUN8I and Renesas DRM drivers have defaults.
6. SND_SOC_ROCKCHIP does not exist.
7. Several ASoC drivers are selected by other main sound switches.
8. USB_CDNS3_IMX is default with USB_CDNS3.
9. IPQ_APSS_5018 never existed.

No impact on include/generated/autoconf.h.

Reviewed-by: Linus Walleij <linusw@kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20260429-defconfig-v2-3-e4ed4186028b@oss.qualcomm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

arm64: defconfig: Drop unused legacy netfilter options

Drop all options which are ineffective now, because they depend on
disabled IP_NF_IPTABLES_LEGACY or IP6_NF_IPTABLES_LEGACY. No impact on
include/generated/autoconf.h.

Reviewed-by: Linus Walleij <linusw@kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20260429-defconfig-v2-2-e4ed4186028b@oss.qualcomm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

arm64: defconfig: Move entries to match savedefconfig

Only re-shuffle entries to match savedefconfig, without removing any
symbols. This helps in reviewing defconfig when comparing with
savedefconfig and looking for obsolete symbols. No impact on
include/generated/autoconf.h.

Reviewed-by: Linus Walleij <linusw@kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20260429-defconfig-v2-1-e4ed4186028b@oss.qualcomm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>