git.ipfire.org Git - thirdparty/kernel/linux.git/log

btrfs: rename struct btrfs_block_group field commit_used to last_used

Rename the field commit_used in struct btrfs_block_group to last_used,
for clarity and consistency with the similar fields we're about to add.
It's not obvious that commit_flags means "flags as of the last commit"
rather than "flags related to a commit".

Signed-off-by: Mark Harmstone <mark@harmstone.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: don't add metadata items for the remap tree to the extent tree

There is the following potential problem with the remap tree and delayed refs:

* Remapped extent freed in a delayed ref, which removes an entry from the
remap tree
* Remap tree now small enough to fit in a single leaf
* Corruption as we now have a level-0 block with a level-1 metadata item
in the extent tree

One solution to this would be to rework the remap tree code so that it operates
via delayed refs. But as we're hoping to remove cow-only metadata items in the
future anyway, change things so that the remap tree doesn't have any entries in
the extent tree. This also has the benefit of reducing write amplification.

We also make it so that the clear_cache mount option is a no-op, as with the
extent tree v2, as the free-space tree can no longer be recreated from the
extent tree.

Finally disable relocating the remap tree itself, which is added back in
a later patch. As it is we would get corruption as the traditional
relocation method walks the extent tree, and we're removing its metadata
items.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: remove remapped block groups from the free-space-tree

No new allocations can be done from block groups that have the REMAPPED
flag set, so there's no value in their having entries in the free-space
tree.

Prevent a search through the free-space tree being scheduled for such a
block group, and prevent any additions to the in-memory free-space tree.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: allow remapped chunks to have zero stripes

When a chunk has been fully remapped, we are going to set its
num_stripes to 0, as it will no longer represent a physical location on
disk.

Change tree-checker to allow for this, and fix read_one_chunk() to avoid
a divide by zero.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: add METADATA_REMAP chunk type

Add a new METADATA_REMAP chunk type, which is a metadata chunk that holds the
remap tree.

This is needed for bootstrapping purposes: the remap tree can't itself
be remapped, and must be relocated the existing way, by COWing every
leaf. The remap tree can't go in the SYSTEM chunk as space there is
limited, because a copy of the chunk item gets placed in the superblock.

The changes in fs/btrfs/volumes.h are because we're adding a new block
group type bit after the profile bits, and so can no longer rely on the
const_ilog2 trick.

The sizing to 32MB per chunk, matching the SYSTEM chunk, is an estimate
here, we can adjust it later if it proves to be too big or too small.
This works out to be ~500,000 remap items, which for a 4KB block size
covers ~2GB of remapped data in the worst case and ~500TB in the best case.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: add definitions and constants for remap-tree

Add an incompat flag for the new remap-tree feature, and the constants
and definitions needed to support it.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: add and use helper to compute the available space for a block group

We have currently three places that compute how much available space a
block group has. Add a helper function for this and use it in those
places.

Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: tag as unlikely error handling in run_one_delayed_ref()

We don't expect to get errors unless we have a corrupted fs, bad RAM or a
bug. So tag the error handling as unlikely.

This slightly reduces the module's text size on x86_64 using gcc 14.2.0-19
from Debian.

Before this change:

  $ size fs/btrfs/btrfs.ko
     text    data     bss     dec     hex filename
  1939458 172512   15592 2127562 2076ca fs/btrfs/btrfs.ko

After this change:

  $ size fs/btrfs/btrfs.ko
     text    data     bss     dec     hex filename
  1939398 172512   15592 2127502 20768e fs/btrfs/btrfs.ko

Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: remove unnecessary else branch in run_one_delayed_ref()

There is no need for an else branch to deal with an unexpected delayed ref
type. We can just change the previous branch to deal with this by checking
if the ref type is not BTRFS_EXTENT_OWNER_REF_KEY, since that branch is
useless as it only sets 'ret' to zero when it's already zero. So merge the
two branches.

Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: don't BUG() on unexpected delayed ref type in run_one_delayed_ref()

There is no need to BUG(), we can just return an error and log an error
message.

Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: use READA_FORWARD_ALWAYS for device extent verification

btrfs_verify_dev_extents() iterates through the entire device tree
during mount to verify dev extents against chunks. Since this function
scans the whole tree, READA_FORWARD_ALWAYS is more appropriate than
READA_FORWARD.

While the device tree is typically small (a few hundred KB even for
multi-TB filesystems), using the correct readahead mode for full-tree
iteration is more consistent with the intended usage.

Signed-off-by: robbieko <robbieko@synology.com>
Signed-off-by: jinbaohong <jinbaohong@synology.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: shrink the size of btrfs_device

There are two main causes of holes inside btrfs_device:

- The single bytes member of last_flush_error
  Not only it's a single byte member, but we never really care about the
  exact error number.

- The @devt member
  Which is placed between two u64 members.

Shrink the size of btrfs_device by:

- Use a single bit flag for flush error
  Use BTRFS_DEV_STATE_FLUSH_FAILED so that we no longer need that
  dedicated member.

- Move @devt to the hole after dev_stat_values[]

This reduces the size of btrfs_device from 528 to exact 512 bytes for
x86_64.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: update comment for delalloc flush and oe wait in btrfs_clone_files()

Make the comment more detailed about why we need to flush delalloc and
wait for ordered extent completion before attempting to invalidate the
page cache.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: remove experimental offload csum mode

The offload csum mode was introduced to allow developers to compare the
performance of generating checksum for data writes at different timings:

- During btrfs_submit_chunk()
  This is the most common one, if any of the following condition is met
  we go this path:

  * The csum is fast
    For now it's CRC32C and xxhash.

  * It's a synchronous write

  * Zoned

- Delay the checksum generation to a workqueue

However since commit dd57c78aec39 ("btrfs: introduce
btrfs_bio::async_csum") we no longer need to bother any of them.

As if it's an experimental build, async checksum generation at the
background will be faster anyway.

And if not an experimental build, we won't even have the offload csum
mode support.

Considering the async csum will be the new default, let's remove the
offload csum mode code.

There will be no impact to end users, and offload csum mode is still
under experimental features.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: split btrfs_fs_closing() and change return type to bool

There are two tests in btrfs_fs_closing() but checking the
BTRFS_FS_CLOSING_DONE bit is done only in one place
load_extent_tree_free(). As this is an inline we can reduce size of the
generated code. The types can be also changed to bool as this becomes a
simple condition.

   text    data     bss     dec     hex filename
1674006  146704   15560 1836270  1c04ee pre/btrfs.ko
1673772  146704   15560 1836036  1c0404 post/btrfs.ko

DELTA: -234

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: reject single block sized compression early

Currently for an inode that needs compression, even if there is a delalloc
range that is single fs block sized and can not be inlined, we will
still go through the compression path.

Then inside compress_file_range(), we have one extra check to reject
single block sized range, and fall back to regular uncompressed write.

This rejection is in fact a little too late, we have already allocated
memory to async_chunk, delayed the submission, just to fallback to the
same uncompressed write.

Change the behavior to reject such cases earlier at
inode_need_compress(), so for such single block sized range we won't
even bother trying to go through compress path.

And since the inline small block check is inside inode_need_compress()
and compress_file_range() also calls that function, we no longer need a
dedicate check inside compress_file_range().

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: update outdated comment in __add_block_group_free_space()

The function add_block_group_free_space() was renamed
btrfs_add_block_group_free_space() by commit 6fc5ef782988 ("btrfs:
add btrfs prefix to free space tree exported functions"). Update
the comment accordingly.

Do some reorganization of the next few lines to keep the comment
within 80 characters.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: add mount time auto fix for orphan fst entries

[BUG]
Before btrfs-progs v6.16.1 release, mkfs.btrfs can leave free space tree
entries for deleted chunks:

  # mkfs.btrfs -f -O fst $dev
  # btrfs ins dump-tree -t chunk $dev
  btrfs-progs v6.16
  chunk tree
  leaf 22036480 items 4 free space 15781 generation 8 owner CHUNK_TREE
  leaf 22036480 flags 0x1(WRITTEN) backref revision 1
item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 16185 itemsize 98
item 1 key (FIRST_CHUNK_TREE CHUNK_ITEM 13631488) itemoff 16105 itemsize 80
^^^ The first chunk is at 13631488
item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 22020096) itemoff 15993 itemsize 112
item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 30408704) itemoff 15881 itemsize 112

  # btrfs ins dump-tree -t free-space-tree $dev
  btrfs-progs v6.16
  free space tree key (FREE_SPACE_TREE ROOT_ITEM 0)
  leaf 30556160 items 13 free space 15918 generation 8 owner FREE_SPACE_TREE
  leaf 30556160 flags 0x1(WRITTEN) backref revision 1
item 0 key (1048576 FREE_SPACE_INFO 4194304) itemoff 16275 itemsize 8
free space info extent count 1 flags 0
item 1 key (1048576 FREE_SPACE_EXTENT 4194304) itemoff 16275 itemsize 0
free space extent
item 2 key (5242880 FREE_SPACE_INFO 8388608) itemoff 16267 itemsize 8
free space info extent count 1 flags 0
item 3 key (5242880 FREE_SPACE_EXTENT 8388608) itemoff 16267 itemsize 0
free space extent
^^^ Above 4 items are all before the first chunk.
item 4 key (13631488 FREE_SPACE_INFO 8388608) itemoff 16259 itemsize 8
free space info extent count 1 flags 0
item 5 key (13631488 FREE_SPACE_EXTENT 8388608) itemoff 16259 itemsize 0
free space extent
...

This can trigger btrfs check errors.

[CAUSE]
It's a bug in free space tree implementation of btrfs-progs, which
doesn't delete involved fst entries for the to-be-deleted chunk/block
group.

[ENHANCEMENT]
The mostly common fix is to clear the space cache and rebuild it, but
that requires a ro->rw remount which may not be possible for rootfs,
and also relies on users to use "clear_cache" mount option manually.

Here introduce a kernel fix for it, which will delete any entries that
is before the first block group automatically at the first RW mount.

For filesystems without such problem, the overhead is just a single tree
search and no modification to the free space tree, thus the overhead
should be minimal.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: simplify check for zoned NODATASUM writes in btrfs_submit_chunk()

This function already dereferences 'inode' multiple times earlier,
making the additional NULL check at line 840 redundant since the
function would have crashed already if inode were NULL.

After commit 81cea6cd7041 ("btrfs: remove btrfs_bio::fs_info by
extracting it from btrfs_bio::inode"), the btrfs_bio::inode field is
mandatory for all btrfs_bio allocations and is guaranteed to be
non-NULL.

Simplify the condition for allocating dummy checksums for zoned
NODATASUM data by removing the unnecessary 'inode &&' check.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: avoid transaction commit on error in insert_balance_item()

There's no point in committing the transaction if we failed to insert the
balance item, since we haven't done anything else after we started/joined
the transaction. Also stop using two variables for tracking the return
value and use only 'ret'.

Reviewed-by: Daniel Vacek <neelx@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: move unlikely checks around btrfs_is_shutdown() into the helper

Instead of surrounding every caller of btrfs_is_shutdown() with unlikely,
move the unlikely into the helper itself, like we do in other places in
btrfs and is common in the kernel outside btrfs too. Also make the fs_info
argument of btrfs_is_shutdown() const.

On a x86_84 box using gcc 14.2.0-19 from Debian, this resulted in a slight
reduction of the module's text size.

Before:

  $ size fs/btrfs/btrfs.ko
     text    data     bss     dec     hex filename
  1939044 172568   15592 2127204 207564 fs/btrfs/btrfs.ko

After:

  $ size fs/btrfs/btrfs.ko
     text    data     bss     dec     hex filename
  1938876 172568   15592 2127036 2074bc fs/btrfs/btrfs.ko

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: tag as unlikely error conditions in the transaction commit path

Errors are unexpected during the transaction commit path, and when they
happen we abort the transaction (by calling cleanup_transaction() under
the label 'cleanup_transaction' in btrfs_commit_transaction()). So mark
every error check in the transaction commit path as unlikely, to hint the
compiler so that it can possibly generate better code, and make it clear
for a reader about being unexpected.

On a x86_84 box using gcc 14.2.0-19 from Debian, this resulted in a slight
reduction of the module's text size.

Before:

  $ size fs/btrfs/btrfs.ko
     text    data     bss     dec     hex filename
  1939476 172568   15592 2127636 207714 fs/btrfs/btrfs.ko

After:

  $ size fs/btrfs/btrfs.ko
     text    data     bss     dec     hex filename
  1939044 172568   15592 2127204 207564 fs/btrfs/btrfs.ko

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: remove unreachable return after btrfs_backref_panic() in btrfs_backref_finish_upper_links()

The return statement after btrfs_backref_panic() is unreachable since
btrfs_backref_panic() calls BUG() which never returns. Remove the
return to unify it with the other calls to btrfs_backref_panic().

Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: refactor the main loop of cow_file_range()

Currently inside the main loop of cow_file_range(), we do the following
sequence:

- Reserve an extent
- Lock the IO tree range
- Create an IO extent map
- Create an ordered extent

Every step will need extra steps to do cleanup in the following order:

- Drop the newly created extent map
- Unlock extent range and cleanup the involved folios
- Free the reserved extent

However currently the error handling is done inconsistently:

- Extent map drop is handled in a dedicated tag
  Out of the main loop, make it much harder to track.

- The extent unlock and folios cleanup is done separately
  The extent is unlocked through btrfs_unlock_extent(), then
  extent_clear_unlock_delalloc() again in a dedicated tag.
  Meanwhile all other callsites (compression/encoded/nocow) all just
  call extent_clear_unlock_delalloc() to handle unlock and folio clean
  up in one go.

- Reserved extent freeing is handled in a dedicated tag
  Out of the main loop, make it much harder to track.

- Error handling of btrfs_reloc_clone_csums() is relying out-of-loop
  tags
  This is due to the special requirement to finish ordered extents to
  handle the metadata reserved space.

Enhance the error handling and align the behavior by:

- Introduce a dedicated cow_one_range() helper
  Which do the reserve/lock/allocation in the helper.

  And also handle the errors inside the helper.
  No more dedicated tags out of the main loop.

- Use a single extent_clear_unlock_delalloc() to unlock and cleanup
  folios

- Move the btrfs_reloc_clone_csums() error handling into the new helper
  Thankfully it's not that complex compared to other cases.

And since we're here, also reduce the width of the following local
variables to u32:

- cur_alloc_size
- min_alloc_size
  Each allocation won't go beyond 128M, thus u32 is more than enough.

- blocksize
  The maximum is 64K, no need for u64.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: zoned: print block-group type for zoned statistics

When printing the zoned statistics, also include the block-group type in
the block-group listing output.

The updated output looks as follows:

device /dev/vda mounted on /mnt with fstype btrfs
   zoned statistics:
         active block-groups: 9
           reclaimable: 0
           unused: 2
           need reclaim: false
         data relocation block-group: 3221225472
         active zones:
           start: 1073741824, wp: 268419072 used: 268419072, reserved: 0, unusable: 0 (DATA)
           start: 1342177280, wp: 0 used: 0, reserved: 0, unusable: 0 (DATA)
           start: 1610612736, wp: 81920 used: 16384, reserved: 16384, unusable: 49152 (SYSTEM)
           start: 1879048192, wp: 2031616 used: 1458176, reserved: 65536, unusable: 507904 (METADATA)
           start: 2147483648, wp: 268419072 used: 268419072, reserved: 0, unusable: 0 (DATA)
           start: 2415919104, wp: 268419072 used: 268419072, reserved: 0, unusable: 0 (DATA)
           start: 2684354560, wp: 268419072 used: 268419072, reserved: 0, unusable: 0 (DATA)
           start: 2952790016, wp: 65536 used: 65536, reserved: 0, unusable: 0 (DATA)
           start: 3221225472, wp: 0 used: 0, reserved: 0, unusable: 0 (DATA)

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: move space_info_flag_to_str() to space-info.h

Move space_info_flag_to_str() to space-info.h and as it now isn't static
to space-info.c any more prefix it with 'btrfs_'.

This way it can be re-used in other places.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: zoned: show statistics about zoned filesystems in mountstats

Add statistics output to /proc/<pid>/mountstats for zoned BTRFS, similar
to the zoned statistics from XFS in mountstats.

The output for /proc/<pid>/mountstats on an example filesystem will be as
follows:

  device /dev/vda mounted on /mnt with fstype btrfs
    zoned statistics:
          active block-groups: 7
            reclaimable: 0
            unused: 5
            need reclaim: false
          data relocation block-group: 1342177280
          active zones:
            start: 1073741824, wp: 268419072 used: 0, reserved: 268419072, unusable: 0
            start: 1342177280, wp: 0 used: 0, reserved: 0, unusable: 0
            start: 1610612736, wp: 49152 used: 16384, reserved: 16384, unusable: 16384
            start: 1879048192, wp: 950272 used: 131072, reserved: 622592, unusable: 196608
            start: 2147483648, wp: 212238336 used: 0, reserved: 212238336, unusable: 0
            start: 2415919104, wp: 0 used: 0, reserved: 0, unusable: 0
            start: 2684354560, wp: 0 used: 0, reserved: 0, unusable: 0

Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: don't call btrfs_handle_fs_error() in btrfs_commit_transaction()

There's no need to call btrfs_handle_fs_error() as we are inside a
transaction and if we get an error we jump to the 'scrub_continue' label
and end up calling cleanup_transaction(), which aborts the transaction.

This is odd given that we have a transaction handle and that in the
transaction commit path any error makes us abort the transaction and
it's the only place that calls btrfs_handle_fs_error().

Remove the btrfs_handle_fs_error() call and replace it with an error
message so that if it happens we know what went wrong during the
transaction commit. Also annotate the condition in the if statement
with 'unlikely' since this is not expected to happen.

We've been wanting to remove btrfs_handle_fs_error(), so this removes
one user that does not even needs it.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: don't call btrfs_handle_fs_error() in qgroup_account_snapshot()

There's no need to call btrfs_handle_fs_error() as we are inside a
transaction and we propagate the error returned from
btrfs_write_and_wait_transaction() to the caller and it ends going up the
call chain to btrfs_commit_transaction() (returned by the call to
create_pending_snapshots()), where we jump to the 'unlock_reloc' label
and end up calling cleanup_transaction(), which aborts the transaction.

This is odd given that we have a transaction handle and that in the
transaction commit path any error makes us abort the transaction and,
besides another place inside btrfs_commit_transaction(), it's the only
place that calls btrfs_handle_fs_error().

Remove the btrfs_handle_fs_error() call and replace it with an error
message so that if it happens we know what went wrong during the
transaction commit. Also annotate the condition in the if statement
with 'unlikely' since this is not expected to happen.

We've been wanting to remove btrfs_handle_fs_error(), so this removes
one user that does not even need it.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: don't call btrfs_handle_fs_error() after failure to delete orphan item

In btrfs_find_orphan_roots() we don't need to call btrfs_handle_fs_error()
if we fail to delete the orphan item for the current root. This is because
we haven't done anything yet regarding the current root and previous
iterations of the loop dealt with other roots, so there's nothing we need
to undo. Instead log an error message and return the error to the caller,
which will result either in a mount failure or remount failure (the only
contexts it's called from).

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: don't call btrfs_handle_fs_error() after failure to join transaction

In btrfs_find_orphan_roots() we don't need to call btrfs_handle_fs_error()
if we fail to join a transaction. This is because we haven't done anything
yet regarding the current root and previous iterations of the loop dealt
with other roots, so there's nothing we need to undo. Instead log an error
message and return the error to the caller, which will result either in
a mount failure or remount failure (the only contexts it's called from).

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: remove redundant path release in btrfs_find_orphan_roots()

There's no need to release the path in the if branch used when the root
does not exists since we released the path before the call to
btrfs_get_fs_root(). So remove that redundant btrfs_release_path() call.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: use single return variable in btrfs_find_orphan_roots()

We use both 'ret' and 'err' which is a pattern that generates confusion
and resulted in subtle bugs in the past. Remove 'err' and use only 'ret'.
Also move simplify the error flow by directly returning from the function
instead of breaking of the loop, since there are no resources to cleanup
after the loop.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: avoid transaction commit on error in del_balance_item()

There's no point in committing the transaction if we failed to delete the
item, since we haven't done anything before. Also stop using two variables
for tracking the return value and use only 'ret'.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: update stale comment in __cow_file_range_inline()

We mention that the reserved data space is page size aligned but that's
not true anymore, as it's sector size aligned instead.
In commit 0bb067ca64e3 ("btrfs: fix the qgroup data free range for inline
data extents") we updated the amount passed to btrfs_qgroup_free_data()
from page size to sector size, but forgot to update the comment.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: remove duplicated root key setup in btrfs_create_tree()

There's no need for an on stack key to define the root's key as we have
already defined the key in the root itself. So remove the stack variable
and use the key in the root.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: zoned: re-flow prepare_allocation_zoned()

Re-flow prepare allocation zoned to make it a bit more readable by
returning early and removing unnecessary indentations.

This patch does not change any functionality.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: shrink the size of btrfs_bio

This is done by:

- Shrink the size of btrfs_bio::mirror_num
  From 32 bits unsigned int to u16.

  Normally btrfs mirror number is either 0 (all profiles), 1 (all
  profiles), 2 (DUP/RAID1/RAID10/RAID5), 3 (RAID1C3) or 4 (RAID1C4).

  But for RAID6 the mirror number can go as large as the number of
  devices of that chunk.

  Currently the limit for number of devices for a data chunk is
  BTRFS_MAX_DEVS(), which is around 500 for the default 16K nodesize.
  And if going the max 64K nodesize, we can have a little over 2000
  devices for a chunk.

  Although I'd argue it's way overkilled, we don't reject such cases yet
  thus u8 is not going to cut it, and have to use u16 (max out at 64K).

- Use bit fields for boolean members
  Although it's not always safe for racy call sites, those members are
  safe.

  * csum_search_commit_root
  * is_scrub
    Those two are set immediately after bbio allocation and no more
    writes after allocation, thus they are very safe.

  * async_csum
  * can_use_append
    Those two are set for each split range, and after that there is no
    writes into those two members in different threads, thus they are
    also safe.

  And there are spaces for 4 more bits before increasing the size of
  btrfs_bio again, which should be future proof enough.

- Reorder the structure members
  Now we always put the largest member first (after the huge 120 bytes
  union), making it easier to fill any holes.

This reduce the size of btrfs_bio by 8 bytes, from 312 bytes to 304 bytes.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: remove ASSERT compatibility for gcc < 8.x

The minimum gcc version is 8 since 118c40b7b50340 ("kbuild: require
gcc-8 and binutils-2.30"), the workaround for missing __VA_OPT__ support
is not needed.

Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: pass level to _btrfs_printk() to avoid parsing level from string

There's code in _btrfs_printk() to parse the message level from the
input string so we can augment the message with the level description
for better visibility in the logs.

The parsing code has evolved over time, see commits:

- 40f7828b36e3b9 ("btrfs: better handle btrfs_printk() defaults")

- 262c5e86fec7cf ("printk/btrfs: handle more message headers")

- 533574c6bc30cf ("btrfs: use printk_get_level and printk_skip_level, add __printf, fix fallout")

- 4da35113426d16 ("btrfs: add varargs to btrfs_error")

As we are using the specific level helpers everywhere we can simply pass
the message level so we don't have to parse it. The proper printk()
message header is created as KERN_SOH + "level".

Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: simplify internal btrfs_printk helpers

The printk() can be compiled out depending on CONFIG_PRINTK, this is
reflected in our helpers. The indirection is provided by btrfs_printk()
used in the ratelimited and RCU wrapper macros.

Drop the btrfs_printk() helper and define the ratelimit and RCU helpers
directly when CONFIG_PRINTK is undefined. This will allow further
changes to the _btrfs_printk() interface (which is internal), any
message in other code should use the level-specific helpers.

Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: rename btrfs_create_block_group_cache to btrfs_create_block_group

struct btrfs_block_group used to be called struct
btrfs_block_group_cache but got renamed to btrfs_block_group with
commit 32da5386d9a4 ("btrfs: rename btrfs_block_group_cache").

Rename btrfs_create_block_group_cache() to btrfs_create_block_group() to
reflect that change.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: merge setting ret and return ret

In many places we have pattern:

ret = ...;
return ret;

This can be simplified to a direct return, removing 'ret' if not
otherwise needed. The places in self tests are not converted so we can
add more test cases without changing surrounding code
(extent-map-tests.c:test_case_4()).

Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: remove dead assignment in prepare_one_folio()

In prepare_one_folio(), ret is initialized to 0 at declaration,
and in an error path we assign ret = 0 before jumping to the
again label to retry the operation. However, ret is immediately
overwritten by ret = set_folio_extent_mapped(folio) after the
again label.

Both assignments are never observed by any code path,
therefore they can be safely removed.

Signed-off-by: Massimiliano Pellizzer <mpellizzer.dev@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: replace for_each_set_bit() with for_each_set_bitmap()

Inside extent_io.c, there are several simple call sites doing things
like:

for_each_set_bit(bit, bitmap, bitmap_size) {
/* handle one fs block */
}

The workload includes:

- set_bit()
  Inside extent_writepage_io().

  This can be replaced with a bitmap_set().

- btrfs_folio_set_lock()
- btrfs_mark_ordered_io_finished()
  Inside writepage_delalloc().

  Instead of calling it multiple times, we can pass a range into the
  function with one call.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: concentrate the error handling of submit_one_sector()

Currently submit_one_sector() has only one failure path from
btrfs_get_extent().

However the error handling is split into two parts, one inside
submit_one_sector(), which clears the dirty flag and finishes the
writeback for the fs block.

The other part is to submit any remaining bio inside bio_ctrl and mark
the ordered extent finished for the fs block.

There is no special reason that we must split the error handling, let's
just concentrate all the error handling into submit_one_sector().

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: search for larger extent maps inside btrfs_do_readpage()

[CORNER CASE]
If we have the following file extents layout, btrfs_get_extent() can
return a smaller hole during read, and cause unnecessary extra tree
searches:

item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53
generation 9 type 1 (regular)
extent data disk byte 13631488 nr 4096
extent data offset 0 nr 4096 ram 4096
extent compression 0 (none)

item 7 key (257 EXTENT_DATA 32768) itemoff 15757 itemsize 53
generation 9 type 1 (regular)
extent data disk byte 13635584 nr 4096
extent data offset 0 nr 4096 ram 4096
extent compression 0 (none)

In above case, range [0, 4K) and [32K, 36K) are regular extents, and
there is a hole in range [4K, 32K), and the fs has "no-holes" feature,
meaning the hole will not have a file extent item.

[INEFFICIENCY]
Assume the system has 4K page size, and we're doing readahead for range
[4K, 32K), no large folio yet.

btrfs_readahead() for range [4K, 32K)
|- btrfs_do_readpage() for folio 4K
|  |- get_extent_map() for range [4K, 8K)
|     |- btrfs_get_extent() for range [4K, 8K)
|        We hit item 6, then for the next item 7.
|        At this stage we know range [4K, 32K) is a hole.
|        But our search range is only [4K, 8K), not reaching 32K, thus
|        we go into not_found: tag, returning a hole em for [4K, 8K).
|
|- btrfs_do_readpage() for folio 8K
|  |- get_extent_map() for range [8K, 12K)
|     |- btrfs_get_extent() for range [8K, 12K)
|        We hit the same item 6, and then item 7.
|        But still we goto not_found tag, inserting a new hole em,
|        which will be merged with previous one.
|
| [ Repeat the same btrfs_get_extent() calls until the end ]

So we're calling btrfs_get_extent() again and again, just for a
different part of the same hole range [4K, 32K).

[ENHANCEMENT]
Make btrfs_do_readpage() to search for a larger extent map if readahead
is involved.

For btrfs_readahead() we have bio_ctrl::ractl set, and lock extents for
the whole readahead range.

If we find bio_ctrl::ractl is set, we can use that end range as extent
map search end, this allows btrfs_get_extent() to return a much larger
hole, thus reduce the need to call btrfs_get_extent() again and again.

btrfs_readahead() for range [4K, 32K)
|- btrfs_do_readpage() for folio 4K
|  |- get_extent_map() for range [4K, 32K)
|     |- btrfs_get_extent() for range [4K, 32K)
|        We hit item 6, then for the next item 7.
|        At this stage we know range [4K, 32K) is a hole.
|        So the hole em for range [4K, 32K) is returned.
|
|- btrfs_do_readpage() for folio 8K
|  |- get_extent_map() for range [8K, 32K)
|     The cached hole em range [4K, 32K) covers the range,
|     and reuse that em.
|
| [ Repeat the same btrfs_get_extent() calls until the end ]

Now we only call btrfs_get_extent() once for the whole range [4K, 32K),
other than the old 8 times.

Such change will reduce the overhead of reading large holes a little.
For current experimental build (with larger folios) on aarch64, there
will be a tiny but consistent ~1% improvement reading a large hole file:

Reading a 1GiB sparse file (all hole) using xfs_io, with 64K block
size, the result is the time needed to read the whole file, reported
from xfs_io.

32 runs, experimental build (with large folios).

64K page size, 4K fs block size.

- Avg before: 0.20823 s
- Avg after:  0.20635 s
- Diff:   -0.9%

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: introduce BTRFS_PATH_AUTO_RELEASE() helper

There are already several bugs with on-stack btrfs_path involved, even
it is already a little safer than btrfs_path pointers (only leaks the
extent buffers, not the btrfs_path structure itself)

- Patch "btrfs: make sure extent and csum paths are always released in
  scrub_raid56_parity_stripe()"

- Patch "btrfs: fix a potential path leak in print_data_reloc_error()"

Thus there is a real need to apply auto release for those on-stack paths.

Introduces a new macro, BTRFS_PATH_AUTO_RELEASE() which defines one
on-stack btrfs_path structure, initialize it all to 0, then call
btrfs_release_path() on it when exiting the scope.

This applies to current 3 on-stack path usages:

- defrag_get_extent() in defrag.c

- print_data_reloc_error() in inode.c
  There is a special case where we want to release the path early before
  the time consuming iterate_extent_inodes() call, thus that manual
  early release is kept as is, with an extra comment added.

- scrub_radi56_parity_stripe() in scrub.c

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: enable direct IO for bs > ps cases

Previously direct IO was disabled if the fs block size was larger than
the page size, the reasons are:

- Iomap direct IO can split the range ignoring the fs block alignment
  Which could trigger the bio size check from btrfs_submit_bio().

- The buffer is only ensured to be contiguous in user space memory
  The underlying physical memory is not ensured to be contiguous, and
  that can cause problems for the checksum generation/verification and
  RAID56 handling.

However the above problems are solved by the following upstream commits:

- 001397f5ef49 ("iomap: add IOMAP_DIO_FSBLOCK_ALIGNED flag")
  Which added an extra flag that can be utilized by the fs to ensure
  the bio submitted by iomap is always aligned to fs block size.

- ec20799064c8 ("btrfs: enable encoded read/write/send for bs > ps cases")
- 8870dbeedcf9 ("btrfs: raid56: enable bs > ps support")
  Which makes btrfs to handle bios that are not backed by large folios
  but still are aligned to fs block size.

As the commits have been merged we can enable direct IO support for
bs > ps cases.

Reviewed-by: Neal Gompa <neal@gompa.dev>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: switch to library APIs for checksums

Make btrfs use the library APIs instead of crypto_shash, for all
checksum computations.  This has many benefits:

- Allows future checksum types, e.g. XXH3 or CRC64, to be more easily
  supported.  Only a library API will be needed, not crypto_shash too.

- Eliminates the overhead of the generic crypto layer, including an
  indirect call for every function call and other API overhead.  A
  microbenchmark of btrfs_check_read_bio() with crc32c checksums shows a
  speedup from 658 cycles to 608 cycles per 4096-byte block.

- Decreases the stack usage of btrfs by reducing the size of checksum
  contexts from 384 bytes to 240 bytes, and by eliminating the need for
  some functions to declare a checksum context at all.

- Increases reliability.  The library functions always succeed and
  return void.  In contrast, crypto_shash can fail and return errors.
  Also, the library functions are guaranteed to be available when btrfs
  is loaded; there's no longer any need to use module softdeps to try to
  work around the crypto modules sometimes not being loaded.

- Fixes a bug where blake2b checksums didn't work on kernels booted with
  fips=1.  Since btrfs checksums are for integrity only, it's fine for
  them to use non-FIPS-approved algorithms.

Note that with having to handle 4 algorithms instead of just 1-2, this
commit does result in a slightly positive diffstat.  That being said,
this wouldn't have been the case if btrfs had actually checked for
errors from crypto_shash, which technically it should have been doing.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Neal Gompa <neal@gompa.dev>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: zoned: don't zone append to conventional zone

In case of a zoned RAID, it can happen that a data write is targeting a
sequential write required zone and a conventional zone. In this case the
bio will be marked as REQ_OP_ZONE_APPEND but for the conventional zone,
this needs to be REQ_OP_WRITE.

The setting of REQ_OP_ZONE_APPEND is deferred to the last possible time in
btrfs_submit_dev_bio(), but the decision if we can use zone append is
cached in btrfs_bio.

CC: Naohiro Aota <naohiro.aota@wdc.com>
Fixes: e9b9b911e03c ("btrfs: add raid stripe tree to features enabled with debug config")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: relax squota parent qgroup deletion rule

Currently, with squotas, we do not allow removing a parent qgroup with
no members if it still has usage accounted to it. This makes it really
difficult to recover from accounting bugs, as we have no good way of
getting back to 0 usage.

Instead, allow deletion (it's safe at 0 members..) while still warning
about the inconsistency by adding a squota parent check.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: check squota parent usage on membership change

We could have detected the quick inherit bug more directly if we had
an extra warning about squota hierarchy consistency while modifying the
hierarchy. In squotas, the parent usage always simply adds up to the sum of
its children, so we can just check for that when changing membership and
detect more accounting bugs.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: simplify boolean argument for btrfs_inc_ref()/btrfs_dec_ref()

Replace open-coded if/else blocks with the boolean directly and introduce
local const bool variables, making the code shorter and easier to read.

Signed-off-by: Sun YangKai <sunk67188@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: use true/false for boolean parameters in btrfs_inc_ref()/btrfs_dec_ref()

Replace integer literals 0/1 with true/false when calling
btrfs_inc_ref() and btrfs_dec_ref() to make the code self-documenting
and avoid mixing bool/integer types.

Signed-off-by: Sun YangKai <sunk67188@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: update comment for visit_node_for_delete()

Drop the obsolete @refs parameter from the comment so the argument list
matches the current function signature after commit f8c4d59de23c9
("btrfs: drop unused parameter refs from visit_node_for_delete()").

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Sun YangKai <sunk67188@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Linux 6.19-rc8

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Small changes in drivers only, no core changes.

  The firewire one fixes a user controlled overflow (but I still can't
  see how it could be exploited)"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: amd-versal2: Fix PHY initialization in HCE enable notify
  scsi: firewire: sbp-target: Fix overflow in sbp_make_tpg()
  scsi: be2iscsi: Fix a memory leak in beiscsi_boot_get_sinfo()
  scsi: qla2xxx: edif: Fix dma_free_coherent() size

Merge tag 'perf-urgent-2026-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf events fix from Ingo Molnar:
"Fix a race in the user-callchains code"

* tag 'perf-urgent-2026-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: sched: Fix perf crash with new is_user_task() helper

Merge tag 'sched-urgent-2026-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fix from Ingo Molnar:
"Fix a regression in the deferrable dl_server code that can cause the
dl_server to be stuck"

* tag 'sched-urgent-2026-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/deadline: Fix 'stuck' dl_server

Merge tag 'objtool-urgent-2026-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool fixes from Ingo Molnar:

- Fix a build error on ia32-x86_64 cross builds

- Replace locally open coded ALIGN_UP(), ALIGN_UP_POW2()
   and MAX(), which, beyond being duplicates, the
   ALIGN_UP_POW2() is also buggy

- Fix objtool klp-diff regression caused by a recent
   change to the bug table format

- Fix klp-build vs CONFIG_MODULE_SRCVERSION_ALL build
   failure

* tag 'objtool-urgent-2026-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  livepatch/klp-build: Fix klp-build vs CONFIG_MODULE_SRCVERSION_ALL
  objtool/klp: Fix bug table handling for __WARN_printf()
  objtool: Replace custom macros in elf.c with shared ones
  objtool: Print bfd_vma as unsigned long long on ia32-x86_64 cross build

Merge tag 'irq-urgent-2026-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Ingo Molnar:
"Misc irqchip fixes:

   - Fix a regression in the ls-extirq irqchip driver

   - Fix an irqchip platform enumeration regression
     in the simple-pm-bus driver"

* tag 'irq-urgent-2026-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  bus: simple-pm-bus: Probe the Layerscape SCFG node
  irqchip/ls-extirq: Convert to a platform driver to make it work again

Merge tag 'iommu-fixes-v6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu fixes from Joerg Roedel:

- Fix a performance regression cause by the new Generic IO-Page-Table
   code detected in Intel VT-d driver

- Command queue flushing fix for NVidia version of the ARM-SMMU-v3

* tag 'iommu-fixes-v6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
  iommu/tegra241-cmdqv: Reset VCMDQ in tegra241_vcmdq_hw_init_user()
  iommupt: Only cache flush memory changed by unmap

Merge tag 'efi-fixes-for-v6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI fix from Ard Biesheuvel:

- Fix regression in efivarfs error propagation

* tag 'efi-fixes-for-v6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efivarfs: fix error propagation in efivar_entry_get()

Merge tag 'sound-6.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Just a few device-specific fixes; all small and mostly trivial, should
  be pretty safe to take at the late stage"

* tag 'sound-6.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ASoC: sof_sdw: Add a quirk for Lenovo laptop using sidecar amps with cs42l43
  ASoC: Intel: sof_es8336: fix headphone GPIO logic inversion
  ASoC: amd: yc: Add DMI quirk for Acer TravelMate P216-41-TCO
  ASoC: soc-acpi-intel-ptl-match: fix name_prefix of rt1320-2
  ALSA: hda/realtek: Add quirk for Inspur S14-G1
  ALSA: hda/realtek: fix right sounds and mute/micmute LEDs for HP machine
  ALSA: hda/realtek: Really fix headset mic for TongFang X6AR55xU.
  ALSA: hda/realtek - fixed speaker no sound
  ASoC: amd: yc: Add ASUS ExpertBook PM1503CDA to quirks list
  ASoC: fsl: imx-card: Do not force slot width to sample width
  ASoC: dt-bindings: fsl,sai: Add support for i.MX952 platform
  ASoC: cs35l45: Corrects ASP_TX5 DAPM widget channel

Merge tag 'kbuild-fixes-6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux

Pull Kbuild fixes from Nicolas Schier:

- Generate rpm-pkg debuginfo package manually, allowing signed kernel
   modules in rpm package, again

- Fix permissions of modules.builtin.modinfo

- Do not run kernel-doc when building external modules

* tag 'kbuild-fixes-6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
  kbuild: Do not run kernel-doc when building external modules
  kbuild: Fix permissions of modules.builtin.modinfo
  kbuild: rpm-pkg: Generate debuginfo package manually

kbuild: Do not run kernel-doc when building external modules

After commit 778b8ebe5192 ("docs: Move the python libraries to
tools/lib/python"), building an external module with any value of W=
against the output of install-extmod-build fails with:

  $ make -C /usr/lib/modules/6.19.0-rc7-00108-g4d310797262f/build M=$PWD W=1
  make: Entering directory '/usr/lib/modules/6.19.0-rc7-00108-g4d310797262f/build'
  make[1]: Entering directory '...'
    CC [M] ...
  Traceback (most recent call last):
    File "/usr/lib/modules/6.19.0-rc7-00108-g4d310797262f/build/scripts/kernel-doc.py", line 339, in <module>
      main()
      ~~~~^^
    File "/usr/lib/modules/6.19.0-rc7-00108-g4d310797262f/build/scripts/kernel-doc.py", line 295, in main
      from kdoc.kdoc_files import KernelFiles             # pylint: disable=C0415
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  ModuleNotFoundError: No module named 'kdoc'

scripts/lib was included in the build directory from find_in_scripts but
after the move to tools/lib/python, it is no longer included, breaking
kernel-doc.py.

Commit eba6ffd126cd ("docs: kdoc: move kernel-doc to tools/docs") breaks
this even further by moving kernel-doc outside of scripts as well, so it
cannot be found when called by cmd_checkdoc.

  $ make -C /usr/lib/modules/6.19.0-rc7-next-20260130/build M=$PWD W=1
  make: Entering directory '/usr/lib/modules/6.19.0-rc7-next-20260130/build'
  make[1]: Entering directory '...'
    CC [M]  ...
  python3: can't open file '/usr/lib/modules/6.19.0-rc7-next-20260130/build/tools/docs/kernel-doc': [Errno 2] No such file or directory

While kernel-doc could be useful for external modules, it is more useful
for in-tree documentation that will be build and included in htmldocs.
Rather than including it in install-extmod-build, just skip running
kernel-doc for the external module build.

Cc: stable@vger.kernel.org
Fixes: 778b8ebe5192 ("docs: Move the python libraries to tools/lib/python")
Reported-by: Rong Zhang <i@rong.moe>
Closes: https://lore.kernel.org/20260129175321.415295-1-i@rong.moe/
Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20260130-kbuild-skip-kernel-doc-extmod-v1-1-58443d60131a@kernel.org
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Signed-off-by: Nicolas Schier <nsc@kernel.org>

iommu/tegra241-cmdqv: Reset VCMDQ in tegra241_vcmdq_hw_init_user()

The Enable bits in CMDQV/VINTF/VCMDQ_CONFIG registers do not actually reset
the HW registers. So, the driver explicitly clears all the registers when a
VINTF or VCMDQ is being initialized calling its hw_deinit() function.

However, a userspace VCMDQ is not properly reset, unlike an in-kernel VCMDQ
getting reset in tegra241_vcmdq_hw_init().

Meanwhile, tegra241_vintf_hw_init() calling tegra241_vintf_hw_deinit() will
not deinit any VCMDQ, since there is no userspace VCMDQ mapped to the VINTF
at that stage.

Then, this may result in dirty VCMDQ registers, which can fail the VM.

Like tegra241_vcmdq_hw_init(), reset a VCMDQ in tegra241_vcmdq_hw_init() to
fix this bug. This is required by a host kernel.

Fixes: 6717f26ab1e7 ("iommu/tegra241-cmdqv: Add user-space use support")
Cc: stable@vger.kernel.org
Reported-by: Bao Nguyen <ncqb@google.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd fix from Jason Gunthorpe:
"One fix for a harmless KMSAN splat"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
iommufd: Initialize batch->kind in batch_clear()

Merge tag 'firewire-fixes-6.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394

Pull firewire fix from Takashi Sakamoto:
"Fix a race condition introduced in v6.18.

  Andreas Persson discovered this issue while working with Focusrite
  Saffire Pro 40 (TCD33070). The fw_card instance maintains a linked
  list of pending transactions, which must be protected against
  concurrent access.

  However, a commit b5725cfa4120 ("firewire: core: use spin lock
  specific to timer for split transaction") unintentionally allowed
  concurrent accesses to this list.

  Fix this by adjusting the relevant critical sections to properly
  serialize access"

* tag 'firewire-fixes-6.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: core: fix race condition against transaction list

Merge tag 'riscv-for-linus-6.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Paul Walmsley:

- Correct the RISC-V compat.h COMPAT_UTS_MACHINE architecture name

- Avoid printing a false warning message on kernels with the SiFive and
   MIPS errata compiled in

- Address a few warnings generated by sparse in the signal handling
   code

- Fix a comment typo

* tag 'riscv-for-linus-6.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: compat: fix COMPAT_UTS_MACHINE definition
  errata/sifive: remove unreliable warn_miss_errata
  riscv: fix minor typo in syscall.h comment
  riscv: signal: fix some warnings reported by sparse

Merge tag 'rust-fixes-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux

Pull Rust fixes from Miguel Ojeda:
"Toolchain and infrastructure:

   - Trigger rebuilds of the newly added 'proc-macro2' crate (and its
     dependencies) when the Rust compiler version changes

   - Fix error in '.rsi' targets (macro expanding single targets) under
     'O=' pointing to an external (not subdir) folder

   - Fix off-by-one line number in 'rustdoc' KUnit tests

   - Add '-fdiagnostics-show-context' to GCC flags skipped by 'bindgen'

   - Clean objtool warning by adding one more 'noreturn' function

   - Clean 'libpin_init_internal.{so,dylib}' in 'mrproper'

  'kernel' crate:

   - Fix build error when using expressions in formatting arguments

   - Mark 'num::Bounded::__new()' as unsafe and clean documentation
     accordingly

   - Always inline functions using 'build_assert' with arguments

   - Fix 'rusttest' build error providing the right 'isize_atomic_repr'
     type for the host

  'macros' crate:

   - Fix 'rusttest' build error by ignoring example

  rust-analyzer:

   - Remove assertion that was not true for distributions like NixOS

   - Add missing dependency edges and fix editions for 'quote' and
     sysroot crates to provide correct IDE support

  DRM Tyr:

   - Fix build error by adding missing dependency on 'CONFIG_COMMON_CLK'

  Plus clean a few typos in docs and comments"

* tag 'rust-fixes-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: (28 commits)
  rust: num: bounded: clean __new documentation and comments
  scripts: generate_rust_analyzer: fix resolution of #[pin_data] macros
  drm/tyr: depend on `COMMON_CLK` to fix build error
  rust: sync: atomic: Provide stub for `rusttest` 32-bit hosts
  kbuild: rust: clean libpin_init_internal in mrproper
  rust: proc-macro2: rebuild if the version text changes
  rust: num: bounded: add missing comment for always inlined function
  rust: sync: refcount: always inline functions using build_assert with arguments
  rust: bits: always inline functions using build_assert with arguments
  scripts: generate_rust_analyzer: compile sysroot with correct edition
  scripts: generate_rust_analyzer: compile quote with correct edition
  scripts: generate_rust_analyzer: quote: treat `core` and `std` as dependencies
  scripts: generate_rust_analyzer: syn: treat `std` as a dependency
  scripts: generate_rust_analyzer: remove sysroot assertion
  rust: kbuild: give `--config-path` to `rustfmt` in `.rsi` target
  scripts: generate_rust_analyzer: Add pin_init_internal deps
  scripts: generate_rust_analyzer: Add pin_init -> compiler_builtins dep
  scripts: generate_rust_analyzer: Add compiler_builtins -> core dep
  rust: macros: ignore example with module parameters
  rust: num: bounded: mark __new as unsafe
  ...

perf: sched: Fix perf crash with new is_user_task() helper

In order to do a user space stacktrace the current task needs to be a user
task that has executed in user space. It use to be possible to test if a
task is a user task or not by simply checking the task_struct mm field. If
it was non NULL, it was a user task and if not it was a kernel task.

But things have changed over time, and some kernel tasks now have their
own mm field.

An idea was made to instead test PF_KTHREAD and two functions were used to
wrap this check in case it became more complex to test if a task was a
user task or not[1]. But this was rejected and the C code simply checked
the PF_KTHREAD directly.

It was later found that not all kernel threads set PF_KTHREAD. The io-uring
helpers instead set PF_USER_WORKER and this needed to be added as well.

But checking the flags is still not enough. There's a very small window
when a task exits that it frees its mm field and it is set back to NULL.
If perf were to trigger at this moment, the flags test would say its a
user space task but when perf would read the mm field it would crash with
at NULL pointer dereference.

Now there are flags that can be used to test if a task is exiting, but
they are set in areas that perf may still want to profile the user space
task (to see where it exited). The only real test is to check both the
flags and the mm field.

Instead of making this modification in every location, create a new
is_user_task() helper function that does all the tests needed to know if
it is safe to read the user space memory or not.

[1] https://lore.kernel.org/all/20250425204120.639530125@goodmis.org/

Fixes: 90942f9fac05 ("perf: Use current->flags & PF_KTHREAD|PF_USER_WORKER instead of current->mm == NULL")
Closes: https://lore.kernel.org/all/0d877e6f-41a7-4724-875d-0b0a27b8a545@roeck-us.net/
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260129102821.46484722@gandalf.local.home

sched/deadline: Fix 'stuck' dl_server

Andrea reported the dl_server getting stuck for him. He tracked it
down to a state where dl_server_start() saw dl_defer_running==1, but
the dl_server's job is no longer valid at the time of
dl_server_start().

In the state diagram this corresponds to [4] D->A (or dl_server_stop()
due to no more runnable tasks) followed by [1], which in case of a
lapsed deadline must then be A->B.

Now our A has dl_defer_running==1, while B demands
dl_defer_running==0, therefore it must get cleared when the CBS wakeup
rules demand a replenish.

Fixes: a110a81c52a9 ("sched/deadline: Deferrable dl server")
Reported-by: Andrea Righi arighi@nvidia.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Tested-by: Andrea Righi arighi@nvidia.com
Link: https://lkml.kernel.org/r/20260123161645.2181752-1-arighi@nvidia.com
Link: https://patch.msgid.link/20260130124100.GC1079264@noisy.programming.kicks-ass.net

Merge tag 'block-6.19-20260130' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block fixes from Jens Axboe:

- Fix for an accounting leak in bcache that's been there forever,
   and a related dead code removal

- Revert of a fix for rnbd that went into this series, but depends
   on other changes that are staged for 7.0

- NVMe pull request via Keith:
      - TCP target completion race condition fix (Ming)
      - DMA descriptor cleanup fix (Roger)

* tag 'block-6.19-20260130' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  bcache: fix I/O accounting leak in detached_dev_do_request
  bcache: remove dead code in detached_dev_do_request
  nvme-pci: DMA unmap the correct regions in nvme_free_sgls
  Revert "rnbd-clt: fix refcount underflow in device unmap path"
  nvmet: fix race in nvmet_bio_done() leading to NULL pointer dereference

Merge tag 'dma-mapping-6.19-2026-01-30' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux

Pull dma-mapping fixes from Marek Szyprowski:

- important fix for ARM 32-bit based systems using cma= kernel
   parameter (Oreoluwa Babatunde)

- a fix for the corner case of the DMA atomic pool based allocations
   (Sai Sree Kartheek Adivi)

* tag 'dma-mapping-6.19-2026-01-30' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
  dma/pool: distinguish between missing and exhausted atomic pools
  of: reserved_mem: Allow reserved_mem framework detect "cma=" kernel param

Merge tag 'gpio-fixes-for-v6.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:
"Over the last week I received quite an unexpected (for rc7) number of
  fixes but they are all pretty small and mostly limited to drivers:

   - don't call into pinctrl when setting direction in gpio-rockchip as
     it's not needed and may trigger locking context errors

   - change spinlock to raw_spinlock in gpio-sprd

   - fix a use-after-free bug in gpio-virtuser

   - don't register a driver from another driver's probe() in gpio-omap

   - fix int width problems in GPIO ACPI code

   - fix interrupt-to-pin mapping in gpio-brcmstb

   - mask interrupts in irq shutdown in gpio-pca953x"

* tag 'gpio-fixes-for-v6.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpiolib: acpi: Fix potential out-of-boundary left shift
  gpio: brcmstb: correct hwirq to bank map
  gpio: omap: do not register driver in probe()
  gpio: pca953x: mask interrupts in irq shutdown
  gpio: virtuser: fix UAF in configfs release path
  gpiolib: acpi: use BIT_ULL() for u64 mask in address space handler
  gpio: sprd: Change sprd_gpio lock to raw_spin_lock
  gpio: rockchip: Stop calling pinctrl for set_direction

Merge tag 'drm-fixes-2026-01-30' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"Seems to be a bit quieter this week, mostly xe and amdgpu, with msm
  and imx fixes and one WARN_ON from user blocked. Nothing of note
  outstanding either.

  uapi:
   - Fix a WARN_ON() when passing an invalid handle to
     drm_gem_change_handle_ioctl()

  msm:
   - GPU:
      - Fix bogus hwcg register update for a690

  xe:
   - Skip address copy for sync-only execs
   - Fix a WA
   - Derive mem_copy cap from graphics version
   - Fix is_bound() pci_dev lifetime
   - xe nvm cleanup fixes

  amdgpu:
   - SMU 13 fixes
   - SMU 14 fixes
   - GPUVM fault filter fix
   - Powergating fix
   - HDMI debounce fix
   - Xclk fix for soc21 APUs
   - Fix COND_EXEC handling for GC 11
   - GC 10-12 KGQ init fixes
   - GC 11-12 KGQ reset fixes

  imx/tve:
   - drop ddc device reference when unloading"

* tag 'drm-fixes-2026-01-30' of https://gitlab.freedesktop.org/drm/kernel: (21 commits)
  drm/xe/nvm: Fix double-free on aux add failure
  drm/xe/nvm: Manage nvm aux cleanup with devres
  drm/amdgpu/gfx12: adjust KGQ reset sequence
  drm/amdgpu/gfx11: adjust KGQ reset sequence
  drm/amdgpu/gfx12: fix wptr reset in KGQ init
  drm/amdgpu/gfx11: fix wptr reset in KGQ init
  drm/amdgpu/gfx10: fix wptr reset in KGQ init
  drm/xe/configfs: Fix is_bound() pci_dev lifetime
  drm/amdgpu: Fix cond_exec handling in amdgpu_ib_schedule()
  drm/amdgpu/soc21: fix xclk for APUs
  drm/amd/display: Clear HDMI HPD pending work only if it is enabled
  drm/imx/tve: fix probe device leak
  drm/amd/pm: fix race in power state check before mutex lock
  drm/amdgpu: fix NULL pointer dereference in amdgpu_gmc_filter_faults_remove
  drm/amd/pm: fix smu v14 soft clock frequency setting issue
  drm/amd/pm: fix smu v13 soft clock frequency setting issue
  drm/xe: derive mem copy capability from graphics version
  drm/xe/xelp: Fix Wa_18022495364
  drm/xe: Skip address copy for sync-only execs
  drm: Do not allow userspace to trigger kernel warnings in drm_gem_change_handle_ioctl()
  ...

Merge tag 'drm-misc-fixes-2026-01-29' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

drm-misc-fixes for v6.19-rc8:
- Fix a WARN_ON() when passing an invalid handle to
drm_gem_change_handle_ioctl()
- drop ddc device reference when unloading in imx/tve.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patch.msgid.link/a34d967e-b111-4b29-8c97-af3e77b5d33e@linux.intel.com

Merge tag 'amd-drm-fixes-6.19-2026-01-29' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.19-2026-01-29:

amdgpu:
- SMU 13 fixes
- SMU 14 fixes
- GPUVM fault filter fix
- Powergating fix
- HDMI debounce fix
- Xclk fix for soc21 APUs
- Fix COND_EXEC handling for GC 11
- GC 10-12 KGQ init fixes
- GC 11-12 KGQ reset fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20260129212518.22274-1-alexander.deucher@amd.com

Merge tag 'drm-xe-fixes-2026-01-29' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

Driver Changes:
- Skip address copy for sync-only execs (Lin)
- Fix a WA (Tvrtko)
- Derive mem_copy cap from graphics version (Nitin)
- Fix is_bound() pci_dev lifetime (Lin)
- xe nvm cleanup fixes (Lin)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/aXu_JzBFb9YVFYW1@fedora

Merge tag 'pm-6.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
"This adds a terminating NULL entry to an of_device_id table in the
qcom-nvmem cpufreq driver to avoid out-of-bounds access (Pei Xiao)"

* tag 'pm-6.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: qcom-nvmem: add sentinel to qcom_cpufreq_ipq806x_match_list

Merge tag 'mtd/fixes-for-6.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux

Pull MTD fix from Miquel Raynal:
"A single late MTD fix, which reverts a fix that turned out to be
  incorrect.

  The observations of the committer was that the number of IDs to be
  used to probe a chip was incorrect. It happened to be a limitation of
  his controller, not a chip issue. Restore the chip description, a
  solution must be found somewhere else"

* tag 'mtd/fixes-for-6.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
  Revert "mtd: spinand: esmt: fix id code for F50D1G41LB"

drm/xe/nvm: Fix double-free on aux add failure

After a successful auxiliary_device_init(), aux_dev->dev.release
(xe_nvm_release_dev()) is responsible for the kfree(nvm). When
there is failure with auxiliary_device_add(), driver will call
auxiliary_device_uninit(), which call put_device(). So that the
.release callback will be triggered to free the memory associated
with the auxiliary_device.

Move the kfree(nvm) into the auxiliary_device_init() failure path
and remove the err goto path to fix below error.

"
[   13.232905] ==================================================================
[   13.232911] BUG: KASAN: double-free in xe_nvm_init+0x751/0xf10 [xe]
[   13.233112] Free of addr ffff888120635000 by task systemd-udevd/273

[   13.233120] CPU: 8 UID: 0 PID: 273 Comm: systemd-udevd Not tainted 6.19.0-rc2-lgci-xe-kernel+ #225 PREEMPT(voluntary)
...
[   13.233125] Call Trace:
[   13.233126]  <TASK>
[   13.233127]  dump_stack_lvl+0x7f/0xc0
[   13.233132]  print_report+0xce/0x610
[   13.233136]  ? kasan_complete_mode_report_info+0x5d/0x1e0
[   13.233139]  ? xe_nvm_init+0x751/0xf10 [xe]
...
"

v2: drop err goto path. (Alexander)

Fixes: 7926ba2143d8 ("drm/xe: defer free of NVM auxiliary container to device release callback")
Reviewed-by: Nitin Gote <nitin.r.gote@intel.com>
Reviewed-by: Brian Nguyen <brian3.nguyen@intel.com>
Cc: Alexander Usyskin <alexander.usyskin@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Suggested-by: Brian Nguyen <brian3.nguyen@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/20260120183239.2966782-7-shuicheng.lin@intel.com
(cherry picked from commit a3187c0c2bbd947ffff97f90d077ac88f9c2a215)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe/nvm: Manage nvm aux cleanup with devres

Move nvm teardown to a devm-managed action registered from xe_nvm_init().
This ensures the auxiliary NVM device is deleted on probe failure and
device detach without requiring explicit calls from remove paths.

As part of this, drop xe_nvm_fini() from xe_device_remove() and from the
survivability sysfs teardown, and remove the public xe_nvm_fini() API from
the header.

This is to fix below warn message when there is probe failure after
xe_nvm_init(), then xe_device_probe() is called again:
"
[  207.318152] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:01.0/0000:03:00.0/xe.nvm.768'
[  207.318157] CPU: 5 UID: 0 PID: 10261 Comm: modprobe Tainted: G    B   W           6.19.0-rc2-lgci-xe-kernel+ #223 PREEMPT(voluntary)
[  207.318160] Tainted: [B]=BAD_PAGE, [W]=WARN
[  207.318161] Hardware name: ASUS System Product Name/PRIME Z790-P WIFI, BIOS 0812 02/24/2023
[  207.318163] Call Trace:
[  207.318163]  <TASK>
[  207.318165]  dump_stack_lvl+0xa0/0xc0
[  207.318170]  dump_stack+0x10/0x20
[  207.318171]  sysfs_warn_dup+0xd5/0x110
[  207.318175]  sysfs_create_dir_ns+0x1f6/0x280
[  207.318177]  ? __pfx_sysfs_create_dir_ns+0x10/0x10
[  207.318179]  ? lock_acquire+0x1a4/0x2e0
[  207.318182]  ? __kasan_check_read+0x11/0x20
[  207.318185]  ? do_raw_spin_unlock+0x5c/0x240
[  207.318187]  kobject_add_internal+0x28d/0x8e0
[  207.318189]  kobject_add+0x11f/0x1f0
[  207.318191]  ? __pfx_kobject_add+0x10/0x10
[  207.318193]  ? lockdep_init_map_type+0x4b/0x230
[  207.318195]  ? get_device_parent.isra.0+0x43/0x4c0
[  207.318197]  ? kobject_get+0x55/0xf0
[  207.318199]  device_add+0x2d7/0x1500
[  207.318201]  ? __pfx_device_add+0x10/0x10
[  207.318203]  ? lockdep_init_map_type+0x4b/0x230
[  207.318205]  __auxiliary_device_add+0x99/0x140
[  207.318208]  xe_nvm_init+0x7a2/0xef0 [xe]
[  207.318333]  ? xe_devcoredump_init+0x80/0x110 [xe]
[  207.318452]  ? __devm_add_action+0x82/0xc0
[  207.318454]  ? fs_reclaim_release+0xc0/0x110
[  207.318457]  xe_device_probe+0x17dd/0x2c40 [xe]
[  207.318574]  ? __pfx___drm_dev_dbg+0x10/0x10
[  207.318576]  ? add_dr+0x180/0x220
[  207.318579]  ? __pfx___drmm_mutex_release+0x10/0x10
[  207.318582]  ? __pfx_xe_device_probe+0x10/0x10 [xe]
[  207.318697]  ? xe_pm_init_early+0x33a/0x410 [xe]
[  207.318850]  xe_pci_probe+0x936/0x1250 [xe]
[  207.318999]  ? lock_acquire+0x1a4/0x2e0
[  207.319003]  ? __pfx_xe_pci_probe+0x10/0x10 [xe]
[  207.319151]  local_pci_probe+0xe6/0x1a0
[  207.319154]  pci_device_probe+0x523/0x840
[  207.319157]  ? __pfx_pci_device_probe+0x10/0x10
[  207.319159]  ? sysfs_do_create_link_sd.isra.0+0x8c/0x110
[  207.319162]  ? sysfs_create_link+0x48/0xc0
...
"

Fixes: c28bfb107dac ("drm/xe/nvm: add on-die non-volatile memory device")
Reviewed-by: Alexander Usyskin <alexander.usyskin@intel.com>
Reviewed-by: Brian Nguyen <brian3.nguyen@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/20260120183239.2966782-6-shuicheng.lin@intel.com
(cherry picked from commit 11035eab1b7d88daa7904440046e64d3810b1ca1)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Merge tag 'mm-hotfixes-stable-2026-01-29-09-41' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
"16 hotfixes.  9 are cc:stable, 12 are for MM.

  There's a patch series from Pratyush Yadav which fixes a few things in
  the new-in-6.19 LUO memfd code.

  Plus the usual shower of singletons - please see the changelogs for
  details"

* tag 'mm-hotfixes-stable-2026-01-29-09-41' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  vmcoreinfo: make hwerr_data visible for debugging
  mm/zone_device: reinitialize large zone device private folios
  mm/mm_init: don't cond_resched() in deferred_init_memmap_chunk() if called from deferred_grow_zone()
  mm/kfence: randomize the freelist on initialization
  kho: kho_preserve_vmalloc(): don't return 0 when ENOMEM
  kho: init alloc tags when restoring pages from reserved memory
  mm: memfd_luo: restore and free memfd_luo_ser on failure
  mm: memfd_luo: use memfd_alloc_file() instead of shmem_file_setup()
  memfd: export alloc_file()
  flex_proportions: make fprop_new_period() hardirq safe
  mailmap: add entry for Viacheslav Bocharov
  mm/memory-failure: teach kill_accessing_process to accept hugetlb tail page pfn
  mm/memory-failure: fix missing ->mf_stats count in hugetlb poison
  mm, swap: restore swap_space attr aviod kernel panic
  mm/kasan: fix KASAN poisoning in vrealloc()
  mm/shmem, swap: fix race of truncate and swap entry split

Merge tag 'net-6.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth, CAN and wireless.

  There are no known regressions currently under investigation.

  Current release - fix to a fix:

    - can: gs_usb_receive_bulk_callback(): fix error message

  Current release - regressions:

    - eth: gve: fix probe failure if clock read fails

  Previous releases - regressions:

    - ipv6: use the right ifindex when replying to icmpv6 from localhost

    - mptcp: fix race in mptcp_pm_nl_flush_addrs_doit()

    - bluetooth: fix null-ptr-deref in hci_uart_write_work

    - eth:
        - sfc: fix deadlock in RSS config read
        - ice: ifix NULL pointer dereference in ice_vsi_set_napi_queues
        - mlx5: fix memory leak in esw_acl_ingress_lgcy_setup()

  Previous releases - always broken:

    - core: fix segmentation of forwarding fraglist GRO

    - wifi: mac80211: correctly decode TTLM with default link map

    - mptcp: avoid dup SUB_CLOSED events after disconnect

    - nfc: fix memleak in nfc_llcp_send_ui_frame().

    - eth:
        - bonding: fix use-after-free due to enslave fail
        - mlx5e:
            - TC, delete flows only for existing peers
            - fix inverted cap check in tx flow table root disconnect"

* tag 'net-6.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (43 commits)
  net: fix segmentation of forwarding fraglist GRO
  wifi: mac80211: correctly decode TTLM with default link map
  selftests: mptcp: join: fix local endp not being tracked
  selftests: mptcp: check subflow errors in close events
  mptcp: only reset subflow errors when propagated
  selftests: mptcp: check no dup close events after error
  mptcp: avoid dup SUB_CLOSED events after disconnect
  net/mlx5e: Skip ESN replay window setup for IPsec crypto offload
  net/mlx5: Fix vhca_id access call trace use before alloc
  net/mlx5: fs, Fix inverted cap check in tx flow table root disconnect
  net: phy: micrel: fix clk warning when removing the driver
  net/mlx5e: don't assume psp tx skbs are ipv6 csum handling
  net: bridge: fix static key check
  nfc: nci: Fix race between rfkill and nci_unregister_device().
  gve: fix probe failure if clock read fails
  net/mlx5e: Account for netdev stats in ndo_get_stats64
  net/mlx5e: TC, delete flows only for existing peers
  net/mlx5: Fix Unbinding uplink-netdev in switchdev mode
  ice: stop counting UDP csum mismatch as rx_errors
  ice: Fix NULL pointer dereference in ice_vsi_set_napi_queues
  ...

drm/amdgpu/gfx12: adjust KGQ reset sequence

Kernel gfx queues do not need to be reinitialized or
remapped after a reset. Align with gfx11.

v2: preserve init and remap for MMIO case.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0a6d6ed694d72b66b0ed7a483d5effa01acd3951)
Cc: stable@vger.kernel.org

drm/amdgpu/gfx11: adjust KGQ reset sequence

Kernel gfx queues do not need to be reinitialized or
remapped after a reset. This fixes queue reset failures
on APUs.

v2: preserve init and remap for MMIO case.

Fixes: b3e9bfd86658 ("drm/amdgpu/gfx11: add ring reset callbacks")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4789
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit b340ff216fdabfe71ba0cdd47e9835a141d08e10)
Cc: stable@vger.kernel.org

drm/amdgpu/gfx12: fix wptr reset in KGQ init

wptr is a 64 bit value and we need to update the
full value, not just 32 bits. Align with what we
already do for KCQs.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit a2918f958d3f677ea93c0ac257cb6ba69b7abb7c)
Cc: stable@vger.kernel.org

drm/amdgpu/gfx11: fix wptr reset in KGQ init

wptr is a 64 bit value and we need to update the
full value, not just 32 bits. Align with what we
already do for KCQs.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1f16866bdb1daed7a80ca79ae2837a9832a74fbc)
Cc: stable@vger.kernel.org

drm/amdgpu/gfx10: fix wptr reset in KGQ init

wptr is a 64 bit value and we need to update the
full value, not just 32 bits. Align with what we
already do for KCQs.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit e80b1d1aa1073230b6c25a1a72e88f37e425ccda)
Cc: stable@vger.kernel.org

drm/xe/configfs: Fix is_bound() pci_dev lifetime

Move pci_dev_put() after pci_dbg() to avoid using pdev after dropping its
reference.

Fixes: 2674f1ef29f46 ("drm/xe/configfs: Block runtime attribute changes")
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/20260121173750.3090907-2-shuicheng.lin@intel.com
(cherry picked from commit 63b33604365bdca43dee41bab809da2230491036)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Merge tag 'for-6.19-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

- fix leaked folio refcount on s390x when using hw zlib compression
   acceleration

- remove own threshold from ->writepages() which could collide with
   cgroup limits and lead to a deadlock when metadadata are not written
   because the amount is under the internal limit

* tag 'for-6.19-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: zlib: fix the folio leak on S390 hardware acceleration
  btrfs: do not strictly require dirty metadata threshold for metadata writepages

net: fix segmentation of forwarding fraglist GRO

This patch enhances GSO segment handling by properly checking
the SKB_GSO_DODGY flag for frag_list GSO packets, addressing
low throughput issues observed when a station accesses IPv4
servers via hotspots with an IPv6-only upstream interface.

Specifically, it fixes a bug in GSO segmentation when forwarding
GRO packets containing a frag_list. The function skb_segment_list
cannot correctly process GRO skbs that have been converted by XLAT,
since XLAT only translates the header of the head skb. Consequently,
skbs in the frag_list may remain untranslated, resulting in protocol
inconsistencies and reduced throughput.

To address this, the patch explicitly sets the SKB_GSO_DODGY flag
for GSO packets in XLAT's IPv4/IPv6 protocol translation helpers
(bpf_skb_proto_4_to_6 and bpf_skb_proto_6_to_4). This marks GSO
packets as potentially modified after protocol translation. As a
result, GSO segmentation will avoid using skb_segment_list and
instead falls back to skb_segment for packets with the SKB_GSO_DODGY
flag. This ensures that only safe and fully translated frag_list
packets are processed by skb_segment_list, resolving protocol
inconsistencies and improving throughput when forwarding GRO packets
converted by XLAT.

Signed-off-by: Jibin Zhang <jibin.zhang@mediatek.com>
Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.")
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260126152114.1211-1-jibin.zhang@mediatek.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'asoc-fix-v6.19-rc7' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.19

A couple of small fixes and a couple of quirks, nothing major

Merge tag 'wireless-2026-01-29' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
Just one fix, for a parsing error in mac80211 that might
result in a one byte out-of-bounds read.

* tag 'wireless-2026-01-29' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: mac80211: correctly decode TTLM with default link map
====================

Link: https://patch.msgid.link/20260129110403.178036-3-johannes@sipsolutions.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ASoC: sof_sdw: Add a quirk for Lenovo laptop using sidecar amps with cs42l43

Add a quirk for a Lenovo laptop (SSID: 0x17aa3821) to allow using sidecar
CS35L57 amps with CS42L43 codec.

Signed-off-by: Maciej Strozek <mstrozek@opensource.cirrus.com>
Reviewed-by: Cezary Rojewski <cezary.rojewski@intel.com>
Link: https://patch.msgid.link/20260128092410.1540583-1-mstrozek@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>

wifi: mac80211: correctly decode TTLM with default link map

TID-To-Link Mapping (TTLM) elements do not contain any link mapping
presence indicator if a default mapping is used and parsing needs to be
skipped.

Note that access points should not explicitly report an advertised TTLM
with a default mapping as that is the implied mapping if the element is
not included, this is even the case when switching back to the default
mapping. However, mac80211 would incorrectly parse the frame and would
also read one byte beyond the end of the element.

Reported-by: Ruikai Peng <ruikai@pwno.io>
Closes: https://lore.kernel.org/linux-wireless/CAFD3drMqc9YWvTCSHLyP89AOpBZsHdZ+pak6zVftYoZcUyF7gw@mail.gmail.com
Fixes: 702e80470a33 ("wifi: mac80211: support handling of advertised TID-to-link mapping")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20260129113349.d6b96f12c732.I69212a50f0f70db185edd3abefb6f04d3cb3e5ff@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

dma/pool: distinguish between missing and exhausted atomic pools

Currently, dma_alloc_from_pool() unconditionally warns and dumps a stack
trace when an allocation fails, with the message "Failed to get suitable
pool".

This conflates two distinct failure modes:
1. Configuration error: No atomic pool is available for the requested
   DMA mask (a fundamental system setup issue)
2. Resource Exhaustion: A suitable pool exists but is currently full (a
   recoverable runtime state)

This lack of distinction prevents drivers from using __GFP_NOWARN to
suppress error messages during temporary pressure spikes, such as when
awaiting synchronous reclaim of descriptors.

Refactor the error handling to distinguish these cases:
- If no suitable pool is found, keep the unconditional WARN regarding
  the missing pool.
- If a pool was found but is exhausted, respect __GFP_NOWARN and update
  the warning message to explicitly state "DMA pool exhausted".

Fixes: 9420139f516d ("dma-pool: fix coherent pool allocations for IOMMU mappings")
Signed-off-by: Sai Sree Kartheek Adivi <s-adivi@ti.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20260128133554.3056582-1-s-adivi@ti.com