git.ipfire.org Git - thirdparty/kernel/linux.git/log

]> git.ipfire.org Git - thirdparty/kernel/linux.git/log

projects / thirdparty / kernel / linux.git / log

Sun YangKai [Sat, 4 Oct 2025 14:31:09 +0000 (22:31 +0800)]

btrfs: more trivial BTRFS_PATH_AUTO_FREE conversions

Convert more of the trivial pattern for the auto freeing of btrfs_path
with goto -> return conversions where applicable.

Signed-off-by: Sun YangKai <sunk67188@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Filipe Manana [Mon, 13 Oct 2025 17:27:16 +0000 (18:27 +0100)]

btrfs: remove fs_info argument from btrfs_reserve_metadata_bytes()

We don't need it since we can grab fs_info from the given space_info.
So remove the fs_info argument.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <asj@kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Filipe Manana [Mon, 13 Oct 2025 17:23:27 +0000 (18:23 +0100)]

btrfs: remove fs_info argument from __reserve_bytes()

We don't need it since we can grab fs_info from the given space_info.
So remove the fs_info argument.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <asj@kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Filipe Manana [Mon, 13 Oct 2025 17:19:46 +0000 (18:19 +0100)]

btrfs: fix parameter documentation for btrfs_reserve_data_bytes()

We don't have a fs_info argument anymore since commit 5d39fda880be
("btrfs: pass btrfs_space_info to btrfs_reserve_data_bytes()"), it
was replaced by a space_info argument. So update the documentation.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <asj@kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Filipe Manana [Mon, 13 Oct 2025 17:14:39 +0000 (18:14 +0100)]

btrfs: remove fs_info argument from maybe_clamp_preempt()

We don't need it since we can grab fs_info from the given space_info.
So remove the fs_info argument.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <asj@kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Filipe Manana [Mon, 13 Oct 2025 17:13:49 +0000 (18:13 +0100)]

btrfs: remove fs_info argument from handle_reserve_ticket()

We don't need it since we can grab fs_info from the given space_info.
So remove the fs_info argument.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <asj@kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Filipe Manana [Mon, 13 Oct 2025 17:10:02 +0000 (18:10 +0100)]

btrfs: remove fs_info argument from steal_from_global_rsv()

We don't need it since we can grab fs_info from the given space_info.
So remove the fs_info argument.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <asj@kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Filipe Manana [Mon, 13 Oct 2025 17:09:16 +0000 (18:09 +0100)]

btrfs: remove fs_info argument from need_preemptive_reclaim()

We don't need it since we can grab fs_info from the given space_info.
So remove the fs_info argument.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <asj@kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Filipe Manana [Mon, 13 Oct 2025 17:01:55 +0000 (18:01 +0100)]

btrfs: remove fs_info argument from btrfs_calc_reclaim_metadata_size()

We don't need it since we can grab fs_info from the given space_info.
So remove the fs_info argument.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <asj@kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Filipe Manana [Mon, 13 Oct 2025 16:58:07 +0000 (17:58 +0100)]

btrfs: remove fs_info argument from shrink_delalloc() and flush_space()

We don't need it since we can grab fs_info from the given space_info.
So remove the fs_info argument.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <asj@kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Filipe Manana [Mon, 13 Oct 2025 16:53:20 +0000 (17:53 +0100)]

btrfs: remove fs_info argument from btrfs_dump_space_info()

We don't need it since we can grab fs_info from the given space_info.
So remove the fs_info argument.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <asj@kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Filipe Manana [Mon, 13 Oct 2025 16:44:34 +0000 (17:44 +0100)]

btrfs: remove fs_info argument from btrfs_can_overcommit()

We don't need it since we can grab fs_info from the given space_info.
So remove the fs_info argument.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <asj@kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Filipe Manana [Mon, 13 Oct 2025 16:39:21 +0000 (17:39 +0100)]

btrfs: remove fs_info argument from calc_available_free_space()

We don't need it since we can grab fs_info from the given space_info.
So remove the fs_info argument.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <asj@kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Filipe Manana [Mon, 13 Oct 2025 13:02:41 +0000 (14:02 +0100)]

btrfs: remove fs_info argument from maybe_fail_all_tickets()

We don't need it since we can grab fs_info from the given space_info.
So remove the fs_info argument.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <asj@kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Filipe Manana [Mon, 13 Oct 2025 13:01:18 +0000 (14:01 +0100)]

btrfs: remove fs_info argument from priority_reclaim_metadata_space()

We don't need it since we can grab fs_info from the given space_info.
So remove the fs_info argument.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <asj@kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Filipe Manana [Mon, 13 Oct 2025 12:58:55 +0000 (13:58 +0100)]

btrfs: remove fs_info argument from priority_reclaim_data_space()

We don't need it since we can grab fs_info from the given space_info.
So remove the fs_info argument.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <asj@kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Filipe Manana [Mon, 13 Oct 2025 12:57:09 +0000 (13:57 +0100)]

btrfs: remove fs_info argument from btrfs_try_granting_tickets()

We don't need it since we can grab fs_info from the given space_info.
So remove the fs_info argument.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <asj@kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Filipe Manana [Sun, 12 Oct 2025 16:48:27 +0000 (17:48 +0100)]

btrfs: avoid repeated computations in btrfs_mark_ordered_io_finished()

We're computing a few values several times:

1) The current ordered extent's end offset inside the while loop, we have
   computed it and stored it in the 'entry_end' variable but then we
   compute it again later as the first argument to the min() macro;

2) The end file offset, open coded 3 times;

3) The current length (stored in variable 'len') computed 2 times, one
   inside an assertion and the other when assigning to the 'len' variable.

So use existing variables and add new ones to prevent repeating these
expressions and reduce the source code.

We were also subtracting one from the result of min() macro call and
then adding 1 back in the next line, making both operations pointless.
So just remove the decrement and increment by 1.

This also reduces very slightly the object code.

Before:

  $ size fs/btrfs/btrfs.ko
     text    data     bss     dec     hex filename
  1916576 161679   15592 2093847 1ff317 fs/btrfs/btrfs.ko

After:

  $ size fs/btrfs/btrfs.ko
     text    data     bss     dec     hex filename
  1916556 161679   15592 2093827 1ff303 fs/btrfs/btrfs.ko

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <asj@kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Filipe Manana [Sun, 12 Oct 2025 09:39:08 +0000 (10:39 +0100)]

btrfs: avoid multiple i_size rounding in btrfs_truncate()

We have the inode locked so no one can concurrently change its i_size and
neither do we change it ourselves, so there's no point in keep rounding
it in the while loop and setting it up in the control structure. That only
causes confusion when reading the code.

So move all the i_size setup and rounding out of the loop and assert the
inode is locked.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <asj@kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Filipe Manana [Sun, 12 Oct 2025 09:26:40 +0000 (10:26 +0100)]

btrfs: consistently round up or down i_size in btrfs_truncate()

We're using different ways to round down the i_size by sector size, one
with a bitwise and with a negated mask and another with ALIGN_DOWN(), and
using ALIGN() to round up.

Replace these uses with the round_down() and round_up() macros which have
have names that make it clear the direction of the rounding (unlike the
ALIGN() macro) and getting rid of the bitwise and, negated mask and local
variable for the mask.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <asj@kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Filipe Manana [Sun, 12 Oct 2025 09:43:02 +0000 (10:43 +0100)]

btrfs: add unlikely to unexpected error case in extent_writepages()

We don't expect to hit errors and log the error message, so add the
unlikely annotation to make it clear and to hint the compiler that it may
reorganize code to be more efficient.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <asj@kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Filipe Manana [Fri, 10 Oct 2025 16:17:10 +0000 (17:17 +0100)]

btrfs: split assertion into two in extent_writepage_io()

If the assertion fails we don't get to know which of the two expressions
failed and neither the values used in each expression.

So split the assertion into two, each for a single expression, so that
if any is triggered we see a line number reported in a stack trace that
points to which expression failed. Also make the assertions use the
verbose mode to print the values involved in the computations.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <asj@kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Filipe Manana [Fri, 10 Oct 2025 16:04:03 +0000 (17:04 +0100)]

btrfs: use variable for end offset in extent_writepage_io()

Instead of repeating the expression "start + len" multiple times, store it
in a variable and use it where needed.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <asj@kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Filipe Manana [Fri, 10 Oct 2025 15:50:02 +0000 (16:50 +0100)]

btrfs: truncate ordered extent when skipping writeback past i_size

While running test case btrfs/192 from fstests with support for large
folios (needs CONFIG_BTRFS_EXPERIMENTAL=y) I ended up getting very sporadic
btrfs check failures reporting that csum items were missing. Looking into
the issue it turned out that btrfs check searches for csum items of a file
extent item with a range that spans beyond the i_size of a file and we
don't have any, because the kernel's writeback code skips submitting bios
for ranges beyond eof. It's not expected however to find a file extent item
that crosses the rounded up (by the sector size) i_size value, but there is
a short time window where we can end up with a transaction commit leaving
this small inconsistency between the i_size and the last file extent item.

Example btrfs check output when this happens:

  $ btrfs check /dev/sdc
  Opening filesystem to check...
  Checking filesystem on /dev/sdc
  UUID: 69642c61-5efb-4367-aa31-cdfd4067f713
  [1/8] checking log skipped (none written)
  [2/8] checking root items
  [3/8] checking extents
  [4/8] checking free space tree
  [5/8] checking fs roots
  root 5 inode 332 errors 1000, some csum missing
  ERROR: errors found in fs roots
  (...)

Looking at a tree dump of the fs tree (root 5) for inode 332 we have:

   $ btrfs inspect-internal dump-tree -t 5 /dev/sdc
   (...)
        item 28 key (332 INODE_ITEM 0) itemoff 2006 itemsize 160
                generation 17 transid 19 size 610969 nbytes 86016
                block group 0 mode 100666 links 1 uid 0 gid 0 rdev 0
                sequence 11 flags 0x0(none)
                atime 1759851068.391327881 (2025-10-07 16:31:08)
                ctime 1759851068.410098267 (2025-10-07 16:31:08)
                mtime 1759851068.410098267 (2025-10-07 16:31:08)
                otime 1759851068.391327881 (2025-10-07 16:31:08)
        item 29 key (332 INODE_REF 340) itemoff 1993 itemsize 13
                index 2 namelen 3 name: f1f
        item 30 key (332 EXTENT_DATA 589824) itemoff 1940 itemsize 53
                generation 19 type 1 (regular)
                extent data disk byte 21745664 nr 65536
                extent data offset 0 nr 65536 ram 65536
                extent compression 0 (none)
   (...)

We can see that the file extent item for file offset 589824 has a length of
64K and its number of bytes is 64K. Looking at the inode item we see that
its i_size is 610969 bytes which falls within the range of that file extent
item [589824, 655360[.

Looking into the csum tree:

  $ btrfs inspect-internal dump-tree /dev/sdc
  (...)
        item 15 key (EXTENT_CSUM EXTENT_CSUM 21565440) itemoff 991 itemsize 200
                range start 21565440 end 21770240 length 204800
           item 16 key (EXTENT_CSUM EXTENT_CSUM 1104576512) itemoff 983 itemsize 8
                range start 1104576512 end 1104584704 length 8192
  (..)

We see that the csum item number 15 covers the first 24K of the file extent
item - it ends at offset 21770240 and the extent's disk_bytenr is 21745664,
so we have:

   21770240 - 21745664 = 24K

We see that the next csum item (number 16) is completely outside the range,
so the remaining 40K of the extent doesn't have csum items in the tree.

If we round up the i_size to the sector size, we get:

   round_up(610969, 4096) = 614400

If we subtract from that the file offset for the extent item we get:

   614400 - 589824 = 24K

So the missing 40K corresponds to the end of the file extent item's range
minus the rounded up i_size:

   655360 - 614400 = 40K

Normally we don't expect a file extent item to span over the rounded up
i_size of an inode, since when truncating, doing hole punching and other
operations that trim a file extent item, the number of bytes is adjusted.

There is however a short time window where the kernel can end up,
temporarily,persisting an inode with an i_size that falls in the middle of
the last file extent item and the file extent item was not yet trimmed (its
number of bytes reduced so that it doesn't cross i_size rounded up by the
sector size).

The steps (in the kernel) that lead to such scenario are the following:

1) We have inode I as an empty file, no allocated extents, i_size is 0;

2) A buffered write is done for file range [589824, 655360[ (length of
    64K) and the i_size is updated to 655360. Note that we got a single
    large folio for the range (64K);

3) A truncate operation starts that reduces the inode's i_size down to
    610969 bytes. The truncate sets the inode's new i_size at
    btrfs_setsize() by calling truncate_setsize() and before calling
    btrfs_truncate();

4) At btrfs_truncate() we trigger writeback for the range starting at
    610304 (which is the new i_size rounded down to the sector size) and
    ending at (u64)-1;

5) During the writeback, at extent_write_cache_pages(), we get from the
    call to filemap_get_folios_tag(), the 64K folio that starts at file
    offset 589824 since it contains the start offset of the writeback
    range (610304);

6) At writepage_delalloc() we find the whole range of the folio is dirty
    and therefore we run delalloc for that 64K range ([589824, 655360[),
    reserving a 64K extent, creating an ordered extent, etc;

7) At extent_writepage_io() we submit IO only for subrange [589824, 614400[
    because the inode's i_size is 610969 bytes (rounded up by sector size
    is 614400). There, in the while loop we intentionally skip IO beyond
    i_size to avoid any unnecessay work and just call
    btrfs_mark_ordered_io_finished() for the range [614400, 655360[ (which
    has a 40K length);

8) Once the IO finishes we finish the ordered extent by ending up at
    btrfs_finish_one_ordered(), join transaction N, insert a file extent
    item in the inode's subvolume tree for file offset 589824 with a number
    of bytes of 64K, and update the inode's delayed inode item or directly
    the inode item with a call to btrfs_update_inode_fallback(), which
    results in storing the new i_size of 610969 bytes;

9) Transaction N is committed either by the transaction kthread or some
    other task committed it (in response to a sync or fsync for example).

    At this point we have inode I persisted with an i_size of 610969 bytes
    and file extent item that starts at file offset 589824 and has a number
    of bytes of 64K, ending at an offset of 655360 which is beyond the
    i_size rounded up to the sector size (614400).

    --> So after a crash or power failure here, the btrfs check program
        reports that error about missing checksum items for this inode, as
it tries to lookup for checksums covering the whole range of the
extent;

10) Only after transaction N is committed that at btrfs_truncate() the
    call to btrfs_start_transaction() starts a new transaction, N + 1,
    instead of joining transaction N. And it's with transaction N + 1 that
    it calls btrfs_truncate_inode_items() which updates the file extent
    item at file offset 589824 to reduce its number of bytes from 64K down
    to 24K, so that the file extent item's range ends at the i_size
    rounded up to the sector size (614400 bytes).

Fix this by truncating the ordered extent at extent_writepage_io() when we
skip writeback because the current offset in the folio is beyond i_size.
This ensures we don't ever persist a file extent item with a number of
bytes beyond the rounded up (by sector size) value of the i_size.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <asj@kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Qu Wenruo [Sun, 12 Oct 2025 23:52:05 +0000 (10:22 +1030)]

btrfs: implement remove_bdev and shutdown super operation callbacks

For the ->remove_bdev() callback, btrfs will:

- Mark the target device as missing

- Go degraded if the fs can afford it

- Return error other wise
  Thus falls back to the shutdown callback

For the ->shutdown callback, btrfs will:

- Set the SHUTDOWN flag
  Which will reject all new incoming operations, and make all writeback
  to fail.

  The behavior is the same as the NOLOGFLUSH behavior.

To support the lookup from bdev to a btrfs_device,
btrfs_dev_lookup_args is enhanced to have a new @devt member.
If set, we should be able to use that @devt member to uniquely locating a
btrfs device.

I know the shutdown can be a little overkilled, if one has a RAID1
metadata and RAID0 data, in that case one can still read data with 50%
chance to got some good data.

But a filesystem returning -EIO for half of the time is not really
considered usable.
Further it can also be as bad as the only device went missing for a single
device btrfs.

So here we go safe other than sorry when handling missing device.

And the remove_bdev callback will be hidden behind experimental features
for now, the reasons are:

- There are not enough btrfs specific bdev removal test cases
  The existing test cases are all removing the only device, thus only
  exercises the ->shutdown() behavior.

- Not yet determined what's the expected behavior
  Although the current auto-degrade behavior is no worse than the old
  behavior, it may not always be what the end users want.

  Before there is a concrete interface, better hide the new feature
  from end users.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <asj@kernel.org>
Tested-by: Anand Jain <asj@kernel.org>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Qu Wenruo [Sun, 12 Oct 2025 23:52:04 +0000 (10:22 +1030)]

btrfs: implement shutdown ioctl

The shutdown ioctl should follow the XFS one, which use magic number 'X',
and ioctl number 125, with a uint32 as flags.

For now btrfs don't distinguish DEFAULT and LOGFLUSH flags (just like
f2fs), both will freeze the fs first (implies committing the current
transaction), setting the SHUTDOWN flag and finally thaw the fs.

For NOLOGFLUSH flag, the freeze/thaw part is skipped thus the current
transaction is aborted.

The new shutdown ioctl is hidden behind experimental features for more
testing.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <asj@kernel.org>
Tested-by: Anand Jain <asj@kernel.org>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Qu Wenruo [Sun, 12 Oct 2025 23:52:03 +0000 (10:22 +1030)]

btrfs: introduce a new shutdown state

A new fs state EMERGENCY_SHUTDOWN is introduced, which is btrfs'
equivalent of XFS_IOC_GOINGDOWN or EXT4_IOC_SHUTDOWN, after entering
emergency shutdown state, all operations will return errors (-EIO), and
can not be bring back to normal state until unmouont.

The new state will reject the following file operations:

- read_iter()
- write_iter()
- mmap()
- open()
- remap_file_range()
- uring_cmd()
- splice_read()
This requires a small wrapper to do the extra shutdown check, then call
the regular filemap_splice_read() function

This should reject most of the file operations on a shutdown btrfs.

And for the existing dirty folios, extra shutdown checks are introduced
to the following functions:

- run_delalloc_nocow()
- run_delalloc_compressed()
- cow_file_range()

So that dirty ranges will still be properly cleaned without being
submitted.

Finally the shutdown state will also set the fs error, so that no new
transaction will be committed, protecting the metadata from any possible
further corruption.

And when the fs entered shutdown mode for the first time, a critical
level kernel message will show up to indicate the incident.

That message will be important for end users as rejected delalloc ranges
will output error messages, hopefully that shutdown message and the fact
that all fs operations are returning error will prevent end users from
getting too confused about the delalloc error messages.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <asj@kernel.org>
Tested-by: Anand Jain <asj@kernel.org>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Filipe Manana [Tue, 7 Oct 2025 10:14:37 +0000 (11:14 +0100)]

btrfs: use end_pos variable where needed in btrfs_dirty_folio()

We have a couple places doing the computation "pos + write_bytes" when we
already have it in the local variable "end_pos". Change then to use the
variable instead and make source code smaller. Also make the variable
const since it's not supposed to change.

This also has a very slight reduction in the module size.

Before:

   $ size fs/btrfs/btrfs.ko
      text    data     bss     dec     hex filename
   1915990 161647   15592 2093229 1ff0ad fs/btrfs/btrfs.ko

After:

   $ size fs/btrfs/btrfs.ko
      text    data     bss     dec     hex filename
   1915974 161647   15592 2093213 1ff09d fs/btrfs/btrfs.ko

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Boris Burkov [Thu, 2 Oct 2025 00:20:22 +0000 (17:20 -0700)]

btrfs: fix racy bitfield write in btrfs_clear_space_info_full()

From the memory-barriers.txt document regarding memory barrier ordering
guarantees:

(*) These guarantees do not apply to bitfields, because compilers often
     generate code to modify these using non-atomic read-modify-write
     sequences.  Do not attempt to use bitfields to synchronize parallel
     algorithms.

(*) Even in cases where bitfields are protected by locks, all fields
     in a given bitfield must be protected by one lock.  If two fields
     in a given bitfield are protected by different locks, the compiler's
     non-atomic read-modify-write sequences can cause an update to one
     field to corrupt the value of an adjacent field.

btrfs_space_info has a bitfield sharing an underlying word consisting of
the fields full, chunk_alloc, and flush:

struct btrfs_space_info {
        struct btrfs_fs_info *     fs_info;              /*     0     8 */
        struct btrfs_space_info *  parent;               /*     8     8 */
        ...
        int                        clamp;                /*   172     4 */
        unsigned int               full:1;               /*   176: 0  4 */
        unsigned int               chunk_alloc:1;        /*   176: 1  4 */
        unsigned int               flush:1;              /*   176: 2  4 */
        ...

Therefore, to be safe from parallel read-modify-writes losing a write to
one of the bitfield members protected by a lock, all writes to all the
bitfields must use the lock. They almost universally do, except for
btrfs_clear_space_info_full() which iterates over the space_infos and
writes out found->full = 0 without a lock.

Imagine that we have one thread completing a transaction in which we
finished deleting a block_group and are thus calling
btrfs_clear_space_info_full() while simultaneously the data reclaim
ticket infrastructure is running do_async_reclaim_data_space():

          T1                                             T2
btrfs_commit_transaction
  btrfs_clear_space_info_full
  data_sinfo->full = 0
  READ: full:0, chunk_alloc:0, flush:1
                                              do_async_reclaim_data_space(data_sinfo)
                                              spin_lock(&space_info->lock);
                                              if(list_empty(tickets))
                                                space_info->flush = 0;
                                                READ: full: 0, chunk_alloc:0, flush:1
                                                MOD/WRITE: full: 0, chunk_alloc:0, flush:0
                                                spin_unlock(&space_info->lock);
                                                return;
  MOD/WRITE: full:0, chunk_alloc:0, flush:1

and now data_sinfo->flush is 1 but the reclaim worker has exited. This
breaks the invariant that flush is 0 iff there is no work queued or
running. Once this invariant is violated, future allocations that go
into __reserve_bytes() will add tickets to space_info->tickets but will
see space_info->flush is set to 1 and not queue the work. After this,
they will block forever on the resulting ticket, as it is now impossible
to kick the worker again.

I also confirmed by looking at the assembly of the affected kernel that
it is doing RMW operations. For example, to set the flush (3rd) bit to 0,
the assembly is:
  andb    $0xfb,0x60(%rbx)
and similarly for setting the full (1st) bit to 0:
  andb    $0xfe,-0x20(%rax)

So I think this is really a bug on practical systems.  I have observed
a number of systems in this exact state, but am currently unable to
reproduce it.

Rather than leaving this footgun lying around for the future, take
advantage of the fact that there is room in the struct anyway, and that
it is already quite large and simply change the three bitfield members to
bools. This avoids writes to space_info->full having any effect on
writes to space_info->flush, regardless of locking.

Fixes: 957780eb2788 ("Btrfs: introduce ticketed enospc infrastructure")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Rajeev Tapadia [Fri, 3 Oct 2025 13:30:02 +0000 (19:00 +0530)]

btrfs: fix comment in alloc_bitmap() and drop stale TODO

All callers of alloc_bitmap() hold a transaction handle, so GFP_NOFS is
needed to avoid deadlocks on recursion. Update the comment and drop the
stale TODO.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Rajeev Tapadia <rtapadia730@gmail.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Miquel Sabaté Solà [Wed, 1 Oct 2025 18:05:03 +0000 (20:05 +0200)]

btrfs: fix double free of qgroup record after failure to add delayed ref head

In the previous code it was possible to incur into a double kfree()
scenario when calling add_delayed_ref_head(). This could happen if the
record was reported to already exist in the
btrfs_qgroup_trace_extent_nolock() call, but then there was an error
later on add_delayed_ref_head(). In this case, since
add_delayed_ref_head() returned an error, the caller went to free the
record. Since add_delayed_ref_head() couldn't set this kfree'd pointer
to NULL, then kfree() would have acted on a non-NULL 'record' object
which was pointing to memory already freed by the callee.

The problem comes from the fact that the responsibility to kfree the
object is on both the caller and the callee at the same time. Hence, the
fix for this is to shift the ownership of the 'qrecord' object out of
the add_delayed_ref_head(). That is, we will never attempt to kfree()
the given object inside of this function, and will expect the caller to
act on the 'qrecord' object on its own. The only exception where the
'qrecord' object cannot be kfree'd is if it was inserted into the
tracing logic, for which we already have the 'qrecord_inserted_ret'
boolean to account for this. Hence, the caller has to kfree the object
only if add_delayed_ref_head() reports not to have inserted it on the
tracing logic.

As a side-effect of the above, we must guarantee that
'qrecord_inserted_ret' is properly initialized at the start of the
function, not at the end, and then set when an actual insert
happens. This way we avoid 'qrecord_inserted_ret' having an invalid
value on an early exit.

The documentation from the add_delayed_ref_head() has also been updated
to reflect on the exact ownership of the 'qrecord' object.

Fixes: 6ef8fbce0104 ("btrfs: fix missing error handling when adding delayed ref with qgroups enabled")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Miquel Sabaté Solà <mssola@mssola.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

David Sterba [Mon, 29 Sep 2025 12:41:15 +0000 (14:41 +0200)]

btrfs: subpage: rename macro variables to avoid shadowing

When compiling with -Wshadow there are warnings in the subpage helper
macros that are used in functions like btrfs_subpage_dump_bitmap() or
btrfs_subpage_clear_and_test_dirty() that also use 'bfs' (for struct
btrfs_folio_state) or blocks_per_folio.

Add '__' to the macro variables and unify naming in all subpage macros.

Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Mehdi Ben Hadj Khelifa [Tue, 30 Sep 2025 10:03:44 +0000 (11:03 +0100)]

btrfs: refactor allocation size calculation in alloc_btrfs_io_context()

Use struct_size() to replace the open-coded calculation, remove the
comment as use of the helper is self explanatory.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Mehdi Ben Hadj Khelifa <mehdi.benhadjkhelifa@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

David Sterba [Fri, 26 Sep 2025 09:47:30 +0000 (11:47 +0200)]

btrfs: fix trivial -Wshadow warnings

When compiling with -Wshadow (also in 'make W=2' build) there are
several reports of shadowed variables that seem to be harmless:

- btrfs_do_encoded_write() - we can reuse 'ordered', there's no previous
     value that would need to be preserved

- scrub_write_endio() - we need a standalone 'i' for bio iteration

- scrub_stripe() - duplicate ret2 for errors that must not overwrite 'ret'

- btrfs_subpage_set_writeback() - 'flags' is used for another irqsave lock
                                  but is not overwritten when reused for xarray
  due to scoping, but for clarity let's rename it

- process_dir_items_leaf() - duplicate 'ret', used only for immediate checks

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

David Sterba [Fri, 26 Sep 2025 06:32:56 +0000 (08:32 +0200)]

btrfs: print-tree: use string format for key names

There's a warning when -Wformat=2 is used:

fs/btrfs/print-tree.c: In function ‘key_type_string’:
fs/btrfs/print-tree.c:424:17: warning: format not a string literal and no format arguments [-Wformat-nonliteral]
424 | scnprintf(buf, buf_size, key_to_str[key->type]);

We're printing fixed strings from a table so there's no problem but
let's fix the warning so we could enable the warning in fs/btrfs/.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Qu Wenruo [Thu, 25 Sep 2025 09:53:22 +0000 (19:23 +0930)]

btrfs: remove unnecessary NULL fs_info check from find_lock_delalloc_range()

[STATIC CHECK REPORT]
Smatch is reporting that find_lock_delalloc_range() used to do a null
pointer check before accessing fs_info, but now we're accessing it for
sectorsize unconditionally.

[FALSE ALERT]
This is a false alert, the existing null pointer check is introduced in
commit f7b12a62f008 ("btrfs: replace BTRFS_MAX_EXTENT_SIZE with
fs_info->max_extent_size"), but way before that, commit 7c0260ee098d
("btrfs: tests, require fs_info for root") is already forcing every
btrfs_root to have a correct fs_info pointer.

So there is no way that btrfs_root::fs_info is NULL.

[FIX]
Just remove the unnecessary NULL pointer checker.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Fixes: f7b12a62f008 ("btrfs: replace BTRFS_MAX_EXTENT_SIZE with fs_info->max_extent_size")
Closes: https://lore.kernel.org/r/202509250925.4L4JQTtn-lkp@intel.com/
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Filipe Manana [Wed, 24 Sep 2025 16:10:27 +0000 (17:10 +0100)]

btrfs: use single return value variable in btrfs_relocate_block_group()

We are using 'ret' and 'err' variables to track return values and errors,
which is pattern that is error prone and we had quite some bugs due to
this pattern in the past.

Simplify this and use a single variable, named 'ret', to track errors and
the return value.

Also rename the variable 'rw' to 'bg_is_ro' which is more meaningful name,
and change its type from int to bool.

Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Boris Burkov [Tue, 23 Sep 2025 17:57:02 +0000 (10:57 -0700)]

btrfs: ignore ENOMEM from alloc_bitmap()

btrfs_convert_free_space_to_bitmaps() and
btrfs_convert_free_space_to_extents() both allocate a bitmap struct
with:

        bitmap_size = free_space_bitmap_size(fs_info, block_group->length);
        bitmap = alloc_bitmap(bitmap_size);
        if (!bitmap) {
                ret = -ENOMEM;
                btrfs_abort_transaction(trans);
                return ret;
        }

This conversion is done based on a heuristic and the check triggers each
time we call update_free_space_extent_count() on a block group (each
time we add/remove an extent or modify a bitmap). Furthermore, nothing
relies on maintaining some invariant of bitmap density, it's just an
optimization for space usage. Therefore, it is safe to simply ignore
any memory allocation errors that occur, rather than aborting the
transaction and leaving the fs read only.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Linus Torvalds [Sun, 23 Nov 2025 22:53:16 +0000 (14:53 -0800)]

Linux 6.18-rc7

commit | commitdiff | tree

Linus Torvalds [Sun, 23 Nov 2025 20:03:28 +0000 (12:03 -0800)]

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
"Fixes for the Allwinner A523 clk driver:

   - Lower the minimum rate for the A523 audio PLL to support
     frequencies required by audio devices

   - Mark a couple clks critical on A523 so that Linux doesn't turn them
     off when they're used by other code like TF-A"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: sunxi-ng: sun55i-a523-ccu: Lower audio0 pll minimum rate
  clk: sunxi-ng: sun55i-a523-r-ccu: Mark bus-r-dma as critical
  clk: sunxi-ng: Mark A523 bus-r-cpucfg clock as critical

commit | commitdiff | tree

Linus Torvalds [Sun, 23 Nov 2025 16:23:30 +0000 (08:23 -0800)]

Merge tag 'timers-urgent-2025-11-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Ingo Molnar:

- Fix a race in timer->function clearing in timer_shutdown_sync()

- Fix a timekeeper sysfs-setup resource leak in error paths

- Fix the NOHZ report_idle_softirq() syslog rate-limiting
   logic to have no side effects on the return value

* tag 'timers-urgent-2025-11-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timers: Fix NULL function pointer race in timer_shutdown_sync()
  timekeeping: Fix resource leak in tk_aux_sysfs_init() error paths
  tick/sched: Fix bogus condition in report_idle_softirq()

commit | commitdiff | tree

Linus Torvalds [Sun, 23 Nov 2025 16:20:15 +0000 (08:20 -0800)]

Merge tag 'perf-urgent-2025-11-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
"Fix perf CPU-clock counters, and address a static checker warning"

* tag 'perf-urgent-2025-11-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix 0 count issue of cpu-clock
perf/x86/intel/uncore: Remove superfluous check

commit | commitdiff | tree

Yipeng Zou [Sat, 22 Nov 2025 09:39:42 +0000 (09:39 +0000)]

timers: Fix NULL function pointer race in timer_shutdown_sync()

There is a race condition between timer_shutdown_sync() and timer
expiration that can lead to hitting a WARN_ON in expire_timers().

The issue occurs when timer_shutdown_sync() clears the timer function
to NULL while the timer is still running on another CPU. The race
scenario looks like this:

CPU0 CPU1
<SOFTIRQ>
lock_timer_base()
expire_timers()
base->running_timer = timer;
unlock_timer_base()
[call_timer_fn enter]
mod_timer()
...
timer_shutdown_sync()
lock_timer_base()
// For now, will not detach the timer but only clear its function to NULL
if (base->running_timer != timer)
ret = detach_if_pending(timer, base, true);
if (shutdown)
timer->function = NULL;
unlock_timer_base()
[call_timer_fn exit]
lock_timer_base()
base->running_timer = NULL;
unlock_timer_base()
...
// Now timer is pending while its function set to NULL.
// next timer trigger
<SOFTIRQ>
expire_timers()
WARN_ON_ONCE(!fn) // hit
...
lock_timer_base()
// Now timer will detach
if (base->running_timer != timer)
ret = detach_if_pending(timer, base, true);
if (shutdown)
timer->function = NULL;
unlock_timer_base()

The problem is that timer_shutdown_sync() clears the timer function
regardless of whether the timer is currently running. This can leave a
pending timer with a NULL function pointer, which triggers the
WARN_ON_ONCE(!fn) check in expire_timers().

Fix this by only clearing the timer function when actually detaching the
timer. If the timer is running, leave the function pointer intact, which is
safe because the timer will be properly detached when it finishes running.

Fixes: 0cc04e80458a ("timers: Add shutdown mechanism to the internal functions")
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251122093942.301559-1-zouyipeng@huawei.com

commit | commitdiff | tree

Linus Torvalds [Sat, 22 Nov 2025 20:55:18 +0000 (12:55 -0800)]

Merge tag 'mips-fixes_6.18_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fixes from Thomas Bogendoerfer:

- Fix CPU type in DT for econet

- Fix for Malta PCI MMIO breakage for SOC-it

- Fix TLB shutdown caused by iniital uniquification

- Fix random seg faults due to missed vdso storage requirement

* tag 'mips-fixes_6.18_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: kernel: Fix random segmentation faults
  MIPS: mm: Prevent a TLB shutdown on initial uniquification
  mips: dts: econet: fix EN751221 core type
  MIPS: Malta: Fix !EVA SOC-it PCI MMIO

commit | commitdiff | tree

Linus Torvalds [Sat, 22 Nov 2025 19:53:53 +0000 (11:53 -0800)]

Merge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux

Pull crypto library fix from Eric Biggers:
"Fix another KMSAN warning that made it in while KMSAN wasn't working
reliably"

* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
lib/crypto: tests: Fix KMSAN warning in test_sha256_finup_2x()

commit | commitdiff | tree

Linus Torvalds [Sat, 22 Nov 2025 18:23:34 +0000 (10:23 -0800)]

Merge tag 'xfs-fixes-6.18-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fix from Carlos Maiolino:
"A single out-of-bounds fix, nothing special"

* tag 'xfs-fixes-6.18-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix out of bounds memory read error in symlink repair

commit | commitdiff | tree

Linus Torvalds [Sat, 22 Nov 2025 18:16:21 +0000 (10:16 -0800)]

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"One target driver fix and one scsi-generic one. The latter is 10 lines
  because the problem lock has to be dropped and re-taken around the
  call causing the sleep in atomic"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: sg: Do not sleep in atomic context
  scsi: target: tcm_loop: Fix segfault in tcm_loop_tpg_address_show()

commit | commitdiff | tree

Linus Torvalds [Sat, 22 Nov 2025 17:58:41 +0000 (09:58 -0800)]

Merge tag 'input-for-v6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:

- INPUT_PROP_HAPTIC_TOUCHPAD definition added early in 6.18 cycle has
   been renamed to INPUT_PROP_PRESSUREPAD to better reflect the kind of
   devices it is supposed to be set for

- a new ID for a touchscreen found in Ayaneo Flip DS in Goodix driver

- Goodix driver no longer tries to set reset pin as "input" as it
   causes issues when there is no pull up resistor installed on the
   board

- fixes for cros_ec_keyb, imx_sc_key, and pegasus-notetaker drivers to
   deal with potential out-of-bounds access and memory corruption issues

* tag 'input-for-v6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: rename INPUT_PROP_HAPTIC_TOUCHPAD to INPUT_PROP_PRESSUREPAD
  Input: cros_ec_keyb - fix an invalid memory access
  Input: imx_sc_key - fix memory corruption on unload
  Input: pegasus-notetaker - fix potential out-of-bounds access
  Input: goodix - remove setting of RST pin to input
  Input: goodix - add support for ACPI ID GDIX1003

commit | commitdiff | tree

Linus Torvalds [Sat, 22 Nov 2025 17:44:50 +0000 (09:44 -0800)]

Merge tag 'riscv-for-linus-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Paul Walmsley:

- Correct the MIPS RISC-V/JEDEC vendor ID

- Fix the system shutdown behavior in the legacy case where
   CONFIG_RISCV_SBI_V01 is set, but the firmware implementation
   doesn't support the older v0.1 system shutdown method

- Align some tools/ macro definitions with the corresponding
   kernel headers

* tag 'riscv-for-linus-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  tools: riscv: Fixed misalignment of CSR related definitions
  riscv: sbi: Prefer SRST shutdown over legacy
  riscv: Update MIPS vendor id to 0x127

commit | commitdiff | tree

Linus Torvalds [Sat, 22 Nov 2025 17:24:36 +0000 (09:24 -0800)]

Merge tag 'selinux-pr-20251121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux fixes from Paul Moore:
"Three SELinux patches for v6.18 to fix issues around accessing the
  per-task decision cache that we introduced in v6.16 to help reduce
  SELinux overhead on path walks. The problem was that despite the cache
  being located in the SELinux "task_security_struct", the parent struct
  wasn't actually tied to the task, it was tied to a cred.

  Historically SELinux did locate the task_security_struct in the
  task_struct's security blob, but it was later relocated to the cred
  struct when the cred work happened, as it made the most sense at the
  time.

  Unfortunately we never did the task_security_struct to
  cred_security_struct rename work (avoid code churn maybe? who knows)
  because it didn't really matter at the time. However, it suddenly
  became a problem when we added a per-task cache to a per-cred object
  and didn't notice because of the old, no-longer-correct struct naming.

  Thanks to KCSAN for flagging this, as the silly humans running things
  forgot that the task_security_struct was a big lie.

  This contains three patches, only one of which actually fixes the
  problem described above and moves the SELinux decision cache from the
  per-cred struct to a newly (re)created per-task struct.

  The other two patches, which form the bulk of the diffstat, take care
  of the associated renaming tasks so we can hopefully avoid making the
  same stupid mistake in the future.

  For the record, I did contemplate sending just a fix for the cache,
  leaving the renaming patches for the upcoming merge window, but the
  type/variable naming ended up being pretty awful and would have made
  v6.18 an outlier stuck between the "old" names and the "new" names in
  v6.19. The renaming patches are also fairly mechanical/trivial and
  shouldn't pose much risk despite their size.

  TLDR; naming things may be hard, but if you mess it up bad things
  happen"

* tag 'selinux-pr-20251121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: rename the cred_security_struct variables to "crsec"
  selinux: move avdcache to per-task security struct
  selinux: rename task_security_struct to cred_security_struct

commit | commitdiff | tree

Linus Torvalds [Fri, 21 Nov 2025 19:16:14 +0000 (11:16 -0800)]

Merge tag 'loongarch-fixes-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch fixes from Huacai Chen:
"Use UAPI types in ptrace UAPI header to fix nolibc ptrace.

  Fix CPU name display, NUMA node parsing, kexec/kdump, PCI init and BPF
  trampoline"

* tag 'loongarch-fixes-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: BPF: Disable trampoline for kernel module function trace
  LoongArch: Don't panic if no valid cache info for PCI
  LoongArch: Mask all interrupts during kexec/kdump
  LoongArch: Fix NUMA node parsing with numa_memblks
  LoongArch: Consolidate CPU names in /proc/cpuinfo
  LoongArch: Use UAPI types in ptrace UAPI header

commit | commitdiff | tree

Linus Torvalds [Fri, 21 Nov 2025 19:14:21 +0000 (11:14 -0800)]

Merge tag 'v6.18-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

- Fix potential memory leak in mount

- Add some missing read tracepoints

- Fix locking issue with directory leases

* tag 'v6.18-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Add the smb3_read_* tracepoints to SMB1
  cifs: fix memory leak in smb3_fs_context_parse_param error path
  smb: client: introduce close_cached_dir_locked()

commit | commitdiff | tree

Linus Torvalds [Fri, 21 Nov 2025 19:09:57 +0000 (11:09 -0800)]

Merge tag 'io_uring-6.18-20251120' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull io_uring fix from Jens Axboe:
"Just a single fix for a mixup of arguments for the skb_queue_splice()
call, in the io_uring timestamp retrieval code"

* tag 'io_uring-6.18-20251120' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring/cmd_net: fix wrong argument types for skb_queue_splice()

commit | commitdiff | tree

Linus Torvalds [Fri, 21 Nov 2025 18:59:35 +0000 (10:59 -0800)]

Merge tag 'block-6.18-20251120' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block fixes from Jens Axboe:
"NVMe pull request via Keith:

   - Admin queue use-after-free fix (Keith)

   - Target authentication fix (Alistar)

   - Multipath lockdeup fix (Shin'ichiro)

   - FC transport teardown fixes (Ewan)"

* tag 'block-6.18-20251120' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  nvme: nvme-fc: Ensure ->ioerr_work is cancelled in nvme_fc_delete_ctrl()
  nvme: nvme-fc: move tagset removal to nvme_fc_delete_ctrl()
  nvme-multipath: fix lockdep WARN due to partition scan work
  nvmet-auth: update sc_c in target host hash calculation
  nvme: fix admin request_queue lifetime

commit | commitdiff | tree

Linus Torvalds [Fri, 21 Nov 2025 18:53:23 +0000 (10:53 -0800)]

Merge tag 'ata-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux

Pull ata fixes from Niklas Cassel:

- Add a missing refcount decrement in ata_scsi_dev_rescan() when
   the device or its queue is not running.

   In the case where the device is running, the recount is already
   decremented properly (Yihang Li)

- Generate the proper sense code for a Security locked device.

   There was a regression caused by a recent change of how sense
   data is generated for commands that did not provide any sense
   data. This broke system suspend for Security locked devices.

   Generate the sense data that the SCSI disk driver expects for a
   Security locked device so that system suspend works again (me)

- Set capacity to zero for a Security locked device.

   All I/O commands will be aborted by a Security locked device.
   Thus, the block layer disk partition scanning will result in
   a bunch of, for the user, confusing I/O errors in dmesg during
   boot.

   Since a Security locked device is unusable anyway, set the capacity
   to zero, to avoid the disk partition scanning during boot. We still
   create the block device in /dev such that the user may unlock the
   device using e.g. hdparm (me)

* tag 'ata-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  ata: libata-core: Set capacity to zero for a security locked drive
  ata: libata-scsi: Fix system suspend for a security locked drive
  ata: libata-scsi: Add missing scsi_device_put() in ata_scsi_dev_rescan()

commit | commitdiff | tree

Linus Torvalds [Fri, 21 Nov 2025 18:47:24 +0000 (10:47 -0800)]

Merge tag 'pinctrl-v6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:

- Fix register naming in the Mediatek mt8189 driver

- Select REGMAP_MMIO for the Realtek RTD driver

- Fix the number of items in groups in the Toshiba Visconti driver

- Fix a memory leak in the Cirrus CS42L43 driver

- Fix a deadlock (!) in Qualcomm pinmux configuration

- Fix use of uninitialized memory and list initialization in the S32CC
   pin controller

* tag 'pinctrl-v6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  dt-bindings: pinctrl: xlnx,versal-pinctrl: Add missing unevaluatedProperties on '^conf' nodes
  pinctrl: s32cc: initialize gpio_pin_config::list after kmalloc()
  pinctrl: s32cc: fix uninitialized memory in s32_pinctrl_desc
  pinctrl: qcom: msm: Fix deadlock in pinmux configuration
  pinctrl: cirrus: Fix fwnode leak in cs42l43_pin_probe()
  dt-bindings: pinctrl: toshiba,visconti: Fix number of items in groups
  pinctrl: realtek: Select REGMAP_MMIO for RTD driver
  pinctrl: mediatek: mt8189: align register base names to dt-bindings ones
  pinctrl: mediatek: mt8196: align register base names to dt-bindings ones

commit | commitdiff | tree

Linus Torvalds [Fri, 21 Nov 2025 18:43:58 +0000 (10:43 -0800)]

Merge tag 'gpio-fixes-for-v6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:

- fix a use-after-free bug in GPIO character device code

- update MAINTAINERS

* tag 'gpio-fixes-for-v6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
MAINTAINERS: update my email address
gpio: cdev: make sure the cdev fd is still active before emitting events

commit | commitdiff | tree

Eric Biggers [Fri, 21 Nov 2025 03:34:31 +0000 (19:34 -0800)]

lib/crypto: tests: Fix KMSAN warning in test_sha256_finup_2x()

Fully initialize *ctx, including the buf field which sha256_init()
doesn't initialize, to avoid a KMSAN warning when comparing *ctx to
orig_ctx. This KMSAN warning slipped in while KMSAN was not working
reliably due to a stackdepot bug, which has now been fixed.

Fixes: 6733968be7cb ("lib/crypto: tests: Add tests and benchmark for sha256_finup_2x()")
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251121033431.34406-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>

commit | commitdiff | tree

Linus Torvalds [Fri, 21 Nov 2025 17:55:55 +0000 (09:55 -0800)]

Merge tag 'drm-fixes-2025-11-21' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"A range of small fixes across the board, the i915 display
  disambiguation is probably the biggest otherwise amdgpu and xe as
  usual with tegra, nouveau, radeon and a core atomic fix.

  Looks mostly normal.

  atomic:
   - Return error codes on failed blob creation for planes

  nouveau:
   - Fix memory leak

  tegra:
   - Fix device ref counting
   - Fix pid ref counting
   - Revert booting on Pixel C

  xe:
   - Fix out-of-bounds access with BIT()
   - Fix kunit test checking wrong condition
   - Drop duplicate kconfig select
   - Fix guc2host irq handler with MSI-X

  i915:
   - Wildcat Lake and Panther Lake detangled for display fixes

  amdgpu:
   - DTBCLK gating fix
   - EDID fetching retry improvements
   - HDMI HPD debounce filtering
   - DCN 2.0 cursor fix
   - DP MST PBN fix
   - VPE fix
   - GC 11 fix
   - PRT fix
   - MMIO remap page fix
   - SR-IOV fix

  radeon:
   - Fence deadlock fix"

* tag 'drm-fixes-2025-11-21' of https://gitlab.freedesktop.org/drm/kernel: (25 commits)
  drm/amdgpu: Add sriov vf check for VCN per queue reset support.
  drm/amdgpu/ttm: Fix crash when handling MMIO_REMAP in PDE flags
  drm/amdgpu/vm: Check PRT uAPI flag instead of PTE flag
  drm/amdgpu: Skip emit de meta data on gfx11 with rs64 enabled
  drm/amd: Skip power ungate during suspend for VPE
  drm/plane: Fix create_in_format_blob() return value
  drm/xe/irq: Handle msix vector0 interrupt
  drm/xe: Remove duplicate DRM_EXEC selection from Kconfig
  drm/xe/kunit: Fix forcewake assertion in mocs test
  drm/xe: Prevent BIT() overflow when handling invalid prefetch region
  drm/radeon: delete radeon_fence_process in is_signaled, no deadlock
  drm/amd/display: Fix pbn to kbps Conversion
  drm/amd/display: Clear the CUR_ENABLE register on DCN20 on DPP5
  drm/amd/display: Add an HPD filter for HDMI
  drm/amd/display: Increase DPCD read retries
  drm/amd/display: Move sleep into each retry for retrieve_link_cap()
  drm/amd/display: Prevent Gating DTBCLK before It Is Properly Latched
  drm/i915/xe3: Restrict PTL intel_encoder_is_c10phy() to only PHY A
  drm/i915/display: Add definition for wcl as subplatform
  drm/pcids: Split PTL pciids group to make wcl subplatform
  ...

commit | commitdiff | tree

Linus Torvalds [Fri, 21 Nov 2025 17:29:02 +0000 (09:29 -0800)]

samples: work around glibc redefining some of our defines wrong

Apparently as of version 2.42, glibc headers define AT_RENAME_NOREPLACE
and some of the other flags for renameat2() and friends in <stdio.h>.

Which would all be fine, except for inexplicable reasons glibc decided
to define them _differently_ from the kernel definitions, which then
makes some of our sample code that includes both kernel headers and user
space headers unhappy, because the compiler will (correctly) complain
about redefining things.

Now, mixing kernel headers and user space headers is always a somewhat
iffy proposition due to namespacing issues, but it's kind of inevitable
in our sample and selftest code.  And this is just glibc being stupid.

Those defines come from the kernel, glibc is exposing the kernel
interfaces, and glibc shouldn't make up some random new expressions for
these values.

It's not like glibc headers changed the actual result values, but they
arbitrarily just decided to use a different expression to describe those
values.  The kernel just does

    #define AT_RENAME_NOREPLACE  0x0001

while glibc does

    # define RENAME_NOREPLACE (1 << 0)
    # define AT_RENAME_NOREPLACE RENAME_NOREPLACE

instead.  Same value in the end, but very different macro definition.

For absolutely no reason.

This has since been fixed in the glibc development tree, so eventually
we'll end up with the canonical expressions and no clashes.  But in the
meantime the broken headers are in the glibc-2.42 release and have made
it out into distributions.

Do a minimal work-around to make the samples build cleanly by just
undefining the affected macros in between the user space header include
and the kernel header includes.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Thomas Bogendoerfer [Thu, 20 Nov 2025 12:10:29 +0000 (13:10 +0100)]

MIPS: kernel: Fix random segmentation faults

Commit 69896119dc9d ("MIPS: vdso: Switch to generic storage
implementation") switches to a generic vdso storage, which increases
the number of data pages from 1 to 4. But there is only one page
reserved, which causes segementation faults depending where the VDSO
area is randomized to. To fix this use the same size of reservation
and allocation of the VDSO data pages.

Fixes: 69896119dc9d ("MIPS: vdso: Switch to generic storage implementation")
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>

commit | commitdiff | tree

Maciej W. Rozycki [Thu, 13 Nov 2025 05:21:10 +0000 (05:21 +0000)]

MIPS: mm: Prevent a TLB shutdown on initial uniquification

Depending on the particular CPU implementation a TLB shutdown may occur
if multiple matching entries are detected upon the execution of a TLBP
or the TLBWI/TLBWR instructions. Given that we don't know what entries
we have been handed we need to be very careful with the initial TLB
setup and avoid all these instructions.

Therefore read all the TLB entries one by one with the TLBR instruction,
bypassing the content addressing logic, and truncate any large pages in
place so as to avoid a case in the second step where an incoming entry
for a large page at a lower address overlaps with a replacement entry
chosen at another index. Then preinitialize the TLB using addresses
outside our usual unique range and avoiding clashes with any entries
received, before making the usual call to local_flush_tlb_all().

This fixes (at least) R4x00 cores if TLBP hits multiple matching TLB
entries (SGI IP22 PROM for examples sets up all TLBs to the same virtual
address).

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Fixes: 35ad7e181541 ("MIPS: mm: tlb-r4k: Uniquify TLB entries on init")
Cc: stable@vger.kernel.org
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Tested-by: Jiaxun Yang <jiaxun.yang@flygoat.com> # Boston I6400, M5150 sim
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>

commit | commitdiff | tree

Dave Airlie [Fri, 21 Nov 2025 08:33:01 +0000 (18:33 +1000)]

Merge tag 'drm-xe-fixes-2025-11-21' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

Driver Changes:
- Fix out-of-bounds access with BIT() (Shuicheng Lin)
- Fix kunit test checking wrong condition (Matt Roper)
- Drop duplicate kconfig select (Shuicheng Lin)
- Fix guc2host irq handler with MSI-X (Venkata Ramana Nayana)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/uadbrmftcud3wg32c6tje7mmfcr7wgmpnkzxwubk6fletahje2@coek2ciunkvz

commit | commitdiff | tree

Dave Airlie [Fri, 21 Nov 2025 08:19:33 +0000 (18:19 +1000)]

Merge tag 'amd-drm-fixes-6.18-2025-11-20' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.18-2025-11-20:

amdgpu:
- DTBCLK gating fix
- EDID fetching retry improvements
- HDMI HPD debounce filtering
- DCN 2.0 cursor fix
- DP MST PBN fix
- VPE fix
- GC 11 fix
- PRT fix
- MMIO remap page fix
- SR-IOV fix

radeon:
- Fence deadlock fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20251120164110.1077973-1-alexander.deucher@amd.com

commit | commitdiff | tree

Dave Airlie [Fri, 21 Nov 2025 07:51:14 +0000 (17:51 +1000)]

Merge tag 'drm-misc-fixes-2025-11-20' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

Short summary of fixes pull:

atomic:
- Return error codes on failed blob creation for planes

nouveau:
- Fix memory leak

tegra:
- Fix device ref counting
- Fix pid ref counting
- Revert booting on Pixel C

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patch.msgid.link/20251120151308.GA589436@linux.fritz.box

commit | commitdiff | tree

Dave Airlie [Fri, 21 Nov 2025 07:10:00 +0000 (17:10 +1000)]

Merge tag 'drm-intel-fixes-2025-11-20' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

- Wildcat Lake and Panther Lake detangled for display fixes (Dnyaneshwar)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/aR8jByCwjIThpnpk@intel.com

commit | commitdiff | tree

Paul Moore [Tue, 18 Nov 2025 22:27:58 +0000 (17:27 -0500)]

selinux: rename the cred_security_struct variables to "crsec"

Along with the renaming from task_security_struct to cred_security_struct,
rename the local variables to "crsec" from "tsec". This both fits with
existing conventions and helps distinguish between task and cred related
variables.

No functional changes.

Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

commit | commitdiff | tree

Stephen Smalley [Thu, 13 Nov 2025 20:23:14 +0000 (15:23 -0500)]

selinux: move avdcache to per-task security struct

The avdcache is meant to be per-task; move it to a new
task_security_struct that is duplicated per-task.

Cc: stable@vger.kernel.org
Fixes: 5d7ddc59b3d89b724a5aa8f30d0db94ff8d2d93f ("selinux: reduce path walk overhead")
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
[PM: line length fixes]
Signed-off-by: Paul Moore <paul@paul-moore.com>

commit | commitdiff | tree

Stephen Smalley [Thu, 13 Nov 2025 20:23:13 +0000 (15:23 -0500)]

selinux: rename task_security_struct to cred_security_struct

Before Linux had cred structures, the SELinux task_security_struct was
per-task and although the structure was switched to being per-cred
long ago, the name was never updated. This change renames it to
cred_security_struct to avoid confusion and pave the way for the
introduction of an actual per-task security structure for SELinux. No
functional change.

Cc: stable@vger.kernel.org
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

commit | commitdiff | tree

Linus Torvalds [Thu, 20 Nov 2025 19:04:37 +0000 (11:04 -0800)]

Merge tag 'sched_ext-for-6.18-rc6-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext fix from Tejun Heo:
"One low risk and obvious fix: scx_enable() was dereferencing an error
pointer on helper kthread creation failure. Fixed"

* tag 'sched_ext-for-6.18-rc6-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: Fix scx_enable() crash on helper kthread creation failure

commit | commitdiff | tree

Linus Torvalds [Thu, 20 Nov 2025 18:49:12 +0000 (10:49 -0800)]

Merge tag 'slab-for-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab fix from Vlastimil Babka:

- Fix mempool poisoning order>0 pages with CONFIG_HIGHMEM (Vlastimil Babka)

* tag 'slab-for-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/mempool: fix poisoning order>0 pages with HIGHMEM

commit | commitdiff | tree

Saket Kumar Bhaskar [Wed, 19 Nov 2025 10:37:22 +0000 (16:07 +0530)]

sched_ext: Fix scx_enable() crash on helper kthread creation failure

A crash was observed when the sched_ext selftests runner was
terminated with Ctrl+\ while test 15 was running:

NIP [c00000000028fa58] scx_enable.constprop.0+0x358/0x12b0
LR [c00000000028fa2c] scx_enable.constprop.0+0x32c/0x12b0
Call Trace:
scx_enable.constprop.0+0x32c/0x12b0 (unreliable)
bpf_struct_ops_link_create+0x18c/0x22c
__sys_bpf+0x23f8/0x3044
sys_bpf+0x2c/0x6c
system_call_exception+0x124/0x320
system_call_vectored_common+0x15c/0x2ec

kthread_run_worker() returns an ERR_PTR() on failure rather than NULL,
but the current code in scx_alloc_and_add_sched() only checks for a NULL
helper. Incase of failure on SIGQUIT, the error is not handled in
scx_alloc_and_add_sched() and scx_enable() ends up dereferencing an
error pointer.

Error handling is fixed in scx_alloc_and_add_sched() to propagate
PTR_ERR() into ret, so that scx_enable() jumps to the existing error
path, avoiding random dereference on failure.

Fixes: bff3b5aec1b7 ("sched_ext: Move disable machinery into scx_sched")
Cc: stable@vger.kernel.org # v6.16+
Reported-and-tested-by: Samir Mulani <samir@linux.ibm.com>
Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Reviewed-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

commit | commitdiff | tree

Jens Axboe [Thu, 20 Nov 2025 18:40:15 +0000 (11:40 -0700)]

io_uring/cmd_net: fix wrong argument types for skb_queue_splice()

If timestamp retriving needs to be retried and the local list of
SKB's already has entries, then it's spliced back into the socket
queue. However, the arguments for the splice helper are transposed,
causing exactly the wrong direction of splicing into the on-stack
list. Fix that up.

Cc: stable@vger.kernel.org
Reported-by: Google Big Sleep <big-sleep-vuln-reports+bigsleep-462435176@google.com>
Fixes: 9e4ed359b8ef ("io_uring/netcmd: add tx timestamping cmd support")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Linus Torvalds [Thu, 20 Nov 2025 17:46:52 +0000 (09:46 -0800)]

Merge tag 'pm-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
"Fix a regression introduced during the 6.16 development cycle that may
  cause runtime PM to be enabled by mistake for devices that do not
  support it (which may lead to some serious trouble) if there is a
  system wakeup event during the "late suspend" phase of system suspend"

* tag 'pm-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: sleep: core: Fix runtime PM enabling in device_resume_early()

commit | commitdiff | tree

Linus Torvalds [Thu, 20 Nov 2025 17:44:27 +0000 (09:44 -0800)]

Merge tag 'acpi-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fix from Rafael Wysocki:
"This fixes EINJV2 support introduced during the 6.17 cycle by
  unbreaking the initialization broken by a previous attempted fix,
  adding sanity checks for data coming from the platform firmware, and
  updating the code to handle injecting legacy error types on an EINJV2
  capable systems properly (Tony Luck)"

* tag 'acpi-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: APEI: EINJ: Fix EINJV2 initialization and injection

commit | commitdiff | tree

Linus Torvalds [Thu, 20 Nov 2025 17:39:34 +0000 (09:39 -0800)]

Merge tag 'platform-drivers-x86-v6.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Ilpo Järvinen:
"This one has lots of new HW entries which adds to the size in diffstat
  but the individual changes are simple.

  Fixes

   - acer-wmi: Ignore backlight event

   - alienware-wmi-wmax: Fix quirk match table order & drop redundant
     entries

   - amd/pmc:
      - Add Xbox Ally to spurious 8042 quirk list
      - Quirk list Lenovo Legion Go 2 NVMe resume

   - msi-wmi-platform:
      - Correct GUID to uppercase
      - GUID is uncleverly copy-pasted from an example so add a DMI
        whitelist

   - intel/speed_select_if: PCIBIOS_* return code conversion

   - intel-uncore-freq & ISST: Fix kernel doc warnings

  New HW support

   - alienware-wmi-wmax:
      - Alienware 16 Aurora support
      - Alienware M support
      - Alienware X support
      - Dell G support

   - amd/pmc:
      - ROG Xbox Ally (non-X) support

   - huaway-wmi: HONOR MagicBoox X16/X14 PrintScreen & YOYO keys

   - hp-wmi:
      - Omen 16-wf1xxx fan support
      - Omen MAX 16-ah0xx fan + thermal profile support
      - Victus 16-r0 and 16-s0 fan + thermal profile support

   - intel/hid: Intel Nova Lake support

   - intel-uncore-freq:
      - Intel Panther Lake support
      - Intel Wildcat Lake support
      - Intel Nova Lake support"

* tag 'platform-drivers-x86-v6.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (21 commits)
  platform/x86: intel-uncore-freq: fix all header kernel-doc warnings
  platform/x86: acer-wmi: Ignore backlight event
  platform/x86/intel/speed_select_if: Convert PCIBIOS_* return codes to errnos
  platform/x86/intel/hid: Add Nova Lake support
  platform/x86: alienware-wmi-wmax: Add AWCC support to Alienware 16 Aurora
  platform/x86: hp-wmi: Add Omen MAX 16-ah0xx fan support and thermal profile
  platform/x86: msi-wmi-platform: Fix typo in WMI GUID
  platform/x86: msi-wmi-platform: Only load on MSI devices
  platform/x86/amd: pmc: Add Lenovo Legion Go 2 to pmc quirk list
  platform/x86/amd/pmc: Add spurious_8042 to Xbox Ally
  platform/x86/amd/pmc: Add support for Van Gogh SoC
  platform/x86: alienware-wmi-wmax: Add support for the whole "G" family
  platform/x86: alienware-wmi-wmax: Add support for the whole "X" family
  platform/x86: alienware-wmi-wmax: Add support for the whole "M" family
  platform/x86: alienware-wmi-wmax: Drop redundant DMI entries
  platform/x86: alienware-wmi-wmax: Fix "Alienware m16 R1 AMD" quirk order
  platform/x86: ISST: isst_if.h: fix all kernel-doc warnings
  platform/x86: intel-uncore-freq: Add additional client processors
  platform/x86: hp-wmi: Add Omen 16-wf1xxx fan support
  platform/x86: huawei-wmi: add keys for HONOR models
  ...

commit | commitdiff | tree

Linus Torvalds [Thu, 20 Nov 2025 16:52:07 +0000 (08:52 -0800)]

Merge tag 'net-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from IPsec and wireless.

  Previous releases - regressions:

   - prevent NULL deref in generic_hwtstamp_ioctl_lower(),
     newer APIs don't populate all the pointers in the request

   - phylink: add missing supported link modes for the fixed-link

   - mptcp: fix false positive warning in mptcp_pm_nl_rm_addr

  Previous releases - always broken:

   - openvswitch: remove never-working support for setting NSH fields

   - xfrm: number of fixes for error paths of xfrm_state creation/
     modification/deletion

   - xfrm: fixes for offload
      - fix the determination of the protocol of the inner packet
      - don't push locally generated packets directly to L2 tunnel
        mode offloading, they still need processing from the standard
        xfrm path

   - mptcp: fix a couple of corner cases in fallback and fastclose
     handling

   - wifi: rtw89: hw_scan: prevent connections from getting stuck,
     work around apparent bug in FW by tweaking messages we send

   - af_unix: fix duplicate data if PEEK w/ peek_offset needs to wait

   - veth: more robust handing of race to avoid txq getting stuck

   - eth: ps3_gelic_net: handle skb allocation failures"

* tag 'net-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
  vsock: Ignore signal/timeout on connect() if already established
  be2net: pass wrb_params in case of OS2BMC
  l2tp: reset skb control buffer on xmit
  net: dsa: microchip: lan937x: Fix RGMII delay tuning
  selftests: mptcp: add a check for 'add_addr_accepted'
  mptcp: fix address removal logic in mptcp_pm_nl_rm_addr
  selftests: mptcp: join: userspace: longer timeout
  selftests: mptcp: join: endpoints: longer timeout
  selftests: mptcp: join: fastclose: remove flaky marks
  mptcp: fix duplicate reset on fastclose
  mptcp: decouple mptcp fastclose from tcp close
  mptcp: do not fallback when OoO is present
  mptcp: fix premature close in case of fallback
  mptcp: avoid unneeded subflow-level drops
  mptcp: fix ack generation for fallback msk
  wifi: rtw89: hw_scan: Don't let the operating channel be last
  net: phylink: add missing supported link modes for the fixed-link
  selftest: af_unix: Add test for SO_PEEK_OFF.
  af_unix: Read sk_peek_offset() again after sleeping in unix_stream_read_generic().
  net/mlx5: Clean up only new IRQ glue on request_irq() failure
  ...

commit | commitdiff | tree

Malaya Kumar Rout [Thu, 20 Nov 2025 15:02:13 +0000 (20:32 +0530)]

timekeeping: Fix resource leak in tk_aux_sysfs_init() error paths

tk_aux_sysfs_init() returns immediately on error during the auxiliary clock
initialization loop without cleaning up previously allocated kobjects and
sysfs groups.

If kobject_create_and_add() or sysfs_create_group() fails during loop
iteration, the parent kobjects (tko and auxo) and any previously created
child kobjects are leaked.

Fix this by adding proper error handling with goto labels to ensure all
allocated resources are cleaned up on failure. kobject_put() on the
parent kobjects will handle cleanup of their children.

Fixes: 7b95663a3d96 ("timekeeping: Provide interface to control auxiliary clocks")
Signed-off-by: Malaya Kumar Rout <mrout@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251120150213.246777-1-mrout@redhat.com

commit | commitdiff | tree

Michal Luczaj [Wed, 19 Nov 2025 14:02:59 +0000 (15:02 +0100)]

vsock: Ignore signal/timeout on connect() if already established

During connect(), acting on a signal/timeout by disconnecting an already
established socket leads to several issues:

1. connect() invoking vsock_transport_cancel_pkt() ->
   virtio_transport_purge_skbs() may race with sendmsg() invoking
   virtio_transport_get_credit(). This results in a permanently elevated
   `vvs->bytes_unsent`. Which, in turn, confuses the SOCK_LINGER handling.

2. connect() resetting a connected socket's state may race with socket
   being placed in a sockmap. A disconnected socket remaining in a sockmap
   breaks sockmap's assumptions. And gives rise to WARNs.

3. connect() transitioning SS_CONNECTED -> SS_UNCONNECTED allows for a
   transport change/drop after TCP_ESTABLISHED. Which poses a problem for
   any simultaneous sendmsg() or connect() and may result in a
   use-after-free/null-ptr-deref.

Do not disconnect socket on signal/timeout. Keep the logic for unconnected
sockets: they don't linger, can't be placed in a sockmap, are rejected by
sendmsg().

[1]: https://lore.kernel.org/netdev/e07fd95c-9a38-4eea-9638-133e38c2ec9b@rbox.co/
[2]: https://lore.kernel.org/netdev/20250317-vsock-trans-signal-race-v4-0-fc8837f3f1d4@rbox.co/
[3]: https://lore.kernel.org/netdev/60f1b7db-3099-4f6a-875e-af9f6ef194f6@rbox.co/

Fixes: d021c344051a ("VSOCK: Introduce VM Sockets")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251119-vsock-interrupted-connect-v2-1-70734cf1233f@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Andrey Vatoropin [Wed, 19 Nov 2025 10:51:12 +0000 (10:51 +0000)]

be2net: pass wrb_params in case of OS2BMC

be_insert_vlan_in_pkt() is called with the wrb_params argument being NULL
at be_send_pkt_to_bmc() call site. This may lead to dereferencing a NULL
pointer when processing a workaround for specific packet, as commit
bc0c3405abbb ("be2net: fix a Tx stall bug caused by a specific ipv6
packet") states.

The correct way would be to pass the wrb_params from be_xmit().

Fixes: 760c295e0e8d ("be2net: Support for OS2BMC.")
Cc: stable@vger.kernel.org
Signed-off-by: Andrey Vatoropin <a.vatoropin@crpt.ru>
Link: https://patch.msgid.link/20251119105015.194501-1-a.vatoropin@crpt.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jens Axboe [Thu, 20 Nov 2025 15:39:17 +0000 (08:39 -0700)]

Merge tag 'nvme-6.18-2025-11-20' of git://git.infradead.org/nvme into block-6.18

Pull NVMe fixes from Keith:

"nvme fixes for Linux 6.18

- Admin queue use-after-free fix (Keith)
- Target authentication fix (Alistar)
- Multipath lockdeup fix (Shin'ichiro)
- FC transport teardown fixes (Ewan)"

* tag 'nvme-6.18-2025-11-20' of git://git.infradead.org/nvme:
  nvme: nvme-fc: Ensure ->ioerr_work is cancelled in nvme_fc_delete_ctrl()
  nvme: nvme-fc: move tagset removal to nvme_fc_delete_ctrl()
  nvme-multipath: fix lockdep WARN due to partition scan work
  nvmet-auth: update sc_c in target host hash calculation
  nvme: fix admin request_queue lifetime

commit | commitdiff | tree

Niklas Cassel [Wed, 19 Nov 2025 14:13:15 +0000 (15:13 +0100)]

ata: libata-core: Set capacity to zero for a security locked drive

For Security locked drives (drives that have Security enabled, and have
not been Security unlocked by boot firmware), the automatic partition
scanning will result in the user being spammed with errors such as:

  ata5.00: failed command: READ DMA
  ata5.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 7 dma 4096 in
           res 51/04:08:00:00:00/00:00:00:00:00/e0 Emask 0x1 (device error)
  ata5.00: status: { DRDY ERR }
  ata5.00: error: { ABRT }
  sd 4:0:0:0: [sda] tag#7 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
  sd 4:0:0:0: [sda] tag#7 Sense Key : Aborted Command [current]
  sd 4:0:0:0: [sda] tag#7 Add. Sense: No additional sense information

during boot, because most commands except for IDENTIFY will be aborted by
a Security locked drive.

For a Security locked drive, set capacity to zero, so that no automatic
partition scanning will happen.

If the user later unlocks the drive using e.g. hdparm, the close() by the
user space application should trigger a revalidation of the drive.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>

commit | commitdiff | tree

Niklas Cassel [Wed, 19 Nov 2025 14:13:14 +0000 (15:13 +0100)]

ata: libata-scsi: Fix system suspend for a security locked drive

Commit cf3fc037623c ("ata: libata-scsi: Fix ata_to_sense_error() status
handling") fixed ata_to_sense_error() to properly generate sense key
ABORTED COMMAND (without any additional sense code), instead of the
previous bogus sense key ILLEGAL REQUEST with the additional sense code
UNALIGNED WRITE COMMAND, for a failed command.

However, this broke suspend for Security locked drives (drives that have
Security enabled, and have not been Security unlocked by boot firmware).

The reason for this is that the SCSI disk driver, for the Synchronize
Cache command only, treats any sense data with sense key ILLEGAL REQUEST
as a successful command (regardless of ASC / ASCQ).

After commit cf3fc037623c ("ata: libata-scsi: Fix ata_to_sense_error()
status handling") the code that treats any sense data with sense key
ILLEGAL REQUEST as a successful command is no longer applicable, so the
command fails, which causes the system suspend to be aborted:

  sd 1:0:0:0: PM: dpm_run_callback(): scsi_bus_suspend returns -5
  sd 1:0:0:0: PM: failed to suspend async: error -5
  PM: Some devices failed to suspend, or early wake event detected

To make suspend work once again, for a Security locked device only,
return sense data LOGICAL UNIT ACCESS NOT AUTHORIZED, the actual sense
data which a real SCSI device would have returned if locked.
The SCSI disk driver treats this sense data as a successful command.

Cc: stable@vger.kernel.org
Reported-by: Ilia Baryshnikov <qwelias@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220704
Fixes: cf3fc037623c ("ata: libata-scsi: Fix ata_to_sense_error() status handling")
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>

commit | commitdiff | tree

Yihang Li [Thu, 20 Nov 2025 03:50:23 +0000 (11:50 +0800)]

ata: libata-scsi: Add missing scsi_device_put() in ata_scsi_dev_rescan()

Call scsi_device_put() in ata_scsi_dev_rescan() if the device or its
queue are not running.

Fixes: 0c76106cb975 ("scsi: sd: Fix TCG OPAL unlock on system resume")
Cc: stable@vger.kernel.org
Signed-off-by: Yihang Li <liyihang9@h-partners.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>

commit | commitdiff | tree

Paolo Abeni [Thu, 20 Nov 2025 12:02:00 +0000 (13:02 +0100)]

Merge tag 'wireless-2025-11-20' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
wireless-2025-11-20

A single fix for scanning on some rtw89 devices.

* tag 'wireless-2025-11-20' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: rtw89: hw_scan: Don't let the operating channel be last
====================

Link: https://patch.msgid.link/20251120085433.8601-3-johannes@sipsolutions.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

David Bauer [Tue, 18 Nov 2025 00:16:18 +0000 (01:16 +0100)]

l2tp: reset skb control buffer on xmit

The L2TP stack did not reset the skb control buffer before sending the
encapsulated package.

In a setup with an ath10k radio and batman-adv over an L2TP tunnel
massive fragmentations happen sporadically if the L2TP tunnel is
established over IPv4.

L2TP might reset some of the fields in the IP control buffer, but L2TP
assumes the type of the control buffer to be of an IPv4 packet.

In case the L2TP interface is used as a batadv hardif or the packet is
an IPv6 packet, this assumption breaks.

Clear the entire control buffer to avoid such mishaps altogether.

Fixes: f77ae9390438 ("[PPPOL2TP]: Reset meta-data in xmit function")
Signed-off-by: David Bauer <mail@david-bauer.net>
Link: https://patch.msgid.link/20251118001619.242107-1-mail@david-bauer.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Oleksij Rempel [Fri, 14 Nov 2025 09:09:51 +0000 (10:09 +0100)]

net: dsa: microchip: lan937x: Fix RGMII delay tuning

Correct RGMII delay application logic in lan937x_set_tune_adj().

The function was missing `data16 &= ~PORT_TUNE_ADJ` before setting the
new delay value. This caused the new value to be bitwise-OR'd with the
existing PORT_TUNE_ADJ field instead of replacing it.

For example, when setting the RGMII 2 TX delay on port 4, the
intended TUNE_ADJUST value of 0 (RGMII_2_TX_DELAY_2NS) was
incorrectly OR'd with the default 0x1B (from register value 0xDA3),
leaving the delay at the wrong setting.

This patch adds the missing mask to clear the field, ensuring the
correct delay value is written. Physical measurements on the RGMII TX
lines confirm the fix, showing the delay changing from ~1ns (before
change) to ~2ns.

While testing on i.MX 8MP showed this was within the platform's timing
tolerance, it did not match the intended hardware-characterized value.

Fixes: b19ac41faa3f ("net: dsa: microchip: apply rgmii tx and rx delay in phylink mac config")
Cc: stable@vger.kernel.org
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20251114090951.4057261-1-o.rempel@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Darrick J. Wong [Wed, 12 Nov 2025 16:35:18 +0000 (08:35 -0800)]

xfs: fix out of bounds memory read error in symlink repair

xfs/286 produced this report on my test fleet:

==================================================================
BUG: KFENCE: out-of-bounds read in memcpy_orig+0x54/0x110

Out-of-bounds read at 0xffff88843fe9e038 (184B right of kfence-#184):
  memcpy_orig+0x54/0x110
  xrep_symlink_salvage_inline+0xb3/0xf0 [xfs]
  xrep_symlink_salvage+0x100/0x110 [xfs]
  xrep_symlink+0x2e/0x80 [xfs]
  xrep_attempt+0x61/0x1f0 [xfs]
  xfs_scrub_metadata+0x34f/0x5c0 [xfs]
  xfs_ioc_scrubv_metadata+0x387/0x560 [xfs]
  xfs_file_ioctl+0xe23/0x10e0 [xfs]
  __x64_sys_ioctl+0x76/0xc0
  do_syscall_64+0x4e/0x1e0
  entry_SYSCALL_64_after_hwframe+0x4b/0x53

kfence-#184: 0xffff88843fe9df80-0xffff88843fe9dfea, size=107, cache=kmalloc-128

allocated by task 3470 on cpu 1 at 263329.131592s (192823.508886s ago):
  xfs_init_local_fork+0x79/0xe0 [xfs]
  xfs_iformat_local+0xa4/0x170 [xfs]
  xfs_iformat_data_fork+0x148/0x180 [xfs]
  xfs_inode_from_disk+0x2cd/0x480 [xfs]
  xfs_iget+0x450/0xd60 [xfs]
  xfs_bulkstat_one_int+0x6b/0x510 [xfs]
  xfs_bulkstat_iwalk+0x1e/0x30 [xfs]
  xfs_iwalk_ag_recs+0xdf/0x150 [xfs]
  xfs_iwalk_run_callbacks+0xb9/0x190 [xfs]
  xfs_iwalk_ag+0x1dc/0x2f0 [xfs]
  xfs_iwalk_args.constprop.0+0x6a/0x120 [xfs]
  xfs_iwalk+0xa4/0xd0 [xfs]
  xfs_bulkstat+0xfa/0x170 [xfs]
  xfs_ioc_fsbulkstat.isra.0+0x13a/0x230 [xfs]
  xfs_file_ioctl+0xbf2/0x10e0 [xfs]
  __x64_sys_ioctl+0x76/0xc0
  do_syscall_64+0x4e/0x1e0
  entry_SYSCALL_64_after_hwframe+0x4b/0x53

CPU: 1 UID: 0 PID: 1300113 Comm: xfs_scrub Not tainted 6.18.0-rc4-djwx #rc4 PREEMPT(lazy)  3d744dd94e92690f00a04398d2bd8631dcef1954
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-4.module+el8.8.0+21164+ed375313 04/01/2014
==================================================================

On further analysis, I realized that the second parameter to min() is
not correct.  xfs_ifork::if_bytes is the size of the xfs_ifork::if_data
buffer.  if_bytes can be smaller than the data fork size because:

(a) the forkoff code tries to keep the data area as large as possible
(b) for symbolic links, if_bytes is the ondisk file size + 1
(c) forkoff is always a multiple of 8.

Case in point: for a single-byte symlink target, forkoff will be
8 but the buffer will only be 2 bytes long.

In other words, the logic here is wrong and we walk off the end of the
incore buffer.  Fix that.

Cc: stable@vger.kernel.org # v6.10
Fixes: 2651923d8d8db0 ("xfs: online repair of symbolic links")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

commit | commitdiff | tree

Dapeng Mi [Wed, 12 Nov 2025 08:05:26 +0000 (16:05 +0800)]

perf: Fix 0 count issue of cpu-clock

Currently cpu-clock event always returns 0 count, e.g.,

perf stat -e cpu-clock -- sleep 1

Performance counter stats for 'sleep 1':
0 cpu-clock # 0.000 CPUs utilized
1.002308394 seconds time elapsed

The root cause is the commit 'bc4394e5e79c ("perf: Fix the throttle
error of some clock events")' adds PERF_EF_UPDATE flag check before
calling cpu_clock_event_update() to update the count, however the
PERF_EF_UPDATE flag is never set when the cpu-clock event is stopped in
counting mode (pmu->dev() -> cpu_clock_event_del() ->
cpu_clock_event_stop()). This leads to the cpu-clock event count is
never updated.

To fix this issue, force to set PERF_EF_UPDATE flag for cpu-clock event
just like what task-clock does.

Fixes: bc4394e5e79c ("perf: Fix the throttle error of some clock events")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://patch.msgid.link/20251112080526.3971392-1-dapeng1.mi@linux.intel.com

commit | commitdiff | tree

David Howells [Fri, 24 Oct 2025 15:33:43 +0000 (16:33 +0100)]

cifs: Add the smb3_read_* tracepoints to SMB1

Add the smb3_read_* tracepoints to SMB1's cifs_async_readv() and
cifs_readv_callback().

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.org>
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>

commit | commitdiff | tree

Shaurya Rane [Tue, 18 Nov 2025 15:02:57 +0000 (20:32 +0530)]

cifs: fix memory leak in smb3_fs_context_parse_param error path

Add proper cleanup of ctx->source and fc->source to the
cifs_parse_mount_err error handler. This ensures that memory allocated
for the source strings is correctly freed on all error paths, matching
the cleanup already performed in the success path by
smb3_cleanup_fs_context_contents().
Pointers are also set to NULL after freeing to prevent potential
double-free issues.

This change fixes a memory leak originally detected by syzbot. The
leak occurred when processing Opt_source mount options if an error
happened after ctx->source and fc->source were successfully
allocated but before the function completed.

The specific leak sequence was:
1. ctx->source = smb3_fs_context_fullpath(ctx, '/') allocates memory
2. fc->source = kstrdup(ctx->source, GFP_KERNEL) allocates more memory
3. A subsequent error jumps to cifs_parse_mount_err
4. The old error handler freed passwords but not the source strings,
causing the memory to leak.

This issue was not addressed by commit e8c73eb7db0a ("cifs: client:
fix memory leak in smb3_fs_context_parse_param"), which only fixed
leaks from repeated fsconfig() calls but not this error path.

Patch updated with minor change suggested by kernel test robot

Reported-by: syzbot+87be6809ed9bf6d718e3@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=87be6809ed9bf6d718e3
Fixes: 24e0a1eff9e2 ("cifs: switch to new mount api")
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Shaurya Rane <ssrane_b23@ee.vjti.ac.in>
Signed-off-by: Steve French <stfrench@microsoft.com>

commit | commitdiff | tree

Henrique Carvalho [Thu, 13 Nov 2025 18:09:13 +0000 (15:09 -0300)]

smb: client: introduce close_cached_dir_locked()

Replace close_cached_dir() calls under cfid_list_lock with a new
close_cached_dir_locked() variant that uses kref_put() instead of
kref_put_lock() to avoid recursive locking when dropping references.

While the existing code works if the refcount >= 2 invariant holds,
this area has proven error-prone. Make deadlocks impossible and WARN
on invariant violations.

Cc: stable@vger.kernel.org
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

commit | commitdiff | tree

Johannes Berg [Thu, 20 Nov 2025 08:43:24 +0000 (09:43 +0100)]

Merge tag 'rtw-2025-11-20' of https://github.com/pkshih/rtw

Ping-Ke Shih says:
==================
rtw patches for v6.18-rc7

Fix firmware goes wrong and causes device unusable after scanning. This
issue presents under certain regulatory domain reported from end users.
==================

Link: https://patch.msgid.link/8217bee0-96c4-44c1-9593-2e9ca12eccc5@RTKEXHMBS03.realtek.com.tw
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

commit | commitdiff | tree

Vincent Li [Thu, 20 Nov 2025 06:42:05 +0000 (14:42 +0800)]

LoongArch: BPF: Disable trampoline for kernel module function trace

The current LoongArch BPF trampoline implementation is incompatible
with tracing functions in kernel modules. This causes several severe
and user-visible problems:

* The `bpf_selftests/module_attach` test fails consistently.
* Kernel lockup when a BPF program is attached to a module function [1].
* Critical kernel modules like WireGuard experience traffic disruption
when their functions are traced with fentry [2].

Given the severity and the potential for other unknown side-effects, it
is safest to disable the feature entirely for now. This patch prevents
the BPF subsystem from allowing trampoline attachments to kernel module
functions on LoongArch.

This is a temporary mitigation until the core issues in the trampoline
code for kernel module handling can be identified and fixed.

[root@fedora bpf]# ./test_progs -a module_attach -v
bpf_testmod.ko is already unloaded.
Loading bpf_testmod.ko...
Successfully loaded bpf_testmod.ko.
test_module_attach:PASS:skel_open 0 nsec
test_module_attach:PASS:set_attach_target 0 nsec
test_module_attach:PASS:set_attach_target_explicit 0 nsec
test_module_attach:PASS:skel_load 0 nsec
libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
test_module_attach:FAIL:skel_attach skeleton attach failed: -524
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
Successfully unloaded bpf_testmod.ko.

[1]: https://lore.kernel.org/loongarch/CAK3+h2wDmpC-hP4u4pJY8T-yfKyk4yRzpu2LMO+C13FMT58oqQ@mail.gmail.com/
[2]: https://lore.kernel.org/loongarch/CAK3+h2wYcpc+OwdLDUBvg2rF9rvvyc5amfHT-KcFaK93uoELPg@mail.gmail.com/

Cc: stable@vger.kernel.org
Fixes: f9b6b41f0cf3 ("LoongArch: BPF: Add basic bpf trampoline support")
Acked-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Vincent Li <vincent.mc.li@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

commit | commitdiff | tree

Huacai Chen [Thu, 20 Nov 2025 06:42:05 +0000 (14:42 +0800)]

LoongArch: Don't panic if no valid cache info for PCI

If there is no valid cache info detected (may happen in virtual machine)
for pci_dfl_cache_line_size, kernel shouldn't panic. Because in the PCI
core it will be evaluated to (L1_CACHE_BYTES >> 2).

Cc: <stable@vger.kernel.org>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

commit | commitdiff | tree

Huacai Chen [Thu, 20 Nov 2025 06:42:05 +0000 (14:42 +0800)]

LoongArch: Mask all interrupts during kexec/kdump

If the default state of the interrupt controllers in the first kernel
don't mask any interrupts, it may cause the second kernel to potentially
receive interrupts (which were previously allocated by the first kernel)
immediately after a CPU becomes online during its boot process. These
interrupts cannot be properly routed, leading to bad IRQ issues.

This patch calls machine_kexec_mask_interrupts() to mask all interrupts
during the kexec/kdump process.

Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

commit | commitdiff | tree

Bibo Mao [Thu, 20 Nov 2025 06:42:05 +0000 (14:42 +0800)]

LoongArch: Fix NUMA node parsing with numa_memblks

On physical machine, NUMA node id comes from high bit 44:48 of physical
address. However it is not true on virt machine. With general method, it
comes from ACPI SRAT table.

Here the common function numa_memblks_init() is used to parse NUMA node
information with numa_memblks.

Cc: <stable@vger.kernel.org>
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

commit | commitdiff | tree

Huacai Chen [Thu, 20 Nov 2025 06:42:05 +0000 (14:42 +0800)]

LoongArch: Consolidate CPU names in /proc/cpuinfo

Some processors have no IOCSR.VENDOR and IOCSR.CPUNAME, some processors
have these registers but there is no valid information.

Consolidate CPU names in /proc/cpuinfo:
1. Add "PRID" to display the PRID & Core-Name;
2. Let "Model Name" display "Unknown" if no valid name.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

commit | commitdiff | tree

Thomas Weißschuh [Thu, 20 Nov 2025 06:42:05 +0000 (14:42 +0800)]

LoongArch: Use UAPI types in ptrace UAPI header

The kernel UAPI headers already contain fixed-width integer types, there
is no need to rely on the libc types. There may not be a libc available
or the libc may not provides the <stdint.h>, like for example on nolibc.

This also aligns the header with the rest of the LoongArch UAPI headers.

Fixes: 803b0fc5c3f2 ("LoongArch: Add process management")
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

commit | commitdiff | tree

Jakub Kicinski [Thu, 20 Nov 2025 04:10:53 +0000 (20:10 -0800)]

Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2025-11-18 (idpf, ice)

This series contains updates to idpf and ice drivers.

Emil adds a check for NULL vport_config during removal to avoid NULL
pointer dereference in idpf.

Grzegorz fixes PTP teardown paths to account for some missed cleanups
for ice driver.

* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: fix PTP cleanup on driver removal in error path
idpf: fix possible vport_config NULL pointer deref in remove
====================

Link: https://patch.msgid.link/20251118235207.2165495-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

A mirror of Linus' kernel repository

RSS Atom