Filipe Manana [Mon, 13 Oct 2025 17:19:46 +0000 (18:19 +0100)]
btrfs: fix parameter documentation for btrfs_reserve_data_bytes()
We don't have a fs_info argument anymore since commit 5d39fda880be
("btrfs: pass btrfs_space_info to btrfs_reserve_data_bytes()"), it
was replaced by a space_info argument. So update the documentation.
Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <asj@kernel.org> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Sun, 12 Oct 2025 16:48:27 +0000 (17:48 +0100)]
btrfs: avoid repeated computations in btrfs_mark_ordered_io_finished()
We're computing a few values several times:
1) The current ordered extent's end offset inside the while loop, we have
computed it and stored it in the 'entry_end' variable but then we
compute it again later as the first argument to the min() macro;
2) The end file offset, open coded 3 times;
3) The current length (stored in variable 'len') computed 2 times, one
inside an assertion and the other when assigning to the 'len' variable.
So use existing variables and add new ones to prevent repeating these
expressions and reduce the source code.
We were also subtracting one from the result of min() macro call and
then adding 1 back in the next line, making both operations pointless.
So just remove the decrement and increment by 1.
This also reduces very slightly the object code.
Before:
$ size fs/btrfs/btrfs.ko
text data bss dec hex filename 1916576 161679 15592 2093847 1ff317 fs/btrfs/btrfs.ko
After:
$ size fs/btrfs/btrfs.ko
text data bss dec hex filename 1916556 161679 15592 2093827 1ff303 fs/btrfs/btrfs.ko
Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Anand Jain <asj@kernel.org> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Sun, 12 Oct 2025 09:39:08 +0000 (10:39 +0100)]
btrfs: avoid multiple i_size rounding in btrfs_truncate()
We have the inode locked so no one can concurrently change its i_size and
neither do we change it ourselves, so there's no point in keep rounding
it in the while loop and setting it up in the control structure. That only
causes confusion when reading the code.
So move all the i_size setup and rounding out of the loop and assert the
inode is locked.
Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Anand Jain <asj@kernel.org> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Sun, 12 Oct 2025 09:26:40 +0000 (10:26 +0100)]
btrfs: consistently round up or down i_size in btrfs_truncate()
We're using different ways to round down the i_size by sector size, one
with a bitwise and with a negated mask and another with ALIGN_DOWN(), and
using ALIGN() to round up.
Replace these uses with the round_down() and round_up() macros which have
have names that make it clear the direction of the rounding (unlike the
ALIGN() macro) and getting rid of the bitwise and, negated mask and local
variable for the mask.
Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Anand Jain <asj@kernel.org> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Sun, 12 Oct 2025 09:43:02 +0000 (10:43 +0100)]
btrfs: add unlikely to unexpected error case in extent_writepages()
We don't expect to hit errors and log the error message, so add the
unlikely annotation to make it clear and to hint the compiler that it may
reorganize code to be more efficient.
Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Anand Jain <asj@kernel.org> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Fri, 10 Oct 2025 16:17:10 +0000 (17:17 +0100)]
btrfs: split assertion into two in extent_writepage_io()
If the assertion fails we don't get to know which of the two expressions
failed and neither the values used in each expression.
So split the assertion into two, each for a single expression, so that
if any is triggered we see a line number reported in a stack trace that
points to which expression failed. Also make the assertions use the
verbose mode to print the values involved in the computations.
Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Anand Jain <asj@kernel.org> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Fri, 10 Oct 2025 15:50:02 +0000 (16:50 +0100)]
btrfs: truncate ordered extent when skipping writeback past i_size
While running test case btrfs/192 from fstests with support for large
folios (needs CONFIG_BTRFS_EXPERIMENTAL=y) I ended up getting very sporadic
btrfs check failures reporting that csum items were missing. Looking into
the issue it turned out that btrfs check searches for csum items of a file
extent item with a range that spans beyond the i_size of a file and we
don't have any, because the kernel's writeback code skips submitting bios
for ranges beyond eof. It's not expected however to find a file extent item
that crosses the rounded up (by the sector size) i_size value, but there is
a short time window where we can end up with a transaction commit leaving
this small inconsistency between the i_size and the last file extent item.
Example btrfs check output when this happens:
$ btrfs check /dev/sdc
Opening filesystem to check...
Checking filesystem on /dev/sdc
UUID: 69642c61-5efb-4367-aa31-cdfd4067f713
[1/8] checking log skipped (none written)
[2/8] checking root items
[3/8] checking extents
[4/8] checking free space tree
[5/8] checking fs roots
root 5 inode 332 errors 1000, some csum missing
ERROR: errors found in fs roots
(...)
Looking at a tree dump of the fs tree (root 5) for inode 332 we have:
We can see that the file extent item for file offset 589824 has a length of
64K and its number of bytes is 64K. Looking at the inode item we see that
its i_size is 610969 bytes which falls within the range of that file extent
item [589824, 655360[.
We see that the csum item number 15 covers the first 24K of the file extent
item - it ends at offset 21770240 and the extent's disk_bytenr is 21745664,
so we have:
We see that the next csum item (number 16) is completely outside the range,
so the remaining 40K of the extent doesn't have csum items in the tree.
If we round up the i_size to the sector size, we get:
round_up(610969, 4096) = 614400
If we subtract from that the file offset for the extent item we get:
614400 - 589824 = 24K
So the missing 40K corresponds to the end of the file extent item's range
minus the rounded up i_size:
655360 - 614400 = 40K
Normally we don't expect a file extent item to span over the rounded up
i_size of an inode, since when truncating, doing hole punching and other
operations that trim a file extent item, the number of bytes is adjusted.
There is however a short time window where the kernel can end up,
temporarily,persisting an inode with an i_size that falls in the middle of
the last file extent item and the file extent item was not yet trimmed (its
number of bytes reduced so that it doesn't cross i_size rounded up by the
sector size).
The steps (in the kernel) that lead to such scenario are the following:
1) We have inode I as an empty file, no allocated extents, i_size is 0;
2) A buffered write is done for file range [589824, 655360[ (length of
64K) and the i_size is updated to 655360. Note that we got a single
large folio for the range (64K);
3) A truncate operation starts that reduces the inode's i_size down to
610969 bytes. The truncate sets the inode's new i_size at
btrfs_setsize() by calling truncate_setsize() and before calling
btrfs_truncate();
4) At btrfs_truncate() we trigger writeback for the range starting at
610304 (which is the new i_size rounded down to the sector size) and
ending at (u64)-1;
5) During the writeback, at extent_write_cache_pages(), we get from the
call to filemap_get_folios_tag(), the 64K folio that starts at file
offset 589824 since it contains the start offset of the writeback
range (610304);
6) At writepage_delalloc() we find the whole range of the folio is dirty
and therefore we run delalloc for that 64K range ([589824, 655360[),
reserving a 64K extent, creating an ordered extent, etc;
7) At extent_writepage_io() we submit IO only for subrange [589824, 614400[
because the inode's i_size is 610969 bytes (rounded up by sector size
is 614400). There, in the while loop we intentionally skip IO beyond
i_size to avoid any unnecessay work and just call
btrfs_mark_ordered_io_finished() for the range [614400, 655360[ (which
has a 40K length);
8) Once the IO finishes we finish the ordered extent by ending up at
btrfs_finish_one_ordered(), join transaction N, insert a file extent
item in the inode's subvolume tree for file offset 589824 with a number
of bytes of 64K, and update the inode's delayed inode item or directly
the inode item with a call to btrfs_update_inode_fallback(), which
results in storing the new i_size of 610969 bytes;
9) Transaction N is committed either by the transaction kthread or some
other task committed it (in response to a sync or fsync for example).
At this point we have inode I persisted with an i_size of 610969 bytes
and file extent item that starts at file offset 589824 and has a number
of bytes of 64K, ending at an offset of 655360 which is beyond the
i_size rounded up to the sector size (614400).
--> So after a crash or power failure here, the btrfs check program
reports that error about missing checksum items for this inode, as
it tries to lookup for checksums covering the whole range of the
extent;
10) Only after transaction N is committed that at btrfs_truncate() the
call to btrfs_start_transaction() starts a new transaction, N + 1,
instead of joining transaction N. And it's with transaction N + 1 that
it calls btrfs_truncate_inode_items() which updates the file extent
item at file offset 589824 to reduce its number of bytes from 64K down
to 24K, so that the file extent item's range ends at the i_size
rounded up to the sector size (614400 bytes).
Fix this by truncating the ordered extent at extent_writepage_io() when we
skip writeback because the current offset in the folio is beyond i_size.
This ensures we don't ever persist a file extent item with a number of
bytes beyond the rounded up (by sector size) value of the i_size.
Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Anand Jain <asj@kernel.org> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Sun, 12 Oct 2025 23:52:05 +0000 (10:22 +1030)]
btrfs: implement remove_bdev and shutdown super operation callbacks
For the ->remove_bdev() callback, btrfs will:
- Mark the target device as missing
- Go degraded if the fs can afford it
- Return error other wise
Thus falls back to the shutdown callback
For the ->shutdown callback, btrfs will:
- Set the SHUTDOWN flag
Which will reject all new incoming operations, and make all writeback
to fail.
The behavior is the same as the NOLOGFLUSH behavior.
To support the lookup from bdev to a btrfs_device,
btrfs_dev_lookup_args is enhanced to have a new @devt member.
If set, we should be able to use that @devt member to uniquely locating a
btrfs device.
I know the shutdown can be a little overkilled, if one has a RAID1
metadata and RAID0 data, in that case one can still read data with 50%
chance to got some good data.
But a filesystem returning -EIO for half of the time is not really
considered usable.
Further it can also be as bad as the only device went missing for a single
device btrfs.
So here we go safe other than sorry when handling missing device.
And the remove_bdev callback will be hidden behind experimental features
for now, the reasons are:
- There are not enough btrfs specific bdev removal test cases
The existing test cases are all removing the only device, thus only
exercises the ->shutdown() behavior.
- Not yet determined what's the expected behavior
Although the current auto-degrade behavior is no worse than the old
behavior, it may not always be what the end users want.
Before there is a concrete interface, better hide the new feature
from end users.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <asj@kernel.org> Tested-by: Anand Jain <asj@kernel.org> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Sun, 12 Oct 2025 23:52:04 +0000 (10:22 +1030)]
btrfs: implement shutdown ioctl
The shutdown ioctl should follow the XFS one, which use magic number 'X',
and ioctl number 125, with a uint32 as flags.
For now btrfs don't distinguish DEFAULT and LOGFLUSH flags (just like
f2fs), both will freeze the fs first (implies committing the current
transaction), setting the SHUTDOWN flag and finally thaw the fs.
For NOLOGFLUSH flag, the freeze/thaw part is skipped thus the current
transaction is aborted.
The new shutdown ioctl is hidden behind experimental features for more
testing.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <asj@kernel.org> Tested-by: Anand Jain <asj@kernel.org> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Sun, 12 Oct 2025 23:52:03 +0000 (10:22 +1030)]
btrfs: introduce a new shutdown state
A new fs state EMERGENCY_SHUTDOWN is introduced, which is btrfs'
equivalent of XFS_IOC_GOINGDOWN or EXT4_IOC_SHUTDOWN, after entering
emergency shutdown state, all operations will return errors (-EIO), and
can not be bring back to normal state until unmouont.
The new state will reject the following file operations:
- read_iter()
- write_iter()
- mmap()
- open()
- remap_file_range()
- uring_cmd()
- splice_read()
This requires a small wrapper to do the extra shutdown check, then call
the regular filemap_splice_read() function
This should reject most of the file operations on a shutdown btrfs.
And for the existing dirty folios, extra shutdown checks are introduced
to the following functions:
So that dirty ranges will still be properly cleaned without being
submitted.
Finally the shutdown state will also set the fs error, so that no new
transaction will be committed, protecting the metadata from any possible
further corruption.
And when the fs entered shutdown mode for the first time, a critical
level kernel message will show up to indicate the incident.
That message will be important for end users as rejected delalloc ranges
will output error messages, hopefully that shutdown message and the fact
that all fs operations are returning error will prevent end users from
getting too confused about the delalloc error messages.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <asj@kernel.org> Tested-by: Anand Jain <asj@kernel.org> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Tue, 7 Oct 2025 10:14:37 +0000 (11:14 +0100)]
btrfs: use end_pos variable where needed in btrfs_dirty_folio()
We have a couple places doing the computation "pos + write_bytes" when we
already have it in the local variable "end_pos". Change then to use the
variable instead and make source code smaller. Also make the variable
const since it's not supposed to change.
This also has a very slight reduction in the module size.
Before:
$ size fs/btrfs/btrfs.ko
text data bss dec hex filename 1915990 161647 15592 2093229 1ff0ad fs/btrfs/btrfs.ko
After:
$ size fs/btrfs/btrfs.ko
text data bss dec hex filename 1915974 161647 15592 2093213 1ff09d fs/btrfs/btrfs.ko
Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Boris Burkov [Thu, 2 Oct 2025 00:20:22 +0000 (17:20 -0700)]
btrfs: fix racy bitfield write in btrfs_clear_space_info_full()
From the memory-barriers.txt document regarding memory barrier ordering
guarantees:
(*) These guarantees do not apply to bitfields, because compilers often
generate code to modify these using non-atomic read-modify-write
sequences. Do not attempt to use bitfields to synchronize parallel
algorithms.
(*) Even in cases where bitfields are protected by locks, all fields
in a given bitfield must be protected by one lock. If two fields
in a given bitfield are protected by different locks, the compiler's
non-atomic read-modify-write sequences can cause an update to one
field to corrupt the value of an adjacent field.
btrfs_space_info has a bitfield sharing an underlying word consisting of
the fields full, chunk_alloc, and flush:
Therefore, to be safe from parallel read-modify-writes losing a write to
one of the bitfield members protected by a lock, all writes to all the
bitfields must use the lock. They almost universally do, except for
btrfs_clear_space_info_full() which iterates over the space_infos and
writes out found->full = 0 without a lock.
Imagine that we have one thread completing a transaction in which we
finished deleting a block_group and are thus calling
btrfs_clear_space_info_full() while simultaneously the data reclaim
ticket infrastructure is running do_async_reclaim_data_space():
and now data_sinfo->flush is 1 but the reclaim worker has exited. This
breaks the invariant that flush is 0 iff there is no work queued or
running. Once this invariant is violated, future allocations that go
into __reserve_bytes() will add tickets to space_info->tickets but will
see space_info->flush is set to 1 and not queue the work. After this,
they will block forever on the resulting ticket, as it is now impossible
to kick the worker again.
I also confirmed by looking at the assembly of the affected kernel that
it is doing RMW operations. For example, to set the flush (3rd) bit to 0,
the assembly is:
andb $0xfb,0x60(%rbx)
and similarly for setting the full (1st) bit to 0:
andb $0xfe,-0x20(%rax)
So I think this is really a bug on practical systems. I have observed
a number of systems in this exact state, but am currently unable to
reproduce it.
Rather than leaving this footgun lying around for the future, take
advantage of the fact that there is room in the struct anyway, and that
it is already quite large and simply change the three bitfield members to
bools. This avoids writes to space_info->full having any effect on
writes to space_info->flush, regardless of locking.
Fixes: 957780eb2788 ("Btrfs: introduce ticketed enospc infrastructure") Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Rajeev Tapadia [Fri, 3 Oct 2025 13:30:02 +0000 (19:00 +0530)]
btrfs: fix comment in alloc_bitmap() and drop stale TODO
All callers of alloc_bitmap() hold a transaction handle, so GFP_NOFS is
needed to avoid deadlocks on recursion. Update the comment and drop the
stale TODO.
Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Rajeev Tapadia <rtapadia730@gmail.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
btrfs: fix double free of qgroup record after failure to add delayed ref head
In the previous code it was possible to incur into a double kfree()
scenario when calling add_delayed_ref_head(). This could happen if the
record was reported to already exist in the
btrfs_qgroup_trace_extent_nolock() call, but then there was an error
later on add_delayed_ref_head(). In this case, since
add_delayed_ref_head() returned an error, the caller went to free the
record. Since add_delayed_ref_head() couldn't set this kfree'd pointer
to NULL, then kfree() would have acted on a non-NULL 'record' object
which was pointing to memory already freed by the callee.
The problem comes from the fact that the responsibility to kfree the
object is on both the caller and the callee at the same time. Hence, the
fix for this is to shift the ownership of the 'qrecord' object out of
the add_delayed_ref_head(). That is, we will never attempt to kfree()
the given object inside of this function, and will expect the caller to
act on the 'qrecord' object on its own. The only exception where the
'qrecord' object cannot be kfree'd is if it was inserted into the
tracing logic, for which we already have the 'qrecord_inserted_ret'
boolean to account for this. Hence, the caller has to kfree the object
only if add_delayed_ref_head() reports not to have inserted it on the
tracing logic.
As a side-effect of the above, we must guarantee that
'qrecord_inserted_ret' is properly initialized at the start of the
function, not at the end, and then set when an actual insert
happens. This way we avoid 'qrecord_inserted_ret' having an invalid
value on an early exit.
The documentation from the add_delayed_ref_head() has also been updated
to reflect on the exact ownership of the 'qrecord' object.
Fixes: 6ef8fbce0104 ("btrfs: fix missing error handling when adding delayed ref with qgroups enabled") Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Miquel Sabaté Solà <mssola@mssola.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 29 Sep 2025 12:41:15 +0000 (14:41 +0200)]
btrfs: subpage: rename macro variables to avoid shadowing
When compiling with -Wshadow there are warnings in the subpage helper
macros that are used in functions like btrfs_subpage_dump_bitmap() or
btrfs_subpage_clear_and_test_dirty() that also use 'bfs' (for struct
btrfs_folio_state) or blocks_per_folio.
Add '__' to the macro variables and unify naming in all subpage macros.
btrfs: refactor allocation size calculation in alloc_btrfs_io_context()
Use struct_size() to replace the open-coded calculation, remove the
comment as use of the helper is self explanatory.
Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Mehdi Ben Hadj Khelifa <mehdi.benhadjkhelifa@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Fri, 26 Sep 2025 09:47:30 +0000 (11:47 +0200)]
btrfs: fix trivial -Wshadow warnings
When compiling with -Wshadow (also in 'make W=2' build) there are
several reports of shadowed variables that seem to be harmless:
- btrfs_do_encoded_write() - we can reuse 'ordered', there's no previous
value that would need to be preserved
- scrub_write_endio() - we need a standalone 'i' for bio iteration
- scrub_stripe() - duplicate ret2 for errors that must not overwrite 'ret'
- btrfs_subpage_set_writeback() - 'flags' is used for another irqsave lock
but is not overwritten when reused for xarray
due to scoping, but for clarity let's rename it
- process_dir_items_leaf() - duplicate 'ret', used only for immediate checks
Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Fri, 26 Sep 2025 06:32:56 +0000 (08:32 +0200)]
btrfs: print-tree: use string format for key names
There's a warning when -Wformat=2 is used:
fs/btrfs/print-tree.c: In function ‘key_type_string’:
fs/btrfs/print-tree.c:424:17: warning: format not a string literal and no format arguments [-Wformat-nonliteral]
424 | scnprintf(buf, buf_size, key_to_str[key->type]);
We're printing fixed strings from a table so there's no problem but
let's fix the warning so we could enable the warning in fs/btrfs/.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
btrfs: remove unnecessary NULL fs_info check from find_lock_delalloc_range()
[STATIC CHECK REPORT]
Smatch is reporting that find_lock_delalloc_range() used to do a null
pointer check before accessing fs_info, but now we're accessing it for
sectorsize unconditionally.
[FALSE ALERT]
This is a false alert, the existing null pointer check is introduced in
commit f7b12a62f008 ("btrfs: replace BTRFS_MAX_EXTENT_SIZE with
fs_info->max_extent_size"), but way before that, commit 7c0260ee098d
("btrfs: tests, require fs_info for root") is already forcing every
btrfs_root to have a correct fs_info pointer.
So there is no way that btrfs_root::fs_info is NULL.
[FIX]
Just remove the unnecessary NULL pointer checker.
Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Fixes: f7b12a62f008 ("btrfs: replace BTRFS_MAX_EXTENT_SIZE with fs_info->max_extent_size") Closes: https://lore.kernel.org/r/202509250925.4L4JQTtn-lkp@intel.com/ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Wed, 24 Sep 2025 16:10:27 +0000 (17:10 +0100)]
btrfs: use single return value variable in btrfs_relocate_block_group()
We are using 'ret' and 'err' variables to track return values and errors,
which is pattern that is error prone and we had quite some bugs due to
this pattern in the past.
Simplify this and use a single variable, named 'ret', to track errors and
the return value.
Also rename the variable 'rw' to 'bg_is_ro' which is more meaningful name,
and change its type from int to bool.
Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Boris Burkov [Tue, 23 Sep 2025 17:57:02 +0000 (10:57 -0700)]
btrfs: ignore ENOMEM from alloc_bitmap()
btrfs_convert_free_space_to_bitmaps() and
btrfs_convert_free_space_to_extents() both allocate a bitmap struct
with:
bitmap_size = free_space_bitmap_size(fs_info, block_group->length);
bitmap = alloc_bitmap(bitmap_size);
if (!bitmap) {
ret = -ENOMEM;
btrfs_abort_transaction(trans);
return ret;
}
This conversion is done based on a heuristic and the check triggers each
time we call update_free_space_extent_count() on a block group (each
time we add/remove an extent or modify a bitmap). Furthermore, nothing
relies on maintaining some invariant of bitmap density, it's just an
optimization for space usage. Therefore, it is safe to simply ignore
any memory allocation errors that occur, rather than aborting the
transaction and leaving the fs read only.
Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Linus Torvalds [Sun, 23 Nov 2025 20:03:28 +0000 (12:03 -0800)]
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"Fixes for the Allwinner A523 clk driver:
- Lower the minimum rate for the A523 audio PLL to support
frequencies required by audio devices
- Mark a couple clks critical on A523 so that Linux doesn't turn them
off when they're used by other code like TF-A"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: sunxi-ng: sun55i-a523-ccu: Lower audio0 pll minimum rate
clk: sunxi-ng: sun55i-a523-r-ccu: Mark bus-r-dma as critical
clk: sunxi-ng: Mark A523 bus-r-cpucfg clock as critical
Linus Torvalds [Sun, 23 Nov 2025 16:23:30 +0000 (08:23 -0800)]
Merge tag 'timers-urgent-2025-11-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
- Fix a race in timer->function clearing in timer_shutdown_sync()
- Fix a timekeeper sysfs-setup resource leak in error paths
- Fix the NOHZ report_idle_softirq() syslog rate-limiting
logic to have no side effects on the return value
* tag 'timers-urgent-2025-11-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers: Fix NULL function pointer race in timer_shutdown_sync()
timekeeping: Fix resource leak in tk_aux_sysfs_init() error paths
tick/sched: Fix bogus condition in report_idle_softirq()
Yipeng Zou [Sat, 22 Nov 2025 09:39:42 +0000 (09:39 +0000)]
timers: Fix NULL function pointer race in timer_shutdown_sync()
There is a race condition between timer_shutdown_sync() and timer
expiration that can lead to hitting a WARN_ON in expire_timers().
The issue occurs when timer_shutdown_sync() clears the timer function
to NULL while the timer is still running on another CPU. The race
scenario looks like this:
CPU0 CPU1
<SOFTIRQ>
lock_timer_base()
expire_timers()
base->running_timer = timer;
unlock_timer_base()
[call_timer_fn enter]
mod_timer()
...
timer_shutdown_sync()
lock_timer_base()
// For now, will not detach the timer but only clear its function to NULL
if (base->running_timer != timer)
ret = detach_if_pending(timer, base, true);
if (shutdown)
timer->function = NULL;
unlock_timer_base()
[call_timer_fn exit]
lock_timer_base()
base->running_timer = NULL;
unlock_timer_base()
...
// Now timer is pending while its function set to NULL.
// next timer trigger
<SOFTIRQ>
expire_timers()
WARN_ON_ONCE(!fn) // hit
...
lock_timer_base()
// Now timer will detach
if (base->running_timer != timer)
ret = detach_if_pending(timer, base, true);
if (shutdown)
timer->function = NULL;
unlock_timer_base()
The problem is that timer_shutdown_sync() clears the timer function
regardless of whether the timer is currently running. This can leave a
pending timer with a NULL function pointer, which triggers the
WARN_ON_ONCE(!fn) check in expire_timers().
Fix this by only clearing the timer function when actually detaching the
timer. If the timer is running, leave the function pointer intact, which is
safe because the timer will be properly detached when it finishes running.
Fixes: 0cc04e80458a ("timers: Add shutdown mechanism to the internal functions") Signed-off-by: Yipeng Zou <zouyipeng@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251122093942.301559-1-zouyipeng@huawei.com
Linus Torvalds [Sat, 22 Nov 2025 19:53:53 +0000 (11:53 -0800)]
Merge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library fix from Eric Biggers:
"Fix another KMSAN warning that made it in while KMSAN wasn't working
reliably"
* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
lib/crypto: tests: Fix KMSAN warning in test_sha256_finup_2x()
Linus Torvalds [Sat, 22 Nov 2025 18:16:21 +0000 (10:16 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"One target driver fix and one scsi-generic one. The latter is 10 lines
because the problem lock has to be dropped and re-taken around the
call causing the sleep in atomic"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sg: Do not sleep in atomic context
scsi: target: tcm_loop: Fix segfault in tcm_loop_tpg_address_show()
Linus Torvalds [Sat, 22 Nov 2025 17:58:41 +0000 (09:58 -0800)]
Merge tag 'input-for-v6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
- INPUT_PROP_HAPTIC_TOUCHPAD definition added early in 6.18 cycle has
been renamed to INPUT_PROP_PRESSUREPAD to better reflect the kind of
devices it is supposed to be set for
- a new ID for a touchscreen found in Ayaneo Flip DS in Goodix driver
- Goodix driver no longer tries to set reset pin as "input" as it
causes issues when there is no pull up resistor installed on the
board
- fixes for cros_ec_keyb, imx_sc_key, and pegasus-notetaker drivers to
deal with potential out-of-bounds access and memory corruption issues
* tag 'input-for-v6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: rename INPUT_PROP_HAPTIC_TOUCHPAD to INPUT_PROP_PRESSUREPAD
Input: cros_ec_keyb - fix an invalid memory access
Input: imx_sc_key - fix memory corruption on unload
Input: pegasus-notetaker - fix potential out-of-bounds access
Input: goodix - remove setting of RST pin to input
Input: goodix - add support for ACPI ID GDIX1003
Linus Torvalds [Sat, 22 Nov 2025 17:44:50 +0000 (09:44 -0800)]
Merge tag 'riscv-for-linus-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Paul Walmsley:
- Correct the MIPS RISC-V/JEDEC vendor ID
- Fix the system shutdown behavior in the legacy case where
CONFIG_RISCV_SBI_V01 is set, but the firmware implementation
doesn't support the older v0.1 system shutdown method
- Align some tools/ macro definitions with the corresponding
kernel headers
* tag 'riscv-for-linus-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
tools: riscv: Fixed misalignment of CSR related definitions
riscv: sbi: Prefer SRST shutdown over legacy
riscv: Update MIPS vendor id to 0x127
Linus Torvalds [Sat, 22 Nov 2025 17:24:36 +0000 (09:24 -0800)]
Merge tag 'selinux-pr-20251121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux fixes from Paul Moore:
"Three SELinux patches for v6.18 to fix issues around accessing the
per-task decision cache that we introduced in v6.16 to help reduce
SELinux overhead on path walks. The problem was that despite the cache
being located in the SELinux "task_security_struct", the parent struct
wasn't actually tied to the task, it was tied to a cred.
Historically SELinux did locate the task_security_struct in the
task_struct's security blob, but it was later relocated to the cred
struct when the cred work happened, as it made the most sense at the
time.
Unfortunately we never did the task_security_struct to
cred_security_struct rename work (avoid code churn maybe? who knows)
because it didn't really matter at the time. However, it suddenly
became a problem when we added a per-task cache to a per-cred object
and didn't notice because of the old, no-longer-correct struct naming.
Thanks to KCSAN for flagging this, as the silly humans running things
forgot that the task_security_struct was a big lie.
This contains three patches, only one of which actually fixes the
problem described above and moves the SELinux decision cache from the
per-cred struct to a newly (re)created per-task struct.
The other two patches, which form the bulk of the diffstat, take care
of the associated renaming tasks so we can hopefully avoid making the
same stupid mistake in the future.
For the record, I did contemplate sending just a fix for the cache,
leaving the renaming patches for the upcoming merge window, but the
type/variable naming ended up being pretty awful and would have made
v6.18 an outlier stuck between the "old" names and the "new" names in
v6.19. The renaming patches are also fairly mechanical/trivial and
shouldn't pose much risk despite their size.
TLDR; naming things may be hard, but if you mess it up bad things
happen"
* tag 'selinux-pr-20251121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: rename the cred_security_struct variables to "crsec"
selinux: move avdcache to per-task security struct
selinux: rename task_security_struct to cred_security_struct
Linus Torvalds [Fri, 21 Nov 2025 19:16:14 +0000 (11:16 -0800)]
Merge tag 'loongarch-fixes-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
"Use UAPI types in ptrace UAPI header to fix nolibc ptrace.
Fix CPU name display, NUMA node parsing, kexec/kdump, PCI init and BPF
trampoline"
* tag 'loongarch-fixes-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: BPF: Disable trampoline for kernel module function trace
LoongArch: Don't panic if no valid cache info for PCI
LoongArch: Mask all interrupts during kexec/kdump
LoongArch: Fix NUMA node parsing with numa_memblks
LoongArch: Consolidate CPU names in /proc/cpuinfo
LoongArch: Use UAPI types in ptrace UAPI header
Linus Torvalds [Fri, 21 Nov 2025 19:14:21 +0000 (11:14 -0800)]
Merge tag 'v6.18-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- Fix potential memory leak in mount
- Add some missing read tracepoints
- Fix locking issue with directory leases
* tag 'v6.18-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Add the smb3_read_* tracepoints to SMB1
cifs: fix memory leak in smb3_fs_context_parse_param error path
smb: client: introduce close_cached_dir_locked()
Linus Torvalds [Fri, 21 Nov 2025 19:09:57 +0000 (11:09 -0800)]
Merge tag 'io_uring-6.18-20251120' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring fix from Jens Axboe:
"Just a single fix for a mixup of arguments for the skb_queue_splice()
call, in the io_uring timestamp retrieval code"
* tag 'io_uring-6.18-20251120' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring/cmd_net: fix wrong argument types for skb_queue_splice()
Linus Torvalds [Fri, 21 Nov 2025 18:59:35 +0000 (10:59 -0800)]
Merge tag 'block-6.18-20251120' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:
"NVMe pull request via Keith:
- Admin queue use-after-free fix (Keith)
- Target authentication fix (Alistar)
- Multipath lockdeup fix (Shin'ichiro)
- FC transport teardown fixes (Ewan)"
* tag 'block-6.18-20251120' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
nvme: nvme-fc: Ensure ->ioerr_work is cancelled in nvme_fc_delete_ctrl()
nvme: nvme-fc: move tagset removal to nvme_fc_delete_ctrl()
nvme-multipath: fix lockdep WARN due to partition scan work
nvmet-auth: update sc_c in target host hash calculation
nvme: fix admin request_queue lifetime
Linus Torvalds [Fri, 21 Nov 2025 18:53:23 +0000 (10:53 -0800)]
Merge tag 'ata-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fixes from Niklas Cassel:
- Add a missing refcount decrement in ata_scsi_dev_rescan() when
the device or its queue is not running.
In the case where the device is running, the recount is already
decremented properly (Yihang Li)
- Generate the proper sense code for a Security locked device.
There was a regression caused by a recent change of how sense
data is generated for commands that did not provide any sense
data. This broke system suspend for Security locked devices.
Generate the sense data that the SCSI disk driver expects for a
Security locked device so that system suspend works again (me)
- Set capacity to zero for a Security locked device.
All I/O commands will be aborted by a Security locked device.
Thus, the block layer disk partition scanning will result in
a bunch of, for the user, confusing I/O errors in dmesg during
boot.
Since a Security locked device is unusable anyway, set the capacity
to zero, to avoid the disk partition scanning during boot. We still
create the block device in /dev such that the user may unlock the
device using e.g. hdparm (me)
* tag 'ata-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: libata-core: Set capacity to zero for a security locked drive
ata: libata-scsi: Fix system suspend for a security locked drive
ata: libata-scsi: Add missing scsi_device_put() in ata_scsi_dev_rescan()
Linus Torvalds [Fri, 21 Nov 2025 18:47:24 +0000 (10:47 -0800)]
Merge tag 'pinctrl-v6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
- Fix register naming in the Mediatek mt8189 driver
- Select REGMAP_MMIO for the Realtek RTD driver
- Fix the number of items in groups in the Toshiba Visconti driver
- Fix a memory leak in the Cirrus CS42L43 driver
- Fix a deadlock (!) in Qualcomm pinmux configuration
- Fix use of uninitialized memory and list initialization in the S32CC
pin controller
* tag 'pinctrl-v6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
dt-bindings: pinctrl: xlnx,versal-pinctrl: Add missing unevaluatedProperties on '^conf' nodes
pinctrl: s32cc: initialize gpio_pin_config::list after kmalloc()
pinctrl: s32cc: fix uninitialized memory in s32_pinctrl_desc
pinctrl: qcom: msm: Fix deadlock in pinmux configuration
pinctrl: cirrus: Fix fwnode leak in cs42l43_pin_probe()
dt-bindings: pinctrl: toshiba,visconti: Fix number of items in groups
pinctrl: realtek: Select REGMAP_MMIO for RTD driver
pinctrl: mediatek: mt8189: align register base names to dt-bindings ones
pinctrl: mediatek: mt8196: align register base names to dt-bindings ones
Linus Torvalds [Fri, 21 Nov 2025 18:43:58 +0000 (10:43 -0800)]
Merge tag 'gpio-fixes-for-v6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix a use-after-free bug in GPIO character device code
- update MAINTAINERS
* tag 'gpio-fixes-for-v6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
MAINTAINERS: update my email address
gpio: cdev: make sure the cdev fd is still active before emitting events
Eric Biggers [Fri, 21 Nov 2025 03:34:31 +0000 (19:34 -0800)]
lib/crypto: tests: Fix KMSAN warning in test_sha256_finup_2x()
Fully initialize *ctx, including the buf field which sha256_init()
doesn't initialize, to avoid a KMSAN warning when comparing *ctx to
orig_ctx. This KMSAN warning slipped in while KMSAN was not working
reliably due to a stackdepot bug, which has now been fixed.
Linus Torvalds [Fri, 21 Nov 2025 17:55:55 +0000 (09:55 -0800)]
Merge tag 'drm-fixes-2025-11-21' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"A range of small fixes across the board, the i915 display
disambiguation is probably the biggest otherwise amdgpu and xe as
usual with tegra, nouveau, radeon and a core atomic fix.
Looks mostly normal.
atomic:
- Return error codes on failed blob creation for planes
nouveau:
- Fix memory leak
tegra:
- Fix device ref counting
- Fix pid ref counting
- Revert booting on Pixel C
xe:
- Fix out-of-bounds access with BIT()
- Fix kunit test checking wrong condition
- Drop duplicate kconfig select
- Fix guc2host irq handler with MSI-X
i915:
- Wildcat Lake and Panther Lake detangled for display fixes
* tag 'drm-fixes-2025-11-21' of https://gitlab.freedesktop.org/drm/kernel: (25 commits)
drm/amdgpu: Add sriov vf check for VCN per queue reset support.
drm/amdgpu/ttm: Fix crash when handling MMIO_REMAP in PDE flags
drm/amdgpu/vm: Check PRT uAPI flag instead of PTE flag
drm/amdgpu: Skip emit de meta data on gfx11 with rs64 enabled
drm/amd: Skip power ungate during suspend for VPE
drm/plane: Fix create_in_format_blob() return value
drm/xe/irq: Handle msix vector0 interrupt
drm/xe: Remove duplicate DRM_EXEC selection from Kconfig
drm/xe/kunit: Fix forcewake assertion in mocs test
drm/xe: Prevent BIT() overflow when handling invalid prefetch region
drm/radeon: delete radeon_fence_process in is_signaled, no deadlock
drm/amd/display: Fix pbn to kbps Conversion
drm/amd/display: Clear the CUR_ENABLE register on DCN20 on DPP5
drm/amd/display: Add an HPD filter for HDMI
drm/amd/display: Increase DPCD read retries
drm/amd/display: Move sleep into each retry for retrieve_link_cap()
drm/amd/display: Prevent Gating DTBCLK before It Is Properly Latched
drm/i915/xe3: Restrict PTL intel_encoder_is_c10phy() to only PHY A
drm/i915/display: Add definition for wcl as subplatform
drm/pcids: Split PTL pciids group to make wcl subplatform
...
Linus Torvalds [Fri, 21 Nov 2025 17:29:02 +0000 (09:29 -0800)]
samples: work around glibc redefining some of our defines wrong
Apparently as of version 2.42, glibc headers define AT_RENAME_NOREPLACE
and some of the other flags for renameat2() and friends in <stdio.h>.
Which would all be fine, except for inexplicable reasons glibc decided
to define them _differently_ from the kernel definitions, which then
makes some of our sample code that includes both kernel headers and user
space headers unhappy, because the compiler will (correctly) complain
about redefining things.
Now, mixing kernel headers and user space headers is always a somewhat
iffy proposition due to namespacing issues, but it's kind of inevitable
in our sample and selftest code. And this is just glibc being stupid.
Those defines come from the kernel, glibc is exposing the kernel
interfaces, and glibc shouldn't make up some random new expressions for
these values.
It's not like glibc headers changed the actual result values, but they
arbitrarily just decided to use a different expression to describe those
values. The kernel just does
instead. Same value in the end, but very different macro definition.
For absolutely no reason.
This has since been fixed in the glibc development tree, so eventually
we'll end up with the canonical expressions and no clashes. But in the
meantime the broken headers are in the glibc-2.42 release and have made
it out into distributions.
Do a minimal work-around to make the samples build cleanly by just
undefining the affected macros in between the user space header include
and the kernel header includes.
Commit 69896119dc9d ("MIPS: vdso: Switch to generic storage
implementation") switches to a generic vdso storage, which increases
the number of data pages from 1 to 4. But there is only one page
reserved, which causes segementation faults depending where the VDSO
area is randomized to. To fix this use the same size of reservation
and allocation of the VDSO data pages.
Fixes: 69896119dc9d ("MIPS: vdso: Switch to generic storage implementation") Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: Huacai Chen <chenhuacai@loongson.cn> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
MIPS: mm: Prevent a TLB shutdown on initial uniquification
Depending on the particular CPU implementation a TLB shutdown may occur
if multiple matching entries are detected upon the execution of a TLBP
or the TLBWI/TLBWR instructions. Given that we don't know what entries
we have been handed we need to be very careful with the initial TLB
setup and avoid all these instructions.
Therefore read all the TLB entries one by one with the TLBR instruction,
bypassing the content addressing logic, and truncate any large pages in
place so as to avoid a case in the second step where an incoming entry
for a large page at a lower address overlaps with a replacement entry
chosen at another index. Then preinitialize the TLB using addresses
outside our usual unique range and avoiding clashes with any entries
received, before making the usual call to local_flush_tlb_all().
This fixes (at least) R4x00 cores if TLBP hits multiple matching TLB
entries (SGI IP22 PROM for examples sets up all TLBs to the same virtual
address).
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Fixes: 35ad7e181541 ("MIPS: mm: tlb-r4k: Uniquify TLB entries on init") Cc: stable@vger.kernel.org Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Tested-by: Jiaxun Yang <jiaxun.yang@flygoat.com> # Boston I6400, M5150 sim Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Paul Moore [Tue, 18 Nov 2025 22:27:58 +0000 (17:27 -0500)]
selinux: rename the cred_security_struct variables to "crsec"
Along with the renaming from task_security_struct to cred_security_struct,
rename the local variables to "crsec" from "tsec". This both fits with
existing conventions and helps distinguish between task and cred related
variables.
No functional changes.
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
Stephen Smalley [Thu, 13 Nov 2025 20:23:14 +0000 (15:23 -0500)]
selinux: move avdcache to per-task security struct
The avdcache is meant to be per-task; move it to a new
task_security_struct that is duplicated per-task.
Cc: stable@vger.kernel.org Fixes: 5d7ddc59b3d89b724a5aa8f30d0db94ff8d2d93f ("selinux: reduce path walk overhead") Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
[PM: line length fixes] Signed-off-by: Paul Moore <paul@paul-moore.com>
Stephen Smalley [Thu, 13 Nov 2025 20:23:13 +0000 (15:23 -0500)]
selinux: rename task_security_struct to cred_security_struct
Before Linux had cred structures, the SELinux task_security_struct was
per-task and although the structure was switched to being per-cred
long ago, the name was never updated. This change renames it to
cred_security_struct to avoid confusion and pave the way for the
introduction of an actual per-task security structure for SELinux. No
functional change.
Cc: stable@vger.kernel.org Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
Linus Torvalds [Thu, 20 Nov 2025 19:04:37 +0000 (11:04 -0800)]
Merge tag 'sched_ext-for-6.18-rc6-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fix from Tejun Heo:
"One low risk and obvious fix: scx_enable() was dereferencing an error
pointer on helper kthread creation failure. Fixed"
* tag 'sched_ext-for-6.18-rc6-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: Fix scx_enable() crash on helper kthread creation failure
kthread_run_worker() returns an ERR_PTR() on failure rather than NULL,
but the current code in scx_alloc_and_add_sched() only checks for a NULL
helper. Incase of failure on SIGQUIT, the error is not handled in
scx_alloc_and_add_sched() and scx_enable() ends up dereferencing an
error pointer.
Error handling is fixed in scx_alloc_and_add_sched() to propagate
PTR_ERR() into ret, so that scx_enable() jumps to the existing error
path, avoiding random dereference on failure.
Jens Axboe [Thu, 20 Nov 2025 18:40:15 +0000 (11:40 -0700)]
io_uring/cmd_net: fix wrong argument types for skb_queue_splice()
If timestamp retriving needs to be retried and the local list of
SKB's already has entries, then it's spliced back into the socket
queue. However, the arguments for the splice helper are transposed,
causing exactly the wrong direction of splicing into the on-stack
list. Fix that up.
Cc: stable@vger.kernel.org Reported-by: Google Big Sleep <big-sleep-vuln-reports+bigsleep-462435176@google.com> Fixes: 9e4ed359b8ef ("io_uring/netcmd: add tx timestamping cmd support") Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Thu, 20 Nov 2025 17:46:52 +0000 (09:46 -0800)]
Merge tag 'pm-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Fix a regression introduced during the 6.16 development cycle that may
cause runtime PM to be enabled by mistake for devices that do not
support it (which may lead to some serious trouble) if there is a
system wakeup event during the "late suspend" phase of system suspend"
* tag 'pm-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: sleep: core: Fix runtime PM enabling in device_resume_early()
Linus Torvalds [Thu, 20 Nov 2025 17:44:27 +0000 (09:44 -0800)]
Merge tag 'acpi-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"This fixes EINJV2 support introduced during the 6.17 cycle by
unbreaking the initialization broken by a previous attempted fix,
adding sanity checks for data coming from the platform firmware, and
updating the code to handle injecting legacy error types on an EINJV2
capable systems properly (Tony Luck)"
* tag 'acpi-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: APEI: EINJ: Fix EINJV2 initialization and injection
Linus Torvalds [Thu, 20 Nov 2025 17:39:34 +0000 (09:39 -0800)]
Merge tag 'platform-drivers-x86-v6.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Ilpo Järvinen:
"This one has lots of new HW entries which adds to the size in diffstat
but the individual changes are simple.
Fixes
- acer-wmi: Ignore backlight event
- alienware-wmi-wmax: Fix quirk match table order & drop redundant
entries
- amd/pmc:
- Add Xbox Ally to spurious 8042 quirk list
- Quirk list Lenovo Legion Go 2 NVMe resume
- msi-wmi-platform:
- Correct GUID to uppercase
- GUID is uncleverly copy-pasted from an example so add a DMI
whitelist
- hp-wmi:
- Omen 16-wf1xxx fan support
- Omen MAX 16-ah0xx fan + thermal profile support
- Victus 16-r0 and 16-s0 fan + thermal profile support
- intel/hid: Intel Nova Lake support
- intel-uncore-freq:
- Intel Panther Lake support
- Intel Wildcat Lake support
- Intel Nova Lake support"
* tag 'platform-drivers-x86-v6.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (21 commits)
platform/x86: intel-uncore-freq: fix all header kernel-doc warnings
platform/x86: acer-wmi: Ignore backlight event
platform/x86/intel/speed_select_if: Convert PCIBIOS_* return codes to errnos
platform/x86/intel/hid: Add Nova Lake support
platform/x86: alienware-wmi-wmax: Add AWCC support to Alienware 16 Aurora
platform/x86: hp-wmi: Add Omen MAX 16-ah0xx fan support and thermal profile
platform/x86: msi-wmi-platform: Fix typo in WMI GUID
platform/x86: msi-wmi-platform: Only load on MSI devices
platform/x86/amd: pmc: Add Lenovo Legion Go 2 to pmc quirk list
platform/x86/amd/pmc: Add spurious_8042 to Xbox Ally
platform/x86/amd/pmc: Add support for Van Gogh SoC
platform/x86: alienware-wmi-wmax: Add support for the whole "G" family
platform/x86: alienware-wmi-wmax: Add support for the whole "X" family
platform/x86: alienware-wmi-wmax: Add support for the whole "M" family
platform/x86: alienware-wmi-wmax: Drop redundant DMI entries
platform/x86: alienware-wmi-wmax: Fix "Alienware m16 R1 AMD" quirk order
platform/x86: ISST: isst_if.h: fix all kernel-doc warnings
platform/x86: intel-uncore-freq: Add additional client processors
platform/x86: hp-wmi: Add Omen 16-wf1xxx fan support
platform/x86: huawei-wmi: add keys for HONOR models
...
Linus Torvalds [Thu, 20 Nov 2025 16:52:07 +0000 (08:52 -0800)]
Merge tag 'net-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from IPsec and wireless.
Previous releases - regressions:
- prevent NULL deref in generic_hwtstamp_ioctl_lower(),
newer APIs don't populate all the pointers in the request
- phylink: add missing supported link modes for the fixed-link
- mptcp: fix false positive warning in mptcp_pm_nl_rm_addr
Previous releases - always broken:
- openvswitch: remove never-working support for setting NSH fields
- xfrm: number of fixes for error paths of xfrm_state creation/
modification/deletion
- xfrm: fixes for offload
- fix the determination of the protocol of the inner packet
- don't push locally generated packets directly to L2 tunnel
mode offloading, they still need processing from the standard
xfrm path
- mptcp: fix a couple of corner cases in fallback and fastclose
handling
- wifi: rtw89: hw_scan: prevent connections from getting stuck,
work around apparent bug in FW by tweaking messages we send
- af_unix: fix duplicate data if PEEK w/ peek_offset needs to wait
- veth: more robust handing of race to avoid txq getting stuck
* tag 'net-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
vsock: Ignore signal/timeout on connect() if already established
be2net: pass wrb_params in case of OS2BMC
l2tp: reset skb control buffer on xmit
net: dsa: microchip: lan937x: Fix RGMII delay tuning
selftests: mptcp: add a check for 'add_addr_accepted'
mptcp: fix address removal logic in mptcp_pm_nl_rm_addr
selftests: mptcp: join: userspace: longer timeout
selftests: mptcp: join: endpoints: longer timeout
selftests: mptcp: join: fastclose: remove flaky marks
mptcp: fix duplicate reset on fastclose
mptcp: decouple mptcp fastclose from tcp close
mptcp: do not fallback when OoO is present
mptcp: fix premature close in case of fallback
mptcp: avoid unneeded subflow-level drops
mptcp: fix ack generation for fallback msk
wifi: rtw89: hw_scan: Don't let the operating channel be last
net: phylink: add missing supported link modes for the fixed-link
selftest: af_unix: Add test for SO_PEEK_OFF.
af_unix: Read sk_peek_offset() again after sleeping in unix_stream_read_generic().
net/mlx5: Clean up only new IRQ glue on request_irq() failure
...
timekeeping: Fix resource leak in tk_aux_sysfs_init() error paths
tk_aux_sysfs_init() returns immediately on error during the auxiliary clock
initialization loop without cleaning up previously allocated kobjects and
sysfs groups.
If kobject_create_and_add() or sysfs_create_group() fails during loop
iteration, the parent kobjects (tko and auxo) and any previously created
child kobjects are leaked.
Fix this by adding proper error handling with goto labels to ensure all
allocated resources are cleaned up on failure. kobject_put() on the
parent kobjects will handle cleanup of their children.
Michal Luczaj [Wed, 19 Nov 2025 14:02:59 +0000 (15:02 +0100)]
vsock: Ignore signal/timeout on connect() if already established
During connect(), acting on a signal/timeout by disconnecting an already
established socket leads to several issues:
1. connect() invoking vsock_transport_cancel_pkt() ->
virtio_transport_purge_skbs() may race with sendmsg() invoking
virtio_transport_get_credit(). This results in a permanently elevated
`vvs->bytes_unsent`. Which, in turn, confuses the SOCK_LINGER handling.
2. connect() resetting a connected socket's state may race with socket
being placed in a sockmap. A disconnected socket remaining in a sockmap
breaks sockmap's assumptions. And gives rise to WARNs.
3. connect() transitioning SS_CONNECTED -> SS_UNCONNECTED allows for a
transport change/drop after TCP_ESTABLISHED. Which poses a problem for
any simultaneous sendmsg() or connect() and may result in a
use-after-free/null-ptr-deref.
Do not disconnect socket on signal/timeout. Keep the logic for unconnected
sockets: they don't linger, can't be placed in a sockmap, are rejected by
sendmsg().
Andrey Vatoropin [Wed, 19 Nov 2025 10:51:12 +0000 (10:51 +0000)]
be2net: pass wrb_params in case of OS2BMC
be_insert_vlan_in_pkt() is called with the wrb_params argument being NULL
at be_send_pkt_to_bmc() call site. This may lead to dereferencing a NULL
pointer when processing a workaround for specific packet, as commit bc0c3405abbb ("be2net: fix a Tx stall bug caused by a specific ipv6
packet") states.
The correct way would be to pass the wrb_params from be_xmit().
* tag 'nvme-6.18-2025-11-20' of git://git.infradead.org/nvme:
nvme: nvme-fc: Ensure ->ioerr_work is cancelled in nvme_fc_delete_ctrl()
nvme: nvme-fc: move tagset removal to nvme_fc_delete_ctrl()
nvme-multipath: fix lockdep WARN due to partition scan work
nvmet-auth: update sc_c in target host hash calculation
nvme: fix admin request_queue lifetime
Niklas Cassel [Wed, 19 Nov 2025 14:13:15 +0000 (15:13 +0100)]
ata: libata-core: Set capacity to zero for a security locked drive
For Security locked drives (drives that have Security enabled, and have
not been Security unlocked by boot firmware), the automatic partition
scanning will result in the user being spammed with errors such as:
ata5.00: failed command: READ DMA
ata5.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 7 dma 4096 in
res 51/04:08:00:00:00/00:00:00:00:00/e0 Emask 0x1 (device error)
ata5.00: status: { DRDY ERR }
ata5.00: error: { ABRT }
sd 4:0:0:0: [sda] tag#7 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
sd 4:0:0:0: [sda] tag#7 Sense Key : Aborted Command [current]
sd 4:0:0:0: [sda] tag#7 Add. Sense: No additional sense information
during boot, because most commands except for IDENTIFY will be aborted by
a Security locked drive.
For a Security locked drive, set capacity to zero, so that no automatic
partition scanning will happen.
If the user later unlocks the drive using e.g. hdparm, the close() by the
user space application should trigger a revalidation of the drive.
Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Niklas Cassel <cassel@kernel.org>
Niklas Cassel [Wed, 19 Nov 2025 14:13:14 +0000 (15:13 +0100)]
ata: libata-scsi: Fix system suspend for a security locked drive
Commit cf3fc037623c ("ata: libata-scsi: Fix ata_to_sense_error() status
handling") fixed ata_to_sense_error() to properly generate sense key
ABORTED COMMAND (without any additional sense code), instead of the
previous bogus sense key ILLEGAL REQUEST with the additional sense code
UNALIGNED WRITE COMMAND, for a failed command.
However, this broke suspend for Security locked drives (drives that have
Security enabled, and have not been Security unlocked by boot firmware).
The reason for this is that the SCSI disk driver, for the Synchronize
Cache command only, treats any sense data with sense key ILLEGAL REQUEST
as a successful command (regardless of ASC / ASCQ).
After commit cf3fc037623c ("ata: libata-scsi: Fix ata_to_sense_error()
status handling") the code that treats any sense data with sense key
ILLEGAL REQUEST as a successful command is no longer applicable, so the
command fails, which causes the system suspend to be aborted:
sd 1:0:0:0: PM: dpm_run_callback(): scsi_bus_suspend returns -5
sd 1:0:0:0: PM: failed to suspend async: error -5
PM: Some devices failed to suspend, or early wake event detected
To make suspend work once again, for a Security locked device only,
return sense data LOGICAL UNIT ACCESS NOT AUTHORIZED, the actual sense
data which a real SCSI device would have returned if locked.
The SCSI disk driver treats this sense data as a successful command.
Cc: stable@vger.kernel.org Reported-by: Ilia Baryshnikov <qwelias@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220704 Fixes: cf3fc037623c ("ata: libata-scsi: Fix ata_to_sense_error() status handling") Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Niklas Cassel <cassel@kernel.org>
Paolo Abeni [Thu, 20 Nov 2025 12:02:00 +0000 (13:02 +0100)]
Merge tag 'wireless-2025-11-20' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
wireless-2025-11-20
A single fix for scanning on some rtw89 devices.
* tag 'wireless-2025-11-20' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: rtw89: hw_scan: Don't let the operating channel be last
====================
David Bauer [Tue, 18 Nov 2025 00:16:18 +0000 (01:16 +0100)]
l2tp: reset skb control buffer on xmit
The L2TP stack did not reset the skb control buffer before sending the
encapsulated package.
In a setup with an ath10k radio and batman-adv over an L2TP tunnel
massive fragmentations happen sporadically if the L2TP tunnel is
established over IPv4.
L2TP might reset some of the fields in the IP control buffer, but L2TP
assumes the type of the control buffer to be of an IPv4 packet.
In case the L2TP interface is used as a batadv hardif or the packet is
an IPv6 packet, this assumption breaks.
Clear the entire control buffer to avoid such mishaps altogether.
Correct RGMII delay application logic in lan937x_set_tune_adj().
The function was missing `data16 &= ~PORT_TUNE_ADJ` before setting the
new delay value. This caused the new value to be bitwise-OR'd with the
existing PORT_TUNE_ADJ field instead of replacing it.
For example, when setting the RGMII 2 TX delay on port 4, the
intended TUNE_ADJUST value of 0 (RGMII_2_TX_DELAY_2NS) was
incorrectly OR'd with the default 0x1B (from register value 0xDA3),
leaving the delay at the wrong setting.
This patch adds the missing mask to clear the field, ensuring the
correct delay value is written. Physical measurements on the RGMII TX
lines confirm the fix, showing the delay changing from ~1ns (before
change) to ~2ns.
While testing on i.MX 8MP showed this was within the platform's timing
tolerance, it did not match the intended hardware-characterized value.
Fixes: b19ac41faa3f ("net: dsa: microchip: apply rgmii tx and rx delay in phylink mac config") Cc: stable@vger.kernel.org Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20251114090951.4057261-1-o.rempel@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
CPU: 1 UID: 0 PID: 1300113 Comm: xfs_scrub Not tainted 6.18.0-rc4-djwx #rc4 PREEMPT(lazy) 3d744dd94e92690f00a04398d2bd8631dcef1954
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-4.module+el8.8.0+21164+ed375313 04/01/2014
==================================================================
On further analysis, I realized that the second parameter to min() is
not correct. xfs_ifork::if_bytes is the size of the xfs_ifork::if_data
buffer. if_bytes can be smaller than the data fork size because:
(a) the forkoff code tries to keep the data area as large as possible
(b) for symbolic links, if_bytes is the ondisk file size + 1
(c) forkoff is always a multiple of 8.
Case in point: for a single-byte symlink target, forkoff will be
8 but the buffer will only be 2 bytes long.
In other words, the logic here is wrong and we walk off the end of the
incore buffer. Fix that.
Cc: stable@vger.kernel.org # v6.10 Fixes: 2651923d8d8db0 ("xfs: online repair of symbolic links") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Dapeng Mi [Wed, 12 Nov 2025 08:05:26 +0000 (16:05 +0800)]
perf: Fix 0 count issue of cpu-clock
Currently cpu-clock event always returns 0 count, e.g.,
perf stat -e cpu-clock -- sleep 1
Performance counter stats for 'sleep 1':
0 cpu-clock # 0.000 CPUs utilized
1.002308394 seconds time elapsed
The root cause is the commit 'bc4394e5e79c ("perf: Fix the throttle
error of some clock events")' adds PERF_EF_UPDATE flag check before
calling cpu_clock_event_update() to update the count, however the
PERF_EF_UPDATE flag is never set when the cpu-clock event is stopped in
counting mode (pmu->dev() -> cpu_clock_event_del() ->
cpu_clock_event_stop()). This leads to the cpu-clock event count is
never updated.
To fix this issue, force to set PERF_EF_UPDATE flag for cpu-clock event
just like what task-clock does.
Fixes: bc4394e5e79c ("perf: Fix the throttle error of some clock events") Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://patch.msgid.link/20251112080526.3971392-1-dapeng1.mi@linux.intel.com
David Howells [Fri, 24 Oct 2025 15:33:43 +0000 (16:33 +0100)]
cifs: Add the smb3_read_* tracepoints to SMB1
Add the smb3_read_* tracepoints to SMB1's cifs_async_readv() and
cifs_readv_callback().
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.org>
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
Shaurya Rane [Tue, 18 Nov 2025 15:02:57 +0000 (20:32 +0530)]
cifs: fix memory leak in smb3_fs_context_parse_param error path
Add proper cleanup of ctx->source and fc->source to the
cifs_parse_mount_err error handler. This ensures that memory allocated
for the source strings is correctly freed on all error paths, matching
the cleanup already performed in the success path by
smb3_cleanup_fs_context_contents().
Pointers are also set to NULL after freeing to prevent potential
double-free issues.
This change fixes a memory leak originally detected by syzbot. The
leak occurred when processing Opt_source mount options if an error
happened after ctx->source and fc->source were successfully
allocated but before the function completed.
The specific leak sequence was:
1. ctx->source = smb3_fs_context_fullpath(ctx, '/') allocates memory
2. fc->source = kstrdup(ctx->source, GFP_KERNEL) allocates more memory
3. A subsequent error jumps to cifs_parse_mount_err
4. The old error handler freed passwords but not the source strings,
causing the memory to leak.
This issue was not addressed by commit e8c73eb7db0a ("cifs: client:
fix memory leak in smb3_fs_context_parse_param"), which only fixed
leaks from repeated fsconfig() calls but not this error path.
Patch updated with minor change suggested by kernel test robot
Reported-by: syzbot+87be6809ed9bf6d718e3@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=87be6809ed9bf6d718e3 Fixes: 24e0a1eff9e2 ("cifs: switch to new mount api") Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Shaurya Rane <ssrane_b23@ee.vjti.ac.in> Signed-off-by: Steve French <stfrench@microsoft.com>
Replace close_cached_dir() calls under cfid_list_lock with a new
close_cached_dir_locked() variant that uses kref_put() instead of
kref_put_lock() to avoid recursive locking when dropping references.
While the existing code works if the refcount >= 2 invariant holds,
this area has proven error-prone. Make deadlocks impossible and WARN
on invariant violations.
Cc: stable@vger.kernel.org Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Johannes Berg [Thu, 20 Nov 2025 08:43:24 +0000 (09:43 +0100)]
Merge tag 'rtw-2025-11-20' of https://github.com/pkshih/rtw
Ping-Ke Shih says:
==================
rtw patches for v6.18-rc7
Fix firmware goes wrong and causes device unusable after scanning. This
issue presents under certain regulatory domain reported from end users.
==================
Vincent Li [Thu, 20 Nov 2025 06:42:05 +0000 (14:42 +0800)]
LoongArch: BPF: Disable trampoline for kernel module function trace
The current LoongArch BPF trampoline implementation is incompatible
with tracing functions in kernel modules. This causes several severe
and user-visible problems:
* The `bpf_selftests/module_attach` test fails consistently.
* Kernel lockup when a BPF program is attached to a module function [1].
* Critical kernel modules like WireGuard experience traffic disruption
when their functions are traced with fentry [2].
Given the severity and the potential for other unknown side-effects, it
is safest to disable the feature entirely for now. This patch prevents
the BPF subsystem from allowing trampoline attachments to kernel module
functions on LoongArch.
This is a temporary mitigation until the core issues in the trampoline
code for kernel module handling can be identified and fixed.
Huacai Chen [Thu, 20 Nov 2025 06:42:05 +0000 (14:42 +0800)]
LoongArch: Don't panic if no valid cache info for PCI
If there is no valid cache info detected (may happen in virtual machine)
for pci_dfl_cache_line_size, kernel shouldn't panic. Because in the PCI
core it will be evaluated to (L1_CACHE_BYTES >> 2).
Cc: <stable@vger.kernel.org> Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Huacai Chen [Thu, 20 Nov 2025 06:42:05 +0000 (14:42 +0800)]
LoongArch: Mask all interrupts during kexec/kdump
If the default state of the interrupt controllers in the first kernel
don't mask any interrupts, it may cause the second kernel to potentially
receive interrupts (which were previously allocated by the first kernel)
immediately after a CPU becomes online during its boot process. These
interrupts cannot be properly routed, leading to bad IRQ issues.
This patch calls machine_kexec_mask_interrupts() to mask all interrupts
during the kexec/kdump process.
Bibo Mao [Thu, 20 Nov 2025 06:42:05 +0000 (14:42 +0800)]
LoongArch: Fix NUMA node parsing with numa_memblks
On physical machine, NUMA node id comes from high bit 44:48 of physical
address. However it is not true on virt machine. With general method, it
comes from ACPI SRAT table.
Here the common function numa_memblks_init() is used to parse NUMA node
information with numa_memblks.
Cc: <stable@vger.kernel.org> Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Thomas Weißschuh [Thu, 20 Nov 2025 06:42:05 +0000 (14:42 +0800)]
LoongArch: Use UAPI types in ptrace UAPI header
The kernel UAPI headers already contain fixed-width integer types, there
is no need to rely on the libc types. There may not be a libc available
or the libc may not provides the <stdint.h>, like for example on nolibc.
This also aligns the header with the rest of the LoongArch UAPI headers.
Fixes: 803b0fc5c3f2 ("LoongArch: Add process management") Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Jakub Kicinski [Thu, 20 Nov 2025 04:10:53 +0000 (20:10 -0800)]
Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2025-11-18 (idpf, ice)
This series contains updates to idpf and ice drivers.
Emil adds a check for NULL vport_config during removal to avoid NULL
pointer dereference in idpf.
Grzegorz fixes PTP teardown paths to account for some missed cleanups
for ice driver.
* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: fix PTP cleanup on driver removal in error path
idpf: fix possible vport_config NULL pointer deref in remove
====================