]> git.ipfire.org Git - thirdparty/kernel/stable.git/log
thirdparty/kernel/stable.git
5 weeks agoxenbus: add xenbus_device parameter to xenbus_read_driver_state()
Juergen Gross [Wed, 18 Feb 2026 09:52:04 +0000 (10:52 +0100)] 
xenbus: add xenbus_device parameter to xenbus_read_driver_state()

In order to prepare checking the xenbus device status in
xenbus_read_driver_state(), add the pointer to struct xenbus_device
as a parameter.

Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: "Martin K. Petersen" <martin.petersen@oracle.com> # SCSI
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # drivers/pci/xen-pcifront.c
Signed-off-by: Juergen Gross <jgross@suse.com>
Message-ID: <20260218095205.453657-2-jgross@suse.com>

5 weeks agoarm64: Silence sparse warnings caused by the type casting in (cmp)xchg
Catalin Marinas [Fri, 27 Feb 2026 15:51:42 +0000 (15:51 +0000)] 
arm64: Silence sparse warnings caused by the type casting in (cmp)xchg

The arm64 xchg/cmpxchg() wrappers cast the arguments to (unsigned long)
prior to invoking the static inline functions implementing the
operation. Some restrictive type annotations (e.g. __bitwise) lead to
sparse warnings like below:

sparse warnings: (new ones prefixed by >>)
   fs/crypto/bio.c:67:17: sparse: sparse: cast from restricted blk_status_t
>> fs/crypto/bio.c:67:17: sparse: sparse: cast to restricted blk_status_t

Force the casting in the arm64 xchg/cmpxchg() wrappers to silence
sparse.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202602230947.uNRsPyBn-lkp@intel.com/
Link: https://lore.kernel.org/r/202602230947.uNRsPyBn-lkp@intel.com/
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Will Deacon <will@kernel.org>
5 weeks agoperf build: Prevent "argument list too long" error
Markus Mayer [Tue, 3 Mar 2026 21:15:01 +0000 (13:15 -0800)] 
perf build: Prevent "argument list too long" error

Due to a recent change, building perf may result in a build error when
it is trying to "prune orphans".

The file list passed to "rm" may exceed what the shell can handle.

The build will then abort with an error like this:

    TEST    [...]/arm64/build/linux-custom/tools/perf/pmu-events/metric_test.log
  make[5]: /bin/sh: Argument list too long
  make[5]: *** [pmu-events/Build:217: prune_orphans] Error 127
  make[5]: *** Waiting for unfinished jobs....
  make[4]: *** [Makefile.perf:773: [...]/tools/perf/pmu-events/pmu-events-in.o] Error 2
  make[4]: *** Waiting for unfinished jobs....
  make[3]: *** [Makefile.perf:289: sub-make] Error 2

Processing the arguments via "xargs", instead of passing the list of
files directly to "rm" via the shell, prevents this issue.

Fixes: 36a1b0061a584430 ("perf build: Reduce pmu-events related copying and mkdirs")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Markus Mayer <mmayer@broadcom.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
5 weeks agotools build: Make in-target rule robust against too long argument error
Changqing Li [Mon, 28 Jul 2025 09:31:53 +0000 (17:31 +0800)] 
tools build: Make in-target rule robust against too long argument error

The command length of in-target scales with the depth of the directory
times the number of objects in the Makefile.

When there are many objects, and O=[absolute_path] is set, and the
absolute_path is relatively long.

It is possible that this line "$(call if_changed,$(host)ld_multi)" will
report error: "make[4]: /bin/sh: Argument list too long"

For example, build perf tools with O=/long/output/path

Like built-in.a and *.mod rules in scripts/Makefile.build, add
$(objpredix)/ by the shell command instead of by Make's builtin
function.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Changqing Li <changqing.li@windriver.com>
Cc: Charlie Jenkins <charlie@rivosinc.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
5 weeks agotools headers: Sync uapi/linux/prctl.h with the kernel source
Arnaldo Carvalho de Melo [Fri, 27 Feb 2026 12:08:14 +0000 (09:08 -0300)] 
tools headers: Sync uapi/linux/prctl.h with the kernel source

To pick up the changes in these csets:

  5ca243f6e3c30b97 ("prctl: add arch-agnostic prctl()s for indirect branch tracking")
  28621ec2d46c6adf ("rseq: Add prctl() to enable time slice extensions")

That don't introduced these new prctls:

  $ tools/perf/trace/beauty/prctl_option.sh > before.txt
  $ cp include/uapi/linux/prctl.h tools/perf/trace/beauty/include/uapi/linux/prctl.h
  $ tools/perf/trace/beauty/prctl_option.sh > after.txt
  $ diff -u before.txt after.txt
  --- before.txt 2026-02-27 09:07:16.435611457 -0300
  +++ after.txt 2026-02-27 09:07:28.189816531 -0300
  @@ -73,6 +73,10 @@
    [76] = "LOCK_SHADOW_STACK_STATUS",
    [77] = "TIMER_CREATE_RESTORE_IDS",
    [78] = "FUTEX_HASH",
  + [79] = "RSEQ_SLICE_EXTENSION",
  + [80] = "GET_INDIR_BR_LP_STATUS",
  + [81] = "SET_INDIR_BR_LP_STATUS",
  + [82] = "LOCK_INDIR_BR_LP_STATUS",
   };
   static const char *prctl_set_mm_options[] = {
    [1] = "START_CODE",
  $

That now will be used to decode the syscall option and also to compose
filters, for instance:

  [root@five ~]# perf trace -e syscalls:sys_enter_prctl --filter option==SET_NAME
       0.000 Isolated Servi/3474327 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23f13b7aee)
       0.032 DOM Worker/3474327 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23deb25670)
       7.920 :3474328/3474328 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24fbb10)
       7.935 StreamT~s #374/3474328 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24fb970)
       8.400 Isolated Servi/3474329 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24bab10)
       8.418 StreamT~s #374/3474329 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24ba970)
  ^C[root@five ~]#

This addresses these perf build warnings:

  Warning: Kernel ABI header differences:
    diff -u tools/perf/trace/beauty/include/uapi/linux/prctl.h include/uapi/linux/prctl.h

Please see tools/include/uapi/README for further details.

Cc: Deepak Gupta <debug@rivosinc.com>
Cc: Paul Walmsley <pjw@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
5 weeks agoblock: use __bio_add_page in bio_copy_kern
Yang Xiuwei [Wed, 4 Mar 2026 04:51:19 +0000 (12:51 +0800)] 
block: use __bio_add_page in bio_copy_kern

Since the bio is allocated with the exact number of pages needed via
blk_rq_map_bio_alloc(), and the loop iterates exactly that many times,
bio_add_page() cannot fail due to insufficient space.  Switch to
__bio_add_page() and remove the dead error handling code.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Yang Xiuwei <yangxiuwei@kylinos.cn>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 weeks agodrm/xe: Fix memory leak in xe_vm_madvise_ioctl
Varun Gupta [Mon, 23 Feb 2026 17:51:45 +0000 (23:21 +0530)] 
drm/xe: Fix memory leak in xe_vm_madvise_ioctl

When check_bo_args_are_sane() validation fails, jump to the new
free_vmas cleanup label to properly free the allocated resources.
This ensures proper cleanup in this error path.

Fixes: 293032eec4ba ("drm/xe/bo: Update atomic_access attribute on madvise")
Cc: stable@vger.kernel.org # v6.18+
Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Varun Gupta <varun.gupta@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260223175145.1532801-1-varun.gupta@intel.com
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
(cherry picked from commit 29bd06faf727a4b76663e4be0f7d770e2d2a7965)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 weeks agodrm/xe/reg_sr: Fix leak on xa_store failure
Shuicheng Lin [Wed, 4 Feb 2026 17:28:11 +0000 (17:28 +0000)] 
drm/xe/reg_sr: Fix leak on xa_store failure

Free the newly allocated entry when xa_store() fails to avoid a memory
leak on the error path.

v2: use goto fail_free. (Bala)

Fixes: e5283bd4dfec ("drm/xe/reg_sr: Remove register pool")
Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20260204172810.1486719-2-shuicheng.lin@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit 6bc6fec71ac45f52db609af4e62bdb96b9f5fadb)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 weeks agodrm/xe/xe2_hpg: Correct implementation of Wa_16025250150
Matt Roper [Fri, 27 Feb 2026 16:43:41 +0000 (08:43 -0800)] 
drm/xe/xe2_hpg: Correct implementation of Wa_16025250150

Wa_16025250150 asks us to set five register fields of the register to
0x1 each.  However we were just OR'ing this into the existing register
value (which has a default of 0x4 for each nibble-sized field) resulting
in final field values of 0x5 instead of the desired 0x1.  Correct the
RTP programming (use FIELD_SET instead of SET) to ensure each field is
assigned to exactly the value we want.

Cc: Aradhya Bhatia <aradhya.bhatia@intel.com>
Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
Cc: stable@vger.kernel.org # v6.16+
Fixes: 7654d51f1fd8 ("drm/xe/xe2hpg: Add Wa_16025250150")
Reviewed-by: Ngai-Mint Kwan <ngai-mint.kwan@linux.intel.com>
Link: https://patch.msgid.link/20260227164341.3600098-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit d139209ef88e48af1f6731cd45440421c757b6b5)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 weeks agodrm/xe/gsc: Fix GSC proxy cleanup on early initialization failure
Zhanjun Dong [Fri, 20 Feb 2026 22:53:08 +0000 (17:53 -0500)] 
drm/xe/gsc: Fix GSC proxy cleanup on early initialization failure

xe_gsc_proxy_remove undoes what is done in both xe_gsc_proxy_init and
xe_gsc_proxy_start; however, if we fail between those 2 calls, it is
possible that the HW forcewake access hasn't been initialized yet and so
we hit errors when the cleanup code tries to write GSC register. To
avoid that, split the cleanup in 2 functions so that the HW cleanup is
only called if the HW setup was completed successfully.

Since the HW cleanup (interrupt disabling) is now removed from
xe_gsc_proxy_remove, the cleanup on error paths in xe_gsc_proxy_start
must be updated to disable interrupts before returning.

Fixes: ff6cd29b690b ("drm/xe: Cleanup unwind of gt initialization")
Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patch.msgid.link/20260220225308.101469-1-zhanjun.dong@intel.com
(cherry picked from commit 2b37c401b265c07b46408b5cb36a4b757c9b5060)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 weeks agoRevert "drm/pagemap: Disable device-to-device migration"
Thomas Hellström [Wed, 11 Feb 2026 10:41:59 +0000 (11:41 +0100)] 
Revert "drm/pagemap: Disable device-to-device migration"

With commit
a69d1ab971a6 ("mm: Fix a hmm_range_fault() livelock / starvation problem")
device-to-device migration is not functional again and the
disabling can be reverted.

Add the above commit as a Fixes: tag in order for the revert to not
take place unless that commit is present.

This reverts commit 10dd1eaa80a56d3cf6d7c36b5269c8fed617f001.

Cc: Matthew Brost <matthew.brost@intel.com>
Fixes: b570f37a2ce4 ("mm: Fix a hmm_range_fault() livelock / starvation problem")
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260211104159.114947-1-thomas.hellstrom@linux.intel.com
(cherry picked from commit 1a3c0049b3f56278c9caf2784c53f6ab435fd12c)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Rodrigo updated Fixes tag]

5 weeks agoiomap: reject delalloc mappings during writeback
Darrick J. Wong [Mon, 2 Mar 2026 17:30:02 +0000 (09:30 -0800)] 
iomap: reject delalloc mappings during writeback

Filesystems should never provide a delayed allocation mapping to
writeback; they're supposed to allocate the space before replying.
This can lead to weird IO errors and crashes in the block layer if the
filesystem is being malicious, or if it hadn't set iomap->dev because
it's a delalloc mapping.

Fix this by failing writeback on delalloc mappings.  Currently no
filesystems actually misbehave in this manner, but we ought to be
stricter about things like that.

Cc: stable@vger.kernel.org # v5.5
Fixes: 598ecfbaa742ac ("iomap: lift the xfs writeback code to iomap")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Link: https://patch.msgid.link/20260302173002.GL13829@frogsfrogsfrogs
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
5 weeks agoio_uring/zcrx: use READ_ONCE with user shared RQEs
Pavel Begunkov [Wed, 4 Mar 2026 12:37:43 +0000 (12:37 +0000)] 
io_uring/zcrx: use READ_ONCE with user shared RQEs

Refill queue entries are shared with the user space, use READ_ONCE when
reading them.

Fixes: 34a3e60821ab9 ("io_uring/zcrx: implement zerocopy receive pp memory provider");
Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 weeks agodrm/i915/psr: Fix for Panel Replay X granularity DPCD register handling
Jouni Högander [Wed, 25 Feb 2026 07:42:21 +0000 (09:42 +0200)] 
drm/i915/psr: Fix for Panel Replay X granularity DPCD register handling

DP specification is saying value 0xff 0xff in PANEL REPLAY SELECTIVE UPDATE
X GRANULARITY CAPABILITY registers (0xb2 and 0xb3) means full-line
granularity. Take this into account when handling Panel Replay X
granularity informed by the panel.

Fixes: 1cc854647450 ("drm/i915/psr: Use SU granularity information available in intel_connector")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/7284
Tested-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Signed-off-by: Jouni Högander <jouni.hogander@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Link: https://patch.msgid.link/20260225074221.1744330-2-jouni.hogander@intel.com
(cherry picked from commit f5c8f824a495e849492f09a43bd965a8f4d86cb2)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
5 weeks agodrm/dp: Add definition for Panel Replay full-line granularity
Jouni Högander [Wed, 25 Feb 2026 07:42:20 +0000 (09:42 +0200)] 
drm/dp: Add definition for Panel Replay full-line granularity

DP specification is saying value 0xff 0xff in PANEL REPLAY SELECTIVE UPDATE
X GRANULARITY CAPABILITY registers (0xb2 and 0xb3) means full-line
granularity. Add definition for this.

Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Jouni Högander <jouni.hogander@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patch.msgid.link/20260225074221.1744330-1-jouni.hogander@intel.com
(cherry picked from commit b93311673263bb98a200ab1cb6304f969bdada5c)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
5 weeks agoMerge patch "iomap: don't mark folio uptodate if read IO has bytes pending"
Christian Brauner [Wed, 4 Mar 2026 13:19:05 +0000 (14:19 +0100)] 
Merge patch "iomap: don't mark folio uptodate if read IO has bytes pending"

Joanne Koong <joannelkoong@gmail.com> says:

This is a fix for this scenario:

->read_folio() gets called on a folio size that is 16k while the file is 4k:
  a) ifs->read_bytes_pending gets initialized to 16k
  b) ->read_folio_range() is called for the 4k read
  c) the 4k read succeeds, ifs->read_bytes_pending is now 12k and the
0 to 4k range is marked uptodate
  d) the post-eof blocks are zeroed and marked uptodate in the call to
iomap_set_range_uptodate()
  e) iomap_set_range_uptodate() sees all the ranges are marked
uptodate and it marks the folio uptodate
  f) iomap_read_end() gets called to subtract the 12k from
ifs->read_bytes_pending. it too sees all the ranges are marked
uptodate and marks the folio uptodate using XOR
  g) the XOR call clears the uptodate flag on the folio

The same situation can occur if the last range read for the folio is done as
an inline read and all the previous ranges have already completed by the time
the inline read completes.

For more context, the full discussion can be found in [1]. There was a
discussion about alternative approaches in that thread, but they had more
complications.

There is another discussion in v1 [2] about consolidating the read paths.
Until that is resolved, this patch fixes the issue.

[1] https://lore.kernel.org/linux-fsdevel/CAJnrk1Z9za5w4FoJqTGx50zR2haHHaoot1KJViQyEHJQq4=34w@mail.gmail.com/#t
[2] https://lore.kernel.org/linux-fsdevel/20260219003911.344478-1-joannelkoong@gmail.com/T/#u

* patches from https://patch.msgid.link/20260303233420.874231-1-joannelkoong@gmail.com:
  iomap: don't mark folio uptodate if read IO has bytes pending

Link: https://patch.msgid.link/20260303233420.874231-1-joannelkoong@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
5 weeks agoiomap: don't mark folio uptodate if read IO has bytes pending
Joanne Koong [Tue, 3 Mar 2026 23:34:20 +0000 (15:34 -0800)] 
iomap: don't mark folio uptodate if read IO has bytes pending

If a folio has ifs metadata attached to it and the folio is partially
read in through an async IO helper with the rest of it then being read
in through post-EOF zeroing or as inline data, and the helper
successfully finishes the read first, then post-EOF zeroing / reading
inline will mark the folio as uptodate in iomap_set_range_uptodate().

This is a problem because when the read completion path later calls
iomap_read_end(), it will call folio_end_read(), which sets the uptodate
bit using XOR semantics. Calling folio_end_read() on a folio that was
already marked uptodate clears the uptodate bit.

Fix this by not marking the folio as uptodate if the read IO has bytes
pending. The folio uptodate state will be set in the read completion
path through iomap_end_read() -> folio_end_read().

Reported-by: Wei Gao <wegao@suse.com>
Suggested-by: Sasha Levin <sashal@kernel.org>
Tested-by: Wei Gao <wegao@suse.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Cc: stable@vger.kernel.org # v6.19
Link: https://lore.kernel.org/linux-fsdevel/aYbmy8JdgXwsGaPP@autotest-wegao.qe.prg2.suse.org/
Fixes: b2f35ac4146d ("iomap: add caller-provided callbacks for read and readahead")
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://patch.msgid.link/20260303233420.874231-2-joannelkoong@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
5 weeks agotime/jiffies: Fix sysctl file error on configurations where USER_HZ < HZ
Gerd Rausch [Wed, 25 Feb 2026 23:37:49 +0000 (15:37 -0800)] 
time/jiffies: Fix sysctl file error on configurations where USER_HZ < HZ

Commit 2dc164a48e6fd ("sysctl: Create converter functions with two new
macros") incorrectly returns error to user space when jiffies sysctl
converter is used. The old overflow check got replaced with an
unconditional one:
     +    if (USER_HZ < HZ)
     +        return -EINVAL;
which will always be true on configurations with "USER_HZ < HZ".

Remove the check; it is no longer needed as clock_t_to_jiffies() returns
ULONG_MAX for the overflow case and proc_int_u2k_conv_uop() checks for
"> INT_MAX" after conversion

Fixes: 2dc164a48e6fd ("sysctl: Create converter functions with two new macros")
Reported-by: Colm Harrington <colm.harrington@oracle.com>
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
5 weeks agoi2c: i801: Revert "i2c: i801: replace acpi_lock with I2C bus lock"
Charles Haithcock [Sat, 28 Feb 2026 01:41:15 +0000 (18:41 -0700)] 
i2c: i801: Revert "i2c: i801: replace acpi_lock with I2C bus lock"

This reverts commit f707d6b9e7c18f669adfdb443906d46cfbaaa0c1.

Under rare circumstances, multiple udev threads can collect i801 device
info on boot and walk i801_acpi_io_handler somewhat concurrently. The
first will note the area is reserved by acpi to prevent further touches.
This ultimately causes the area to be deregistered. The second will
enter i801_acpi_io_handler after the area is unregistered but before a
check can be made that the area is unregistered. i2c_lock_bus relies on
the now unregistered area containing lock_ops to lock the bus. The end
result is a kernel panic on boot with the following backtrace;

[   14.971872] ioatdma 0000:09:00.2: enabling device (0100 -> 0102)
[   14.971873] BUG: kernel NULL pointer dereference, address: 0000000000000000
[   14.971880] #PF: supervisor read access in kernel mode
[   14.971884] #PF: error_code(0x0000) - not-present page
[   14.971887] PGD 0 P4D 0
[   14.971894] Oops: 0000 [#1] PREEMPT SMP PTI
[   14.971900] CPU: 5 PID: 956 Comm: systemd-udevd Not tainted 5.14.0-611.5.1.el9_7.x86_64 #1
[   14.971905] Hardware name: XXXXXXXXXXXXXXXXXXXXXXX BIOS 1.20.10.SV91 01/30/2023
[   14.971908] RIP: 0010:i801_acpi_io_handler+0x2d/0xb0 [i2c_i801]
[   14.971929] Code: 00 00 49 8b 40 20 41 57 41 56 4d 8b b8 30 04 00 00 49 89 ce 41 55 41 89 d5 41 54 49 89 f4 be 02 00 00 00 55 4c 89 c5 53 89 fb <48> 8b 00 4c 89 c7 e8 18 61 54 e9 80 bd 80 04 00 00 00 75 09 4c 3b
[   14.971933] RSP: 0018:ffffbaa841483838 EFLAGS: 00010282
[   14.971938] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9685e01ba568
[   14.971941] RDX: 0000000000000008 RSI: 0000000000000002 RDI: 0000000000000000
[   14.971944] RBP: ffff9685ca22f028 R08: ffff9685ca22f028 R09: ffff9685ca22f028
[   14.971948] R10: 000000000000000b R11: 0000000000000580 R12: 0000000000000580
[   14.971951] R13: 0000000000000008 R14: ffff9685e01ba568 R15: ffff9685c222f000
[   14.971954] FS:  00007f8287c0ab40(0000) GS:ffff96a47f940000(0000) knlGS:0000000000000000
[   14.971959] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   14.971963] CR2: 0000000000000000 CR3: 0000000168090001 CR4: 00000000003706f0
[   14.971966] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   14.971968] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   14.971972] Call Trace:
[   14.971977]  <TASK>
[   14.971981]  ? show_trace_log_lvl+0x1c4/0x2df
[   14.971994]  ? show_trace_log_lvl+0x1c4/0x2df
[   14.972003]  ? acpi_ev_address_space_dispatch+0x16e/0x3c0
[   14.972014]  ? __die_body.cold+0x8/0xd
[   14.972021]  ? page_fault_oops+0x132/0x170
[   14.972028]  ? exc_page_fault+0x61/0x150
[   14.972036]  ? asm_exc_page_fault+0x22/0x30
[   14.972045]  ? i801_acpi_io_handler+0x2d/0xb0 [i2c_i801]
[   14.972061]  acpi_ev_address_space_dispatch+0x16e/0x3c0
[   14.972069]  ? __pfx_i801_acpi_io_handler+0x10/0x10 [i2c_i801]
[   14.972085]  acpi_ex_access_region+0x5b/0xd0
[   14.972093]  acpi_ex_field_datum_io+0x73/0x2e0
[   14.972100]  acpi_ex_read_data_from_field+0x8e/0x230
[   14.972106]  acpi_ex_resolve_node_to_value+0x23d/0x310
[   14.972114]  acpi_ds_evaluate_name_path+0xad/0x110
[   14.972121]  acpi_ds_exec_end_op+0x321/0x510
[   14.972127]  acpi_ps_parse_loop+0xf7/0x680
[   14.972136]  acpi_ps_parse_aml+0x17a/0x3d0
[   14.972143]  acpi_ps_execute_method+0x137/0x270
[   14.972150]  acpi_ns_evaluate+0x1f4/0x2e0
[   14.972158]  acpi_evaluate_object+0x134/0x2f0
[   14.972164]  acpi_evaluate_integer+0x50/0xe0
[   14.972173]  ? vsnprintf+0x24b/0x570
[   14.972181]  acpi_ac_get_state.part.0+0x23/0x70
[   14.972189]  get_ac_property+0x4e/0x60
[   14.972195]  power_supply_show_property+0x90/0x1f0
[   14.972205]  add_prop_uevent+0x29/0x90
[   14.972213]  power_supply_uevent+0x109/0x1d0
[   14.972222]  dev_uevent+0x10e/0x2f0
[   14.972228]  uevent_show+0x8e/0x100
[   14.972236]  dev_attr_show+0x19/0x40
[   14.972246]  sysfs_kf_seq_show+0x9b/0x100
[   14.972253]  seq_read_iter+0x120/0x4b0
[   14.972262]  ? selinux_file_permission+0x106/0x150
[   14.972273]  vfs_read+0x24f/0x3a0
[   14.972284]  ksys_read+0x5f/0xe0
[   14.972291]  do_syscall_64+0x5f/0xe0
...

The kernel panic is mitigated by setting limiting the count of udev
children to 1. Revert to using the acpi_lock to continue protecting
marking the area as owned by firmware without relying on a lock in
a potentially unmapped region of memory.

Fixes: f707d6b9e7c1 ("i2c: i801: replace acpi_lock with I2C bus lock")
Signed-off-by: Charles Haithcock <chaithco@redhat.com>
[wsa: added Fixes-tag and updated comment stating the importance of the lock]
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
5 weeks agoASoC: amd: yc: Add DMI quirk for ASUS EXPERTBOOK PM1503CDA
Zhang Heng [Wed, 4 Mar 2026 06:32:55 +0000 (14:32 +0800)] 
ASoC: amd: yc: Add DMI quirk for ASUS EXPERTBOOK PM1503CDA

Add a DMI quirk for the ASUS EXPERTBOOK PM1503CDA fixing the
issue where the internal microphone was not detected.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=221070
Cc: stable@vger.kernel.org
Signed-off-by: Zhang Heng <zhangheng@kylinos.cn>
Link: https://patch.msgid.link/20260304063255.139331-1-zhangheng@kylinos.cn
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agoASoC: dt-bindings: renesas,rz-ssi: Document RZ/G3L SoC
Biju Das [Wed, 4 Mar 2026 07:19:55 +0000 (07:19 +0000)] 
ASoC: dt-bindings: renesas,rz-ssi: Document RZ/G3L SoC

Document RZ/G3L SSIF-2 bindings. The RZ/G3L SSIF-2 IP is identical to one
found on the RZ/G2L SoC.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Link: https://patch.msgid.link/20260304072000.6787-1-biju.das.jz@bp.renesas.com
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agox86/boot: Handle relative CONFIG_EFI_SBAT_FILE file paths
Jan Stancek [Wed, 25 Feb 2026 19:30:23 +0000 (20:30 +0100)] 
x86/boot: Handle relative CONFIG_EFI_SBAT_FILE file paths

CONFIG_EFI_SBAT_FILE can be a relative path. When compiling using a different
output directory (O=) the build currently fails because it can't find the
filename set in CONFIG_EFI_SBAT_FILE:

  arch/x86/boot/compressed/sbat.S: Assembler messages:
  arch/x86/boot/compressed/sbat.S:6: Error: file not found: kernel.sbat

Add $(srctree) as include dir for sbat.o.

  [ bp: Massage commit message. ]

Fixes: 61b57d35396a ("x86/efi: Implement support for embedding SBAT data for x86")
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: <stable@kernel.org>
Link: https://patch.msgid.link/f4eda155b0cef91d4d316b4e92f5771cb0aa7187.1772047658.git.jstancek@redhat.com
5 weeks agodrm/ttm/tests: Fix build failure on PREEMPT_RT
Maarten Lankhorst [Wed, 4 Mar 2026 08:56:16 +0000 (09:56 +0100)] 
drm/ttm/tests: Fix build failure on PREEMPT_RT

Fix a compile error in the kunit tests when CONFIG_PREEMPT_RT is
enabled, and the normal mutex is converted into a rtmutex.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202602261547.3bM6yVAS-lkp@intel.com/
Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
Link: https://patch.msgid.link/20260304085616.1216961-1-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
5 weeks agopmdomain: rockchip: Fix PD_VCODEC for RK3588
Shawn Lin [Wed, 25 Feb 2026 02:55:01 +0000 (10:55 +0800)] 
pmdomain: rockchip: Fix PD_VCODEC for RK3588

>From the RK3588 TRM Table 7-1 RK3588 Voltage Domain and Power Domain Summary,
PD_RKVDEC0/1 and PD_VENC0/1 rely on VD_VCODEC which require extra voltages to
be applied, otherwise it breaks RK3588-evb1-v10 board after vdec support landed[1].
The panic looks like below:

  rockchip-pm-domain fd8d8000.power-management:power-controller: failed to set domain 'rkvdec0' on, val=0
  rockchip-pm-domain fd8d8000.power-management:power-controller: failed to set domain 'rkvdec1' on, val=0
  ...
  Hardware name: Rockchip RK3588S EVB1 V10 Board (DT)
  Workqueue: pm genpd_power_off_work_fn
  Call trace:
  show_stack+0x18/0x24 (C)
  dump_stack_lvl+0x40/0x84
  dump_stack+0x18/0x24
  vpanic+0x1ec/0x4fc
  vpanic+0x0/0x4fc
  check_panic_on_warn+0x0/0x94
  arm64_serror_panic+0x6c/0x78
  do_serror+0xc4/0xcc
  el1h_64_error_handler+0x3c/0x5c
  el1h_64_error+0x6c/0x70
  regmap_mmio_read32le+0x18/0x24 (P)
  regmap_bus_reg_read+0xfc/0x130
  regmap_read+0x188/0x1ac
  regmap_read+0x54/0x78
  rockchip_pd_power+0xcc/0x5f0
  rockchip_pd_power_off+0x1c/0x4c
  genpd_power_off+0x84/0x120
  genpd_power_off+0x1b4/0x260
  genpd_power_off_work_fn+0x38/0x58
  process_scheduled_works+0x194/0x2c4
  worker_thread+0x2ac/0x3d8
  kthread+0x104/0x124
  ret_from_fork+0x10/0x20
  SMP: stopping secondary CPUs
  Kernel Offset: disabled
  CPU features: 0x3000000,000e0005,40230521,0400720b
  Memory Limit: none
  ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---

Chaoyi pointed out the PD_VCODEC is the parent of PD_RKVDEC0/1 and PD_VENC0/1, so checking
the PD_VCODEC is enough.

[1] https://lore.kernel.org/linux-rockchip/20251020212009.8852-2-detlev.casanova@collabora.com/

Fixes: db6df2e3fc16 ("pmdomain: rockchip: add regulator support")
Cc: stable@vger.kernel.org
Suggested-by: Chaoyi Chen <chaoyi.chen@rock-chips.com>
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Reviewed-by: Chaoyi Chen <chaoyi.chen@rock-chips.com>
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
5 weeks agomm/slab: change stride type from unsigned short to unsigned int
Harry Yoo [Tue, 3 Mar 2026 13:57:22 +0000 (22:57 +0900)] 
mm/slab: change stride type from unsigned short to unsigned int

Commit 7a8e71bc619d ("mm/slab: use stride to access slabobj_ext")
defined the type of slab->stride as unsigned short, because the author
initially planned to store stride within the lower 16 bits of the
page_type field, but later stored it in unused bits in the counters
field instead.

However, the idea of having only 2-byte stride turned out to be a
serious mistake. On systems with 64k pages, order-1 pages are 128k,
which is larger than USHRT_MAX. It triggers a debug warning because
s->size is 128k while stride, truncated to 2 bytes, becomes zero:

  ------------[ cut here ]------------
  Warning! stride (0) != s->size (131072)
  WARNING: mm/slub.c:2231 at alloc_slab_obj_exts_early.constprop.0+0x524/0x534, CPU#6: systemd-sysctl/307
  Modules linked in:
  CPU: 6 UID: 0 PID: 307 Comm: systemd-sysctl Not tainted 7.0.0-rc1+ #6 PREEMPTLAZY
  Hardware name: IBM,9009-22A POWER9 (architected) 0x4e0202 0xf000005 of:IBM,FW950.E0 (VL950_179) hv:phyp pSeries
  NIP:  c0000000008a9ac0 LR: c0000000008a9abc CTR: 0000000000000000
  REGS: c0000000141f7390 TRAP: 0700   Not tainted  (7.0.0-rc1+)
  MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28004400  XER: 00000005
  CFAR: c000000000279318 IRQMASK: 0
  GPR00: c0000000008a9abc c0000000141f7630 c00000000252a300 c00000001427b200
  GPR04: 0000000000000004 0000000000000000 c000000000278fd0 0000000000000000
  GPR08: fffffffffffe0000 0000000000000000 0000000000000000 0000000022004400
  GPR12: c000000000f644b0 c000000017ff8f00 0000000000000000 0000000000000000
  GPR16: 0000000000000000 c0000000141f7aa0 0000000000000000 c0000000141f7a88
  GPR20: 0000000000000000 0000000000400cc0 ffffffffffffffff c00000001427b180
  GPR24: 0000000000000004 00000000000c0cc0 c000000004e89a20 c00000005de90011
  GPR28: 0000000000010010 c00000005df00000 c000000006017f80 c00c000000177a00
  NIP [c0000000008a9ac0] alloc_slab_obj_exts_early.constprop.0+0x524/0x534
  LR [c0000000008a9abc] alloc_slab_obj_exts_early.constprop.0+0x520/0x534
  Call Trace:
  [c0000000141f7630] [c0000000008a9abc] alloc_slab_obj_exts_early.constprop.0+0x520/0x534 (unreliable)
  [c0000000141f76c0] [c0000000008aafbc] allocate_slab+0x154/0x94c
  [c0000000141f7760] [c0000000008b41c0] refill_objects+0x124/0x16c
  [c0000000141f77c0] [c0000000008b4be0] __pcs_replace_empty_main+0x2b0/0x444
  [c0000000141f7810] [c0000000008b9600] __kvmalloc_node_noprof+0x840/0x914
  [c0000000141f7900] [c000000000a3dd40] seq_read_iter+0x60c/0xb00
  [c0000000141f7a10] [c000000000b36b24] proc_reg_read_iter+0x154/0x1fc
  [c0000000141f7a50] [c0000000009cee7c] vfs_read+0x39c/0x4e4
  [c0000000141f7b30] [c0000000009d0214] ksys_read+0x9c/0x180
  [c0000000141f7b90] [c00000000003a8d0] system_call_exception+0x1e0/0x4b0
  [c0000000141f7e50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec

This leads to slab_obj_ext() returning the first slabobj_ext or all
objects and confuses the reference counting of object cgroups [1] and
memory (un)charging for memory cgroups [2].

Fortunately, the counters field has 32 unused bits instead of 16
on 64-bit CPUs, which is wide enough to hold any value of s->size.
Change the type to unsigned int.

Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Closes: https://lore.kernel.org/lkml/ca241daa-e7e7-4604-a48d-de91ec9184a5@linux.ibm.com [1]
Closes: https://lore.kernel.org/all/ddff7c7d-c0c3-4780-808f-9a83268bbf0c@linux.ibm.com [2]
Fixes: 7a8e71bc619d ("mm/slab: use stride to access slabobj_ext")
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20260303135722.2680521-1-harry.yoo@oracle.com
Reviewed-by: Hao Li <hao.li@linux.dev>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
5 weeks agomm/slab: allow sheaf refill if blocking is not allowed
Vlastimil Babka (SUSE) [Mon, 2 Mar 2026 09:55:37 +0000 (10:55 +0100)] 
mm/slab: allow sheaf refill if blocking is not allowed

Ming Lei reported [1] a regression in the ublk null target benchmark due
to sheaves. The profile shows that the alloc_from_pcs() fastpath fails
and allocations fall back to ___slab_alloc(). It also shows the
allocations happen through mempool_alloc().

The strategy of mempool_alloc() is to call the underlying allocator
(here slab) without __GFP_DIRECT_RECLAIM first. This does not play well
with __pcs_replace_empty_main() checking for gfpflags_allow_blocking()
to decide if it should refill an empty sheaf or fallback to the
slowpath, so we end up falling back.

We could change the mempool strategy but there might be other paths
doing the same ting. So instead allow sheaf refill when blocking is not
allowed, changing the condition to gfpflags_allow_spinning(). The
original condition was unnecessarily restrictive.

Note this doesn't fully resolve the regression [1] as another component
of that are memoryless nodes, which is to be addressed separately.

Reported-by: Ming Lei <ming.lei@redhat.com>
Fixes: e47c897a2949 ("slab: add sheaves to most caches")
Link: https://lore.kernel.org/all/aZ0SbIqaIkwoW2mB@fedora/
Link: https://patch.msgid.link/20260302095536.34062-2-vbabka@kernel.org
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
5 weeks agoata: libata: cancel pending work after clearing deferred_qc
Niklas Cassel [Tue, 3 Mar 2026 10:03:42 +0000 (11:03 +0100)] 
ata: libata: cancel pending work after clearing deferred_qc

Syzbot reported a WARN_ON() in ata_scsi_deferred_qc_work(), caused by
ap->ops->qc_defer() returning non-zero before issuing the deferred qc.

ata_scsi_schedule_deferred_qc() is called during each command completion.
This function will check if there is a deferred QC, and if
ap->ops->qc_defer() returns zero, meaning that it is possible to queue the
deferred qc at this time (without being deferred), then it will queue the
work which will issue the deferred qc.

Once the work get to run, which can potentially be a very long time after
the work was scheduled, there is a WARN_ON() if ap->ops->qc_defer() returns
non-zero.

While we hold the ap->lock both when assigning and clearing deferred_qc,
and the work itself holds the ap->lock, the code currently does not cancel
the work after clearing the deferred qc.

This means that the following scenario can happen:
1) One or several NCQ commands are queued.
2) A non-NCQ command is queued, gets stored in ap->deferred_qc.
3) Last NCQ command gets completed, work is queued to issue the deferred
   qc.
4) Timeout or error happens, ap->deferred_qc is cleared. The queued work is
   currently NOT canceled.
5) Port is reset.
6) One or several NCQ commands are queued.
7) A non-NCQ command is queued, gets stored in ap->deferred_qc.
8) Work is finally run. Yet at this time, there is still NCQ commands in
   flight.

The work in 8) really belongs to the non-NCQ command in 2), not to the
non-NCQ command in 7). The reason why the work is executed when it is not
supposed to, is because it was never canceled when ap->deferred_qc was
cleared in 4). Thus, ensure that we always cancel the work after clearing
ap->deferred_qc.

Another potential fix would have been to let ata_scsi_deferred_qc_work() do
nothing if ap->ops->qc_defer() returns non-zero. However, canceling the
work when clearing ap->deferred_qc seems slightly more logical, as we hold
the ap->lock when clearing ap->deferred_qc, so we know that the work cannot
be holding the lock. (The function could be waiting for the lock, but that
is okay since it will do nothing if ap->deferred_qc is not set.)

Reported-by: syzbot+bcaf842a1e8ead8dfb89@syzkaller.appspotmail.com
Fixes: 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation")
Fixes: eddb98ad9364 ("ata: libata-eh: correctly handle deferred qc timeouts")
Reviewed-by: Igor Pylypiv <ipylypiv@google.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
5 weeks agodrm/sched: Fix kernel-doc warning for drm_sched_job_done()
Yujie Liu [Fri, 27 Feb 2026 08:24:52 +0000 (16:24 +0800)] 
drm/sched: Fix kernel-doc warning for drm_sched_job_done()

There is a kernel-doc warning for the scheduler:

Warning: drivers/gpu/drm/scheduler/sched_main.c:367 function parameter 'result' not described in 'drm_sched_job_done'

Fix the warning by describing the undocumented error code.

Fixes: 539f9ee4b52a ("drm/scheduler: properly forward fence errors")
Signed-off-by: Yujie Liu <yujie.liu@intel.com>
[phasta: Flesh out commit message]
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://patch.msgid.link/20260227082452.1802922-1-yujie.liu@intel.com
5 weeks agoxfs: fix race between healthmon unmount and read_iter
Darrick J. Wong [Mon, 2 Mar 2026 17:31:58 +0000 (09:31 -0800)] 
xfs: fix race between healthmon unmount and read_iter

xfs/1879 on one of my test VMs got stuck due to the xfs_io healthmon
subcommand sleeping in wait_event_interruptible at:

 xfs_healthmon_read_iter+0x558/0x5f8 [xfs]
 vfs_read+0x248/0x320
 ksys_read+0x78/0x120

Looking at xfs_healthmon_read_iter, in !O_NONBLOCK mode it will sleep
until the mount cookie == DETACHED_MOUNT_COOKIE, there are events
waiting to be formatted, or there are formatted events in the read
buffer that could be copied to userspace.

Poking into the running kernel, I see that there are zero events in the
list, the read buffer is empty, and the mount cookie is indeed in
DETACHED state.  IOWs, xfs_healthmon_has_eventdata should have returned
true, but instead we're asleep waiting for a wakeup.

I think what happened here is that xfs_healthmon_read_iter and
xfs_healthmon_unmount were racing with each other, and _read_iter lost
the race.  _unmount queued an unmount event, which woke up _read_iter.
It found, formatted, and copied the event out to userspace.  That
cleared out the pending event list and emptied the read buffer.  xfs_io
then called read() again, so _has_eventdata decided that we should sleep
on the empty event queue.

Next, _unmount called xfs_healthmon_detach, which set the mount cookie
to DETACHED.  Unfortunately, it didn't call wake_up_all on the hm, so
the wait_event_interruptible in the _read_iter thread remains asleep.
That's why the test stalled.

Fix this by moving the wake_up_all call to xfs_healthmon_detach.

Fixes: b3a289a2a9397b ("xfs: create event queuing, formatting, and discovery infrastructure")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
5 weeks agoxfs: remove scratch field from struct xfs_gc_bio
Damien Le Moal [Wed, 25 Feb 2026 22:46:46 +0000 (07:46 +0900)] 
xfs: remove scratch field from struct xfs_gc_bio

The scratch field in struct xfs_gc_bio is unused. Remove it.

Fixes: 102f444b57b3 ("xfs: rework zone GC buffer management")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
5 weeks agoata: libata-core: Disable LPM on ST1000DM010-2EP102
Maximilian Pezzullo [Wed, 4 Mar 2026 07:22:59 +0000 (08:22 +0100)] 
ata: libata-core: Disable LPM on ST1000DM010-2EP102

According to a user report, the ST1000DM010-2EP102 has problems with LPM,
causing random system freezes. The drive belongs to the same BarraCuda
family as the ST2000DM008-2FR102 which has the same issue.

Cc: stable@vger.kernel.org
Fixes: 7627a0edef54 ("ata: ahci: Drop low power policy board type")
Reported-by: Filippo Baiamonte <filippo.ba03@bugzilla.kernel.org>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221163
Signed-off-by: Maximilian Pezzullo <maximilianpezzullo@gmail.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
5 weeks agopower: sequencing: pcie-m2: Fix device node reference leak in probe
Felix Gu [Mon, 2 Mar 2026 14:31:44 +0000 (22:31 +0800)] 
power: sequencing: pcie-m2: Fix device node reference leak in probe

In pwrseq_pcie_m2_probe(), ctx->of_node acquires an explicit reference
to the device node using of_node_get(), but there is no corresponding
of_node_put() in the driver's error handling paths or removal.

Since the ctx is tied to the lifecycle of the platform device, there
is no need to hold an additional reference to the device's own of_node.

Fixes: 52e7b5bd62ba ("power: sequencing: Add the Power Sequencing driver for the PCIe M.2 connectors")
Signed-off-by: Felix Gu <ustc.gu@gmail.com>
Link: https://patch.msgid.link/20260302-m2-v1-1-a6533e18aa69@gmail.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
5 weeks agoRDMA/ionic: Preserve and set Ethernet source MAC after ib_ud_header_init()
Abhijit Gangurde [Fri, 27 Feb 2026 06:18:09 +0000 (11:48 +0530)] 
RDMA/ionic: Preserve and set Ethernet source MAC after ib_ud_header_init()

ionic_build_hdr() populated the Ethernet source MAC (hdr->eth.smac_h) by
passing the header’s storage directly to rdma_read_gid_l2_fields().
However, ib_ud_header_init() is called after that and re-initializes the
UD header, which wipes the previously written smac_h. As a result, packets
are emitted with an zero source MAC address on the wire.

Correct the source MAC by reading the GID-derived smac into a temporary
buffer and copy it after ib_ud_header_init() completes.

Fixes: e8521822c733 ("RDMA/ionic: Register device ops for control path")
Cc: stable@vger.kernel.org # 6.18
Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com>
Link: https://patch.msgid.link/20260227061809.2979990-1-abhijit.gangurde@amd.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA/irdma: Fix double free related to rereg_user_mr
Jacob Moroni [Fri, 27 Feb 2026 15:27:43 +0000 (15:27 +0000)] 
RDMA/irdma: Fix double free related to rereg_user_mr

If IB_MR_REREG_TRANS is set during rereg_user_mr, the
umem will be released and a new one will be allocated
in irdma_rereg_mr_trans. If any step of irdma_rereg_mr_trans
fails after the new umem is allocated, it releases the umem,
but does not set iwmr->region to NULL. The problem is that
this failure is propagated to the user, who will then call
ibv_dereg_mr (as they should). Then, the dereg_mr path will
see a non-NULL umem and attempt to call ib_umem_release again.

Fix this by setting iwmr->region to NULL after ib_umem_release.

Fixed: 5ac388db27c4 ("RDMA/irdma: Add support to re-register a memory region")
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Link: https://patch.msgid.link/20260227152743.1183388-1-jmoroni@google.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoselftests: tc-testing: fix list_categories() crash on list type
Naveen Anandhan [Sat, 28 Feb 2026 07:47:35 +0000 (13:17 +0530)] 
selftests: tc-testing: fix list_categories() crash on list type

list_categories() builds a set directly from the 'category'
field of each test case. Since 'category' is a list,
set(map(...)) attempts to insert lists into a set, which
raises:

  TypeError: unhashable type: 'list'

Flatten category lists and collect unique category names
using set.update() instead.

Signed-off-by: Naveen Anandhan <mr.navi8680@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agopowerpc/crash: adjust the elfcorehdr size
Sourabh Jain [Fri, 27 Feb 2026 17:18:01 +0000 (22:48 +0530)] 
powerpc/crash: adjust the elfcorehdr size

With crash hotplug support enabled, additional memory is allocated to
the elfcorehdr kexec segment to accommodate resources added during
memory hotplug events. However, the kdump FDT is not updated with the
same size, which can result in elfcorehdr corruption in the kdump
kernel.

Update elf_headers_sz (the kimage member representing the size of the
elfcorehdr kexec segment) to reflect the total memory allocated for the
elfcorehdr segment instead of the elfcorehdr buffer size at the time of
kdump load. This allows of_kexec_alloc_and_setup_fdt() to reserve the
full elfcorehdr memory in the kdump FDT and prevents elfcorehdr
corruption.

Fixes: 849599b702ef8 ("powerpc/crash: add crash memory hotplug support")
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260227171801.2238847-1-sourabhjain@linux.ibm.com
5 weeks agopowerpc/kexec/core: use big-endian types for crash variables
Sourabh Jain [Wed, 24 Dec 2025 15:12:57 +0000 (20:42 +0530)] 
powerpc/kexec/core: use big-endian types for crash variables

Use explicit word-sized big-endian types for kexec and crash related
variables. This makes the endianness unambiguous and avoids type
mismatches that trigger sparse warnings.

The change addresses sparse warnings like below (seen on both 32-bit
and 64-bit builds):

CHECK   ../arch/powerpc/kexec/core.c
sparse:    expected unsigned int static [addressable] [toplevel] [usertype] crashk_base
sparse:    got restricted __be32 [usertype]
sparse: warning: incorrect type in assignment (different base types)
sparse:    expected unsigned int static [addressable] [toplevel] [usertype] crashk_size
sparse:    got restricted __be32 [usertype]
sparse: warning: incorrect type in assignment (different base types)
sparse:    expected unsigned long long static [addressable] [toplevel] mem_limit
sparse:    got restricted __be32 [usertype]
sparse: warning: incorrect type in assignment (different base types)
sparse:    expected unsigned int static [addressable] [toplevel] [usertype] kernel_end
sparse:    got restricted __be32 [usertype]

No functional change intended.

Fixes: ea961a828fe7 ("powerpc: Fix endian issues in kexec and crash dump code")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202512221405.VHPKPjnp-lkp@intel.com/
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20251224151257.28672-1-sourabhjain@linux.ibm.com
5 weeks agopowerpc/prom_init: Fixup missing #size-cells on PowerMac media-bay nodes
Rob Herring (Arm) [Wed, 29 Oct 2025 17:40:46 +0000 (12:40 -0500)] 
powerpc/prom_init: Fixup missing #size-cells on PowerMac media-bay nodes

Similar to other PowerMac mac-io devices, the media-bay node is missing the
"#size-cells" property.

Depends-on: commit 045b14ca5c36 ("of: WARN on deprecated #address-cells/#size-cells handling")
Reported-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20251029174047.1620073-1-robh@kernel.org
5 weeks agopowerpc: dts: fsl: Drop unused .dtsi files
Rob Herring (Arm) [Wed, 28 Jan 2026 14:02:20 +0000 (08:02 -0600)] 
powerpc: dts: fsl: Drop unused .dtsi files

These files are not included by anything and therefore don't get built or
tested.

There's also no upstream driver for the interlaken-lac stuff.

Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260128140222.1627203-1-robh@kernel.org
5 weeks agopowerpc/uaccess: Fix inline assembly for clang build on PPC32
Christophe Leroy (CS GROUP) [Tue, 3 Feb 2026 07:30:41 +0000 (08:30 +0100)] 
powerpc/uaccess: Fix inline assembly for clang build on PPC32

Test robot reports the following error with clang-16.0.6:

   In file included from kernel/rseq.c:75:
   include/linux/rseq_entry.h:141:3: error: invalid operand for instruction
                   unsafe_get_user(offset, &ucs->post_commit_offset, efault);
                   ^
   include/linux/uaccess.h:608:2: note: expanded from macro 'unsafe_get_user'
           arch_unsafe_get_user(x, ptr, local_label);      \
           ^
   arch/powerpc/include/asm/uaccess.h:518:2: note: expanded from macro 'arch_unsafe_get_user'
           __get_user_size_goto(__gu_val, __gu_addr, sizeof(*(p)), e); \
           ^
   arch/powerpc/include/asm/uaccess.h:284:2: note: expanded from macro '__get_user_size_goto'
           __get_user_size_allowed(x, ptr, size, __gus_retval);    \
           ^
   arch/powerpc/include/asm/uaccess.h:275:10: note: expanded from macro '__get_user_size_allowed'
           case 8: __get_user_asm2(x, (u64 __user *)ptr, retval);  break;  \
                   ^
   arch/powerpc/include/asm/uaccess.h:258:4: note: expanded from macro '__get_user_asm2'
                   "       li %1+1,0\n"                    \
                    ^
   <inline asm>:7:5: note: instantiated into assembly here
           li 31+1,0
              ^
   1 error generated.

On PPC32, for 64 bits vars a pair of registers is used. Usually the
lower register in the pair is the high part and the higher register is
the low part. GCC uses r3/r4 ... r11/r12 ... r14/r15 ... r30/r31

In older kernel code inline assembly was using %1 and %1+1 to represent
64 bits values. However here it looks like clang uses r31 as high part,
allthough r32 doesn't exist hence the error.

Allthoug %1+1 should work, most places now use %L1 instead of %1+1, so
let's do the same here.

With that change, the build doesn't fail anymore and a disassembly shows
clang uses r17/r18 and r31/r14 pair when GCC would have used r16/r17 and
r30/r31:

Disassembly of section .fixup:

00000000 <.fixup>:
   0: 38 a0 ff f2  li      r5,-14
   4: 3a 20 00 00  li      r17,0
   8: 3a 40 00 00  li      r18,0
   c: 48 00 00 00  b       c <.fixup+0xc>
c: R_PPC_REL24 .text+0xbc
  10: 38 a0 ff f2  li      r5,-14
  14: 3b e0 00 00  li      r31,0
  18: 39 c0 00 00  li      r14,0
  1c: 48 00 00 00  b       1c <.fixup+0x1c>
1c: R_PPC_REL24 .text+0x144

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202602021825.otcItxGi-lkp@intel.com/
Fixes: c20beffeec3c ("powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()")
Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Acked-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/8ca3a657a650e497a96bfe7acde2f637dadab344.1770103646.git.chleroy@kernel.org
5 weeks agopowerpc/e500: Always use 64 bits PTE
Christophe Leroy [Tue, 6 Jan 2026 17:40:19 +0000 (18:40 +0100)] 
powerpc/e500: Always use 64 bits PTE

Today there are two PTE formats for e500:
- The 64 bits format, used
 - On 64 bits kernel
 - On 32 bits kernel with 64 bits physical addresses
 - On 32 bits kernel with support of huge pages
- The 32 bits format, used in other cases

Maintaining two PTE formats means unnecessary maintenance burden
because every change needs to be implemented and tested for both
formats.

Remove the 32 bits PTE format. The memory usage increase due to
larger PTEs is minimal (approx. 0,1% of memory).

This also means that from now on huge pages are supported also
with 32 bits physical addresses.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/04a658209ea78dcc0f3dbde6b2c29cf1939adfe9.1767721208.git.chleroy@kernel.org
5 weeks agotracing: Fix WARN_ON in tracing_buffers_mmap_close
Qing Wang [Fri, 27 Feb 2026 02:58:42 +0000 (10:58 +0800)] 
tracing: Fix WARN_ON in tracing_buffers_mmap_close

When a process forks, the child process copies the parent's VMAs but the
user_mapped reference count is not incremented. As a result, when both the
parent and child processes exit, tracing_buffers_mmap_close() is called
twice. On the second call, user_mapped is already 0, causing the function to
return -ENODEV and triggering a WARN_ON.

Normally, this isn't an issue as the memory is mapped with VM_DONTCOPY set.
But this is only a hint, and the application can call
madvise(MADVISE_DOFORK) which resets the VM_DONTCOPY flag. When the
application does that, it can trigger this issue on fork.

Fix it by incrementing the user_mapped reference count without re-mapping
the pages in the VMA's open callback.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://patch.msgid.link/20260227025842.1085206-1-wangqing7171@gmail.com
Fixes: cf9f0f7c4c5bb ("tracing: Allow user-space mapping of the ring-buffer")
Reported-by: syzbot+3b5dd2030fe08afdf65d@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=3b5dd2030fe08afdf65d
Tested-by: syzbot+3b5dd2030fe08afdf65d@syzkaller.appspotmail.com
Signed-off-by: Qing Wang <wangqing7171@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
5 weeks agotracing: Disable preemption in the tracepoint callbacks handling filtered pids
Masami Hiramatsu (Google) [Wed, 4 Mar 2026 02:57:38 +0000 (21:57 -0500)] 
tracing: Disable preemption in the tracepoint callbacks handling filtered pids

Filtering PIDs for events triggered the following during selftests:

[37] event tracing - restricts events based on pid notrace filtering
[  155.874095]
[  155.874869] =============================
[  155.876037] WARNING: suspicious RCU usage
[  155.877287] 7.0.0-rc1-00004-g8cd473a19bc7 #7 Not tainted
[  155.879263] -----------------------------
[  155.882839] kernel/trace/trace_events.c:1057 suspicious rcu_dereference_check() usage!
[  155.889281]
[  155.889281] other info that might help us debug this:
[  155.889281]
[  155.894519]
[  155.894519] rcu_scheduler_active = 2, debug_locks = 1
[  155.898068] no locks held by ftracetest/4364.
[  155.900524]
[  155.900524] stack backtrace:
[  155.902645] CPU: 1 UID: 0 PID: 4364 Comm: ftracetest Not tainted 7.0.0-rc1-00004-g8cd473a19bc7 #7 PREEMPT(lazy)
[  155.902648] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
[  155.902651] Call Trace:
[  155.902655]  <TASK>
[  155.902659]  dump_stack_lvl+0x67/0x90
[  155.902665]  lockdep_rcu_suspicious+0x154/0x1a0
[  155.902672]  event_filter_pid_sched_process_fork+0x9a/0xd0
[  155.902678]  kernel_clone+0x367/0x3a0
[  155.902689]  __x64_sys_clone+0x116/0x140
[  155.902696]  do_syscall_64+0x158/0x460
[  155.902700]  ? entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  155.902702]  ? trace_irq_disable+0x1d/0xc0
[  155.902709]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  155.902711] RIP: 0033:0x4697c3
[  155.902716] Code: 1f 84 00 00 00 00 00 64 48 8b 04 25 10 00 00 00 45 31 c0 31 d2 31 f6 bf 11 00 20 01 4c 8d 90 d0 02 00 00 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 89 c2 85 c0 75 2c 64 48 8b 04 25 10 00 00
[  155.902718] RSP: 002b:00007ffc41150428 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[  155.902721] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004697c3
[  155.902722] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[  155.902724] RBP: 0000000000000000 R08: 0000000000000000 R09: 000000003fccf990
[  155.902725] R10: 000000003fccd690 R11: 0000000000000246 R12: 0000000000000001
[  155.902726] R13: 000000003fce8103 R14: 0000000000000001 R15: 0000000000000000
[  155.902733]  </TASK>
[  155.902747]

The tracepoint callbacks recently were changed to allow preemption. The
event PID filtering callbacks that were attached to the fork and exit
tracepoints expected preemption disabled in order to access the RCU
protected PID lists.

Add a guard(preempt)() to protect the references to the PID list.

Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260303215738.6ab275af@fedora
Fixes: a46023d5616e ("tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast")
Link: https://patch.msgid.link/20260303131706.96057f61a48a34c43ce1e396@kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
5 weeks agoftrace: Disable preemption in the tracepoint callbacks handling filtered pids
Steven Rostedt [Tue, 3 Mar 2026 02:35:46 +0000 (21:35 -0500)] 
ftrace: Disable preemption in the tracepoint callbacks handling filtered pids

When function trace PID filtering is enabled, the function tracer will
attach a callback to the fork tracepoint as well as the exit tracepoint
that will add the forked child PID to the PID filtering list as well as
remove the PID that is exiting.

Commit a46023d5616e ("tracing: Guard __DECLARE_TRACE() use of
__DO_TRACE_CALL() with SRCU-fast") removed the disabling of preemption
when calling tracepoint callbacks.

The callbacks used for the PID filtering accounting depended on preemption
being disabled, and now the trigger a "suspicious RCU usage" warning message.

Make them explicitly disable preemption.

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260302213546.156e3e4f@gandalf.local.home
Fixes: a46023d5616e ("tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
5 weeks agotracing: Fix syscall events activation by ensuring refcount hits zero
Huiwen He [Tue, 24 Feb 2026 02:35:44 +0000 (10:35 +0800)] 
tracing: Fix syscall events activation by ensuring refcount hits zero

When multiple syscall events are specified in the kernel command line
(e.g., trace_event=syscalls:sys_enter_openat,syscalls:sys_enter_close),
they are often not captured after boot, even though they appear enabled
in the tracing/set_event file.

The issue stems from how syscall events are initialized. Syscall
tracepoints require the global reference count (sys_tracepoint_refcount)
to transition from 0 to 1 to trigger the registration of the syscall
work (TIF_SYSCALL_TRACEPOINT) for tasks, including the init process (pid 1).

The current implementation of early_enable_events() with disable_first=true
used an interleaved sequence of "Disable A -> Enable A -> Disable B -> Enable B".
If multiple syscalls are enabled, the refcount never drops to zero,
preventing the 0->1 transition that triggers actual registration.

Fix this by splitting early_enable_events() into two distinct phases:
1. Disable all events specified in the buffer.
2. Enable all events specified in the buffer.

This ensures the refcount hits zero before re-enabling, allowing syscall
events to be properly activated during early boot.

The code is also refactored to use a helper function to avoid logic
duplication between the disable and enable phases.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260224023544.1250787-1-hehuiwen@kylinos.cn
Fixes: ce1039bd3a89 ("tracing: Fix enabling of syscall events on the command line")
Signed-off-by: Huiwen He <hehuiwen@kylinos.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
5 weeks agofgraph: Fix thresh_return nosleeptime double-adjust
Shengming Hu [Sat, 21 Feb 2026 03:33:14 +0000 (11:33 +0800)] 
fgraph: Fix thresh_return nosleeptime double-adjust

trace_graph_thresh_return() called handle_nosleeptime() and then delegated
to trace_graph_return(), which calls handle_nosleeptime() again. When
sleep-time accounting is disabled this double-adjusts calltime and can
produce bogus durations (including underflow).

Fix this by computing rettime once, applying handle_nosleeptime() only
once, using the adjusted calltime for threshold comparison, and writing
the return event directly via __trace_graph_return() when the threshold is
met.

Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260221113314048jE4VRwIyZEALiYByGK0My@zte.com.cn
Fixes: 3c9880f3ab52b ("ftrace: Use a running sleeptime instead of saving on shadow stack")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
5 weeks agofgraph: Fix thresh_return clear per-task notrace
Shengming Hu [Sat, 21 Feb 2026 03:30:07 +0000 (11:30 +0800)] 
fgraph: Fix thresh_return clear per-task notrace

When tracing_thresh is enabled, function graph tracing uses
trace_graph_thresh_return() as the return handler. Unlike
trace_graph_return(), it did not clear the per-task TRACE_GRAPH_NOTRACE
flag set by the entry handler for set_graph_notrace addresses. This could
leave the task permanently in "notrace" state and effectively disable
function graph tracing for that task.

Mirror trace_graph_return()'s per-task notrace handling by clearing
TRACE_GRAPH_NOTRACE and returning early when set.

Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260221113007819YgrZsMGABff4Rc-O_fZxL@zte.com.cn
Fixes: b84214890a9bc ("function_graph: Move graph notrace bit to shadow stack global var")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
5 weeks agosmb: client: Compare MACs in constant time
Eric Biggers [Wed, 18 Feb 2026 04:27:02 +0000 (20:27 -0800)] 
smb: client: Compare MACs in constant time

To prevent timing attacks, MAC comparisons need to be constant-time.
Replace the memcmp() with the correct function, crypto_memneq().

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
5 weeks agoio_uring/mock: Fix typo in help text
J. Neuschäfer [Wed, 4 Mar 2026 00:42:57 +0000 (01:42 +0100)] 
io_uring/mock: Fix typo in help text

Fix the spelling of "subsystem".

Signed-off-by: J. Neuschäfer <j.ne@posteo.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 weeks agonet/tcp-md5: Fix MAC comparison to be constant-time
Eric Biggers [Mon, 2 Mar 2026 20:34:09 +0000 (12:34 -0800)] 
net/tcp-md5: Fix MAC comparison to be constant-time

To prevent timing attacks, MACs need to be compared in constant
time.  Use the appropriate helper function for this.

Fixes: cfb6eeb4c860 ("[TCP]: MD5 Signature Option (RFC2385) support.")
Fixes: 658ddaaf6694 ("tcp: md5: RST: getting md5 key from listener")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Link: https://patch.msgid.link/20260302203409.13388-1-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMAINTAINERS: update the skge/sky2 maintainers
Stephen Hemminger [Mon, 2 Mar 2026 19:50:20 +0000 (11:50 -0800)] 
MAINTAINERS: update the skge/sky2 maintainers

Mark the skge and sky2 drivers as orphan.
I no longer have any Marvell/SysKonnect boards to test with and
mail to Mirko Lindner bounced because Marvell sold off that divsion.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Link: https://patch.msgid.link/20260302195120.187183-1-stephen@networkplumber.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoamd-xgbe: fix sleep while atomic on suspend/resume
Raju Rangoju [Mon, 2 Mar 2026 04:21:24 +0000 (09:51 +0530)] 
amd-xgbe: fix sleep while atomic on suspend/resume

The xgbe_powerdown() and xgbe_powerup() functions use spinlocks
(spin_lock_irqsave) while calling functions that may sleep:
- napi_disable() can sleep waiting for NAPI polling to complete
- flush_workqueue() can sleep waiting for pending work items

This causes a "BUG: scheduling while atomic" error during suspend/resume
cycles on systems using the AMD XGBE Ethernet controller.

The spinlock protection in these functions is unnecessary as these
functions are called from suspend/resume paths which are already serialized
by the PM core

Fix this by removing the spinlock. Since only code that takes this lock
is xgbe_powerdown() and xgbe_powerup(), remove it completely.

Fixes: c5aa9e3b8156 ("amd-xgbe: Initial AMD 10GbE platform driver")
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Link: https://patch.msgid.link/20260302042124.1386445-1-Raju.Rangoju@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonetconsole: fix sysdata_release_enabled_show checking wrong flag
Breno Leitao [Mon, 2 Mar 2026 11:40:46 +0000 (03:40 -0800)] 
netconsole: fix sysdata_release_enabled_show checking wrong flag

sysdata_release_enabled_show() checks SYSDATA_TASKNAME instead of
SYSDATA_RELEASE, causing the configfs release_enabled attribute to
reflect the taskname feature state rather than the release feature
state. This is a copy-paste error from the adjacent
sysdata_taskname_enabled_show() function.

The corresponding _store function already uses the correct
SYSDATA_RELEASE flag.

Fixes: 343f90227070 ("netconsole: implement configfs for release_enabled")
Signed-off-by: Breno Leitao <leitao@debian.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260302-sysdata_release_fix-v1-1-e5090f677c7c@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: ipv4: fix ARM64 alignment fault in multipath hash seed
Yung Chih Su [Mon, 2 Mar 2026 06:02:47 +0000 (14:02 +0800)] 
net: ipv4: fix ARM64 alignment fault in multipath hash seed

`struct sysctl_fib_multipath_hash_seed` contains two u32 fields
(user_seed and mp_seed), making it an 8-byte structure with a 4-byte
alignment requirement.

In `fib_multipath_hash_from_keys()`, the code evaluates the entire
struct atomically via `READ_ONCE()`:

    mp_seed = READ_ONCE(net->ipv4.sysctl_fib_multipath_hash_seed).mp_seed;

While this silently works on GCC by falling back to unaligned regular
loads which the ARM64 kernel tolerates, it causes a fatal kernel panic
when compiled with Clang and LTO enabled.

Commit e35123d83ee3 ("arm64: lto: Strengthen READ_ONCE() to acquire
when CONFIG_LTO=y") strengthens `READ_ONCE()` to use Load-Acquire
instructions (`ldar` / `ldapr`) to prevent compiler reordering bugs
under Clang LTO. Since the macro evaluates the full 8-byte struct,
Clang emits a 64-bit `ldar` instruction. ARM64 architecture strictly
requires `ldar` to be naturally aligned, thus executing it on a 4-byte
aligned address triggers a strict Alignment Fault (FSC = 0x21).

Fix the read side by moving the `READ_ONCE()` directly to the `u32`
member, which emits a safe 32-bit `ldar Wn`.

Furthermore, Eric Dumazet pointed out that `WRITE_ONCE()` on the entire
struct in `proc_fib_multipath_hash_set_seed()` is also flawed. Analysis
shows that Clang splits this 8-byte write into two separate 32-bit
`str` instructions. While this avoids an alignment fault, it destroys
atomicity and exposes a tear-write vulnerability. Fix this by
explicitly splitting the write into two 32-bit `WRITE_ONCE()`
operations.

Finally, add the missing `READ_ONCE()` when reading `user_seed` in
`proc_fib_multipath_hash_seed()` to ensure proper pairing and
concurrency safety.

Fixes: 4ee2a8cace3f ("net: ipv4: Add a sysctl to set multipath hash seed")
Signed-off-by: Yung Chih Su <yuuchihsu@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260302060247.7066-1-yuuchihsu@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet/tcp-ao: Fix MAC comparison to be constant-time
Eric Biggers [Mon, 2 Mar 2026 20:36:00 +0000 (12:36 -0800)] 
net/tcp-ao: Fix MAC comparison to be constant-time

To prevent timing attacks, MACs need to be compared in constant
time.  Use the appropriate helper function for this.

Fixes: 0a3a809089eb ("net/tcp: Verify inbound TCP-AO signed segments")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20260302203600.13561-1-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoipv6: fix NULL pointer deref in ip6_rt_get_dev_rcu()
Jakub Kicinski [Sun, 1 Mar 2026 19:45:48 +0000 (11:45 -0800)] 
ipv6: fix NULL pointer deref in ip6_rt_get_dev_rcu()

l3mdev_master_dev_rcu() can return NULL when the slave device is being
un-slaved from a VRF. All other callers deal with this, but we lost
the fallback to loopback in ip6_rt_pcpu_alloc() -> ip6_rt_get_dev_rcu()
with commit 4832c30d5458 ("net: ipv6: put host and anycast routes on
device with address").

  KASAN: null-ptr-deref in range [0x0000000000000108-0x000000000000010f]
  RIP: 0010:ip6_rt_pcpu_alloc (net/ipv6/route.c:1418)
  Call Trace:
   ip6_pol_route (net/ipv6/route.c:2318)
   fib6_rule_lookup (net/ipv6/fib6_rules.c:115)
   ip6_route_output_flags (net/ipv6/route.c:2607)
   vrf_process_v6_outbound (drivers/net/vrf.c:437)

I was tempted to rework the un-slaving code to clear the flag first
and insert synchronize_rcu() before we remove the upper. But looks like
the explicit fallback to loopback_dev is an established pattern.
And I guess avoiding the synchronize_rcu() is nice, too.

Fixes: 4832c30d5458 ("net: ipv6: put host and anycast routes on device with address")
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260301194548.927324-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agorust: str: make NullTerminatedFormatter public
Alexandre Courbot [Tue, 24 Feb 2026 02:25:34 +0000 (11:25 +0900)] 
rust: str: make NullTerminatedFormatter public

If `CONFIG_BLOCK` is disabled, the following warnings are displayed
during build:

  warning: struct `NullTerminatedFormatter` is never constructed
    --> ../rust/kernel/str.rs:667:19
      |
  667 | pub(crate) struct NullTerminatedFormatter<'a> {
      |                   ^^^^^^^^^^^^^^^^^^^^^^^
      |
      = note: `#[warn(dead_code)]` (part of `#[warn(unused)]`) on by default

  warning: associated function `new` is never used
    --> ../rust/kernel/str.rs:673:19
      |
  671 | impl<'a> NullTerminatedFormatter<'a> {
      | ------------------------------------ associated function in this implementation
  672 |     /// Create a new [`Self`] instance.
  673 |     pub(crate) fn new(buffer: &'a mut [u8]) -> Option<NullTerminatedFormatter<'a>> {

Fix them by making `NullTerminatedFormatter` public, as it could be
useful for drivers anyway.

Fixes: cdde7a1951ff ("rust: str: introduce `NullTerminatedFormatter`")
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260224-nullterminatedformatter-v1-1-5bef7b9b3d4c@nvidia.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
5 weeks agosmb/client: remove unused SMB311_posix_query_info()
ZhangGuoDong [Tue, 3 Mar 2026 15:13:13 +0000 (15:13 +0000)] 
smb/client: remove unused SMB311_posix_query_info()

It is currently unused, as now we are doing compounding instead
(see smb2_query_path_info()).

Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>
5 weeks agosmb/client: fix buffer size for smb311_posix_qinfo in SMB311_posix_query_info()
ZhangGuoDong [Tue, 3 Mar 2026 15:13:12 +0000 (15:13 +0000)] 
smb/client: fix buffer size for smb311_posix_qinfo in SMB311_posix_query_info()

SMB311_posix_query_info() is currently unused, but it may still be used in
some stable versions, so these changes are submitted as a separate patch.

Use `sizeof(struct smb311_posix_qinfo)` instead of sizeof its pointer,
so the allocated buffer matches the actual struct size.

Fixes: b1bc1874b885 ("smb311: Add support for SMB311 query info (non-compounded)")
Reported-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>
5 weeks agosmb/client: fix buffer size for smb311_posix_qinfo in smb2_compound_op()
ZhangGuoDong [Tue, 3 Mar 2026 15:13:11 +0000 (15:13 +0000)] 
smb/client: fix buffer size for smb311_posix_qinfo in smb2_compound_op()

Use `sizeof(struct smb311_posix_qinfo)` instead of sizeof its pointer,
so the allocated buffer matches the actual struct size.

Fixes: 6a5f6592a0b6 ("SMB311: Add support for query info using posix extensions (level 100)")
Reported-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>
5 weeks agobpf: Fix a UAF issue in bpf_trampoline_link_cgroup_shim
Lang Xu [Tue, 3 Mar 2026 09:52:17 +0000 (17:52 +0800)] 
bpf: Fix a UAF issue in bpf_trampoline_link_cgroup_shim

The root cause of this bug is that when 'bpf_link_put' reduces the
refcount of 'shim_link->link.link' to zero, the resource is considered
released but may still be referenced via 'tr->progs_hlist' in
'cgroup_shim_find'. The actual cleanup of 'tr->progs_hlist' in
'bpf_shim_tramp_link_release' is deferred. During this window, another
process can cause a use-after-free via 'bpf_trampoline_link_cgroup_shim'.

Based on Martin KaFai Lau's suggestions, I have created a simple patch.

To fix this:
   Add an atomic non-zero check in 'bpf_trampoline_link_cgroup_shim'.
   Only increment the refcount if it is not already zero.

Testing:
   I verified the fix by adding a delay in
   'bpf_shim_tramp_link_release' to make the bug easier to trigger:

static void bpf_shim_tramp_link_release(struct bpf_link *link)
{
/* ... */
if (!shim_link->trampoline)
return;

+ msleep(100);
WARN_ON_ONCE(bpf_trampoline_unlink_prog(&shim_link->link,
shim_link->trampoline, NULL));
bpf_trampoline_put(shim_link->trampoline);
}

Before the patch, running a PoC easily reproduced the crash(almost 100%)
with a call trace similar to KaiyanM's report.
After the patch, the bug no longer occurs even after millions of
iterations.

Fixes: 69fd337a975c ("bpf: per-cgroup lsm flavor")
Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
Closes: https://lore.kernel.org/bpf/3c4ebb0b.46ff8.19abab8abe2.Coremail.kaiyanm@hust.edu.cn/
Signed-off-by: Lang Xu <xulang@uniontech.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/279EEE1BA1DDB49D+20260303095217.34436-1-xulang@uniontech.com
5 weeks agoMerge tag 'cgroup-for-7.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 3 Mar 2026 22:25:18 +0000 (14:25 -0800)] 
Merge tag 'cgroup-for-7.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:

 - Fix circular locking dependency in cpuset partition code by
   deferring housekeeping_update() calls to a workqueue instead
   of calling them directly under cpus_read_lock

 - Fix null-ptr-deref in rebuild_sched_domains_cpuslocked() when
   generate_sched_domains() returns NULL due to kmalloc failure

 - Fix incorrect cpuset behavior for effective_xcpus in
   partition_xcpus_del() and cpuset_update_tasks_cpumask()
   in update_cpumasks_hier()

 - Fix race between task migration and cgroup iteration

* tag 'cgroup-for-7.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup/cpuset: fix null-ptr-deref in rebuild_sched_domains_cpuslocked
  cgroup/cpuset: Call housekeeping_update() without holding cpus_read_lock
  cgroup/cpuset: Defer housekeeping_update() calls from CPU hotplug to workqueue
  cgroup/cpuset: Move housekeeping_update()/rebuild_sched_domains() together
  kselftest/cgroup: Simplify test_cpuset_prs.sh by removing "S+" command
  cgroup/cpuset: Set isolated_cpus_updating only if isolated_cpus is changed
  cgroup/cpuset: Clarify exclusion rules for cpuset internal variables
  cgroup/cpuset: Fix incorrect use of cpuset_update_tasks_cpumask() in update_cpumasks_hier()
  cgroup/cpuset: Fix incorrect change to effective_xcpus in partition_xcpus_del()
  cgroup: fix race between task migration and iteration

5 weeks agoMerge tag 'sched_ext-for-7.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 3 Mar 2026 22:14:20 +0000 (14:14 -0800)] 
Merge tag 'sched_ext-for-7.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext fixes from Tejun Heo:

 - Fix starvation of scx_enable() under fair-class saturation by
   offloading the enable path to an RT kthread

 - Fix out-of-bounds access in idle mask initialization on systems with
   non-contiguous NUMA node IDs

 - Fix a preemption window during scheduler exit and a refcount
   underflow in cgroup init error path

 - Fix SCX_EFLAG_INITIALIZED being a no-op flag

 - Add READ_ONCE() annotations for KCSAN-clean lockless accesses and
   replace naked scx_root dereferences with container_of() in kobject
   callbacks

 - Tooling and selftest fixes: compilation issues with clang 17,
   strtoul() misuse, unused options cleanup, and Kconfig sync

* tag 'sched_ext-for-7.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: Fix starvation of scx_enable() under fair-class saturation
  sched_ext: Remove redundant css_put() in scx_cgroup_init()
  selftests/sched_ext: Fix peek_dsq.bpf.c compile error for clang 17
  selftests/sched_ext: Add -fms-extensions to bpf build flags
  tools/sched_ext: Add -fms-extensions to bpf build flags
  sched_ext: Use READ_ONCE() for plain reads of scx_watchdog_timeout
  sched_ext: Replace naked scx_root dereferences in kobject callbacks
  sched_ext: Use READ_ONCE() for the read side of dsq->nr update
  tools/sched_ext: fix strtoul() misuse in scx_hotplug_seq()
  sched_ext: Fix SCX_EFLAG_INITIALIZED being a no-op flag
  sched_ext: Fix out-of-bounds access in scx_idle_init_masks()
  sched_ext: Disable preemption between scx_claim_exit() and kicking helper work
  tools/sched_ext: Add Kconfig to sync with upstream
  tools/sched_ext: Sync README.md Kconfig with upstream scx
  selftests/sched_ext: Remove duplicated unistd.h include in rt_stall.c
  tools/sched_ext: scx_sdt: Remove unused '-f' option
  tools/sched_ext: scx_central: Remove unused '-p' option
  selftests/sched_ext: Fix unused-result warning for read()
  selftests/sched_ext: Abort test loop on signal

5 weeks agosched_ext: Fix starvation of scx_enable() under fair-class saturation
Tejun Heo [Tue, 3 Mar 2026 11:01:15 +0000 (01:01 -1000)] 
sched_ext: Fix starvation of scx_enable() under fair-class saturation

During scx_enable(), the READY -> ENABLED task switching loop changes the
calling thread's sched_class from fair to ext. Since fair has higher
priority than ext, saturating fair-class workloads can indefinitely starve
the enable thread, hanging the system. This was introduced when the enable
path switched from preempt_disable() to scx_bypass() which doesn't protect
against fair-class starvation. Note that the original preempt_disable()
protection wasn't complete either - in partial switch modes, the calling
thread could still be starved after preempt_enable() as it may have been
switched to ext class.

Fix it by offloading the enable body to a dedicated system-wide RT
(SCHED_FIFO) kthread which cannot be starved by either fair or ext class
tasks. scx_enable() lazily creates the kthread on first use and passes the
ops pointer through a struct scx_enable_cmd containing the kthread_work,
then synchronously waits for completion.

The workfn runs on a different kthread from sch->helper (which runs
disable_work), so it can safely flush disable_work on the error path
without deadlock.

Fixes: 8c2090c504e9 ("sched_ext: Initialize in bypass mode")
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Tejun Heo <tj@kernel.org>
5 weeks agoigc: Fix trigger of incorrect irq in igc_xsk_wakeup function
Vivek Behera [Tue, 20 Jan 2026 07:52:16 +0000 (08:52 +0100)] 
igc: Fix trigger of incorrect irq in igc_xsk_wakeup function

This patch addresses the issue where the igc_xsk_wakeup function
was triggering an incorrect IRQ for tx-0 when the i226 is configured
with only 2 combined queues or in an environment with 2 active CPU cores.
This prevented XDP Zero-copy send functionality in such split IRQ
configurations.

The fix implements the correct logic for extracting q_vectors saved
during rx and tx ring allocation and utilizes flags provided by the
ndo_xsk_wakeup API to trigger the appropriate IRQ.

Fixes: fc9df2a0b520 ("igc: Enable RX via AF_XDP zero-copy")
Fixes: 15fd021bc427 ("igc: Add Tx hardware timestamp request for AF_XDP zero-copy packet")
Signed-off-by: Vivek Behera <vivek.behera@siemens.com>
Reviewed-by: Jacob Keller <jacob.keller@intel.com>
Reviewed-by: Aleksandr loktinov <aleksandr.loktionov@intel.com>
Reviewed-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Reviewed-by: Song Yoong Siang <yoong.siang.song@intel.com>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 weeks agoigb: Fix trigger of incorrect irq in igb_xsk_wakeup
Vivek Behera [Thu, 22 Jan 2026 14:16:52 +0000 (15:16 +0100)] 
igb: Fix trigger of incorrect irq in igb_xsk_wakeup

The current implementation in the igb_xsk_wakeup expects
the Rx and Tx queues to share the same irq. This would lead
to triggering of incorrect irq in split irq configuration.
This patch addresses this issue which could impact environments
with 2 active cpu cores
or when the number of queues is reduced to 2 or less

cat /proc/interrupts | grep eno2
 167:          0          0          0          0 IR-PCI-MSIX-0000:08:00.0
 0-edge      eno2
 168:          0          0          0          0 IR-PCI-MSIX-0000:08:00.0
 1-edge      eno2-rx-0
 169:          0          0          0          0 IR-PCI-MSIX-0000:08:00.0
 2-edge      eno2-rx-1
 170:          0          0          0          0 IR-PCI-MSIX-0000:08:00.0
 3-edge      eno2-tx-0
 171:          0          0          0          0 IR-PCI-MSIX-0000:08:00.0
 4-edge      eno2-tx-1

Furthermore it uses the flags input argument to trigger either rx, tx or
both rx and tx irqs as specified in the ndo_xsk_wakeup api documentation

Fixes: 80f6ccf9f116 ("igb: Introduce XSK data structures and helpers")
Signed-off-by: Vivek Behera <vivek.behera@siemens.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Saritha Sanigani <sarithax.sanigani@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 weeks agoiavf: fix netdev->max_mtu to respect actual hardware limit
Kohei Enju [Tue, 10 Feb 2026 15:57:14 +0000 (15:57 +0000)] 
iavf: fix netdev->max_mtu to respect actual hardware limit

iavf sets LIBIE_MAX_MTU as netdev->max_mtu, ignoring vf_res->max_mtu
from PF [1]. This allows setting an MTU beyond the actual hardware
limit, causing TX queue timeouts [2].

Set correct netdev->max_mtu using vf_res->max_mtu from the PF.

Note that currently PF drivers such as ice/i40e set the frame size in
vf_res->max_mtu, not MTU. Convert vf_res->max_mtu to MTU before setting
netdev->max_mtu.

[1]
 # ip -j -d link show $DEV | jq '.[0].max_mtu'
 16356

[2]
 iavf 0000:00:05.0 enp0s5: NETDEV WATCHDOG: CPU: 1: transmit queue 0 timed out 5692 ms
 iavf 0000:00:05.0 enp0s5: NIC Link is Up Speed is 10 Gbps Full Duplex
 iavf 0000:00:05.0 enp0s5: NETDEV WATCHDOG: CPU: 6: transmit queue 3 timed out 5312 ms
 iavf 0000:00:05.0 enp0s5: NIC Link is Up Speed is 10 Gbps Full Duplex
 ...

Fixes: 5fa4caff59f2 ("iavf: switch to Page Pool")
Signed-off-by: Kohei Enju <kohei@enjuk.jp>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 weeks agolibie: don't unroll if fwlog isn't supported
Michal Swiatkowski [Wed, 11 Feb 2026 09:11:40 +0000 (10:11 +0100)] 
libie: don't unroll if fwlog isn't supported

The libie_fwlog_deinit() function can be called during driver unload
even when firmware logging was never properly initialized. This led to call
trace:

[  148.576156] Oops: Oops: 0000 [#1] SMP NOPTI
[  148.576167] CPU: 80 UID: 0 PID: 12843 Comm: rmmod Kdump: loaded Not tainted 6.17.0-rc7next-queue-3oct-01915-g06d79d51cf51 #1 PREEMPT(full)
[  148.576177] Hardware name: HPE ProLiant DL385 Gen10 Plus/ProLiant DL385 Gen10 Plus, BIOS A42 07/18/2020
[  148.576182] RIP: 0010:__dev_printk+0x16/0x70
[  148.576196] Code: 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 55 41 54 49 89 d4 55 48 89 fd 53 48 85 f6 74 3c <4c> 8b 6e 50 48 89 f3 4d 85 ed 75 03 4c 8b 2e 48 89 df e8 f3 27 98
[  148.576204] RSP: 0018:ffffd2fd7ea17a48 EFLAGS: 00010202
[  148.576211] RAX: ffffd2fd7ea17aa0 RBX: ffff8eb288ae2000 RCX: 0000000000000000
[  148.576217] RDX: ffffd2fd7ea17a70 RSI: 00000000000000c8 RDI: ffffffffb68d3d88
[  148.576222] RBP: ffffffffb68d3d88 R08: 0000000000000000 R09: 0000000000000000
[  148.576227] R10: 00000000000000c8 R11: ffff8eb2b1a49400 R12: ffffd2fd7ea17a70
[  148.576231] R13: ffff8eb3141fb000 R14: ffffffffc1215b48 R15: ffffffffc1215bd8
[  148.576236] FS:  00007f5666ba6740(0000) GS:ffff8eb2472b9000(0000) knlGS:0000000000000000
[  148.576242] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  148.576247] CR2: 0000000000000118 CR3: 000000011ad17000 CR4: 0000000000350ef0
[  148.576252] Call Trace:
[  148.576258]  <TASK>
[  148.576269]  _dev_warn+0x7c/0x96
[  148.576290]  libie_fwlog_deinit+0x112/0x117 [libie_fwlog]
[  148.576303]  ixgbe_remove+0x63/0x290 [ixgbe]
[  148.576342]  pci_device_remove+0x42/0xb0
[  148.576354]  device_release_driver_internal+0x19c/0x200
[  148.576365]  driver_detach+0x48/0x90
[  148.576372]  bus_remove_driver+0x6d/0xf0
[  148.576383]  pci_unregister_driver+0x2e/0xb0
[  148.576393]  ixgbe_exit_module+0x1c/0xd50 [ixgbe]
[  148.576430]  __do_sys_delete_module.isra.0+0x1bc/0x2e0
[  148.576446]  do_syscall_64+0x7f/0x980

It can be reproduced by trying to unload ixgbe driver in recovery mode.

Fix that by checking if fwlog is supported before doing unroll.

Fixes: 641585bc978e ("ixgbe: fwlog support for e610")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 weeks agoice: Fix memory leak in ice_set_ringparam()
Zilin Guan [Thu, 22 Jan 2026 03:26:44 +0000 (03:26 +0000)] 
ice: Fix memory leak in ice_set_ringparam()

In ice_set_ringparam, tx_rings and xdp_rings are allocated before
rx_rings. If the allocation of rx_rings fails, the code jumps to
the done label leaking both tx_rings and xdp_rings. Furthermore, if
the setup of an individual Rx ring fails during the loop, the code jumps
to the free_tx label which releases tx_rings but leaks xdp_rings.

Fix this by introducing a free_xdp label and updating the error paths to
ensure both xdp_rings and tx_rings are properly freed if rx_rings
allocation or setup fails.

Compile tested only. Issue found using a prototype static analysis tool
and code review.

Fixes: fcea6f3da546 ("ice: Add stats and ethtool support")
Fixes: efc2214b6047 ("ice: Add support for XDP")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 weeks agoice: fix retry for AQ command 0x06EE
Jakub Staniszewski [Tue, 13 Jan 2026 19:38:17 +0000 (20:38 +0100)] 
ice: fix retry for AQ command 0x06EE

Executing ethtool -m can fail reporting a netlink I/O error while firmware
link management holds the i2c bus used to communicate with the module.

According to Intel(R) Ethernet Controller E810 Datasheet Rev 2.8 [1]
Section 3.3.10.4 Read/Write SFF EEPROM (0x06EE)
request should to be retried upon receiving EBUSY from firmware.

Commit e9c9692c8a81 ("ice: Reimplement module reads used by ethtool")
implemented it only for part of ice_get_module_eeprom(), leaving all other
calls to ice_aq_sff_eeprom() vulnerable to returning early on getting
EBUSY without retrying.

Remove the retry loop from ice_get_module_eeprom() and add Admin Queue
(AQ) command with opcode 0x06EE to the list of commands that should be
retried on receiving EBUSY from firmware.

Cc: stable@vger.kernel.org
Fixes: e9c9692c8a81 ("ice: Reimplement module reads used by ethtool")
Signed-off-by: Jakub Staniszewski <jakub.staniszewski@linux.intel.com>
Co-developed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Signed-off-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://www.intel.com/content/www/us/en/content-details/613875/intel-ethernet-controller-e810-datasheet.html
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 weeks agoice: reintroduce retry mechanism for indirect AQ
Jakub Staniszewski [Tue, 13 Jan 2026 19:38:16 +0000 (20:38 +0100)] 
ice: reintroduce retry mechanism for indirect AQ

Add retry mechanism for indirect Admin Queue (AQ) commands. To do so we
need to keep the command buffer.

This technically reverts commit 43a630e37e25
("ice: remove unused buffer copy code in ice_sq_send_cmd_retry()"),
but combines it with a fix in the logic by using a kmemdup() call,
making it more robust and less likely to break in the future due to
programmer error.

Cc: Michal Schmidt <mschmidt@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 3056df93f7a8 ("ice: Re-send some AQ commands, as result of EBUSY AQ error")
Signed-off-by: Jakub Staniszewski <jakub.staniszewski@linux.intel.com>
Co-developed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Signed-off-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 weeks agoice: fix adding AQ LLDP filter for VF
Larysa Zaremba [Wed, 3 Dec 2025 13:29:48 +0000 (14:29 +0100)] 
ice: fix adding AQ LLDP filter for VF

The referenced commit came from a misunderstanding of the FW LLDP filter
AQ (Admin Queue) command due to the error in the internal documentation.
Contrary to the assumptions in the original commit, VFs can be added and
deleted from this filter without any problems. Introduced dev_info message
proved to be useful, so reverting the whole commit does not make sense.

Without this fix, trusted VFs do not receive LLDP traffic, if there is an
AQ LLDP filter on PF. When trusted VF attempts to add an LLDP multicast
MAC address, the following message can be seen in dmesg on host:

ice 0000:33:00.0: Failed to add Rx LLDP rule on VSI 20 error: -95

Revert checking VSI type when adding LLDP filter through AQ.

Fixes: 4d5a1c4e6d49 ("ice: do not add LLDP-specific filter if not necessary")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 weeks agocrypto: testmgr - Fix stale references to aes-generic
Eric Biggers [Mon, 2 Mar 2026 23:48:56 +0000 (15:48 -0800)] 
crypto: testmgr - Fix stale references to aes-generic

Due to commit a2484474272e ("crypto: aes - Replace aes-generic with
wrapper around lib"), the "aes-generic" driver name has been replaced
with "aes-lib".  Update a couple testmgr entries that were added
concurrently with this change.

Fixes: a22d48cbe558 ("crypto: testmgr - Add test vectors for authenc(hmac(sha224),cbc(aes))")
Fixes: 030218dedee2 ("crypto: testmgr - Add test vectors for authenc(hmac(sha384),cbc(aes))")
Acked-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Link: https://lore.kernel.org/r/20260302234856.30569-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
5 weeks agodrm/msm: Fix dma_free_attrs() buffer size
Thomas Fourier [Thu, 26 Feb 2026 09:57:11 +0000 (10:57 +0100)] 
drm/msm: Fix dma_free_attrs() buffer size

The gpummu->table buffer is alloc'd with size TABLE_SIZE + 32 in
a2xx_gpummu_new() but freed with size TABLE_SIZE in
a2xx_gpummu_destroy().

Change the free size to match the allocation.

Fixes: c2052a4e5c99 ("drm/msm: implement a2xx mmu")
Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/707340/
Message-ID: <20260226095714.12126-2-fourier.thomas@gmail.com>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
5 weeks agodrm/msm/a6xx: Fix the bogus protect error on X2-85
Akhil P Oommen [Wed, 25 Feb 2026 07:41:57 +0000 (13:11 +0530)] 
drm/msm/a6xx: Fix the bogus protect error on X2-85

Update the X2-85 gpu's register protect count configuration with the
correct count_max value to avoid blocking the entire MMIO region from the
UMD.

Protect configurations are a bit complicated on A8xx. There are 2 set of
protect registers with different counts: Global and Pipe-specific. The
last-span-unbound feature is available only on the Pipe-specific protect
registers. Due to this, we cannot use the BUILD_BUG sanity check for A8x
protect configurations, so remove the A840 entry from there.

Fixes: 01ff3bf27215 ("drm/msm/a8xx: Add support for Adreno X2-85 GPU")
Signed-off-by: Akhil P Oommen <akhilpo@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/706944/
Message-ID: <20260225-glymur-protect-fix-v1-1-0deddedf9277@oss.qualcomm.com>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
5 weeks agocpupower: Add intel_pstate turbo boost support for Intel platforms
Zhang Rui [Mon, 9 Feb 2026 03:24:41 +0000 (11:24 +0800)] 
cpupower: Add intel_pstate turbo boost support for Intel platforms

On modern Intel platforms, the intel_pstate driver is commonly used and
it provides turbo boost control via
/sys/devices/system/cpu/intel_pstate/no_turbo.

However, cpupower doesn't handle this. it
1. shows turbo boost as "active" blindly for Intel platforms
2. controls turbo boost functionality via the generic
   /sys/devices/system/cpu/cpufreq/boost sysfs interface only.

Enhance the cpupower tool to ensure the "--boost" command works
seamlessly on Intel platforms with intel_pstate driver running.

Without this patch,
   $ echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
   1
   $ sudo cpupower frequency-info --boost
   analyzing CPU 21:
     boost state support:
       Supported: yes
       Active: yes
   $ sudo cpupower set --boost 0
   Error setting turbo-boost
   $ sudo cpupower set --boost 1
   Error setting turbo-boost

With this patch,
   $ cat /sys/devices/system/cpu/intel_pstate/no_turbo
   0
   $ sudo cpupower set --boost 0
   $ sudo cpupower frequency-info --boost
   analyzing CPU 21:
     boost state support:
       Supported: yes
       Active: no
   $ cat /sys/devices/system/cpu/intel_pstate/no_turbo
   1
   $ sudo cpupower set --boost 1
   $ sudo cpupower frequency-info --boost
   analyzing CPU 28:
     boost state support:
       Supported: yes
       Active: yes
   $ cat /sys/devices/system/cpu/intel_pstate/no_turbo
   0

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
5 weeks agocpupower: Add support for setting EPP via systemd service
Jan Kiszka [Sat, 21 Feb 2026 06:21:55 +0000 (07:21 +0100)] 
cpupower: Add support for setting EPP via systemd service

Extend the systemd service so that it can be used for tuning the Energy
Performance Preference (EPP) as well. Available options can be read from
/sys/devices/system/cpu/cpufreq/policy0/energy_performance_available_preferences.
The desired one can then be set in cpupower-service.conf.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
5 weeks agocxl/port: Fix use after free of parent_port in cxl_detach_ep()
Alison Schofield [Thu, 26 Feb 2026 18:44:36 +0000 (10:44 -0800)] 
cxl/port: Fix use after free of parent_port in cxl_detach_ep()

cxl_detach_ep() is called during bottom-up removal when all CXL memory
devices beneath a switch port have been removed. For each port in the
hierarchy it locks both the port and its parent, removes the endpoint,
and if the port is now empty, marks it dead and unregisters the port
by calling delete_switch_port(). There are two places during this work
where the parent_port may be used after freeing:

First, a concurrent detach may have already processed a port by the
time a second worker finds it via bus_find_device(). Without pinning
parent_port, it may already be freed when we discover port->dead and
attempt to unlock the parent_port. In a production kernel that's a
silent memory corruption, with lock debug, it looks like this:

[]DEBUG_LOCKS_WARN_ON(__owner_task(owner) != get_current())
[]WARNING: kernel/locking/mutex.c:949 at __mutex_unlock_slowpath+0x1ee/0x310
[]Call Trace:
[]mutex_unlock+0xd/0x20
[]cxl_detach_ep+0x180/0x400 [cxl_core]
[]devm_action_release+0x10/0x20
[]devres_release_all+0xa8/0xe0
[]device_unbind_cleanup+0xd/0xa0
[]really_probe+0x1a6/0x3e0

Second, delete_switch_port() releases three devm actions registered
against parent_port. The last of those is unregister_port() and it
calls device_unregister() on the child port, which can cascade. If
parent_port is now also empty the device core may unregister and free
it too. So by the time delete_switch_port() returns, parent_port may
be free, and the subsequent device_unlock(&parent_port->dev) operates
on freed memory. The kernel log looks same as above, with a different
offset in cxl_detach_ep().

Both of these issues stem from the absence of a lifetime guarantee
between a child port and its parent port.

Establish a lifetime rule for ports: child ports hold a reference to
their parent device until release. Take the reference when the port
is allocated and drop it when released. This ensures the parent is
valid for the full lifetime of the child and eliminates the use after
free window in cxl_detach_ep().

This is easily reproduced with a reload of cxl_acpi in QEMU with CXL
devices present.

Fixes: 2345df54249c ("cxl/memdev: Fix endpoint port removal")
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Li Ming <ming.li@zohomail.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260226184439.1732841-1-alison.schofield@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
5 weeks agoMerge tag 'for-7.0-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Tue, 3 Mar 2026 17:08:00 +0000 (09:08 -0800)] 
Merge tag 'for-7.0-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "One-liner or short fixes for minor/moderate problems reported recently:

   - fixes or level adjustments of error messages

   - fix leaked transaction handles after aborted transactions, when
     using the remap tree feature

   - fix a few leaked chunk maps after errors

   - fix leaked page array in io_uring encoded read if an error occurs
     and the 'finished' is not called

   - fix double release of reserved extents when doing a range COW

   - don't commit super block when the filesystem is in shutdown state

   - fix squota accounting condition when checking members vs parent
     usage

   - other error handling fixes"

* tag 'for-7.0-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: check block group lookup in remove_range_from_remap_tree()
  btrfs: fix transaction handle leaks in btrfs_last_identity_remap_gone()
  btrfs: fix chunk map leak in btrfs_map_block() after btrfs_translate_remap()
  btrfs: fix chunk map leak in btrfs_map_block() after btrfs_chunk_map_num_copies()
  btrfs: fix compat mask in error messages in btrfs_check_features()
  btrfs: print correct subvol num if active swapfile prevents deletion
  btrfs: fix warning in scrub_verify_one_metadata()
  btrfs: fix objectid value in error message in check_extent_data_ref()
  btrfs: fix incorrect key offset in error message in check_dev_extent_item()
  btrfs: fix error message order of parameters in btrfs_delete_delayed_dir_index()
  btrfs: don't commit the super block when unmounting a shutdown filesystem
  btrfs: free pages on error in btrfs_uring_read_extent()
  btrfs: fix referenced/exclusive check in squota_check_parent_usage()
  btrfs: remove pointless WARN_ON() in cache_save_setup()
  btrfs: convert log messages to error level in btrfs_replay_log()
  btrfs: remove btrfs_handle_fs_error() after failure to recover log trees
  btrfs: remove redundant warning message in btrfs_check_uuid_tree()
  btrfs: change warning messages to error level in open_ctree()
  btrfs: fix a double release on reserved extents in cow_one_range()
  btrfs: handle discard errors in in btrfs_finish_extent_commit()

5 weeks agosparc/PCI: Initialize msi_addr_mask for OF-created PCI devices
Nilay Shroff [Fri, 20 Feb 2026 07:02:28 +0000 (12:32 +0530)] 
sparc/PCI: Initialize msi_addr_mask for OF-created PCI devices

Recent changes replaced the use of no_64bit_msi with msi_addr_mask, which
is now expected to be initialized to DMA_BIT_MASK(64) during PCI device
setup. On SPARC systems, this initialization was inadvertently missed for
devices instantiated from device tree nodes, leaving msi_addr_mask unset
for OF-created pci_dev instances. As a result, MSI address validation fails
during probe, causing affected devices to fail initialization.

Initialize pdev->msi_addr_mask to DMA_BIT_MASK(64) in of_create_pci_dev()
so that MSI address validation succeeds and PCI device probing works as
expected.

Fixes: 386ced19e9a3 ("PCI/MSI: Convert the boolean no_64bit_msi flag to a DMA address mask")
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Han Gao <gaohan@iscas.ac.cn> # SPARC Enterprise T5220
Tested-by: Nathaniel Roach <nroach44@nroach44.id.au> # SPARC T5-2
Reviewed-by: Vivian Wang <wangruikang@iscas.ac.cn>
Link: https://patch.msgid.link/20260220070239.1693303-3-nilay@linux.ibm.com
5 weeks agopowerpc/pci: Initialize msi_addr_mask for OF-created PCI devices
Nilay Shroff [Fri, 20 Feb 2026 07:02:27 +0000 (12:32 +0530)] 
powerpc/pci: Initialize msi_addr_mask for OF-created PCI devices

Recent changes replaced the use of no_64bit_msi with msi_addr_mask.  As a
result, msi_addr_mask is now expected to be initialized to DMA_BIT_MASK(64)
when a pci_dev is set up. However, this initialization was missed on
powerpc due to differences in the device initialization path compared to
other (x86) architecture. Due to this, now PCI device probe method fails on
powerpc system.

On powerpc systems, struct pci_dev instances are created from device tree
nodes via of_create_pci_dev(). Because msi_addr_mask was not initialized
there, it remained zero. Later, during MSI setup, msi_verify_entries()
validates the programmed MSI address against pdev->msi_addr_mask. Since the
mask was not set correctly, the validation fails, causing PCI driver probe
failures for devices on powerpc systems.

Initialize pdev->msi_addr_mask to DMA_BIT_MASK(64) in of_create_pci_dev()
so that MSI address validation succeeds and device probe works as expected.

Fixes: 386ced19e9a3 ("PCI/MSI: Convert the boolean no_64bit_msi flag to a DMA address mask")
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Tested-by: Nam Cao <namcao@linutronix.de>
Reviewed-by: Nam Cao <namcao@linutronix.de>
Reviewed-by: Vivian Wang <wangruikang@iscas.ac.cn>
Acked-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260220070239.1693303-2-nilay@linux.ibm.com
5 weeks agosched_ext: Remove redundant css_put() in scx_cgroup_init()
Cheng-Yang Chou [Tue, 3 Mar 2026 14:35:30 +0000 (22:35 +0800)] 
sched_ext: Remove redundant css_put() in scx_cgroup_init()

The iterator css_for_each_descendant_pre() walks the cgroup hierarchy
under cgroup_lock(). It does not increment the reference counts on
yielded css structs.

According to the cgroup documentation, css_put() should only be used
to release a reference obtained via css_get() or css_tryget_online().
Since the iterator does not use either of these to acquire a reference,
calling css_put() in the error path of scx_cgroup_init() causes a
refcount underflow.

Remove the unbalanced css_put() to prevent a potential Use-After-Free
(UAF) vulnerability.

Fixes: 819513666966 ("sched_ext: Add cgroup support")
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
5 weeks agobtrfs: remove duplicated definition of btrfs_printk_in_rcu()
Filipe Manana [Fri, 27 Feb 2026 12:09:47 +0000 (12:09 +0000)] 
btrfs: remove duplicated definition of btrfs_printk_in_rcu()

It's defined twice in a row for the !CONFIG_PRINTK case, so remove one
of the duplicates.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
5 weeks agobtrfs: remove unnecessary transaction abort in the received subvol ioctl
Filipe Manana [Fri, 27 Feb 2026 00:05:08 +0000 (00:05 +0000)] 
btrfs: remove unnecessary transaction abort in the received subvol ioctl

If we fail to remove an item from the uuid tree, we don't need to abort
the transaction since we have not done any change before. So remove that
transaction abort.

Reviewed-by: Anand Jain <asj@kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
5 weeks agobtrfs: abort transaction on failure to update root in the received subvol ioctl
Filipe Manana [Fri, 27 Feb 2026 00:02:33 +0000 (00:02 +0000)] 
btrfs: abort transaction on failure to update root in the received subvol ioctl

If we failed to update the root we don't abort the transaction, which is
wrong since we already used the transaction to remove an item from the
uuid tree.

Fixes: dd5f9615fc5c ("Btrfs: maintain subvolume items in the UUID tree")
CC: stable@vger.kernel.org # 3.12+
Reviewed-by: Anand Jain <asj@kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
5 weeks agobtrfs: fix transaction abort on set received ioctl due to item overflow
Filipe Manana [Thu, 26 Feb 2026 23:41:07 +0000 (23:41 +0000)] 
btrfs: fix transaction abort on set received ioctl due to item overflow

If the set received ioctl fails due to an item overflow when attempting to
add the BTRFS_UUID_KEY_RECEIVED_SUBVOL we have to abort the transaction
since we did some metadata updates before.

This means that if a user calls this ioctl with the same received UUID
field for a lot of subvolumes, we will hit the overflow, trigger the
transaction abort and turn the filesystem into RO mode. A malicious user
could exploit this, and this ioctl does not even requires that a user
has admin privileges (CAP_SYS_ADMIN), only that he/she owns the subvolume.

Fix this by doing an early check for item overflow before starting a
transaction. This is also race safe because we are holding the subvol_sem
semaphore in exclusive (write) mode.

A test case for fstests will follow soon.

Fixes: dd5f9615fc5c ("Btrfs: maintain subvolume items in the UUID tree")
CC: stable@vger.kernel.org # 3.12+
Reviewed-by: Anand Jain <asj@kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
5 weeks agobtrfs: fix transaction abort when snapshotting received subvolumes
Filipe Manana [Mon, 23 Feb 2026 16:19:31 +0000 (16:19 +0000)] 
btrfs: fix transaction abort when snapshotting received subvolumes

Currently a user can trigger a transaction abort by snapshotting a
previously received snapshot a bunch of times until we reach a
BTRFS_UUID_KEY_RECEIVED_SUBVOL item overflow (the maximum item size we
can store in a leaf). This is very likely not common in practice, but
if it happens, it turns the filesystem into RO mode. The snapshot, send
and set_received_subvol and subvol_setflags (used by receive) don't
require CAP_SYS_ADMIN, just inode_owner_or_capable(). A malicious user
could use this to turn a filesystem into RO mode and disrupt a system.

Reproducer script:

  $ cat test.sh
  #!/bin/bash

  DEV=/dev/sdi
  MNT=/mnt/sdi

  # Use smallest node size to make the test faster.
  mkfs.btrfs -f --nodesize 4K $DEV
  mount $DEV $MNT

  # Create a subvolume and set it to RO so that it can be used for send.
  btrfs subvolume create $MNT/sv
  touch $MNT/sv/foo
  btrfs property set $MNT/sv ro true

  # Send and receive the subvolume into snaps/sv.
  mkdir $MNT/snaps
  btrfs send $MNT/sv | btrfs receive $MNT/snaps

  # Now snapshot the received subvolume, which has a received_uuid, a
  # lot of times to trigger the leaf overflow.
  total=500
  for ((i = 1; i <= $total; i++)); do
      echo -ne "\rCreating snapshot $i/$total"
      btrfs subvolume snapshot -r $MNT/snaps/sv $MNT/snaps/sv_$i > /dev/null
  done
  echo

  umount $MNT

When running the test:

  $ ./test.sh
  (...)
  Create subvolume '/mnt/sdi/sv'
  At subvol /mnt/sdi/sv
  At subvol sv
  Creating snapshot 496/500ERROR: Could not create subvolume: Value too large for defined data type
  Creating snapshot 497/500ERROR: Could not create subvolume: Read-only file system
  Creating snapshot 498/500ERROR: Could not create subvolume: Read-only file system
  Creating snapshot 499/500ERROR: Could not create subvolume: Read-only file system
  Creating snapshot 500/500ERROR: Could not create subvolume: Read-only file system

And in dmesg/syslog:

  $ dmesg
  (...)
  [251067.627338] BTRFS warning (device sdi): insert uuid item failed -75 (0x4628b21c4ac8d898, 0x2598bee2b1515c91) type 252!
  [251067.629212] ------------[ cut here ]------------
  [251067.630033] BTRFS: Transaction aborted (error -75)
  [251067.630871] WARNING: fs/btrfs/transaction.c:1907 at create_pending_snapshot.cold+0x52/0x465 [btrfs], CPU#10: btrfs/615235
  [251067.632851] Modules linked in: btrfs dm_zero (...)
  [251067.644071] CPU: 10 UID: 0 PID: 615235 Comm: btrfs Tainted: G        W           6.19.0-rc8-btrfs-next-225+ #1 PREEMPT(full)
  [251067.646165] Tainted: [W]=WARN
  [251067.646733] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
  [251067.648735] RIP: 0010:create_pending_snapshot.cold+0x55/0x465 [btrfs]
  [251067.649984] Code: f0 48 0f (...)
  [251067.653313] RSP: 0018:ffffce644908fae8 EFLAGS: 00010292
  [251067.653987] RAX: 00000000ffffff01 RBX: ffff8e5639e63a80 RCX: 00000000ffffffd3
  [251067.655042] RDX: ffff8e53faa76b00 RSI: 00000000ffffffb5 RDI: ffffffffc0919750
  [251067.656077] RBP: ffffce644908fbd8 R08: 0000000000000000 R09: ffffce644908f820
  [251067.657068] R10: ffff8e5adc1fffa8 R11: 0000000000000003 R12: ffff8e53c0431bd0
  [251067.658050] R13: ffff8e5414593600 R14: ffff8e55efafd000 R15: 00000000ffffffb5
  [251067.659019] FS:  00007f2a4944b3c0(0000) GS:ffff8e5b27dae000(0000) knlGS:0000000000000000
  [251067.660115] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [251067.660943] CR2: 00007ffc5aa57898 CR3: 00000005813a2003 CR4: 0000000000370ef0
  [251067.661972] Call Trace:
  [251067.662292]  <TASK>
  [251067.662653]  create_pending_snapshots+0x97/0xc0 [btrfs]
  [251067.663413]  btrfs_commit_transaction+0x26e/0xc00 [btrfs]
  [251067.664257]  ? btrfs_qgroup_convert_reserved_meta+0x35/0x390 [btrfs]
  [251067.665238]  ? _raw_spin_unlock+0x15/0x30
  [251067.665837]  ? record_root_in_trans+0xa2/0xd0 [btrfs]
  [251067.666531]  btrfs_mksubvol+0x330/0x580 [btrfs]
  [251067.667145]  btrfs_mksnapshot+0x74/0xa0 [btrfs]
  [251067.667827]  __btrfs_ioctl_snap_create+0x194/0x1d0 [btrfs]
  [251067.668595]  btrfs_ioctl_snap_create_v2+0x107/0x130 [btrfs]
  [251067.669479]  btrfs_ioctl+0x1580/0x2690 [btrfs]
  [251067.670093]  ? count_memcg_events+0x6d/0x180
  [251067.670849]  ? handle_mm_fault+0x1a0/0x2a0
  [251067.671652]  __x64_sys_ioctl+0x92/0xe0
  [251067.672406]  do_syscall_64+0x50/0xf20
  [251067.673129]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
  [251067.674096] RIP: 0033:0x7f2a495648db
  [251067.674812] Code: 00 48 89 (...)
  [251067.678227] RSP: 002b:00007ffc5aa57840 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  [251067.679691] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f2a495648db
  [251067.681145] RDX: 00007ffc5aa588b0 RSI: 0000000050009417 RDI: 0000000000000004
  [251067.682511] RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
  [251067.683842] R10: 000000000000000a R11: 0000000000000246 R12: 00007ffc5aa59910
  [251067.685176] R13: 00007ffc5aa588b0 R14: 0000000000000004 R15: 0000000000000006
  [251067.686524]  </TASK>
  [251067.686972] ---[ end trace 0000000000000000 ]---
  [251067.687890] BTRFS: error (device sdi state A) in create_pending_snapshot:1907: errno=-75 unknown
  [251067.689049] BTRFS info (device sdi state EA): forced readonly
  [251067.689054] BTRFS warning (device sdi state EA): Skipping commit of aborted transaction.
  [251067.690119] BTRFS: error (device sdi state EA) in cleanup_transaction:2043: errno=-75 unknown
  [251067.702028] BTRFS info (device sdi state EA): last unmount of filesystem 46dc3975-30a2-4a69-a18f-418b859cccda

Fix this by ignoring -EOVERFLOW errors from btrfs_uuid_tree_add() in the
snapshot creation code when attempting to add the
BTRFS_UUID_KEY_RECEIVED_SUBVOL item. This is OK because it's not critical
and we are still able to delete the snapshot, as snapshot/subvolume
deletion ignores if a BTRFS_UUID_KEY_RECEIVED_SUBVOL is missing (see
inode.c:btrfs_delete_subvolume()). As for send/receive, we can still do
send/receive operations since it always peeks the first root ID in the
existing BTRFS_UUID_KEY_RECEIVED_SUBVOL (it could peek any since all
snapshots have the same content), and even if the key is missing, it
falls back to searching by BTRFS_UUID_KEY_SUBVOL key.

A test case for fstests will be sent soon.

Fixes: dd5f9615fc5c ("Btrfs: maintain subvolume items in the UUID tree")
CC: stable@vger.kernel.org # 3.12+
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
5 weeks agobtrfs: fix transaction abort on file creation due to name hash collision
Filipe Manana [Thu, 26 Feb 2026 11:05:43 +0000 (11:05 +0000)] 
btrfs: fix transaction abort on file creation due to name hash collision

If we attempt to create several files with names that result in the same
hash, we have to pack them in same dir item and that has a limit inherent
to the leaf size. However if we reach that limit, we trigger a transaction
abort and turns the filesystem into RO mode. This allows for a malicious
user to disrupt a system, without the need to have administration
privileges/capabilities.

Reproducer:

  $ cat exploit-hash-collisions.sh
  #!/bin/bash

  DEV=/dev/sdi
  MNT=/mnt/sdi

  # Use smallest node size to make the test faster and require fewer file
  # names that result in hash collision.
  mkfs.btrfs -f --nodesize 4K $DEV
  mount $DEV $MNT

  # List of names that result in the same crc32c hash for btrfs.
  declare -a names=(
   'foobar'
   '%a8tYkxfGMLWRGr55QSeQc4PBNH9PCLIvR6jZnkDtUUru1t@RouaUe_L:@xGkbO3nCwvLNYeK9vhE628gss:T$yZjZ5l-Nbd6CbC$M=hqE-ujhJICXyIxBvYrIU9-TDC'
   'AQci3EUB%shMsg-N%frgU:02ByLs=IPJU0OpgiWit5nexSyxZDncY6WB:=zKZuk5Zy0DD$Ua78%MelgBuMqaHGyKsJUFf9s=UW80PcJmKctb46KveLSiUtNmqrMiL9-Y0I_l5Fnam04CGIg=8@U:Z'
   'CvVqJpJzueKcuA$wqwePfyu7VxuWNN3ho$p0zi2H8QFYK$7YlEqOhhb%:hHgjhIjW5vnqWHKNP4'
   'ET:vk@rFU4tsvMB0$C_p=xQHaYZjvoF%-BTc%wkFW8yaDAPcCYoR%x$FH5O:'
   'HwTon%v7SGSP4FE08jBwwiu5aot2CFKXHTeEAa@38fUcNGOWvE@Mz6WBeDH_VooaZ6AgsXPkVGwy9l@@ZbNXabUU9csiWrrOp0MWUdfi$EZ3w9GkIqtz7I_eOsByOkBOO'
   'Ij%2VlFGXSuPvxJGf5UWy6O@1svxGha%b@=%wjkq:CIgE6u7eJOjmQY5qTtxE2Rjbis9@us'
   'KBkjG5%9R8K9sOG8UTnAYjxLNAvBmvV5vz3IiZaPmKuLYO03-6asI9lJ_j4@6Xo$KZicaLWJ3Pv8XEwVeUPMwbHYWwbx0pYvNlGMO9F:ZhHAwyctnGy%_eujl%WPd4U2BI7qooOSr85J-C2V$LfY'
   'NcRfDfuUQ2=zP8K3CCF5dFcpfiOm6mwenShsAb_F%n6GAGC7fT2JFFn:c35X-3aYwoq7jNX5$ZJ6hI3wnZs$7KgGi7wjulffhHNUxAT0fRRLF39vJ@NvaEMxsMO'
   'Oj42AQAEzRoTxa5OuSKIr=A_lwGMy132v4g3Pdq1GvUG9874YseIFQ6QU'
   'Ono7avN5GjC:_6dBJ_'
   'WHmN2gnmaN-9dVDy4aWo:yNGFzz8qsJyJhWEWcud7$QzN2D9R0efIWWEdu5kwWr73NZm4=@CoCDxrrZnRITr-kGtU_cfW2:%2_am'
   'WiFnuTEhAG9FEC6zopQmj-A-$LDQ0T3WULz%ox3UZAPybSV6v1Z$b4L_XBi4M4BMBtJZpz93r9xafpB77r:lbwvitWRyo$odnAUYlYMmU4RvgnNd--e=I5hiEjGLETTtaScWlQp8mYsBovZwM2k'
   'XKyH=OsOAF3p%uziGF_ZVr$ivrvhVgD@1u%5RtrV-gl_vqAwHkK@x7YwlxX3qT6WKKQ%PR56NrUBU2dOAOAdzr2=5nJuKPM-T-$ZpQfCL7phxQbUcb:BZOTPaFExc-qK-gDRCDW2'
   'd3uUR6OFEwZr%ns1XH_@tbxA@cCPmbBRLdyh7p6V45H$P2$F%w0RqrD3M0g8aGvWpoTFMiBdOTJXjD:JF7=h9a_43xBywYAP%r$SPZi%zDg%ql-KvkdUCtF9OLaQlxmd'
   'ePTpbnit%hyNm@WELlpKzNZYOzOTf8EQ$sEfkMy1VOfIUu3coyvIr13-Y7Sv5v-Ivax2Go_GQRFMU1b3362nktT9WOJf3SpT%z8sZmM3gvYQBDgmKI%%RM-G7hyrhgYflOw%z::ZRcv5O:lDCFm'
   'evqk743Y@dvZAiG5J05L_ROFV@$2%rVWJ2%3nxV72-W7$e$-SK3tuSHA2mBt$qloC5jwNx33GmQUjD%akhBPu=VJ5g$xhlZiaFtTrjeeM5x7dt4cHpX0cZkmfImndYzGmvwQG:$euFYmXn$_2rA9mKZ'
   'gkgUtnihWXsZQTEkrMAWIxir09k3t7jk_IK25t1:cy1XWN0GGqC%FrySdcmU7M8MuPO_ppkLw3=Dfr0UuBAL4%GFk2$Ma10V1jDRGJje%Xx9EV2ERaWKtjpwiZwh0gCSJsj5UL7CR8RtW5opCVFKGGy8Cky'
   'hNgsG_8lNRik3PvphqPm0yEH3P%%fYG:kQLY=6O-61Wa6nrV_WVGR6TLB09vHOv%g4VQRP8Gzx7VXUY1qvZyS'
   'isA7JVzN12xCxVPJZ_qoLm-pTBuhjjHMvV7o=F:EaClfYNyFGlsfw-Kf%uxdqW-kwk1sPl2vhbjyHU1A6$hz'
   'kiJ_fgcdZFDiOptjgH5PN9-PSyLO4fbk_:u5_2tz35lV_iXiJ6cx7pwjTtKy-XGaQ5IefmpJ4N_ZqGsqCsKuqOOBgf9LkUdffHet@Wu'
   'lvwtxyhE9:%Q3UxeHiViUyNzJsy:fm38pg_b6s25JvdhOAT=1s0$pG25x=LZ2rlHTszj=gN6M4zHZYr_qrB49i=pA--@WqWLIuX7o1S_SfS@2FSiUZN'
   'rC24cw3UBDZ=5qJBUMs9e$=S4Y94ni%Z8639vnrGp=0Hv4z3dNFL0fBLmQ40=EYIY:Z=SLc@QLMSt2zsss2ZXrP7j4='
   'uwGl2s-fFrf@GqS=DQqq2I0LJSsOmM%xzTjS:lzXguE3wChdMoHYtLRKPvfaPOZF2fER@j53evbKa7R%A7r4%YEkD=kicJe@SFiGtXHbKe4gCgPAYbnVn'
   'UG37U6KKua2bgc:IHzRs7BnB6FD:2Mt5Cc5NdlsW%$1tyvnfz7S27FvNkroXwAW:mBZLA1@qa9WnDbHCDmQmfPMC9z-Eq6QT0jhhPpqyymaD:R02ghwYo%yx7SAaaq-:x33LYpei$5g8DMl3C'
   'y2vjek0FE1PDJC0qpfnN:x8k2wCFZ9xiUF2ege=JnP98R%wxjKkdfEiLWvQzmnW'
   '8-HCSgH5B%K7P8_jaVtQhBXpBk:pE-$P7ts58U0J@iR9YZntMPl7j$s62yAJO@_9eanFPS54b=UTw$94C-t=HLxT8n6o9P=QnIxq-f1=Ne2dvhe6WbjEQtc'
   'YPPh:IFt2mtR6XWSmjHptXL_hbSYu8bMw-JP8@PNyaFkdNFsk$M=xfL6LDKCDM-mSyGA_2MBwZ8Dr4=R1D%7-mCaaKGxb990jzaagRktDTyp'
   '9hD2ApKa_t_7x-a@GCG28kY:7$M@5udI1myQ$x5udtggvagmCQcq9QXWRC5hoB0o-_zHQUqZI5rMcz_kbMgvN5jr63LeYA4Cj-c6F5Ugmx6DgVf@2Jqm%MafecpgooqreJ53P-QTS'
  )

  # Now create files with all those names in the same parent directory.
  # It should not fail since a 4K leaf has enough space for them.
  for name in "${names[@]}"; do
       touch $MNT/$name
  done

  # Now add one more file name that causes a crc32c hash collision.
  # This should fail, but it should not turn the filesystem into RO mode
  # (which could be exploited by malicious users) due to a transaction
  # abort.
  touch $MNT/'W6tIm-VK2@BGC@IBfcgg6j_p:pxp_QUqtWpGD5Ok_GmijKOJJt'

  # Check that we are able to create another file, with a name that does not cause
  # a crc32c hash collision.
  echo -n "hello world" > $MNT/baz

  # Unmount and mount again, verify file baz exists and with the right content.
  umount $MNT
  mount $DEV $MNT
  echo "File baz content: $(cat $MNT/baz)"

  umount $MNT

When running the reproducer:

  $ ./exploit-hash-collisions.sh
  (...)
  touch: cannot touch '/mnt/sdi/W6tIm-VK2@BGC@IBfcgg6j_p:pxp_QUqtWpGD5Ok_GmijKOJJt': Value too large for defined data type
  ./exploit-hash-collisions.sh: line 57: /mnt/sdi/baz: Read-only file system
  cat: /mnt/sdi/baz: No such file or directory
  File baz content:

And the transaction abort stack trace in dmesg/syslog:

  $ dmesg
  (...)
  [758240.509761] ------------[ cut here ]------------
  [758240.510668] BTRFS: Transaction aborted (error -75)
  [758240.511577] WARNING: fs/btrfs/inode.c:6854 at btrfs_create_new_inode+0x805/0xb50 [btrfs], CPU#6: touch/888644
  [758240.513513] Modules linked in: btrfs dm_zero (...)
  [758240.523221] CPU: 6 UID: 0 PID: 888644 Comm: touch Tainted: G        W           6.19.0-rc8-btrfs-next-225+ #1 PREEMPT(full)
  [758240.524621] Tainted: [W]=WARN
  [758240.525037] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
  [758240.526331] RIP: 0010:btrfs_create_new_inode+0x80b/0xb50 [btrfs]
  [758240.527093] Code: 0f 82 cf (...)
  [758240.529211] RSP: 0018:ffffce64418fbb48 EFLAGS: 00010292
  [758240.529935] RAX: 00000000ffffffd3 RBX: 0000000000000000 RCX: 00000000ffffffb5
  [758240.531040] RDX: 0000000d04f33e06 RSI: 00000000ffffffb5 RDI: ffffffffc0919dd0
  [758240.531920] RBP: ffffce64418fbc10 R08: 0000000000000000 R09: 00000000ffffffb5
  [758240.532928] R10: 0000000000000000 R11: ffff8e52c0000000 R12: ffff8e53eee7d0f0
  [758240.533818] R13: ffff8e57f70932a0 R14: ffff8e5417629568 R15: 0000000000000000
  [758240.534664] FS:  00007f1959a2a740(0000) GS:ffff8e5b27cae000(0000) knlGS:0000000000000000
  [758240.535821] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [758240.536644] CR2: 00007f1959b10ce0 CR3: 000000012a2cc005 CR4: 0000000000370ef0
  [758240.537517] Call Trace:
  [758240.537828]  <TASK>
  [758240.538099]  btrfs_create_common+0xbf/0x140 [btrfs]
  [758240.538760]  path_openat+0x111a/0x15b0
  [758240.539252]  do_filp_open+0xc2/0x170
  [758240.539699]  ? preempt_count_add+0x47/0xa0
  [758240.540200]  ? __virt_addr_valid+0xe4/0x1a0
  [758240.540800]  ? __check_object_size+0x1b3/0x230
  [758240.541661]  ? alloc_fd+0x118/0x180
  [758240.542315]  do_sys_openat2+0x70/0xd0
  [758240.543012]  __x64_sys_openat+0x50/0xa0
  [758240.543723]  do_syscall_64+0x50/0xf20
  [758240.544462]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
  [758240.545397] RIP: 0033:0x7f1959abc687
  [758240.546019] Code: 48 89 fa (...)
  [758240.548522] RSP: 002b:00007ffe16ff8690 EFLAGS: 00000202 ORIG_RAX: 0000000000000101
  [758240.566278] RAX: ffffffffffffffda RBX: 00007f1959a2a740 RCX: 00007f1959abc687
  [758240.567068] RDX: 0000000000000941 RSI: 00007ffe16ffa333 RDI: ffffffffffffff9c
  [758240.567860] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
  [758240.568707] R10: 00000000000001b6 R11: 0000000000000202 R12: 0000561eec7c4b90
  [758240.569712] R13: 0000561eec7c311f R14: 00007ffe16ffa333 R15: 0000000000000000
  [758240.570758]  </TASK>
  [758240.571040] ---[ end trace 0000000000000000 ]---
  [758240.571681] BTRFS: error (device sdi state A) in btrfs_create_new_inode:6854: errno=-75 unknown
  [758240.572899] BTRFS info (device sdi state EA): forced readonly

Fix this by checking for hash collision, and if the adding a new name is
possible, early in btrfs_create_new_inode() before we do any tree updates,
so that we don't need to abort the transaction if we cannot add the new
name due to the leaf size limit.

A test case for fstests will be sent soon.

Fixes: caae78e03234 ("btrfs: move common inode creation code into btrfs_create_new_inode()")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
5 weeks agobtrfs: read key again after incrementing slot in move_existing_remaps()
Mark Harmstone [Wed, 25 Feb 2026 10:36:06 +0000 (10:36 +0000)] 
btrfs: read key again after incrementing slot in move_existing_remaps()

Fix move_existing_remaps() so that if we increment the slot because the
key we encounter isn't a REMAP_BACKREF, we don't reuse the objectid and
offset of the old item.

Link: https://lore.kernel.org/linux-btrfs/20260125123908.2096548-1-clm@meta.com/
Reported-by: Chris Mason <clm@fb.com>
Fixes: bbea42dfb91f ("btrfs: move existing remaps before relocating block group")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
5 weeks agobtrfs: add missing RCU unlock in error path in try_release_subpage_extent_buffer()
Bart Van Assche [Wed, 25 Feb 2026 19:59:58 +0000 (11:59 -0800)] 
btrfs: add missing RCU unlock in error path in try_release_subpage_extent_buffer()

Call rcu_read_lock() before exiting the loop in
try_release_subpage_extent_buffer() because there is a rcu_read_unlock()
call past the loop.

This has been detected by the Clang thread-safety analyzer.

Fixes: ad580dfa388f ("btrfs: fix subpage deadlock in try_release_subpage_extent_buffer()")
CC: stable@vger.kernel.org # 6.18+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
5 weeks agos390/stackleak: Fix __stackleak_poison() inline assembly constraint
Heiko Carstens [Mon, 2 Mar 2026 13:35:00 +0000 (14:35 +0100)] 
s390/stackleak: Fix __stackleak_poison() inline assembly constraint

The __stackleak_poison() inline assembly comes with a "count" operand where
the "d" constraint is used. "count" is used with the exrl instruction and
"d" means that the compiler may allocate any register from 0 to 15.

If the compiler would allocate register 0 then the exrl instruction would
not or the value of "count" into the executed instruction - resulting in a
stackframe which is only partially poisoned.

Use the correct "a" constraint, which excludes register 0 from register
allocation.

Fixes: 2a405f6bb3a5 ("s390/stackleak: provide fast __stackleak_poison() implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20260302133500.1560531-4-hca@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
5 weeks agos390/xor: Improve inline assembly constraints
Heiko Carstens [Mon, 2 Mar 2026 13:34:59 +0000 (14:34 +0100)] 
s390/xor: Improve inline assembly constraints

The inline assembly constraint for the "bytes" operand is "d" for all xor()
inline assemblies. "d" means that any register from 0 to 15 can be used. If
the compiler would use register 0 then the exrl instruction would not or
the value of "bytes" into the executed instruction - resulting in an
incorrect result.

However all the xor() inline assemblies make hard-coded use of register 0,
and it is correctly listed in the clobber list, so that this cannot happen.

Given that this is quite subtle use the better "a" constraint, which
excludes register 0 from register allocation in any case.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20260302133500.1560531-3-hca@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
5 weeks agos390/xor: Fix xor_xc_2() inline assembly constraints
Heiko Carstens [Mon, 2 Mar 2026 13:34:58 +0000 (14:34 +0100)] 
s390/xor: Fix xor_xc_2() inline assembly constraints

The inline assembly constraints for xor_xc_2() are incorrect. "bytes",
"p1", and "p2" are input operands, while all three of them are modified
within the inline assembly. Given that the function consists only of this
inline assembly it seems unlikely that this may cause any problems, however
fix this in any case.

Fixes: 2cfc5f9ce7f5 ("s390/xor: optimized xor routing using the XC instruction")
Cc: stable@vger.kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20260302133500.1560531-2-hca@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
5 weeks agos390/xor: Fix xor_xc_5() inline assembly
Vasily Gorbik [Mon, 2 Mar 2026 18:03:34 +0000 (19:03 +0100)] 
s390/xor: Fix xor_xc_5() inline assembly

xor_xc_5() contains a larl 1,2f that is not used by the asm and is not
declared as a clobber. This can corrupt a compiler-allocated value in %r1
and lead to miscompilation. Remove the instruction.

Fixes: 745600ed6965 ("s390/lib: Use exrl instead of ex in xor functions")
Cc: stable@vger.kernel.org
Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
5 weeks agobtrfs: set BTRFS_ROOT_ORPHAN_CLEANUP during subvol create
Boris Burkov [Tue, 24 Feb 2026 22:25:35 +0000 (14:25 -0800)] 
btrfs: set BTRFS_ROOT_ORPHAN_CLEANUP during subvol create

We have recently observed a number of subvolumes with broken dentries.
ls-ing the parent dir looks like:

drwxrwxrwt 1 root root 16 Jan 23 16:49 .
drwxr-xr-x 1 root root 24 Jan 23 16:48 ..
d????????? ? ?    ?     ?            ? broken_subvol

and similarly stat-ing the file fails.

In this state, deleting the subvol fails with ENOENT, but attempting to
create a new file or subvol over it errors out with EEXIST and even
aborts the fs. Which leaves us a bit stuck.

dmesg contains a single notable error message reading:
"could not do orphan cleanup -2"

2 is ENOENT and the error comes from the failure handling path of
btrfs_orphan_cleanup(), with the stack leading back up to
btrfs_lookup().

btrfs_lookup
btrfs_lookup_dentry
btrfs_orphan_cleanup // prints that message and returns -ENOENT

After some detailed inspection of the internal state, it became clear
that:
- there are no orphan items for the subvol
- the subvol is otherwise healthy looking, it is not half-deleted or
  anything, there is no drop progress, etc.
- the subvol was created a while ago and does the meaningful first
  btrfs_orphan_cleanup() call that sets BTRFS_ROOT_ORPHAN_CLEANUP much
  later.
- after btrfs_orphan_cleanup() fails, btrfs_lookup_dentry() returns -ENOENT,
  which results in a negative dentry for the subvolume via
  d_splice_alias(NULL, dentry), leading to the observed behavior. The
  bug can be mitigated by dropping the dentry cache, at which point we
  can successfully delete the subvolume if we want.

i.e.,
btrfs_lookup()
  btrfs_lookup_dentry()
    if (!sb_rdonly(inode->vfs_inode)->vfs_inode)
    btrfs_orphan_cleanup(sub_root)
      test_and_set_bit(BTRFS_ROOT_ORPHAN_CLEANUP)
      btrfs_search_slot() // finds orphan item for inode N
      ...
      prints "could not do orphan cleanup -2"
  if (inode == ERR_PTR(-ENOENT))
    inode = NULL;
  return d_splice_alias(NULL, dentry) // NEGATIVE DENTRY for valid subvolume

btrfs_orphan_cleanup() does test_and_set_bit(BTRFS_ROOT_ORPHAN_CLEANUP)
on the root when it runs, so it cannot run more than once on a given
root, so something else must run concurrently. However, the obvious
routes to deleting an orphan when nlinks goes to 0 should not be able to
run without first doing a lookup into the subvolume, which should run
btrfs_orphan_cleanup() and set the bit.

The final important observation is that create_subvol() calls
d_instantiate_new() but does not set BTRFS_ROOT_ORPHAN_CLEANUP, so if
the dentry cache gets dropped, the next lookup into the subvolume will
make a real call into btrfs_orphan_cleanup() for the first time. This
opens up the possibility of concurrently deleting the inode/orphan items
but most typical evict() paths will be holding a reference on the parent
dentry (child dentry holds parent->d_lockref.count via dget in
d_alloc(), released in __dentry_kill()) and prevent the parent from
being removed from the dentry cache.

The one exception is delayed iputs. Ordered extent creation calls
igrab() on the inode. If the file is unlinked and closed while those
refs are held, iput() in __dentry_kill() decrements i_count but does
not trigger eviction (i_count > 0). The child dentry is freed and the
subvol dentry's d_lockref.count drops to 0, making it evictable while
the inode is still alive.

Since there are two races (the race between writeback and unlink and
the race between lookup and delayed iputs), and there are too many moving
parts, the following three diagrams show the complete picture.
(Only the second and third are races)

Phase 1:
Create Subvol in dentry cache without BTRFS_ROOT_ORPHAN_CLEANUP set

btrfs_mksubvol()
  lookup_one_len()
    __lookup_slow()
      d_alloc_parallel()
        __d_alloc() // d_lockref.count = 1
  create_subvol(dentry)
    // doesn't touch the bit..
    d_instantiate_new(dentry, inode) // dentry in cache with d_lockref.count == 1

Phase 2:
Create a delayed iput for a file in the subvol but leave the subvol in
state where its dentry can be evicted (d_lockref.count == 0)

T1 (task)                    T2 (writeback)                   T3 (OE workqueue)

write() // dirty pages
                              btrfs_writepages()
                                btrfs_run_delalloc_range()
                                  cow_file_range()
                                    btrfs_alloc_ordered_extent()
                                      igrab() // i_count: 1 -> 2
btrfs_unlink_inode()
  btrfs_orphan_add()
close()
  __fput()
    dput()
      finish_dput()
        __dentry_kill()
          dentry_unlink_inode()
            iput() // 2 -> 1
          --parent->d_lockref.count // 1 -> 0; evictable
                                                                finish_ordered_fn()
                                                                  btrfs_finish_ordered_io()
                                                                    btrfs_put_ordered_extent()
                                                                      btrfs_add_delayed_iput()

Phase 3:
Once the delayed iput is pending and the subvol dentry is evictable,
the shrinker can free it, causing the next lookup to go through
btrfs_lookup() and call btrfs_orphan_cleanup() for the first time.
If the cleaner kthread processes the delayed iput concurrently, the
two race:

  T1 (shrinker)              T2 (cleaner kthread)                          T3 (lookup)

  super_cache_scan()
    prune_dcache_sb()
      __dentry_kill()
      // subvol dentry freed
                              btrfs_run_delayed_iputs()
                                iput()  // i_count -> 0
                                  evict()  // sets I_FREEING
                                    btrfs_evict_inode()
                                      // truncation loop
                                                                            btrfs_lookup()
                                                                              btrfs_lookup_dentry()
                                                                                btrfs_orphan_cleanup()
                                                                                  // first call (bit never set)
                                                                                  btrfs_iget()
                                                                                    // blocks on I_FREEING

                                      btrfs_orphan_del()
                                      // inode freed
                                                                                    // returns -ENOENT
                                                                                  btrfs_del_orphan_item()
                                                                                    // -ENOENT
                                                                                // "could not do orphan cleanup -2"
                                                                            d_splice_alias(NULL, dentry)
                                                                            // negative dentry for valid subvol

The most straightforward fix is to ensure the invariant that a dentry
for a subvolume can exist if and only if that subvolume has
BTRFS_ROOT_ORPHAN_CLEANUP set on its root (and is known to have no
orphans or ran btrfs_orphan_cleanup()).

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
5 weeks agobtrfs: zoned: move btrfs_zoned_reserve_data_reloc_bg() after kthread start
Johannes Thumshirn [Tue, 24 Feb 2026 12:51:13 +0000 (13:51 +0100)] 
btrfs: zoned: move btrfs_zoned_reserve_data_reloc_bg() after kthread start

btrfs_zoned_reserve_data_reloc_bg() is called on each mount of a file
system and allocates a new block-group, to assign it to be the dedicated
relocation target, if no pre-existing usable block-group for this task is
found.

If for some reason the transaction is aborted, btrfs_end_transaction()
will wake up the transaction kthread. But the transaction kthread is not
yet initialized at the time btrfs_zoned_reserve_data_reloc_bg() is
called, leading to the following NULL-pointer dereference:

  RSP: 0018:ffffc9000c617c98 EFLAGS: 00010046
  RAX: 0000000000000000 RBX: 000000000000073c RCX: 0000000000000002
  RDX: 0000000000000001 RSI: 0000000000000003 RDI: 0000000000000001
  RBP: 0000000000000207 R08: ffffffff8223c71d R09: 0000000000000635
  R10: ffff888108588000 R11: 0000000000000003 R12: 0000000000000003
  R13: 000000000000073c R14: 0000000000000000 R15: ffff888114dd6000
  FS:  00007f2993745840(0000) GS:ffff8882b508d000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 000000000000073c CR3: 0000000121a82006 CR4: 0000000000770eb0
  PKRU: 55555554
  Call Trace:
   <TASK>
   try_to_wake_up (./include/linux/spinlock.h:557 kernel/sched/core.c:4106)
   __btrfs_end_transaction (fs/btrfs/transaction.c:1115 (discriminator 2))
   btrfs_zoned_reserve_data_reloc_bg (fs/btrfs/zoned.c:2840)
   open_ctree (fs/btrfs/disk-io.c:3588)
   btrfs_get_tree.cold (fs/btrfs/super.c:982 fs/btrfs/super.c:1944 fs/btrfs/super.c:2087 fs/btrfs/super.c:2121)
   vfs_get_tree (fs/super.c:1752)
   __do_sys_fsconfig (fs/fsopen.c:231 fs/fsopen.c:295 fs/fsopen.c:473)
   do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1))
   entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131)
  RIP: 0033:0x7f299392740e

Move the call to btrfs_zoned_reserve_data_reloc_bg() after the
transaction_kthread has been initialized to fix this problem.

Fixes: 694ce5e143d6 ("btrfs: zoned: reserve data_reloc block group on mount")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
5 weeks agobtrfs: hold space_info->lock when clearing periodic reclaim ready
Sun YangKai [Mon, 9 Feb 2026 12:53:39 +0000 (20:53 +0800)] 
btrfs: hold space_info->lock when clearing periodic reclaim ready

btrfs_set_periodic_reclaim_ready() requires space_info->lock to be held,
as enforced by lockdep_assert_held(). However, btrfs_reclaim_sweep() was
calling it after do_reclaim_sweep() returns, at which point
space_info->lock is no longer held.

Fix this by explicitly acquiring space_info->lock before clearing the
periodic reclaim ready flag in btrfs_reclaim_sweep().

Reported-by: Chris Mason <clm@meta.com>
Link: https://lore.kernel.org/linux-btrfs/20260208182556.891815-1-clm@meta.com/
Fixes: 19eff93dc738 ("btrfs: fix periodic reclaim condition")
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Sun YangKai <sunk67188@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
5 weeks agobtrfs: print-tree: add remap tree definitions
Mark Harmstone [Mon, 9 Feb 2026 18:00:14 +0000 (18:00 +0000)] 
btrfs: print-tree: add remap tree definitions

Add the definitions for the remap tree to print-tree.c, so that we get
more useful information if a tree is dumped to dmesg.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
5 weeks agoRevert "ACPI: PM: Let acpi_dev_pm_attach() skip devices without ACPI PM"
Rafael J. Wysocki [Tue, 3 Mar 2026 14:26:31 +0000 (15:26 +0100)] 
Revert "ACPI: PM: Let acpi_dev_pm_attach() skip devices without ACPI PM"

Revert commit 88fad6ce090b ("ACPI: PM: Let acpi_dev_pm_attach() skip
devices without ACPI PM") that introduced a SoundWire suspend regression
[1].

It is actually not true that the commit above doesn't make a functional
difference because acpi_subsys_suspend(), for example, may resume
devices in runtime-suspend which affects the subsequent handling of
those devices during the suspend transition.  For this reason, the
devices that were handled by the ACPI PM domain before that commit may
be handled differently now which may lead to suspend-resume issues.

Fixes: 88fad6ce090b ("ACPI: PM: Let acpi_dev_pm_attach() skip devices without ACPI PM")
Reported-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Closes: https://github.com/thesofproject/linux/pull/5677#issuecomment-3984375077 [1]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2829615.mvXUDI8C0e@rafael.j.wysocki
5 weeks agoASoC: SDCA: Add allocation failure check for Entity name
Charles Keepax [Tue, 3 Mar 2026 14:17:07 +0000 (14:17 +0000)] 
ASoC: SDCA: Add allocation failure check for Entity name

Currently find_sdca_entity_iot() can allocate a string for the
Entity name but it doesn't check if that allocation succeeded.
Add the missing NULL check after the allocation.

Fixes: 48fa77af2f4a ("ASoC: SDCA: Add terminal type into input/output widget name")
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20260303141707.3841635-1-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>