]> git.ipfire.org Git - thirdparty/kernel/stable.git/log
thirdparty/kernel/stable.git
4 weeks agosoc: samsung: exynos-pmu: fix device leak on regmap lookup
Johan Hovold [Fri, 21 Nov 2025 12:18:52 +0000 (13:18 +0100)] 
soc: samsung: exynos-pmu: fix device leak on regmap lookup

commit 990eb9a8eb4540ab90c7b34bb07b87ff13881cad upstream.

Make sure to drop the reference taken when looking up the PMU device and
its regmap.

Note that holding a reference to a device does not prevent its regmap
from going away so there is no point in keeping the reference.

Fixes: 0b7c6075022c ("soc: samsung: exynos-pmu: Add regmap support for SoCs that protect PMU regs")
Cc: stable@vger.kernel.org # 6.9
Cc: Peter Griffin <peter.griffin@linaro.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20251121121852.16825-1-johan@kernel.org
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agotracing: Fix fixed array of synthetic event
Steven Rostedt [Thu, 4 Dec 2025 20:19:35 +0000 (15:19 -0500)] 
tracing: Fix fixed array of synthetic event

commit 47ef834209e5981f443240d8a8b45bf680df22aa upstream.

The commit 4d38328eb442d ("tracing: Fix synth event printk format for str
fields") replaced "%.*s" with "%s" but missed removing the number size of
the dynamic and static strings. The commit e1a453a57bc7 ("tracing: Do not
add length to print format in synthetic events") fixed the dynamic part
but did not fix the static part. That is, with the commands:

  # echo 's:wake_lat char[] wakee; u64 delta;' >> /sys/kernel/tracing/dynamic_events
  # echo 'hist:keys=pid:ts=common_timestamp.usecs if !(common_flags & 0x18)' > /sys/kernel/tracing/events/sched/sched_waking/trigger
  # echo 'hist:keys=next_pid:delta=common_timestamp.usecs-$ts:onmatch(sched.sched_waking).trace(wake_lat,next_comm,$delta)' > /sys/kernel/tracing/events/sched/sched_switch/trigger

That caused the output of:

          <idle>-0       [001] d..5.   193.428167: wake_lat: wakee=(efault)sshd-sessiondelta=155
    sshd-session-879     [001] d..5.   193.811080: wake_lat: wakee=(efault)kworker/u34:5delta=58
          <idle>-0       [002] d..5.   193.811198: wake_lat: wakee=(efault)bashdelta=91

The commit e1a453a57bc7 fixed the part where the synthetic event had
"char[] wakee". But if one were to replace that with a static size string:

  # echo 's:wake_lat char[16] wakee; u64 delta;' >> /sys/kernel/tracing/dynamic_events

Where "wakee" is defined as "char[16]" and not "char[]" making it a static
size, the code triggered the "(efaul)" again.

Remove the added STR_VAR_LEN_MAX size as the string is still going to be
nul terminated.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Douglas Raillard <douglas.raillard@arm.com>
Link: https://patch.msgid.link/20251204151935.5fa30355@gandalf.local.home
Fixes: e1a453a57bc7 ("tracing: Do not add length to print format in synthetic events")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agovfio: Fix ksize arg while copying user struct in vfio_df_ioctl_bind_iommufd()
Raghavendra Rao Ananta [Fri, 31 Oct 2025 17:06:02 +0000 (17:06 +0000)] 
vfio: Fix ksize arg while copying user struct in vfio_df_ioctl_bind_iommufd()

commit 2f03f21fe7516902283b135de272d3c7b10672de upstream.

For the cases where user includes a non-zero value in 'token_uuid_ptr'
field of 'struct vfio_device_bind_iommufd', the copy_struct_from_user()
in vfio_df_ioctl_bind_iommufd() fails with -E2BIG. For the 'minsz' passed,
copy_struct_from_user() expects the newly introduced field to be zero-ed,
which would be incorrect in this case.

Fix this by passing the actual size of the kernel struct. If working
with a newer userspace, copy_struct_from_user() would copy the
'token_uuid_ptr' field, and if working with an old userspace, it would
zero out this field, thus still retaining backward compatibility.

Fixes: 86624ba3b522 ("vfio/pci: Do vf_token checks for VFIO_DEVICE_BIND_IOMMUFD")
Cc: stable@vger.kernel.org
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20251031170603.2260022-2-rananta@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agovirtio: vdpa: Fix reference count leak in octep_sriov_enable()
Miaoqian Lin [Mon, 27 Oct 2025 06:07:35 +0000 (14:07 +0800)] 
virtio: vdpa: Fix reference count leak in octep_sriov_enable()

commit b41ca62c0019de1321d75f2b2f274a28784a41ed upstream.

pci_get_device() will increase the reference count for the returned
pci_dev, and also decrease the reference count for the input parameter
from if it is not NULL.

If we break the loop in  with 'vf_pdev' not NULL. We
need to call pci_dev_put() to decrease the reference count.

Found via static anlaysis and this is similar to commit c508eb042d97
("perf/x86/intel/uncore: Fix reference count leak in sad_cfg_iio_topology()")

Fixes: 8b6c724cdab8 ("virtio: vdpa: vDPA driver for Marvell OCTEON DPU devices")
Cc: stable@vger.kernel.org
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251027060737.33815-1-linmq006@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agozloop: make the write pointer of full zones invalid
Damien Le Moal [Sat, 15 Nov 2025 12:15:51 +0000 (21:15 +0900)] 
zloop: make the write pointer of full zones invalid

commit 866d65745b635927c3d1343ab67e6fd4a99d116d upstream.

The write pointer of zones that are in the full condition is always
invalid. Reflect that fact by setting the write pointer of full zones
to ULLONG_MAX.

Fixes: eb0570c7df23 ("block: new zoned loop block device driver")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agozloop: fail zone append operations that are targeting full zones
Damien Le Moal [Sat, 15 Nov 2025 12:15:52 +0000 (21:15 +0900)] 
zloop: fail zone append operations that are targeting full zones

commit cf28f6f923cb1dd2765b5c3d7697bb4dcf2096a0 upstream.

zloop_rw() will fail any regular write operation that targets a full
sequential zone. The check for this is indirect and achieved by checking
the write pointer alignment of the write operation. But this check is
ineffective for zone append operations since these are alwasy
automatically directed at a zone write pointer.

Prevent zone append operations from being executed in a full zone with
an explicit check of the zone condition.

Fixes: eb0570c7df23 ("block: new zoned loop block device driver")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoamba: tegra-ahb: Fix device leak on SMMU enable
Johan Hovold [Thu, 25 Sep 2025 15:00:07 +0000 (17:00 +0200)] 
amba: tegra-ahb: Fix device leak on SMMU enable

commit 500e1368e46928f4b2259612dcabb6999afae2a6 upstream.

Make sure to drop the reference taken to the AHB platform device when
looking up its driver data while enabling the SMMU.

Note that holding a reference to a device does not prevent its driver
data from going away.

Fixes: 89c788bab1f0 ("ARM: tegra: Add SMMU enabler in AHB")
Cc: stable@vger.kernel.org # 3.5
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agocrypto: arm64/ghash - Fix incorrect output from ghash-neon
Eric Biggers [Tue, 9 Dec 2025 22:34:17 +0000 (14:34 -0800)] 
crypto: arm64/ghash - Fix incorrect output from ghash-neon

commit f6a458746f905adb7d70e50e8b9383dc9e3fd75f upstream.

Commit 9a7c987fb92b ("crypto: arm64/ghash - Use API partial block
handling") made ghash_finup() pass the wrong buffer to
ghash_do_simd_update().  As a result, ghash-neon now produces incorrect
outputs when the message length isn't divisible by 16 bytes.  Fix this.

(I didn't notice this earlier because this code is reached only on CPUs
that support NEON but not PMULL.  I haven't yet found a way to get
qemu-system-aarch64 to emulate that configuration.)

Fixes: 9a7c987fb92b ("crypto: arm64/ghash - Use API partial block handling")
Cc: stable@vger.kernel.org
Reported-by: Diederik de Haas <diederik@cknow-tech.com>
Closes: https://lore.kernel.org/linux-crypto/DETXT7QI62KE.F3CGH2VWX1SC@cknow-tech.com/
Tested-by: Diederik de Haas <diederik@cknow-tech.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/20251209223417.112294-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agocrypto: caam - Add check for kcalloc() in test_len()
Guangshuo Li [Tue, 23 Sep 2025 12:44:18 +0000 (20:44 +0800)] 
crypto: caam - Add check for kcalloc() in test_len()

commit 7cf6e0b69b0d90ab042163e5bbddda0dfcf8b6a7 upstream.

As kcalloc() may fail, check its return value to avoid a NULL pointer
dereference when passing the buffer to rng->read(). On allocation
failure, log the error and return since test_len() returns void.

Fixes: 2be0d806e25e ("crypto: caam - add a test for the RNG")
Cc: stable@vger.kernel.org
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agocrypto: af_alg - zero initialize memory allocated via sock_kmalloc
Shivani Agarwal [Wed, 24 Sep 2025 06:01:48 +0000 (23:01 -0700)] 
crypto: af_alg - zero initialize memory allocated via sock_kmalloc

commit 6f6e309328d53a10c0fe1f77dec2db73373179b6 upstream.

Several crypto user API contexts and requests allocated with
sock_kmalloc() were left uninitialized, relying on callers to
set fields explicitly. This resulted in the use of uninitialized
data in certain error paths or when new fields are added in the
future.

The ACVP patches also contain two user-space interface files:
algif_kpp.c and algif_akcipher.c. These too rely on proper
initialization of their context structures.

A particular issue has been observed with the newly added
'inflight' variable introduced in af_alg_ctx by commit:

  67b164a871af ("crypto: af_alg - Disallow multiple in-flight AIO requests")

Because the context is not memset to zero after allocation,
the inflight variable has contained garbage values. As a result,
af_alg_alloc_areq() has incorrectly returned -EBUSY randomly when
the garbage value was interpreted as true:

  https://github.com/gregkh/linux/blame/master/crypto/af_alg.c#L1209

The check directly tests ctx->inflight without explicitly
comparing against true/false. Since inflight is only ever set to
true or false later, an uninitialized value has triggered
-EBUSY failures. Zero-initializing memory allocated with
sock_kmalloc() ensures inflight and other fields start in a known
state, removing random issues caused by uninitialized data.

Fixes: fe869cdb89c9 ("crypto: algif_hash - User-space interface for hash operations")
Fixes: 5afdfd22e6ba ("crypto: algif_rng - add random number generator support")
Fixes: 2d97591ef43d ("crypto: af_alg - consolidation of duplicate code")
Fixes: 67b164a871af ("crypto: af_alg - Disallow multiple in-flight AIO requests")
Cc: stable@vger.kernel.org
Signed-off-by: Shivani Agarwal <shivani.agarwal@broadcom.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agodt-bindings: PCI: qcom,pcie-sm8550: Add missing required power-domains and resets
Krzysztof Kozlowski [Thu, 30 Oct 2025 08:50:51 +0000 (09:50 +0100)] 
dt-bindings: PCI: qcom,pcie-sm8550: Add missing required power-domains and resets

commit e60c6f34b9f3a83f96006243c0ef96c134520257 upstream.

Commit b8d3404058a6 ("dt-bindings: PCI: qcom,pcie-sm8550: Move SM8550 to
dedicated schema") move the device schema to separate file, but it
missed a "if:not:...then:" clause in the original binding which was
requiring power-domains and resets for this particular chip.

Fixes: b8d3404058a6 ("dt-bindings: PCI: qcom,pcie-sm8550: Move SM8550 to dedicated schema")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251030-dt-bindings-pci-qcom-fixes-power-domains-v2-8-28c1f11599fe@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agodt-bindings: PCI: qcom,pcie-sm8450: Add missing required power-domains and resets
Krzysztof Kozlowski [Thu, 30 Oct 2025 08:50:50 +0000 (09:50 +0100)] 
dt-bindings: PCI: qcom,pcie-sm8450: Add missing required power-domains and resets

commit 667facc4000c49a7c280097ef6638f133bcb1e59 upstream.

Commit 88c9b3af4e31 ("dt-bindings: PCI: qcom,pcie-sm8450: Move SM8450 to
dedicated schema") move the device schema to separate file, but it
missed a "if:not:...then:" clause in the original binding which was
requiring power-domains and resets for this particular chip.

Fixes: 88c9b3af4e31 ("dt-bindings: PCI: qcom,pcie-sm8450: Move SM8450 to dedicated schema")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251030-dt-bindings-pci-qcom-fixes-power-domains-v2-7-28c1f11599fe@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agodt-bindings: PCI: qcom,pcie-sm8350: Add missing required power-domains and resets
Krzysztof Kozlowski [Thu, 30 Oct 2025 08:50:49 +0000 (09:50 +0100)] 
dt-bindings: PCI: qcom,pcie-sm8350: Add missing required power-domains and resets

commit 012ba0d5f02e1f192eda263b5f9f826e47d607bb upstream.

Commit 2278b8b54773 ("dt-bindings: PCI: qcom,pcie-sm8350: Move SM8350 to
dedicated schema") move the device schema to separate file, but it
missed a "if:not:...then:" clause in the original binding which was
requiring power-domains and resets for this particular chip.

Fixes: 2278b8b54773 ("dt-bindings: PCI: qcom,pcie-sm8350: Move SM8350 to dedicated schema")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251030-dt-bindings-pci-qcom-fixes-power-domains-v2-6-28c1f11599fe@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agodt-bindings: PCI: qcom,pcie-sm8250: Add missing required power-domains and resets
Krzysztof Kozlowski [Thu, 30 Oct 2025 08:50:48 +0000 (09:50 +0100)] 
dt-bindings: PCI: qcom,pcie-sm8250: Add missing required power-domains and resets

commit 2620c6bcd8c141b79ff2afe95dc814dfab644f63 upstream.

Commit 4891b66185c1 ("dt-bindings: PCI: qcom,pcie-sm8250: Move SM8250 to
dedicated schema") move the device schema to separate file, but it
missed a "if:not:...then:" clause in the original binding which was
requiring power-domains and resets for this particular chip.

Fixes: 4891b66185c1 ("dt-bindings: PCI: qcom,pcie-sm8250: Move SM8250 to dedicated schema")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251030-dt-bindings-pci-qcom-fixes-power-domains-v2-5-28c1f11599fe@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agodt-bindings: PCI: qcom,pcie-sm8150: Add missing required power-domains and resets
Krzysztof Kozlowski [Thu, 30 Oct 2025 08:50:47 +0000 (09:50 +0100)] 
dt-bindings: PCI: qcom,pcie-sm8150: Add missing required power-domains and resets

commit 31cb432b62fb796e0c1084542ba39311d2f716d5 upstream.

Commit 51bc04d5b49d ("dt-bindings: PCI: qcom,pcie-sm8150: Move SM8150 to
dedicated schema") move the device schema to separate file, but it
missed a "if:not:...then:" clause in the original binding which was
requiring power-domains and resets for this particular chip.

Fixes: 51bc04d5b49d ("dt-bindings: PCI: qcom,pcie-sm8150: Move SM8150 to dedicated schema")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251030-dt-bindings-pci-qcom-fixes-power-domains-v2-4-28c1f11599fe@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agodt-bindings: PCI: qcom,pcie-sc8280xp: Add missing required power-domains and resets
Krzysztof Kozlowski [Thu, 30 Oct 2025 08:50:46 +0000 (09:50 +0100)] 
dt-bindings: PCI: qcom,pcie-sc8280xp: Add missing required power-domains and resets

commit ea551601404d286813aef6819ddf0bf1d7d69a24 upstream.

Commit c007a5505504 ("dt-bindings: PCI: qcom,pcie-sc8280xp: Move
SC8280XP to dedicated schema") move the device schema to separate file,
but it missed a "if:not:...then:" clause in the original binding which
was requiring power-domains and resets for this particular chip.

Fixes: c007a5505504 ("dt-bindings: PCI: qcom,pcie-sc8280xp: Move SC8280XP to dedicated schema")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251030-dt-bindings-pci-qcom-fixes-power-domains-v2-3-28c1f11599fe@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agodt-bindings: PCI: qcom,pcie-sc7280: Add missing required power-domains and resets
Krzysztof Kozlowski [Thu, 30 Oct 2025 08:50:45 +0000 (09:50 +0100)] 
dt-bindings: PCI: qcom,pcie-sc7280: Add missing required power-domains and resets

commit ef99c2efeacac7758cc8c2d00e3200100a4da16c upstream.

Commit 756485bfbb85 ("dt-bindings: PCI: qcom,pcie-sc7280: Move SC7280 to
dedicated schema") move the device schema to separate file, but it
missed a "if:not:...then:" clause in the original binding which was
requiring power-domains and resets for this particular chip.

Fixes: 756485bfbb85 ("dt-bindings: PCI: qcom,pcie-sc7280: Move SC7280 to dedicated schema")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251030-dt-bindings-pci-qcom-fixes-power-domains-v2-2-28c1f11599fe@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agodrm/displayid: pass iter to drm_find_displayid_extension()
Jani Nikula [Tue, 28 Oct 2025 20:07:25 +0000 (22:07 +0200)] 
drm/displayid: pass iter to drm_find_displayid_extension()

commit 520f37c30992fd0c212a34fbe99c062b7a3dc52e upstream.

It's more convenient to pass iter than a handful of its members to
drm_find_displayid_extension(), especially as we're about to add another
member.

Rename the function find_next_displayid_extension() while at it, to be
more descriptive.

Cc: Tiago Martins Araújo <tiago.martins.araujo@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Tiago Martins Araújo <tiago.martins.araujo@gmail.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/3837ae7f095e77a082ac2422ce2fac96c4f9373d.1761681968.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agodrm/amd/display: Fix scratch registers offsets for DCN351
Ray Wu [Fri, 28 Nov 2025 01:14:09 +0000 (09:14 +0800)] 
drm/amd/display: Fix scratch registers offsets for DCN351

commit fd62aa13d3ee0f21c756a40a7c2f900f98992d6a upstream.

[Why]
Different platforms use different NBIO header files,
causing display code to use differnt offset and read
wrong accelerated status.

[How]
- Unified NBIO offset header file across platform.
- Correct scratch registers offsets to proper locations.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4667
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 576e032e909c8a6bb3d907b4ef5f6abe0f644199)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agodrm/amd/display: Fix scratch registers offsets for DCN35
Ray Wu [Fri, 28 Nov 2025 00:58:13 +0000 (08:58 +0800)] 
drm/amd/display: Fix scratch registers offsets for DCN35

commit 69741d9ccc7222e6b6f138db67b012ecc0d72542 upstream.

[Why]
Different platforms use differnet NBIO header files,
causing display code to use differnt offset and read
wrong accelerated status.

[How]
- Unified NBIO offset header file across platform.
- Correct scratch registers offsets to proper locations.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4667
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 49a63bc8eda0304ba307f5ba68305f936174f72d)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agodrm/amd/display: Use GFP_ATOMIC in dc_create_plane_state()
Alex Deucher [Tue, 11 Nov 2025 16:17:22 +0000 (11:17 -0500)] 
drm/amd/display: Use GFP_ATOMIC in dc_create_plane_state()

commit 3c41114dcdabb7b25f5bc33273c6db9c7af7f4a7 upstream.

This can get called from an atomic context.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4470
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8acdad9344cc7b4e7bc01f0dfea80093eb3768db)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoRevert "drm/amd/display: Fix pbn to kbps Conversion"
Mario Limonciello [Tue, 9 Dec 2025 17:14:47 +0000 (11:14 -0600)] 
Revert "drm/amd/display: Fix pbn to kbps Conversion"

commit 72e24456a54fe04710d89626cc5a88703e2f6202 upstream.

Deeply daisy chained DP/MST displays are no longer able to light
up. This reverts commit e0dec00f3d05 ("drm/amd/display: Fix pbn
to kbps Conversion")

Cc: Jerry Zuo <jerry.zuo@amd.com>
Reported-by: nat@nullable.se
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4756
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit e1c94109c76e8a77a21531bd53f6c63356c81158)
Cc: stable@vger.kernel.org # 6.17+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agogpio: loongson: Switch 2K2000/3000 GPIO to BYTE_CTRL_MODE
Xi Ruoyao [Fri, 28 Nov 2025 07:50:32 +0000 (15:50 +0800)] 
gpio: loongson: Switch 2K2000/3000 GPIO to BYTE_CTRL_MODE

commit dae9750105cf93ac1e156ef91f4beeb53bd64777 upstream.

The manuals of 2K2000 says both BIT_CTRL_MODE and BYTE_CTRL_MODE are
supported but the latter is recommended.  Also on 2K3000, per the ACPI
DSDT the GPIO controller is compatible with 2K2000, but it fails to
operate GPIOs 62 and 63 (and maybe others) using BIT_CTRL_MODE.
Using BYTE_CTRL_MODE also makes those 2K3000 GPIOs work.

Fixes: 3feb70a61740 ("gpio: loongson: add more gpio chip support")
Cc: stable@vger.kernel.org
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Link: https://lore.kernel.org/r/20251128075033.255821-1-xry111@xry111.site
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agogpiolib: acpi: Add quirk for Dell Precision 7780
Askar Safin [Sat, 6 Dec 2025 18:04:13 +0000 (18:04 +0000)] 
gpiolib: acpi: Add quirk for Dell Precision 7780

commit 2d967310c49ed93ac11cef408a55ddf15c3dd52e upstream.

Dell Precision 7780 often wakes up on its own from suspend. Sometimes
wake up happens immediately (i. e. within 7 seconds), sometimes it happens
after, say, 30 minutes.

Fixes: 1796f808e4bb ("HID: i2c-hid: acpi: Stop setting wakeup_capable")
Link: https://lore.kernel.org/linux-i2c/197ae95ffd8.dc819e60457077.7692120488609091556@zohomail.com/
Cc: stable@vger.kernel.org
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Askar Safin <safinaskar@gmail.com>
Link: https://lore.kernel.org/r/20251206180414.3183334-2-safinaskar@gmail.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agogpio: regmap: Fix memleak in error path in gpio_regmap_register()
Wentao Guan [Thu, 4 Dec 2025 10:13:04 +0000 (18:13 +0800)] 
gpio: regmap: Fix memleak in error path in gpio_regmap_register()

commit 52721cfc78c76b09c66e092b52617006390ae96a upstream.

Call gpiochip_remove() to free the resources allocated by
gpiochip_add_data() in error path.

Fixes: 553b75d4bfe9 ("gpio: regmap: Allow to allocate regmap-irq device")
Fixes: ae495810cffe ("gpio: regmap: add the .fixed_direction_output configuration parameter")
CC: stable@vger.kernel.org
Co-developed-by: WangYuli <wangyl5933@chinaunicom.cn>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20251204101303.30353-1-guanwentao@uniontech.com
[Bartosz: reworked the commit message]
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agos390/ipl: Clear SBP flag when bootprog is set
Sven Schnelle [Fri, 5 Dec 2025 09:58:57 +0000 (10:58 +0100)] 
s390/ipl: Clear SBP flag when bootprog is set

commit b1aa01d31249bd116b18c7f512d3e46b4b4ad83b upstream.

With z16 a new flag 'search boot program' was introduced for
list-directed IPL (SCSI, NVMe, ECKD DASD). If this flag is set,
e.g. via selecting the "Automatic" value for the "Boot program
selector" control on an HMC load panel, it is copied to the reipl
structure from the initial ipl structure. When a user now sets a
boot prog via sysfs, the flag is not cleared and the bootloader
will again automatically select the boot program, ignoring user
configuration.

To avoid that, clear the SBP flag when a bootprog sysfs file is
written.

Cc: stable@vger.kernel.org
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agocgroup: rstat: use LOCK CMPXCHG in css_rstat_updated
Shakeel Butt [Fri, 5 Dec 2025 20:01:06 +0000 (12:01 -0800)] 
cgroup: rstat: use LOCK CMPXCHG in css_rstat_updated

commit 3309b63a2281efb72df7621d60cc1246b6286ad3 upstream.

On x86-64, this_cpu_cmpxchg() uses CMPXCHG without LOCK prefix which
means it is only safe for the local CPU and not for multiple CPUs.
Recently the commit 36df6e3dbd7e ("cgroup: make css_rstat_updated nmi
safe") make css_rstat_updated lockless and uses lockless list to allow
reentrancy. Since css_rstat_updated can invoked from process context,
IRQ and NMI, it uses this_cpu_cmpxchg() to select the winner which will
inset the lockless lnode into the global per-cpu lockless list.

However the commit missed one case where lockless node of a cgroup can
be accessed and modified by another CPU doing the flushing. Basically
llist_del_first_init() in css_process_update_tree().

On a cursory look, it can be questioned how css_process_update_tree()
can see a lockless node in global lockless list where the updater is at
this_cpu_cmpxchg() and before llist_add() call in css_rstat_updated().
This can indeed happen in the presence of IRQs/NMI.

Consider this scenario: Updater for cgroup stat C on CPU A in process
context is after llist_on_list() check and before this_cpu_cmpxchg() in
css_rstat_updated() where it get interrupted by IRQ/NMI. In the IRQ/NMI
context, a new updater calls css_rstat_updated() for same cgroup C and
successfully inserts rstatc_pcpu->lnode.

Now concurrently CPU B is running the flusher and it calls
llist_del_first_init() for CPU A and got rstatc_pcpu->lnode of cgroup C
which was added by the IRQ/NMI updater.

Now imagine CPU B calling init_llist_node() on cgroup C's
rstatc_pcpu->lnode of CPU A and on CPU A, the process context updater
calling this_cpu_cmpxchg(rstatc_pcpu->lnode) concurrently.

The CMPXCNG without LOCK on CPU A is not safe and thus we need LOCK
prefix.

In Meta's fleet running the kernel with the commit 36df6e3dbd7e, we are
observing on some machines the memcg stats are getting skewed by more
than the actual memory on the system. On close inspection, we noticed
that lockless node for a workload for specific CPU was in the bad state
and thus all the updates on that CPU for that cgroup was being lost.

To confirm if this skew was indeed due to this CMPXCHG without LOCK in
css_rstat_updated(), we created a repro (using AI) at [1] which shows
that CMPXCHG without LOCK creates almost the same lnode corruption as
seem in Meta's fleet and with LOCK CMPXCHG the issue does not
reproduces.

Link: http://lore.kernel.org/efiagdwmzfwpdzps74fvcwq3n4cs36q33ij7eebcpssactv3zu@se4hqiwxcfxq
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: stable@vger.kernel.org # v6.17+
Fixes: 36df6e3dbd7e ("cgroup: make css_rstat_updated nmi safe")
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agobtrfs: don't log conflicting inode if it's a dir moved in the current transaction
Filipe Manana [Thu, 27 Nov 2025 16:35:59 +0000 (16:35 +0000)] 
btrfs: don't log conflicting inode if it's a dir moved in the current transaction

commit 266273eaf4d99475f1ae57f687b3e42bc71ec6f0 upstream.

We can't log a conflicting inode if it's a directory and it was moved
from one parent directory to another parent directory in the current
transaction, as this can result an attempt to have a directory with
two hard links during log replay, one for the old parent directory and
another for the new parent directory.

The following scenario triggers that issue:

1) We have directories "dir1" and "dir2" created in a past transaction.
   Directory "dir1" has inode A as its parent directory;

2) We move "dir1" to some other directory;

3) We create a file with the name "dir1" in directory inode A;

4) We fsync the new file. This results in logging the inode of the new file
   and the inode for the directory "dir1" that was previously moved in the
   current transaction. So the log tree has the INODE_REF item for the
   new location of "dir1";

5) We move the new file to some other directory. This results in updating
   the log tree to included the new INODE_REF for the new location of the
   file and removes the INODE_REF for the old location. This happens
   during the rename when we call btrfs_log_new_name();

6) We fsync the file, and that persists the log tree changes done in the
   previous step (btrfs_log_new_name() only updates the log tree in
   memory);

7) We have a power failure;

8) Next time the fs is mounted, log replay happens and when processing
   the inode for directory "dir1" we find a new INODE_REF and add that
   link, but we don't remove the old link of the inode since we have
   not logged the old parent directory of the directory inode "dir1".

As a result after log replay finishes when we trigger writeback of the
subvolume tree's extent buffers, the tree check will detect that we have
a directory a hard link count of 2 and we get a mount failure.
The errors and stack traces reported in dmesg/syslog are like this:

   [ 3845.729764] BTRFS info (device dm-0): start tree-log replay
   [ 3845.730304] page: refcount:3 mapcount:0 mapping:000000005c8a3027 index:0x1d00 pfn:0x11510c
   [ 3845.731236] memcg:ffff9264c02f4e00
   [ 3845.731751] aops:btree_aops [btrfs] ino:1
   [ 3845.732300] flags: 0x17fffc00000400a(uptodate|private|writeback|node=0|zone=2|lastcpupid=0x1ffff)
   [ 3845.733346] raw: 017fffc00000400a 0000000000000000 dead000000000122 ffff9264d978aea8
   [ 3845.734265] raw: 0000000000001d00 ffff92650e6d4738 00000003ffffffff ffff9264c02f4e00
   [ 3845.735305] page dumped because: eb page dump
   [ 3845.735981] BTRFS critical (device dm-0): corrupt leaf: root=5 block=30408704 slot=6 ino=257, invalid nlink: has 2 expect no more than 1 for dir
   [ 3845.737786] BTRFS info (device dm-0): leaf 30408704 gen 10 total ptrs 17 free space 14881 owner 5
   [ 3845.737789] BTRFS info (device dm-0): refs 4 lock_owner 0 current 30701
   [ 3845.737792]  item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160
   [ 3845.737794]  inode generation 3 transid 9 size 16 nbytes 16384
   [ 3845.737795]  block group 0 mode 40755 links 1 uid 0 gid 0
   [ 3845.737797]  rdev 0 sequence 2 flags 0x0
   [ 3845.737798]  atime 1764259517.0
   [ 3845.737800]  ctime 1764259517.572889464
   [ 3845.737801]  mtime 1764259517.572889464
   [ 3845.737802]  otime 1764259517.0
   [ 3845.737803]  item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12
   [ 3845.737805]  index 0 name_len 2
   [ 3845.737807]  item 2 key (256 DIR_ITEM 2363071922) itemoff 16077 itemsize 34
   [ 3845.737808]  location key (257 1 0) type 2
   [ 3845.737810]  transid 9 data_len 0 name_len 4
   [ 3845.737811]  item 3 key (256 DIR_ITEM 2676584006) itemoff 16043 itemsize 34
   [ 3845.737813]  location key (258 1 0) type 2
   [ 3845.737814]  transid 9 data_len 0 name_len 4
   [ 3845.737815]  item 4 key (256 DIR_INDEX 2) itemoff 16009 itemsize 34
   [ 3845.737816]  location key (257 1 0) type 2
   [ 3845.737818]  transid 9 data_len 0 name_len 4
   [ 3845.737819]  item 5 key (256 DIR_INDEX 3) itemoff 15975 itemsize 34
   [ 3845.737820]  location key (258 1 0) type 2
   [ 3845.737821]  transid 9 data_len 0 name_len 4
   [ 3845.737822]  item 6 key (257 INODE_ITEM 0) itemoff 15815 itemsize 160
   [ 3845.737824]  inode generation 9 transid 10 size 6 nbytes 0
   [ 3845.737825]  block group 0 mode 40755 links 2 uid 0 gid 0
   [ 3845.737826]  rdev 0 sequence 1 flags 0x0
   [ 3845.737827]  atime 1764259517.572889464
   [ 3845.737828]  ctime 1764259517.572889464
   [ 3845.737830]  mtime 1764259517.572889464
   [ 3845.737831]  otime 1764259517.572889464
   [ 3845.737832]  item 7 key (257 INODE_REF 256) itemoff 15801 itemsize 14
   [ 3845.737833]  index 2 name_len 4
   [ 3845.737834]  item 8 key (257 INODE_REF 258) itemoff 15787 itemsize 14
   [ 3845.737836]  index 2 name_len 4
   [ 3845.737837]  item 9 key (257 DIR_ITEM 2507850652) itemoff 15754 itemsize 33
   [ 3845.737838]  location key (259 1 0) type 1
   [ 3845.737839]  transid 10 data_len 0 name_len 3
   [ 3845.737840]  item 10 key (257 DIR_INDEX 2) itemoff 15721 itemsize 33
   [ 3845.737842]  location key (259 1 0) type 1
   [ 3845.737843]  transid 10 data_len 0 name_len 3
   [ 3845.737844]  item 11 key (258 INODE_ITEM 0) itemoff 15561 itemsize 160
   [ 3845.737846]  inode generation 9 transid 10 size 8 nbytes 0
   [ 3845.737847]  block group 0 mode 40755 links 1 uid 0 gid 0
   [ 3845.737848]  rdev 0 sequence 1 flags 0x0
   [ 3845.737849]  atime 1764259517.572889464
   [ 3845.737850]  ctime 1764259517.572889464
   [ 3845.737851]  mtime 1764259517.572889464
   [ 3845.737852]  otime 1764259517.572889464
   [ 3845.737853]  item 12 key (258 INODE_REF 256) itemoff 15547 itemsize 14
   [ 3845.737855]  index 3 name_len 4
   [ 3845.737856]  item 13 key (258 DIR_ITEM 1843588421) itemoff 15513 itemsize 34
   [ 3845.737857]  location key (257 1 0) type 2
   [ 3845.737858]  transid 10 data_len 0 name_len 4
   [ 3845.737860]  item 14 key (258 DIR_INDEX 2) itemoff 15479 itemsize 34
   [ 3845.737861]  location key (257 1 0) type 2
   [ 3845.737862]  transid 10 data_len 0 name_len 4
   [ 3845.737863]  item 15 key (259 INODE_ITEM 0) itemoff 15319 itemsize 160
   [ 3845.737865]  inode generation 10 transid 10 size 0 nbytes 0
   [ 3845.737866]  block group 0 mode 100600 links 1 uid 0 gid 0
   [ 3845.737867]  rdev 0 sequence 2 flags 0x0
   [ 3845.737868]  atime 1764259517.580874966
   [ 3845.737869]  ctime 1764259517.586121869
   [ 3845.737870]  mtime 1764259517.580874966
   [ 3845.737872]  otime 1764259517.580874966
   [ 3845.737873]  item 16 key (259 INODE_REF 257) itemoff 15306 itemsize 13
   [ 3845.737874]  index 2 name_len 3
   [ 3845.737875] BTRFS error (device dm-0): block=30408704 write time tree block corruption detected
   [ 3845.739448] ------------[ cut here ]------------
   [ 3845.740092] WARNING: CPU: 5 PID: 30701 at fs/btrfs/disk-io.c:335 btree_csum_one_bio+0x25a/0x270 [btrfs]
   [ 3845.741439] Modules linked in: btrfs dm_flakey crc32c_cryptoapi (...)
   [ 3845.750626] CPU: 5 UID: 0 PID: 30701 Comm: mount Tainted: G        W           6.18.0-rc6-btrfs-next-218+ #1 PREEMPT(full)
   [ 3845.752414] Tainted: [W]=WARN
   [ 3845.752828] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
   [ 3845.754499] RIP: 0010:btree_csum_one_bio+0x25a/0x270 [btrfs]
   [ 3845.755460] Code: 31 f6 48 89 (...)
   [ 3845.758685] RSP: 0018:ffffa8d9c5677678 EFLAGS: 00010246
   [ 3845.759450] RAX: 0000000000000000 RBX: ffff92650e6d4738 RCX: 0000000000000000
   [ 3845.760309] RDX: 0000000000000000 RSI: ffffffff9aab45b9 RDI: ffff9264c4748000
   [ 3845.761239] RBP: ffff9264d4324000 R08: 0000000000000000 R09: ffffa8d9c5677468
   [ 3845.762607] R10: ffff926bdc1fffa8 R11: 0000000000000003 R12: ffffa8d9c5677680
   [ 3845.764099] R13: 0000000000004000 R14: ffff9264dd624000 R15: ffff9264d978aba8
   [ 3845.765094] FS:  00007f751fa5a840(0000) GS:ffff926c42a82000(0000) knlGS:0000000000000000
   [ 3845.766226] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   [ 3845.766970] CR2: 0000558df1815380 CR3: 000000010ed88003 CR4: 0000000000370ef0
   [ 3845.768009] Call Trace:
   [ 3845.768392]  <TASK>
   [ 3845.768714]  btrfs_submit_bbio+0x6ee/0x7f0 [btrfs]
   [ 3845.769640]  ? write_one_eb+0x28e/0x340 [btrfs]
   [ 3845.770588]  btree_write_cache_pages+0x2f0/0x550 [btrfs]
   [ 3845.771286]  ? alloc_extent_state+0x19/0x100 [btrfs]
   [ 3845.771967]  ? merge_next_state+0x1a/0x90 [btrfs]
   [ 3845.772586]  ? set_extent_bit+0x233/0x8b0 [btrfs]
   [ 3845.773198]  ? xas_load+0x9/0xc0
   [ 3845.773589]  ? xas_find+0x14d/0x1a0
   [ 3845.773969]  do_writepages+0xc6/0x160
   [ 3845.774367]  filemap_fdatawrite_wbc+0x48/0x60
   [ 3845.775003]  __filemap_fdatawrite_range+0x5b/0x80
   [ 3845.775902]  btrfs_write_marked_extents+0x61/0x170 [btrfs]
   [ 3845.776707]  btrfs_write_and_wait_transaction+0x4e/0xc0 [btrfs]
   [ 3845.777379]  ? _raw_spin_unlock_irqrestore+0x23/0x40
   [ 3845.777923]  btrfs_commit_transaction+0x5ea/0xd20 [btrfs]
   [ 3845.778551]  ? _raw_spin_unlock+0x15/0x30
   [ 3845.778986]  ? release_extent_buffer+0x34/0x160 [btrfs]
   [ 3845.779659]  btrfs_recover_log_trees+0x7a3/0x7c0 [btrfs]
   [ 3845.780416]  ? __pfx_replay_one_buffer+0x10/0x10 [btrfs]
   [ 3845.781499]  open_ctree+0x10bb/0x15f0 [btrfs]
   [ 3845.782194]  btrfs_get_tree.cold+0xb/0x16c [btrfs]
   [ 3845.782764]  ? fscontext_read+0x15c/0x180
   [ 3845.783202]  ? rw_verify_area+0x50/0x180
   [ 3845.783667]  vfs_get_tree+0x25/0xd0
   [ 3845.784047]  vfs_cmd_create+0x59/0xe0
   [ 3845.784458]  __do_sys_fsconfig+0x4f6/0x6b0
   [ 3845.784914]  do_syscall_64+0x50/0x1220
   [ 3845.785340]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
   [ 3845.785980] RIP: 0033:0x7f751fc7f4aa
   [ 3845.786759] Code: 73 01 c3 48 (...)
   [ 3845.789951] RSP: 002b:00007ffcdba45dc8 EFLAGS: 00000246 ORIG_RAX: 00000000000001af
   [ 3845.791402] RAX: ffffffffffffffda RBX: 000055ccc8291c20 RCX: 00007f751fc7f4aa
   [ 3845.792688] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003
   [ 3845.794308] RBP: 000055ccc8292120 R08: 0000000000000000 R09: 0000000000000000
   [ 3845.795829] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
   [ 3845.797183] R13: 00007f751fe11580 R14: 00007f751fe1326c R15: 00007f751fdf8a23
   [ 3845.798633]  </TASK>
   [ 3845.799067] ---[ end trace 0000000000000000 ]---
   [ 3845.800215] BTRFS: error (device dm-0) in btrfs_commit_transaction:2553: errno=-5 IO failure (Error while writing out transaction)
   [ 3845.801860] BTRFS warning (device dm-0 state E): Skipping commit of aborted transaction.
   [ 3845.802815] BTRFS error (device dm-0 state EA): Transaction aborted (error -5)
   [ 3845.803728] BTRFS: error (device dm-0 state EA) in cleanup_transaction:2036: errno=-5 IO failure
   [ 3845.805374] BTRFS: error (device dm-0 state EA) in btrfs_replay_log:2083: errno=-5 IO failure (Failed to recover log tree)
   [ 3845.807919] BTRFS error (device dm-0 state EA): open_ctree failed: -5

Fix this by never logging a conflicting inode that is a directory and was
moved in the current transaction (its last_unlink_trans equals the current
transaction) and instead fallback to a transaction commit.

A test case for fstests will follow soon.

Reported-by: Vyacheslav Kovalevsky <slva.kovalevskiy.2014@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/7bbc9419-5c56-450a-b5a0-efeae7457113@gmail.com/
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agopowerpc/kexec: Enable SMT before waking offline CPUs
Nysal Jan K.A. [Tue, 28 Oct 2025 10:55:12 +0000 (16:25 +0530)] 
powerpc/kexec: Enable SMT before waking offline CPUs

commit c2296a1e42418556efbeb5636c4fa6aa6106713a upstream.

If SMT is disabled or a partial SMT state is enabled, when a new kernel
image is loaded for kexec, on reboot the following warning is observed:

kexec: Waking offline cpu 228.
WARNING: CPU: 0 PID: 9062 at arch/powerpc/kexec/core_64.c:223 kexec_prepare_cpus+0x1b0/0x1bc
[snip]
 NIP kexec_prepare_cpus+0x1b0/0x1bc
 LR  kexec_prepare_cpus+0x1a0/0x1bc
 Call Trace:
  kexec_prepare_cpus+0x1a0/0x1bc (unreliable)
  default_machine_kexec+0x160/0x19c
  machine_kexec+0x80/0x88
  kernel_kexec+0xd0/0x118
  __do_sys_reboot+0x210/0x2c4
  system_call_exception+0x124/0x320
  system_call_vectored_common+0x15c/0x2ec

This occurs as add_cpu() fails due to cpu_bootable() returning false for
CPUs that fail the cpu_smt_thread_allowed() check or non primary
threads if SMT is disabled.

Fix the issue by enabling SMT and resetting the number of SMT threads to
the number of threads per core, before attempting to wake up all present
CPUs.

Fixes: 38253464bc82 ("cpu/SMT: Create topology_smt_thread_allowed()")
Reported-by: Sachin P Bappalige <sachinpb@linux.ibm.com>
Cc: stable@vger.kernel.org # v6.6+
Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com>
Signed-off-by: Nysal Jan K.A. <nysal@linux.ibm.com>
Tested-by: Samir M <samir@linux.ibm.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20251028105516.26258-1-nysal@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoSUNRPC: svcauth_gss: avoid NULL deref on zero length gss_token in gss_read_proxy_verf
Joshua Rogers [Fri, 7 Nov 2025 15:05:33 +0000 (10:05 -0500)] 
SUNRPC: svcauth_gss: avoid NULL deref on zero length gss_token in gss_read_proxy_verf

commit d4b69a6186b215d2dc1ebcab965ed88e8d41768d upstream.

A zero length gss_token results in pages == 0 and in_token->pages[0]
is NULL. The code unconditionally evaluates
page_address(in_token->pages[0]) for the initial memcpy, which can
dereference NULL even when the copy length is 0. Guard the first
memcpy so it only runs when length > 0.

Fixes: 5866efa8cbfb ("SUNRPC: Fix svcauth_gss_proxy_init()")
Cc: stable@vger.kernel.org
Signed-off-by: Joshua Rogers <linux@joshua.hu>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agosvcrdma: use rc_pageoff for memcpy byte offset
Joshua Rogers [Fri, 7 Nov 2025 15:09:47 +0000 (10:09 -0500)] 
svcrdma: use rc_pageoff for memcpy byte offset

commit a8ee9099f30654917aa68f55d707b5627e1dbf77 upstream.

svc_rdma_copy_inline_range added rc_curpage (page index) to the page
base instead of the byte offset rc_pageoff. Use rc_pageoff so copies
land within the current page.

Found by ZeroPath (https://zeropath.com)

Fixes: 8e122582680c ("svcrdma: Move svc_rdma_read_info::ri_pageno to struct svc_rdma_recv_ctxt")
Cc: stable@vger.kernel.org
Signed-off-by: Joshua Rogers <linux@joshua.hu>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agosvcrdma: return 0 on success from svc_rdma_copy_inline_range
Joshua Rogers [Fri, 7 Nov 2025 15:09:48 +0000 (10:09 -0500)] 
svcrdma: return 0 on success from svc_rdma_copy_inline_range

commit 94972027ab55b200e031059fd6c7a649f8248020 upstream.

The function comment specifies 0 on success and -EINVAL on invalid
parameters. Make the tail return 0 after a successful copy loop.

Fixes: d7cc73972661 ("svcrdma: support multiple Read chunks per RPC")
Cc: stable@vger.kernel.org
Signed-off-by: Joshua Rogers <linux@joshua.hu>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agonfsd: Mark variable __maybe_unused to avoid W=1 build break
Andy Shevchenko [Thu, 13 Nov 2025 08:31:31 +0000 (09:31 +0100)] 
nfsd: Mark variable __maybe_unused to avoid W=1 build break

commit ebae102897e760e9e6bc625f701dd666b2163bd1 upstream.

Clang is not happy about set but (in some cases) unused variable:

fs/nfsd/export.c:1027:17: error: variable 'inode' set but not used [-Werror,-Wunused-but-set-variable]

since it's used as a parameter to dprintk() which might be configured
a no-op. To avoid uglifying code with the specific ifdeffery just mark
the variable __maybe_unused.

The commit [1], which introduced this behaviour, is quite old and hence
the Fixes tag points to the first of the Git era.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=0431923fb7a1
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agosvcrdma: bound check rq_pages index in inline path
Joshua Rogers [Fri, 7 Nov 2025 15:09:49 +0000 (10:09 -0500)] 
svcrdma: bound check rq_pages index in inline path

commit d1bea0ce35b6095544ee82bb54156fc62c067e58 upstream.

svc_rdma_copy_inline_range indexed rqstp->rq_pages[rc_curpage] without
verifying rc_curpage stays within the allocated page array. Add guards
before the first use and after advancing to a new page.

Fixes: d7cc73972661 ("svcrdma: support multiple Read chunks per RPC")
Cc: stable@vger.kernel.org
Signed-off-by: Joshua Rogers <linux@joshua.hu>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoASoC: SOF: ipc4-topology: Convert FLOAT to S32 during blob selection
Peter Ujfalusi [Mon, 15 Dec 2025 12:06:48 +0000 (14:06 +0200)] 
ASoC: SOF: ipc4-topology: Convert FLOAT to S32 during blob selection

commit 816f291fc23f325d31509d0e97873249ad75ae9a upstream.

SSP/DMIC blobs have no support for FLOAT type, they are using S32 on data
bus.

Convert the format from FLOAT_LE to S32_LE to make sure that the correct
format is used within the path.

FLOAT conversion will be done on the host side (or within the path).

Fixes: f7c41911ad74 ("ASoC: SOF: ipc4-topology: Add support for float sample type")
Cc: stable@vger.kernel.org
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Link: https://patch.msgid.link/20251215120648.4827-3-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoNFSD: Clear TIME_DELEG in the suppattr_exclcreat bitmap
Chuck Lever [Mon, 17 Nov 2025 16:00:50 +0000 (11:00 -0500)] 
NFSD: Clear TIME_DELEG in the suppattr_exclcreat bitmap

commit ad3cbbb0c1892c48919727fcb8dec5965da8bacb upstream.

>From RFC 8881:

5.8.1.14. Attribute 75: suppattr_exclcreat

> The bit vector that would set all REQUIRED and RECOMMENDED
> attributes that are supported by the EXCLUSIVE4_1 method of file
> creation via the OPEN operation. The scope of this attribute
> applies to all objects with a matching fsid.

There's nothing in RFC 8881 that states that suppattr_exclcreat is
or is not allowed to contain bits for attributes that are clear in
the reported supported_attrs bitmask. But it doesn't make sense for
an NFS server to indicate that it /doesn't/ implement an attribute,
but then also indicate that clients /are/ allowed to set that
attribute using OPEN(create) with EXCLUSIVE4_1.

The FATTR4_WORD2_TIME_DELEG attributes are also not to be allowed
for OPEN(create) with EXCLUSIVE4_1. It doesn't make sense to set
a delegated timestamp on a new file.

Fixes: 7e13f4f8d27d ("nfsd: handle delegated timestamps in SETATTR")
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoALSA: hda/realtek: Add Asus quirk for TAS amplifiers
Antheas Kapenekakis [Tue, 16 Dec 2025 21:17:14 +0000 (22:17 +0100)] 
ALSA: hda/realtek: Add Asus quirk for TAS amplifiers

commit f7cede182c963720edd1e5fb50ea4f1c7eafa30e upstream.

By default, these devices use the quirk ALC294_FIXUP_ASUS_SPK. Not
using it causes the headphone jack to stop working. Therefore,
introduce a new quirk ALC287_FIXUP_TXNW2781_I2C_ASUS that binds
to the TAS amplifier while using that quirk.

Cc: stable@kernel.org
Fixes: 18a4895370a7 ("ALSA: hda/realtek: Add match for ASUS Xbox Ally projects")
Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
Link: https://patch.msgid.link/20251216211714.1116898-1-lkml@antheas.dev
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoASoC: SOF: ipc4-topology: Prefer 32-bit DMIC blobs for 8-bit formats as well
Peter Ujfalusi [Mon, 15 Dec 2025 12:06:47 +0000 (14:06 +0200)] 
ASoC: SOF: ipc4-topology: Prefer 32-bit DMIC blobs for 8-bit formats as well

commit 26e455064983e00013c0a63ffe0eed9e9ec2fa89 upstream.

With the introduction of 8-bit formats the DMIC blob lookup also needs to
be modified to prefer the 32-bit blob when 8-bit format is used on FE.

At the same time we also need to make sure that in case 8-bit format is
used, but only 16-bit blob is available for DMIC then we will not try to
look for 8-bit blob (which is invalid) as fallback, but for a 16-bit one.

Fixes: c04c2e829649 ("ASoC: SOF: ipc4-topology: Add support for 8-bit formats")
Cc: stable@vger.kernel.org
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Link: https://patch.msgid.link/20251215120648.4827-2-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoNFSD: NFSv4 file creation neglects setting ACL
Chuck Lever [Wed, 19 Nov 2025 00:51:19 +0000 (19:51 -0500)] 
NFSD: NFSv4 file creation neglects setting ACL

commit 913f7cf77bf14c13cfea70e89bcb6d0b22239562 upstream.

An NFSv4 client that sets an ACL with a named principal during file
creation retrieves the ACL afterwards, and finds that it is only a
default ACL (based on the mode bits) and not the ACL that was
requested during file creation. This violates RFC 8881 section
6.4.1.3: "the ACL attribute is set as given".

The issue occurs in nfsd_create_setattr(), which calls
nfsd_attrs_valid() to determine whether to call nfsd_setattr().
However, nfsd_attrs_valid() checks only for iattr changes and
security labels, but not POSIX ACLs. When only an ACL is present,
the function returns false, nfsd_setattr() is skipped, and the
POSIX ACL is never applied to the inode.

Subsequently, when the client retrieves the ACL, the server finds
no POSIX ACL on the inode and returns one generated from the file's
mode bits rather than returning the originally-specified ACL.

Reported-by: Aurélien Couderc <aurelien.couderc2002@gmail.com>
Fixes: c0cbe70742f4 ("NFSD: add posix ACLs to struct nfsd_attrs")
Cc: Roland Mainz <roland.mainz@nrubsig.org>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoNFSD: Clear SECLABEL in the suppattr_exclcreat bitmap
Chuck Lever [Mon, 17 Nov 2025 16:00:49 +0000 (11:00 -0500)] 
NFSD: Clear SECLABEL in the suppattr_exclcreat bitmap

commit 27d17641cacfedd816789b75d342430f6b912bd2 upstream.

>From RFC 8881:

5.8.1.14. Attribute 75: suppattr_exclcreat

> The bit vector that would set all REQUIRED and RECOMMENDED
> attributes that are supported by the EXCLUSIVE4_1 method of file
> creation via the OPEN operation. The scope of this attribute
> applies to all objects with a matching fsid.

There's nothing in RFC 8881 that states that suppattr_exclcreat is
or is not allowed to contain bits for attributes that are clear in
the reported supported_attrs bitmask. But it doesn't make sense for
an NFS server to indicate that it /doesn't/ implement an attribute,
but then also indicate that clients /are/ allowed to set that
attribute using OPEN(create) with EXCLUSIVE4_1.

Ensure that the SECURITY_LABEL and ACL bits are not set in the
suppattr_exclcreat bitmask when they are also not set in the
supported_attrs bitmask.

Fixes: 8c18f2052e75 ("nfsd41: SUPPATTR_EXCLCREAT attribute")
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agonet/handshake: restore destructor on submit failure
caoping [Thu, 4 Dec 2025 09:10:58 +0000 (01:10 -0800)] 
net/handshake: restore destructor on submit failure

commit 6af2a01d65f89e73c1cbb9267f8880d83a88cee4 upstream.

handshake_req_submit() replaces sk->sk_destruct but never restores it when
submission fails before the request is hashed. handshake_sk_destruct() then
returns early and the original destructor never runs, leaking the socket.
Restore sk_destruct on the error path.

Fixes: 3b3009ea8abb ("net/handshake: Create a NETLINK service for handling handshake requests")
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: caoping <caoping@cmss.chinamobile.com>
Link: https://patch.msgid.link/20251204091058.1545151-1-caoping@cmss.chinamobile.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agofsnotify: do not generate ACCESS/MODIFY events on child for special files
Amir Goldstein [Sun, 7 Dec 2025 10:44:55 +0000 (11:44 +0100)] 
fsnotify: do not generate ACCESS/MODIFY events on child for special files

commit 635bc4def026a24e071436f4f356ea08c0eed6ff upstream.

inotify/fanotify do not allow users with no read access to a file to
subscribe to events (e.g. IN_ACCESS/IN_MODIFY), but they do allow the
same user to subscribe for watching events on children when the user
has access to the parent directory (e.g. /dev).

Users with no read access to a file but with read access to its parent
directory can still stat the file and see if it was accessed/modified
via atime/mtime change.

The same is not true for special files (e.g. /dev/null). Users will not
generally observe atime/mtime changes when other users read/write to
special files, only when someone sets atime/mtime via utimensat().

Align fsnotify events with this stat behavior and do not generate
ACCESS/MODIFY events to parent watchers on read/write of special files.
The events are still generated to parent watchers on utimensat(). This
closes some side-channels that could be possibly used for information
exfiltration [1].

[1] https://snee.la/pdf/pubs/file-notification-attacks.pdf

Reported-by: Sudheendra Raghav Neela <sneela@tugraz.at>
CC: stable@vger.kernel.org
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agonet: phy: marvell-88q2xxx: Fix clamped value in mv88q2xxx_hwmon_write
Thorsten Blum [Tue, 2 Dec 2025 17:27:44 +0000 (18:27 +0100)] 
net: phy: marvell-88q2xxx: Fix clamped value in mv88q2xxx_hwmon_write

commit c4cdf7376271bce5714c06d79ec67759b18910eb upstream.

The local variable 'val' was never clamped to -75000 or 180000 because
the return value of clamp_val() was not used. Fix this by assigning the
clamped value back to 'val', and use clamp() instead of clamp_val().

Cc: stable@vger.kernel.org
Fixes: a557a92e6881 ("net: phy: marvell-88q2xxx: add support for temperature sensor")
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Dimitri Fedrau <dima.fedrau@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251202172743.453055-3-thorsten.blum@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agor8169: fix RTL8117 Wake-on-Lan in DASH mode
René Rebe [Tue, 2 Dec 2025 18:41:37 +0000 (19:41 +0100)] 
r8169: fix RTL8117 Wake-on-Lan in DASH mode

commit dd75c723ef566f7f009c047f47e0eee95fe348ab upstream.

Wake-on-Lan does currently not work for r8169 in DASH mode, e.g. the
ASUS Pro WS X570-ACE with RTL8168fp/RTL8117.

Fix by not returning early in rtl_prepare_power_down when dash_enabled.
While this fixes WoL, it still kills the OOB RTL8117 remote management
BMC connection. Fix by not calling rtl8168_driver_stop if WoL is enabled.

Fixes: 065c27c184d6 ("r8169: phy power ops")
Signed-off-by: René Rebe <rene@exactco.de>
Cc: stable@vger.kernel.org
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/20251202.194137.1647877804487085954.rene@exactco.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agolib/crypto: riscv: Add poly1305-core.S to .gitignore
Charles Mirabile [Fri, 12 Dec 2025 18:47:17 +0000 (13:47 -0500)] 
lib/crypto: riscv: Add poly1305-core.S to .gitignore

commit 5a0b1882506858b12cc77f0e2439a5f3c5052761 upstream.

poly1305-core.S is an auto-generated file, so it should be ignored.

Fixes: bef9c7559869 ("lib/crypto: riscv/poly1305: Import OpenSSL/CRYPTOGAMS implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Charles Mirabile <cmirabil@redhat.com>
Link: https://lore.kernel.org/r/20251212184717.133701-1-cmirabil@redhat.com
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoarm64/gcs: Flush the GCS locking state on exec
Mark Brown [Sat, 29 Nov 2025 00:48:45 +0000 (00:48 +0000)] 
arm64/gcs: Flush the GCS locking state on exec

commit 98a97bf41528ef738b06eb07ec2b2eb1cfde6ce6 upstream.

When we exec a new task we forget to flush the set of locked GCS mode bits.
Since we do flush the rest of the state this means that if GCS is locked
the new task will be unable to enable GCS, it will be locked as being
disabled. Add the expected flush.

Fixes: fc84bc5378a8 ("arm64/gcs: Context switch GCS state for EL0")
Cc: <stable@vger.kernel.org> # 6.13.x
Reported-by: Yury Khrustalev <Yury.Khrustalev@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Tested-by: Yury Khrustalev <yury.khrustalev@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoPM: runtime: Do not clear needs_force_resume with enabled runtime PM
Rafael J. Wysocki [Mon, 15 Dec 2025 14:21:34 +0000 (15:21 +0100)] 
PM: runtime: Do not clear needs_force_resume with enabled runtime PM

commit 359afc8eb02a518fbdd0cbd462c8c2827c6cbec2 upstream.

Commit 89d9cec3b1e9 ("PM: runtime: Clear power.needs_force_resume in
pm_runtime_reinit()") added provisional clearing of power.needs_force_resume
to pm_runtime_reinit(), but it is done unconditionally which is a
mistake because pm_runtime_reinit() may race with driver probing
and removal [1].

To address this, notice that power.needs_force_resume should never
be set when runtime PM is enabled and so it only needs to be cleared
when runtime PM is disabled, and update pm_runtime_init() to only
clear that flag when runtime PM is disabled.

Fixes: 89d9cec3b1e9 ("PM: runtime: Clear power.needs_force_resume in pm_runtime_reinit()")
Reported-by: Ed Tsai <ed.tsai@mediatek.com>
Closes: https://lore.kernel.org/linux-pm/20251215122154.3180001-1-ed.tsai@mediatek.com/ [1]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 6.17+ <stable@vger.kernel.org> # 6.17+
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/12807571.O9o76ZdvQC@rafael.j.wysocki
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agotracing: Do not register unsupported perf events
Steven Rostedt [Tue, 16 Dec 2025 23:24:40 +0000 (18:24 -0500)] 
tracing: Do not register unsupported perf events

commit ef7f38df890f5dcd2ae62f8dbde191d72f3bebae upstream.

Synthetic events currently do not have a function to register perf events.
This leads to calling the tracepoint register functions with a NULL
function pointer which triggers:

 ------------[ cut here ]------------
 WARNING: kernel/tracepoint.c:175 at tracepoint_add_func+0x357/0x370, CPU#2: perf/2272
 Modules linked in: kvm_intel kvm irqbypass
 CPU: 2 UID: 0 PID: 2272 Comm: perf Not tainted 6.18.0-ftest-11964-ge022764176fc-dirty #323 PREEMPTLAZY
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
 RIP: 0010:tracepoint_add_func+0x357/0x370
 Code: 28 9c e8 4c 0b f5 ff eb 0f 4c 89 f7 48 c7 c6 80 4d 28 9c e8 ab 89 f4 ff 31 c0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc cc <0f> 0b 49 c7 c6 ea ff ff ff e9 ee fe ff ff 0f 0b e9 f9 fe ff ff 0f
 RSP: 0018:ffffabc0c44d3c40 EFLAGS: 00010246
 RAX: 0000000000000001 RBX: ffff9380aa9e4060 RCX: 0000000000000000
 RDX: 000000000000000a RSI: ffffffff9e1d4a98 RDI: ffff937fcf5fd6c8
 RBP: 0000000000000001 R08: 0000000000000007 R09: ffff937fcf5fc780
 R10: 0000000000000003 R11: ffffffff9c193910 R12: 000000000000000a
 R13: ffffffff9e1e5888 R14: 0000000000000000 R15: ffffabc0c44d3c78
 FS:  00007f6202f5f340(0000) GS:ffff93819f00f000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000055d3162281a8 CR3: 0000000106a56003 CR4: 0000000000172ef0
 Call Trace:
  <TASK>
  tracepoint_probe_register+0x5d/0x90
  synth_event_reg+0x3c/0x60
  perf_trace_event_init+0x204/0x340
  perf_trace_init+0x85/0xd0
  perf_tp_event_init+0x2e/0x50
  perf_try_init_event+0x6f/0x230
  ? perf_event_alloc+0x4bb/0xdc0
  perf_event_alloc+0x65a/0xdc0
  __se_sys_perf_event_open+0x290/0x9f0
  do_syscall_64+0x93/0x7b0
  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
  ? trace_hardirqs_off+0x53/0xc0
  entry_SYSCALL_64_after_hwframe+0x76/0x7e

Instead, have the code return -ENODEV, which doesn't warn and has perf
error out with:

 # perf record -e synthetic:futex_wait
Error:
The sys_perf_event_open() syscall returned with 19 (No such device) for event (synthetic:futex_wait).
"dmesg | grep -i perf" may provide additional information.

Ideally perf should support synthetic events, but for now just fix the
warning. The support can come later.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://patch.msgid.link/20251216182440.147e4453@gandalf.local.home
Fixes: 4b147936fa509 ("tracing: Add support for 'synthetic' events")
Reported-by: Ian Rogers <irogers@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoxfs: validate that zoned RT devices are zone aligned
Christoph Hellwig [Tue, 16 Dec 2025 17:30:08 +0000 (18:30 +0100)] 
xfs: validate that zoned RT devices are zone aligned

commit 982d2616a2906113e433fdc0cfcc122f8d1bb60a upstream.

Garbage collection assumes all zones contain the full amount of blocks.
Mkfs already ensures this happens, but make the kernel check it as well
to avoid getting into trouble due to fuzzers or mkfs bugs.

Fixes: 2167eaabe2fa ("xfs: define the zoned on-disk format")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Cc: stable@vger.kernel.org # v6.15
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoxfs: fix a UAF problem in xattr repair
Darrick J. Wong [Thu, 4 Dec 2025 21:43:50 +0000 (13:43 -0800)] 
xfs: fix a UAF problem in xattr repair

commit 5990fd756943836978ad184aac980e2b36ab7e01 upstream.

The xchk_setup_xattr_buf function can allocate a new value buffer, which
means that any reference to ab->value before the call could become a
dangling pointer.  Fix this by moving an assignment to after the buffer
setup.

Cc: stable@vger.kernel.org # v6.10
Fixes: e47dcf113ae348 ("xfs: repair extended attributes")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoxfs: fix the zoned RT growfs check for zone alignment
Christoph Hellwig [Tue, 16 Dec 2025 17:30:09 +0000 (18:30 +0100)] 
xfs: fix the zoned RT growfs check for zone alignment

commit dc68c0f601691010dd5ae53442f8523f41a53131 upstream.

The grofs code for zoned RT subvolums already tries to check for zone
alignment, but gets it wrong by using the old instead of the new mount
structure.

Fixes: 01b71e64bb87 ("xfs: support growfs on zoned file systems")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Cc: stable@vger.kernel.org # v6.15
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoxfs: fix stupid compiler warning
Darrick J. Wong [Thu, 4 Dec 2025 21:44:15 +0000 (13:44 -0800)] 
xfs: fix stupid compiler warning

commit f06725052098d7b1133ac3846d693c383dc427a2 upstream.

gcc 14.2 warns about:

xfs_attr_item.c: In function ‘xfs_attr_recover_work’:
xfs_attr_item.c:785:9: warning: ‘ip’ may be used uninitialized [-Wmaybe-uninitialized]
  785 |         xfs_trans_ijoin(tp, ip, 0);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~
xfs_attr_item.c:740:42: note: ‘ip’ was declared here
  740 |         struct xfs_inode                *ip;
      |                                          ^~

I think this is bogus since xfs_attri_recover_work either returns a real
pointer having initialized ip or an ERR_PTR having not touched it, but
the tools are smarter than me so let's just null-init the variable
anyway.

Cc: stable@vger.kernel.org # v6.8
Fixes: e70fb328d52772 ("xfs: recreate work items when recovering intent items")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoxfs: fix a memory leak in xfs_buf_item_init()
Haoxiang Li [Wed, 10 Dec 2025 09:06:01 +0000 (17:06 +0800)] 
xfs: fix a memory leak in xfs_buf_item_init()

commit fc40459de82543b565ebc839dca8f7987f16f62e upstream.

xfs_buf_item_get_format() may allocate memory for bip->bli_formats,
free the memory in the error path.

Fixes: c3d5f0c2fb85 ("xfs: complain if anyone tries to create a too-large buffer log item")
Cc: stable@vger.kernel.org
Signed-off-by: Haoxiang Li <lihaoxiang@isrc.iscas.ac.cn>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoKVM: selftests: Add missing "break" in rseq_test's param parsing
Gavin Shan [Mon, 24 Nov 2025 05:04:27 +0000 (15:04 +1000)] 
KVM: selftests: Add missing "break" in rseq_test's param parsing

commit 1b9439c933b500cb24710bbd81fe56e9b0025b6f upstream.

In commit 0297cdc12a87 ("KVM: selftests: Add option to rseq test to
override /dev/cpu_dma_latency"), a 'break' is missed before the option
'l' in the argument parsing loop, which leads to an unexpected core
dump in atoi_paranoid(). It tries to get the latency from non-existent
argument.

  host$ ./rseq_test -u
  Random seed: 0x6b8b4567
  Segmentation fault (core dumped)

Add a 'break' before the option 'l' in the argument parsing loop to avoid
the unexpected core dump.

Fixes: 0297cdc12a87 ("KVM: selftests: Add option to rseq test to override /dev/cpu_dma_latency")
Cc: stable@vger.kernel.org # v6.15+
Signed-off-by: Gavin Shan <gshan@redhat.com>
Link: https://patch.msgid.link/20251124050427.1924591-1-gshan@redhat.com
[sean: describe code change in shortlog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoKVM: x86: Apply runtime updates to current CPUID during KVM_SET_CPUID{,2}
Sean Christopherson [Tue, 2 Dec 2025 01:50:48 +0000 (17:50 -0800)] 
KVM: x86: Apply runtime updates to current CPUID during KVM_SET_CPUID{,2}

commit e2b43fb25243d502ad36b07bab9de09f4b76fff9 upstream.

When handling KVM_SET_CPUID{,2}, do runtime CPUID updates on the vCPU's
current CPUID (and caps) prior to swapping in the incoming CPUID state so
that KVM doesn't lose pending updates if the incoming CPUID is rejected,
and to prevent a false failure on the equality check.

Note, runtime updates are unconditionally performed on the incoming/new
CPUID (and associated caps), i.e. clearing the dirty flag won't negatively
affect the new CPUID.

Fixes: 93da6af3ae56 ("KVM: x86: Defer runtime updates of dynamic CPUID bits until CPUID emulation")
Reported-by: Igor Mammedov <imammedo@redhat.com>
Closes: https://lore.kernel.org/all/20251128123202.68424a95@imammedo
Cc: stable@vger.kernel.org
Acked-by: Igor Mammedov <imammedo@redhat.com>
Tested-by: Igor Mammedov <imammedo@redhat.com>
Link: https://patch.msgid.link/20251202015049.1167490-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoKVM: nSVM: Clear exit_code_hi in VMCB when synthesizing nested VM-Exits
Sean Christopherson [Thu, 13 Nov 2025 22:56:13 +0000 (14:56 -0800)] 
KVM: nSVM: Clear exit_code_hi in VMCB when synthesizing nested VM-Exits

commit da01f64e7470988f8607776aa7afa924208863fb upstream.

Explicitly clear exit_code_hi in the VMCB when synthesizing "normal"
nested VM-Exits, as the full exit code is a 64-bit value (spoiler alert),
and all exit codes for non-failing VMRUN use only bits 31:0.

Cc: Jim Mattson <jmattson@google.com>
Cc: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: stable@vger.kernel.org
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251113225621.1688428-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoKVM: nSVM: Set exit_code_hi to -1 when synthesizing SVM_EXIT_ERR (failed VMRUN)
Sean Christopherson [Thu, 13 Nov 2025 22:56:14 +0000 (14:56 -0800)] 
KVM: nSVM: Set exit_code_hi to -1 when synthesizing SVM_EXIT_ERR (failed VMRUN)

commit f402ecd7a8b6446547076f4bd24bd5d4dcc94481 upstream.

Set exit_code_hi to -1u as a temporary band-aid to fix a long-standing
(effectively since KVM's inception) bug where KVM treats the exit code as
a 32-bit value, when in reality it's a 64-bit value.  Per the APM, offset
0x70 is a single 64-bit value:

  070h 63:0 EXITCODE

And a sane reading of the error values defined in "Table C-1. SVM Intercept
Codes" is that negative values use the full 64 bits:

  –1 VMEXIT_INVALID Invalid guest state in VMCB.
  –2 VMEXIT_BUSYBUSY bit was set in the VMSA
  –3 VMEXIT_IDLE_REQUIREDThe sibling thread is not in an idle state
  -4 VMEXIT_INVALID_PMC Invalid PMC state

And that interpretation is confirmed by testing on Milan and Turin (by
setting bits in CR0[63:32] to generate VMEXIT_INVALID on VMRUN).

Furthermore, Xen has treated exitcode as a 64-bit value since HVM support
was adding in 2006 (see Xen commit d1bd157fbc ("Big merge the HVM
full-virtualisation abstractions.")).

Cc: Jim Mattson <jmattson@google.com>
Cc: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: stable@vger.kernel.org
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251113225621.1688428-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoKVM: nVMX: Immediately refresh APICv controls as needed on nested VM-Exit
Dongli Zhang [Fri, 5 Dec 2025 23:19:05 +0000 (15:19 -0800)] 
KVM: nVMX: Immediately refresh APICv controls as needed on nested VM-Exit

commit 29763138830916f46daaa50e83e7f4f907a3236b upstream.

If an APICv status updated was pended while L2 was active, immediately
refresh vmcs01's controls instead of pending KVM_REQ_APICV_UPDATE as
kvm_vcpu_update_apicv() only calls into vendor code if a change is
necessary.

E.g. if APICv is inhibited, and then activated while L2 is running:

  kvm_vcpu_update_apicv()
  |
  -> __kvm_vcpu_update_apicv()
     |
     -> apic->apicv_active = true
      |
      -> vmx_refresh_apicv_exec_ctrl()
         |
         -> vmx->nested.update_vmcs01_apicv_status = true
          |
          -> return

Then L2 exits to L1:

  __nested_vmx_vmexit()
  |
  -> kvm_make_request(KVM_REQ_APICV_UPDATE)

  vcpu_enter_guest(): KVM_REQ_APICV_UPDATE
  -> kvm_vcpu_update_apicv()
     |
     -> __kvm_vcpu_update_apicv()
        |
        -> return // because if (apic->apicv_active == activate)

Reported-by: Chao Gao <chao.gao@intel.com>
Closes: https://lore.kernel.org/all/aQ2jmnN8wUYVEawF@intel.com
Fixes: 7c69661e225c ("KVM: nVMX: Defer APICv updates while L2 is active until L1 is active")
Cc: stable@vger.kernel.org
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
[sean: write changelog]
Link: https://patch.msgid.link/20251205231913.441872-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoKVM: TDX: Explicitly set user-return MSRs that *may* be clobbered by the TDX-Module
Sean Christopherson [Thu, 30 Oct 2025 19:15:25 +0000 (12:15 -0700)] 
KVM: TDX: Explicitly set user-return MSRs that *may* be clobbered by the TDX-Module

commit c0711f8c610e1634ed54fb04da1e82252730306a upstream.

Set all user-return MSRs to their post-TD-exit value when preparing to run
a TDX vCPU to ensure the value that KVM expects to be loaded after running
the vCPU is indeed the value that's loaded in hardware.  If the TDX-Module
doesn't actually enter the guest, i.e. doesn't do VM-Enter, then it won't
"restore" VMM state, i.e. won't clobber user-return MSRs to their expected
post-run values, in which case simply updating KVM's "cached" value will
effectively corrupt the cache due to hardware still holding the original
value.

In theory, KVM could conditionally update the current user-return value if
and only if tdh_vp_enter() succeeds, but in practice "success" doesn't
guarantee the TDX-Module actually entered the guest, e.g. if the TDX-Module
synthesizes an EPT Violation because it suspects a zero-step attack.

Force-load the expected values instead of trying to decipher whether or
not the TDX-Module restored/clobbered MSRs, as the risk doesn't justify
the benefits.  Effectively avoiding four WRMSRs once per run loop (even if
the vCPU is scheduled out, user-return MSRs only need to be reloaded if
the CPU exits to userspace or runs a non-TDX vCPU) is likely in the noise
when amortized over all entries, given the cost of running a TDX vCPU.
E.g. the cost of the WRMSRs is somewhere between ~300 and ~500 cycles,
whereas the cost of a _single_ roundtrip to/from a TDX guest is thousands
of cycles.

Fixes: e0b4f31a3c65 ("KVM: TDX: restore user ret MSRs")
Cc: stable@vger.kernel.org
Cc: Yan Zhao <yan.y.zhao@intel.com>
Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://patch.msgid.link/20251030191528.3380553-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoKVM: SVM: Mark VMCB_PERM_MAP as dirty on nested VMRUN
Jim Mattson [Mon, 22 Sep 2025 16:29:22 +0000 (09:29 -0700)] 
KVM: SVM: Mark VMCB_PERM_MAP as dirty on nested VMRUN

commit 93c9e107386dbe1243287a5b14ceca894de372b9 upstream.

Mark the VMCB_PERM_MAP bit as dirty in nested_vmcb02_prepare_control()
on every nested VMRUN.

If L1 changes MSR interception (INTERCEPT_MSR_PROT) between two VMRUN
instructions on the same L1 vCPU, the msrpm_base_pa in the associated
vmcb02 will change, and the VMCB_PERM_MAP clean bit should be cleared.

Fixes: 4bb170a5430b ("KVM: nSVM: do not mark all VMCB02 fields dirty on nested vmexit")
Reported-by: Matteo Rizzo <matteorizzo@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20250922162935.621409-2-jmattson@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoKVM: Fix last_boosted_vcpu index assignment bug
Wanpeng Li [Mon, 10 Nov 2025 03:32:27 +0000 (11:32 +0800)] 
KVM: Fix last_boosted_vcpu index assignment bug

commit 32bd348be3fa07b26c5ea6b818a161c142dcc2f2 upstream.

In kvm_vcpu_on_spin(), the loop counter 'i' is incorrectly written to
last_boosted_vcpu instead of the actual vCPU index 'idx'. This causes
last_boosted_vcpu to store the loop iteration count rather than the
vCPU index, leading to incorrect round-robin behavior in subsequent
directed yield operations.

Fix this by using 'idx' instead of 'i' in the assignment.

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20251110033232.12538-7-kernellwp@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoKVM: nSVM: Propagate SVM_EXIT_CR0_SEL_WRITE correctly for LMSW emulation
Yosry Ahmed [Fri, 24 Oct 2025 19:29:17 +0000 (19:29 +0000)] 
KVM: nSVM: Propagate SVM_EXIT_CR0_SEL_WRITE correctly for LMSW emulation

commit 5674a76db0213f9db1e4d08e847ff649b46889c0 upstream.

When emulating L2 instructions, svm_check_intercept() checks whether a
write to CR0 should trigger a synthesized #VMEXIT with
SVM_EXIT_CR0_SEL_WRITE. For MOV-to-CR0, SVM_EXIT_CR0_SEL_WRITE is only
triggered if any bit other than CR0.MP and CR0.TS is updated. However,
according to the APM (24593—Rev.  3.42—March 2024, Table 15-7):

  The LMSW instruction treats the selective CR0-write
  intercept as a non-selective intercept (i.e., it intercepts
  regardless of the value being written).

Skip checking the changed bits for x86_intercept_lmsw and always inject
SVM_EXIT_CR0_SEL_WRITE.

Fixes: cfec82cb7d31 ("KVM: SVM: Add intercept check for emulated cr accesses")
Cc: stable@vger.kernel.org
Reported-by: Matteo Rizzo <matteorizzo@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251024192918.3191141-3-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoKVM: selftests: Forcefully override ARCH from x86_64 to x86
Sean Christopherson [Tue, 7 Oct 2025 22:30:57 +0000 (15:30 -0700)] 
KVM: selftests: Forcefully override ARCH from x86_64 to x86

commit 17e5a9b77716564540d81f0c1e6082d28cf305c9 upstream.

Forcefully override ARCH from x86_64 to x86 to handle the scenario where
the user specifies ARCH=x86_64 on the command line.

Fixes: 9af04539d474 ("KVM: selftests: Override ARCH for x86_64 instead of using ARCH_DIR")
Cc: stable@vger.kernel.org
Reported-by: David Matlack <dmatlack@google.com>
Closes: https://lore.kernel.org/all/20250724213130.3374922-1-dmatlack@google.com
Link: https://lore.kernel.org/r/20251007223057.368082-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoKVM: SVM: Mark VMCB_NPT as dirty on nested VMRUN
Jim Mattson [Mon, 22 Sep 2025 16:29:23 +0000 (09:29 -0700)] 
KVM: SVM: Mark VMCB_NPT as dirty on nested VMRUN

commit 7c8b465a1c91f674655ea9cec5083744ec5f796a upstream.

Mark the VMCB_NPT bit as dirty in nested_vmcb02_prepare_save()
on every nested VMRUN.

If L1 changes the PAT MSR between two VMRUN instructions on the same
L1 vCPU, the g_pat field in the associated vmcb02 will change, and the
VMCB_NPT clean bit should be cleared.

Fixes: 4bb170a5430b ("KVM: nSVM: do not mark all VMCB02 fields dirty on nested vmexit")
Cc: stable@vger.kernel.org
Signed-off-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20250922162935.621409-3-jmattson@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoKVM: nSVM: Avoid incorrect injection of SVM_EXIT_CR0_SEL_WRITE
Yosry Ahmed [Fri, 24 Oct 2025 19:29:18 +0000 (19:29 +0000)] 
KVM: nSVM: Avoid incorrect injection of SVM_EXIT_CR0_SEL_WRITE

commit 3d80f4c93d3d26d0f9a0dd2844961a632eeea634 upstream.

When emulating L2 instructions, svm_check_intercept() checks whether a
write to CR0 should trigger a synthesized #VMEXIT with
SVM_EXIT_CR0_SEL_WRITE. However, it does not check whether L1 enabled
the intercept for SVM_EXIT_WRITE_CR0, which has higher priority
according to the APM (24593—Rev.  3.42—March 2024, Table 15-7):

  When both selective and non-selective CR0-write intercepts are active at
  the same time, the non-selective intercept takes priority. With respect
  to exceptions, the priority of this intercept is the same as the generic
  CR0-write intercept.

Make sure L1 does NOT intercept SVM_EXIT_WRITE_CR0 before checking if
SVM_EXIT_CR0_SEL_WRITE needs to be injected.

Opportunistically tweak the "not CR0" logic to explicitly bail early so
that it's more obvious that only CR0 has a selective intercept, and that
modifying icpt_info.exit_code is functionally necessary so that the call
to nested_svm_exit_handled() checks the correct exit code.

Fixes: cfec82cb7d31 ("KVM: SVM: Add intercept check for emulated cr accesses")
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251024192918.3191141-4-yosry.ahmed@linux.dev
[sean: isolate non-CR0 write logic, tweak comments accordingly]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoKVM: x86: Fix VM hard lockup after prolonged inactivity with periodic HV timer
fuqiang wang [Thu, 13 Nov 2025 20:51:13 +0000 (12:51 -0800)] 
KVM: x86: Fix VM hard lockup after prolonged inactivity with periodic HV timer

commit 18ab3fc8e880791aa9f7c000261320fc812b5465 upstream.

When advancing the target expiration for the guest's APIC timer in periodic
mode, set the expiration to "now" if the target expiration is in the past
(similar to what is done in update_target_expiration()).  Blindly adding
the period to the previous target expiration can result in KVM generating
a practically unbounded number of hrtimer IRQs due to programming an
expired timer over and over.  In extreme scenarios, e.g. if userspace
pauses/suspends a VM for an extended duration, this can even cause hard
lockups in the host.

Currently, the bug only affects Intel CPUs when using the hypervisor timer
(HV timer), a.k.a. the VMX preemption timer.  Unlike the software timer,
a.k.a. hrtimer, which KVM keeps running even on exits to userspace, the
HV timer only runs while the guest is active.  As a result, if the vCPU
does not run for an extended duration, there will be a huge gap between
the target expiration and the current time the vCPU resumes running.
Because the target expiration is incremented by only one period on each
timer expiration, this leads to a series of timer expirations occurring
rapidly after the vCPU/VM resumes.

More critically, when the vCPU first triggers a periodic HV timer
expiration after resuming, advancing the expiration by only one period
will result in a target expiration in the past.  As a result, the delta
may be calculated as a negative value.  When the delta is converted into
an absolute value (tscdeadline is an unsigned u64), the resulting value
can overflow what the HV timer is capable of programming.  I.e. the large
value will exceed the VMX Preemption Timer's maximum bit width of
cpu_preemption_timer_multi + 32, and thus cause KVM to switch from the
HV timer to the software timer (hrtimers).

After switching to the software timer, periodic timer expiration callbacks
may be executed consecutively within a single clock interrupt handler,
because hrtimers honors KVM's request for an expiration in the past and
immediately re-invokes KVM's callback after reprogramming.  And because
the interrupt handler runs with IRQs disabled, restarting KVM's hrtimer
over and over until the target expiration is advanced to "now" can result
in a hard lockup.

E.g. the following hard lockup was triggered in the host when running a
Windows VM (only relevant because it used the APIC timer in periodic mode)
after resuming the VM from a long suspend (in the host).

  NMI watchdog: Watchdog detected hard LOCKUP on cpu 45
  ...
  RIP: 0010:advance_periodic_target_expiration+0x4d/0x80 [kvm]
  ...
  RSP: 0018:ff4f88f5d98d8ef0 EFLAGS: 00000046
  RAX: fff0103f91be678e RBX: fff0103f91be678e RCX: 00843a7d9e127bcc
  RDX: 0000000000000002 RSI: 0052ca4003697505 RDI: ff440d5bfbdbd500
  RBP: ff440d5956f99200 R08: ff2ff2a42deb6a84 R09: 000000000002a6c0
  R10: 0122d794016332b3 R11: 0000000000000000 R12: ff440db1af39cfc0
  R13: ff440db1af39cfc0 R14: ffffffffc0d4a560 R15: ff440db1af39d0f8
  FS:  00007f04a6ffd700(0000) GS:ff440db1af380000(0000) knlGS:000000e38a3b8000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 000000d5651feff8 CR3: 000000684e038002 CR4: 0000000000773ee0
  PKRU: 55555554
  Call Trace:
   <IRQ>
   apic_timer_fn+0x31/0x50 [kvm]
   __hrtimer_run_queues+0x100/0x280
   hrtimer_interrupt+0x100/0x210
   ? ttwu_do_wakeup+0x19/0x160
   smp_apic_timer_interrupt+0x6a/0x130
   apic_timer_interrupt+0xf/0x20
   </IRQ>

Moreover, if the suspend duration of the virtual machine is not long enough
to trigger a hard lockup in this scenario, since commit 98c25ead5eda
("KVM: VMX: Move preemption timer <=> hrtimer dance to common x86"), KVM
will continue using the software timer until the guest reprograms the APIC
timer in some way.  Since the periodic timer does not require frequent APIC
timer register programming, the guest may continue to use the software
timer in perpetuity.

Fixes: d8f2f498d9ed ("x86/kvm: fix LAPIC timer drift when guest uses periodic mode")
Cc: stable@vger.kernel.org
Signed-off-by: fuqiang wang <fuqiang.wng@gmail.com>
[sean: massage comments and changelog]
Link: https://patch.msgid.link/20251113205114.1647493-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoKVM: x86: Explicitly set new periodic hrtimer expiration in apic_timer_fn()
fuqiang wang [Thu, 13 Nov 2025 20:51:12 +0000 (12:51 -0800)] 
KVM: x86: Explicitly set new periodic hrtimer expiration in apic_timer_fn()

commit 9633f180ce994ab293ce4924a9b7aaf4673aa114 upstream.

When restarting an hrtimer to emulate a the guest's APIC timer in periodic
mode, explicitly set the expiration using the target expiration computed
by advance_periodic_target_expiration() instead of adding the period to
the existing timer.  This will allow making adjustments to the expiration,
e.g. to deal with expirations far in the past, without having to implement
the same logic in both advance_periodic_target_expiration() and
apic_timer_fn().

Cc: stable@vger.kernel.org
Signed-off-by: fuqiang wang <fuqiang.wng@gmail.com>
[sean: split to separate patch, write changelog]
Link: https://patch.msgid.link/20251113205114.1647493-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoKVM: x86: WARN if hrtimer callback for periodic APIC timer fires with period=0
Sean Christopherson [Thu, 13 Nov 2025 20:51:11 +0000 (12:51 -0800)] 
KVM: x86: WARN if hrtimer callback for periodic APIC timer fires with period=0

commit 0ea9494be9c931ddbc084ad5e11fda91b554cf47 upstream.

WARN and don't restart the hrtimer if KVM's callback runs with the guest's
APIC timer in periodic mode but with a period of '0', as not advancing the
hrtimer's deadline would put the CPU into an infinite loop of hrtimer
events.  Observing a period of '0' should be impossible, even when the
hrtimer is running on a different CPU than the vCPU, as KVM is supposed to
cancel the hrtimer before changing (or zeroing) the period, e.g. when
switching from periodic to one-shot.

Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251113205114.1647493-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agopowerpc: Add reloc_offset() to font bitmap pointer used for bootx_printf()
Finn Thain [Sun, 9 Nov 2025 23:30:22 +0000 (10:30 +1100)] 
powerpc: Add reloc_offset() to font bitmap pointer used for bootx_printf()

commit b94b73567561642323617155bf4ee24ef0d258fe upstream.

Since Linux v6.7, booting using BootX on an Old World PowerMac produces
an early crash. Stan Johnson writes, "the symptoms are that the screen
goes blank and the backlight stays on, and the system freezes (Linux
doesn't boot)."

Further testing revealed that the failure can be avoided by disabling
CONFIG_BOOTX_TEXT. Bisection revealed that the regression was caused by
a change to the font bitmap pointer that's used when btext_init() begins
painting characters on the display, early in the boot process.

Christophe Leroy explains, "before kernel text is relocated to its final
location ... data is addressed with an offset which is added to the
Global Offset Table (GOT) entries at the start of bootx_init()
by function reloc_got2(). But the pointers that are located inside a
structure are not referenced in the GOT and are therefore not updated by
reloc_got2(). It is therefore needed to apply the offset manually by using
PTRRELOC() macro."

Cc: stable@vger.kernel.org
Link: https://lists.debian.org/debian-powerpc/2025/10/msg00111.html
Link: https://lore.kernel.org/linuxppc-dev/d81ddca8-c5ee-d583-d579-02b19ed95301@yahoo.com/
Reported-by: Cedar Maxwell <cedarmaxwell@mac.com>
Closes: https://lists.debian.org/debian-powerpc/2025/09/msg00031.html
Bisected-by: Stan Johnson <userm57@yahoo.com>
Tested-by: Stan Johnson <userm57@yahoo.com>
Fixes: 0ebc7feae79a ("powerpc: Use shared font data")
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Finn Thain <fthain@linux-m68k.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/22b3b247425a052b079ab84da926706b3702c2c7.1762731022.git.fthain@linux-m68k.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agolibceph: make decode_pool() more resilient against corrupted osdmaps
Ilya Dryomov [Tue, 2 Dec 2025 09:32:31 +0000 (10:32 +0100)] 
libceph: make decode_pool() more resilient against corrupted osdmaps

commit 8c738512714e8c0aa18f8a10c072d5b01c83db39 upstream.

If the osdmap is (maliciously) corrupted such that the encoded length
of ceph_pg_pool envelope is less than what is expected for a particular
encoding version, out-of-bounds reads may ensue because the only bounds
check that is there is based on that length value.

This patch adds explicit bounds checks for each field that is decoded
or skipped.

Cc: stable@vger.kernel.org
Reported-by: ziming zhang <ezrakiez@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Tested-by: ziming zhang <ezrakiez@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agolib/crypto: riscv/chacha: Avoid s0/fp register
Vivian Wang [Tue, 2 Dec 2025 05:25:07 +0000 (13:25 +0800)] 
lib/crypto: riscv/chacha: Avoid s0/fp register

commit 43169328c7b4623b54b7713ec68479cebda5465f upstream.

In chacha_zvkb, avoid using the s0 register, which is the frame pointer,
by reallocating KEY0 to t5. This makes stack traces available if e.g. a
crash happens in chacha_zvkb.

No frame pointer maintenance is otherwise required since this is a leaf
function.

Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
Fixes: bb54668837a0 ("crypto: riscv - add vector crypto accelerated ChaCha20")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20251202-riscv-chacha_zvkb-fp-v2-1-7bd00098c9dc@iscas.ac.cn
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agodm-pcache: advance slot index before writing slot
Dongsheng Yang [Fri, 5 Dec 2025 05:46:18 +0000 (05:46 +0000)] 
dm-pcache: advance slot index before writing slot

commit ebbb90344a7da2421e4b54668b94e81828b8b308 upstream.

In dm-pcache, in order to ensure crash-consistency, a dual-copy scheme
is used to alternately update metadata, and there is a slot index that
records the current slot. However, in the write path the current
implementation writes directly to the current slot indexed by slot
index, and then advances the slot — which ends up overwriting the
existing slot, violating the crash-consistency guarantee.

This patch fixes that behavior, preventing metadata from being
overwritten incorrectly.

In addition, this patch add a missing pmem_wmb() after memcpy_flushcache().

Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Zheng Gu <cengku@gmail.com>
Cc: stable@vger.kernel.org # 6.18
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoparisc: Do not reprogram affinitiy on ASP chip
Helge Deller [Tue, 25 Nov 2025 14:23:02 +0000 (15:23 +0100)] 
parisc: Do not reprogram affinitiy on ASP chip

commit dca7da244349eef4d78527cafc0bf80816b261f5 upstream.

The ASP chip is a very old variant of the GSP chip and is used e.g. in
HP 730 workstations. When trying to reprogram the affinity it will crash
with a HPMC as the relevant registers don't seem to be at the usual
location.  Let's avoid the crash by checking the sversion. Also note,
that reprogramming isn't necessary either, as the HP730 is a just a
single-CPU machine.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoscs: fix a wrong parameter in __scs_magic
Zhichi Lin [Sat, 11 Oct 2025 08:22:22 +0000 (16:22 +0800)] 
scs: fix a wrong parameter in __scs_magic

commit 08bd4c46d5e63b78e77f2605283874bbe868ab19 upstream.

__scs_magic() needs a 'void *' variable, but a 'struct task_struct *' is
given.  'task_scs(tsk)' is the starting address of the task's shadow call
stack, and '__scs_magic(task_scs(tsk))' is the end address of the task's
shadow call stack.  Here should be '__scs_magic(task_scs(tsk))'.

The user-visible effect of this bug is that when CONFIG_DEBUG_STACK_USAGE
is enabled, the shadow call stack usage checking function
(scs_check_usage) would scan an incorrect memory range.  This could lead
to:

1. **Inaccurate stack usage reporting**: The function would calculate
   wrong usage statistics for the shadow call stack, potentially showing
   incorrect value in kmsg.

2. **Potential kernel crash**: If the value of __scs_magic(tsk)is
   greater than that of __scs_magic(task_scs(tsk)), the for loop may
   access unmapped memory, potentially causing a kernel panic.  However,
   this scenario is unlikely because task_struct is allocated via the slab
   allocator (which typically returns lower addresses), while the shadow
   call stack returned by task_scs(tsk) is allocated via vmalloc(which
   typically returns higher addresses).

However, since this is purely a debugging feature
(CONFIG_DEBUG_STACK_USAGE), normal production systems should be not
unaffected.  The bug only impacts developers and testers who are actively
debugging stack usage with this configuration enabled.

Link: https://lkml.kernel.org/r/20251011082222.12965-1-zhichi.lin@vivo.com
Fixes: 5bbaf9d1fcb9 ("scs: Add support for stack usage debugging")
Signed-off-by: Jiyuan Xie <xiejiyuan@vivo.com>
Signed-off-by: Zhichi Lin <zhichi.lin@vivo.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Marco Elver <elver@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yee Lee <yee.lee@mediatek.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agomedia: iris: Add sanity check for stop streaming
Wangao Wang [Mon, 27 Oct 2025 09:35:59 +0000 (17:35 +0800)] 
media: iris: Add sanity check for stop streaming

commit ad699fa78b59241c9d71a8cafb51525f3dab04d4 upstream.

Add sanity check in iris_vb2_stop_streaming. If inst->state is
already IRIS_INST_ERROR, we should skip the stream_off operation
because it would still send packets to the firmware.

In iris_kill_session, inst->state is set to IRIS_INST_ERROR and
session_close is executed, which will kfree(inst_hfi_gen2->packet).
If stop_streaming is called afterward, it will cause a crash.

Fixes: 11712ce70f8e5 ("media: iris: implement vb2 streaming ops")
Cc: stable@vger.kernel.org
Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Reviewed-by: Dikshita Agarwal <dikshita.agarwal@oss.qualcomm.com>
Signed-off-by: Wangao Wang <wangao.wang@oss.qualcomm.com>
Reviewed-by: Vikash Garodia <vikash.garodia@oss.qualcomm.com>
[bod: remove qcom from patch title]
Signed-off-by: Bryan O'Donoghue <bod@kernel.org>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoplatform/chrome: cros_ec_ishtp: Fix UAF after unbinding driver
Tzung-Bi Shih [Fri, 31 Oct 2025 03:39:00 +0000 (03:39 +0000)] 
platform/chrome: cros_ec_ishtp: Fix UAF after unbinding driver

commit 944edca81e7aea15f83cf9a13a6ab67f711e8abd upstream.

After unbinding the driver, another kthread `cros_ec_console_log_work`
is still accessing the device, resulting an UAF and crash.

The driver doesn't unregister the EC device in .remove() which should
shutdown sub-devices synchronously.  Fix it.

Fixes: 26a14267aff2 ("platform/chrome: Add ChromeOS EC ISHTP driver")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20251031033900.3577394-1-tzungbi@kernel.org
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoKVM: x86: Don't clear async #PF queue when CR0.PG is disabled (e.g. on #SMI)
Maxim Levitsky [Wed, 15 Oct 2025 03:32:58 +0000 (23:32 -0400)] 
KVM: x86: Don't clear async #PF queue when CR0.PG is disabled (e.g. on #SMI)

commit ab4e41eb9fabd4607304fa7cfe8ec9c0bd8e1552 upstream.

Fix an interaction between SMM and PV asynchronous #PFs where an #SMI can
cause KVM to drop an async #PF ready event, and thus result in guest tasks
becoming permanently stuck due to the task that encountered the #PF never
being resumed.  Specifically, don't clear the completion queue when paging
is disabled, and re-check for completed async #PFs if/when paging is
enabled.

Prior to commit 2635b5c4a0e4 ("KVM: x86: interrupt based APF 'page ready'
event delivery"), flushing the APF queue without notifying the guest of
completed APF requests when paging is disabled was "necessary", in that
delivering a #PF to the guest when paging is disabled would likely confuse
and/or crash the guest.  And presumably the original async #PF development
assumed that a guest would only disable paging when there was no intent to
ever re-enable paging.

That assumption fails in several scenarios, most visibly on an emulated
SMI, as entering SMM always disables CR0.PG (i.e. initially runs with
paging disabled).  When the SMM handler eventually executes RSM, the
interrupted paging-enabled is restored, and the async #PF event is lost.

Similarly, invoking firmware, e.g. via EFI runtime calls, might require a
transition through paging modes and thus also disable paging with valid
entries in the competion queue.

To avoid dropping completion events, drop the "clear" entirely, and handle
paging-enable transitions in the same way KVM already handles APIC
enable/disable events: if a vCPU's APIC is disabled, APF completion events
are not kept pending and not injected while APIC is disabled.  Once a
vCPU's APIC is re-enabled, KVM raises KVM_REQ_APF_READY so that the vCPU
recognizes any pending pending #APF ready events.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251015033258.50974-4-mlevitsk@redhat.com
[sean: rework changelog to call out #PF injection, drop "real mode"
       references, expand the code comment]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoocfs2: fix kernel BUG in ocfs2_find_victim_chain
Prithvi Tambewagh [Mon, 1 Dec 2025 13:07:11 +0000 (18:37 +0530)] 
ocfs2: fix kernel BUG in ocfs2_find_victim_chain

commit 039bef30e320827bac8990c9f29d2a68cd8adb5f upstream.

syzbot reported a kernel BUG in ocfs2_find_victim_chain() because the
`cl_next_free_rec` field of the allocation chain list (next free slot in
the chain list) is 0, triggring the BUG_ON(!cl->cl_next_free_rec)
condition in ocfs2_find_victim_chain() and panicking the kernel.

To fix this, an if condition is introduced in ocfs2_claim_suballoc_bits(),
just before calling ocfs2_find_victim_chain(), the code block in it being
executed when either of the following conditions is true:

1. `cl_next_free_rec` is equal to 0, indicating that there are no free
chains in the allocation chain list
2. `cl_next_free_rec` is greater than `cl_count` (the total number of
chains in the allocation chain list)

Either of them being true is indicative of the fact that there are no
chains left for usage.

This is addressed using ocfs2_error(), which prints
the error log for debugging purposes, rather than panicking the kernel.

Link: https://lkml.kernel.org/r/20251201130711.143900-1-activprithvi@gmail.com
Signed-off-by: Prithvi Tambewagh <activprithvi@gmail.com>
Reported-by: syzbot+96d38c6e1655c1420a72@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=96d38c6e1655c1420a72
Tested-by: syzbot+96d38c6e1655c1420a72@syzkaller.appspotmail.com
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Heming Zhao <heming.zhao@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agomedia: vidtv: initialize local pointers upon transfer of memory ownership
Jeongjun Park [Fri, 5 Sep 2025 05:18:16 +0000 (14:18 +0900)] 
media: vidtv: initialize local pointers upon transfer of memory ownership

commit 98aabfe2d79f74613abc2b0b1cef08f97eaf5322 upstream.

vidtv_channel_si_init() creates a temporary list (program, service, event)
and ownership of the memory itself is transferred to the PAT/SDT/EIT
tables through vidtv_psi_pat_program_assign(),
vidtv_psi_sdt_service_assign(), vidtv_psi_eit_event_assign().

The problem here is that the local pointer where the memory ownership
transfer was completed is not initialized to NULL. This causes the
vidtv_psi_pmt_create_sec_for_each_pat_entry() function to fail, and
in the flow that jumps to free_eit, the memory that was freed by
vidtv_psi_*_table_destroy() can be accessed again by
vidtv_psi_*_event_destroy() due to the uninitialized local pointer, so it
is freed once again.

Therefore, to prevent use-after-free and double-free vulnerability,
local pointers must be initialized to NULL when transferring memory
ownership.

Cc: <stable@vger.kernel.org>
Reported-by: syzbot+1d9c0edea5907af239e0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=1d9c0edea5907af239e0
Fixes: 3be8037960bc ("media: vidtv: add error checks")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Reviewed-by: Daniel Almeida <daniel.almeida@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agomm/slub: reset KASAN tag in defer_free() before accessing freed memory
Deepanshu Kartikey [Wed, 10 Dec 2025 02:20:24 +0000 (07:50 +0530)] 
mm/slub: reset KASAN tag in defer_free() before accessing freed memory

commit 53ca00a19d345197a37a1bf552e8d1e7b091666c upstream.

When CONFIG_SLUB_TINY is enabled, kfree_nolock() calls kasan_slab_free()
before defer_free(). On ARM64 with MTE (Memory Tagging Extension),
kasan_slab_free() poisons the memory and changes the tag from the
original (e.g., 0xf3) to a poison tag (0xfe).

When defer_free() then tries to write to the freed object to build the
deferred free list via llist_add(), the pointer still has the old tag,
causing a tag mismatch and triggering a KASAN use-after-free report:

  BUG: KASAN: slab-use-after-free in defer_free+0x3c/0xbc mm/slub.c:6537
  Write at addr f3f000000854f020 by task kworker/u8:6/983
  Pointer tag: [f3], memory tag: [fe]

Fix this by calling kasan_reset_tag() before accessing the freed memory.
This is safe because defer_free() is part of the allocator itself and is
expected to manipulate freed memory for bookkeeping purposes.

Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
Cc: stable@vger.kernel.org
Reported-by: syzbot+7a25305a76d872abcfa1@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=7a25305a76d872abcfa1
Tested-by: syzbot+7a25305a76d872abcfa1@syzkaller.appspotmail.com
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Link: https://patch.msgid.link/20251210022024.3255826-1-kartikey406@gmail.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoKVM: Disallow toggling KVM_MEM_GUEST_MEMFD on an existing memslot
Sean Christopherson [Tue, 2 Dec 2025 02:03:33 +0000 (18:03 -0800)] 
KVM: Disallow toggling KVM_MEM_GUEST_MEMFD on an existing memslot

commit 9935df5333aa503a18de5071f53762b65c783c4c upstream.

Reject attempts to disable KVM_MEM_GUEST_MEMFD on a memslot that was
initially created with a guest_memfd binding, as KVM doesn't support
toggling KVM_MEM_GUEST_MEMFD on existing memslots.  KVM prevents enabling
KVM_MEM_GUEST_MEMFD, but doesn't prevent clearing the flag.

Failure to reject the new memslot results in a use-after-free due to KVM
not unbinding from the guest_memfd instance.  Unbinding on a FLAGS_ONLY
change is easy enough, and can/will be done as a hardening measure (in
anticipation of KVM supporting dirty logging on guest_memfd at some point),
but fixing the use-after-free would only address the immediate symptom.

  ==================================================================
  BUG: KASAN: slab-use-after-free in kvm_gmem_release+0x362/0x400 [kvm]
  Write of size 8 at addr ffff8881111ae908 by task repro/745

  CPU: 7 UID: 1000 PID: 745 Comm: repro Not tainted 6.18.0-rc6-115d5de2eef3-next-kasan #3 NONE
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  Call Trace:
   <TASK>
   dump_stack_lvl+0x51/0x60
   print_report+0xcb/0x5c0
   kasan_report+0xb4/0xe0
   kvm_gmem_release+0x362/0x400 [kvm]
   __fput+0x2fa/0x9d0
   task_work_run+0x12c/0x200
   do_exit+0x6ae/0x2100
   do_group_exit+0xa8/0x230
   __x64_sys_exit_group+0x3a/0x50
   x64_sys_call+0x737/0x740
   do_syscall_64+0x5b/0x900
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x7f581f2eac31
   </TASK>

  Allocated by task 745 on cpu 6 at 9.746971s:
   kasan_save_stack+0x20/0x40
   kasan_save_track+0x13/0x50
   __kasan_kmalloc+0x77/0x90
   kvm_set_memory_region.part.0+0x652/0x1110 [kvm]
   kvm_vm_ioctl+0x14b0/0x3290 [kvm]
   __x64_sys_ioctl+0x129/0x1a0
   do_syscall_64+0x5b/0x900
   entry_SYSCALL_64_after_hwframe+0x4b/0x53

  Freed by task 745 on cpu 6 at 9.747467s:
   kasan_save_stack+0x20/0x40
   kasan_save_track+0x13/0x50
   __kasan_save_free_info+0x37/0x50
   __kasan_slab_free+0x3b/0x60
   kfree+0xf5/0x440
   kvm_set_memslot+0x3c2/0x1160 [kvm]
   kvm_set_memory_region.part.0+0x86a/0x1110 [kvm]
   kvm_vm_ioctl+0x14b0/0x3290 [kvm]
   __x64_sys_ioctl+0x129/0x1a0
   do_syscall_64+0x5b/0x900
   entry_SYSCALL_64_after_hwframe+0x4b/0x53

Reported-by: Alexander Potapenko <glider@google.com>
Fixes: a7800aa80ea4 ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory")
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251202020334.1171351-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agopinctrl: renesas: rzg2l: Fix ISEL restore on resume
Claudiu Beznea [Fri, 12 Sep 2025 09:53:08 +0000 (12:53 +0300)] 
pinctrl: renesas: rzg2l: Fix ISEL restore on resume

commit 44bf66122c12ef6d3382a9b84b9be1802e5f0e95 upstream.

Commit 1d2da79708cb ("pinctrl: renesas: rzg2l: Avoid configuring ISEL in
gpio_irq_{en,dis}able*()") dropped the configuration of ISEL from
struct irq_chip::{irq_enable, irq_disable} APIs and moved it to
struct gpio_chip::irq::{child_to_parent_hwirq,
child_irq_domain_ops::free} APIs to fix spurious IRQs.

After commit 1d2da79708cb ("pinctrl: renesas: rzg2l: Avoid configuring ISEL
in gpio_irq_{en,dis}able*()"), ISEL was no longer configured properly on
resume. This is because the pinctrl resume code used
struct irq_chip::irq_enable  (called from rzg2l_gpio_irq_restore()) to
reconfigure the wakeup interrupts. Some drivers (e.g. Ethernet) may also
reconfigure non-wakeup interrupts on resume through their own code,
eventually calling struct irq_chip::irq_enable.

Fix this by adding ISEL configuration back into the
struct irq_chip::irq_enable API and on resume path for wakeup interrupts.

As struct irq_chip::irq_enable needs now to lock to update the ISEL,
convert the struct rzg2l_pinctrl::lock to a raw spinlock and replace the
locking API calls with the raw variants. Otherwise the lockdep reports
invalid wait context when probing the adv7511 module on RZ/G2L:

 [ BUG: Invalid wait context ]
 6.17.0-rc5-next-20250911-00001-gfcfac22533c9 #18 Not tainted
 -----------------------------
 (udev-worker)/165 is trying to lock:
 ffff00000e3664a8 (&pctrl->lock){....}-{3:3}, at: rzg2l_gpio_irq_enable+0x38/0x78
 other info that might help us debug this:
 context-{5:5}
 3 locks held by (udev-worker)/165:
 #0: ffff00000e890108 (&dev->mutex){....}-{4:4}, at: __driver_attach+0x90/0x1ac
 #1: ffff000011c07240 (request_class){+.+.}-{4:4}, at: __setup_irq+0xb4/0x6dc
 #2: ffff000011c070c8 (lock_class){....}-{2:2}, at: __setup_irq+0xdc/0x6dc
 stack backtrace:
 CPU: 1 UID: 0 PID: 165 Comm: (udev-worker) Not tainted 6.17.0-rc5-next-20250911-00001-gfcfac22533c9 #18 PREEMPT
 Hardware name: Renesas SMARC EVK based on r9a07g044l2 (DT)
 Call trace:
 show_stack+0x18/0x24 (C)
 dump_stack_lvl+0x90/0xd0
 dump_stack+0x18/0x24
 __lock_acquire+0xa14/0x20b4
 lock_acquire+0x1c8/0x354
 _raw_spin_lock_irqsave+0x60/0x88
 rzg2l_gpio_irq_enable+0x38/0x78
 irq_enable+0x40/0x8c
 __irq_startup+0x78/0xa4
 irq_startup+0x108/0x16c
 __setup_irq+0x3c0/0x6dc
 request_threaded_irq+0xec/0x1ac
 devm_request_threaded_irq+0x80/0x134
 adv7511_probe+0x928/0x9a4 [adv7511]
 i2c_device_probe+0x22c/0x3dc
 really_probe+0xbc/0x2a0
 __driver_probe_device+0x78/0x12c
 driver_probe_device+0x40/0x164
 __driver_attach+0x9c/0x1ac
 bus_for_each_dev+0x74/0xd0
 driver_attach+0x24/0x30
 bus_add_driver+0xe4/0x208
 driver_register+0x60/0x128
 i2c_register_driver+0x48/0xd0
 adv7511_init+0x5c/0x1000 [adv7511]
 do_one_initcall+0x64/0x30c
 do_init_module+0x58/0x23c
 load_module+0x1bcc/0x1d40
 init_module_from_file+0x88/0xc4
 idempotent_init_module+0x188/0x27c
 __arm64_sys_finit_module+0x68/0xac
 invoke_syscall+0x48/0x110
 el0_svc_common.constprop.0+0xc0/0xe0
 do_el0_svc+0x1c/0x28
 el0_svc+0x4c/0x160
 el0t_64_sync_handler+0xa0/0xe4
 el0t_64_sync+0x198/0x19c

Having ISEL configuration back into the struct irq_chip::irq_enable API
should be safe with respect to spurious IRQs, as in the probe case IRQs
are enabled anyway in struct gpio_chip::irq::child_to_parent_hwirq. No
spurious IRQs were detected on suspend/resume, boot, ethernet link
insert/remove tests (executed on RZ/G3S). Boot, ethernet link
insert/remove tests were also executed successfully on RZ/G2L.

Fixes: 1d2da79708cb ("pinctrl: renesas: rzg2l: Avoid configuring ISEL in gpio_irq_{en,dis}able*(")
Cc: stable@vger.kernel.org
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250912095308.3603704-1-claudiu.beznea.uj@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agotools/testing/nvdimm: Use per-DIMM device handle
Alison Schofield [Fri, 31 Oct 2025 23:42:20 +0000 (16:42 -0700)] 
tools/testing/nvdimm: Use per-DIMM device handle

commit f59b701b4674f7955170b54c4167c5590f4714eb upstream.

KASAN reports a global-out-of-bounds access when running these nfit
tests: clear.sh, pmem-errors.sh, pfn-meta-errors.sh, btt-errors.sh,
daxdev-errors.sh, and inject-error.sh.

[] BUG: KASAN: global-out-of-bounds in nfit_test_ctl+0x769f/0x7840 [nfit_test]
[] Read of size 4 at addr ffffffffc03ea01c by task ndctl/1215
[] The buggy address belongs to the variable:
[] handle+0x1c/0x1df4 [nfit_test]

nfit_test_search_spa() uses handle[nvdimm->id] to retrieve a device
handle and triggers a KASAN error when it reads past the end of the
handle array. It should not be indexing the handle array at all.

The correct device handle is stored in per-DIMM test data. Each DIMM
has a struct nfit_mem that embeds a struct acpi_nfit_memdev that
describes the NFIT device handle. Use that device handle here.

Fixes: 10246dc84dfc ("acpi nfit: nfit_test supports translate SPA")
Cc: stable@vger.kernel.org
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>> ---
Link: https://patch.msgid.link/20251031234227.1303113-1-alison.schofield@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agof2fs: fix return value of f2fs_recover_fsync_data()
Chao Yu [Wed, 5 Nov 2025 06:50:22 +0000 (14:50 +0800)] 
f2fs: fix return value of f2fs_recover_fsync_data()

commit 01fba45deaddcce0d0b01c411435d1acf6feab7b upstream.

With below scripts, it will trigger panic in f2fs:

mkfs.f2fs -f /dev/vdd
mount /dev/vdd /mnt/f2fs
touch /mnt/f2fs/foo
sync
echo 111 >> /mnt/f2fs/foo
f2fs_io fsync /mnt/f2fs/foo
f2fs_io shutdown 2 /mnt/f2fs
umount /mnt/f2fs
mount -o ro,norecovery /dev/vdd /mnt/f2fs
or
mount -o ro,disable_roll_forward /dev/vdd /mnt/f2fs

F2FS-fs (vdd): f2fs_recover_fsync_data: recovery fsync data, check_only: 0
F2FS-fs (vdd): Mounted with checkpoint version = 7f5c361f
F2FS-fs (vdd): Stopped filesystem due to reason: 0
F2FS-fs (vdd): f2fs_recover_fsync_data: recovery fsync data, check_only: 1
Filesystem f2fs get_tree() didn't set fc->root, returned 1
------------[ cut here ]------------
kernel BUG at fs/super.c:1761!
Oops: invalid opcode: 0000 [#1] SMP PTI
CPU: 3 UID: 0 PID: 722 Comm: mount Not tainted 6.18.0-rc2+ #721 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:vfs_get_tree.cold+0x18/0x1a
Call Trace:
 <TASK>
 fc_mount+0x13/0xa0
 path_mount+0x34e/0xc50
 __x64_sys_mount+0x121/0x150
 do_syscall_64+0x84/0x800
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fa6cc126cfe

The root cause is we missed to handle error number returned from
f2fs_recover_fsync_data() when mounting image w/ ro,norecovery or
ro,disable_roll_forward mount option, result in returning a positive
error number to vfs_get_tree(), fix it.

Cc: stable@kernel.org
Fixes: 6781eabba1bd ("f2fs: give -EINVAL for norecovery and rw mount")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agof2fs: fix to not account invalid blocks in get_left_section_blocks()
Chao Yu [Fri, 28 Nov 2025 09:25:07 +0000 (17:25 +0800)] 
f2fs: fix to not account invalid blocks in get_left_section_blocks()

commit 37345eae9deaa2e4f372eeb98f6594cd0ee0916e upstream.

w/ LFS mode, in get_left_section_blocks(), we should not account the
blocks which were used before and now are invalided, otherwise those
blocks will be counted as freed one in has_curseg_enough_space(), result
in missing to trigger GC in time.

Cc: stable@kernel.org
Fixes: 249ad438e1d9 ("f2fs: add a method for calculating the remaining blocks in the current segment in LFS mode.")
Fixes: bf34c93d2645 ("f2fs: check curseg space before foreground GC")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agof2fs: fix to detect recoverable inode during dryrun of find_fsync_dnodes()
Chao Yu [Wed, 5 Nov 2025 06:50:23 +0000 (14:50 +0800)] 
f2fs: fix to detect recoverable inode during dryrun of find_fsync_dnodes()

commit 68d05693f8c031257a0822464366e1c2a239a512 upstream.

mkfs.f2fs -f /dev/vdd
mount /dev/vdd /mnt/f2fs
touch /mnt/f2fs/foo
sync # avoid CP_UMOUNT_FLAG in last f2fs_checkpoint.ckpt_flags
touch /mnt/f2fs/bar
f2fs_io fsync /mnt/f2fs/bar
f2fs_io shutdown 2 /mnt/f2fs
umount /mnt/f2fs
blockdev --setro /dev/vdd
mount /dev/vdd /mnt/f2fs
mount: /mnt/f2fs: WARNING: source write-protected, mounted read-only.

For the case if we create and fsync a new inode before sudden power-cut,
without norecovery or disable_roll_forward mount option, the following
mount will succeed w/o recovering last fsynced inode.

The problem here is that we only check inode_list list after
find_fsync_dnodes() in f2fs_recover_fsync_data() to find out whether
there is recoverable data in the iamge, but there is a missed case, if
last fsynced inode is not existing in last checkpoint, then, we will
fail to get its inode due to nat of inode node is not existing in last
checkpoint, so the inode won't be linked in inode_list.

Let's detect such case in dyrun mode to fix this issue.

After this change, mount will fail as expected below:
mount: /mnt/f2fs: cannot mount /dev/vdd read-only.
       dmesg(1) may have more information after failed mount system call.
demsg:
F2FS-fs (vdd): Need to recover fsync data, but write access unavailable, please try mount w/ disable_roll_forward or norecovery

Cc: stable@kernel.org
Fixes: 6781eabba1bd ("f2fs: give -EINVAL for norecovery and rw mount")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agof2fs: fix uninitialized one_time_gc in victim_sel_policy
Xiaole He [Wed, 29 Oct 2025 05:18:07 +0000 (13:18 +0800)] 
f2fs: fix uninitialized one_time_gc in victim_sel_policy

commit 392711ef18bff524a873b9c239a73148c5432262 upstream.

The one_time_gc field in struct victim_sel_policy is conditionally
initialized but unconditionally read, leading to undefined behavior
that triggers UBSAN warnings.

In f2fs_get_victim() at fs/f2fs/gc.c:774, the victim_sel_policy
structure is declared without initialization:

    struct victim_sel_policy p;

The field p.one_time_gc is only assigned when the 'one_time' parameter
is true (line 789):

    if (one_time) {
        p.one_time_gc = one_time;
        ...
    }

However, this field is unconditionally read in subsequent get_gc_cost()
at line 395:

    if (p->one_time_gc && (valid_thresh_ratio < 100) && ...)

When one_time is false, p.one_time_gc contains uninitialized stack
memory. Hence p.one_time_gc is an invalid bool value.

UBSAN detects this invalid bool value:

    UBSAN: invalid-load in fs/f2fs/gc.c:395:7
    load of value 77 is not a valid value for type '_Bool'
    CPU: 3 UID: 0 PID: 1297 Comm: f2fs_gc-252:16 Not tainted 6.18.0-rc3
    #5 PREEMPT(voluntary)
    Hardware name: OpenStack Foundation OpenStack Nova,
    BIOS 1.13.0-1ubuntu1.1 04/01/2014
    Call Trace:
     <TASK>
     dump_stack_lvl+0x70/0x90
     dump_stack+0x14/0x20
     __ubsan_handle_load_invalid_value+0xb3/0xf0
     ? dl_server_update+0x2e/0x40
     ? update_curr+0x147/0x170
     f2fs_get_victim.cold+0x66/0x134 [f2fs]
     ? sched_balance_newidle+0x2ca/0x470
     ? finish_task_switch.isra.0+0x8d/0x2a0
     f2fs_gc+0x2ba/0x8e0 [f2fs]
     ? _raw_spin_unlock_irqrestore+0x12/0x40
     ? __timer_delete_sync+0x80/0xe0
     ? timer_delete_sync+0x14/0x20
     ? schedule_timeout+0x82/0x100
     gc_thread_func+0x38b/0x860 [f2fs]
     ? gc_thread_func+0x38b/0x860 [f2fs]
     ? __pfx_autoremove_wake_function+0x10/0x10
     kthread+0x10b/0x220
     ? __pfx_gc_thread_func+0x10/0x10 [f2fs]
     ? _raw_spin_unlock_irq+0x12/0x40
     ? __pfx_kthread+0x10/0x10
     ret_from_fork+0x11a/0x160
     ? __pfx_kthread+0x10/0x10
     ret_from_fork_asm+0x1a/0x30
     </TASK>

This issue is reliably reproducible with the following steps on a
100GB SSD /dev/vdb:

    mkfs.f2fs -f /dev/vdb
    mount /dev/vdb /mnt/f2fs_test
    fio --name=gc --directory=/mnt/f2fs_test --rw=randwrite \
        --bs=4k --size=8G --numjobs=12 --fsync=4 --runtime=10 \
        --time_based
    echo 1 > /sys/fs/f2fs/vdb/gc_urgent

The uninitialized value causes incorrect GC victim selection, leading
to unpredictable garbage collection behavior.

Fix by zero-initializing the entire victim_sel_policy structure to
ensure all fields have defined values.

Fixes: e791d00bd06c ("f2fs: add valid block ratio not to do excessive GC for one time GC")
Cc: stable@kernel.org
Signed-off-by: Xiaole He <hexiaole1994@126.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agof2fs: fix age extent cache insertion skip on counter overflow
Xiaole He [Mon, 27 Oct 2025 09:23:41 +0000 (17:23 +0800)] 
f2fs: fix age extent cache insertion skip on counter overflow

commit 27bf6a637b7613fc85fa6af468b7d612d78cd5c0 upstream.

The age extent cache uses last_blocks (derived from
allocated_data_blocks) to determine data age. However, there's a
conflict between the deletion
marker (last_blocks=0) and legitimate last_blocks=0 cases when
allocated_data_blocks overflows to 0 after reaching ULLONG_MAX.

In this case, valid extents are incorrectly skipped due to the
"if (!tei->last_blocks)" check in __update_extent_tree_range().

This patch fixes the issue by:
1. Reserving ULLONG_MAX as an invalid/deletion marker
2. Limiting allocated_data_blocks to range [0, ULLONG_MAX-1]
3. Using F2FS_EXTENT_AGE_INVALID for deletion scenarios
4. Adjusting overflow age calculation from ULLONG_MAX to (ULLONG_MAX-1)

Reproducer (using a patched kernel with allocated_data_blocks
initialized to ULLONG_MAX - 3 for quick testing):

Step 1: Mount and check initial state
  # dd if=/dev/zero of=/tmp/test.img bs=1M count=100
  # mkfs.f2fs -f /tmp/test.img
  # mkdir -p /mnt/f2fs_test
  # mount -t f2fs -o loop,age_extent_cache /tmp/test.img /mnt/f2fs_test
  # cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age"
  Allocated Data Blocks: 18446744073709551612 # ULLONG_MAX - 3
  Inner Struct Count: tree: 1(0), node: 0

Step 2: Create files and write data to trigger overflow
  # touch /mnt/f2fs_test/{1,2,3,4}.txt; sync
  # cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age"
  Allocated Data Blocks: 18446744073709551613 # ULLONG_MAX - 2
  Inner Struct Count: tree: 5(0), node: 1

  # dd if=/dev/urandom of=/mnt/f2fs_test/1.txt bs=4K count=1; sync
  # cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age"
  Allocated Data Blocks: 18446744073709551614 # ULLONG_MAX - 1
  Inner Struct Count: tree: 5(0), node: 2

  # dd if=/dev/urandom of=/mnt/f2fs_test/2.txt bs=4K count=1; sync
  # cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age"
  Allocated Data Blocks: 18446744073709551615 # ULLONG_MAX
  Inner Struct Count: tree: 5(0), node: 3

  # dd if=/dev/urandom of=/mnt/f2fs_test/3.txt bs=4K count=1; sync
  # cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age"
  Allocated Data Blocks: 0 # Counter overflowed!
  Inner Struct Count: tree: 5(0), node: 4

Step 3: Trigger the bug - next write should create node but gets skipped
  # dd if=/dev/urandom of=/mnt/f2fs_test/4.txt bs=4K count=1; sync
  # cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age"
  Allocated Data Blocks: 1
  Inner Struct Count: tree: 5(0), node: 4

  Expected: node: 5 (new extent node for 4.txt)
  Actual: node: 4 (extent insertion was incorrectly skipped due to
  last_blocks = allocated_data_blocks = 0 in __get_new_block_age)

After this fix, the extent node is correctly inserted and node count
becomes 5 as expected.

Fixes: 71644dff4811 ("f2fs: add block_age-based extent cache")
Cc: stable@kernel.org
Signed-off-by: Xiaole He <hexiaole1994@126.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agof2fs: use global inline_xattr_slab instead of per-sb slab cache
Chao Yu [Tue, 21 Oct 2025 03:48:56 +0000 (11:48 +0800)] 
f2fs: use global inline_xattr_slab instead of per-sb slab cache

commit 1f27ef42bb0b7c0740c5616ec577ec188b8a1d05 upstream.

As Hong Yun reported in mailing list:

loop7: detected capacity change from 0 to 131072
------------[ cut here ]------------
kmem_cache of name 'f2fs_xattr_entry-7:7' already exists
WARNING: CPU: 0 PID: 24426 at mm/slab_common.c:110 kmem_cache_sanity_check mm/slab_common.c:109 [inline]
WARNING: CPU: 0 PID: 24426 at mm/slab_common.c:110 __kmem_cache_create_args+0xa6/0x320 mm/slab_common.c:307
CPU: 0 UID: 0 PID: 24426 Comm: syz.7.1370 Not tainted 6.17.0-rc4 #1 PREEMPT(full)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:kmem_cache_sanity_check mm/slab_common.c:109 [inline]
RIP: 0010:__kmem_cache_create_args+0xa6/0x320 mm/slab_common.c:307
Call Trace:
 __kmem_cache_create include/linux/slab.h:353 [inline]
 f2fs_kmem_cache_create fs/f2fs/f2fs.h:2943 [inline]
 f2fs_init_xattr_caches+0xa5/0xe0 fs/f2fs/xattr.c:843
 f2fs_fill_super+0x1645/0x2620 fs/f2fs/super.c:4918
 get_tree_bdev_flags+0x1fb/0x260 fs/super.c:1692
 vfs_get_tree+0x43/0x140 fs/super.c:1815
 do_new_mount+0x201/0x550 fs/namespace.c:3808
 do_mount fs/namespace.c:4136 [inline]
 __do_sys_mount fs/namespace.c:4347 [inline]
 __se_sys_mount+0x298/0x2f0 fs/namespace.c:4324
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x8e/0x3a0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

The bug can be reproduced w/ below scripts:
- mount /dev/vdb /mnt1
- mount /dev/vdc /mnt2
- umount /mnt1
- mounnt /dev/vdb /mnt1

The reason is if we created two slab caches, named f2fs_xattr_entry-7:3
and f2fs_xattr_entry-7:7, and they have the same slab size. Actually,
slab system will only create one slab cache core structure which has
slab name of "f2fs_xattr_entry-7:3", and two slab caches share the same
structure and cache address.

So, if we destroy f2fs_xattr_entry-7:3 cache w/ cache address, it will
decrease reference count of slab cache, rather than release slab cache
entirely, since there is one more user has referenced the cache.

Then, if we try to create slab cache w/ name "f2fs_xattr_entry-7:3" again,
slab system will find that there is existed cache which has the same name
and trigger the warning.

Let's changes to use global inline_xattr_slab instead of per-sb slab cache
for fixing.

Fixes: a999150f4fe3 ("f2fs: use kmem_cache pool during inline xattr lookups")
Cc: stable@kernel.org
Reported-by: Hong Yun <yhong@link.cuhk.edu.hk>
Tested-by: Hong Yun <yhong@link.cuhk.edu.hk>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agof2fs: invalidate dentry cache on failed whiteout creation
Deepanshu Kartikey [Mon, 27 Oct 2025 13:06:34 +0000 (18:36 +0530)] 
f2fs: invalidate dentry cache on failed whiteout creation

commit d33f89b34aa313f50f9a512d58dd288999f246b0 upstream.

F2FS can mount filesystems with corrupted directory depth values that
get runtime-clamped to MAX_DIR_HASH_DEPTH. When RENAME_WHITEOUT
operations are performed on such directories, f2fs_rename performs
directory modifications (updating target entry and deleting source
entry) before attempting to add the whiteout entry via f2fs_add_link.

If f2fs_add_link fails due to the corrupted directory structure, the
function returns an error to VFS, but the partial directory
modifications have already been committed to disk. VFS assumes the
entire rename operation failed and does not update the dentry cache,
leaving stale mappings.

In the error path, VFS does not call d_move() to update the dentry
cache. This results in new_dentry still pointing to the old inode
(new_inode) which has already had its i_nlink decremented to zero.
The stale cache causes subsequent operations to incorrectly reference
the freed inode.

This causes subsequent operations to use cached dentry information that
no longer matches the on-disk state. When a second rename targets the
same entry, VFS attempts to decrement i_nlink on the stale inode, which
may already have i_nlink=0, triggering a WARNING in drop_nlink().

Example sequence:
1. First rename (RENAME_WHITEOUT): file2 → file1
   - f2fs updates file1 entry on disk (points to inode 8)
   - f2fs deletes file2 entry on disk
   - f2fs_add_link(whiteout) fails (corrupted directory)
   - Returns error to VFS
   - VFS does not call d_move() due to error
   - VFS cache still has: file1 → inode 7 (stale!)
   - inode 7 has i_nlink=0 (already decremented)

2. Second rename: file3 → file1
   - VFS uses stale cache: file1 → inode 7
   - Tries to drop_nlink on inode 7 (i_nlink already 0)
   - WARNING in drop_nlink()

Fix this by explicitly invalidating old_dentry and new_dentry when
f2fs_add_link fails during whiteout creation. This forces VFS to
refresh from disk on subsequent operations, ensuring cache consistency
even when the rename partially succeeds.

Reproducer:
1. Mount F2FS image with corrupted i_current_depth
2. renameat2(file2, file1, RENAME_WHITEOUT)
3. renameat2(file3, file1, 0)
4. System triggers WARNING in drop_nlink()

Fixes: 7e01e7ad746b ("f2fs: support RENAME_WHITEOUT")
Reported-by: syzbot+632cf32276a9a564188d@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=632cf32276a9a564188d
Suggested-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/all/20251022233349.102728-1-kartikey406@gmail.com/
Cc: stable@vger.kernel.org
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agof2fs: fix to avoid updating zero-sized extent in extent cache
Chao Yu [Mon, 20 Oct 2025 02:42:12 +0000 (10:42 +0800)] 
f2fs: fix to avoid updating zero-sized extent in extent cache

commit 7c37c79510329cd951a4dedf3f7bf7e2b18dccec upstream.

As syzbot reported:

F2FS-fs (loop0): __update_extent_tree_range: extent len is zero, type: 0, extent [0, 0, 0], age [0, 0]
------------[ cut here ]------------
kernel BUG at fs/f2fs/extent_cache.c:678!
Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 5336 Comm: syz.0.0 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:__update_extent_tree_range+0x13bc/0x1500 fs/f2fs/extent_cache.c:678
Call Trace:
 <TASK>
 f2fs_update_read_extent_cache_range+0x192/0x3e0 fs/f2fs/extent_cache.c:1085
 f2fs_do_zero_range fs/f2fs/file.c:1657 [inline]
 f2fs_zero_range+0x10c1/0x1580 fs/f2fs/file.c:1737
 f2fs_fallocate+0x583/0x990 fs/f2fs/file.c:2030
 vfs_fallocate+0x669/0x7e0 fs/open.c:342
 ioctl_preallocate fs/ioctl.c:289 [inline]
 file_ioctl+0x611/0x780 fs/ioctl.c:-1
 do_vfs_ioctl+0xb33/0x1430 fs/ioctl.c:576
 __do_sys_ioctl fs/ioctl.c:595 [inline]
 __se_sys_ioctl+0x82/0x170 fs/ioctl.c:583
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f07bc58eec9

In error path of f2fs_zero_range(), it may add a zero-sized extent
into extent cache, it should be avoided.

Fixes: 6e9619499f53 ("f2fs: support in batch fzero in dnode page")
Cc: stable@kernel.org
Reported-by: syzbot+24124df3170c3638b35f@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/68e5d698.050a0220.256323.0032.GAE@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agof2fs: fix to propagate error from f2fs_enable_checkpoint()
Chao Yu [Mon, 27 Oct 2025 06:35:33 +0000 (14:35 +0800)] 
f2fs: fix to propagate error from f2fs_enable_checkpoint()

commit be112e7449a6e1b54aa9feac618825d154b3a5c7 upstream.

In order to let userspace detect such error rather than suffering
silent failure.

Fixes: 4354994f097d ("f2fs: checkpoint disabling")
Cc: stable@kernel.org
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agof2fs: fix to avoid potential deadlock
Chao Yu [Tue, 14 Oct 2025 11:47:35 +0000 (19:47 +0800)] 
f2fs: fix to avoid potential deadlock

commit ca8b201f28547e28343a6f00a6e91fa8c09572fe upstream.

As Jiaming Zhang and syzbot reported, there is potential deadlock in
f2fs as below:

Chain exists of:
  &sbi->cp_rwsem --> fs_reclaim --> sb_internal#2

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  rlock(sb_internal#2);
                               lock(fs_reclaim);
                               lock(sb_internal#2);
  rlock(&sbi->cp_rwsem);

 *** DEADLOCK ***

3 locks held by kswapd0/73:
 #0: ffffffff8e247a40 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat mm/vmscan.c:7015 [inline]
 #0: ffffffff8e247a40 (fs_reclaim){+.+.}-{0:0}, at: kswapd+0x951/0x2800 mm/vmscan.c:7389
 #1: ffff8880118400e0 (&type->s_umount_key#50){.+.+}-{4:4}, at: super_trylock_shared fs/super.c:562 [inline]
 #1: ffff8880118400e0 (&type->s_umount_key#50){.+.+}-{4:4}, at: super_cache_scan+0x91/0x4b0 fs/super.c:197
 #2: ffff888011840610 (sb_internal#2){.+.+}-{0:0}, at: f2fs_evict_inode+0x8d9/0x1b60 fs/f2fs/inode.c:890

stack backtrace:
CPU: 0 UID: 0 PID: 73 Comm: kswapd0 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
 print_circular_bug+0x2ee/0x310 kernel/locking/lockdep.c:2043
 check_noncircular+0x134/0x160 kernel/locking/lockdep.c:2175
 check_prev_add kernel/locking/lockdep.c:3165 [inline]
 check_prevs_add kernel/locking/lockdep.c:3284 [inline]
 validate_chain+0xb9b/0x2140 kernel/locking/lockdep.c:3908
 __lock_acquire+0xab9/0xd20 kernel/locking/lockdep.c:5237
 lock_acquire+0x120/0x360 kernel/locking/lockdep.c:5868
 down_read+0x46/0x2e0 kernel/locking/rwsem.c:1537
 f2fs_down_read fs/f2fs/f2fs.h:2278 [inline]
 f2fs_lock_op fs/f2fs/f2fs.h:2357 [inline]
 f2fs_do_truncate_blocks+0x21c/0x10c0 fs/f2fs/file.c:791
 f2fs_truncate_blocks+0x10a/0x300 fs/f2fs/file.c:867
 f2fs_truncate+0x489/0x7c0 fs/f2fs/file.c:925
 f2fs_evict_inode+0x9f2/0x1b60 fs/f2fs/inode.c:897
 evict+0x504/0x9c0 fs/inode.c:810
 f2fs_evict_inode+0x1dc/0x1b60 fs/f2fs/inode.c:853
 evict+0x504/0x9c0 fs/inode.c:810
 dispose_list fs/inode.c:852 [inline]
 prune_icache_sb+0x21b/0x2c0 fs/inode.c:1000
 super_cache_scan+0x39b/0x4b0 fs/super.c:224
 do_shrink_slab+0x6ef/0x1110 mm/shrinker.c:437
 shrink_slab_memcg mm/shrinker.c:550 [inline]
 shrink_slab+0x7ef/0x10d0 mm/shrinker.c:628
 shrink_one+0x28a/0x7c0 mm/vmscan.c:4955
 shrink_many mm/vmscan.c:5016 [inline]
 lru_gen_shrink_node mm/vmscan.c:5094 [inline]
 shrink_node+0x315d/0x3780 mm/vmscan.c:6081
 kswapd_shrink_node mm/vmscan.c:6941 [inline]
 balance_pgdat mm/vmscan.c:7124 [inline]
 kswapd+0x147c/0x2800 mm/vmscan.c:7389
 kthread+0x70e/0x8a0 kernel/kthread.c:463
 ret_from_fork+0x4bc/0x870 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>

The root cause is deadlock among four locks as below:

kswapd
- fs_reclaim --- Lock A
 - shrink_one
  - evict
   - f2fs_evict_inode
    - sb_start_intwrite --- Lock B

- iput
 - evict
  - f2fs_evict_inode
   - sb_start_intwrite --- Lock B
   - f2fs_truncate
    - f2fs_truncate_blocks
     - f2fs_do_truncate_blocks
      - f2fs_lock_op --- Lock C

ioctl
- f2fs_ioc_commit_atomic_write
 - f2fs_lock_op --- Lock C
  - __f2fs_commit_atomic_write
   - __replace_atomic_write_block
    - f2fs_get_dnode_of_data
     - __get_node_folio
      - f2fs_check_nid_range
       - f2fs_handle_error
        - f2fs_record_errors
         - f2fs_down_write --- Lock D

open
- do_open
 - do_truncate
  - security_inode_need_killpriv
   - f2fs_getxattr
    - lookup_all_xattrs
     - f2fs_handle_error
      - f2fs_record_errors
       - f2fs_down_write --- Lock D
        - f2fs_commit_super
         - read_mapping_folio
          - filemap_alloc_folio_noprof
           - prepare_alloc_pages
            - fs_reclaim_acquire --- Lock A

In order to avoid such deadlock, we need to avoid grabbing sb_lock in
f2fs_handle_error(), so, let's use asynchronous method instead:
- remove f2fs_handle_error() implementation
- rename f2fs_handle_error_async() to f2fs_handle_error()
- spread f2fs_handle_error()

Fixes: 95fa90c9e5a7 ("f2fs: support recording errors into superblock")
Cc: stable@kernel.org
Reported-by: syzbot+14b90e1156b9f6fc1266@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/68eae49b.050a0220.ac43.0001.GAE@google.com
Reported-by: Jiaming Zhang <r772577952@gmail.com>
Closes: https://lore.kernel.org/lkml/CANypQFa-Gy9sD-N35o3PC+FystOWkNuN8pv6S75HLT0ga-Tzgw@mail.gmail.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agof2fs: fix to avoid updating compression context during writeback
Chao Yu [Wed, 22 Oct 2025 03:06:36 +0000 (11:06 +0800)] 
f2fs: fix to avoid updating compression context during writeback

commit 10b591e7fb7cdc8c1e53e9c000dc0ef7069aaa76 upstream.

Bai, Shuangpeng <sjb7183@psu.edu> reported a bug as below:

Oops: divide error: 0000 [#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 11441 Comm: syz.0.46 Not tainted 6.17.0 #1 PREEMPT(full)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:f2fs_all_cluster_page_ready+0x106/0x550 fs/f2fs/compress.c:857
Call Trace:
 <TASK>
 f2fs_write_cache_pages fs/f2fs/data.c:3078 [inline]
 __f2fs_write_data_pages fs/f2fs/data.c:3290 [inline]
 f2fs_write_data_pages+0x1c19/0x3600 fs/f2fs/data.c:3317
 do_writepages+0x38e/0x640 mm/page-writeback.c:2634
 filemap_fdatawrite_wbc mm/filemap.c:386 [inline]
 __filemap_fdatawrite_range mm/filemap.c:419 [inline]
 file_write_and_wait_range+0x2ba/0x3e0 mm/filemap.c:794
 f2fs_do_sync_file+0x6e6/0x1b00 fs/f2fs/file.c:294
 generic_write_sync include/linux/fs.h:3043 [inline]
 f2fs_file_write_iter+0x76e/0x2700 fs/f2fs/file.c:5259
 new_sync_write fs/read_write.c:593 [inline]
 vfs_write+0x7e9/0xe00 fs/read_write.c:686
 ksys_write+0x19d/0x2d0 fs/read_write.c:738
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xf7/0x470 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The bug was triggered w/ below race condition:

fsync setattr ioctl
- f2fs_do_sync_file
 - file_write_and_wait_range
  - f2fs_write_cache_pages
  : inode is non-compressed
  : cc.cluster_size =
    F2FS_I(inode)->i_cluster_size = 0
   - tag_pages_for_writeback
- f2fs_setattr
 - truncate_setsize
 - f2fs_truncate
- f2fs_fileattr_set
 - f2fs_setflags_common
  - set_compress_context
  : F2FS_I(inode)->i_cluster_size = 4
  : set_inode_flag(inode, FI_COMPRESSED_FILE)
   - f2fs_compressed_file
   : return true
   - f2fs_all_cluster_page_ready
   : "pgidx % cc->cluster_size" trigger dividing 0 issue

Let's change as below to fix this issue:
- introduce a new atomic type variable .writeback in structure f2fs_inode_info
to track the number of threads which calling f2fs_write_cache_pages().
- use .i_sem lock to protect .writeback update.
- check .writeback before update compression context in f2fs_setflags_common()
to avoid race w/ ->writepages.

Fixes: 4c8ff7095bef ("f2fs: support data compression")
Cc: stable@kernel.org
Reported-by: Bai, Shuangpeng <sjb7183@psu.edu>
Tested-by: Bai, Shuangpeng <sjb7183@psu.edu>
Closes: https://lore.kernel.org/lkml/44D8F7B3-68AD-425F-9915-65D27591F93F@psu.edu
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agof2fs: ensure node page reads complete before f2fs_put_super() finishes
Jan Prusakowski [Mon, 6 Oct 2025 08:46:15 +0000 (10:46 +0200)] 
f2fs: ensure node page reads complete before f2fs_put_super() finishes

commit 297baa4aa263ff8f5b3d246ee16a660d76aa82c4 upstream.

Xfstests generic/335, generic/336 sometimes crash with the following message:

F2FS-fs (dm-0): detect filesystem reference count leak during umount, type: 9, count: 1
------------[ cut here ]------------
kernel BUG at fs/f2fs/super.c:1939!
Oops: invalid opcode: 0000 [#1] SMP NOPTI
CPU: 1 UID: 0 PID: 609351 Comm: umount Tainted: G        W           6.17.0-rc5-xfstests-g9dd1835ecda5 #1 PREEMPT(none)
Tainted: [W]=WARN
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:f2fs_put_super+0x3b3/0x3c0
Call Trace:
 <TASK>
 generic_shutdown_super+0x7e/0x190
 kill_block_super+0x1a/0x40
 kill_f2fs_super+0x9d/0x190
 deactivate_locked_super+0x30/0xb0
 cleanup_mnt+0xba/0x150
 task_work_run+0x5c/0xa0
 exit_to_user_mode_loop+0xb7/0xc0
 do_syscall_64+0x1ae/0x1c0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
 </TASK>
---[ end trace 0000000000000000 ]---

It appears that sometimes it is possible that f2fs_put_super() is called before
all node page reads are completed.
Adding a call to f2fs_wait_on_all_pages() for F2FS_RD_NODE fixes the problem.

Cc: stable@kernel.org
Fixes: 20872584b8c0b ("f2fs: fix to drop all dirty meta/node pages during umount()")
Signed-off-by: Jan Prusakowski <jprusakowski@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoscsi: ufs: core: Add ufshcd_update_evt_hist() for UFS suspend error
Seunghwan Baek [Wed, 10 Dec 2025 06:38:54 +0000 (15:38 +0900)] 
scsi: ufs: core: Add ufshcd_update_evt_hist() for UFS suspend error

commit c9f36f04a8a2725172cdf2b5e32363e4addcb14c upstream.

If UFS resume fails, the event history is updated in ufshcd_resume(), but
there is no code anywhere to record UFS suspend. Therefore, add code to
record UFS suspend error event history.

Fixes: dd11376b9f1b ("scsi: ufs: Split the drivers/scsi/ufs directory")
Cc: stable@vger.kernel.org
Signed-off-by: Seunghwan Baek <sh8267.baek@samsung.com>
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Link: https://patch.msgid.link/20251210063854.1483899-2-sh8267.baek@samsung.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoscsi: mpi3mr: Read missing IOCFacts flag for reply queue full overflow
Chandrakanth Patil [Thu, 11 Dec 2025 00:29:29 +0000 (05:59 +0530)] 
scsi: mpi3mr: Read missing IOCFacts flag for reply queue full overflow

commit d373163194982f43b92c552c138c29d9f0b79553 upstream.

The driver was not reading the MAX_REQ_PER_REPLY_QUEUE_LIMIT IOCFacts
flag, so the reply-queue-full handling was never enabled, even on
firmware that supports it. Reading this flag enables the feature and
prevents reply queue overflow.

Fixes: f08b24d82749 ("scsi: mpi3mr: Avoid reply queue full condition")
Cc: stable@vger.kernel.org
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Link: https://patch.msgid.link/20251211002929.22071-1-chandrakanth.patil@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoscsi: target: Reset t_task_cdb pointer in error case
Andrey Vatoropin [Tue, 18 Nov 2025 08:42:31 +0000 (08:42 +0000)] 
scsi: target: Reset t_task_cdb pointer in error case

commit 5053eab38a4c4543522d0c320c639c56a8b59908 upstream.

If allocation of cmd->t_task_cdb fails, it remains NULL but is later
dereferenced in the 'err' path.

In case of error, reset NULL t_task_cdb value to point at the default
fixed-size buffer.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 9e95fb805dc0 ("scsi: target: Fix NULL pointer dereference")
Cc: stable@vger.kernel.org
Signed-off-by: Andrey Vatoropin <a.vatoropin@crpt.ru>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Link: https://patch.msgid.link/20251118084014.324940-1-a.vatoropin@crpt.ru
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoNFSD: use correct reservation type in nfsd4_scsi_fence_client
Dai Ngo [Wed, 5 Nov 2025 20:45:54 +0000 (12:45 -0800)] 
NFSD: use correct reservation type in nfsd4_scsi_fence_client

commit 6f52063db9aabdaabea929b1e998af98c2e8d917 upstream.

The reservation type argument for the pr_preempt call should match the
one used in nfsd4_block_get_device_info_scsi.

Fixes: f99d4fbdae67 ("nfsd: add SCSI layout support")
Cc: stable@vger.kernel.org
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 weeks agoscsi: aic94xx: fix use-after-free in device removal path
Junrui Luo [Tue, 28 Oct 2025 16:29:04 +0000 (00:29 +0800)] 
scsi: aic94xx: fix use-after-free in device removal path

commit f6ab594672d4cba08540919a4e6be2e202b60007 upstream.

The asd_pci_remove() function fails to synchronize with pending tasklets
before freeing the asd_ha structure, leading to a potential
use-after-free vulnerability.

When a device removal is triggered (via hot-unplug or module unload),
race condition can occur.

The fix adds tasklet_kill() before freeing the asd_ha structure,
ensuring all scheduled tasklets complete before cleanup proceeds.

Reported-by: Yuhao Jiang <danisjiang@gmail.com>
Reported-by: Junrui Luo <moonafterrain@outlook.com>
Fixes: 2908d778ab3e ("[SCSI] aic94xx: new driver")
Cc: stable@vger.kernel.org
Signed-off-by: Junrui Luo <moonafterrain@outlook.com>
Link: https://patch.msgid.link/ME2PR01MB3156AB7DCACA206C845FC7E8AFFDA@ME2PR01MB3156.ausprd01.prod.outlook.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>