git.ipfire.org Git - thirdparty/kernel/linux.git/log

thermal: testing: zone: Flush work items during cleanup

To prevent freed module code from being executed during the thermal
testing module unload, make it add a dedicated workqueue for thermal
testing work items and flush it in thermal_testing_exit().

Fixes: f6a034f2df42 ("thermal: Introduce a debugfs-based testing facility")
Link: https://sashiko.dev/#/patchset/20260605185212.2491144-1-sam.moelius%40trailofbits.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/1959388.tdWV9SEqCh@rafael.j.wysocki
[ rjw: Make variable d_command static ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

dt-bindings: thermal: amlogic: Correct 'reg' in the example

The example DTS is tested in a wrapped node with address/size-cells=1,
thus reg had incorrect entry leading to dt_binding_check fails:

thermal/amlogic,thermal.example.dtb: temperature-sensor@20000 (amlogic,t7-thermal): reg: [[0, 131072], [0, 80]] is too long

Fixes: b1c8ccdbd4e9 ("dt-bindings: thermal: amlogic: Add support for T7")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20260622100231.438435-4-krzysztof.kozlowski@oss.qualcomm.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>

dt-bindings: thermal: amlogic: Fix missing header in the example

Usage of defines from headers requires including relevant header,
otherwise dt_binding_check fails:

  Lexical error: Documentation/devicetree/bindings/thermal/amlogic,thermal.example.dts:59.27-34 Unexpected 'GIC_SPI'
  Lexical error: Documentation/devicetree/bindings/thermal/amlogic,thermal.example.dts:59.38-57 Unexpected 'IRQ_TYPE_LEVEL_HIGH'
  Lexical error: Documentation/devicetree/bindings/thermal/amlogic,thermal.example.dts:60.37-45 Unexpected 'CLKID_TS'

Fixes: b1c8ccdbd4e9 ("dt-bindings: thermal: amlogic: Add support for T7")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20260622100231.438435-3-krzysztof.kozlowski@oss.qualcomm.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>

of: Fix RST inline emphasis warnings in of_map_id() kernel-doc

The @filter_np parameter descriptions in of_map_id() and of_map_msi_id()
contained the text '*filter_np' in prose. Docutils interprets a leading
'*' as the start of RST emphasis (italic), but finds no closing '*',
triggering:

  Documentation/devicetree/kernel-api:11: ./drivers/of/base.c:2134:
  WARNING: Inline emphasis start-string without end-string. [docutils]

  Documentation/devicetree/kernel-api:11: ./drivers/of/base.c:2260:
  WARNING: Inline emphasis start-string without end-string. [docutils]

Fix by wrapping '*filter_np' in double backticks (*filter_np) to
render it as an RST inline code literal, which is also the correct
kernel-doc convention for pointer expressions.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202606130111.ldC96rqf-lkp@intel.com/
Signed-off-by: Vijayanand Jitta <vijayanand.jitta@oss.qualcomm.com>
Link: https://patch.msgid.link/20260619-iommu_map_kdoc_fix-v1-1-9573e1cf30b3@oss.qualcomm.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>

of: property: Fix of_fwnode_get_reference_args() with negative index

fwnode_property_get_reference_args() should return -ENOENT when an out
of bound index is passed. An issue arised with the OF backend because
the OF API use signed indexes while the fwnode API use unsigned ones.
When an index value greater the INT_MAX was passed to the OF backend
it got casted to a negative value and it returned -EINVAL instead of
-ENOENT. This patch add a check to of_fwnode_get_reference_args() to
catch negative index before they are passed to the OF API and return
-ENOENT right away.

This issue appeared when the following pattern was used in the LED
subsystem:

index = fwnode_property_match_string(fwnode, "led-names", name)
led_node = fwnode_find_reference(fwnode, "leds", index);

Unlike the same pattern with the OF API, this pattern implicitly cast
the signed return value of fwnode_property_match_string() to an
unsigned index leading to the above issue with the OF backend. It can
be argued that the return value of fwnode_property_match_string()
should be checked separately, but I think there is value in supporting
such simple and straight to the point patterns.

Link: https://lore.kernel.org/linux-leds/aimVRwJPhlGxsIUj@tom-desktop/T/#mc43cbf7e0599991b56dd0d9680714d28d145fbc8
Cc: Tommaso Merciai <tommaso.merciai.xr@bp.renesas.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Alban Bedel <alban.bedel@lht.dlh.de>
Link: https://patch.msgid.link/20260618152035.1600436-1-alban.bedel@lht.dlh.de
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>

block: handle REQ_OP_ZONE_APPEND in __bio_integrity_action

Otherwise zone append commands will miss their integrity data. While
this works "fine" for auto-PI, it break file system PI and non-PI
metadata.

With this XFS on ZNS namespace with non-PI metadata and 512 byte sectors
with PI work, while PI 4k sector formats with PI work only when Caleb's
"block: fix integrity offset/length conversions" is applied as well.

Note that unlike regular writes, zone append does need remapping as
partitions are not supported on zoned block devices.

Fixes: df3c485e0e60 ("block: switch on bio operation in bio_integrity_prep")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://patch.msgid.link/20260624080014.1998650-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: fix GFP_ flags confusion in bio_integrity_alloc_buf

bio_integrity_alloc_buf usage of GFP_ flags is messed up.  For one it
mixes GFP_NOFS and GFP_NOIO for neighbouring allocations, but it also
makes the allocations fail more often than needed.  That code was copied
from bio_alloc_bioset which needs to do that so that it can punt to the
rescuer workqueue, but none of that is needed for the integrity
allocations that either sits in the file system or at the very bottom
of the I/O stack.  Failing early means we'll do a fully waiting
allocation from the mempool ->alloc callback which is usually much
larger than required.

Fix this by passing a gfp_t so that the file system path can pass
GFP_NOFS and the auto-integrity code can pass GFP_NOIO, and don't
modify the allocation type except for disabling warnings.

Fixes: ec7f31b2a2d3 ("block: make bio auto-integrity deadlock safe")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://patch.msgid.link/20260624080014.1998650-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block, bfq: don't grab queue_lock to initialize bfq

The request_queue is frozen and quiesced while the elevator init_sched()
method runs, so queue_lock is not needed for BFQ cgroup initialization.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
Link: https://patch.msgid.link/1965073ea20f33114a8d903816b986e483b9bb34.1780621988.git.yukuai@fygo.io
Signed-off-by: Jens Axboe <axboe@kernel.dk>

mm/page_io: don't nest queue_lock under rcu in bio_associate_blkg_from_page()

Take a css reference under RCU, drop RCU, and then associate the bio with
the blkg. This avoids nesting queue_lock under RCU and prepares to protect
blkcg with blkcg_mutex instead of queue_lock.

Use css_tryget() instead of css_tryget_online() so swap writeback for
pages charged to a dying memcg still passes the dying css to
bio_associate_blkg_from_css(). That preserves the existing closest-live
ancestor fallback instead of charging those bios to the root blkg.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
Link: https://patch.msgid.link/c910d2c39d3ec97f67de68af636a52394342d55f.1780621988.git.yukuai@fygo.io
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-cgroup: don't nest queue_lock under blkcg->lock in blkcg_destroy_blkgs()

The correct lock order is q->queue_lock before blkcg->lock, and in order
to prevent deadlock from blkcg_destroy_blkgs(), trylock is used for
q->queue_lock while blkcg->lock is already held, this is hacky.

Refactor blkcg_destroy_blkgs() to hold blkcg->lock only long enough to
get the first blkg and then release it. Then take q->queue_lock and
blkcg->lock in the correct order to destroy the blkg. This is a very cold
path, so the extra lock/unlock cycles are acceptable.

Also prepare to convert protecting blkcg with blkcg_mutex instead of
queue_lock.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
Link: https://patch.msgid.link/00b03cf74a9937cb4d6dd67a189ddc00a3de0451.1780621988.git.yukuai@fygo.io
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-cgroup: don't nest queue_lock under rcu in bio_associate_blkg()

If a bio is already associated with a blkg, the blkcg is already pinned
until the bio is done, so there is no need for RCU protection. Otherwise,
protect blkcg_css() with RCU independently. Prepare to protect blkcg with
blkcg_mutex instead of queue_lock.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
Link: https://patch.msgid.link/8496fa234b21d4b31b7f068766906d0bffcac8e6.1780621988.git.yukuai@fygo.io
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-cgroup: don't nest queue_lock under rcu in blkg_lookup_create()

Change this in two steps:

1) hold rcu lock and do blkg_lookup() from fast path;
2) hold queue_lock directly from slow path, and don't nest it under rcu
lock;

Prepare to convert protecting blkcg with blkcg_mutex instead of
queue_lock.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
Link: https://patch.msgid.link/93f33cc9e5a39dddb78dcd934d0c1d04b564fb00.1780621988.git.yukuai@fygo.io
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-cgroup: don't nest queue_lock under rcu in blkcg_print_blkgs()

With previous modification to delay freeing policy data after an RCU grace
period, prfill() can run under RCU instead of taking queue_lock. However,
policy teardown can still clear blkg->pd[plid] after blkcg_print_blkgs()
observes the policy enabled bit.

Load policy data once with READ_ONCE() and skip the blkg if teardown
already cleared it. Do the same in recursive stat walks for descendant
blkgs. Remove the stale BFQ debug queue_lock assertion because
blkcg_print_blkgs() no longer calls prfill() with queue_lock held. This
also lets ioc_qos_prfill() and ioc_cost_model_prfill() use IRQ-safe
ioc->lock locking without re-enabling IRQs while queue_lock is still held.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
Link: https://patch.msgid.link/db7633d5e263dd1c2bf9b901762545a84b7d714e.1780621988.git.yukuai@fygo.io
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-cgroup: delay freeing policy data after rcu grace period

Currently blkcg_print_blkgs() must hold RCU to iterate blkgs from a
blkcg, and prfill() must hold queue_lock to prevent policy data from
being freed by policy deactivation. As a consequence, queue_lock has to
be nested under RCU from blkcg_print_blkgs().

Delay freeing policy data until after an RCU grace period so prfill() can
be protected by RCU alone.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
Link: https://patch.msgid.link/e20e5d984b41a026d61851966bed35eb094c4bff.1780621988.git.yukuai@fygo.io
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-cgroup: protect iterating blkgs with blkcg->lock in blkcg_print_stat()

blkcg_print_one_stat() will be called for each blkg:
- access blkg->iostat, which is freed from rcu callback
  blkg_free_workfn();
- access policy data from pd_stat_fn(), which is freed from
  pd_free_fn(), while pd_free_fn() can be called by removing blkcg or
  deactivating policy;

Take blkcg->lock while iterating so the blkgs stay online and both
blkg->iostat and policy data for activated policies stay valid.  Use
irq-safe locking because blkcg->lock can be nested under q->queue_lock,
which is used from IRQ completion paths.

Prepare to convert protecting blkgs from request_queue with mutex.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
Link: https://patch.msgid.link/05799877e720dcd300e2ddd4625e8e162959d7cc.1780621988.git.yukuai@fygo.io
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'md-7.2' of https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux into block-7.2

Pull MD fixes from Yu Kuai:

"Bug Fixes:
- Fix raid1 writes_pending and barrier reference leaks on write
   failures. (Abd-Alrhman Masalkhi)
- Fix raid10 writes_pending leak on write request failures.
   (Abd-Alrhman Masalkhi)
- Fix raid10 writes_pending and barrier reference leaks on discard
   failures. (Abd-Alrhman Masalkhi)
- Fix raid1 REQ_NOWAIT handling while waiting for behind writes.
   (Abd-Alrhman Masalkhi)
- Fix raid1 r1_bio leak when a REQ_NOWAIT retry would block.
   (Abd-Alrhman Masalkhi)
- Fix raid1 read-balance head_position data race. (Chen Cheng)
- Fix raid5 stripe batch bm_seq wraparound comparison. (Chen Cheng)
- Fix raid5 stripe batch state snapshot KCSAN noise. (Chen Cheng)
- Fix raid5 R5_Overlap races while breaking stripe batches.
   (Chen Cheng)

Improvements:
- Add raid5 discard IO accounting. (Yu Kuai)
- Always convert raid5 llbitmap bits for discard. (Yu Kuai)

Cleanups:
- Simplify raid1_write_request() error handling.
   (Abd-Alrhman Masalkhi)"

* 'md-7.2' of https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux:
  md/raid5: avoid R5_Overlap races while breaking stripe batches
  md/raid5: use stripe state snapshot in break_stripe_batch_list()
  md/raid5: let stripe batch bm_seq comparison wrap-safe
  md/raid1: protect head_position for read balance
  md/raid1: free r1_bio when REQ_NOWAIT is set and read would block on retry
  md/raid1: honor REQ_NOWAIT when waiting for behind writes
  md/raid5: always convert llbitmap bits for discard
  md/raid5: validate discard support at request time
  md/raid5: account discard IO
  md/raid1: simplify raid1_write_request() error handling
  md/raid10: fix writes_pending and barrier reference leaks on discard failures
  md/raid10: fix writes_pending leak on write request failures
  md/raid1: fix writes_pending and barrier reference leaks on write failures

ASoC: soc-core: Don't fail if device_link could not be created

If device_link_add() fails in snd_soc_bind_card() just skip that
driver pair and carry on.

This means that ASoC must now keep track of which components it was
able to device_link to the card->dev. The new card_device_link member
of struct snd_soc_component is non-NULL if a device_link exists.

The intent of the device link is to ensure that the machine driver
system-suspends before the component drivers, to prevent ASoC
suspend attempting to reconfigure a driver that has already suspended.

It isn't possible to create this device link if the machine driver is
a parent of the component driver or already has a device_link in the
opposite direction. In this case skip the link. A warn is placed in
kernel log since this might indicate a genuine design problem with
those two drivers (this can be downgraded to dbg in future when
people are happy that all these special drivers correctly handle their
reversed shutdown order).

Fixes: 0f54ce994b23 ("ASoC: soc-core: Create device_link to ensure correct suspend order")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Closes: https://lore.kernel.org/linux-sound/61bd38e7-5eb9-4448-a93f-afa2ccbd1c9d@opensource.cirrus.com/T/#m496fe5a11b0a3649afd2e85da5e1cea82bb16d8a
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://patch.msgid.link/20260623135821.4125543-1-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: rockchip: rockchip_sai: #include <linux/platform_device.h> explicitly

Currently that header is only included via:

<sound/dmaengine_pcm.h> ->
<sound/soc.h> ->
<linux/platform_device.h>

which doesn't look reliable, still more in the presence of the comment:

/* For the current users of sound/soc.h to avoid build issues */

in <sound/soc.h>.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Reviewed-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Link: https://patch.msgid.link/20260624083708.254517-2-u.kleine-koenig@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>

regulator: da9121: Use subvariant ids in the I2C table

da9121_i2c_probe() stores i2c_get_match_data() in chip->subvariant_id
and da9121_assign_chip_model() switches on DA9121_SUBTYPE_* values. The
OF table provides those subvariant values, but the I2C id table
currently provides DA9121_TYPE_* values.

Make the I2C id table use the same subvariant namespace as the OF table
so non-DT I2C matches feed the expected data type into the model
assignment code.

Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
Link: https://patch.msgid.link/20260624060024.61300-1-pengpeng@iscas.ac.cn
Signed-off-by: Mark Brown <broonie@kernel.org>

x86/apic: KVM: Use cpu_physical_id() to get APIC ID of running vCPU for AVIC

Use cpu_physical_id() instead of default_cpu_present_to_apicid() when
getting the APIC ID of the pCPU on which a vCPU is running/loaded, as the
kernel has gone way off the rails if a vCPU is loaded on a pCPU that has
been physically removed from the system. Even if the impossible were to
happen, the absolutely worst case scenario is that hardware will ring the
AIVC doorbell on the wrong pCPU, i.e. a severely broken system will
experience mild performance issues.

Kill off KVM's superfluous kvm_cpu_get_apicid() wrapper along with the
for-KVM export of default_cpu_present_to_apicid(), as they existed purely
for the wonky AVIC usage.

Cc: Kai Huang <kai.huang@intel.com>
Cc: Yosry Ahmed <yosry@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Acked-by: Naveen N Rao (AMD) <naveen@kernel.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Message-ID: <20260612185459.591892-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86/mmu: Expose number of shadow MMU shadow pages as a stat

Turn arch.n_used_mmu_pages into a stat, mmu_shadow_pages, as the number of
live shadow pages is arguably _the_ most critical datapoint when it comes
to analyzing the shadow MMU. Before the TDP MMU came along, i.e. when the
shadow MMU was the only MMU, explicitly tracking the number of shadow pages
wasn't as interesting, because the same information could more or less be
gleaned from the pages_{1g,2m,4k} stats. But with the TDP MMU, where the
shadow MMU is only used for nested TDP, it becomes extremely difficult, if
not impossible, to determine which SPTEs are coming from the TDP MMU, and
which are coming from the shadow MMU.

E.g. when triaging/debugging shadow MMU performance issues due to "too many
shadow pages", being able to observe that 99%+ of all shadow pages are
unsync is critical to being able to deduce that KVM is effectively leaking
shadow pages.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260612133727.411902-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Merge tag 'kvm-s390-next-7.2-2' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

* Fix S390_USER_OPEREXEC so it can now be enabled regardless of other
unrelated capabilities

* Fix handling of the _PAGE_UNUSED pte bit that could lead to guest
memory corruption in some scenarios

* A bunch of misc gmap fixes (locking, behaviour under memory pressure)

* Fix CMMA dirty tracking

drm/i915/cdclk: Fix up CDCLK_FREQ_DECIMAL without a full PLL re-enable

The GOP (and even Bspec on some platforms) is a bit inconsistent
on what the CDCLK_FREQ_DECIMAL divider should be. Currently any
mismatch there causes a full CDCLK PLL disable+re-enable, which
we really don't want to do if any displays are currently active.
Let's instead just reprogram CDCLK_FREQ_DECIMAL when that is the
only thing amiss. For any other (more serious) mismatch we still
punt to the full PLL reprogramming.

We also need to tweak the bxt_cdclk_cd2x_pipe() stuff a bit to
consistently select pipe==NONE since we have no idea which pipes
are enabled at this point. Since we're not actually changing the
CDCLK frequency here we don't need to sync the update to any
pipe.

Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/work_items/16209
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/20260612173653.7830-2-ville.syrjala@linux.intel.com
Reviewed-by: Michał Grzelak <michal.grzelak@intel.com>
(cherry picked from commit 3f9de66f8acbf8ff45a91b4920605ed10c6b7c06)
Fixes: ba91b9eecb47 ("drm/i915/cdclk: Decouple cdclk from state->modeset")
Fixes: d66a21947e21 ("drm/i915/bxt: Sanitize CDCLK to fix breakage during S4 resume")
Fixes: c73666f394fc ("drm/i915/skl: If needed sanitize bios programmed cdclk")
Cc: <stable@vger.kernel.org> # v4.5+
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

KVM: x86: Unconditionally recompute CR8 intercept on PPR update

The TPR_THRESHOLD field in the VMCS is used by VMX to induce VM exits
when the guest's virtual TPR falls under the specified threshold,
allowing KVM to inject previously masked interrupts.

KVM handles these VM exits in handle_tpr_below_threshold().
Commit eb90f3417a0c ("KVM: vmx: speed up TPR below threshold vmexits")
optimized this function by calling apic_update_ppr() instead of raising
KVM_REQ_EVENT. apic_update_ppr() then raises KVM_REQ_EVENT if there is
a pending, deliverable interrupt.

However, if there are no new interrupts pending, apic_update_ppr() does
not issue the request. Thus, kvm_lapic_update_cr8_intercept() and
vmx_update_cr8_intercept() are not called before VM entry, which results
in a high, stale TPR_THRESHOLD. This is problematic due to the following
sentence in 28.2.1.1 "VM-Execution Control Fields" in the SDM:

  The following check is performed if the “use TPR shadow” VM-execution
  control is 1 and the “virtualize APIC accesses” and “virtual-interrupt
  delivery” VM-execution controls are both 0: the value of bits 3:0 of
  the TPR threshold VM-execution control field should not be greater
  than the value of bits 7:4 of VTPR.

This error condition is typically not observed when KVM runs on a bare
metal system because modern processors support APICv, which enables
virtual-interrupt delivery, and which KVM uses when possible. This
causes the processor to no longer generate TPR-below-threshold exits
and to no longer check TPR_THRESHOLD on entry. However, when running
on older platforms, or under nested virtualization on a hypervisor that
does not support virtual-interrupt delivery and enforces this check
(like Hyper-V) this can cause a VM entry failure with hardware error
0x7, as seen in [1].

Call kvm_lapic_update_cr8_intercept() if apic_update_ppr() does not
find a deliverable interrupt (and thus does not raise KVM_REQ_EVENT).
Remove calls to kvm_lapic_update_cr8_intercept() on paths that end up in
apic_update_ppr(), as they now become redundant. This ensures that any
path that updates the guest's PPR also figures out if KVM needs to wait
for a TPR change (using TPR_THRESHOLD on VMX or CR8 intercepts on SVM).

Link: https://github.com/coconut-svsm/svsm/issues/1081
Tested-by: Stefano Garzarella <sgarzare@redhat.com>
Cc: stable@vger.kernel.org
Fixes: eb90f3417a0c ("KVM: vmx: speed up TPR below threshold vmexits")
Signed-off-by: Carlos López <clopez@suse.de>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260618174347.1981064-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: VMX: Grab vmcs12 on CR8 interception update iff vCPU is in guest mode

When updating CR8 intercepts, get vmcs12 if and only if the vCPU is in
guest mode so that a future change can have update CR8 intercepts during
vCPU creation, without running afoul of get_vmcs12()'s lockdep assertion.

  ------------[ cut here ]------------
  debug_locks && !(lock_is_held(&(&vcpu->mutex)->dep_map) || !refcount_read(&vcpu->kvm->users_count))
  WARNING: arch/x86/kvm/vmx/nested.h:61 at get_vmcs12 arch/x86/kvm/vmx/nested.h:60 [inline], CPU#0: syz.2.19/5879
  WARNING: arch/x86/kvm/vmx/nested.h:61 at vmx_update_cr8_intercept+0x3de/0x4e0 arch/x86/kvm/vmx/vmx.c:6879, CPU#0: syz.2.19/5879
  Modules linked in:
  CPU: 0 UID: 0 PID: 5879 Comm: syz.2.19 Not tainted syzkaller #0 PREEMPT(full)
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
  RIP: 0010:get_vmcs12 arch/x86/kvm/vmx/nested.h:60 [inline]
  RIP: 0010:vmx_update_cr8_intercept+0x3de/0x4e0 arch/x86/kvm/vmx/vmx.c:6879
  Call Trace:
   <TASK>
   apic_update_ppr arch/x86/kvm/lapic.c:984 [inline]
   kvm_lapic_reset+0x1c24/0x2980 arch/x86/kvm/lapic.c:3023
   kvm_vcpu_reset+0x44c/0x1bf0 arch/x86/kvm/x86.c:12986
   kvm_arch_vcpu_create+0x746/0x8b0 arch/x86/kvm/x86.c:12847
   kvm_vm_ioctl_create_vcpu+0x428/0x930 virt/kvm/kvm_main.c:4201
   kvm_vm_ioctl+0x893/0xd50 virt/kvm/kvm_main.c:5159
   vfs_ioctl fs/ioctl.c:51 [inline]
   __do_sys_ioctl fs/ioctl.c:597 [inline]
   __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583
   do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
   do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
   </TASK>

No functional change intended.

Reported-by: syzbot ci <syzbot+ci493c6d734b63e050@syzkaller.appspotmail.com>
Closes: https://lore.kernel.org/all/6a2adf3b.3b0a2d4e.8c8d1.0012.GAE@google.com
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260618174347.1981064-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: WARN (once) if RTC pending EOI tracking goes off the rails

WARN once if KVM's tracking for pending EOIs for Real-Time Clock IRQs goes
off the rails, as there's no reason to bug the host or risk a DoS due to
spamming dmesg with endless WARNs. Absolute worst case scenario, guest
time will go awry.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-ID: <20260618174527.1982333-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: WARN and fail kvm_set_irq() if a PIC or I/O APIC vector is invalid

WARN and return an error up the stack if the PIC or I/O APIC encounters an
invalid vector when injecting an IRQ, as there is no danger to the host and
thus no justification for potentially panicking the kernel. Don't bug the
VM either, as the risk of corrupting the guest is minuscule, and the guest
might even be completely tolerant of a lost interrupt.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-ID: <20260618185213.2019937-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Bug the VM, not the kernel, if the ISR count {under,over}flows

Bug the VM, not the host kernel, if KVM's ISR count {under,over}flows when
tracking in-flight ISRs. There is zero danger to the host if KVM messes up
its IRQ tracking.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-ID: <20260618185350.2020845-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86/mmu: Bug the VM, not the host kernel, if KVM write-protects upper SPTEs

Instead of bugging the host kernel, WARN and terminate the VM if KVM
attempts to write-protect at a level that cannot use leaf SPTEs.
There is no reason to bring down the entire host; even termininating
the VM is likely overkill, but in theory a missed write could corrupt
guest memory, so play it safe.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-ID: <20260618185641.2022368-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Replace BUG_ON() with WARN_ON_ONCE() on "bad" nested GPA translation

If KVM attempts to translate what it thinks is an L2 GPA with a non-nested
MMU, simply WARN and return the GPA, i.e. trust the MMU more than the
caller, as there is zero reason to potentially panic the host kernel just
because KVM misused an API.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-ID: <20260618185746.2023283-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: Replace guest-triggerable BUG_ON() in ioeventfd datamatch with get_unaligned()

Drop a BUG_ON() that has been reachable since it was first added, way back
in 2009, and instead use get_unaligned() to perform potentially-unaligned
accesses.

For a given store, KVM x86's emulator tracks the entire value in the
destination operand, x86_emulate_ctxt.dst.  If the destination is memory,
and the target splits multiple pages and/or is emulated MMIO, then KVM
handles each fragment independently.  E.g. on a page split starting at page
offset 0xffc, KVM writes 4 bytes to the first page, then the remaining
bytes to the second page, using ctxt->dst as the source for both (with
appropriate offsets).

If the destination splits a page *and* hits emulated MMIO on the second
page, then KVM will complete the write to the first page, then emulate the
MMIO access to the second page.  If there is a datamatch-enabled ioeventfd
at offset 0 of the second page, then KVM will process the remainder of the
store as a potential ioeventfd signal.

Putting it all together, if the guest emits a store that splits a page
starting at page offset N, and the second page has a datamatch-enabled
ioeventfd at offset 0, then KVM will check for datamatch using
&dst.valptr[N] as the source.  Due to dst (and thus dst.valptr) being
32-byte aligned, if N is not aligned to @len, the BUG_ON() fires.

E.g. with a 16-byte store at page offset 0xffc, to an ioeventfd of len 8,
all initial checks in ioeventfd_in_range() will succeed, and the BUG_ON()
fires due to @val being 4-byte aligned, but not 8-byte aligned.

  ------------[ cut here ]------------
  kernel BUG at arch/x86/kvm/../../../virt/kvm/eventfd.c:783!
  Oops: invalid opcode: 0000 [#1] SMP
  CPU: 0 UID: 1000 PID: 615 Comm: repro Not tainted 7.1.0-rc2-ff238429d1ea #365 PREEMPT
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:ioeventfd_write+0x6c/0x70 [kvm]
  Call Trace:
   <TASK>
   __kvm_io_bus_write+0x85/0xb0 [kvm]
   kvm_io_bus_write+0x53/0x80 [kvm]
   vcpu_mmio_write+0x66/0xf0 [kvm]
   emulator_read_write_onepage+0x12a/0x540 [kvm]
   emulator_read_write+0x109/0x2b0 [kvm]
   x86_emulate_insn+0x4f8/0xfb0 [kvm]
   x86_emulate_instruction+0x181/0x790 [kvm]
   kvm_mmu_page_fault+0x313/0x630 [kvm]
   vmx_handle_exit+0x18a/0x590 [kvm_intel]
   kvm_arch_vcpu_ioctl_run+0xc81/0x1c90 [kvm]
   kvm_vcpu_ioctl+0x2d5/0x970 [kvm]
   __x64_sys_ioctl+0x8a/0xd0
   do_syscall_64+0xb7/0x890
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x7f19c931a9bf
   </TASK>
  Modules linked in: kvm_intel kvm irqbypass
  ---[ end trace 0000000000000000 ]---

In a perfect world, the fix would be to simply delete the BUG_ON(), as KVM
x86 doesn't perform alignment checks on "normal" memory accesses at CPL0.
Sadly, C99 ruins all the fun; while the x86 architecture plays nice,
dereferencing an unaligned pointer directly is undefined behavior in C,
e.g. triggers splats when running with CONFIG_UBSAN_ALIGNMENT=y.

Fixes: d34e6b175e61 ("KVM: add ioeventfd support")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260612225241.678509-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ALSA: seq: Fix uninitialised heap leak in snd_seq_event_dup()

snd_seq_event_dup() copies an incoming event into a pool cell and, in
the UMP-enabled build, clears the trailing cell->ump.raw.extra word that
the memcpy() did not cover. The guard deciding whether to clear it
compares the copied size against sizeof(cell->event):

memcpy(&cell->ump, event, size);
if (size < sizeof(cell->event))
cell->ump.raw.extra = 0;

For a legacy (non-UMP) event, size == sizeof(struct snd_seq_event) ==
sizeof(cell->event), so the condition is false and the extra word keeps
stale data. The cell pool is allocated with kvmalloc() (not zeroed) and
cells are reused via a free list, so that word holds uninitialised heap
or leftover event data.

When such a cell is delivered to a UMP client (client->midi_version > 0)
that set SNDRV_SEQ_FILTER_NO_CONVERT -- so the legacy event reaches it
unconverted -- snd_seq_read() reads it out as the larger struct
snd_seq_ump_event and copies the stale word to user space, a 4-byte
kernel heap infoleak to an unprivileged /dev/snd/seq client.

Compare against sizeof(cell->ump) instead, so the trailing word is zeroed
for every event shorter than the UMP cell.

Fixes: 46397622a3fa ("ALSA: seq: Add UMP support")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: HyeongJun An <sammiee5311@gmail.com>
Link: https://patch.msgid.link/20260623233841.853326-1-sammiee5311@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

KVM: s390: Return failure in case of failure in kvm_s390_set_cmma_bits()

If the allocation of the bits array failed, kvm_s390_set_cmma_bits()
would return 0 instead of an error code.

Rework the function to use the __free() macros and thus simplify the
code flow; when the above mentioned allocation fails, simply return
-ENOMEM.

Fixes: e38c884df921 ("KVM: s390: Switch to new gmap")
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260623153331.233784-10-imbrenda@linux.ibm.com>

KVM: s390: selftests: Fix cmma selftest

The existing cmma selftest depended on the host allocating page tables
for all present memslots. Since the gmap rewrite, memory that is not
accessed by the guest might not have page tables allocated yet.

This caused the test to fail due to a mismatch in the assertion.

Fix by having the guest access also the second half of the test
memslot, thus guaranteeing that its page tables are present.

Fixes: e38c884df921 ("KVM: s390: Switch to new gmap")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260623153331.233784-9-imbrenda@linux.ibm.com>

KVM: s390: Fix cmma dirty tracking

It is possible that some guest memory areas have not been touched yet
when starting migration mode, and thus have no ptes allocated. Only
existing and allocated ptes should count toward the total of dirty cmma
entries.

When starting migration mode, enable the migration_mode flag
immediately, so that any subsequent ESSA will trap in the host and
cause cmma_dirty_pages to be increased as needed.
Subsequently, set the cmma_d bit on all existing cmma-clean PGSTEs,
increasing cmma_dirty_pages as needed. Skipping cmma-dirty pages
prevents double counting.

Conversely, when disabling migration mode, set cmma_dirty_pages to 0
and clear the cmma_d bit in all existing PGSTEs.

The invariant is that when migration mode is off, no PGSTE has its
cmma_d bit set, and cmma_dirty_pages is 0. kvm->slots_lock protects
kvm_s390_vm_start_migration() and kvm_s390_vm_stop_migration() from
each other and from kvm_s390_get_cmma_bits().

Also fix dat_get_cmma() to properly wrap around if the first attempt
reached the end of guest memory without finding cmma-dirty pages.

[ imbrenda: Moved kvm_s390_sync_request_broadcast() before gmap_set_cmma_all_dirty() ]

Fixes: e38c884df921 ("KVM: s390: Switch to new gmap")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260623153331.233784-8-imbrenda@linux.ibm.com>

KVM: s390: Fix locking in kvm_s390_set_mem_control()

Add the missing locking around dat_reset_cmma().

Fixes: e38c884df921 ("KVM: s390: Switch to new gmap")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260623153331.233784-7-imbrenda@linux.ibm.com>

KVM: s390: Fix handle_{sske,pfmf} under memory pressure

Under heavy memory pressure, handle_sske() and handle_pfmf() might
cause an endless loop if the mmu cache runs empty, the atomic
allocations fail, and the top-up function also fails. While quite
unlikely, that scenario is not impossible.

Fix the issue by not ignoring the return value of
kvm_s390_mmu_cache_topup(), and appropriately returning an error code
in case of failure.

Fixes: e38c884df921 ("KVM: s390: Switch to new gmap")
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260623153331.233784-6-imbrenda@linux.ibm.com>

KVM: s390: Fix code typo in gmap_protect_asce_top_level()

The correct length to pass to kvm_s390_get_guest_pages() is asce.tl + 1,
not asce.dt + 1. It was a typo, which, due to fortuitous circumstances,
did not cause bugs. It should nonetheless be fixed.

Fixes: e5f98a6899bd ("KVM: s390: Add some helper functions needed for vSIE")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260623153331.233784-5-imbrenda@linux.ibm.com>

KVM: s390: Do not set special large pages dirty

Special pages / folios should not be set dirty. This also applies to
large pages.

Add a missing check in gmap_clear_young_crste() to prevent setting the
large page dirty if it is a special page.

Fixes: a2c17f9270cc ("KVM: s390: New gmap code")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260623153331.233784-4-imbrenda@linux.ibm.com>

KVM: s390: Fix dat_peek_cmma() overflow

If userspace passes a start address that is out of bounds,
_dat_walk_gfn_range() will fail with -EFAULT, but state.end will not be
touched and will stay 0. This will cause *count to underflow and report
a very high number, and the function will end up erroneously reporting
success.

Fix by only setting *count if the end address is not smaller than the
starting address. This way invalid starting addresses will correctly
return -EFAULT and *count will correctly indicate that no values have
been returned.

Fixes: 7b368470e1a4 ("KVM: s390: KVM page table management functions: CMMA")
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260623153331.233784-3-imbrenda@linux.ibm.com>

s390/mm: Fix handling of _PAGE_UNUSED pte bit

The _PAGE_UNUSED softbit should not really be lying around. Its sole
purpose is to signal to try_to_unmap_one() and try_to_migrate_one()
that the page can be discarded instead of being moved / swapped.

KVM has no way to know why a page is being unmapped, so it sets the bit
on userspace ptes corresponding to unused guest pages every time they
get unmapped. KVM has no reasonable way to clear the bit once the page
is in use again.

While set_ptes() checks and clears the bit, other paths that set new
ptes did not. This led to used pages being thrown out as if they were
unused, causing guest corruption.

Fix the issue by clearing the _PAGE_UNUSED bit for present ptes in
set_pte(), i.e. whenever a present pte is getting set. The check in
set_ptes() is then redundant and can be removed.

Also fix gmap_helper_try_set_pte_unused() to only set the bit if the
pte is present; the _PAGE_UNUSED bit is only defined for present ptes
and thus should not be set for non-present ptes.

Fixes: c98175b7917f ("KVM: s390: Add gmap_helper_set_unused()")
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260623153331.233784-2-imbrenda@linux.ibm.com>

KVM: s390: Fix typo in UCONTROL documentation

Small typo noticed while writing the USER_OPEREXEC selftest.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260507200836.3500368-4-farman@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>

KVM: s390: selftests: Extended user_operexec tests

There is a possibility that the user_operexec capability
only works if facility bit 74 is enabled. This is now fixed,
but add a selftest to demonstrate that.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260507200836.3500368-3-farman@linux.ibm.com>

KVM: s390: Fix S390_USER_OPEREXEC enablement without STFLE 74

The KVM_CAP_S390_USER_OPEREXEC capability allows operation exceptions
to be forwarded to userspace. But the actual enablement at the hardware
level occurs in kvm_arch_vcpu_postcreate(), and only if STFLE.74 or
user_instr0 are enabled. The latter is associated with a separate
capability (KVM_CAP_S390_USER_INSTR0), so the only way this happens
for the USER_OPEREXEC capability is if STFLE.74 is enabled. KVM
unconditionally enables this bit in kvm_arch_init_vm(), but the guest
could disable it from the CPU model and thus ignore this capability.

Add USER_OPEREXEC to the check in kvm_arch_vcpu_postcreate(), such that
either capability would enable this type of exception.

Fixes: 8e8678e740ec ("KVM: s390: Add capability that forwards operation exceptions")
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
[Fixed patch title, as recommended by frankja@linux.ibm.com]
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260507200836.3500368-2-farman@linux.ibm.com>

apparmor: advertise the tcp fast open fix is applied

The fix for tcp-fast-open ensures that the connect permission is being
mediated correctly but it didn't add an artifact to the feature set to
advertise the fix is available. Add an artifact so that the test suite
can identify if the fix has not been properly applied or a new
unexpected regression has occurred.

Fixes: 4d587cd8a7215 ("apparmor: mediate the implicit connect of TCP fast open sendmsg")
Signed-off-by: John Johansen <john.johansen@canonical.com>

eth: fbnic: fix ordering of heartbeat vs ownership

When requesting ownership of the NIC (MAC/PHY control), we set up
the heartbeat to look stale:

  /* Initialize heartbeat, set last response to 1 second in the past
   * so that we will trigger a timeout if the firmware doesn't respond
   */
  fbd->last_heartbeat_response = req_time - HZ;
  fbd->last_heartbeat_request = req_time;

The response handler then sets:

  fbd->last_heartbeat_response = jiffies;

for which we wait via:

  fbnic_fw_init_heartbeat() -> fbnic_fw_heartbeat_current()

The scheme is a bit odd, but it should work in principle.

Fix the ordering of operations. We have to set up the stale heartbeat
before we send the message. Otherwise if the response is very fast
we will override it. This triggers on QEMU if we run on the core
that handles the IRQ, and results in ndo_open failing with ETIMEDOUT.

The change in ordering doesn't impact releasing the ownership.
Both ndo_stop and heartbeat check are under rtnl_lock.

Fixes: 20d2e88cc746 ("eth: fbnic: Add initial messaging to notify FW of our presence")
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20260622154753.827506-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'ipv6-fix-error-handling-in-disable_ipv6-sysctl'

Fernando Fernandez Mancera says:

====================
ipv6: fix error handling in disable_ipv6 sysctl

While working on a different IPv6 patch series I have spotted multiple
minor bugs around sysctl error handling and notifications. In general,
they are not serious issues.

In addition, there is one more issue in forwarding sysctl as it does not
check for CAP_NET_ADMIN for the namespace. I am keeping that patch out
of this series and I am aiming it at the net-next tree once it re-opens.

During v3, Ido's pointed out that it is unnecessary to reset the
position pointer when the return value is negative as at
new_sync_write() the ppos is only advanced when ret return value is
positive. That means we can get rid of that operation in ipv4/ipv6
sysctls. That is going to be sent to net-next too.
====================

Link: https://patch.msgid.link/20260622130857.5115-1-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: fix missing notification for ignore_routes_with_linkdown

When changing the ignore_routes_with_linkdown sysctl for a specific
interface, the RTM_NEWNETCONF netlink notification was not being emitted
to userspace. Fix this by emitting the notification when needed.

In addition, fix bogus return value for successful "all" and specific
interface write operation leading to a wrong reset of the position
pointer.

Fixes: 35103d11173b ("net: ipv6 sysctl option to ignore routes when nexthop link is down")
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260622130857.5115-7-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: fix state corruption during proxy_ndp sysctl restart

When handling proxy_ndp, if rtnl_net_trylock() fails, the operation is
retried but as the value was already modified by the initial
proc_dointvec() call, the restarted syscall will read the newly modified
value as the 'old' state.

Fix this by taking the RTNL lock before parsing the input value if the
operation is a write.

Fixes: c92d5491a6d9 ("netconf: add support for IPv6 proxy_ndp")
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260622130857.5115-6-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: fix error handling in disable_policy sysctl

When writing to the disable_policy sysctl, if proc_dointvec() fails to
parse the input, it returns a negative error code. The current
implementation is resetting the position argument even if an error
occurred during proc_dointvec() and not only during sysctl restart.

Fix this by checking the return value of proc_dointvec() and returning
early on failure.

Fixes: df789fe75206 ("ipv6: Provide ipv6 version of "disable_policy" sysctl")
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260622130857.5115-5-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: fix error handling in forwarding sysctl

When writing to the forwarding sysctl, if proc_dointvec() fails to parse
the input, it returns a negative error code. The current implementation
is overwriting that error for write operations.

This results in a silent failure, it returns a successful write although
the configuration was not modified at all. When modifying the "all"
variant it can also modify the configuration of existing interfaces to
the wrong value.

Fix this by checking the return value of proc_dointvec() and returning
early on failure. In addition, adjust return code of
addrconf_fixup_forwarding() for successful operation.

Fixes: b325fddb7f86 ("ipv6: Fix sysctl unregistration deadlock")
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260622130857.5115-4-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: fix error handling in ignore_routes_with_linkdown sysctl

When writing to the ignore_routes_with_linkdown sysctl, if
proc_dointvec() fails to parse the input, it returns a negative error
code. The current implementation is overwriting that error for write
operations.

This results in a silent failure, it returns a successful write although
the configuration was not modified at all. When modifying the "all"
variant it can also modify the configuration of existing interfaces to
the wrong value.

Fix this by checking the return value of proc_dointvec() and returning
early on failure.

Fixes: 35103d11173b ("net: ipv6 sysctl option to ignore routes when nexthop link is down")
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260622130857.5115-3-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: fix error handling in disable_ipv6 sysctl

When writing to the disable_ipv6 sysctl, if proc_dointvec() fails to
parse the input, it returns a negative error code. The current
implementation is overwriting that error for write operations.

This results in a silent failure, it returns a successful write although
the configuration was not modified at all. When modifying the "all"
variant it can also modify the configuration of existing interfaces to
the wrong value.

Fix this by checking the return value of proc_dointvec() and returning
early on failure.

Fixes: 56d417b12e57 ("IPv6: Add 'autoconf' and 'disable_ipv6' module parameters")
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260622130857.5115-2-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

MAINTAINERS: Orphan SUNPLUS ETHERNET DRIVER

I have left Sunplus and no longer have access to the relevant hardware
to test or maintain this driver. Mark the driver as orphaned.

Signed-off-by: Wells Lu <wellslutw@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260622180721.28334-1-wellslutw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: au1000: move free_irq out of the close-time spinlocked section

au1000_close() calls free_irq() while aup->lock is still held with
spin_lock_irqsave(). free_irq() can sleep because it takes the IRQ
descriptor request mutex, so it does not belong inside the close-time
spinlocked section.

This was found by our static analysis tool and then confirmed by manual
review of the in-tree au1000_close() .ndo_stop path. The reviewed path
keeps aup->lock held across the MAC reset, queue stop and
free_irq(dev->irq, dev).

A directed runtime validation kept that ndo_stop carrier and the same
free_irq(dev->irq, dev) operation under the driver lock. Lockdep reported
"BUG: sleeping function called from invalid context" and "Invalid wait
context" while free_irq() was taking desc->request_mutex, with
au1000_close() and free_irq() on the stack.

Drop aup->lock before freeing the IRQ. The protected close-time work still
stops the device and queue before IRQ teardown, but the sleepable IRQ core
path now runs outside the spinlocked section.

Signed-off-by: Runyu Xiao <runyu.xiao@seu.edu.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260619151816.1144289-1-runyu.xiao@seu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sctp: fix err_chunk memory leaks in INIT handling

When sctp_verify_init() encounters unrecognized parameters, it allocates an
err_chunk to report them. However, this chunk is leaked in several code
paths:

1. In sctp_sf_do_5_1B_init(), if security_sctp_assoc_request() fails after
   sctp_verify_init() has populated err_chunk, the function returns
   immediately without freeing it.

2. In sctp_sf_do_unexpected_init(), the same leak occurs on the
   security_sctp_assoc_request() failure path.

3. In sctp_sf_do_unexpected_init(), on the success path after copying
   unrecognized parameters to the INIT-ACK, the function returns without
   freeing err_chunk, unlike sctp_sf_do_5_1B_init() which properly frees
   it.

Fix all three leaks by adding sctp_chunk_free(err_chunk) calls before
returning in the error paths and on the success path in
sctp_sf_do_unexpected_init().

Fixes: c081d53f97a1 ("security: pass asoc to sctp_assoc_request and sctp_sk_clone")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Sashiko <sashiko-bot@kernel.org>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/0656704f1b0158287c98aec09ba36c83e4a537ab.1781970534.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: cls_api: Handle TC_ACT_CONSUMED in tcf_qevent_handle

tcf_classify() can return TC_ACT_CONSUMED while the skb is held by the
defragmentation engine (e.g. act_ct on out-of-order fragments). When
that happens the skb is no longer owned by the caller and must not be
touched again.

tcf_qevent_handle() did not handle TC_ACT_CONSUMED: it fell through the
switch and returned the skb to the caller as if classification had
passed. The only qdisc that wires up qevents today is RED, via three call sites
(qe_mark on RED_PROB_MARK/HARD_MARK, qe_early_drop on congestion_drop)
red_enqueue() was continuing to operate on an skb it no longer owns  in this
case -- enqueueing it, dropping it, or updating statistics. Resulting in a UAF.

  tc qdisc add dev eth0 root handle 1: red ... qevent early_drop block 10
  tc filter add block 10 ... action ct

  (with ct defrag enabled and traffic that produces out-of-order
  fragments, e.g. a fragmented UDP stream)

Handle TC_ACT_CONSUMED in tcf_qevent_handle() the same way the ingress
and egress fast paths do: treat it as stolen and return NULL without
touching the skb. Unlike the TC_ACT_STOLEN case, the skb must not be
dropped/freed here, as it is no longer owned by us.

Fixes: 3f14b377d01d ("net/sched: act_ct: fix skb leak and crash on ooo frags")
Reported-by: Zero Day Initiative <zdi-disclosures@trendmicro.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20260620130749.226642-1-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'drop-skb-metadata-before-lwt-encapsulation'

Jakub Sitnicki says:

====================
Drop skb metadata before LWT encapsulation

See description for patch 1.
====================

Link: https://patch.msgid.link/20260619-bpf-lwt-drop-skb-metadata-v3-0-71d6a33ab76b@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/bpf: Add LWT encap tests for skb metadata

Test that an LWT encapsulation does not silently corrupt XDP metadata
sitting in the skb headroom. Exercise all three LWT dispatch paths:

- BPF LWT xmit prog reserves headroom on the LWT .xmit redirect,
- mpls pushes an MPLS label on the LWT .xmit redirect,
- seg6 in encap mode runs on the LWT .input redirect,
- ioam6 encap inserts an IOAM Hop-by-Hop option on LWT .output redirect.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://patch.msgid.link/20260619-bpf-lwt-drop-skb-metadata-v3-2-71d6a33ab76b@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: lwtunnel: Drop skb metadata before LWT encapsulation

skb metadata is meant for passing information between XDP and TC. It lives
in the skb headroom, immediately before skb->data. LWT programs cannot
access the __sk_buff->data_meta pseudo-pointer to metadata.

However, LWT encapsulation prepends outer headers, moving skb->data back
over the headroom where the metadata sits. On an RX-originated (forwarded)
packet that still carries XDP metadata this goes wrong in two different
ways, depending on the encap type:

1. Non-BPF LWT encaps (mpls, seg6, ioam6 ...) call skb_push()/skb_pull()
   and silently overwrite the metadata that sits in the headroom.

2) BPF LWT xmit calls bpf_skb_change_head(), which uses skb_data_move().
   That helper expects metadata immediately before skb->data. But since
   the IP output path runs LWT xmit before neighbour output has built
   the outgoing L2 header, for forwarded packets skb->data points at the
   L3 header while skb_mac_header() still points at the old L2 header.
   skb_data_move() sees metadata ending at skb_mac_header(), not before
   skb->data, warns and clears metadata:

  WARNING: CPU: 21 PID: 454557 at include/linux/skbuff.h:4609 skb_data_move+0x47/0x90
  CPU: 21 UID: 0 PID: 454557 Comm: napi/iconduit-g Tainted: G           O        6.18.21 #1
  RIP: 0010:skb_data_move+0x47/0x90
  Call Trace:
   <IRQ>
   bpf_skb_change_head+0xe6/0x1a0
   bpf_prog_...+0x213/0x2e3
   run_lwt_bpf.isra.0+0x1d3/0x360
   bpf_xmit+0x46/0xe0
   lwtunnel_xmit+0xa1/0xf0
   ip_finish_output2+0x1e7/0x5e0
   ip_output+0x63/0x100
   __netif_receive_skb_one_core+0x85/0xa0
   process_backlog+0x9c/0x150
   __napi_poll+0x2b/0x190
   net_rx_action+0x40b/0x7f0
   handle_softirqs+0xd2/0x270
   do_softirq+0x3f/0x60
   </IRQ>

That is what happens, as for how to fix it - a received packet that
carries metadata can reach an encap through any of the three LWT
redirect modes:

  LWTUNNEL_STATE_INPUT_REDIRECT
   ip6_rcv_finish
     dst_input
       lwtunnel_input

  LWTUNNEL_STATE_OUTPUT_REDIRECT
   ip6_rcv_finish
     dst_input
       ip6_forward
         ip6_forward_finish
           dst_output
             lwtunnel_output

  LWTUNNEL_STATE_XMIT_REDIRECT
   ip6_rcv_finish
     dst_input
       ip6_forward
         ip6_forward_finish
           dst_output
             ip6_output
               ip6_finish_output
                 ip6_finish_output2
                   lwtunnel_xmit

Every encap funnels through the three LWT dispatch helpers, so drop the
metadata there, right before handing the skb to the encap op. This
single chokepoint covers all encap types and all three redirect modes:

  - lwtunnel_input():  seg6, rpl, ila, seg6_local
  - lwtunnel_output(): ioam6
  - lwtunnel_xmit():   mpls, LWT BPF xmit

Alternatively, we could clear the metadata right after TC ingress hook.
That would require a compromise, however. Metadata would become
inaccessible from TC egress (in setups where it actually reaches the
hook it tact, that is without any L2 tunnels on path).

Fixes: 8989d328dfe7 ("net: Helper to move packet data and metadata after skb_push/pull")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://patch.msgid.link/20260619-bpf-lwt-drop-skb-metadata-v3-1-71d6a33ab76b@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'nfs-for-7.2-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
"New features:
   - XPRTRDMA: Decouple req recycling from RPC completion
   - NFS: Expose FMODE_NOWAIT for read-only files

  Bugfixes:
   - SUNRPC:
      - Fix sunrpc sysfs error handling
      - Fix uninitialized xprt_create_args structure
   - XPRTRDMA:
      - Harden connect and reply handling
   - NFS:
      - Fix EOF updates after fallocate/zero-range
      - Keep PG_UPTODATE clear after read errors in page groups
      - Use nfsi->rwsem to protect traversal of the file lock list
      - Prevent resource leak in nfs_alloc_server()
   - NFSv4:
      - Clear exception state on successful mkdir retry
      - Don't skip revalidate when holding a dir delegation and attrs are stale
   - pNFS:
      - Fix use-after-free in pnfs_update_layout()
      - Defer return_range callbacks until after inode unlock
      - Fix LAYOUTCOMMIT retry loop on OLD_STATEID
      - Reject zero-length r_addr in nfs4_decode_mp_ds_addr
   - NFS/flexfiles:
      - Reject zero-length filehandle version arrays
      - Fix checking if a layout is striped
      - Fixes for honoring FF_FLAGS_NO_IO_THRU_MDS

  Other cleanups and improvements:
   - Remove the fileid field from struct nfs_inode
   - Move long-delayed xprtrdma work onto the system_dfl_long_wq
   - Convert xprtrdma send buffer free list to an llist
   - Show "<redacted>" for cert_serial and privkey_serial mount options"

* tag 'nfs-for-7.2-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (42 commits)
  NFS: Use common error handling code in nfs_alloc_server()
  NFS: Prevent resource leak in nfs_alloc_server()
  NFSv4/pNFS: reject zero-length r_addr in nfs4_decode_mp_ds_addr
  nfs: don't skip revalidate on directory delegation when attrs flagged stale
  xprtrdma: Return sendctx slot after Send preparation failure
  xprtrdma: Repost Receive buffers for malformed replies
  xprtrdma: Sanitize the reply credit grant after parsing
  xprtrdma: Fix bcall rep leak and unbounded peek
  xprtrdma: Resize reply buffers before reposting receives
  xprtrdma: Check frwr_wp_create() during connect
  xprtrdma: Initialize re_id before removal registration
  xprtrdma: Fix ep kref imbalance on ADDR_CHANGE
  xprtrdma: Convert send buffer free list to llist
  NFS: correct CONFIG_NFS_V4 macro name in #endif comment
  nfs: use nfsi->rwsem to protect traversal of the file lock list
  NFSv4.1/pNFS: fix LAYOUTCOMMIT retry loop on OLD_STATEID
  nfs: expose FMODE_NOWAIT for read-only files
  nfs: add nowait version of nfs_start_io_direct
  NFSv4/flexfiles: honor FF_FLAGS_NO_IO_THRU_MDS in pg_get_mirror_count_write
  NFSv4/flexfiles: honor FF_FLAGS_NO_IO_THRU_MDS on fatal DS connect errors
  ...

Merge tag 'f2fs-for-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
"The changes primarily focus on filesystem error reporting, reducing
  memory footprint by reverting in-memory data structures used for
  runtime validation, honoring FDP hints, and adding trace and debug
  logs. In addition, there are critical bug fixes resolving
  out-of-bounds read vulnerabilities in inline directory and ACL
  handling, potential deadlocks in balance_fs, use-after-free issues in
  atomic writes, and false data/node type assignments in large sections.

  Enhancements:
   - Revert  in-memory sit version and block bitmaps
   - support to report fserror
   - add trace_f2fs_fault_report
   - add iostat latency tracking for direct IO
   - add logs in f2fs_disable_checkpoint()
   - honor per-I/O write streams for direct writes
   - map data writes to FDP streams
   - skip inode folio lookup for cached overwrite
   - skip direct I/O iostat context when disabled
   - revert "check in-memory block bitmap"
   - revert "check in-memory sit version bitmap"

  Fixes:
   - optimize representative type determination in GC
   - fix incorrect FI_NO_EXTENT handling in __destroy_extent_node()
   - fix potential deadlock in f2fs_balance_fs()
   - fix potential deadlock in gc_merge path of f2fs_balance_fs()
   - atomic: fix UAF issue on f2fs_inode_info.atomic_inode
   - fix missing read bio submission on large folio error
   - pass correct iostat type for single node writes
   - fix to do sanity check on f2fs_get_node_folio_ra()
   - validate orphan inode entry count
   - keep atomic write retry from zeroing original data
   - read COW data with the original inode during atomic write
   - validate inline dentry name lengths before conversion
   - validate dentry name length before lookup compares it
   - reject setattr size changes on large folio files
   - revert "remove non-uptodate folio from the page cache in move_data_block"
   - validate ACL entry sizes in f2fs_acl_from_disk()
   - bound i_inline_xattr_size for non-inline-xattr inodes
   - fix listxattr handling of corrupted xattr entries
   - fix to round down start offset of fallocate for pin file"

* tag 'f2fs-for-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (42 commits)
  f2fs: fix to round down start offset of fallocate for pin file
  f2fs: fix listxattr handling of corrupted xattr entries
  f2fs: skip direct I/O iostat context when disabled
  f2fs: remove unneeded f2fs_is_compressed_page()
  f2fs: avoid unnecessary fscrypt_finalize_bounce_page()
  f2fs: avoid unnecessary sanity check on ckpt_valid_blocks
  f2fs: misc cleanup in f2fs_record_stop_reason()
  f2fs: fix wrong description in printed log
  f2fs: bound i_inline_xattr_size for non-inline-xattr inodes
  f2fs: validate ACL entry sizes in f2fs_acl_from_disk()
  Revert "f2fs: remove non-uptodate folio from the page cache in move_data_block"
  f2fs: Split f2fs_write_end_io()
  f2fs: Rename f2fs_post_read_wq into f2fs_wq
  f2fs: Prepare for supporting delayed bio completion
  f2fs: reject setattr size changes on large folio files
  f2fs: validate dentry name length before lookup compares it
  f2fs: validate inline dentry name lengths before conversion
  f2fs: read COW data with the original inode during atomic write
  f2fs: skip inode folio lookup for cached overwrite
  f2fs: keep atomic write retry from zeroing original data
  ...

Merge tag 'x86-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Ingo Molnar:

- Prevent NULL dereference on theoretical missing IO bitmap (Li
RongQing)

* tag 'x86-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ioperm: Prevent NULL dereference on theoretical missing IO bitmap

Merge tag 'timers-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc timer fixes from Ingo Molnar:

- Fix timekeeping locking order bug in the timekeeping init code
   (Mikhail Gavrilov)

- Fix u64 multiplication bug in the posix-cpu-timers code on 32-bit
   kernels (Zhan Xusheng)

- Fix macro name in comment block (Ethan Nelson-Moore)

- Fix off-by-one bug in the compat settimeofday() usecs validation code
   (Wang Yan)

* tag 'timers-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  time: Fix off-by-one in compat settimeofday() usec validation
  hrtimer: Correct CONFIG_NO_HZ_COMMON macro name in comment
  posix-cpu-timers: Use u64 multiplication in update_rlimit_cpu()
  timekeeping: Register default clocksource before taking tk_core.lock

Merge tag 'smp-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc CPU hotplug fixes from Ingo Molnar:

- Fix CPU hotplug error handling rollback bug (Bradley Morgan)

- Fix possible output OOB write bug in the sysfs hotplug states
   printing code (Bradley Morgan)

* tag 'smp-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  cpu: hotplug: Bound hotplug states sysfs output
  cpu: hotplug: Preserve per instance callback errors

Merge tag 'perf-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf event fix from Ingo Molnar:

- Fix event::addr_filter_ranges lifetime bug (Peter Zijlstra)

* tag 'perf-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix addr_filter_ranges lifetime

Merge tag 'ipsec-2026-06-22' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2026-06-22

1) xfrm: use compat translator only for u64 alignment mismatch
   Gate the XFRM_USER_COMPAT translator on COMPAT_FOR_U64_ALIGNMENT
   so 32-bit compat tasks on arches whose 32-bit ABI already matches
   the native 64-bit layout are no longer rejected with -EOPNOTSUPP.
   From Sanman Pradhan.

2) net: af_key: initialize alg_key_len for IPComp states
   Initialize the alg_key_len to 0 in the IPComp branch of
   pfkey_msg2xfrm_state() so an uninitialized value cannot drive
   xfrm_alg_len() into a slab-out-of-bounds kmemdup during
   XFRM_MSG_MIGRATE. From Zijing Yin.

3) xfrm: Fix dev use-after-free in xfrm async resumption
   Stash the original skb->dev and extend the RCU critical section
   across xfrm_rcv_cb() and transport_finish() to prevent a
   tunnel-device UAF and original-device refcount leak when a
   callback replaces skb->dev. From Dong Chenchen.

4) xfrm: Fix xfrm state cache insertion race
   Move the state-validity check inside xfrm_state_lock in the
   input state cache insertion path so a state cannot be killed
   between the check and the insert. From Herbert Xu.

5) xfrm: annotate data-races around xfrm_policy_count[] and xfrm_policy_default[]
   Add READ_ONCE()/WRITE_ONCE() annotations on xfrm_policy_count
   and xfrm_policy_default to silence the KCSAN data race reported
   on net->xfrm.policy_count. From Eric Dumazet.

6) espintcp: use sk_msg_free_partial to fix partial send
   Replace the manual skmsg accounting in espintcp with
   sk_msg_free_partial() so the skmsg stays consistent on every
   iteration and the partial-send accounting bugs go away.
   From Sabrina Dubroca.

7) xfrm: validate selector family and prefixlen during match
   Reject mismatched address families in xfrm_selector_match() and
   bound prefixlen in addr4_match()/addr_match() to prevent the
   shift-out-of-bounds syzbot reported when an AF_UNSPEC selector
   with a large prefixlen is matched against an IPv4 flow.
   From Eric Dumazet.

* tag 'ipsec-2026-06-22' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  xfrm: validate selector family and prefixlen during match
  espintcp: use sk_msg_free_partial to fix partial send
  xfrm: annotate data-races around xfrm_policy_count[] and xfrm_policy_default[]
  xfrm: Fix xfrm state cache insertion race
  xfrm: Fix dev use-after-free in xfrm async resumption
  net: af_key: initialize alg_key_len for IPComp states
  xfrm: use compat translator only for u64 alignment mismatch
====================

Link: https://patch.msgid.link/20260622075726.29685-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'locking-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fix from Ingo Molnar:

- Fix the incorrect RCU protection in rt_spin_unlock() (Thomas
Gleixner)

* tag 'locking-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rt: Fix the incorrect RCU protection in rt_spin_unlock()

Merge tag 'core-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc core fixes from Ingo Molnar:

- Fix an MM-CID race that can cause an OOB write (Rik van Riel)

- Fix a debugobjects OOM handling race (Thomas Gleixner)

* tag 'core-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debugobjects: Plug race against a concurrent OOM disable
sched/mmcid: Fix OOB clear_bit when CID is MM_CID_UNSET in fixup path

Merge tag 'irq-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc irqchip driver fixes from Ingo Molnar:

- Fix indexing bug in the Crossbar irqchip driver (Bhargav Joshi)

- Fix a parent domain resource leak in the Crossbar irqchip driver
   (Bhargav Joshi)

- Fix resource leak in the ImgTec PDC irqchip driver's exit logic
   (Qingshuang Fu)

- Fix macro name in comment block (Ethan Nelson-Moore)

* tag 'irq-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/msi: Correct CONFIG_PCI_MSI_ARCH_FALLBACKS macro name in comment
  irqchip/imgpdc: Fix resource leak, add missing chained handler cleanup on remove
  irqchip/crossbar: Fix parent domain resource leak
  irqchip/crossbar: Use correct index in crossbar_domain_free()

ksmbd: fix kernel-doc warnings in smb2_lease_break_noti()

kernel test robot report missing kernel-doc descriptions for the 'wait_ack'
and 'inc_epoch' parameters of smb2_lease_break_noti():

  Warning: fs/smb/server/oplock.c:937 function parameter 'wait_ack' not
   described in 'smb2_lease_break_noti'
  Warning: fs/smb/server/oplock.c:937 function parameter 'inc_epoch' not
   described in 'smb2_lease_break_noti'

Document both parameters to silence the warnings.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

ksmbd: fix inconsistent indenting warnings

Detected by Smatch.

fs/smb/server/oplock.c:1446 smb_grant_oplock()
warn: inconsistent indenting

Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

ksmbd: validate NTLMv2 response before updating session key

ksmbd_auth_ntlmv2() derives the NTLMv2 session key into
sess->sess_key before it verifies the NTLMv2 response.
ksmbd_decode_ntlmssp_auth_blob() then continues into KEY_XCH even
when ksmbd_auth_ntlmv2() failed.

With SMB3 multichannel binding, the failed authentication operates on
an existing session and the session setup error path does not expire
binding sessions. A client can send a binding session setup with a
bad NT proof and KEY_XCH and still modify sess->sess_key before
STATUS_LOGON_FAILURE is returned.

Relevant path:

  smb2_sess_setup()
    -> conn->binding = true
    -> ntlm_authenticate()
       -> session_user()
       -> ksmbd_decode_ntlmssp_auth_blob()
          -> ksmbd_auth_ntlmv2()
             -> calc_ntlmv2_hash()
             -> hmac_md5_usingrawkey(..., sess->sess_key)
             -> crypto_memneq() returns mismatch
          -> KEY_XCH arc4_crypt(..., sess->sess_key, ...)
    -> out_err without expiring the binding session

Derive the base session key into a local buffer and copy it to
sess->sess_key only after the proof matches. Return immediately on
authentication failure so KEY_XCH is only processed after successful
authentication.

Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Fixes: f9929ef6a2a5 ("ksmbd: add support for key exchange")
Cc: stable@vger.kernel.org
Signed-off-by: Haofeng Li <lihaofeng@kylinos.cn>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

Merge tag 'dmaengine-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine

Pull dmaengine updates from Vinod Koul:
"Core:
   - New devm_of_dma_controller_register() API
   - Refactor devm_dma_request_chan() API

  New Support:
   - Loongson Multi-Channel DMA controller support
   - Renesas RZ/{T2H,N2H} support
   - Dw CV1800B DMA support
   - Switchtec DMA engine driver

U pdates:
   - Xilinx AXI dma binding conversion
   - Renesas CHCTRL register read updates
   - AMD MDB Endpoint and non-LL mode Support
   - AXI dma handling of SW and HW cyclic transfers termination
   - Intel ioatdma and idxd driver updates"

* tag 'dmaengine-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (62 commits)
  dt-bindings: dma: snps,dw-axi-dmac: Add fallback compatible for CV1800B
  MAINTAINERS: dmaengine/ti: Remove myself and add Vignesh as maintainer
  dmaengine: qcom: Unify user-visible "Qualcomm" name
  dt-bindings: dma: qcom,gpi: Document GPI DMA engine for Shikra SoC
  dmaengine: qcom: hidma: use sysfs_emit() in sysfs show callbacks
  dmaengine: dw-axi-dmac: fix PM for system sleep and channel alloc
  dmaengine: dw-axi-dmac: drop redundant DMAC enable in block start
  dmaengine: altera-msgdma: Use memcpy_toio for descriptor FIFO writes
  dt-bindings: dma: fsl-edma: add dma-channel-mask property description
  dmaengine: tegra: Fix burst size calculation
  dmaengine: iop32x-adma: Remove a leftover header file
  dmaengine: dma-axi-dmac: use DMA pool to manange DMA descriptor
  dmaengine: dma-axi-dmac: Drop struct clk from main struct
  dmaengine: dma-axi-dmac: Properly free struct axi_dmac_desc
  dmaengine: Fix possible use after free
  dmaengine: dw-edma: Add spinlock to protect DONE_INT_MASK and ABORT_INT_MASK
  dmaengine: dw-edma-pcie: Reject devices without driver data
  dmaengine: sh: rz-dmac: Add DMA ACK signal routing support
  irqchip/renesas-rzv2h: Add DMA ACK signal routing support
  dmaengine: dw-edma: Remove dw_edma_add_irq_mask()
  ...

Merge tag 'phy-for-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy

Pull phy updates from Vinod Koul:
"Bunch of new driver, device support in existing drivers/binding and
  few updates to existing drivers

  New Support:
   - Qualcomm Eliza QMP PHY, Eliza Synopsys eUSB2 support, Eliza PCIe
     phy support, Nord QMP UFS PHY, IPQ5210 USB3 PHY support
   - Econet EN751221 and EN7528 PCIe phy support
   - NXPs TJA1145 CAN transceiver phy support
   - TI DS125DF111 retimer phy support
   - Rockchip RK3528 usb phy support
   - TI J722S phy support
   - Axiado eMMC PHY driver
   - EyeQ5 Ethernet PHY driver
   - Generic PHY driver for Lynx 10G SerDes
   - Spacemit K3 USB2 PHY support

  Updates:
   - Tomi helping maintian zynqmp phys
   - lynx phy updates to support 25GBASER
   - Rockchip GRF for RK3568/RV1108 support
   - Qualcomm QSERDES COM v2 support"

* tag 'phy-for-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: (87 commits)
  phy: rockchip: inno-usb2: Add missing clkout_ctl_phy kerneldoc
  phy: Move MODULE_DEVICE_TABLE next to the table itself
  phy: add basic support for NXPs TJA1145 CAN transceiver
  dt-bindings: phy: add support for NXPs TJA1145 CAN transceiver
  phy: freescale: phy-fsl-imx8qm-lvds-phy: Fix missing pm_runtime_disable() on probe error path
  dt-bindings: phy: qcom,qmp-usb: Add ipq5210 USB3 PHY
  dt-bindings: phy: qcom,qusb2: Document IPQ5210 compatible
  phy: freescale: phy-fsl-imx8qm-lvds-phy: Use synchronous PM runtime put in reset
  MAINTAINERS: expand Lynx 28G entry to cover Lynx 10G SerDes
  phy: lynx-10g: new driver
  dt-bindings: phy: lynx-10g: initial document
  phy: lynx-28g: improve phy_validate() procedure
  phy: lynx-28g: optimize read-modify-write operation
  phy: lynx-28g: add support for big endian register maps
  phy: lynx-28g: common probe() and remove()
  phy: lynx-28g: make lynx_28g_pll_read_configuration() callable per PLL
  phy: lynx-28g: move struct lynx_info definitions downwards
  phy: lynx-28g: provide default lynx_lane_supports_mode() implementation
  phy: lynx-28g: generalize protocol converter accessors
  phy: lynx-28g: common lynx_pll_get()
  ...

Merge branch 'pci/misc'

- Fix typos in documentation (josh ziegler)

- Use FIELD_MODIFY() instead of open-coding it (Hans Zhang)

* pci/misc:
PCI: Use FIELD_MODIFY() instead of open-coding it
Documentation: PCI: Fix typos

Merge branch 'pci/controller/misc'

- Remove unused gpio.h include from amd-mdb, designware-plat, fu740,
  visconti drivers (Andy Shevchenko)

* pci/controller/misc:
  PCI: visconti: Drop unused include
  PCI: fu740: Drop unused include
  PCI: designware-plat: Drop unused include
  PCI: amd-mdb: Use the right GPIO header

Merge branch 'pci/controller/tlp_macros'

- Add common TLP Type macros (MRd/Wr, IORd/Wr, CfgRd/Wr 0, CfgRd/Wr 1, Msg)
  and use them in aspeed, cadence, dwc, mediatek, tegra drivers (Hans
  Zhang)

* pci/controller/tlp_macros:
  PCI: cadence: Use common TLP type macros
  PCI: dwc: Replace ATU type macros with common TLP type macros
  PCI: Add common TLP type macros and convert aspeed/mediatek

Merge branch 'pci/controller/rescan_lock'

- Protect root bus removal with rescan lock in altera, brcmstb, cadence,
  dwc, iproc, mediatek, plda, rockchip to prevent use-after-free or crashes
  when racing with sysfs rescan or hotplug (Hans Zhang)

* pci/controller/rescan_lock:
  PCI: rockchip: Protect root bus removal with rescan lock
  PCI: plda: Protect root bus removal with rescan lock
  PCI: mediatek: Protect root bus removal with rescan lock
  PCI: iproc: Protect root bus removal with rescan lock
  PCI: dwc: Protect root bus removal with rescan lock
  PCI: cadence: Protect root bus removal with rescan lock
  PCI: brcmstb: Protect root bus removal with rescan lock
  PCI: altera: Protect root bus removal with rescan lock

Merge branch 'pci/controller/link_train_delay'

- Add pci_host_common_link_train_delay() for the mandatory delay after
  > 5GT/s Link training completes and use it for cadence HPA, j721e, LGA;
  dwc; aardvark, mediatek-gen3, rzg3s (Hans Zhang)

* pci/controller/link_train_delay:
  PCI: rzg3s-host: Use common pci_host_common_link_train_delay() helper
  PCI: mediatek-gen3: Add 100 ms delay after link up
  PCI: aardvark: Add 100 ms delay after link training
  PCI: dwc: Use common pci_host_common_link_train_delay() helper
  PCI: cadence-hpa: Add post-link delay
  PCI: cadence: Add post-link delay for LGA and j721e glue driver
  PCI: Add pci_host_common_link_train_delay() helper

# Conflicts:
# drivers/pci/controller/pci-host-common.h

Merge branch 'pci/controller/rcar-host'

- Remove unused LIST_HEAD(res) (Lad Prabhakar)

* pci/controller/rcar-host:
PCI: rcar-host: Remove unused LIST_HEAD(res)

Merge branch 'pci/controller/mvebu'

- Use fixed-width interrupt masks to avoid truncation in 64-bit builds
(Rosen Penev)

* pci/controller/mvebu:
PCI: mvebu: Use fixed-width interrupt masks to avoid truncation in 64-bit builds

Merge branch 'pci/controller/mediatek-gen3'

- Deassert PCIE_PHY_RSTB so REFCLK is stable for at least 100ms
  (PCIE_T_PVPERL_MS) before deasserting PERST# (Jian Yang)

- Add .shutdown() to assert PERST# before powering down device (Jian Yang)

- Do full device power down on removal, including asserting PERST#, when
  removing driver (Chen-Yu Tsai)

- Fix a 'failed to create pwrctrl devices' error message that was
  inadvertently skipped (Chen-Yu Tsai)

* pci/controller/mediatek-gen3:
  PCI: mediatek-gen3: Fix incorrectly skipped pwrctrl error message
  PCI: mediatek-gen3: Do full device power down on removal
  PCI: mediatek-gen3: Add a .shutdown() callback to control PERST# signal
  PCI: mediatek-gen3: Fix PERST# control timing during system startup

Merge branch 'pci/controller/mediatek'

- Use FIELD_PREP() to fix incorrect operator precedence in PCIE_FTS_NUM_L0
  (Li RongQing)

- Fix IRQ domain leak when port fails to enable (Manivannan Sadhasivam)

- Use actual physical address for MSI message address instead of
  virt_to_phys() (Manivannan Sadhasivam)

- Add EcoNet EN7528 to DT binding (Caleb James DeLisle)

* pci/controller/mediatek:
  dt-bindings: PCI: mediatek: Add support for EcoNet EN7528
  PCI: mediatek: Use actual physical address instead of virt_to_phys()
  PCI: mediatek: Fix IRQ domain leak when port fails to enable
  PCI: mediatek: Fix operator precedence in PCIE_FTS_NUM_L0 macro

Merge branch 'pci/controller/loongson'

- Ignore downstream devices only on internal bridges to avoid Loongson
  hardware issue (Rong Zhang)

- Quirk old Loongson-3C6000 bridges that advertise incorrect supported link
  speeds (Ziyao Li)

* pci/controller/loongson:
  PCI: loongson: Override PCIe bridge supported speeds for Loongson-3C6000 series
  PCI: loongson: Do not ignore downstream devices on external bridges

Merge branch 'pci/controller/iproc-bcma'

- Restore .map_irq() assignment that broke INTx on the iproc platform bus
driver (Mark Tomlinson)

* pci/controller/iproc-bcma:
PCI: iproc: Restore .map_irq() for the platform bus driver

Merge branch 'pci/controller/dwc-ultrarisc'

- Add UltraRISC DP1000 PCIe controller DT binding and driver (Jia Wang)

* pci/controller/dwc-ultrarisc:
PCI: ultrarisc: Add UltraRISC DP1000 PCIe Root Complex driver
dt-bindings: PCI: Add UltraRISC DP1000 PCIe controller

Merge branch 'pci/controller/dwc-tegra194'

- Program the DesignWare PORT_AFR L1 entrance latency based on the
'aspm-l1-entry-delay-ns' DT property (Manikanta Maddireddy)

* pci/controller/dwc-tegra194:
PCI: tegra194: Use aspm-l1-entry-delay-ns DT property for L1 entrance latency

Merge branch 'pci/controller/dwc-qcom'

- Set max OPP during resume so DBI register accesses don't fail with NoC
  errors (Qiang Yu)

- Add pci_host_common_d3cold_possible() to determine whether downstream
  devices are already in D3hot and wakeup-enabled devices are capable of
  generating PME from D3cold (Krishna Chaitanya Chundru)

- Add a .get_ltssm() callback to get the LTSSM status without DBI, since
  DBI may be inaccessible after PME_Turn_Off (Krishna Chaitanya Chundru)

- Power down PHY via PARF_PHY_CTRL before disabling rails/clocks to avoid
  power leakage (Krishna Chaitanya Chundru)

- Decide whether suspend should put the link in L2 and power down using
  pci_host_common_d3cold_possible() instead of checking whether ASPM L1 is
  enabled (Krishna Chaitanya Chundru)

- Add qcom D3cold support to tear down interconnect bandwidth and OPP votes
  (Krishna Chaitanya Chundru)

- Handle unsupported mixed PERST#/PHY DT configurations, e.g., PHY in RP
  node while PERST# is in the RC node, but warn about the DT issue (Qiang
  Yu)

- Add pcie_encode_t_power_on() to encode L1SS T_POWER_ON fields (Krishna
  Chaitanya Chundru)

- Add dw_pcie_program_t_power_on() to program T_POWER_ON (Krishna Chaitanya
  Chundru)

- Program qcom T_POWER_ON based on DT 't-power-on-us' property in case
  hardware advertises incorrect values (Krishna Chaitanya Chundru)

- Disable ASPM L0s for SA8775P (Shawn Guo)

- Initialize DWC MSI lock for firmware-managed ECAM hosts, which don't use
  the dw_pcie_host_init() path that initializes the lock (Yadu M G)

* pci/controller/dwc-qcom:
  PCI: qcom: Initialize DWC MSI lock for firmware-managed ECAM hosts
  PCI: qcom: Disable ASPM L0s for SA8775P
  PCI: qcom: Program T_POWER_ON
  PCI: dwc: Add dw_pcie_program_t_power_on() to program T_POWER_ON
  PCI/ASPM: Add pcie_encode_t_power_on() helper to encode L1SS T_POWER_ON fields
  PCI: qcom: Handle mixed PERST#/PHY DT configuration
  PCI: qcom: Add D3cold support
  PCI: dwc: Use common D3cold eligibility helper in suspend path
  PCI: qcom: Power down PHY via PARF_PHY_CTRL before disabling rails/clocks
  PCI: qcom: Add .get_ltssm() callback to query LTSSM status
  PCI: host-common: Add pci_host_common_d3cold_possible() helper
  PCI: qcom: Set max OPP before DBI access during resume

# Conflicts:
# drivers/pci/controller/pci-host-common.c

Merge branch 'pci/controller/dwc-meson'

- Propagate devm_add_action_or_reset() failure to fix probe error path
  (Shuvam Pandey)

- Add a .remove() callback to deinitialize the host bridge and power off
  the PHY (Shuvam Pandey)

* pci/controller/dwc-meson:
  PCI: meson: Add missing remove callback
  PCI: meson: Propagate devm_add_action_or_reset() failure

Merge branch 'pci/controller/dwc-intel-gw'

- Enable clock before PHY init for correct ordering (Florian Eckert)

- Add .start_link() callback so the driver works again (Florian Eckert)

- Stop overwriting the ATU base address discovered by
  dw_pcie_get_resources() (Florian Eckert)

- Add DT 'atu' region since this is hardware-specific, and fall back to
  driver default if lacking (Florian Eckert)

* pci/controller/dwc-intel-gw:
  dt-bindings: PCI: intel,lgm-pcie: Add 'atu' resource
  PCI: intel-gw: Fix ATU base address setup and add optional DT 'atu' region
  PCI: intel-gw: Add .start_link() callback
  PCI: intel-gw: Enable clock before PHY init
  PCI: intel-gw: Move interrupt enable to own function
  PCI: intel-gw: Remove unused PCIE_APP_INTX_OFST definition

Merge branch 'pci/controller/dwc-imx6'

- Move IMX6SX_GPR12_PCIE_TEST_POWERDOWN handling into the core reset
  functions (Richard Zhu)

- Add pci_host_common_parse_ports() for use by any native driver to parse
  Root Port properties (currently only reset GPIOs) (Sherry Sun)

- Assert PERST# before enabling regulators to ensure that even if power is
  enabled, endpoint stays inactive until REFCLK is stable (Sherry Sun)

- Parse reset properties in Root Port nodes (falling back to host bridge)
  to help support Key E connectors and the pwrctrl framework (Sherry Sun)

- Configure i.MX95 REF_USE_PAD before PHY reset (Richard Zhu)

- Assert i.MX95 ref_clk_en after reference clock stabilizes (Richard Zhu)

- Integrate new pwrctrl API for DTs with Root Port-level power supplies
  (Sherry Sun)

* pci/controller/dwc-imx6:
  PCI: imx6: Integrate new pwrctrl API
  PCI: imx6: Assert ref_clk_en after reference clock stabilizes on i.MX95
  PCI: imx6: Configure REF_USE_PAD before PHY reset for i.MX95
  PCI: imx6: Parse 'reset-gpios' in Root Port nodes
  PCI: imx6: Assert PERST# before enabling regulators
  PCI: host-generic: Add common helpers for parsing Root Port properties
  dt-bindings: PCI: fsl,imx6q-pcie: Add reset GPIO in Root Port node
  PCI: imx6: Fix IMX6SX_GPR12_PCIE_TEST_POWERDOWN handling

Merge branch 'pci/controller/dwc-amd-mdb'

- Assert PERST# on shutdown so any connected Endpoints are held in reset
during shutdown (Sai Krishna Musham)

* pci/controller/dwc-amd-mdb:
PCI: amd-mdb: Assert PERST# on shutdown

Merge branch 'pci/controller/dwc'

- Apply ECRC TLP Digest workaround for all DesignWare cores prior to 5.10a,
  not just 4.90a and 5.00a (Manikanta Maddireddy)

- Use common struct dw_pcie 'mode' rather than duplicating it in artpec6,
  dra7xx, dwc-pcie, and keembay driver structs (Hans Zhang)

- Use DEFINE_SHOW_ATTRIBUTE for ltssm_status debugfs to reduce boilerplate
  and fix a seq_file memory leak by including a .release() callback (Hans
  Zhang)

- Fix a signedness bug in fault injection test code (Dan Carpenter)

- Avoid NULL pointer dereference when tearing down debugfs for controller
  that lacks RAS DES capability (Shuvam Pandey)

* pci/controller/dwc:
  PCI: dwc: Avoid dwc_pcie_rasdes_debugfs_deinit() NULL dereference when no RAS DES capability
  PCI: dwc: Fix signedness bug in fault injection test code
  PCI: dwc: Use DEFINE_SHOW_ATTRIBUTE for ltssm_status debugfs
  PCI: keembay: Use common mode field in struct dw_pcie
  PCI: dwc: Use common mode field in struct dw_pcie
  PCI: artpec6: Use common mode field in struct dw_pcie
  PCI: dra7xx: Use common mode field in struct dw_pcie
  PCI: dwc: Apply ECRC workaround for DesignWare cores prior to 5.10a

Merge branch 'pci/controller/altera'

- Do not dispose of the parent IRQ mapping, which belongs to the parent
  interrupt controller (Mahesh Vaidya)

- Fix chained IRQ handler ordering issue and resource leaks on probe
  failure (Mahesh Vaidya)

* pci/controller/altera:
  PCI: altera: Fix resource leaks on probe failure
  PCI: altera: Do not dispose parent IRQ mapping

Merge branch 'pci/controller/host-common'

- Request bus reassignment when not probe-only to fix an enumeration
  regression on Marvell CN106XX and possibly other DT-based systems
  (Ratheesh Kannoth)

* pci/controller/host-common:
  PCI: host-common: Request bus reassignment when not probe-only

Merge branch 'pci/endpoint'

- Add endpoint controller APIs for use by function drivers to discover
  auxiliary blocks like DMA engines (Koichiro Den)

- Remember DesignWare eDMA engine base/size and expose them via the EPC
  aux-resource API (Koichiro Den)

- Refactor endpoint doorbell allocation to allow non-MSI doorbells
  (Koichiro Den)

- Add endpoint embedded doorbell fallback, used if MSI allocation fails
  (Koichiro Den)

- Validate BAR index and remove dead BAR read in endpoint doorbell test
  (Carlos Bilbao)

- Unwind MSI/MSI-X vectors if NTB initialization fails part-way through
  (Koichiro Den)

- Cache sleepable pci_irq_vector() value at ISR setup to avoid calling it
  from hardirq context (Koichiro Den)

- Validate doorbell count when configuring NTB and vNTB doorbells
  (Manivannan Sadhasivam)

- Call sleepable pci_epc_raise_irq() from a work item instead of atomic
  context, e.g., when setting bits in NTB peer doorbells in the
  ntb_peer_db_set() path (Koichiro Den)

- Report 0-based vNTB doorbell vector to account for link event 0 and
  historically skipped slot 1 (Koichiro Den)

- Reject unusable vNTB doorbell counts, e.g., if they don't allow space for
  link event 0 and historically skipped slot 1 (Koichiro Den)

- Prevent configfs writes to vNTB db_count and other values that are
  already in use after EPC attach (Koichiro Den)

- Account for vNTB db_valid reserved slots (link event 0 and historically
  skipped slot 1) so they don't appear as valid doorbells (Koichiro Den)

- Implement vNTB .db_vector_count()/mask() for doorbells so clients can use
  multiple vectors and avoid thundering herds (Koichiro Den)

- Report 0-based NTB doorbell vector to account for link event 0 and
  historically skipped slot 1 (Koichiro Den)

- Fix doorbell bitmask and IRQ vector handling to clear only specified
  bits, use the correct vector for non-contiguous Linux IRQ numbers, and
  validate incoming vectors (Koichiro Den)

- Implement NTB .db_vector_count()/mask() for doorbells so clients can use
  multiple vectors (Koichiro Den)

* pci/endpoint:
  NTB: epf: Implement .db_vector_count()/mask() for doorbells
  NTB: epf: Fix doorbell bitmask and IRQ vector handling
  NTB: epf: Report 0-based doorbell vector via ntb_db_event()
  NTB: epf: Make db_valid_mask cover only real doorbell bits
  NTB: epf: Document legacy doorbell slot offset in ntb_epf_peer_db_set()
  PCI: endpoint: pci-epf-vntb: Implement .db_vector_count()/mask() for doorbells
  PCI: endpoint: pci-epf-vntb: Exclude reserved slots from db_valid_mask
  PCI: endpoint: pci-epf-vntb: Guard configfs writes after EPC attach
  PCI: endpoint: pci-epf-vntb: Reject unusable doorbell counts
  PCI: endpoint: pci-epf-vntb: Report 0-based doorbell vector via ntb_db_event()
  PCI: endpoint: pci-epf-vntb: Defer pci_epc_raise_irq() out of atomic context
  PCI: endpoint: pci-epf-vntb: Document legacy MSI doorbell offset
  PCI: endpoint: pci-epf-ntb: Add check to detect 'db_count' value of 0
  PCI: endpoint: pci-epf-vntb: Add check to detect 'db_count' value of 0
  NTB: epf: Avoid calling pci_irq_vector() from hardirq context
  NTB: epf: Fix request_irq() unwind in ntb_epf_init_isr()
  misc: pci_endpoint_test: Remove dead BAR read before doorbell trigger
  misc: pci_endpoint_test: Validate BAR index in doorbell test
  PCI: endpoint: pci-ep-msi: Add embedded doorbell fallback
  PCI: endpoint: pci-epf-test: Reuse pre-exposed doorbell targets
  PCI: endpoint: pci-epf-vntb: Reuse pre-exposed doorbells and IRQ flags
  PCI: endpoint: pci-ep-msi: Refactor doorbell allocation for new backends
  PCI: dwc: ep: Expose integrated eDMA resources via EPC aux-resource API
  PCI: dwc: Record integrated eDMA register window
  PCI: endpoint: Add auxiliary resource query API

Merge branch 'pci/dt-binding'

- Add 'dma-coherent' property for sg2042-pcie driver (Han Gao)

- Add RZ/V2N DT support for rzg3s-pcie-host driver (Lad Prabhakar)

- Add Eliza SoC compatible for qcom-pcie driver (Krishna Chaitanya Chundru)

* pci/dt-binding:
  dt-bindings: PCI: qcom,pcie-sm8550: Add Eliza compatible
  dt-bindings: PCI: renesas,r9a08g045-pcie: Add RZ/V2N support
  dt-bindings: PCI: sophgo: Add dma-coherent property for SG2042

Merge branch 'pci/switchtec'

- Add Gen6 Device IDs to the switchtec driver (Ben Reed)

* pci/switchtec:
PCI: switchtec: Add Gen6 Device IDs

Merge branch 'pci/virtualization'

- Avoid FLR for MediaTek MT7925 WiFi, where FLR fails after a VM terminates
  uncleanly (Jose Ignacio Tornos Martinez)

- Avoid SBR for Qualcomm WCN6855/WCN7850 WiFi, SDX62/SDX65 modems, which
  seem not to support it correctly (Jose Ignacio Tornos Martinez)

* pci/virtualization:
  PCI: Avoid SBR for Qualcomm WCN6855/WCN7850 WiFi, SDX62/SDX65 modems
  PCI: Avoid FLR for MediaTek MT7925 WiFi