git.ipfire.org Git - thirdparty/kernel/linux.git/log

PCI/sysfs: Only allow supported resource types in I/O and MMIO helpers

Currently, when the sysfs attributes for PCI resources are added
dynamically, the resource access callbacks are only set when the underlying
BAR type matches, using .read and .write for IORESOURCE_IO, and .mmap for
IORESOURCE_MEM or IORESOURCE_IO with arch_can_pci_mmap_io() support. As
such, when the callback is not set, the operation inherently fails.

After the conversion to static attributes, visibility callbacks will
control which resource files appear for each BAR, but the callbacks
themselves will always be set.

Add a type check to pci_resource_io() and pci_mmap_resource() to return
-EIO for an unsupported resource type.

Use the new pci_resource_is_io() and pci_resource_is_mem() helpers for the
type checks, replacing the open-coded bitwise flag tests and also drop the
local struct resource pointer in pci_mmap_resource().

Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://patch.msgid.link/20260508043543.217179-4-kwilczynski@kernel.org

PCI: Add pci_resource_is_io() and pci_resource_is_mem() helpers

Add helpers to check whether a PCI resource is of I/O port or memory type.
These replace the open-coded pci_resource_flags() with IORESOURCE_IO and
IORESOURCE_MEM pattern used across the tree.

Suggested-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://patch.msgid.link/20260508043543.217179-3-kwilczynski@kernel.org

PCI/sysfs: Use PCI resource accessor macros

Replace direct pdev->resource[] accesses with pci_resource_n(), and
pdev->resource[].flags accesses with pci_resource_flags().

No functional changes intended.

Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://patch.msgid.link/20260508043543.217179-2-kwilczynski@kernel.org

genetlink: free the skb on 'group >= family->n_mcgrps'

These methods generally consume ownership of the provided skb, so even
if an error path is encountered, the skb is freed. This is because the
very first thing they do after some initial setup is to unconditionally
consume the skb via consume_skb(skb). Any subsequent errors lead to the
core netlink layer freeing the skb.

However, there is one check that occurs before ownership is passed,
which is the check for the group index. So if this error condition is
encountered, then the skb is leaked. This error condition is generally
considered a violation of the netlink API, so it's not expected to occur
under normal circumstances. For the same reason, no callers check for
this error condition, and no callers need to be adjusted. However, we
should still follow the same ownership semantics of the rest of the
function. Thus, free the skb in this codepath.

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Suggested-by: Matthew Maurer <mmaurer@google.com>
Fixes: 2a94fe48f32c ("genetlink: make multicast groups const, prevent abuse")
Link: https://lore.kernel.org/r/845b36ba-7b3a-41f2-acb2-b284f253e2ca@lunn.ch
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20260506-genlmsg-return-v2-1-a63ee2a055d6@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

gve: Use generic power management

Switch to the generic power management and remove the usage of legacy
(pci_driver) hooks.

Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260506165015.641738-1-vaibhavgupta40@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: nsh: fix incorrect header length macros

NSH header length is a 6-bit field that encodes the total length of
the header in 4-byte words.  So the maximum length is 0b111111 * 4,
which is 252 and not 256.  The maximum context length is the same
number minus the length of the base header (8), so 244.

These macros are used to validate push_nsh() action in openvswitch.
Miscalculation here doesn't cause any real issues.  In the worst case
the oversized context is truncated while building the header, so we'll
construct and send a broken packet, which is not a big problem, as any
receiver should validate the fields.  No invalid memory accesses will
happen during the header push.  But we should fix the macros to reject
the incorrect actions in the first place.

Using previously defined values and calculating the length instead
of defining numbers directly, so it's easier to understand where they
come from and harder to make a mistake.

Fixes: 1f0b7744c505 ("net: add NSH header structures and helpers")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20260507120434.2962505-1-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Drop pci_save_state() after pci_restore_state()

Commit 383d89699c50 ("treewide: Drop pci_save_state() after
pci_restore_state()") sought to purge all superfluous invocations of
pci_save_state() from the tree.

Unfortunately the commit missed one invocation in the Broadcom
NetXtreme-C/E driver. Drop it.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/39de1b025928d9a457976010b2324e7e99baa92a.1778158755.git.lukas@wunner.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethtool: fix NULL pointer dereference in phy_reply_size

In phy_prepare_data(), several strings such as 'name', 'drvname',
'upstream_sfp_name', and 'downstream_sfp_name' are allocated using
kstrdup(). However, these allocations were not checked for failure.

If kstrdup() fails for 'name', it returns NULL while the function
continues. This leads to a kernel NULL pointer dereference and panic
later in phy_reply_size() when it unconditionally calls strlen() on
the NULL pointer.

While other strings like 'upstream_sfp_name' might be checked before
access in certain code paths, failing to handle these allocations
consistently can lead to incomplete data reporting or hidden bugs.

Fix this by adding proper NULL checks for all kstrdup() calls in
phy_prepare_data() and implement a centralized error handling path
using goto labels to ensure all previously allocated resources are
freed on failure.

Fixes: 9dd2ad5e92b9 ("net: ethtool: phy: Convert the PHY_GET command to generic phy dump")
Signed-off-by: Quan Sun <2022090917019@std.uestc.edu.cn>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260507131738.1173835-1-2022090917019@std.uestc.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

MAINTAINERS: change maintainers for macb Ethernet driver

I would like to hand over the macb maintenance to Théo, as I'm unable to
keep up with the recent flow of patches for this driver. After speaking
with Claudiu, he indicated that he is in the same position as me.
To help with this work, Conor has agreed to act as a reviewer.

I was given responsibility for this driver years ago, and I'm glad to
see it continue with talented developers.

Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Acked-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20260507120444.9733-1-nicolas.ferre@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

PCI: Check ROM header and data structure addr before accessing

We meet a crash when running stress-ng on x86_64 machine:

  BUG: unable to handle page fault for address: ffa0000007f40000
  RIP: 0010:pci_get_rom_size+0x52/0x220
  Call Trace:
  <TASK>
    pci_map_rom+0x80/0x130
    pci_read_rom+0x4b/0xe0
    kernfs_file_read_iter+0x96/0x180
    vfs_read+0x1b1/0x300

Our analysis reveals that the ROM space's start address is
0xffa0000007f30000, and size is 0x10000. Because of broken ROM space,
before calling readl(pds), the pds's value is 0xffa0000007f3ffff, which is
already pointed to the ROM space end, invoking readl() would read 4 bytes
therefore cause an out-of-bounds access and trigger a crash.  Fix this by
adding image header and data structure checking.

We also found another crash on arm64 machine:

  Unable to handle kernel paging request at virtual address ffff8000dd1393ff
  Mem abort info:
  ESR = 0x0000000096000021
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x21: alignment fault

The call trace is the same with x86_64, but the crash reason is that the
data structure addr is not aligned with 4, and arm64 machine report
"alignment fault". Fix this by adding alignment checking.

Fixes: 47b975d234ea ("PCI: Avoid iterating through memory outside the resource window")
Suggested-by: Guanghui Feng <guanghuifeng@linux.alibaba.com>
Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
[bhelgaas: shorten function names, wrap comments]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Link: https://patch.msgid.link/20260508082128.3344255-3-kanie@linux.alibaba.com

PCI: Introduce named defines for PCI ROM

Convert the magic numbers associated with PCI ROM into named
definitions. Some of these definitions will be used in the second
fix patch.

Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Reviewed-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Link: https://patch.msgid.link/20260508082128.3344255-2-kanie@linux.alibaba.com

dt-bindings: net: lan966x: Accept standard ethernet prefixes

The dsa.yaml and ethernet-switch.yaml bindings recommend
prefixing ethernet switches and ports with "ethernet-" so
make the LAN966x do the same.

Reported-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Linus Walleij <linusw@kernel.org>
Reviewed-by: Herve Codina <herve.codina@bootlin.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260507-lan966-binding-v1-1-e99293d2a4ec@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: atheros: atl2: remove kernel backward-compatibility code

The atl2 driver contains code for compatibility with old kernels that
do not support module_param_array. Backward compatibility is
irrelevant because this driver is in-tree. Remove this unreachable
code to simplify the driver's handling of module parameters.

Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260506054035.23710-1-enelsonmoore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: napi: Avoid gro timer misfiring at end of busypoll

When in irq deferral mode (defer-hard-irqs > 0), a short enough
gro-flush timeout can trigger before NAPI_STATE_SCHED is cleared if the
last poll in busy_poll_stop() takes too long. This can have the effect
of leaving the queue stuck with interrupts disabled and no timer armed
which results in a tx timeout if there is no subsequent busypoll cycle.

To prevent this, defer the gro-flush timer arm after the last poll.

Fixes: 7fd3253a7de6 ("net: Introduce preferred busy-polling")
Co-developed-by: Martin Karsten <mkarsten@uwaterloo.ca>
Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260506090808.820559-2-dtatulea@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'ipv6-flowlabel-per-netns-budget-for-unprivileged-callers'

Maoyi Xie says:

====================
ipv6: flowlabel: per-netns budget for unprivileged callers

From: Maoyi Xie <maoyi.xie@ntu.edu.sg>

This series fixes the cross-tenant DoS in net/ipv6/ip6_flowlabel.c.
v1 through v6 were single-patch postings, each in its own thread.
v6 review pointed out that the existing fl_size read in
mem_check() and the corresponding write in fl_intern() are not in
the same critical section. v7 split the work into 2 patches.

Patch 1/2 is a prerequisite. It moves spin_lock_bh(&ip6_fl_lock)
and the matching unlock from fl_intern() into its only caller
ipv6_flowlabel_get(), so the mem_check() call runs under the same
critical section as the fl_intern() insert. With all writers and
the read of fl_size under the lock, fl_size is converted from
atomic_t to plain int. This is independent of the per-netns
budget. It also makes 2/2 backportable without conflicts.

Patch 2/2 is the v6 patch, rebased on 1/2.

  - flowlabel_count is plain int rather than atomic_t, since the
    previous patch put all writers and readers under ip6_fl_lock.
  - In ip6_fl_gc(), fl_free() is now placed below the fl_size
    and flowlabel_count decrements, removing the v6 cache of
    fl->fl_net.
  - In ip6_fl_purge(), fl_free() stays in its original position.
    The function argument net is used for flowlabel_count.
  - mem_check() uses spaces around the / operator on all four
    expressions, addressing the checkpatch note in v6 review.

Numeric budget (preserved from v6):

  pre-patch:
    global non-CAP_NET_ADMIN budget = FL_MAX_SIZE - FL_MAX_SIZE/4
                                    = 4096 - 1024 = 3072
    per-actor reach                 = 3072

  post-patch:
    FL_MAX_SIZE doubled to 8192
    global non-CAP_NET_ADMIN budget = 8192 - 2048 = 6144
    per-netns ceiling               = 6144 / 2 = 3072
    per-actor reach                 = 3072 (preserved)

CAP_NET_ADMIN against init_user_ns still bypasses both caps.

Reproducer (KASAN VM, 4 cores, qemu): unprivileged netns A holds
3072 flowlabels via 100 procs. Fresh unprivileged netns B then
allocates 32 flowlabels (the FL_MAX_PER_SOCK ceiling for one
socket), the same as a clean baseline. Without the per-netns
ceiling, netns A could push fl_size past FL_MAX_SIZE - FL_MAX_SIZE
/ 4 and netns B would see allocations denied.
====================

Link: https://patch.msgid.link/20260506082416.2259567-1-maoyixie.tju@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: flowlabel: enforce per-netns limit for unprivileged callers

fl_size, fl_ht and ip6_fl_lock in net/ipv6/ip6_flowlabel.c are
file scope and shared across netns. mem_check() reads fl_size to
decide whether to deny non-CAP_NET_ADMIN callers. capable() runs
against init_user_ns, so an unprivileged user in any non-init
userns can push fl_size past FL_MAX_SIZE - FL_MAX_SIZE / 4 and
starve every other unprivileged userns on the host.

Add struct netns_ipv6::flowlabel_count, bumped and decremented
next to fl_size in fl_intern, ip6_fl_gc and ip6_fl_purge. The new
field fills the existing 4-byte hole after ipmr_seq, so struct
netns_ipv6 stays the same size on 64-bit builds.

Bump FL_MAX_SIZE from 4096 to 8192. It has been 4096 since the
file was added. Machines and connection counts have grown.

mem_check() folds an extra per-netns ceiling into the existing
non-CAP_NET_ADMIN conditional. The ceiling is half of the total
budget that unprivileged callers have ever been able to use, i.e.
(FL_MAX_SIZE - FL_MAX_SIZE / 4) / 2 = 3072 entries. With
FL_MAX_SIZE doubled, this preserves the original per-user reach
of 3K (what an unprivileged caller could already obtain before
this change), while forcing an attacker to spread allocations
across at least two netns to exhaust the global non-CAP_NET_ADMIN
budget.

CAP_NET_ADMIN against init_user_ns still bypasses both caps.

The previous patch took ip6_fl_lock across mem_check and
fl_intern, so the new flowlabel_count read in mem_check and the
new flowlabel_count++ in fl_intern run under the same critical
section. flowlabel_count is therefore plain int, like fl_size.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Suggested-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
Link: https://patch.msgid.link/20260506082416.2259567-3-maoyixie.tju@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: flowlabel: take ip6_fl_lock across mem_check and fl_intern

mem_check() in net/ipv6/ip6_flowlabel.c reads fl_size without
holding ip6_fl_lock. fl_intern() takes the lock immediately
afterwards. The two checks therefore race against concurrent
fl_intern, ip6_fl_gc and ip6_fl_purge writers, which makes the
mem_check budget check approximate.

Move spin_lock_bh(&ip6_fl_lock) and the matching unlock from
fl_intern() into its only caller ipv6_flowlabel_get(). The
mem_check() call now runs under the same critical section as the
fl_intern() insert, so the budget check is exact.

With all writers and the read of fl_size under ip6_fl_lock,
convert fl_size from atomic_t to plain int. The four sites that
update or read fl_size are fl_intern (insert path), ip6_fl_gc
(garbage collector, the !sched check and the per-entry decrement),
ip6_fl_purge (per-netns purge), and mem_check (budget check), and
all four now run under ip6_fl_lock.

This is a prerequisite for adding a per-netns budget alongside
fl_size. The follow-up patch adds netns_ipv6::flowlabel_count and
folds it into mem_check().

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Suggested-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
Link: https://patch.msgid.link/20260506082416.2259567-2-maoyixie.tju@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

MAINTAINERS: Add self for the 3c509 network driver

It appears there's a need for a maintainer for the 3Com EtherLink III
family of Ethernet network adapters. There is documentation available
and the driver is very mature so the task ought to be of little hassle,
so I think I should be able to squeeze in any issues to be addressed.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/alpine.DEB.2.21.2604271056460.28583@angie.orcam.me.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: add tests for filtered dumps of page pool

Add tests for page pool dumps of a specific ifindex.

Link: https://patch.msgid.link/20260506034821.1710113-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'tcp-two-fixes-for-socket-migration-in-reqsk_timer_handler'

Kuniyuki Iwashima says:

====================
tcp: Two fixes for socket migration in reqsk_timer_handler().

The series fixes two bugs in the error path of socket migration
in reqsk_timer_handler().

Patch 1 fixes a potential UAF in reqsk_timer_handler().

Patch 2 fixes imbalanced icsk_accept_queue count.
====================

Link: https://patch.msgid.link/20260506035954.1563147-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: page_pool: support dumping pps of a specific ifindex via Netlink

NIPA tries to make sure that HW tests don't modify system state.
It saves the state of page pools, too. Now that I write this commit
message I realize that this is impractical since page pool IDs and
state will get legitimately changed by the tests. But I already
spent a couple of hours implementing the filtering, so..

Link: https://patch.msgid.link/20260506034821.1710113-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: Fix imbalanced icsk_accept_queue count.

When TCP socket migration happens in reqsk_timer_handler(),
@sk_listener will be updated with the new listener.

When we call __inet_csk_reqsk_queue_drop(), the listener must
be the one stored in req->rsk_listener.

The cited commit accidentally replaced oreq->rsk_listener with
sk_listener, leading to imbalanced icsk_accept_queue count.

Let's pass the correct listener to __inet_csk_reqsk_queue_drop().

Fixes: e8c526f2bdf1 ("tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().")
Reported-by: Damiano Melotti <melotti@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260506035954.1563147-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: Fix potential UAF in reqsk_timer_handler().

When TCP socket migration fails at inet_ehash_insert() in
reqsk_timer_handler(), we jump to the no_ownership: label
and free the new reqsk immediately with __reqsk_free().

Thus, we must stop the new reqsk's timer before jumping to the
label, but the timer might be missed since the cited commit,
resulting in UAF.

As we are in the original reqsk's timer context, we can safely
call timer_delete_sync() for the new reqsk.

Let's pass false to __inet_csk_reqsk_queue_drop() to stop
the new reqsk's timer.

Fixes: 83fccfc3940c ("inet: fix potential deadlock in reqsk_queue_unlink()")
Reported-by: Damiano Melotti <melotti@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260506035954.1563147-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

x86/cpuid: Introduce <asm/cpuid/leaf_types.h>

To centralize all CPUID access across the x86 subsystem, introduce
<asm/cpuid/leaf_types.h>. It is generated by the x86-cpuid-db project¹ and
provides C99 bitfield listings for all publicly known CPUID leaves.

¹ https://gitlab.com/x86-cpuid.org/x86-cpuid-db/-/blob/v3.0/CHANGELOG.rst

Suggested-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20260327021645.555257-1-darwi@linutronix.de

x86/cpuid: Rename cpuid_leaf()/cpuid_subleaf() APIs

A new CPUID model will be added where its APIs will be designated as the
official CPUID API. Free the cpuid_leaf() and cpuid_subleaf() function
names for that API. Rename them accordingly to cpuid_read() and
cpuid_read_subleaf().

For kernel/cpuid.c, rename its local file operations read function from
cpuid_read() to cpuid_read_f() so that it does not conflict with the new
API.

No functional change.

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20260327021645.555257-1-darwi@linutronix.de

x86/cpu: Do not include the CPUID API header in asm/processor.h

asm/processor.h includes asm/cpuid/api.h but it does not need it.

Remove the include.

This allows the CPUID APIs header to include <asm/processor.h> at a later
step without introducing a circular dependency.

Note, all call sites which implicitly included the CPUID API through
<asm/processor.h> have been modified to explicitly include the CPUID APIs
instead.

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20260327021645.555257-1-darwi@linutronix.de

drm/xe/multi_queue: Whitelist QUEUE_TIMESTAMP register

In a multi-queue use case, when a job is running on the secondary queue,
the CTX_TIMESTAMP does not reflect the queues run ticks. Instead, we use
the QUEUE TIMESTAMP to check how long the job ran. For user space to see
the run ticks for a secondary queue, whitelist the QUEUE_TIMESTAMP
register.

Compute PR: https://github.com/intel/compute-runtime/pull/923

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-24-umesh.nerlige.ramappa@intel.com

MAINTAINERS: Add Aksh Garg as PCIe CADENCE reviewer

I wish to contribute to the review process for Cadence PCIe IP drivers,
hence add myself as a reviewer.

Signed-off-by: Aksh Garg <a-garg7@ti.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260508060951.840233-1-a-garg7@ti.com

MAINTAINERS: Update Hans Zhang email for PCIe CIX Sky1

Update my email address as my work email account is no longer in use.

Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260508023006.1787674-1-18255117159@163.com

MAINTAINERS: Update Marek Vasut email for PCIe R-Car

Use up to date address. No functional change.

Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260428052030.51101-1-marek.vasut+renesas@mailbox.org

PCI: Initialize temporary device in new_id_store()

When setting new_id of a PCI device driver using sysfs a lockdep splat
occurs. This is because new_id_store() builds a temporary pci_dev for
pci_match_device(), which calls device_match_driver_override().  That
depends on the driver_override.lock added by cb3d1049f4ea ("driver core:
generalize driver_override in struct device").

The new driver_override.lock was not initialized in the temporary pci_dev,
resulting in this lockdep splat.

Initialize the temporary pci_dev to fix this.

Repro:

  Build with CONFIG_LOCKDEP=y, boot with QEMU, and add a new ID:

  # echo "8086 10f5" > /sys/bus/pci/drivers/e1000e/new_id

  INFO: trying to register non-static key.
  The code is fine but needs lockdep annotation, or maybe
  you didn't initialize this object before use?
  turning off the locking correctness validator.
  CPU: 2 UID: 0 PID: 177 Comm: liveupdate-iomm Not tainted 7.0.0+ #9 PREEMPT(full)
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
  Call Trace:
   <TASK>
   dump_stack_lvl+0x5d/0x80
   register_lock_class+0x77e/0x790
   lock_acquire+0xbf/0x2e0
   pci_match_device+0x24/0x180
   new_id_store+0x189/0x1d0
   kernfs_fop_write_iter+0x14f/0x210
   vfs_write+0x263/0x5e0
   ksys_write+0x79/0xf0
   do_syscall_64+0x117/0xf80

Fixes: 10a4206a2401 ("PCI: use generic driver_override infrastructure")
Fixes: 8895d3bcb8ba ("PCI: Fail new_id for vendor/device values already built into driver")
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
[bhelgaas: add commit log details and repro, trim backtrace]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260505234327.716630-1-skhawaja@google.com

PCI: Update saved_config_space upon resource assignment

Bernd reports passthrough failure of a Digital Devices Cine S2 V6 DVB
adapter plugged into an ASRock X570S PG Riptide board with BIOS version
P5.41 (09/07/2023):

  ddbridge 0000:05:00.0: detected Digital Devices Cine S2 V6 DVB adapter
  ddbridge 0000:05:00.0: cannot read registers
  ddbridge 0000:05:00.0: fail

BIOS assigns an incorrect BAR to the DVB adapter which doesn't fit into the
upstream bridge window.  The kernel corrects the BAR assignment:

  pci 0000:07:00.0: BAR 0 [mem 0xfffffffffc500000-0xfffffffffc50ffff 64bit]: can't claim; no compatible bridge window
  pci 0000:07:00.0: BAR 0 [mem 0xfc500000-0xfc50ffff 64bit]: assigned

Correction of the BAR assignment happens in an x86-specific fs_initcall,
pcibios_assign_resources(), after device enumeration in a subsys_initcall.
This order was introduced at the behest of Linus in 2004:

  https://git.kernel.org/tglx/history/c/a06a30144bbc

No other architecture performs such a late BAR correction.

Bernd bisected the issue to commit a2f1e22390ac ("PCI/ERR: Ensure error
recoverability at all times"), but it only occurs in the absence of commit
4d4c10f763d7 ("PCI: Explicitly put devices into D0 when initializing").
This combination exists in stable kernel v6.12.70, but not in mainline,
hence Bernd cannot reproduce the issue with mainline.

Since a2f1e22390ac, config space is saved on enumeration, prior to BAR
correction.  Upon passthrough, the corrected BAR is overwritten with the
incorrect saved value by:

  vfio_pci_core_register_device()
    vfio_pci_set_power_state()
      pci_restore_state()

But only if the device's current_state is PCI_UNKNOWN, as it was prior to
commit 4d4c10f763d7.  Since the commit, it is PCI_D0, which changes the
behavior of vfio_pci_set_power_state() to no longer restore the state
without saving it first.

Alexandre is reporting the same issue as Bernd, but in his case, mainline
is affected as well.  The difference is that on Alexandre's system, the
host kernel binds a driver to the device which is unbound prior to
passthrough, whereas on Bernd's system no driver gets bound by the host
kernel.

Unbinding sets current_state to PCI_UNKNOWN in pci_device_remove(), so when
vfio-pci is subsequently bound to the device, pci_restore_state() is once
again called without invoking pci_save_state() first.

To robustly fix the issue, always update saved_config_space upon resource
assignment.

Reported-by: Bernd Schumacher <bernd@bschu.de>
Closes: https://lore.kernel.org/r/acfZrlP0Ua_5D3U4@eldamar.lan/
Reported-by: Alexandre N. <an.tech@mailo.com>
Closes: https://lore.kernel.org/r/dd3c3358-de0f-4a56-9c81-04aceaab4058@mailo.com/
Fixes: a2f1e22390ac ("PCI/ERR: Ensure error recoverability at all times")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Bernd Schumacher <bernd@bschu.de>
Tested-by: Alexandre N. <an.tech@mailo.com>
Cc: stable@vger.kernel.org # v6.12+
Link: https://patch.msgid.link/febc3f354e0c1f5a9f5b3ee9ffddaa44caccf651.1776268054.git.lukas@wunner.de

drm/xe/multi_queue: Use QUEUE_TIMESTAMP as job timestamp for multi-queue

Each queue in a multi queue group has a dedicated timestamp counter. Use
this QUEUE TIMESTAMP register to capture the start timestamp for the
job.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-23-umesh.nerlige.ramappa@intel.com

drm/xe/multi_queue: Add trace event for the multi queue timestamp

Add a trace event for multi queue timestamp capture.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-22-umesh.nerlige.ramappa@intel.com

drm/xe/multi_queue: Capture queue run times for active queues

If a queue is currently active on the CS, query the QUEUE TIMESTAMP
register to get an up to date value of the runtime. To do so, ensure
that the primary queue is active and then check if the secondary queue
is executing on the CS.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-21-umesh.nerlige.ramappa@intel.com

drm/xe/lrc: Refactor out engine id to hwe conversion

We need to define more helpers that read engine ID specific register, so
move that logic outside of get_ctx_timestamp().

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-20-umesh.nerlige.ramappa@intel.com

drm/xe/multi_queue: Add helpers to access CS QUEUE TIMESTAMP from lrc

In secondary queue LRCs, the QUEUE TIMESTAMP register is saved and
restored allowing us to view the individual queue run times. Add helpers
to read this value from the LRC.

BSpec: 73988

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-19-umesh.nerlige.ramappa@intel.com

drm/xe/multi_queue: Store primary LRC and position info in LRC

For an LRC belonging to the secondary queue, in order to check if its
context group is active, we need to check the LRC of the primary queue.
In addition to that we want to compare the secondary queue position to
CSMQDEBUG register to check if the queue itself is active.

To do so, store primary LRC and position information in the LRC.

A note on references involved:

- In general the Queue takes a ref on its LRC.
- In addition, for multi-queue,
a. Primary Queue takes a ref for each Secondary LRC.
b. Each Secondary Queue takes a ref to the Primary Queue

In the current patch, each LRC in the queue group is storing a pointer
to primary LRC. Both primary and secondary LRCs are freed only when
primary queue is destroyed. At this time, all secondary queues are
already destroyed, so there is no one using secondary LRCs. We should be
good without taking any additional references.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-18-umesh.nerlige.ramappa@intel.com

drm/xe/multi_queue: Refactor check for multi queue support for engine class

xe exec queue code is using a check to see if a class of engines support
multi queue. This check is also needed by other code, so move it to
xe_gt and export it for others.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-17-umesh.nerlige.ramappa@intel.com

drm/xe/lrc: Refactor xe_lrc_timestamp to simplify logic

Use a context_active() helper and simplify the timestamp logic.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-16-umesh.nerlige.ramappa@intel.com

drm/xe: Add timestamp_ms to LRC snapshot

Add a timestamp in milliseconds to the LRC snapshot to make it easier to
reason about how long the LRC has been running and the average duration
of each job.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-15-umesh.nerlige.ramappa@intel.com

drm/xe/lrc: Use 64 bit ctx timestamp in the LRC snapshot

Use the 64 bit value when available for the context timestamp in the LRC
snapshot.

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-14-umesh.nerlige.ramappa@intel.com

bpf: Free reuseport cBPF prog after RCU grace period.

Eulgyu Kim reported the splat below with a repro. [0]

The repro sets up a UDP reuseport group with a cBPF prog and
replaces it with a new one while another thread is sending
a UDP packet to the group.

The reuseport prog is freed by sk_reuseport_prog_free().
bpf_prog_put() is called for "e"BPF prog to destruct through
multiple stages while cBPF prog is freed immediately by
bpf_release_orig_filter() and bpf_prog_free().

If a reuseport prog is detached from the setsockopt() path
(reuseport_attach_prog() or reuseport_detach_prog()),
sk_reuseport_prog_free() is called without waiting for RCU
readers to complete, resulting in various bugs.

Let's defer freeing the reuseport cBPF prog after one RCU
grace period.

Note "e"BPF prog is safe as is unless the fast path starts
to touch fields destroyed in bpf_prog_put_deferred() and
__bpf_prog_put_noref().

[0]:
BUG: KASAN: vmalloc-out-of-bounds in reuseport_select_sock+0xedc/0x1220 net/core/sock_reuseport.c:596
Read of size 4 at addr ffffc9000051e004 by task slowme/10208
CPU: 6 UID: 1000 PID: 10208 Comm: slowme Not tainted 7.0.0-geb7ac95ff75e #32 PREEMPT(full)
Hardware name: QEMU Ubuntu 24.04 PC v2 (i440FX + PIIX, arch_caps fix, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xca/0x240 mm/kasan/report.c:482
kasan_report+0x118/0x150 mm/kasan/report.c:595
reuseport_select_sock+0xedc/0x1220 net/core/sock_reuseport.c:596
udp4_lib_lookup2+0x3bc/0x950 net/ipv4/udp.c:495
__udp4_lib_lookup+0x768/0xe20 net/ipv4/udp.c:723
__udp4_lib_lookup_skb+0x297/0x390 net/ipv4/udp.c:752
__udp4_lib_rcv+0x1312/0x2620 net/ipv4/udp.c:2752
ip_protocol_deliver_rcu+0x282/0x440 net/ipv4/ip_input.c:207
ip_local_deliver_finish+0x3bb/0x6f0 net/ipv4/ip_input.c:241
NF_HOOK+0x30c/0x3a0 include/linux/netfilter.h:318
NF_HOOK+0x30c/0x3a0 include/linux/netfilter.h:318
__netif_receive_skb_one_core net/core/dev.c:6181 [inline]
__netif_receive_skb net/core/dev.c:6294 [inline]
process_backlog+0xaa4/0x1960 net/core/dev.c:6645
__napi_poll+0xae/0x340 net/core/dev.c:7709
napi_poll net/core/dev.c:7772 [inline]
net_rx_action+0x5d7/0xf50 net/core/dev.c:7929
handle_softirqs+0x22b/0x870 kernel/softirq.c:622
do_softirq+0x76/0xd0 kernel/softirq.c:523
</IRQ>
<TASK>
__local_bh_enable_ip+0xf8/0x130 kernel/softirq.c:450
local_bh_enable include/linux/bottom_half.h:33 [inline]
rcu_read_unlock_bh include/linux/rcupdate.h:924 [inline]
__dev_queue_xmit+0x1dd7/0x3710 net/core/dev.c:4890
neigh_output include/net/neighbour.h:556 [inline]
ip_finish_output2+0xca9/0x1070 net/ipv4/ip_output.c:237
NF_HOOK_COND include/linux/netfilter.h:307 [inline]
ip_output+0x29f/0x450 net/ipv4/ip_output.c:438
ip_send_skb+0x45/0xc0 net/ipv4/ip_output.c:1508
udp_send_skb+0xb04/0x1510 net/ipv4/udp.c:1195
udp_sendmsg+0x1a71/0x2350 net/ipv4/udp.c:1485
sock_sendmsg_nosec net/socket.c:727 [inline]
__sock_sendmsg net/socket.c:742 [inline]
__sys_sendto+0x554/0x680 net/socket.c:2206
__do_sys_sendto net/socket.c:2213 [inline]
__se_sys_sendto net/socket.c:2209 [inline]
__x64_sys_sendto+0xde/0x100 net/socket.c:2209
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x160/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x415a2d
Code: b3 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f6bc31e41e8 EFLAGS: 00000212 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007f6bc31e4cdc RCX: 0000000000415a2d
RDX: 0000000000000001 RSI: 00007f6bc31e421f RDI: 0000000000000003
RBP: 00007f6bc31e4240 R08: 00007f6bc31e4220 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000212 R12: 00007f6bc31e46c0
R13: ffffffffffffffb8 R14: 0000000000000000 R15: 00007ffc9b0d70b0
</TASK>

Fixes: 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
Reported-by: Eulgyu Kim <eulgyukim@snu.ac.kr>
Reported-by: Taeyang Lee <0wn@theori.io>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20260426012647.3233119-1-kuniyu@google.com

drm/xe/eustall: Return ENODEV from read if EU stall registers get reset

If a reset (GT or engine) happens during EU stall data sampling, all the
EU stall registers can get reset to 0. This will result in EU stall data
buffers' read and write pointer register values to be out of sync with
the cached values. This will result in read() returning invalid data. To
prevent this, check the value of a EU stall base register. If it is zero,
it indicates a reset may have happened that wiped the register to zero.
If this happens, return ENODEV from read() upon which the user space
should disable and enable EU stall data sampling or close the fd and
open a new fd for a new EU stall data collection session.

This patch has been tested by running two IGT tests simultaneously
xe_eu_stall and xe_exec_reset.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Acked-by: Felix Degrood <felix.j.degrood@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/6935698d70b6c7c347ff8b138209d601030c2c9f.1776200094.git.harish.chegondi@intel.com

dma-debug: Ensure mappings are created and released with matching attributes

The DMA API expects that callers use the same attributes when mapping
and unmapping. Add tracking to verify this and catch mismatches.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20260501-dma-attrs-debug-v2-6-8dbac75cd501@nvidia.com

dma-debug: Feed DMA attribute for unmapping flows too

There are multiple unmapping flows which didn't provide DMA attributes,
which limited DMA debug code to compare the mapping and unmapping
attributes. Let's fix it.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20260501-dma-attrs-debug-v2-5-8dbac75cd501@nvidia.com

dma-debug: Record DMA attributes in debug entry

To enable reliable comparison of DMA attributes between map and
unmap operations, store the attribute value in dma_debug_entry.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20260501-dma-attrs-debug-v2-4-8dbac75cd501@nvidia.com

dma-debug: Remove unused DMA attribute parameter

debug_dma_alloc_pages() always receives a DMA attribute value of 0,
because dma_alloc_pages() never receives any attributes from its callers.
As preparation for upcoming patches, remove this unused attribute from
the debug routine.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20260501-dma-attrs-debug-v2-3-8dbac75cd501@nvidia.com

ntb: Use consistent DMA attributes when freeing DMA mappings

The counterpart of dma_alloc_attrs() is dma_free_attrs(), which must
receive the same DMA attributes used during allocation. The code
previously used dma_free_coherent(), which does not accept or apply any
DMA attributes.

Fixes: 061a785a114f ("ntb: Force physically contiguous allocation of rx ring buffers")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20260501-dma-attrs-debug-v2-2-8dbac75cd501@nvidia.com

ntb: Store original DMA address for future release

The DMA API requires that dma_free_attrs receive the exact dma_handle
originally returned by the allocation function. Do not modify it.

Fixes: fc5d1829f9bf ("NTB: transport: Try harder to alloc an aligned MW buffer")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20260501-dma-attrs-debug-v2-1-8dbac75cd501@nvidia.com

Merge tag 'block-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block fixes from Jens Axboe:

- Fix for ublk not doing an actual issue from the task_work fallback
   path. Any request hitting that should be canceled automatically

- Fix for uring_cmd prep side handling, for the block side uring_cmd
   discard handling

- Fix for missing validation of the io and physical block size shifts

- Fix for a use-after-free in ublk's cancel command handling

* tag 'block-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  ublk: fix use-after-free in ublk_cancel_cmd()
  ublk: validate physical_bs_shift, io_min_shift and io_opt_shift
  block: only read from sqe on initial invocation of blkdev_uring_cmd()
  ublk: don't issue uring_cmd from fallback task work

Merge tag 'io_uring-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull io_uring fixes from Jens Axboe:

- Ensure that the absolute timeouts for both the command side and the
   waiting side honor the callers time namespace

- Ensure tracked NAPI entries are cleared at unregistration time, as
   the NAPI polling loop checks the list state rather than the general
   NAPI state. This can lead to NAPI polling even after unregistration
   has been done. If unregistered, all NAPI polling should be disabled

- Fix for eventfd recursive invocation handling

* tag 'io_uring-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER
  io_uring/timeout: honour caller's time namespace for IORING_TIMEOUT_ABS
  io_uring/eventfd: reset deferred signal state
  io_uring/napi: clear tracked NAPI entries on unregister

drm/xe/multi_queue: Refactor CGP_SYNC send path

Factor the repeated CGP_SYNC action build-and-send sequence into a new
helper guc_exec_queue_send_cgp_sync(). Drop the redundant guc parameter
from __register_exec_queue_group() since it can be derived via
exec_queue_to_guc(q). Remove xe_guc_exec_queue_group_add() which is now
identical to the helper and replace its call site directly.

No functional change.

Assisted-by: GitHub Copilot:claude-sonnet-4.6
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260508194428.61819-6-niranjana.vishwanathapura@intel.com

drm/xe/multi_queue: Remove redundant assignment in guc_exec_queue_run_job

The 'killed_or_banned_or_wedged = true' assignment is redundant
since the variable is never read after that point.

Assisted-by: GitHub Copilot:claude-sonnet-4.6
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260508194428.61819-5-niranjana.vishwanathapura@intel.com

Revert "ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn"

Some older systems don't support CPPC in the firmware and this just makes
noise for them when booting. Drop back to debug.

This reverts commit 21fb59ab4b9767085f4fe1edbdbe3177fbb9ec97.

Fixes: 21fb59ab4b976 ("ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn")
Suggested-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Cc: All applicable <stable@vger.kernel.org>
Link: https://patch.msgid.link/20260504230141.484743-2-mario.limonciello@amd.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ACPI: PMIC: Replace mutex_lock/unlock() with guard()/scoped_guard()

Replace mutex_lock() and unlock() calls with the newer guard() and
scoped_guard() macros to make the code more straightforward.

Signed-off-by: Maxwell Doose <m32285159@gmail.com>
[ rjw: Subject tweak, changelog edits ]
Link: https://patch.msgid.link/20260430013507.46259-1-m32285159@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

drm/nouveau/gsp: Use kzalloc_flex() for r535 display funcs

struct nvkm_disp_func ends with the user flexible array member. Allocate
the r535 display function table with kzalloc_flex() instead of open-coding
the size calculation with sizeof().

Assisted-by: Codex:GPT-5.5
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
[dropped nothing-burger sentence from commit message]
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patch.msgid.link/20260508052056.1744665-1-rosenp@gmail.com

nouveau/vmm: use kzalloc_flex

Use the proper macro do to these sizeof calculations.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
[fixed style warning from checkpatch]
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patch.msgid.link/20260312195529.13002-1-rosenp@gmail.com

platform/x86/intel/pmc: Add Nova Lake support to intel_pmc_core driver

Add Nova Lake support in intel_pmc_core driver

Signed-off-by: Xi Pardee <xi.pardee@linux.intel.com>
Link: https://patch.msgid.link/20260505043342.2573556-8-xi.pardee@linux.intel.com
[ij: added missing statics to a few structs]
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

platform/x86/intel/pmc: Retrieve PMC info only for available PMCs

Update the Intel PMC Core driver to fetch PMC information only for
available PMCs. Previously, the driver attempted to retrieve PMC info
even when the corresponding PMC was not present.

This change aligns with recent updates to the Intel SSRAM Telemetry
driver. Starting with NVL, the SSRAM Telemetry driver is probed for
each individual SSRAM device. The prior implementation could not
differentiate between an unavailable PMC and one that had not yet
completed information retrieval. To resolve this, the PMC Core driver
now skips obtaining PMC info for unavailable PMCs.

Signed-off-by: Xi Pardee <xi.pardee@linux.intel.com>
Link: https://patch.msgid.link/20260505043342.2573556-7-xi.pardee@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

platform/x86/intel/pmc: Add support for variable DMU offsets

Add support for handling different DMU Die C6 offsets across platforms.
The previous implementation assumed a uniform DMU Die C6 offset for all
platforms, which is no longer valid after an upcoming change.

Signed-off-by: Xi Pardee <xi.pardee@linux.intel.com>
Link: https://patch.msgid.link/20260505043342.2573556-6-xi.pardee@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

platform/x86/intel/pmc: Use PCI DID for PMC SSRAM device discovery

Update the PMC SSRAM discovery process to identify the device using its
PCI Device ID rather than relying on a fixed PCI bus location. The
enumeration of integrated devices on the PCI bus is no longer guaranteed
to be consistent across CPUs.

On earlier platforms, the IOE and PCH SSRAM devices were hidden from the
BIOS, and the SOC SSRAM device is associated to telemetry regions from all
available SSRAM devices. Starting with Nova Lake, the IOE and PCH SSRAM
devices register their telemetry regions independently, meaning each
telemetry region is now linked to its corresponding SSRAM device. A new
ssram_hidden attribute has been added to the pmc_dev_info structure to
reflect this distinction.

Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Signed-off-by: Xi Pardee <xi.pardee@linux.intel.com>
Link: https://patch.msgid.link/20260505043342.2573556-5-xi.pardee@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

platform/x86/intel/pmc: Enable Pkgc blocking residency counter

Enable the Package C-state blocking counter in the PMT telemetry
region. This counter reports the number of 10 µs intervals during
which a Package C-state 10.2/3 entry was blocked for the specified
reasons.

Signed-off-by: Xi Pardee <xi.pardee@linux.intel.com>
Link: https://patch.msgid.link/20260505043342.2573556-4-xi.pardee@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

platform/x86/intel/pmc: Enable PkgC LTR blocking counter

Enable the Package C-state LTR blocking counter in the PMT telemetry
region. This counter records how many times any Package C-state entry
is blocked for the specified reasons.

Add pmc_core_pkgc_counters_show() as a common helper to display
package C-state blocking counters from the telemetry region.

Signed-off-by: Xi Pardee <xi.pardee@linux.intel.com>
Link: https://patch.msgid.link/20260505043342.2573556-3-xi.pardee@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

platform/x86/intel/pmc: Use __free() in pmc_core_punit_pmt_init()

Use scope-based cleanup in pmc_core_punit_pmt_init() instead of manually
freeing. This simplifies the code flow by removing the explicit put call,
making it less error-prone.

Signed-off-by: Xi Pardee <xi.pardee@linux.intel.com>
Link: https://patch.msgid.link/20260505043342.2573556-2-xi.pardee@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

Merge branch 'bpf-tcp-fix-type-confusion-in-bpf-helper-functions'

Kuniyuki Iwashima says:

====================
bpf: tcp: Fix type confusion in bpf helper functions.

bpf_tcp_sock() only check if sk->sk_protocol is IPPROTO_TCP,
but RAW socket can bypass it:

  socket(AF_INET, SOCK_RAW, IPPROTO_TCP)

The same issues exist in other bpf functions:

  * bpf_mptcp_sock_from_subflow()
  * bpf_skc_to_tcp_sock()
  * bpf_skc_to_tcp6_sock()
  * sol_tcp_sockopt()

Patch 1 fixes bpf_tcp_sock() and Patch 2 adds a test for it.
Patch 3 ~ 6 fix the rest of the functions above.

Changes:
  v2:
    * Inverse if (err) to if (!err) in the selftest
    * Add patch 3 ~ 6

  v1: https://lore.kernel.org/bpf/20260430184405.1227386-1-kuniyu@google.com/
      https://lore.kernel.org/mptcp/20260430-mptcp-bpf-mptcp-sock-type-v1-1-d2ed5cda7da9@kernel.org/
====================

Link: https://patch.msgid.link/20260504210610.180150-1-kuniyu@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

bpf: tcp: Fix type confusion in sol_tcp_sockopt().

sol_tcp_sockopt() only checks if sk->sk_protocol is IPPROTO_TCP,
but RAW socket can bypass it:

socket(AF_INET, SOCK_RAW, IPPROTO_TCP)

Let's use sk_is_tcp().

Note that initially sol_tcp_sockopt() checked sk->sk_prot->setsockopt.

Fixes: 2ab42c7b871f ("bpf: Check the protocol of a sock to agree the calls to bpf_setsockopt().")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260504210610.180150-7-kuniyu@google.com

bpf: tcp: Fix type confusion in bpf_skc_to_tcp6_sock().

bpf_skc_to_tcp6_sock() only checks if sk->sk_protocol is IPPROTO_TCP
and sk->sk_family is AF_INET6, but RAW socket can bypass it:

socket(AF_INET6, SOCK_RAW, IPPROTO_TCP)

Let's check sk->sk_type too.

Fixes: af7ec1383361 ("bpf: Add bpf_skc_to_tcp6_sock() helper")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260504210610.180150-6-kuniyu@google.com

bpf: tcp: Fix type confusion in bpf_skc_to_tcp_sock().

bpf_skc_to_tcp_sock() only checks if sk->sk_protocol is
IPPROTO_TCP, but RAW socket can bypass it:

socket(AF_INET, SOCK_RAW, IPPROTO_TCP)

Let's use sk_is_tcp().

Fixes: 478cfbdf5f13 ("bpf: Add bpf_skc_to_{tcp, tcp_timewait, tcp_request}_sock() helpers")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260504210610.180150-5-kuniyu@google.com

mptcp: bpf: Fix type confusion in bpf_mptcp_sock_from_subflow()

bpf_mptcp_sock_from_subflow() only checks if sk->sk_protocol is
IPPROTO_TCP, but RAW socket can bypass it:

socket(AF_INET, SOCK_RAW, IPPROTO_TCP)

In this case, it would NOT be valid to call sk_is_mptcp() which will
assume sk is a pointer to a struct tcp_sock, and wrongly checks for:
tcp_sk(sk)->is_mptcp.

Fixes: 3bc253c2e652 ("bpf: Add bpf_skc_to_mptcp_sock_proto")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260504210610.180150-4-kuniyu@google.com

selftest: bpf: Add test for bpf_tcp_sock() and RAW socket.

Let's extend sockopt_sk.c to cover bpf_tcp_sock() for the
wrong socket type.

Before:
  # ./test_progs -t sockopt_sk
  [  151.948613] ==================================================================
  [  151.951376] BUG: KASAN: slab-out-of-bounds in sol_tcp_sockopt+0xc7/0x8e0
  [  151.954159] Read of size 8 at addr ffff88801083d760 by task test_progs/1259
  ...
  run_test:FAIL:getsetsockopt unexpected error: -1 (errno 0)
  #427     sockopt_sk:FAIL

After:
  #427     sockopt_sk:OK

While at it, missing free() is fixed up.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260504210610.180150-3-kuniyu@google.com

crypto/ccp: Skip SNP_INIT if preparation fails

If snp_prepare() fails, SNP_INIT will fail, so skip it and return early.

Note that this is not a change in initialization behavior: if SNP_INIT fails
even before this change, it will still return an error and
__sev_snp_init_locked() will fail initialization of other SEV modes.

Signed-off-by: Tycho Andersen (AMD) <tycho@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
Link: https://lore.kernel.org/20260429155636.540040-1-tycho@kernel.org

x86/sev: Do not initialize SNP if missing CPUs

The SEV firmware checks that the SNP enable bit is set on each CPU during SNP
initialization, and will fail if not. If there are some CPUs offline, they
will not run the setup functions, so SNP initialization will always fail.

Skip the IPIs in this case and return an error so that the CCP driver can
skip the SNP_INIT that will fail. Also print the CPU masks in order to leave
breadcrumbs so people can figure out what happened.

[ bp: Massage commit message. ]

Suggested-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Tycho Andersen (AMD) <tycho@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://20260429155636.540040-1-tycho@kernel.org

drm/nouveau/kms/nvd9-: Remove unused header in crc.c

No functional changes.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patch.msgid.link/20260501215703.820656-2-lyude@redhat.com
Reviewed-by: Dave Airlie <airlied@redhat.com>

nilfs2: fix backing_dev_info reference leak

setup_bdev_super() already initializes sb->s_bdev and takes a
reference on the block device backing_dev_info when assigning sb->s_bdi.

nilfs_fill_super() takes another reference to the same
backing_dev_info and stores it in sb->s_bdi again. The extra
reference is not paired with a matching bdi_put(), since
generic_shutdown_super() releases sb->s_bdi only once.

Drop the redundant bdi_get() in nilfs_fill_super(). The single
reference taken by setup_bdev_super() is enough and is released
during superblock shutdown.

Fixes: c1e012ea9e83 ("nilfs2: use setup_bdev_super to de-duplicate the mount code")
Signed-off-by: Shuangpeng Bai <shuangpeng.kernel@gmail.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>

workqueue: Fix wq->cpu_pwq leak in alloc_and_link_pwqs() WQ_UNBOUND path

For WQ_UNBOUND workqueues, alloc_and_link_pwqs() allocates wq->cpu_pwq
via alloc_percpu() and then calls apply_workqueue_attrs_locked(). On
failure it returns the error directly, bypassing the enomem: label
which holds the only free_percpu(wq->cpu_pwq) in this function.

The caller's error path kfree()s wq without touching wq->cpu_pwq,
leaking one percpu pointer table (nr_cpu_ids * sizeof(void *) bytes) per
failed call.

If kmemleak is enabled, we can see:

  unreferenced object (percpu) 0xc0fffa5b121048 (size 8):
    comm "insmod", pid 776, jiffies 4294682844
    backtrace (crc 0):
      pcpu_alloc_noprof+0x665/0xac0
      __alloc_workqueue+0x33f/0xa20
      alloc_workqueue_noprof+0x60/0x100

Route the error through the existing enomem: cleanup and any error
before this one.

Cc: stable@kernel.org
Fixes: 636b927eba5b ("workqueue: Make unbound workqueues to use per-cpu pool_workqueues")
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Tejun Heo <tj@kernel.org>

workqueue: Release PENDING in __queue_work() drain/destroy reject path

The caller of __queue_work() owns WORK_STRUCT_PENDING, won via
test_and_set_bit() in queue_work_on()/__queue_delayed_work(). The
state machine documented above __queue_work() requires that owner
to either hand the token to a pwq (insert_work() -> set_work_pwq()),
hand it to a timer, or release it via set_work_pool_and_clear_pending().
try_to_grab_pending() relies on this: when it observes
"PENDING && off-queue" it busy-loops, trusting the current owner to
make progress.

The (__WQ_DESTROYING | __WQ_DRAINING) early-return path violates that
contract. It WARN_ONCE()s and bare-returns, leaving work->data with
PENDING set, WORK_STRUCT_PWQ clear, and work->entry empty.

The path is reachable without explicit API abuse: queue_delayed_work()
arms a timer with PENDING set; if drain_workqueue() runs while the
timer is still pending, delayed_work_timer_fn() -> __queue_work() in
softirq context hits the WARN, current is not a wq worker so
is_chained_work() is false, and the work is silently dropped with
PENDING leaked.

Mirror what clear_pending_if_disabled() already does on its analogous
reject path: unpack the off-queue data and call
set_work_pool_and_clear_pending() to release the token before
returning.

I was able to reproduce this by queueing several slow works on
a max_active=1 wq, arm a delayed_work whose timer fires while
drain_workqueue() is blocked, then call cancel_delayed_work_sync().
Without this patch the cancel livelocks at 100% CPU; with it the cancel
returns immediately.

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Tejun Heo <tj@kernel.org>

Merge tag 'v7.1-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

- Fix for two ACL issues (security fix to validate dacloffset better
   and chmod fix)

- Fix out of bounds reads (in check_wsl_eas and smb2_check_msg for
   symlinks)

- Two Kerberos fixes including an important one when AES-256 encryption
   chosen

- Fix open_cached_dir problem when directory leases disabled

* tag 'v7.1-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: validate dacloffset before building DACL pointers
  smb/client: fix out-of-bounds read in smb2_compound_op()
  smb/client: fix out-of-bounds read in symlink_data()
  smb: client: Zero-pad short GSS session keys per MS-SMB2
  smb: client: Use FullSessionKey for AES-256 encryption key derivation
  smb: client: use kzalloc to zero-initialize security descriptor buffer
  cifs: abort open_cached_dir if we don't request leases

platform/x86: x86-android-tablets: enable fwnode matching of GPIO chips

In order to allow GPIOLIB to match cherryview and baytrail GPIO
controllers by their firmware nodes instead of their names, we need to
attach the - currently "dangling" - existing software nodes to their
target devices dynamically.

The driver uses platform_create_bundle() and expects all required
providers to be present before it itself is probed. We know the name of
the device we're waiting for so look them up and assign the appropriate
software node as the secondary firmware node of the underlying ACPI node.

Scheduling fine-grained devres actions allows for proper teardown and
unsetting of the secondary firmware nodes.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20260504-baytrail-real-swnode-v5-2-c7878b69e383@oss.qualcomm.com
[ij: added linux/bug.h include]
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

ACPI: provide acpi_bus_find_device_by_name()

Provide a helper allowing to locate an ACPI device by its name.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20260504-baytrail-real-swnode-v5-1-c7878b69e383@oss.qualcomm.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

Merge tag 'spi-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"There's two main series here, fixing issues that came up in the
  Microchip QSPI and Freescale i.MX drivers. Both of those could result
  in some quite noticable issues if they were encountered in production.
  We also have one minor documentation fix in the ch341 driver"

* tag 'spi-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: ch341: correct company name in MODULE_DESCRIPTION
  spi: microchip-core-qspi: remove some inline markings
  spi: microchip-core-qspi: don't attempt to transmit during emulated read-only dual/quad operations
  spi: microchip-core-qspi: control built-in cs manually
  spi: imx: Propagate prepare_transfer() error from spi_imx_setupxfer()
  spi: imx: Fix UAF on package-1 prepare failure in spi_imx_dma_data_prepare()
  spi: imx: Fix precedence bug in spi_imx_dma_max_wml_find()

Merge tag 'regulator-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fix from Mark Brown:
"A straightforward fix for an incorrect description of one of the
regulators on the Qualcomm PMH0101"

* tag 'regulator-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: qcom-rpmh: Fix index for pmh0101 ldo16

bpf: tcp: Fix type confusion in bpf_tcp_sock().

bpf_tcp_sock() only checks if sk->sk_protocol is IPPROTO_TCP,
but RAW socket can bypass it:

socket(AF_INET, SOCK_RAW, IPPROTO_TCP)

Calling bpf_setsockopt() in SOCKOPT prog triggers out-of-bounds
access to another slab object. [0]

Let's use sk_is_tcp().

[0]:
BUG: KASAN: slab-out-of-bounds in sol_tcp_sockopt (net/core/filter.c:5519)
Read of size 8 at addr ffff88801083d760 by task test_progs/1259

CPU: 1 UID: 0 PID: 1259 Comm: test_progs Tainted: G OE 7.0.0-11175-gb5c111f4967b #1 PREEMPT(full)
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:94 lib/dump_stack.c:120)
print_report (mm/kasan/report.c:378 mm/kasan/report.c:482)
kasan_report (mm/kasan/report.c:595)
sol_tcp_sockopt (net/core/filter.c:5519)
__bpf_getsockopt (net/core/filter.c:5633)
bpf_sk_getsockopt (net/core/filter.c:5654)
bpf_prog_629ba00a1601e9f2__setsockopt+0x86/0x22c
__cgroup_bpf_run_filter_setsockopt (./include/linux/bpf.h:1402 ./include/linux/filter.h:722 ./include/linux/filter.h:729 kernel/bpf/cgroup.c:81 kernel/bpf/cgroup.c:2026)
do_sock_setsockopt (net/socket.c:2363)
__x64_sys_setsockopt (net/socket.c:2406)
do_syscall_64 (arch/x86/entry/syscall_64.c:63)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)
RIP: 0033:0x7f85f82fe7de
Code: 55 48 63 c9 48 63 ff 45 89 c9 48 89 e5 48 83 ec 08 6a 2c e8 34 69 f7 ff c9 c3 66 90 f3 0f 1e fa 49 89 ca b8 36 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 0a c3 66 0f 1f 84 00 00 00 00 00 48 8b 15 e1
RSP: 002b:00007ffe59dcecd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f85f82fe7de
RDX: 000000000000001c RSI: 0000000000000006 RDI: 000000000000000d
RBP: 00007ffe59dcef20 R08: 000000000000003c R09: 0000000000000000
R10: 00007ffe59dcef00 R11: 0000000000000202 R12: 00007ffe59dcf268
R13: 0000000000000003 R14: 00007f85f9da5000 R15: 000055b2f3201400
</TASK>

The buggy address belongs to the object at ffff88801083d280
which belongs to the cache RAW of size 1792
The buggy address is located 1248 bytes inside of
allocated 1792-byte region [ffff88801083d280, ffff88801083d980)

Fixes: 655a51e536c0 ("bpf: Add struct bpf_tcp_sock and BPF_FUNC_tcp_sock")
Reported-by: Damiano Melotti <melotti@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/20260504210610.180150-2-kuniyu@google.com

drm/xe: Make decision to use Xe2-style blitter instructions a feature flag

The blitter engines' MEM_COPY and MEM_SET instructions were added as
part of the same hardware change that introduced service copy engines
(i.e., BCS1-BCS8) which is why the driver checks for service copy engine
presence when deciding whether to use these instructions or the older
XY_* instructions.  However when making this decision the driver should
consider which engines are part of the hardware architecture, not which
engines are present/usable on the current device.  For graphics IP
versions that architecturally include service copy engines (i.e.,
everything Xe2 and later, plus PVC's Xe_HPC) we should use MEM_SET and
MEM_COPY even in if all of the service copy engines wind up getting
fused off.  I.e., we need to decide based on whether the platform's
graphics descriptor contains these engines, rather than whether the
usable engine mask contains them.  This logic got broken when
gt->info.__engine_mask was removed, although in practice that mistake
has been harmless so far because there haven't been any hardware
SKUs that fuse off all of the service copy engines yet.

Replace the incorrect has_service_copy_support() function with a GT
feature flag that tracks more accurately whether the new blitter
instructions are usable.  In addition to fixing incorrect logic if all
service copies are fused off, the flag also makes it more obvious what
the calling code is trying to do; previously it wasn't terribly obvious
why "has service copy engines" was being used as the condition for using
different instructions on all copy engine types.

The new feature flag is named 'has_xe2_blt_instructions' because we
expect this flag to be set for all Xe2 and later platforms (i.e.,
everything officially supported by the Xe driver).  Technically there's
also one Xe1-era platform (PVC) that supports these engines/instructions
and will set this flag, but this still seems to be the most clear and
understandable name for the flag.

Fixes: 61549a2ee594 ("drm/xe: Drop __engine_mask")
Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patch.msgid.link/20260507-xe2_copy-v1-1-26506381b821@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>

Documentation: core-api/cpu_hotplug: Remove stale cpu0_hotplug docs

Commit e59e74dc48a3 ("x86/topology: Remove CPU0 hotplug option")
removed the 'cpu0_hotplug' option, but its documentation remained in
cpu_hotplug.rst. Remove the stale entry.

Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20260507134732.254617-1-chao.gao@intel.com

selftests/cgroup: Fix incorrect variable check in online_cpus()

"OFFLINE_CPUS" is a literal string that is always non-empty. It should
be "$OFFLINE_CPUS" to check the variable's value instead.

Signed-off-by: Hongfu Li <lihongfu@kylinos.cn>
Signed-off-by: Tejun Heo <tj@kernel.org>

sched_ext: Use offsetofend on both sides of the ops_cid layout assert

sizeof() includes trailing struct pad, offsetofend() doesn't. On
32-bit PPC, sched_ext_ops_cid tail-pads 4 bytes past @priv and the
assert trips. Use offsetofend() on both sides.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202605081637.DbH4SZ1E-lkp@intel.com/
Fixes: 7e655ed7b953 ("sched_ext: Add bpf_sched_ext_ops_cid struct_ops type")
Signed-off-by: Tejun Heo <tj@kernel.org>

platform/x86: pcengines-apuv2: reduce indiraction in swnode assignment

struct platform_device_info now has the 'swnode' field allowing passing
software nodes directly to platform_device_register_full() without going
through software_node_fwnode(). It supports both registered and
unregistered software nodes. Use it to drop one layer of indirection.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Link: https://patch.msgid.link/20260429-pcengines-apuv2-swnodes-v1-1-cd1a6fad856d@oss.qualcomm.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

platform/x86: barco-p50-gpio: attach software node to its target GPIO device

The software node representing the GPIO controller to consumers is
"dangling": it's not really attached to the device. The GPIO lookup
relies on matching the name of the node to the chip's label. Switch to
using platform_device_register_full() and use the swnode field of struct
platform_device_info to attach the software node to the GPIO device.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Link: https://patch.msgid.link/20260427102459.110332-1-bartosz.golaszewski@oss.qualcomm.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

selftests/sched_ext: Fix select_cpu_dfl link leak on early return

If run() exits early via SCX_EQ/SCX_ASSERT (which calls return
directly), bpf_link__destroy() is never reached and the BPF
scheduler stays loaded. All subsequent tests then fail to attach
because SCX is not in the DISABLED state.

Move bpf_link into a context struct so cleanup() always destroys
it, regardless of how run() exits. Also skip waitpid() for children
where fork() returned -1, avoiding waitpid(-1,...) accidentally
reaping an unrelated child and triggering the early return path.

Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>

Merge tag 'drm-fixes-2026-05-08-1' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"Weekly fixes, lots of them but all pretty small, amdgpu and xe are the
  usual but then a large amount of fixes all over.

  core:
   - fix race condition in handle change ioctl

  fb-helper:
   - fix clipping

  rust:
   - fix unsound initialization
   - fix GEM state cleanup
   - fix wrong ARef import

  ttm:
   - update GPU MM stats on pool shrinking

  i915:
   - Re-enable ccs modifiers on dg2

  nova:
   - fix mailing list

  xe:
   - Add NULL check for media_gt in intel_hdcp_gsc_check_status
   - Fix EAGAIN sign in pf_migration_consume
   - Fix MMIO access using PF view instead of VF view during migration
   - Exclude indirect ring state page from ADS engine state size

  amdgpu:
   - GFX9 fixes
   - Hawaii SMU fixes
   - SDMA4 fix
   - GART fix
   - Userq fixes

  amdkfd:
   - GPUVM TLB flush fix
   - Hotplug fix

  radeon:
   - Hawaii SMU fixes

  bochs:
   - fix managed cleanup

  bridge:
   - tda998x: fix sparse warnings on type correctness

  etnaviv:
   - schedule armed jobs

  exynos:
   - managed bridge cleanup

  ivpu:
   - disallow reexport of GEM buffer objects

  noveau:
   - revert support for GA100

  panel:
   - boe-tv101wum-nl16: use correct MIPI_DSI mode
   - feyjang-fy07024di26a30d: fix error reporting
   - himax-hx83102: use correct MIPI_DSI mode
   - himax-hx83121a: fix error checks
   - himax-hx83121a: select DRM_DISPLAY_DSC_HELPER

  qaic:
   - fix RAS message handling

  qxl:
   - clean up polling

  sti:
   - managed bridge cleanup

* tag 'drm-fixes-2026-05-08-1' of https://gitlab.freedesktop.org/drm/kernel: (37 commits)
  drm: Set old handle to NULL before prime swap in change_handle
  drm/bochs: Drop manual put on probe error path
  drm/xe/guc: Exclude indirect ring state page from ADS engine state size
  drm/xe/pf: Fix MMIO access using PF view instead of VF view during migration
  drm/xe/pf: Fix EAGAIN sign in pf_migration_consume()
  drm/xe/hdcp: Add NULL check for media_gt in intel_hdcp_gsc_check_status()
  drm/exynos: remove bridge when component_add fails
  drm/amdgpu: nuke amdgpu_userq_fence_slab v2
  drm/amdgpu/userq: fix access to stale wptr mapping
  drm/amdkfd: Check if there are kfd porcesses using adev by kfd_processes_count
  drm/amdgpu: zero-initialize GART table on allocation
  drm/amdgpu/sdma4: replace BUG_ON with WARN_ON in fence emission
  drm/radeon: add missing revision check for CI
  drm/amdgpu/pm: align Hawaii mclk workaround with radeon
  drm/amdgpu/pm: add missing revision check for CI
  drm/amdgpu/gfx9: drop unnecessary 64-bit fence flag check in KIQ
  drm/amdkfd: Make all TLB-flushes heavy-weight
  drm/panel: himax-hx83102: restore MODE_LPM after sending disable cmds
  drm/panel: boe-tv101wum-nl6: restore MODE_LPM after sending disable cmds
  drm/panel: feiyang-fy07024di26a30d: return display-on error
  ...

drm/ttm: Fix ttm_bo_swapout() infinite LRU walk on swapout failure

When ttm_tt_swapout() fails, the current code calls
ttm_resource_add_bulk_move() followed by ttm_resource_move_to_lru_tail()
to restore the resource's bulk_move membership.

However, ttm_resource_move_to_lru_tail() places the resource at the tail
of the LRU list which, relative to the walk cursor's hitch node (placed
immediately after the resource when it was yielded), puts the resource
*in front of the* the hitch. The next list_for_each_entry_continue() from
the hitch finds the same resource again, causing an infinite loop.

Fix by deferring del_bulk_move to the success path only.

On the success path, TTM_TT_FLAG_SWAPPED has just been set by
ttm_tt_swapout() but the resource is still tracked in the bulk_move range,
so ttm_resource_del_bulk_move()'s !ttm_resource_unevictable() guard would
incorrectly skip the removal. Introduce
ttm_resource_del_bulk_move_unevictable() which bypasses that guard.

Reported-by: Jatin Kataria <jkataria@netflix.com>
Fixes: fc5d96670eb2 ("drm/ttm: Move swapped objects off the manager's LRU list")
Cc: Christian König <christian.koenig@amd.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <dri-devel@lists.freedesktop.org>
Cc: <stable@vger.kernel.org> # v6.13+
Assisted-by: GitHub_Copilot:claude-sonnet-4.6
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Boqun Feng <boqun@kernel.org>
Link: https://patch.msgid.link/20260428094442.16985-1-thomas.hellstrom@linux.intel.com

Merge tag 'usb-serial-7.1-rc3' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus

Johan writes:

USB serial device ids for 7.1-rc3

Here are some new modem device ids.

This one has been in linux-next with no reported issues.

* tag 'usb-serial-7.1-rc3' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: option: add Telit Cinterion LE910Cx compositions

Merge tag 'iommu-fixes-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu fixes from Joerg Roedel:
"Core:
   - Cache-flushing fix for non-x86 platforms

  AMD-Vi:
   - Security fix when SEV-SNP is enabled
   - Operator precedence fix in DTE setting"

* tag 'iommu-fixes-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
  iommu/amd: Fix precedence order in set_dte_passthrough()
  iommu/pages: Fix iommu_pages_flush_incoherent() for non-x86
  iommu/amd: Use maximum PPR log buffer size when SNP is enabled on Family 0x19
  iommu/amd: Use maximum Event log buffer size when SNP is enabled on Family 0x19

sched_ext: Use IRQ_WORK_INIT_HARD() to initialize sch->disable_irq_work

For built with PREEMPT_RT kernels, the scx_disable_irq_workfn() is
called from per-cpu irq_work kthreads context, this means that
when call the scx_dump_state() in the scx_disable_irq_workfn() to
output current->comm/pid, it always output current irq_work kthread's
comm/pid. this commit therefore use the IRQ_WORK_INIT_HARD() to
initialize sch->disable_irq_work to make scx_disable_irq_workfn() is
called from hardirq context.

Fixes: f4a6c506d118 ("sched_ext: Always bounce scx_disable() through irq_work")
Signed-off-by: Zqiang <qiang.zhang@linux.dev>
Signed-off-by: Tejun Heo <tj@kernel.org>

arm64/entry: Fix arm64-specific rseq brokenness

Mathias Stearn reports that since v6.19, there are two big issues
affecting rseq:

(1) On arm64 specifically, rseq critical sections aren't aborted when
    they should be.

(2) The 'cpu_id_start' field is no longer written by the kernel in all
    cases it used to be, including some cases where TCMalloc depends on
    the kernel clobbering the field.

This patch fixes issue #1. This patch DOES NOT fix issue #2, which will
need to be addressed by other patches.

The arm64-specific brokenness is a result of commits:

  2fc0e4b4126c ("rseq: Record interrupt from user space")
  39a167560a61 ("rseq: Optimize event setting")

The first commit failed to add a call to rseq_note_user_irq_entry() on
arm64. Thus arm64 never sets rseq_event::user_irq to record that it may
be necessary to abort an active rseq critical section upon return to
userspace. On its own, this commit had no functional impact as the value
of rseq_event::user_irq was not consumed.

The second commit relied upon rseq_event::user_irq to determine whether
or not to bother to perform rseq work when returning to userspace. As
rseq_event::user_irq wasn't set on arm64, this work would be skipped,
and consequently an active rseq critical section would not be aborted.

Fix this by giving arm64 syscall-specific entry/exit paths, and
performing the relevant logic in syscall and non-syscall paths,
including calling rseq_note_user_irq_entry() for non-syscall entry.

Currently arm64 cannot use syscall_enter_from_user_mode(),
syscall_exit_to_user_mode(), and irqentry_exit_to_user_mode(), due to
ordering constraints with exception masking, and risk of ABI breakage
for syscall tracing/audit/etc. For the moment the entry/exit logic is
left as arm64-specific, directly using enter_from_user_mode() and
exit_to_user_mode(), but mirroring the generic code.

I intend to follow up with refactoring/cleanup, as we did for kernel
mode entry paths in commit:

  041aa7a85390 ("entry: Split preemption from irqentry_exit_to_kernel_mode()")

... which will allow arm64 to use the GENERIC_IRQ_ENTRY functions directly.

Fixes: 39a167560a61 ("rseq: Optimize event setting")
Reported-by: Mathias Stearn <mathias@mongodb.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/regressions/CAHnCjA25b+nO2n5CeifknSKHssJpPrjnf+dtr7UgzRw4Zgu=oA@mail.gmail.com/
Link: https://patch.msgid.link/20260508142023.3268622-1-mark.rutland@arm.com

x86/kexec: Push kjump return address even for non-kjump kexec

The version of purgatory code shipped by kexec-tools attempts to look above
the top of its stack to find a return address for a kjump, even in a non-kjump
kexec.

After the commit in Fixes: the word above the stack might not be there,
leading to a fault (which is at least now caught by my exception-handling code
in kexec).

That commit fixed things for the actual kjump path, but no longer
"gratuitously" pushes the unused return address to the stack in the non-kjump
path. Put that *back* in the non-kjump path, to prevent purgatory from
crashing when trying to access it.

Fixes: 2cacf7f23a02 ("x86/kexec: Fix stack and handling of re-entry point for ::preserve_context")
Reported-by: Rohan Kakulawaram <rohanka@google.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Rohan Kakulawaram <rohanka@google.com>
Cc: <stable@kernel.org>
Link: https://patch.msgid.link/32d627134143ffd957891cb697138e839c623211.camel@infradead.org

ntfs: avoid leaking uninitialised bytes in new security descriptors

ntfs_sd_add_everyone() builds the on-disk security descriptor for a
newly created file by kmalloc()'ing a buffer and then partially
filling it in:

sd = kmalloc(sd_len, GFP_NOFS);
...
sd->revision = 1;
sd->control = SE_DACL_PRESENT | SE_SELF_RELATIVE;
...

The buffer is then handed to ntfs_attr_add() and persisted as the
SECURITY_DESCRIPTOR attribute of the new MFT record.  The descriptor
covers a relative security descriptor header, two SIDs (owner and
group), an ACL header, and a single ACE, but several fields inside
those structures are never written before the buffer is committed
to disk:

  - struct security_descriptor_relative
        @alignment (1 byte)
        @sacl (4 bytes; SE_SACL_PRESENT is not set
                                 but the offset still reaches disk)

  - struct ntfs_sid (3 instances: owner, group, ACE.sid)
        identifier_authority.value[0..4] (5 bytes per SID, 15 total
                                          - only value[5] is set)

  - struct ntfs_acl
        @alignment1 (1 byte)
        @alignment2 (2 bytes)

That is 23 bytes of uninitialised slab memory persisted to disk for
every new file or directory the legacy ntfs driver creates.  The
"+ 4" trailing accounting in sd_len holds ace->sid.sub_authority[0],
which the existing code does explicitly write to zero, so it is
not part of the leak.

Anything later able to read the SECURITY_DESCRIPTOR attribute - the
same NTFS volume mounted on Windows or by another NTFS reader, an
offline forensics tool, an unprivileged user that ends up with read
access to the volume - can recover those bytes.  The leak persists
for the lifetime of the file on disk, not just the lifetime of the
kernel that wrote it.

Switch the allocation to kzalloc() so every byte the on-disk
descriptor covers is zero before the explicit initialisations run.
While there, replace the bare "return -1" allocation-failure path
with a proper -ENOMEM so the error reaches userspace as a meaningful
errno instead of an unrelated -EPERM.

Found by inspection while auditing fs/ntfs new-inode paths.

Fixes: af0db57d4293 ("ntfs: update inode operations")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

ntfs: fix out-of-bounds write in ntfs_index_walk_down()

ntfs_index_walk_down() used to update the index traversal depth
directly before writing parent_pos[] and parent_vcn[]. A malformed
directory index with too many child-node levels can therefore advance
pindex past MAX_PARENT_VCN and write past the fixed arrays in struct
ntfs_index_context, corrupting context state used by later index
traversal.

Use ntfs_icx_parent_inc() for walk-down transitions so the existing
depth limit is enforced before the arrays are updated. Make the helper
check the limit before incrementing pindex so failed callers do not
leave the context at an out-of-range depth.

This is reachable by iterating a crafted NTFS directory after the volume
has been mounted, including read-only mounts. The reproducer uses
getdents64() on an index root that points to an excessively deep chain
of child index blocks.

A crafted directory index with a chain of child-node entries reproduced
UBSAN array-index-out-of-bounds reports in ntfs_index_walk_down() and
subsequent KASAN reports in ntfs_index_walk_up(). With this change, the
same image is rejected with "Index is over 32 level deep" and no KASAN
or UBSAN report is emitted.

Fixes: 0a8ac0c1fa0b ("ntfs: update directory operations")
Suggested-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

ntfs: fix out-of-bounds write in ntfs_rl_collapse_range() merge path

ntfs_rl_collapse_range() merges the run on the left of the collapsed
region with the run on its right when they are contiguous. The contiguous
check chooses a clamped index when @new_1st_cnt is 0:

i = new_1st_cnt == 0 ? 1 : new_1st_cnt;
if (ntfs_rle_lcn_contiguous(&new_rl[i - 1], &new_rl[i])) {

but the merge itself uses the unclamped value:

s_rl = &new_rl[new_1st_cnt - 1];
s_rl->length += s_rl[1].length;

When @new_1st_cnt is 0 this computes &new_rl[-1] and writes 8 bytes
before the kvcalloc() runlist buffer. The path is reachable through
fallocate(FALLOC_FL_COLLAPSE_RANGE) starting at vcn 0 against an
attribute whose first run after the collapsed region and the following
run are holes. In that case ntfs_rle_lcn_contiguous() returns true
because both checked entries are LCN_HOLE, so the merge path is entered
with @new_1st_cnt still 0. Such consecutive holes do not occur on a
well-formed runlist (NTFS keeps runlists coalesced in memory), so this
OOB path is only reachable from a crafted volume.

A normal runlist has no element to the left of vcn 0, so the left/right
merge is not valid when @new_1st_cnt is 0. Require @new_1st_cnt to be
positive before checking or performing the merge. This skips the merge
entirely in that case instead of clamping the merge target.

The out-of-bounds write can corrupt an adjacent slab object. On a
non-KASAN kernel, it is reachable after a crafted NTFS volume has been
mounted read-write with the legacy fs/ntfs driver, by a local user that
has write access to the crafted file.

Fixes: 11ccc9107dc4 ("ntfs: update runlist handling and cluster allocator")
Suggested-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>