git.ipfire.org Git - thirdparty/linux.git/log

xfrm: add documentation for XFRM_MSG_MIGRATE_STATE

Add documentation for the new XFRM_MSG_MIGRATE_STATE netlink message,
which migrates a single SA identified by SPI and mark without involving
policies.

The document covers the motivation and design differences from the
existing XFRM_MSG_MIGRATE, the SA lookup mechanism, supported attributes
with their omit-to-inherit semantics, and usage examples.

Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

xfrm: restrict netlink attributes for XFRM_MSG_MIGRATE_STATE

Only accept XFRMA used in this method, reject the rest.

Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

xfrm: add XFRM_MSG_MIGRATE_STATE for single SA migration

Add a new netlink method to migrate a single xfrm_state.
Unlike the existing migration mechanism (SA + policy), this
supports migrating only the SA and allows changing the reqid.

The SA is looked up via xfrm_usersa_id, which uniquely
identifies it, so old_saddr is not needed. old_daddr is carried in
xfrm_usersa_id.daddr.

The reqid is invariant in the old migration.

Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

xfrm: make xfrm_dev_state_add xuo parameter const

The xuo pointer is not modified by xfrm_dev_state_add(); make it const.

Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

xfrm: extract address family and selector validation helpers

Extract verify_xfrm_family() and verify_selector_prefixlen() from
verify_newsa_info() to allow reuse by other netlink handlers.

verify_xfrm_family() validates that a given address family is AF_INET
or AF_INET6 (with CONFIG_IPV6 guard).

verify_selector_prefixlen() validates that the selector prefix lengths
are within the bounds for the given address family.

No functional change.

Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

xfrm: refactor XFRMA_MTIMER_THRESH validation into a helper

Extract verify_mtimer_thresh() to consolidate the XFRMA_MTIMER_THRESH
validation logic shared between the add_sa and upcoming patch.

Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

xfrm: move encap and xuo into struct xfrm_migrate

In preparation for an upcoming patch, move the xfrm_encap_tmpl and
xfrm_user_offload pointers from separate parameters into struct
xfrm_migrate, reducing the parameter count of
xfrm_state_migrate_create(), xfrm_state_migrate_install()
and xfrm_state_migrate()

The fields are placed after the four xfrm_address_t members where
the struct is naturally 8-byte aligned, avoiding padding.

No functional change.

Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

xfrm: add error messages to state migration

Add descriptive(extack) error messages for all error paths
in state migration. This improves diagnostics by
providing clear feedback when migration fails.

After xfrm_init_state() use NL_SET_ERR_MSG_WEAK() as fallback for
error paths not yet propagating extack e.g. mode_cbs->init_state()

No functional change.

Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

xfrm: add state synchronization after migration

Add xfrm_migrate_sync() to copy curlft and replay state from the old SA
to the new one before installation. The function allocates no memory, so
it can be called under a spinlock. In preparation for a subsequent patch
in this series.

A subsequent patch calls this under x->lock, atomically capturing the
latest lifetime counters and replay state from the original SA and
deleting it in the same critical section to prevent SN/IV reuse
for XFRM_MSG_MIGRATE_STATE method.

No functional change.

Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

xfrm: check family before comparing addresses in migrate

When migrating between different address families, xfrm_addr_equal()
cannot meaningfully compare addresses, different lengths.
Only call xfrm_addr_equal() when families match, and take
the xfrm_state_insert() path when addresses are equal.

Fixes: 80c9abaabf42 ("[XFRM]: Extension for dynamic update of endpoint address(es)")
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

xfrm: split xfrm_state_migrate into create and install functions

To prepare for subsequent patches, split
xfrm_state_migrate() into two functions:
- xfrm_state_migrate_create(): creates the migrated state
- xfrm_state_migrate_install(): installs it into the state table

splitting will help to avoid SN/IV reuse when migrating AEAD SA.

And add const whenever possible.
No functional change.

Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

xfrm: rename reqid in xfrm_migrate

In preparation for a later patch in this series s/reqid/old_reqid/.
No functional change.

Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

xfrm: fix NAT-related field inheritance in SA migration

During SA migration via xfrm_state_clone_and_setup(),
nat_keepalive_interval was silently dropped and never copied to the new
SA. mapping_maxage was unconditionally copied even when migrating to a
non-encapsulated SA.

Both fields are only meaningful when UDP encapsulation (NAT-T) is in
use. Move mapping_maxage and add nat_keepalive_interval inside the
existing if (encap) block, so both are inherited when migrating with
encapsulation and correctly absent when migrating without it.

Fixes: f531d13bdfe3 ("xfrm: support sending NAT keepalives in ESP in UDP states")
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

xfrm: allow migration from UDP encapsulated to non-encapsulated ESP

The current code prevents migrating an SA from UDP encapsulation to
plain ESP. This is needed when moving from a NATed path to a non-NATed
one, for example when switching from IPv4+NAT to IPv6.

Only copy the existing encapsulation during migration if the encap
attribute is explicitly provided.

Note: PF_KEY's SADB_X_MIGRATE always passes encap=NULL and never
supported encapsulation in migration. PF_KEY is deprecated and was
in feature freeze when UDP encapsulation was added to xfrm.

Tested-by: Yan Yan <evitayan@google.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

xfrm: add extack to xfrm_init_state

Add a struct extack parameter to xfrm_init_state() and pass it
through to __xfrm_init_state(). This allows validation errors detected
during state initialization to propagate meaningful error messages back
to userspace.

xfrm_state_migrate() now passes extack so that errors from the
XFRM_MSG_MIGRATE_STATE path are properly reported. Callers without an
extack context (af_key, ipcomp4, ipcomp6) pass NULL, preserving their
existing behaviour.

Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

xfrm: remove redundant assignments

These assignments are overwritten within the same function further down

commit e8961c50ee9cc ("xfrm: Refactor migration setup
during the cloning process")
x->props.family = m->new_family;

Which actually moved it in the
commit e03c3bba351f9 ("xfrm: Fix xfrm migrate issues when address family changes")

And the initial
commit 80c9abaabf428 ("[XFRM]: Extension for dynamic update of endpoint address(es)")

added x->props.saddr = orig->props.saddr; and
memcpy(&xc->props.saddr, &m->new_saddr, sizeof(xc->props.saddr));

Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

ASoC: amd: yc: Add MSI Raider A18 HX A9WJG to quirk table

The MSI Raider A18 HX A9WJG has an internal digital microphone
connected through AMD ACP6x, but this machine does not expose the
AcpDmicConnected ACPI property, so acp_yc_mach does not bind.

Add a DMI quirk for this model.

This was tested on an MSI Raider A18 HX A9WJG with board MS-182L,
BIOS E182LAMS.31A, AMD ACP6x rev 0x62, and Realtek ALC274. After
applying the quirk, the internal microphone appears as an acp6x DMIC
capture device and records correctly.

Signed-off-by: David Glushkov <david.glushkov@sntiq.com>
Link: https://patch.msgid.link/20260531214512.170716-1-david.glushkov@sntiq.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: fsl_sai: Fix 32 slots TDM broken by integer shift UB in xMR write

When configuring 32 slots TDM (channels == slots == 32), the xMR
(Mask Register) write used:
~0UL - ((1 << min(channels, slots)) - 1)

The literal "1" is a signed 32-bit int. Shifting it by 32 positions is
undefined behaviour which may set this register to 0xFFFFFFFF, masking
all 32 slots.

Use GENMASK_U32() macro instead. For 32 slots this produces a zero mask:
~GENMASK_U32(31, 0) = ~0xFFFFFFFF = 0x00000000
Behaviour for fewer than 32 slots is unchanged.

Fixes: 770f58d7d2c5 ("ASoC: fsl_sai: Support multiple data channel enable bits")
Cc: stable@vger.kernel.org
Signed-off-by: Chancel Liu <chancel.liu@nxp.com>
Reviewed-by: Shengjiu Wang <shengjiu.wang@gmail.com>
Link: https://patch.msgid.link/20260601083327.1535185-1-chancel.liu@oss.nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>

sched_ext: Add scx_arena_to_kaddr() / scx_kaddr_to_arena()

Translating between a BPF-arena pointer and its kernel-side address is just
an add or subtract of the arena's kern_vm start. More such translations are
coming, so cache that start on scx_sched as @arena_kern_base at arena attach
and wrap both directions. Convert the existing open-coded subtraction in
scx_call_op_set_cpumask().

Signed-off-by: Tejun Heo <tj@kernel.org>

xfrm: policy: fix use-after-free on inexact bin in xfrm_policy_bysel_ctx()

Fix the race by pruning the bin while still holding xfrm_policy_lock,
before dropping it. Use __xfrm_policy_inexact_prune_bin() directly since
the lock is already held. The wrapper xfrm_policy_inexact_prune_bin()
becomes unused and is removed.

Race:

  CPU0 (XFRM_MSG_DELPOLICY)           CPU1 (XFRM_MSG_NEWSPDINFO)
  ==========================          ==========================
  xfrm_policy_bysel_ctx():
    spin_lock_bh(xfrm_policy_lock)
    bin = xfrm_policy_inexact_lookup()
    __xfrm_policy_unlink(pol)
    spin_unlock_bh(xfrm_policy_lock)
    xfrm_policy_kill(ret)
    // wide window, lock not held
                                       xfrm_hash_rebuild():
                                         spin_lock_bh(xfrm_policy_lock)
                                         __xfrm_policy_inexact_flush():
                                           kfree_rcu(bin)  // bin freed
                                         spin_unlock_bh(xfrm_policy_lock)
    xfrm_policy_inexact_prune_bin(bin)
    // UAF: bin is freed

Fixes: 6be3b0db6db8 ("xfrm: policy: add inexact policy search tree infrastructure")
Signed-off-by: Sanghyun Park <sanghyun.park.cnu@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

net: bonding: fix NULL pointer dereference in bond_do_ioctl()

In bond_do_ioctl(), slave_dev is obtained via __dev_get_by_name() which
can return NULL if the requested interface name does not exist. However,
the subsequent slave_dbg() call is placed before the NULL check:

    slave_dev = __dev_get_by_name(net, ifr->ifr_slave);
    slave_dbg(bond_dev, slave_dev, "slave_dev=%p:\n", slave_dev); //here
    if (!slave_dev)
        return -ENODEV;

The slave_dbg() macro expands to netdev_dbg(bond_dev, "(slave %s): " fmt,
(slave_dev)->name, ...) which unconditionally dereferences slave_dev->name
before the NULL check is performed. This results in a NULL pointer
dereference kernel oops when a user calls bonding ioctl (e.g.
SIOCBONDENSLAVE, SIOCBONDRELEASE, etc.) with a non-existent slave
interface name.

This is reachable from userspace via the bonding ioctl interface with
CAP_NET_ADMIN capability, making it a potential local denial-of-service
vector.

Fix by moving the slave_dbg() call after the NULL check.

Fixes: e2a7420df2e0 ("bonding/main: convert to using slave printk macros")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: ZhaoJinming <zhaojinming@uniontech.com>
Link: https://patch.msgid.link/20260601085649.4029067-1-zhaojinming@uniontech.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

perf/x86/amd/uncore: Use Node ID to identify DF and UMC domains

For DF and UMC PMUs, a single context is shared across all CPUs that
are connected to the same Data Fabric (DF) instance. Currently, the
Package ID, which also happens to be the Socket ID, is used to identify
DF instances. This approach works for configurations having a single IO
Die (IOD) but fails in the following cases.
  * Older Zen 1 processors, where each chiplet has its own DF instance.
  * Any configurations with multiple DF instances or multiple IODs in
    the same package.

The correct way to identify DF instances is through the Node ID (not to
be confused with NUMA Node ID). This is available in ECX[7:0] of CPUID
leaf 0x8000001e and returned via topology_amd_node_id(). Hence, replace
usage of topology_logical_package_id() with topology_amd_node_id().

Fixes: 07888daa056e ("perf/x86/amd/uncore: Move discovery and registration")
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/e7a71a727c6a7b118c23d3e469929c538c4665aa.1780315832.git.sandipan.das@amd.com

perf: Reveal PMU type in fdinfo

It gives useful info on knowing which PMUs are reserved by this process.
Also add config which would be useful.
Testing cycles:

  $ ./perf stat -e cycles &
  $ cat /proc/`pidof perf`/fdinfo/3
  pos:    0
  flags:  02000002
  mnt_id: 16
  ino:    3081
  perf_event_attr.type:   0
  perf_event_attr.config: 0x0
  perf_event_attr.config1:        0x0
  perf_event_attr.config2:        0x0
  perf_event_attr.config3:        0x0
  perf_event_attr.config4:        0x0

Testing L1-dcache-load-misses:

  $ ./perf stat -e L1-dcache-load-misses &
  $ cat /proc/`pidof perf`/fdinfo/3
  pos:    0
  flags:  02000002
  mnt_id: 16
  ino:    1072
  perf_event_attr.type:   3
  perf_event_attr.config: 0x10000
  perf_event_attr.config1:        0x0
  perf_event_attr.config2:        0x0
  perf_event_attr.config3:        0x0
  perf_event_attr.config4:        0x0

Signed-off-by: Chun-Tse Shao <ctshao@google.com>
Assisted-by: Gemini:gemini-3.1-pro-preview
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://patch.msgid.link/20260602181349.3969429-1-ctshao@google.com

perf/x86/intel/uncore: Implement global init callback for GNR uncore

On Sierra Forest and Clearwater Forest, the FRZ_ALL bit in the global
control register defaults to 0 at boot, but UBOX PMON units do not
work until the global control register is explicitly written with 0
to trigger hardware initialization properly.

Implement the generic uncore_msr_global_init() callback and add it to
gnr_uncore_init[], which is shared by GNR, GRR, SRF, and CWF.

Fixes: 632c4bf6d007 ("perf/x86/intel/uncore: Support Granite Rapids")
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://patch.msgid.link/20260602144908.263680-8-zide.chen@intel.com

perf/x86/intel/uncore: Fix uncore_die_to_cpu() for offline dies

If the die is offline when uncore_die_to_cpu() is called, it silently
returns 0, which is misleading. Return -1 in this case to indicate
that all CPUs on the die are offline and the caller can take care of
it accordingly.

Opportunistically, replace -EPERM with -ENODEV, as -ENODEV is
the appropriate error when no CPUs are online across all dies.

Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://patch.msgid.link/20260602144908.263680-7-zide.chen@intel.com

perf/x86/intel/uncore: Move die_to_cpu() to uncore.c

Move die_to_cpu() into uncore.c so it can be reused by the MSR
initialization path, preparing for the introduction of an MSR global
initialization callback.

Move the cpus_read_{lock,unlock}() out of the API, in order to make
it possible to be called when the lock is being held.

Add the uncore_ prefix for consistency with other uncore APIs.

Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://patch.msgid.link/20260602144908.263680-6-zide.chen@intel.com

perf/x86/intel/uncore: Defer ADL global PMON enable to enable_box()

On some Raptor Cove CPUs, enabling uncore PMON globally at driver init
may increase power consumption even when no perf events are in use.

Drop adl_uncore_msr_init_box() and defer programming the global control
register to enable_box(), so it is only set when a box is actually used.

IMC and IMC freerunning counters use a separate control path and are
unaffected.

Fixes: 772ed05f3c5c ("perf/x86/intel/uncore: Add Alder Lake support")
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260602144908.263680-5-zide.chen@intel.com

perf/x86/intel/uncore: Fix PCI device refcount leak in UPI discovery

pci_get_domain_bus_and_slot() increments the reference count of the
returned PCI device and therefore requires a matching pci_dev_put().

In skx_upi_topology_cb() and discover_upi_topology(), the lookup is
performed inside a loop, but pci_dev_put() is only called once after
the loop. As a result, references from all previous iterations are
leaked.

Move pci_dev_put(dev) into the if (dev) block immediately after
upi_fill_topology() returns.

Opportunistically, fix uninitialized variable in skx_upi_topology_cb().

Fixes: 4cfce57fa42d ("perf/x86/intel/uncore: Enable UPI topology discovery for Skylake Server")
Fixes: f680b6e6062e ("perf/x86/intel/uncore: Enable UPI topology discovery for Icelake Server")
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://patch.msgid.link/20260602144908.263680-4-zide.chen@intel.com

perf/x86/intel/uncore: Guard against invalid box control address

Theoretically, intel_uncore_find_discovery_unit() could return NULL,
e.g., when a CPU die is offline during uncore enumeration and its PMU
units are not added to the discovery RB-tree.

Guard against a NULL return value and the resulting invalid box control
address (0) before accessing hardware.

Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://patch.msgid.link/20260602144908.263680-3-zide.chen@intel.com

perf/x86/intel/uncore: Fix discovery unit lookup for multi-die systems

In uncore_find_add_unit(), PMON units with the same unit ID may be
added to the uncore discovery RB-tree for different dies. These units
are distinguished by node->die.

However, intel_generic_uncore_box_ctl() uses a fixed die ID of -1 when
looking up the discovery unit, which may retrieve the wrong node on
multi-die systems.

Use box->dieid instead so the correct discovery unit is selected.

No functional issue has been observed so far because currently supported
platforms happen to use the same unit control register for such units.

Remove WARN_ON_ONCE() because with the above change a NULL unit can be
expected, e.g. when a CPU die is offline during uncore enumeration and
the unit is not added to the RB-tree. In this case,
intel_uncore_find_discovery_unit() returns NULL once the die becomes
online, and it is expected that the PMU box is not functional for that
die.

Fixes: b1d9ea2e1ca4 ("perf/x86/uncore: Apply the unit control RB tree to MSR uncore units")
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://patch.msgid.link/20260602144908.263680-2-zide.chen@intel.com

perf/x86/amd/core: Always use the NMI latency mitigation

Commit df4d29732fda ("perf/x86/amd: Change/fix NMI latency mitigation
to use a timestamp") fixed handling of late-arriving NMIs but limited
the mitigation to processors having X86_FEATURE_PERFCTR_CORE. However,
it is unclear if processors without this feature are also affected.
When Mediated vPMU is enabled on affected hardware, it is also possible
to bypass the fix inside KVM guests if X86_FEATURE_PERFCTR_CORE is
removed from the guest CPUID (e.g. using "-cpu host,-perfctr-core" with
QEMU). Hence, use the mitigation at all times.

Fixes: df4d29732fda ("perf/x86/amd: Change/fix NMI latency mitigation to use a timestamp")
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/29a3c970da289ab8f24282933bdb36545c0403e8.1780325517.git.sandipan.das@amd.com

timekeeping: Add clocksource read_snapshot() method and hw_cycles to snapshot

Add a read_snapshot() callback to struct clocksource which returns the
derived clocksource value while also providing the underlying hardware
counter reading and the related clocksource ID.

This allows ktime_get_snapshot_id() to populate new hw_cycles and hw_csid
fields in struct system_time_snapshot.

For clocksources that are derived from an underlying counter (e.g., Hyper-V
TSC page scales TSC to 10MHz, kvmclock scales TSC to 1GHz), this provides
atomic access to both the derived value needed for timekeeping
calculations, and the raw hardware counter needed by consumers like KVM's
master clock and the vmclock PTP driver.

[ tglx: Reworked it slightly ]

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Assisted-by: Kiro:claude-opus-4.6-1m
Link: https://patch.msgid.link/20260526230635.136914-1-dwmw2@infradead.org
Link: https://patch.msgid.link/20260529195558.202568489@kernel.org

ptp: Switch to ktime_get_snapshot_id() for pre/post timestamps

To prepare for a new PTP IOCTL, which exposes the raw counter value along
with the requested system time snapshot, switch the pre/post time stamp
sampling over to use ktime_get_snapshot_id() and fix up all usage sites.

No functional change intended.

The ptp_vmclock conversion was simplified by David Woodhouse.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20260529195558.149589566@kernel.org

timekeeping: Add support for AUX clock cross timestamping

Now that all prerequisites are in place add the final support for AUX
clocks in get_device_system_crosststamp(), which enables the PTP layer to
support hardware cross timestamps with a new IOTCL.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195558.097464513@kernel.org

timekeeping: Remove system_device_crosststamp::sys_realtime

All users are converted to sys_systime.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195558.046694580@kernel.org

ALSA: hda/common: Use system_device_crosststamp::sys_systime

sys_systime is an alias for sys_realtime. The latter will be removed so
switch the code over to the new naming scheme.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195557.995298795@kernel.org

wifi: iwlwifi: Use system_device_crosststamp::sys_systime

sys_systime is an alias for sys_realtime. The latter will be removed so
switch the code over to the new naming scheme.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195557.946612509@kernel.org

ptp: Use system_device_crosststamp::sys_systime

.. to prepare for cross timestamps with variable clock IDs.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195557.897808371@kernel.org

timekeeping: Prepare for cross timestamps on arbitrary clock IDs

PTP device system crosstime stamps support only CLOCK_REALTIME, which is
meaningless for AUX clocks. The PTP core hands in the clock ID already, so
prepare the core code to honor it.

- Add a new sys_systime field to struct system_device_crosststamp which
   aliases the sys_realtime field. Once all users are converted
   sys_realtime can be removed.

- Prepare get_device_system_crosststamp() and the related code for it by
   switching to sys_systime and providing the initial changes to utilize
   different time keepers.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195557.846634842@kernel.org

timekeeping: Remove ktime_get_snapshot()

All users have been converted to ktime_get_snapshot_id().

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195557.795510496@kernel.org

virtio_rtc: Use provided clock ID for history snapshot

The PTP core indicates in system_device_crosststamp::clock_id the clock ID
for which the system time stamp should be taken. That allows to utilize
hardware timestamps with e.g. AUX clocks.

Use ktime_get_snapshot_id() and hand the provided clock ID in.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://patch.msgid.link/20260529195557.744271454@kernel.org

net/mlx5: Use provided clock ID for history snapshot

The PTP core indicates in system_device_crosststamp::clock_id the clock ID
for which the system time stamp should be taken. That allows to utilize
hardware timestamps with e.g. AUX clocks.

Use ktime_get_snapshot_id() and hand the provided clock ID in.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195557.689836531@kernel.org

igc: Use provided clock ID for history snapshot

The PTP core indicates in system_device_crosststamp::clock_id the clock ID
for which the system time stamp should be taken. That allows to utilize
hardware timestamps with e.g. AUX clocks.

Save the provided clock ID and use it in igc_phc_get_syncdevicetime() for
taking the history snapshot.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195557.637381831@kernel.org

ice/ptp: Use provided clock ID for history snapshot

The PTP core indicates in system_device_crosststamp::clock_id the clock ID
for which then system time stamp should be taken. That allows to utilize
hardware timestamps with e.g. AUX clocks.

Save the provided clock ID and use it in ice_capture_crosststamp() for
taking the history snapshot.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195557.587226681@kernel.org

wifi: iwlwifi: Adopt PTP cross timestamps to core changes

iwlwifi only supports CLOCK_REALTIME timestamps and provides an incomplete
result without system counter values etc.

It also zeros struct system_device_crosststamp, which is already zeroed in
the core and initialized with the clock ID.

Remove the zeroing and reject any request for a clock ID other than REALTIME.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195557.535447186@kernel.org

timekeeping: Add CLOCK ID to system_device_crosststamp

The normal capture for system/device cross timestamps is CLOCK_REALTIME,
but that's meaningless for AUX clocks.

Add a clock_id field to struct system_device_crosststamp and initialize it
with CLOCK_REALTIME at the two places which prepare for cross
timestamps.

After the related code has been cleaned up, the core code will honor the
clock_id field when calculating the system time from the system counter
snapshot.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195557.482153523@kernel.org

timekeeping: Add system_counterval_t to struct system_device_crosststamp

An upcoming extension to the PTP IOCTL requires to return the system counter
value and the clocksource ID to user space. get_device_system_crosststamp() has
this information already.

Extend struct system_device_crosststamp with a system_counterval_t member
and fill in the data.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195557.429406675@kernel.org

timekeeping: Add CLOCK_AUX support for ktime_get_snapshot_id()

Now that all users are converted it's possible to enable snapshotting of
CLOCK_AUX time. The underlying clocksource is the same as for all other
CLOCK variants.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195557.380601005@kernel.org

timekeeping: Remove system_time_snapshot::real/boot/raw

All users are converted over to ktime_get_snapshot_id() and
system_time_snapshot::systime and ::monoraw.

Remove the leftovers.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195557.330029635@kernel.org

ptp: ptp_vmclock: Convert to ktime_get_snapshot_id()

ktime_get_snapshot() is replaced by ktime_get_snapshot_id() which allows to
request a particular CLOCK ID to be captured along with the clocksource
counter.

Convert vmclock over and use the new system_time_snapshot::systime field,
which holds the system timestamp selected by the CLOCK ID argument.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195557.281425262@kernel.org

KVM: arm64: Use ktime_get_snapshot_id() to snapshot CLOCK_REALTIME

ktime_get_snapshot() is replaced by ktime_get_snapshot_id() which allows to
request a particular CLOCK ID to be captured along with the clocksource
counter.

Convert the usage in kvm_get_ptp_time() over and use the new
system_time_snapshot::systime field, which holds the system timestamp
selected by the CLOCK ID argument.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://patch.msgid.link/20260529195557.225399927@kernel.org

KVM: arm64: Use ktime_get_snapshot_id() to retrieve CLOCK_BOOTTIME

ktime_get_snapshot() is replaced by ktime_get_snapshot_id() which allows to
request a particular CLOCK ID to be captured along with the clocksource
counter.

Convert the tracing mechanism over and use the new
system_time_snapshot::systime field, which holds the system timestamp
selected by the CLOCK ID argument.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://patch.msgid.link/20260529195557.174373054@kernel.org

tracing: Fix CFI violation in probestub being called by tprobes

The probestub is a function to allow tprobes to hook to a tracepoint to
gain access to its parameters. The function itself is only referenced by
the tracepoint structure which lives in the __tracepoint section. objtool
explicitly ignores that section and when processing functions in the
kernel, if it detects one that has no references it will seal it to have
its ENDBR stripped on boot up.

This means when a tprobe is attached to the sched_wakeup tracepoint, when it
is triggered it will call __probestub_sched_wakeup and due to the missing
ENDBR on a CFI-enabled machine it will take a #CP exception.

Fix this by adding CFI_NOSEAL annotation to probestub declaration.

Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://patch.msgid.link/20260603153147.573589-1-eva.kurchatova@virtuozzo.com
Fixes: d5173f753750 ("objtool: Exclude __tracepoints data from ENDBR checks")
Signed-off-by: Eva Kurchatova <eva.kurchatova@virtuozzo.com>
[ Updated change log ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

nvme: export controller reconnect event count via sysfs

When an NVMe-oF link goes down, the driver attempts to recover the
connection by repeatedly reconnecting to the remote controller at
configured intervals. A maximum number of reconnect attempts is also
configured, after which recovery stops and the controller is removed
if the connection cannot be re-established.

The driver maintains a counter, nr_reconnects, which is incremented on
each reconnect attempt. However if in case the reconnect is successful
then this counter reset to zero. Moreover, currently, this counter is
only reported via kernel log messages and is not exposed to userspace.
Since dmesg is a circular buffer, this information may be lost over
time.

So introduce a new accumulator which accumulates nr_reconnect attempts
and also expose this accumulator per-fabric ctrl via a new sysfs
attribute reconnect_count, under diag attribute grroup to provide
persistent visibility into the number of reconnect attempts made by the
host. This information can help users diagnose unstable links or
connectivity issues. Furthermore, this sysfs attribute is also writable
so user may reset it to zero, if needed.

The reconnect_count can also be consumed by monitoring tools such as
nvme-top to improve controller-level observability.

Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>

nvme: export controller reset event count via sysfs

The NVMe controller transitions into the RESETTING state during error
recovery, link instability, firmware activation, or when a reset is
explicitly triggered by the user.

Expose a per-ctrl sysfs attribute reset_count, under diag attribute
group to provide visibility into these RESETTING state transitions.
Observing the frequency of reset events can help users identify issues
such as PCIe errors or unstable fabric links. This counter is also
writable thus allowing user to reset its value, if needed.

This counter can also be consumed by monitoring tools such as nvme-top
to improve controller-level observability.

Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>

nvme: export I/O failure count when no path is available via sysfs

When I/O is submitted to the NVMe namespace head and no available path
can handle the request, the driver fails the I/O immediately. Currently,
such failures are only reported via kernel log messages, which may be
lost over time since dmesg is a circular buffer.

Add a new ns-head sysfs counter io_fail_no_available_path_count, under
diag attribute group to expose the number of I/Os that failed due to the
absence of an available path. This provides persistent visibility into
path-related I/O failures and can help users diagnose the cause of I/O
errors. This counter is also writable and so user may reset its value,
if needed.

This counter can also be consumed by monitoring tools such as nvme-top.

Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>

nvme: export I/O requeue count when no path is usable via sysfs

When the NVMe namespace head determines that there is no currently
available path to handle I/O (for example, while a controller is
resetting/connecting or due to a transient link failure), incoming
I/Os are added to the requeue list.

Currently, there is no visibility into how many I/Os have been requeued
in this situation. Add a new ns-head sysfs counter
io_requeue_no_usable_path_count, under diag attribute group to expose
the number of I/Os that were requeued due to the absence of an available
path. This counter is also writable thus allowing user to reset it, if
needed.

This statistic can help users understand I/O slowdowns or stalls caused
by temporary path unavailability, and can be consumed by monitoring
tools such as nvme-top for real-time observability.

Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>

nvme: export command error counters via sysfs

When an NVMe command completes with an error status, the driver
logs the error to the kernel log. However, these messages may be
lost or overwritten over time since dmesg is a circular buffer.

Expose per-path and ctrl sysfs attribute command_error_count, under
diag attribute group to provide persistent visibility into error
occurrences. This allows users to observe the total number of commands
that have failed on a given path over time, which can be useful for
diagnosing path health and stability.

This attribute is both readable and writable thus allowing user to reset
these counters. These counters can also be consumed by observability
tools such as nvme-top to provide additional insight into NVMe error
behavior.

Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>

nvme: export multipath failover count via sysfs

When an NVMe command completes with a path-specific error, the NVMe
driver may retry the command on an alternate controller or path if one
is available. These failover events indicate that I/O was redirected
away from the original path.

Currently, the number of times requests are failed over to another
available path is not visible to userspace. Exposing this information
can be useful for diagnosing path health and stability.

Export per-path sysfs attribute "multipath_failover_count" under diag
attribute group. This attribute is both readable and writable and thus
allowing user to reset the counter. This counter can be consumed by
monitoring tools such as nvme-top to help identify paths that
consistently trigger failovers under load.

Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>

nvme: export command retry count via sysfs

When Advanced Command Retry Enable (ACRE) is configured, a controller
may interrupt command execution and return a completion status
indicating command interrupted with the DNR bit cleared. In this case,
the driver retries the command based on the Command Retry Delay (CRD)
value provided in the completion status.

Currently, these command retries are handled entirely within the NVMe
driver and are not visible to userspace. As a result, there is no
observability into retry behavior, which can be a useful diagnostic
signal.

Expose a per-namespace sysfs attribute command_retries_count, under
diag attribute group to provide visibility into retry activity. This
information can help identify controller-side congestion under load
and enables comparison across paths in multipath setups (for example,
detecting cases where one path experiences significantly more retries
than another under identical workloads).

This exported metric is intended for diagnostics and monitoring tools
such as nvme-top, and does not change command retry behavior. A new
sysfs attribute named "command_retries_count" is added for this purpose.
This attribute is both readable as well as writable. So user could
reset this counter if needed.

Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>

nvme: add diag attribute group under sysfs

Add a new diag attribute group under:
/sys/class/nvme/<ctrl>/
/sys/block/<nvme-path-dev>/
/sys/block/<ns-head-dev>/

This new sysfs attribute group will be used to organize NVMe diagnostic
and telemetry-related counters under it.

Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>

cpu: Add lockdep_is_cpus_held()/lockdep_is_cpus_write_held() stubs for !CONFIG_HOTPLUG_CPU

lockdep_is_cpus_held() and lockdep_is_cpus_write_held() are undefined when
!CONFIG_HOTPLUG_CPU. This is ok because their few callers protect the calls
with a "if (IS_ENABLED(CONFIG_HOTPLUG_CPU) ..." check.

It is error prone to require callers to protect lockdep_is_cpus_held()
and lockdep_is_cpus_write_held() with an IS_ENABLED(CONFIG_HOTPLUG_CPU)
check while the custom for equivalent functions, for example the more
prevalent lockdep_is_held(), is to not require similar protection.
It is also inconsistent with CPU hotplug lockdep code self since related
call lockdep_assert_cpus_held() does not require protection.

Create stubs for lockdep_is_cpus_held() and lockdep_is_cpus_write_held()
that returns 1 (LOCK_STATE_UNKNOWN/LOCK_STATE_HELD) when !CONFIG_HOTPLUG_CPU.
This makes the CPU hotplug lockdep checks consistent while following
existing lockdep custom. Drop the "extern" from the function declaration
as part of the move to match kernel coding style.

Keep the IS_ENABLED(CONFIG_HOTPLUG_CPU) checks in existing users since
removing them would change the logic of these expressions.

Reported-by: Sashiko <sashiko-bot@kernel.org>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/7484f0b58fd86153d445819cc4e172adba16cff9.1780543665.git.reinette.chatre@intel.com
Closes: https://sashiko.dev/#/patchset/cover.1780456704.git.reinette.chatre%40intel.com?part=1

MAINTAINERS: Add include/linux/cpuhplock.h to CPU HOTPLUG area

The move of CPU hotplug function declarations to include/linux/cpuhplock.h
did not update the CPU HOTPLUG section of MAINTAINERS. Update it now.

Fixes: 195fb517ee25 ("cpu: Move CPU hotplug function declarations into their own header")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/ef78e9d61d0ae041a1f44bd3cf8bb6e733b294d6.1780543665.git.reinette.chatre@intel.com

rtla: Fix parsing of multi-character short options

A bug was reported where the parsing of multi-character short options,
be it a short option with an argument specified without space (e.g.
"-p100") or multiple short options in one argument (e.g. -un), ignores
options specific to individual tools.

Furthermore, if the rest of the option is supposed to be an argument, it
gets reinterpreted as a string of options. For example, -p100 gets
interpreted as -100, which is due to hackish implementation read as
--no-thread --no-irq --no-irq with timerlat hist, causing rtla to error
out:

$ rtla timerlat hist -p100
no-irq and no-thread set, there is nothing to do here

This behavior is caused by getopt_long() being called twice on each
argument, once in common_parse_options(), once in [tool]_parse_args():

- common_parse_options() calls getopt_long() with an array of options
common for all rtla tools, while suppressing errors (opterr = 0).
- If the option fails to parse, common_parse_options() returns 0.
- If 0 is returned from common_parse_options(), [tool]_parse_args()
calls getopt_long() again, with its own set of options.

* [tool] means one of {osnoise,timerlat}_{top,hist}

At least in glibc, getopt_long() increments its internal nextchar
variable even if the option is not recognized. That means that in the
case of "-p100", common_parse_options() sets nextchar pointing to '1',
and timerlat_hist_parse_args() sees '1', not 'p'; the same then repeats
for the first and second '0'.

As there is no way to restore the correct internal state of
getopt_long() reliably, fix the issue by merging the common options back
to the longopt array and option string of the [tool]_parse_args()
functions using a macro; only the switch part is left in the original
function, which is renamed to set_common_option().

Fixes: 850cd24cb6d6 ("tools/rtla: Add common_parse_options()")
Reported-by: John Kacur <jkacur@redhat.com>
Tested-by: John Kacur <jkacur@redhat.com>
Link: https://lore.kernel.org/r/20260602125506.3325345-1-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>

geneve: fix length used in GRO hint UDP checksum adjustment

In geneve_post_decap_hint the length used for adjusting the UDP checksum
should be 'skb->len - gro_hint->nested_tp_offset' (UDP length) instead
of 'skb->len - gro_hint->nested_nh_offset' (IP length).

Fixes: fd0dd796576e ("geneve: use GRO hint option in the RX path")
Cc: Paolo Abeni <pabeni@redhat.com>
Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://sashiko.dev/#/patchset/20260521131436.748832-1-jhs%40mojatatu.com
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260529144713.780938-1-atenart@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge patch series "Remove b_end_io from struct buffer_head"

Matthew Wilcox (Oracle) <willy@infradead.org> says:

There are four benefits to this patchset.  First, it removes an
indirect function call from the completion path.  Instead of setting
bio->bi_end_io to end_bio_bh_io_sync() which then calls bh->b_end_io(),
we set bio->bi_end_io to the appropriate completion handler, replacing
two indirect function calls with one.

Second, there is a slight security advantage to this.  It is one fewer
function pointer in the middle of a writable data structure that can
be corrupted.  Third, it shrinks struct buffer_head from 104 bytes to 96
bytes, allowing for appropriximately 7% reduction in the amount of memory
used by buffer_heads (or, alternatively, allows 7% more buffer_heads to
be cached in the same amount of memory).  Fourth, it removes some
atomic operations as the buffer refcount is no longer incremented before
calling the end_io handler.

I've run ext4 through its paces, and everything seems OK.  I've only
compiled ocfs2/gfs2/nilfs/md-bitmap.  Hopefully the maintainers can give
this series a try.  I'm sending the entire series to linux-fsdevel
and cc'ing the fs-specific mailing lists for the fs-specific patches.

* patches from https://patch.msgid.link/20260528173150.1093780-1-willy@infradead.org: (34 commits)
  buffer: Remove end_buffer_write_sync()
  buffer: Change calling convention for end_buffer_read_sync()
  buffer: Remove b_end_io
  buffer: Remove submit_bh()
  md-bitmap: Convert read_file_page and write_file_page to bh_submit()
  nilfs2: Convert nilfs_mdt_submit_block to bh_submit()
  nilfs2: Convert nilfs_gccache_submit_read_data to bh_submit()
  nilfs2: Convert nilfs_btnode_submit_block to bh_submit()
  buffer: Remove mark_buffer_async_write()
  gfs2: Convert gfs2_aspace_write_folio to bh_submit()
  gfs2: Remove use of b_end_io in gfs2_meta_read_endio()
  gfs2: Convert gfs2_dir_readahead to bh_submit()
  gfs2: Convert gfs2_metapath_ra to bh_submit()
  ocfs2: Convert ocfs2_write_super_or_backup to bh_submit()
  ocfs2: Convert ocfs2_read_blocks to bh_submit()
  ocfs2: Convert ocfs2_read_block to bh_submit()
  ocfs2: Convert ocfs2_write_block to bh_submit()
  jbd2: Convert jbd2_write_superblock() to bh_submit()
  jbd2: Convert journal commit to bh_submit()
  ext4: Convert ext4_commit_super() to bh_submit()
  ...

Link: https://patch.msgid.link/20260528173150.1093780-1-willy@infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>

buffer: Remove end_buffer_write_sync()

It has no callers left, so delete it. Inline __end_buffer_write_sync()
into bh_end_write().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-35-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

buffer: Change calling convention for end_buffer_read_sync()

Unify end_buffer_read_sync() and __end_buffer_read_notouch()
by requiring the caller put the refcount on the buffer. The only caller
is in the gfs2_meta_read() path, and there we can put the refcount
after locking the buffer.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-34-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

buffer: Remove b_end_io

This shrinks buffer_head by 8 bytes, letting us pack more buffer heads
per slab. With a Debian config, it shrinks from 104 bytes to 96 bytes
which is 42 objects per 4KiB page rather than 39, a 7% reduction in the
amount of memory used.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-33-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

buffer: Remove submit_bh()

No users are left; remove this API. Also remove/fix comments mentioning
it, and end_bio_bh_io_sync() as it's now unused.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-32-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

md-bitmap: Convert read_file_page and write_file_page to bh_submit()

Avoid an extra indirect function call by using bh_submit() instead of
submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-31-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: linux-raid@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

nilfs2: Convert nilfs_mdt_submit_block to bh_submit()

Avoid an extra indirect function call and changing the buffer refcount
by using bh_submit() instead of submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-30-willy@infradead.org
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: linux-nilfs@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

nilfs2: Convert nilfs_gccache_submit_read_data to bh_submit()

Avoid an extra indirect function call and changing the buffer refcount
by using bh_submit() instead of submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-29-willy@infradead.org
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: linux-nilfs@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

nilfs2: Convert nilfs_btnode_submit_block to bh_submit()

Avoid an extra indirect function call and changing the buffer refcount
by using bh_submit() instead of submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-28-willy@infradead.org
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: linux-nilfs@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

buffer: Remove mark_buffer_async_write()

There are no more callers of this function, so delete it.
end_buffer_async_write() then has only one caller left, so
inline it into bh_end_async_write().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-27-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

gfs2: Convert gfs2_aspace_write_folio to bh_submit()

Avoid an extra indirect function call by using bh_submit() instead of
submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-26-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: gfs2@lists.linux.dev
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

gfs2: Remove use of b_end_io in gfs2_meta_read_endio()

All buffer heads submitted by gfs2_submit_bhs() use
end_buffer_read_sync() so we can call it directly.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-25-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: gfs2@lists.linux.dev
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

gfs2: Convert gfs2_dir_readahead to bh_submit()

Avoid an extra indirect function call by using bh_submit() instead of
submit_bh(). Also simplify the control flow now that the buffer
refcount is not put by bh_end_read().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-24-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: gfs2@lists.linux.dev
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

gfs2: Convert gfs2_metapath_ra to bh_submit()

Avoid an extra indirect function call by using bh_submit() instead
of submit_bh(). Also simplify the control flow now that the buffer
refcount is not put by bh_end_read().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-23-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: gfs2@lists.linux.dev
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

ocfs2: Convert ocfs2_write_super_or_backup to bh_submit()

Avoid an extra indirect function call and changing the buffer refcount
by using bh_submit() instead of submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-22-willy@infradead.org
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: ocfs2-devel@lists.linux.dev
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

ocfs2: Convert ocfs2_read_blocks to bh_submit()

Avoid an extra indirect function call and changing the buffer refcount
by using bh_submit() instead of submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-21-willy@infradead.org
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: ocfs2-devel@lists.linux.dev
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

ocfs2: Convert ocfs2_read_block to bh_submit()

Avoid an extra indirect function call and changing the buffer refcount
by using bh_submit() instead of submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-20-willy@infradead.org
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: ocfs2-devel@lists.linux.dev
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

ocfs2: Convert ocfs2_write_block to bh_submit()

Avoid an extra indirect function call and changing the buffer
refcount by using bh_submit() instead of submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-19-willy@infradead.org
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: ocfs2-devel@lists.linux.dev
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

jbd2: Convert jbd2_write_superblock() to bh_submit()

Avoid an extra indirect function call and changing the buffer refcount
by using bh_submit() instead of submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-18-willy@infradead.org
Acked-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

jbd2: Convert journal commit to bh_submit()

Avoid an extra indirect function call by using bh_submit()
instead of submit_bh() in journal_submit_commit_record()
and jbd2_journal_commit_transaction(). These both use
journal_end_buffer_io_sync(), so it's more straightforward to do them
both at once.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-17-willy@infradead.org
Acked-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

ext4: Convert ext4_commit_super() to bh_submit()

Avoid an extra indirect function call and changing the buffer refcount
by using bh_submit() instead of submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-16-willy@infradead.org
Acked-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

ext4: Convert write_mmp_block_thawed() to bh_submit()

Avoid an extra indirect function call and changing the buffer refcount
by using bh_submit() instead of submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-15-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

ext4: Convert ext4_fc_submit_bh() to bh_submit()

Avoid an extra indirect function call by converting
ext4_end_buffer_io_sync() from bh_end_io_t to bio_end_io_t and
calling bh_submit().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-14-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

ext4; Convert __ext4_read_bh() to bh_submit()

Avoid an extra indirect function call and changing the buffer refcount
by converting ext4_end_bitmap_read() from bh_end_io_t to bio_end_io_t
and calling bh_submit().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-13-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

buffer: Convert __block_write_full_folio to __bh_submit()

Avoid an extra indirect function call by using __bh_submit() instead
of submit_bh_wbc(). Since there is only one caller of submit_bh_wbc()
left, inline it into submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-12-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

buffer: Convert block_read_full_folio to bh_submit()

Avoid an extra indirect function call by using bh_submit() instead of
submit_bh(). Since mark_buffer_async_read() would collapse to a single
function call, inline it into block_read_full_folio() along with its
extensive comment. Convert end_buffer_async_read_io() to
bh_end_async_read().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-11-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

buffer: Convert __bh_read_batch to bh_submit()

Avoid an extra indirect function call and changing the buffer refcount
by using bh_submit() instead of submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-10-willy@infradead.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

buffer: Convert __bh_read to bh_submit()

Avoid an extra indirect function call and changing the buffer refcount
by using bh_submit() instead of submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-9-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

buffer: Convert __sync_dirty_buffer to bh_submit()

Avoid an extra indirect function call and changing the buffer refcount
by using bh_submit() instead of submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-8-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

buffer: Convert __bread_slow to bh_submit()

Avoid an extra indirect function call and changing the buffer refcount
by using bh_submit() instead of submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-7-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

buffer: Convert write_dirty_buffer to bh_submit()

Avoid an extra indirect function call and changing the buffer refcount
by using bh_submit() instead of submit_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-6-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

buffer: Add bh_end_read(), bh_end_write() and bh_end_async_write()

These are the bio_end_io_t versions of end_buffer_read_sync(),
end_buffer_write_sync() and end_buffer_async_write(). They do not
contain a put_bh() call as it is no longer necessary.

Also add the helper function bio_endio_bh().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-5-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

buffer: Remove mark_buffer_async_write_endio()

All callers of mark_buffer_async_write_endio() pass
end_buffer_async_write, so we can inline mark_buffer_async_write_endio()
into mark_buffer_async_write() and just call that instead.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-4-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

buffer: Add bh_submit()

bh_submit() takes a bio_end_io allowing users to avoid the indirect
function call through bh->b_end_io, and eventually allowing us to remove
bh->b_end_io.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-3-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

buffer: Remove forward declaration of submit_bh_wbc()

Rearrange functions to avoid this forward declaration.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20260528173150.1093780-2-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>