]> git.ipfire.org Git - thirdparty/linux.git/log
thirdparty/linux.git
2 weeks agortla/tests: Add unit tests for _parse_args() functions
Tomas Glozar [Thu, 28 May 2026 10:32:53 +0000 (12:32 +0200)] 
rtla/tests: Add unit tests for _parse_args() functions

Add a test suite for the _parse_args() function of each tool that checks
the params structures (struct common_params, struct osnoise_params,
struct timerlat_params) returned by them for correctness.

One test case is added per option, as well as a few special cases for
tricky combinations of options. Test cases are ordered the same as the
option arrays and help message to allow easy checking of whether all
options are covered.

This should help clarify what the proper command line behavior of RTLA
is in case there are holes in the documentation and verify that the
intended behavior is implemented correctly.

A few necessary changes to the unit tests were done as part of this
commit:

- Unit tests now also link to libsubcmd and its dependencies.
- A new global variable in_unit_test is added to RTLA's CLI interface,
  causing it to skip check for root if running in unit tests. This
  allows the CLI unit tests to run as non-root, like existing unit
  tests.

There is quite a lot of duplication, some of it is mitigated with macros,
but partially it is intentional so that future changes in behavior are
tracked across tools.

Link: https://lore.kernel.org/r/20260528103254.2990068-6-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2 weeks agortla: Parse cmdline using libsubcmd
Tomas Glozar [Thu, 28 May 2026 10:32:52 +0000 (12:32 +0200)] 
rtla: Parse cmdline using libsubcmd

Instead of using getopt_long() directly to parse the command line
arguments given to an RTLA tool, use libsubcmd's parse_options().

Utilizing libsubcmd for parsing command line arguments has several
benefits:

- A help message is automatically generated by libsubcmd from the
  specification, removing the need of writing it by hand.
- Options are sorted into groups based on which part of tracing (CPU,
  thread, auto-analysis, tuning, histogram) they relate to.
- Common parsing patterns for numerical and boolean values now share
  code, with the target variable being stored in the option array.

To avoid duplication of the option parsing logic, RTLA-specific
macros defining struct option values are created:

- RTLA_OPT_* for options common to all tools
- OSNOISE_OPT_* and TIMERLAT_OPT_* for options specific to
  osnoise/timerlat tools
- HIST_OPT_* macros for options specific to histogram-based tools.

Individual *_parse_args() functions then construct an array out of
these macros that is then passed to libsubcmd's parse_options().

All code specific to command line options parsing is moved out of the
individual tool files into a new file, cli.c, which also contains the
contents of the rtla.c file. A private header, cli_p.h, is added
alongside the public header cli.h, so that unit tests are able to test
statically declared option callbacks.

Minor changes:

- The return value of tool-level help option changes to 129, as this is
  the value set by libsubcmd; this is reflected in affected test cases.
  The implementation of help for command-level and tracer-level help
  is set to 129 as well for consistency, and the change is reflected in
  exit value documentation.
- Related to the above, {rtla,osnoise,timerlat}_usage() are marked
  __noreturn and exit() is removed from after they are called for
  cleaner code.
- The error messages for invalid argument for options --dma-latency and
  -E/--entries were corrected, fixing off-by-one in the limits.

Note that unsetting options (using --no-<opt> syntax) is currently not
implemented for options that use custom callbacks. For --irq and
--thread, it will never be implemented, as they conflict with already
existing --no-irq and --no-thread with a different meaning.

Assisted-by: Composer:composer-1.5
Link: https://lore.kernel.org/r/20260528103254.2990068-5-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2 weeks agotools subcmd: allow parsing distinct --opt and --no-opt
Tomas Glozar [Thu, 28 May 2026 10:32:51 +0000 (12:32 +0200)] 
tools subcmd: allow parsing distinct --opt and --no-opt

libsubcmd automatically generates for every option --opt an equivalent
negated option, --no-opt, to unset the option. Vice versa, for every
option declared as --no-opt, a shorthand --opt is declared for
convenience.

Add a flag, PARSE_OPT_NOAUTONEG, to disable this behavior. This new flag
behaves similarly to the already existing PARSE_OPT_NONEG, only it does
not reject the --no-opt variant, but leaves it undefined. That is useful
when there is a conflicting distinct --no-opt option in the syntax of
the tool.

PARSE_OPT_NOAUTONEG is enabled per-option, allowing to unset other
options that do not have this conflict.

Link: https://lore.kernel.org/r/20260528103254.2990068-4-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2 weeks agotools subcmd: support optarg as separate argument
Tomas Glozar [Thu, 28 May 2026 10:32:50 +0000 (12:32 +0200)] 
tools subcmd: support optarg as separate argument

In addition to "-ovalue" and "--opt=value" syntax, allow also "-o value"
and "--opt value" for options with optional argument when the newly
added PARSE_OPT_OPTARG_ALLOW_NEXT flag is set.

This behavior is turned off by default since it does not make sense for
tools using non-option command line arguments. Consider the ambiguity
of "cmd -d x", where "-d x" can mean either "-d with argument of x" or
"-d without argument, followed by non-option argument x". This is not an
issue in the case that the tool takes no non-option arguments.

To implement this, a new local variable, force_defval, is created in
get_value(), along with a comment explaining the logic.

Link: https://lore.kernel.org/r/20260528103254.2990068-3-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2 weeks agortla: Add libsubcmd dependency
Tomas Glozar [Thu, 28 May 2026 10:32:49 +0000 (12:32 +0200)] 
rtla: Add libsubcmd dependency

In preparation for migrating RTLA to libsubcmd, build libsubcmd from the
appropriate directory next to the RTLA build proper, and link the
resulting object to RTLA.

libsubcmd uses str_error_r() and strlcpy() at several places. To support
these, also link the respective libraries from tools/lib.

For completeness, also add tools/include to include path. This will
allow other userspace functions and macros shipped with the kernel to be
used in RTLA; perf and bpftool, two other users of libsubcmd, already do
that.

To prevent a name conflict, rename RTLA's run_command() function to
run_tool_command(), and replace RTLA's own container_of implementation
with the one in tools/include/linux/container_of.h.

Assisted-by: Composer:composer-1
Link: https://lore.kernel.org/r/20260528103254.2990068-2-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2 weeks agortla/tests: Add runtime tests for restoring continue flag
Tomas Glozar [Tue, 26 May 2026 10:25:23 +0000 (12:25 +0200)] 
rtla/tests: Add runtime tests for restoring continue flag

In case an action preceding the continue action fails, not only
the continue flag should not be set, it should be unset if it was set
from a previous run of actions_perform().

Add a runtime test to both osnoise and timerlat tools that checks that
this works properly by creating a temporary file.

Link: https://lore.kernel.org/r/20260526102523.2662391-4-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2 weeks agortla/tests: Run runtime tests in temporary directory
Tomas Glozar [Tue, 26 May 2026 10:25:22 +0000 (12:25 +0200)] 
rtla/tests: Run runtime tests in temporary directory

Create a temporary directory before each test case to serve as working
directory during the duration of the test.

This prevents littering of the original working directory as well as
allows tests to use it to avoid path conflicts.

In order not to break already existing tests, also add a new "testdir"
variable containing the directory where the test file is located. This
is then used to locate artifacts used during testing like BPF programs
and scripts for checking the tracer threads.

Link: https://lore.kernel.org/r/20260526102523.2662391-3-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2 weeks agortla/tests: Add unit test for restoring continue flag
Tomas Glozar [Tue, 26 May 2026 10:25:21 +0000 (12:25 +0200)] 
rtla/tests: Add unit test for restoring continue flag

In case an action preceding the continue action fails, not only
the continue flag should not be set, it should be unset if it was set
from a previous run of actions_perform().

Add a unit test to check if this is implemented correctly.

Link: https://lore.kernel.org/r/20260526102523.2662391-2-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2 weeks agortla/actions: Restore continue flag in actions_perform()
Tomas Glozar [Tue, 26 May 2026 10:25:20 +0000 (12:25 +0200)] 
rtla/actions: Restore continue flag in actions_perform()

Currently, actions_perform() only ever sets the continue flag (when
performing the continue action), but never resets it. That leads to
RTLA continuing tracing even if the continue action was not performed in
the current iteration.

For example, the following command:

$ rtla timerlat hist -T 100 --on-threshold shell,command='
    echo Spike!
    if [ -f /tmp/a ]
    then
      exit 1
    else
      touch /tmp/a
    fi' --on-threshold continue

should print Spike! at most once, because after hitting the threshold
for the first time, /tmp/a exists, the shell action will fail, and the
continue action is not performed. However, unless /tmp/a exists before
the measurement, it will print Spike! until stopped, as the continue
flag stays set.

Set the continue flag to false in the beginning of actions_perform() to
make RTLA continue only if the action was actually performed.

Fixes: 8d933d5c89e8 ("rtla/timerlat: Add continue action")
Link: https://lore.kernel.org/r/20260526102523.2662391-1-tglozar@redhat.com
[ correct Fixes tag to include 12 characters of hash ]
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2 weeks agonet: hibmcge: move dma_rmb() after dma_sync_single_for_cpu() in RX path
Jijie Shao [Mon, 25 May 2026 14:45:25 +0000 (22:45 +0800)] 
net: hibmcge: move dma_rmb() after dma_sync_single_for_cpu() in RX path

The dma_rmb() barrier was placed before dma_sync_single_for_cpu(), which
is incorrect. DMA sync must complete first to make the buffer accessible
to the CPU, then the rmb barrier ensures subsequent descriptor reads
observe the latest data written by the hardware.

Reorder the operations so dma_sync_single_for_cpu() is called before
dma_rmb() to guarantee the driver reads consistent data from the DMA
buffer.

Fixes: f72e25594061 ("net: hibmcge: Implement rx_poll function to receive packets")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20260525144525.94884-3-shaojijie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 weeks agonet: hibmcge: disable Relaxed Ordering to fix RX packet corruption
Jijie Shao [Mon, 25 May 2026 14:45:24 +0000 (22:45 +0800)] 
net: hibmcge: disable Relaxed Ordering to fix RX packet corruption

When SMMU is disabled, the hibmcge driver may receive corrupted packets.
The hardware writes packet data and descriptors to the same page, but
with Relaxed Ordering enabled, PCI write transactions may not be
strictly ordered. This can cause the driver to observe a valid
descriptor before the corresponding packet data is fully written.

Fix this by clearing PCI_EXP_DEVCTL_RELAX_EN in the PCI bridge control
register to ensure strict write ordering between packet data and
descriptors.

Fixes: f72e25594061 ("net: hibmcge: Implement rx_poll function to receive packets")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20260525144525.94884-2-shaojijie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 weeks agoMerge branch 'net-sched-fix-packet-loops-in-mirred-and-netem'
Paolo Abeni [Thu, 28 May 2026 10:26:38 +0000 (12:26 +0200)] 
Merge branch 'net-sched-fix-packet-loops-in-mirred-and-netem'

Jamal Hadi Salim says:

====================
net/sched: Fix packet loops in mirred and netem

This patchset adds a 2-bit per-skb tc_depth counter that travels with
the packet. The existing per-CPU mirred nest tracking loses state
when a packet is deferred through the backlog or moves between CPUs
via XPS/RPS. A per-skb field covers both cases.

Patch 1 adds the tc_depth field in a padding hole in sk_buff.
Patches 2-3 revert the check_netem_in_tree() fix and its tests,
which broke legitimate multi-netem configurations.
Patch 4 uses tc_depth to stop netem duplicate recursion.
Patch 5 uses tc_depth to catch mirred ingress redirect loops.
Patch 6 fixes the infinite loop in the mirred egress blockcast case.
Patch 7 fixes drop stats in early return error scenarios in tcf_mirred_act
for redirect (caught by Sashiko [1]).
Patches 8-9 add mirred and netem test cases.

[1] https://sashiko.dev/#/patchset/20260413082027.2244884-1-hxzene%40gmail.com
====================

Link: https://patch.msgid.link/20260525122556.973584-1-jhs@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 weeks agoselftests/tc-testing: Add netem test case exercising loops
Victor Nogueira [Mon, 25 May 2026 12:25:56 +0000 (08:25 -0400)] 
selftests/tc-testing: Add netem test case exercising loops

Add a netem nested duplicate test case to validate that it won't
cause an infinite loop

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20260525122556.973584-10-jhs@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 weeks agoselftests/tc-testing: Add mirred test cases exercising loops
Victor Nogueira [Mon, 25 May 2026 12:25:55 +0000 (08:25 -0400)] 
selftests/tc-testing: Add mirred test cases exercising loops

Add mirred loop test cases to validate that those will be caught and other
test cases that were previously misinterpreted as loops by mirred.

This commit adds 12 test cases:

- Redirect multiport: dummy egress -> dev1 ingress -> dummy egress (Loop)
- Redirect singleport: dev1 ingress -> dev1 egress -> dev1 ingress (Loop)
- Redirect multiport: dev1 ingress -> dummy ingress -> dev1 egress (No Loop)
- Redirect multiport: dev1 ingress -> dummy ingress -> dev1 ingress (Loop)
- Redirect multiport: dev1 ingress -> dummy egress -> dev1 ingress (Loop)
- Redirect multiport: dummy egress -> dev1 ingress -> dummy egress, different prios (Loop)
- Redirect multiport: dev1 ingress -> dummy ingress -> dummy egress -> dev1 egress (No Loop)
- Redirect multiport: dev1 ingress -> dummy egress -> dev1 egress (No Loop)
- Redirect multiport: dev1 ingress -> dummy egress -> dummy ingress (No Loop)
- Redirect singleport: dev1 ingress -> dev1 ingress (Loop)
- Redirect singleport: dummy egress -> dummy ingress (No Loop)
- Redirect multiport: dev1 ingress -> dummy ingress -> dummy egress (No Loop)

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20260525122556.973584-9-jhs@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 weeks agonet/sched: act_mirred: Fix return code in early mirred redirect error paths
Victor Nogueira [Mon, 25 May 2026 12:25:54 +0000 (08:25 -0400)] 
net/sched: act_mirred: Fix return code in early mirred redirect error paths

Since retval is set as TC_ACT_STOLEN in the mirred redirect case, returning
retval in cases where redirect failed will make the callers not register
the skb as being dropped.

Fix this by returning TC_ACT_SHOT instead in such scenarios.

Fixes: 16085e48cb48 ("net/sched: act_mirred: Create function tcf_mirred_to_dev and improve readability")
Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://sashiko.dev/#/patchset/20260413082027.2244884-1-hxzene%40gmail.com
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20260525122556.973584-8-jhs@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 weeks agonet/sched: act_mirred: Fix blockcast recursion bypass leading to stack overflow
Kito Xu (veritas501) [Mon, 25 May 2026 12:25:53 +0000 (08:25 -0400)] 
net/sched: act_mirred: Fix blockcast recursion bypass leading to stack overflow

tcf_mirred_act() checks sched_mirred_nest against MIRRED_NEST_LIMIT (4)
to prevent deep recursion.  However, when the action uses blockcast
(tcfm_blockid != 0), the function returns at the tcf_blockcast() call
BEFORE reaching the counter increment.  As a result, the recursion
counter never advances and the limit check is entirely bypassed.

When two devices share a TC egress block with a mirred blockcast rule,
a packet egressing on device A is mirrored to device B via blockcast;
device B's egress TC re-enters tcf_mirred_act() via blockcast and
mirrors back to A, creating an unbounded recursion loop:

  tcf_mirred_act -> tcf_blockcast -> tcf_mirred_to_dev -> dev_queue_xmit
  -> sch_handle_egress -> tcf_classify -> tcf_mirred_act -> (repeat)

This recursion continues until the kernel stack overflows.

The bug is reachable from an unprivileged user via
unshare(CLONE_NEWUSER | CLONE_NEWNET): user namespaces grant
CAP_NET_ADMIN in the new network namespace, which is sufficient to
create dummy devices, attach clsact qdiscs with shared blocks, and
install mirred blockcast filters.

 BUG: TASK stack guard page was hit at ffffc90000b7fff8
 Oops: stack guard page: 0000 [#1] SMP KASAN NOPTI
 CPU: 2 UID: 1000 PID: 169 Comm: poc Not tainted 7.0.0-rc7-next-20260410
 RIP: 0010:xas_find+0x17/0x480
 Call Trace:
  xa_find+0x17b/0x1d0
  tcf_mirred_act+0x640/0x1060
  tcf_action_exec+0x400/0x530
  basic_classify+0x128/0x1d0
  tcf_classify+0xd83/0x1150
  tc_run+0x328/0x620
  __dev_queue_xmit+0x797/0x3100
  tcf_mirred_to_dev+0x7b1/0xf70
  tcf_mirred_act+0x68a/0x1060
  [repeating ~30+ times until stack overflow]
 Kernel panic - not syncing: Fatal exception in interrupt

Fix this by incrementing sched_mirred_nest before calling
tcf_blockcast() and decrementing it on return, mirroring the
non-blockcast path.  This ensures subsequent recursive entries see the
updated counter and are correctly limited by MIRRED_NEST_LIMIT.

Fixes: fe946a751d9b ("net/sched: act_mirred: add loop detection")
Signed-off-by: Kito Xu (veritas501) <hxzene@gmail.com>
Link: https://patch.msgid.link/20260525122556.973584-7-jhs@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 weeks agonet/sched: Fix ethx:ingress -> ethy:egress -> ethx:ingress mirred loop
Jamal Hadi Salim [Mon, 25 May 2026 12:25:52 +0000 (08:25 -0400)] 
net/sched: Fix ethx:ingress -> ethy:egress -> ethx:ingress mirred loop

When mirred redirects to ingress (from either ingress or egress) the loop
state from sched_mirred_dev array dev is lost because of 1) the packet
deferral into the backlog and 2) the fact the sched_mirred_dev array is
cleared. In such cases, if there was a loop we won't discover it.

Here's a simple test to reproduce:
ip a add dev port0 10.10.10.11/24

tc qdisc add dev port0 clsact
tc filter add dev port0 egress protocol ip \
   prio 10 matchall action mirred ingress redirect dev port1

tc qdisc add dev port1 clsact
tc filter add dev port1 ingress protocol ip \
   prio 10 matchall action mirred egress redirect dev port0

ping -c 1 -W0.01 10.10.10.10

Fixes: fe946a751d9b ("net/sched: act_mirred: add loop detection")
Tested-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20260525122556.973584-6-jhs@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 weeks agonet/sched: fix packet loop on netem when duplicate is on
Jamal Hadi Salim [Mon, 25 May 2026 12:25:51 +0000 (08:25 -0400)] 
net/sched: fix packet loop on netem when duplicate is on

When netem duplicates a packet it re-enqueues the copy at the root qdisc.
If another netem sits in the tree the copy can be duplicated
again, recursing until the stack or memory is exhausted.

The original duplication guard temporarily zeroed q->duplicate around
the re-enqueue, but that does not cover all cases because it is
per-qdisc state shared across all concurrent enqueue paths
and is not safe without additional locking.

Use the skb tc_depth field introduced in an earlier patch:
 - increment it on the duplicate before re-enqueue
 - skip duplication for any skb whose tc_depth is already non-zero.

This marks the packet itself rather than mutating qdisc state,
therefore it is safe regardless of tree topology or concurrency.

Fixes: 0afb51e72855 ("[PKT_SCHED]: netem: reinsert for duplication")
Reported-by: William Liu <will@willsroot.io>
Reported-by: Savino Dicanosa <savy@syst3mfailure.io>
Closes: https://lore.kernel.org/netdev/8DuRWwfqjoRDLDmBMlIfbrsZg9Gx50DHJc1ilxsEBNe2D6NMoigR_eIRIG0LOjMc3r10nUUZtArXx4oZBIdUfZQrwjcQhdinnMis_0G7VEk=@willsroot.io/
Co-developed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: William Liu <will@willsroot.io>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20260525122556.973584-5-jhs@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 weeks agoRevert "selftests/tc-testing: Add tests for restrictions on netem duplication"
Jamal Hadi Salim [Mon, 25 May 2026 12:25:50 +0000 (08:25 -0400)] 
Revert "selftests/tc-testing: Add tests for restrictions on netem duplication"

This reverts commit ecdec65ec78d67d3ebd17edc88b88312054abe0d.

The tests added were related to check_netem_in_tree() which was
just reverted in the previous patch.

Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20260525122556.973584-4-jhs@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 weeks agonet/sched: Revert "net/sched: Restrict conditions for adding duplicating netems to...
Jamal Hadi Salim [Mon, 25 May 2026 12:25:49 +0000 (08:25 -0400)] 
net/sched: Revert "net/sched: Restrict conditions for adding duplicating netems to qdisc tree"

This reverts commit ec8e0e3d7adef940cdf9475e2352c0680189d14e.

The original patch rejects any tree containing two netems when
either has duplication set, even when they sit on unrelated classes
of the same classful parent. That broke configurations that have
worked since netem was introduced.

The re-entrancy problem the original commit was trying to solve is
handled by later patch using tc_depth flag.

Doing this revert will (re)expose the original bug with multiple
netem duplication. When this patch is backported make sure
and get the full series.

Fixes: ec8e0e3d7ade ("net/sched: Restrict conditions for adding duplicating netems to qdisc tree")
Reported-by: Ji-Soo Chung <jschung2@proton.me>
Reported-by: Gerlinde <lrGerlinde@mailfence.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220774
Reported-by: zyc zyc <zyc199902@zohomail.cn>
Closes: https://lore.kernel.org/all/19adda5a1e2.12410b78222774.9191120410578703463@zohomail.cn/
Reported-by: Manas Ghandat <ghandatmanas@gmail.com>
Closes: https://lore.kernel.org/netdev/f69b2c8f-8325-4c2e-a011-6dbc089f30e4@gmail.com/
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20260525122556.973584-3-jhs@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 weeks agonet: Introduce skb tc depth field to track packet loops
Jamal Hadi Salim [Mon, 25 May 2026 12:25:48 +0000 (08:25 -0400)] 
net: Introduce skb tc depth field to track packet loops

Add a 2-bit per-skb tc depth field to track packet loops across the stack.

The previous per-CPU loop counters like MIRRED_NEST_LIMIT
assume a single call stack and lose state in two cases:
1) When a packet is queued and reprocessed later (e.g., egress->ingress
   via backlog), the per-cpu state is gone by the time it is dequeued.
2) With XPS/RPS a packet may arrive on one CPU and be processed on
   another.

A per-skb field solves both by travelling with the packet itself.

The field fits in existing padding, using 2 bits that were previously a
hole:

pahole before(-) and after (+) diff looks like:
   __u8       slow_gro:1;           /*   132: 3  1 */
   __u8       csum_not_inet:1;      /*   132: 4  1 */
   __u8       unreadable:1;         /*   132: 5  1 */
 + __u8       tc_depth:2;           /*   132: 6  1 */

 - /* XXX 2 bits hole, try to pack */
   /* XXX 1 byte hole, try to pack */

   __u16      tc_index;             /*   134     2 */

There used to be a ttl field which was removed as part of tc_verd in commit
aec745e2c520 ("net-tc: remove unused tc_verd fields").  It was already
unused by that time, due to remove earlier in commit c19ae86a510c ("tc: remove
unused redirect ttl").

The first user of this field is netem, which increments tc_depth on
duplicated packets before re-enqueueing them at the root qdisc.  On
re-entry, netem skips duplication for any skb with tc_depth already set,
bounding recursion to a single level regardless of tree topology.

The other user is mirred which increments it on each pass
and limits to depth to MIRRED_DEFER_LIMIT (3).

The new field was called ttl in earlier versions of this patch
but renamed to tc_depth to avoid confusion with IP ttl.

Note (looking at you Sashiko! Dont ignore me and continue bringing this up):
1. Since both mirred and netem utilize the same 2-bit tc_depth field it is
   possible when netem and mirred are used together that netem qdisc to skip
   the duplication step. This is a known trade-off, as a 2-bit field cannot
   independently track both features' recursion depths and it is not considered
   sane to have a setup that addresses both features on at the same time.

2. skb_scrub_packet does not clear tc_depth. This means a packet's loop history
  is preserved even across namespaces. While this might be restrictive for
  some topologies, it is also design intent to provide robustness against loops
  across namespaces.

Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20260525122556.973584-2-jhs@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 weeks agoMerge branch 'locking/context' into locking/core
Peter Zijlstra [Thu, 28 May 2026 10:23:39 +0000 (12:23 +0200)] 
Merge branch 'locking/context' into locking/core

2 weeks agoseqlock: Allow UBSAN_ALIGNMENT to fail optimizing
Heiko Carstens [Tue, 19 May 2026 11:03:15 +0000 (13:03 +0200)] 
seqlock: Allow UBSAN_ALIGNMENT to fail optimizing

With gcc-15 and gcc-16 with UBSAN_ALIGNMENT enabled the compiler fails to
inline and optimize __scoped_seqlock_bug() away on s390:

s390x-16.1.0-ld: kernel/sched/build_policy.o: in function `__scoped_seqlock_next':
/.../seqlock.h:1286:(.text+0x22030): undefined reference to `__scoped_seqlock_bug'

Fix this by adding UBSAN_ALIGNMENT to the list of config options where a
not inlined empty __scoped_seqlock_bug() is allowed.

Closes: https://lore.kernel.org/r/20260515092057.810542-1-arnd@kernel.org/
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260519110315.1385307-1-hca@linux.ibm.com
2 weeks agocompiler-context-analysis: Bump required Clang version to 23
Marco Elver [Fri, 15 May 2026 12:43:31 +0000 (14:43 +0200)] 
compiler-context-analysis: Bump required Clang version to 23

Clang 23 introduces several major improvements:

1. Support for multiple arguments in the `guarded_by` and
   `pt_guarded_by` attributes [1]. This allows defining variables
   protected by multiple context locks, where read access requires
   holding at least one lock (shared or exclusive), and write access
   requires holding all of them exclusively.

2. Function pointer support [2]. We can now add attributes to function
   pointers just like we do on normal functions.

3. A fix to use arrays of locks [3]. Each index is now correctly treated
   as a separate lock instance.

4. A fix for implicit member access in attributes [4]. This allows to
   use __guarded_by(&foo->lock) correctly.

Overall that makes it worthwhile bumping the compiler version instead of
trying to make both Clang 22 and later work while supporting these new
features.

Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://github.com/llvm/llvm-project/pull/186838
Link: https://github.com/llvm/llvm-project/pull/191187
Link: https://github.com/llvm/llvm-project/pull/148551
Link: https://github.com/llvm/llvm-project/pull/194457
Link: https://patch.msgid.link/20260515124426.2227783-1-elver@google.com
2 weeks agos390/bug: Always emit format word in __BUG_ENTRY
Jan Polensky [Thu, 21 May 2026 12:01:32 +0000 (14:01 +0200)] 
s390/bug: Always emit format word in __BUG_ENTRY

When CONFIG_DEBUG_BUGVERBOSE is disabled, the s390 __BUG_ENTRY() macro
omits the format string pointer, so the generated __bug_table entry no
longer matches struct bug_entry.

With HAVE_ARCH_BUG_FORMAT enabled, the generic BUG infrastructure reads
bug_entry::format via bug_get_format(). If the format word is missing,
subsequent fields are read from the wrong offset, which may:
- Misinterpret flags (BUG vs WARN classification errors)
- Fault when dereferencing a misread format pointer

The root cause is that __BUG_ENTRY() delegates format word emission to
__BUG_ENTRY_VERBOSE(), which is conditional on CONFIG_DEBUG_BUGVERBOSE.

Fix this by moving the format field emission directly into __BUG_ENTRY()
so it is always emitted unconditionally. Remove the format parameter from
__BUG_ENTRY_VERBOSE() and keep only file/line emission conditional on
CONFIG_DEBUG_BUGVERBOSE.

Fixes: 2b71b8ab9718 ("s390/bug: Use BUG_FORMAT for DEBUG_BUGVERBOSE_DETAILED")
Signed-off-by: Jan Polensky <japo@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2 weeks agomm/slub: fix typo in sheaves comment
Wilson Zeng [Sat, 16 May 2026 16:40:33 +0000 (00:40 +0800)] 
mm/slub: fix typo in sheaves comment

Fix a typo in the comment describing oversize sheaves handling:
"area" should be "are".

Signed-off-by: Wilson Zeng <cheng20011202@gmail.com>
Acked-by: Harry Yoo (Oracle) <harry@kernel.org>
Link: https://patch.msgid.link/20260516164033.1566208-1-cheng20011202@gmail.com
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
2 weeks agomm, slab: simplify returning slab in __refill_objects_node()
Vlastimil Babka (SUSE) [Fri, 22 May 2026 14:23:21 +0000 (16:23 +0200)] 
mm, slab: simplify returning slab in __refill_objects_node()

When we return slabs to the partial list because we didn't fully refill
from them, we observe the min_partial limit when the returned slab is
empty, and discard it when over the limit. But it's unlikely for the
limit to be reached while we were refilling, and the worst outcome is to
have temporarily more free slabs on the list than necessary. So just
drop that code and simplify the function.

Link: https://patch.msgid.link/20260522-b4-refill-optimistic-return-v3-2-2ba78ec1c6ed@kernel.org
Reviewed-by: Hao Li <hao.li@linux.dev>
Tested-by: Hao Li <hao.li@linux.dev>
Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org>
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
2 weeks agomm, slab: add an optimistic __slab_try_return_freelist()
Vlastimil Babka (SUSE) [Fri, 22 May 2026 14:23:20 +0000 (16:23 +0200)] 
mm, slab: add an optimistic __slab_try_return_freelist()

When we end up returning extraneous objects during refill to a slab
where we just did a get_freelist_nofreeze(), it is likely no other CPU
has freed objects to it meanwhile. We can then reattach the remainder of
the freelist without having to walk the (potentially cache cold)
freelist for finding its tail to connect slab->freelist to it.

Add a __slab_try_return_freelist() function that does that. As suggested
by Hao Li, it doesn't need to also return the slab to the partial list,
because there's code in __refill_objects_node() that already does that
for any slabs where we don't detach the freelist in the first place. So
we just put the slab back to the pc.slabs list. It's no longer likely
that the list will be empty now, so remove the unlikely() annotation.

However, also change that code to add to the tail of the partial list
instead of head to match what __slab_free() did and avoid a regression,
that was reported for the earlier version by the kernel test robot [1].
This change will also affect slabs which were grabbed from the partial
list and not refilled from even partially, but those should be much more
rare than a partial refill.

[1] https://lore.kernel.org/all/202605112204.9382cecf-lkp@intel.com/

Reviewed-by: Hao Li <hao.li@linux.dev>
Tested-by: Hao Li <hao.li@linux.dev>
Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org>
Link: https://patch.msgid.link/20260522-b4-refill-optimistic-return-v3-1-2ba78ec1c6ed@kernel.org
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
2 weeks agox86/kvm/vmx: Fix x86_64 CFI build
Peter Zijlstra [Tue, 26 May 2026 09:06:31 +0000 (11:06 +0200)] 
x86/kvm/vmx: Fix x86_64 CFI build

It was missed that idt_do_interrupt_irqoff() gets compiled on x84_64;
this is a problem for CFI builds because it includes an unadorned
indirect call. It is however completely dead code.

Rework things to not emit this function at all.

Fixes: 0701c9e17bd9 ("x86/kvm/vmx: Move IRQ/NMI dispatch from KVM into x86 core")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Reported-by: Calvin Owens <calvin@wbinvd.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20260526090631.GA4149641@noisy.programming.kicks-ass.net
2 weeks agorust: error: replace match + panic in const context with const expect
Daniel del Castillo [Wed, 8 Apr 2026 20:09:39 +0000 (22:09 +0200)] 
rust: error: replace match + panic in const context with const expect

This patch replaces an instance of match + panic with const expect,
which is now usable in const contexts after the MSRV was updated to
1.85.0 (it was available since Rust 1.83.0).

Suggested-by: Gary Guo <gary@garyguo.net>
Link: https://github.com/Rust-for-Linux/linux/issues/1229
Signed-off-by: Daniel del Castillo <delcastillodelarosadaniel@gmail.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20260408200949.99059-1-delcastillodelarosadaniel@gmail.com
[ Adjusted Git author's name with the Signed-off-by value. Reworded
  slightly and removed duplicated word. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2 weeks agogpio: realtek-otto: fix kernel-doc warnings
Rosen Penev [Thu, 28 May 2026 04:10:31 +0000 (21:10 -0700)] 
gpio: realtek-otto: fix kernel-doc warnings

Add the missing 'struct' keyword in the kernel-doc comment for
realtek_gpio_ctrl, and document the @cpumask_base and @cpu_irq_maskable
members that were added later but never described. Also fix the
mismatch between documented @imr_line_pos and the actual member name
line_imr_pos.

Fixes W=1 warning:

Warning: drivers/gpio/gpio-realtek-otto.c:66 cannot understand function prototype: 'struct realtek_gpio_ctrl'

Assisted-by: Opencode:BigPickle
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20260528041031.728557-1-rosenp@gmail.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
2 weeks agogpio: max77620: Unify usage of space and comma in platform_device_id array
Uwe Kleine-König (The Capable Hub) [Wed, 27 May 2026 14:57:29 +0000 (16:57 +0200)] 
gpio: max77620: Unify usage of space and comma in platform_device_id array

The most accepted style for the array terminator is to use a single
space between the curly braces and no trailing comma.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Link: https://patch.msgid.link/985c86e80f35a944a4712f0c2ac8dd795868cdfb.1779893336.git.u.kleine-koenig@baylibre.com
[Bartosz: Fixed Uwe's S-B]
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
2 weeks agogpio: Use named initializers for platform_device_id arrays
Uwe Kleine-König (The Capable Hub) [Wed, 27 May 2026 14:57:28 +0000 (16:57 +0200)] 
gpio: Use named initializers for platform_device_id arrays

Named initializers are better readable and more robust to changes of the
struct definition. This robustness is relevant for a planned change to
struct platform_device_id replacing .driver_data by an anonymous unit.

While touching these arrays unify spacing and usage of commas.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Link: https://patch.msgid.link/b8f7581e9311d5579447304ac4f2d557b29e4f9d.1779893336.git.u.kleine-koenig@baylibre.com
[Bartosz: Rebased on top of current linux-next where one of the drivers
no longer exists.]
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
2 weeks agogpio: cros-ec: Drop unused assignment of platform_device_id driver data
Uwe Kleine-König (The Capable Hub) [Wed, 27 May 2026 14:57:27 +0000 (16:57 +0200)] 
gpio: cros-ec: Drop unused assignment of platform_device_id driver data

The driver explicitly set the .driver_data member of struct
platform_device_id to zero without relying on that value. Drop this
unused assignments.

While touching this array unify spacing and use named initializers for
.name.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Link: https://patch.msgid.link/06dfc8d1df46467269ee6113f161edac234e51cf.1779893336.git.u.kleine-koenig@baylibre.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
2 weeks agoMerge tag 'alloc-7.2-rc1' of https://github.com/Rust-for-Linux/linux into rust-next
Miguel Ojeda [Thu, 28 May 2026 07:25:41 +0000 (09:25 +0200)] 
Merge tag 'alloc-7.2-rc1' of https://github.com/Rust-for-Linux/linux into rust-next

Pull alloc updates from Danilo Krummrich:

 - Fix the 'Vec::reserve()' doctest to properly account for the existing
   vector length in the capacity assertion.

 - Fix an incorrect operator in the 'Vec::extend_with()' SAFETY comment;
   add a doc test demonstrating basic usage and the zero-length case.

 - Cleanup all imports in the alloc module and its doctests to use the
   "kernel vertical" import style.

* tag 'alloc-7.2-rc1' of https://github.com/Rust-for-Linux/linux:
  rust: alloc: cleanup doctest imports to "kernel vertical" style
  rust: alloc: cleanup imports and use "kernel vertical" style
  rust: alloc: fix `Vec::extend_with` SAFETY comment
  rust: alloc: add doc test for `Vec::extend_with`
  rust: alloc: fix assert in `Vec::reserve` doc test

2 weeks agoiommu, debugobjects: avoid gcc-16.1 section mismatch warnings
Arnd Bergmann [Wed, 13 May 2026 14:53:54 +0000 (16:53 +0200)] 
iommu, debugobjects: avoid gcc-16.1 section mismatch warnings

gcc-16 has gained some more advanced inter-procedual optimization
techniques that enable it to inline the dummy_tlb_add_page() and
dummy_tlb_flush() function pointers into a specialized version of
__arm_v7s_unmap:

WARNING: modpost: vmlinux: section mismatch in reference: __arm_v7s_unmap+0x2cc (section: .text) -> dummy_tlb_add_page (section: .init.text)
ERROR: modpost: Section mismatches detected.

>From what I can tell, the transformation is correct, as this is only
called when __arm_v7s_unmap() is called from arm_v7s_do_selftests(),
which is also __init. Since __arm_v7s_unmap() however is not __init,
gcc cannot inline the inner function calls directly.

In debug_objects_selftest(), the same thing happens. Both the
caller and the leaf function are __init, but the IPA pulls
it into a non-init one:

WARNING: modpost: vmlinux: section mismatch in reference: lookup_object_or_alloc+0x7c (section: .text.lookup_object_or_alloc) -> is_static_object (section: .init.text)

Marking the affected functions as not "__init" would reliably avoid this
issue but is not a good solution because it removes an otherwise correct
annotation. I tried marking the functions as 'noinline', but that ended
up not covering all the affected configurations.

With some more experimenting, I found that marking these functions as
__attribute__((noipa)) is both logical and reliable.

In order to keep the syntax readable, add a custom macro for this in
include/linux/compiler_attributes.h next to other related macros and
use it to annotate both files.

Link: https://lore.kernel.org/all/abRB6g-48ZX6Yl2r@willie-the-truck/
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: linux-kbuild@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Thomas Gleixner <tglx@kernel.org>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2 weeks agoplatform/chrome: cros_ec_chardev: Introduce rwsem for protecting ec_dev
Tzung-Bi Shih [Mon, 25 May 2026 05:26:54 +0000 (05:26 +0000)] 
platform/chrome: cros_ec_chardev: Introduce rwsem for protecting ec_dev

Introduce a rwsem for protecting `ec_dev` to prevent Use-After-Free on
the `ec_dev`.

- Writers: In driver's probe() and remove().
- Readers: In file operations.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20260525052654.4076429-5-tzungbi@kernel.org
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
2 weeks agoplatform/chrome: cros_ec_chardev: Add event relayer
Tzung-Bi Shih [Mon, 25 May 2026 05:26:53 +0000 (05:26 +0000)] 
platform/chrome: cros_ec_chardev: Add event relayer

Introduce an event relayer mechanism.  Instead of each open file
registering directly with `ec_dev->event_notifier`, the platform device
registers a single relayer notifier.  Individual files then register
with a local subscribers list in `chardev_pdata`.

This allows the driver to safely disconnect from the event chain
`ec_dev->event_notifier` during cros_ec_chardev_remove(), preventing
events from being delivered to open files after the device is removed,
while still allowing those files to be closed safely later.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20260525052654.4076429-4-tzungbi@kernel.org
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
2 weeks agoplatform/chrome: cros_ec_chardev: Move data to chardev_pdata
Tzung-Bi Shih [Mon, 25 May 2026 05:26:52 +0000 (05:26 +0000)] 
platform/chrome: cros_ec_chardev: Move data to chardev_pdata

Move `ec_dev` and `cmd_offset` from `chardev_priv` to `chardev_pdata` as
they are per-device properties but not per-open-file properties.

Hold a reference to `chardev_pdata` for each open file to ensure the
data remains valid even if the underlying platform device is removed.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20260525052654.4076429-3-tzungbi@kernel.org
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
2 weeks agoplatform/chrome: cros_ec_chardev: Introduce chardev_data
Tzung-Bi Shih [Mon, 25 May 2026 05:26:51 +0000 (05:26 +0000)] 
platform/chrome: cros_ec_chardev: Introduce chardev_data

Introduce struct chardev_pdata to hold platform driver data.

The platform driver data is allocated by kzalloc() instead of devm
variant, allowing for managed cleanup that can eventually extend beyond
device removal if files are still open.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20260525052654.4076429-2-tzungbi@kernel.org
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
2 weeks agoipv6: rpl: fix hdrlen overflow in ipv6_rpl_srh_decompress()
Rahul Chandelkar [Mon, 25 May 2026 15:40:31 +0000 (21:10 +0530)] 
ipv6: rpl: fix hdrlen overflow in ipv6_rpl_srh_decompress()

ipv6_rpl_srh_decompress() computes:

    outhdr->hdrlen = (((n + 1) * sizeof(struct in6_addr)) >> 3);

hdrlen is __u8. For n >= 127 the result exceeds 255 and silently
truncates. With n=127 (cmpri=15, cmpre=15, pad=0, hdrlen=16):

    (128 * 16) >> 3 = 256, truncated to 0 as __u8

The caller in ipv6_rpl_srh_rcv() then places the compressed header
at buf + ((ohdr->hdrlen + 1) << 3). With hdrlen=0 this is buf + 8,
but the decompressed region occupies buf[0..2055] (8-byte header
plus 128 full addresses). The compressed header overlaps the
decompressed data, and ipv6_rpl_srh_compress() writes into this
overlap, corrupting the routing header of the forwarded packet.

The existing guard at exthdrs.c:546 checks (n + 1) > 255, which
prevents n+1 from overflowing unsigned char (the segments_left
field), but does not prevent the computed hdrlen from overflowing
__u8. n=127 passes because 128 <= 255, yet hdrlen=256 does not
fit.

Tighten the bound to (n + 1) > 127. This caps n at 126, giving
hdrlen = (127 * 16) >> 3 = 254, which fits in __u8. The compressed
header then lands at buf + ((254 + 1) << 3) = buf + 2040, exactly
past the decompressed region (buf[0..2039]). No overlap. 127
segments is well beyond any realistic RPL deployment.

Fixes: 8610c7c6e3bd ("net: ipv6: add support for rpl sr exthdr")
Signed-off-by: Rahul Chandelkar <rc@rexion.ai>
Link: https://patch.msgid.link/20260525154031.2290876-1-rc@rexion.ai
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoMerge branch 'ethtool-more-bug-fixes'
Jakub Kicinski [Thu, 28 May 2026 00:42:18 +0000 (17:42 -0700)] 
Merge branch 'ethtool-more-bug-fixes'

Jakub Kicinski says:

====================
ethtool: more bug fixes

Last week I sent two patch sets - one fixing bugs in RSS handling,
and one fixing CMIS / module handling. This set contains the remaining
fixes. There's a concentration of fixes around PHY and timestamp config
handling but not enough to break those out as separate sets.
====================

Link: https://patch.msgid.link/20260526153533.2779187-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoethtool: eeprom: add more safeties to EEPROM Netlink fallback
Jakub Kicinski [Tue, 26 May 2026 15:35:33 +0000 (08:35 -0700)] 
ethtool: eeprom: add more safeties to EEPROM Netlink fallback

The Netlink fallback path for reading module EEPROM
(fallback_set_params()) validates that offset < eeprom_len,
but does not check that offset + length stays within eeprom_len.
The ioctl equivalent (ethtool_get_any_eeprom() in ioctl.c) has
always enforced both bounds:

  if (eeprom.offset + eeprom.len > total_len)
      return -EINVAL;

This could lead to surprises in both drivers and device FW.
Add the missing offset + length validation to fallback_set_params(),
mirroring the ioctl.

Similarly - ethtool core in general, and ethtool_get_any_eeprom()
in particular tries to zero-init all buffers passed to the drivers
to avoid any extra work of zeroing things out. eeprom_fallback()
uses a plain kmalloc(), change it to zalloc.

Fixes: 96d971e307cc ("ethtool: Add fallback to get_module_eeprom from netlink command")
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260526153533.2779187-11-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoethtool: eeprom: add missing ethnl_ops_begin() / _complete() during fallback
Jakub Kicinski [Tue, 26 May 2026 15:35:32 +0000 (08:35 -0700)] 
ethtool: eeprom: add missing ethnl_ops_begin() / _complete() during fallback

All ethtool driver op calls should be sandwiched between
ethnl_ops_begin() / ethnl_ops_complete(). In Netlink eeprom code,
if the paged access failed we fall back to old API, but we
first call _complete() and the fallback never does its own
ethnl_ops_begin(). Move the fallback into the _begin() / _complete()
section.

Fixes: 96d971e307cc ("ethtool: Add fallback to get_module_eeprom from netlink command")
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260526153533.2779187-10-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoethtool: strset: fix header attribute index in ethnl_req_get_phydev()
Jakub Kicinski [Tue, 26 May 2026 15:35:31 +0000 (08:35 -0700)] 
ethtool: strset: fix header attribute index in ethnl_req_get_phydev()

strset_prepare_data() passes ETHTOOL_A_HEADER_FLAGS (3) as the header
attribute to ethnl_req_get_phydev(). This is incorrect, in the main
attr space 3 is ETHTOOL_A_STRSET_COUNTS_ONLY, not the request
header attr. The correct constant is ETHTOOL_A_STRSET_HEADER (1).

ethnl_req_get_phydev() only uses this value for the extack,
so this is not a "functionally visible"(?) bug.

Fixes: e96c93aa4be9 ("net: ethtool: strset: Allow querying phy stats by index")
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260526153533.2779187-9-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoethtool: tsinfo: don't pass ERR_PTR to genlmsg_cancel on prepare failure
Jakub Kicinski [Tue, 26 May 2026 15:35:30 +0000 (08:35 -0700)] 
ethtool: tsinfo: don't pass ERR_PTR to genlmsg_cancel on prepare failure

The goto err label leads to:

genlmsg_cancel(skb, ehdr);
return ret;

If ethnl_tsinfo_prepare_dump() failed, it has not started a genlmsg.
There's nothing to cancel, and passing an error pointer to
genlmsg_cancel() would cause a crash.

Fixes: b9e3f7dc9ed9 ("net: ethtool: tsinfo: Enhance tsinfo to support several hwtstamp by net topology")
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20260526153533.2779187-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoethtool: tsinfo: fix uninitialized stats on the by-PHC path
Jakub Kicinski [Tue, 26 May 2026 15:35:29 +0000 (08:35 -0700)] 
ethtool: tsinfo: fix uninitialized stats on the by-PHC path

tsinfo_prepare_data() has two code paths: a "by-PHC" path for
user-specified hardware timestamping providers, and the old path.
Commit 89e281ebff72 ("ethtool: init tsinfo stats if requested") added
ethtool_stats_init() to mark stat slots as ETHTOOL_STAT_NOT_SET before
the driver callback populates them, but placed the call inside the
old-path block.

When commit b9e3f7dc9ed9 ("net: ethtool: tsinfo: Enhance tsinfo to
support several hwtstamp by net topology") added the by-PHC early
return, it landed above the stats initialization. On that path
the stats array retains the zero-fill from ethnl_init_reply_data()'s
zalloc. This leads to the reply including a stats nest with four
zero-valued attributes that should have been absent.

Reject GET requests for stats with HWTSTAMP_PROVIDER or dump.

Fixes: b9e3f7dc9ed9 ("net: ethtool: tsinfo: Enhance tsinfo to support several hwtstamp by net topology")
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260526153533.2779187-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoethtool: tsconfig: fix missing ethnl_ops_complete()
Jakub Kicinski [Tue, 26 May 2026 15:35:28 +0000 (08:35 -0700)] 
ethtool: tsconfig: fix missing ethnl_ops_complete()

tsconfig_prepare_data() calls ethnl_ops_begin(), we need to call
ethnl_ops_complete() before returning the error.

Fixes: 6e9e2eed4f39 ("net: ethtool: Add support for tsconfig command to get/set hwtstamp config")
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20260526153533.2779187-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoethtool: pse-pd: fix missing ethnl_ops_complete()
Jakub Kicinski [Tue, 26 May 2026 15:35:27 +0000 (08:35 -0700)] 
ethtool: pse-pd: fix missing ethnl_ops_complete()

pse_prepare_data() is missing ethnl_ops_complete() if
ethnl_req_get_phydev() returned an error. Move getting
phydev up so that we don't have to worry about this
(similar order to linkstate_prepare_data()).

Note that phydev may still be NULL (this is checked in
pse_get_pse_attributes()), the goal isn't really to avoid
the _begin() / _complete() calls, only to simplify the error
handling.

While at it propagate the original error. Why this code
overrides the error with -ENODEV but !phydev generates
-EOPNOTSUPP is unclear to me...

Fixes: 31748765bed3 ("net: ethtool: pse-pd: Target the command to the requested PHY")
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260526153533.2779187-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoethtool: linkstate: fix unbalanced ethnl_ops_complete() on PHY lookup error
Jakub Kicinski [Tue, 26 May 2026 15:35:26 +0000 (08:35 -0700)] 
ethtool: linkstate: fix unbalanced ethnl_ops_complete() on PHY lookup error

linkstate_prepare_data() calls ethnl_req_get_phydev() before
ethnl_ops_begin(), but routes its error path through "goto out"
which calls ethnl_ops_complete().

Fixes: fe55b1d401c6 ("ethtool: linkstate: migrate linkstate functions to support multi-PHY setups")
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260526153533.2779187-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoethtool: tsconfig: fix reply error handling
Jakub Kicinski [Tue, 26 May 2026 15:35:25 +0000 (08:35 -0700)] 
ethtool: tsconfig: fix reply error handling

A couple of trivial bugs in error handling in tsconfig_send_reply().
If we failed to allocate rskb we need to set the error.
If we did allocate it but failed to send it - we need to remember
to free it.

Fixes: 6e9e2eed4f39 ("net: ethtool: Add support for tsconfig command to get/set hwtstamp config")
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20260526153533.2779187-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoethtool: coalesce: cap profile updates at NET_DIM_PARAMS_NUM_PROFILES
Jakub Kicinski [Tue, 26 May 2026 15:35:24 +0000 (08:35 -0700)] 
ethtool: coalesce: cap profile updates at NET_DIM_PARAMS_NUM_PROFILES

ethnl_update_profile() walks the ETHTOOL_A_PROFILE_IRQ_MODERATION
nest list with an index 'i' and writes new_profile[i++] without
bounding i. The destination is kmemdup()'d at NET_DIM_PARAMS_NUM_PROFILES
entries (5), but the Netlink nest count is entirely user-controlled.
Netlink policies do not have support for constraining the number
of nested entries (or number of multi-attr entries).

Fixes: f750dfe825b9 ("ethtool: provide customized dim profile management")
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260526153533.2779187-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoMerge branch 'bridge-fix-sleep-in-atomic-context'
Jakub Kicinski [Thu, 28 May 2026 00:23:07 +0000 (17:23 -0700)] 
Merge branch 'bridge-fix-sleep-in-atomic-context'

Ido Schimmel says:

====================
bridge: Fix sleep in atomic context

Under certain circumstances the bridge driver can call
dev_set_promiscuity() while holding the bridge spin lock. This is a
problem as dev_set_promiscuity() might sleep.

Patches #1-#2 fix the problem in the netlink and sysfs configuration
paths by only taking the lock where it is actually needed, thereby
avoiding calling dev_set_promiscuity() from an atomic context.

Patch #3 adds test cases for both configuration paths in rtnetlink.sh
which already includes test cases for similar issues.

Note that dev_set_promiscuity() can sleep either when it takes the net
device mutex or when calling netif_rx_mode_sync(). I encountered the
problem with the latter, but blamed the former since it came earlier.
====================

Link: https://patch.msgid.link/20260526064818.272516-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoselftests: rtnetlink: Add bridge promiscuity tests
Ido Schimmel [Tue, 26 May 2026 06:48:18 +0000 (09:48 +0300)] 
selftests: rtnetlink: Add bridge promiscuity tests

Add two test cases that always pass, but trigger sleeping in atomic
context BUGs without "bridge: Fix sleep in atomic context in netlink
path" and "bridge: Fix sleep in atomic context in sysfs path".

Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260526064818.272516-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agobridge: Fix sleep in atomic context in sysfs path
Ido Schimmel [Tue, 26 May 2026 06:48:17 +0000 (09:48 +0300)] 
bridge: Fix sleep in atomic context in sysfs path

Since the start of the git history, brport_store() always acquired the
bridge lock. Back then this decision made sense: The bridge lock
protects the STP state of the bridge and its ports and at that time the
function was only used by two STP related attributes (cost and
priority).

Nowadays, brport_store() processes a lot more attributes and most of
them do not need the bridge lock:

* Bridge flags: Only require RTNL. Read locklessly by the data path.
  Annotations can be added in net-next.

* FDB port flushing: Only requires the FDB lock.

* Multicast attributes: Only require the multicast lock.

* Group forward mask: Only requires RTNL. Read locklessly by the data
  path. Annotations can be added in net-next.

* Backup port: Only requires RTNL. Read locklessly by the data path.

This is a problem as the bridge calls dev_set_promiscuity() when certain
bridge port flags change and this function can sleep since the commit
cited below, resulting in a splat such as [1].

Fix this by reducing the scope of the bridge lock and only take it when
processing the two STP related attributes that require it. Remove the
now stale comment from br_switchdev_set_port_flag(). The
SWITCHDEV_F_DEFER flag can be removed in net-next.

[1]
BUG: sleeping function called from invalid context at net/core/dev_addr_lists.c:1262
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 372, name: bash
preempt_count: 201, expected: 0
RCU nest depth: 0, expected: 0
5 locks held by bash/372:
#0: ffff88810c51c3f0 (sb_writers#7){.+.+}-{0:0}, at: ksys_write (fs/read_write.c:740)
#1: ffff888115ce9480 (&of->mutex){+.+.}-{4:4}, at: kernfs_fop_write_iter (fs/kernfs/file.c:343)
#2: ffff88810b9fd330 (kn->active#37){.+.+}-{0:0}, at: kernfs_fop_write_iter (fs/kernfs/file.c:80 fs/kernfs/file.c:344)
#3: ffffffffa59473a0 (rtnl_mutex){+.+.}-{4:4}, at: brport_store (net/bridge/br_sysfs_if.c:326)
#4: ffff8881099d2d58 (&br->lock){+...}-{3:3}, at: brport_store (./include/linux/spinlock.h:348 net/bridge/br_sysfs_if.c:345)
Preemption disabled at:
 0x0
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:94 lib/dump_stack.c:120)
__might_resched.cold (kernel/sched/core.c:9163)
netif_rx_mode_run (net/core/dev_addr_lists.c:1262)
netif_rx_mode_sync (net/core/dev_addr_lists.c:1428)
dev_set_promiscuity (net/core/dev_api.c:289)
br_manage_promisc (net/bridge/br_if.c:135 net/bridge/br_if.c:172)
br_port_flags_change (net/bridge/br_if.c:242 net/bridge/br_if.c:747)
store_learning (net/bridge/br_sysfs_if.c:79 net/bridge/br_sysfs_if.c:235)
brport_store (net/bridge/br_sysfs_if.c:346)
kernfs_fop_write_iter (fs/kernfs/file.c:352)
new_sync_write (fs/read_write.c:595)
vfs_write (fs/read_write.c:688)
ksys_write (fs/read_write.c:740)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)

Fixes: 78cd408356fe ("net: add missing instance lock to dev_set_promiscuity")
Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260526064818.272516-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agobridge: Fix sleep in atomic context in netlink path
Ido Schimmel [Tue, 26 May 2026 06:48:16 +0000 (09:48 +0300)] 
bridge: Fix sleep in atomic context in netlink path

Since the introduction of the netlink configuration path for bridge
ports in commit 25c71c75ac87 ("bridge: bridge port parameters over
netlink"), br_setport() was always called with the bridge lock held
around it. Back then this decision made sense: The bridge lock protects
the STP state of the bridge and its ports and at that time the function
only processed three STP related netlink attributes (cost, priority and
state).

Nowadays, br_setport() processes a lot more attributes and most of them
do not need the bridge lock:

* Bridge flags: Only require RTNL. Read locklessly by the data path.
  Annotations can be added in net-next.

* FDB port flushing: Only requires the FDB lock.

* Multicast attributes: Only require the multicast lock.

* Group forward mask: Only requires RTNL. Read locklessly by the data
  path. Annotations can be added in net-next.

* Backup port and NHID: Only require RTNL. Read locklessly by the data
  path.

This is a problem as the bridge calls dev_set_promiscuity() when certain
bridge port flags change and this function can sleep since the commit
cited below, resulting in a splat such as [1].

Fix this by reducing the scope of the bridge lock and only take it when
processing the three STP related attributes that require it. This is
consistent with the multicast attributes where each attribute acquires
the multicast lock instead of having one critical section for all
relevant attributes.

[1]
BUG: sleeping function called from invalid context at net/core/dev_addr_lists.c:1262
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 356, name: bridge
preempt_count: 201, expected: 0
RCU nest depth: 0, expected: 0
2 locks held by bridge/356:
#0: ffffffff919473a0 (rtnl_mutex){+.+.}-{4:4}, at: rtnetlink_rcv_msg (net/core/rtnetlink.c:80 net/core/rtnetlink.c:7002)
#1: ffff888115072d58 (&br->lock){+...}-{3:3}, at: br_setlink (./include/linux/spinlock.h:348 net/bridge/br_netlink.c:1117)
Preemption disabled at:
 0x0
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:94 lib/dump_stack.c:120)
__might_resched.cold (kernel/sched/core.c:9163)
netif_rx_mode_run (net/core/dev_addr_lists.c:1262)
netif_rx_mode_sync (net/core/dev_addr_lists.c:1428)
dev_set_promiscuity (net/core/dev_api.c:289)
br_manage_promisc (net/bridge/br_if.c:135 net/bridge/br_if.c:172)
br_port_flags_change (net/bridge/br_if.c:242 net/bridge/br_if.c:747)
br_setport (net/bridge/br_netlink.c:1000)
br_setlink (net/bridge/br_netlink.c:1118)
rtnl_bridge_setlink (net/core/rtnetlink.c:5572)
rtnetlink_rcv_msg (net/core/rtnetlink.c:7005)
netlink_rcv_skb (net/netlink/af_netlink.c:2550)
netlink_unicast (net/netlink/af_netlink.c:1318 net/netlink/af_netlink.c:1344)
netlink_sendmsg (net/netlink/af_netlink.c:1894)
__sock_sendmsg (net/socket.c:787 (discriminator 4) net/socket.c:802 (discriminator 4))
____sys_sendmsg (net/socket.c:2698)
___sys_sendmsg (net/socket.c:2752)
__sys_sendmsg (net/socket.c:2784)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)

Fixes: 78cd408356fe ("net: add missing instance lock to dev_set_promiscuity")
Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260526064818.272516-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agobonding: refuse to enslave CAN devices
Oliver Hartkopp [Tue, 26 May 2026 19:33:19 +0000 (21:33 +0200)] 
bonding: refuse to enslave CAN devices

syzbot reported a kernel paging request crash in
can_rx_unregister() inside net/can/af_can.c. The crash occurs
because a virtual CAN device (vxcan) is being enslaved to a
bonding master.

During the enslavement process, the bonding driver mutates
and modifies the network device states to fit an Ethernet-like
aggregation model. However, CAN devices operate on a completely
different Layer 2 architecture, relying on the CAN mid-layer
private data structure (can_ml_priv) instead of standard
Ethernet structures. Since bonding does not initialize or
maintain these CAN structures, subsequent operations on the
half-enslaved interface (such as closing associated sockets
via isotp_release) lead to a null-pointer dereference when
accessing the CAN receiver lists.

Bonding CAN interfaces is architecturally invalid as CAN lacks
MAC addresses, ARP capabilities, and standard Ethernet
link-layer mechanisms. While generic loopback devices are
blocked globally in net/core/dev.c, virtual CAN devices
bypass this check because they do not carry the IFF_LOOPBACK
flag, despite acting as local software-loopbacks.

Fix this by explicitly blocking network devices of type
ARPHRD_CAN from being enslaved at the very beginning of
bond_enslave(). This prevents illegal state mutations,
eliminates the resulting KASAN crashes, and avoids potential
memory leaks from incomplete socket cleanups.

As the CAN support has been added a long time after bonding
the Fixes-tag points to the introduction of ARPHRD_CAN that
would have needed a specific handling in bonding_main.c.

Fixes: cd05acfe65ed ("[CAN]: Allocate protocol numbers for PF_CAN")
Reported-by: syzbot+8ed98cbd0161632bce95@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=8ed98cbd0161632bce95
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
Link: https://patch.msgid.link/20260526-bonding-candev-v1-1-ba1df400918a@hartkopp.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agorun-clang-tools: run multiprocessing.Pool as context manager
Philipp Hahn [Fri, 15 May 2026 12:47:50 +0000 (14:47 +0200)] 
run-clang-tools: run multiprocessing.Pool as context manager

`multiprocessing.pool.Pool()` should be used as a context manager so
Python can free its internal resources and do a proper cleanup.[1]

While at it move the code to read the `compiler_commands.json` so the
opened file can be closed before the sub-processes are fork()ed.

Link: https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool
Signed-off-by: Philipp Hahn <phahn-oss@avm.de>
Link: https://patch.msgid.link/40180613bef84946c45d6fbeb4bb274573cd0beb.1778849135.git.phahn-oss@avm.de
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2 weeks agomshv: support 1G hugepages by passing them as 2M-aligned chunks
Anirudh Rayabharam (Microsoft) [Wed, 13 May 2026 13:25:56 +0000 (13:25 +0000)] 
mshv: support 1G hugepages by passing them as 2M-aligned chunks

The hypervisor's map GPA hypercall coalesces contiguous 2M-aligned
chunks into 1G mappings when alignment permits, so the driver can
support 1G hugepages by feeding them in as 2M chunks. Note that this
is the only way to make 1G mappings; there is no way to directly map
a 1G hugepage using the hypercall.

Always emit a 2M (PMD_ORDER) stride for the huge-page case. The
hypercall has no 1G stride, so 1G folios are processed as a
sequence of 2M chunks. Folios whose order is less than PMD_ORDER
(e.g. mTHP) fall back to single-page stride; mapping them as 2M
would fail in the hypervisor anyway.

Assisted-by: Copilot-CLI:claude-opus-4.7
Signed-off-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
Acked-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2 weeks agoDrivers: hv: vmbus: Improve the logic of reserving fb_mmio on Gen2 VMs
Dexuan Cui [Thu, 7 May 2026 21:28:38 +0000 (14:28 -0700)] 
Drivers: hv: vmbus: Improve the logic of reserving fb_mmio on Gen2 VMs

If vmbus_reserve_fb() in the kdump/kexec kernel fails to properly reserve
the framebuffer MMIO range (which is below 4GB) due to a Gen2 VM's
screen.lfb_base being zero [1], there is an MMIO conflict between the
drivers hyperv-drm and pci-hyperv: when the driver pci-hyperv's
hv_allocate_config_window() calls vmbus_allocate_mmio() to get an
MMIO range, typically it gets a 32-bit MMIO range that overlaps with the
framebuffer MMIO range, and later hv_pci_enter_d0() fails with an
error message "PCI Pass-through VSP failed D0 Entry with status" since
the host thinks that PCI devices must not use MMIO space that the
host has assigned to the framebuffer.

This is especially an issue if pci-hyperv is built-in and hyperv-drm is
built as a module. Consequently, the kdump/kexec kernel fails to detect
PCI devices via pci-hyperv, and may fail to mount the root file system,
which may reside in a NVMe disk. The issue described here has existed
for SR-IOV VF NICs since day one of the pci-hyperv driver, and has been
worked around on x64 when possible. With the recent introduction of
ARM64 VMs that boot from NVMe, there is no workaround, so we need a
formal fix.

On Gen2 VMs, if the screen.lfb_base is 0 in the kdump/kexec kernel [1],
fall back to the low MMIO base, which should be equal to the framebuffer
MMIO base [2] (the statement is true according to my testing on x64
Windows Server 2016, and on x64 and ARM64 Windows Server 2025 and on
Azure. I checked with the Hyper-V team and they said the statement should
continue to be true for Gen2 VMs). In the first kernel, screen.lfb_base
is not 0; if the user specifies a very high resolution, it's not enough
to only reserve 8MB: let's always reserve half of the space below 4GB,
but cap the reservation to 128MB, which is the required framebuffer size
of the highest resolution 7680*4320 supported by Hyper-V.

While at it, fix the comparison "end > VTPM_BASE_ADDRESS" by changing
the > to >=. Here the 'end' is an inclusive end (typically, it's
0xFFFF_FFFF for the low MMIO range).

Note: vmbus_reserve_fb() now also reserves an MMIO range at the beginning
of the low MMIO range on CVMs, which have no framebuffers (the
'screen.lfb_base' in vmbus_reserve_fb() is 0 for CVMs), just in case the
host might treat the beginning of the low MMIO range specially [3]. BTW,
the OpenHCL kernel is not affected by the change, because that kernel
boots with DeviceTree rather than ACPI (so vmbus_reserve_fb() won't run
there), and there is no framebuffer device for that kernel.

Note: normally Gen1 VMs don't have the MMIO conflict issue because the
framebuffer MMIO range (which is hardcoded to base=4GB-128MB and
size=64MB for Gen1 VMs by the host) is always reported via the legacy PCI
graphics device's BAR, so the kdump/kexec kernel can reserve the 64MB
MMIO range; however, if the VM is configured to use a very high resolution
and the required framebuffer size exceeds 64MB (AFAIK, in practice, this
isn't a typical configuration by users), the hyperv-drm driver may need to
allocate an MMIO range above 4GB and change the framebuffer MMIO location
to the allocated MMIO range -- in this case, there can still be issues [4]
which can't be easily fixed: any possible affected Gen1 users would have
to use a resolution whose framebuffer size is <= 64MB, or switch to Gen2
VMs.

[1] https://lore.kernel.org/all/SA1PR21MB692176C1BC53BFC9EAE5CF8EBF51A@SA1PR21MB6921.namprd21.prod.outlook.com/
[2] https://lore.kernel.org/all/SA1PR21MB69218F955B62DFF62E3E88D2BF222@SA1PR21MB6921.namprd21.prod.outlook.com/
[3] https://lore.kernel.org/all/SN6PR02MB415726B17D5A6027CD1717E8D4342@SN6PR02MB4157.namprd02.prod.outlook.com/
[4] https://lore.kernel.org/all/SA1PR21MB69213486F821CA5A2C793C81BF342@SA1PR21MB6921.namprd21.prod.outlook.com/

Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
CC: stable@vger.kernel.org
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Krister Johansen <kjlx@templeofstupid.com>
Tested-by: Matthew Ruffell <matthew.ruffell@canonical.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2 weeks agomshv: use kmalloc_array in mshv_root_scheduler_init
Can Peng [Wed, 20 May 2026 07:16:32 +0000 (15:16 +0800)] 
mshv: use kmalloc_array in mshv_root_scheduler_init

Replace kmalloc() with kmalloc_array() to prevent potential
overflow, as recommended in Documentation/process/deprecated.rst.

No functional change.

Signed-off-by: Can Peng <pengcan@kylinos.cn>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2 weeks agox86/ftrace: Relocate %rip-relative percpu refs in dynamic trampolines
Alexis Lothoré (eBPF Foundation) [Wed, 27 May 2026 19:12:31 +0000 (21:12 +0200)] 
x86/ftrace: Relocate %rip-relative percpu refs in dynamic trampolines

With CONFIG_CALL_DEPTH_TRACKING enabled on an x86 retbleed-affected platform
(eg: Skylake), with retbleed=stuff, registering a dynamic ftrace trampoline
crashes on the first call into the traced function:

  BUG: unable to handle page fault for address: ffff88817ae18880
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0002) - not-present page
  PGD 4b53067 P4D 4b53067 PUD 0
  Oops: Oops: 0002 [#1] SMP PTI
  CPU: 3 UID: 0 PID: 187 Comm: usleep Not tainted 7.0.10 #243 PREEMPT(full)
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.17.0-2-2 04/01/2014
  Code: 24 78 00 00 00 00 48 89 ea 48 89 54 24 20 48 8b b4 24 b8 00 00 00 48 8b bc 24 b0 00 00 00 48 89 bc 24 80 00 00 00 48 83 ef 05 <65> 48 c1 3d 1f a8 b6 02 05 48 8b 15 f6 00 00 00 4c 89 3c 24 4c 89
  Call Trace:
   <TASK>
   ? find_held_lock
   ? exc_page_fault
   ? lock_release
   ? __x64_sys_clock_nanosleep
   ? lockdep_hardirqs_on_prepare
   ? trace_hardirqs_on
   __x64_sys_clock_nanosleep
   do_syscall_64
   ? exc_page_fault
   ? call_depth_return_thunk
   entry_SYSCALL_64_after_hwframe
  ...
  Kernel panic - not syncing: Fatal exception

This small reproducer allows to easily trigger the crash:

  # echo 'p __x64_sys_clock_nanosleep' > /sys/kernel/tracing/kprobe_events
  # echo 1 > /sys/kernel/tracing/events/kprobes/p___x64_sys_clock_nanosleep_0/enable
  # usleep 1

Monitoring the crash under GDB points to the exact instruction in charge of
incrementing the call depth:

  sarq $5, %gs:__x86_call_depth(%rip)

This instruction matches the one inserted by the ftrace_regs_caller from
ftrace_64.S. This emitted code was likely working fine until the introduction
of

  59bec00ace28 ("x86/percpu: Introduce %rip-relative addressing to PER_CPU_VAR()"):

it has made the call depth accounting addressing relative to $rip, instead of
being based on an absolute address.

As this code exact location depends on where the trampoline lives in memory,
the corresponding displacement needs to be adjusted at runtime to actually
correctly find the per-cpu __x86_call_depth value, otherwise the targeted
address is wrong, leading to the page fault seen above.

Fix the %rip-relative displacement of the copied CALL_DEPTH_ACCOUNT
instruction (from ftrace_regs_caller) by calling text_poke_apply_relocation(),
as it is done for example by the x86 BPF JIT compiler through
x86_call_depth_emit_accounting(). This corrects both CALL_DEPTH_ACCOUNT slots,
in ftrace_caller and ftrace_regs_caller.

  [ bp: Massage. ]

Fixes: 59bec00ace28 ("x86/percpu: Introduce %rip-relative addressing to PER_CPU_VAR()")
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@kernel.org>
Link: https://patch.msgid.link/20260527-fix_call_depth_in_trampoline-v1-1-1c1abc8ae310@bootlin.com
2 weeks agocompiler-clang.h: Drop explicit version number from "all" diagnostic macro
Nathan Chancellor [Sun, 17 May 2026 23:05:19 +0000 (13:05 -1000)] 
compiler-clang.h: Drop explicit version number from "all" diagnostic macro

This is more consistent with what commit 7efa84b5cdd6 ("compiler-gcc.h:
Introduce __diag_GCC_all") did for GCC.

Link: https://patch.msgid.link/20260517-bump-minimum-supported-llvm-version-to-17-v2-16-b3b8cda46bdd@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2 weeks agocompiler-clang.h: Remove __cleanup -Wunused-variable workaround
Nathan Chancellor [Sun, 17 May 2026 23:05:18 +0000 (13:05 -1000)] 
compiler-clang.h: Remove __cleanup -Wunused-variable workaround

Now that the minimum supported version of LLVM for building the kernel
has been raised to 17.0.1, the redefinition of __cleanup with
__maybe_unused added to it is unnecessary because the referenced LLVM
change is present in all supported LLVM versions. Drop it.

Link: https://patch.msgid.link/20260517-bump-minimum-supported-llvm-version-to-17-v2-15-b3b8cda46bdd@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2 weeks agokbuild: Remove check for broken scoping with clang < 17 in CC_HAS_ASM_GOTO_OUTPUT
Nathan Chancellor [Sun, 17 May 2026 23:05:17 +0000 (13:05 -1000)] 
kbuild: Remove check for broken scoping with clang < 17 in CC_HAS_ASM_GOTO_OUTPUT

Now that the minimum supported version of LLVM for building the kernel
has been raised to 17.0.1, the check added to CC_HAS_ASM_GOTO_OUTPUT by
commit e2ffa15b9baa ("kbuild: Disable CC_HAS_ASM_GOTO_OUTPUT on clang <
17") can be removed, as the issue it detects is guaranteed to be fixed.

Acked-by: Nicolas Schier <nsc@kernel.org>
Link: https://patch.msgid.link/20260517-bump-minimum-supported-llvm-version-to-17-v2-14-b3b8cda46bdd@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2 weeks agox86/entry/vdso32: Remove conditional omission of '.cfi_offset eflags'
Nathan Chancellor [Sun, 17 May 2026 23:05:16 +0000 (13:05 -1000)] 
x86/entry/vdso32: Remove conditional omission of '.cfi_offset eflags'

Now that the minimum supported version of LLVM for building the kernel
has been raised to 17.0.1, the inclusion condition added by

  3e30278e0c71 ("x86/entry/vdso32: Omit '.cfi_offset eflags' for LLVM < 16")

will always be true. Revert the change to clean up the source code.

Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Link: https://patch.msgid.link/20260517-bump-minimum-supported-llvm-version-to-17-v2-13-b3b8cda46bdd@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2 weeks agox86/module: Revert "Deal with GOT based stack cookie load on Clang < 17"
Nathan Chancellor [Sun, 17 May 2026 23:05:15 +0000 (13:05 -1000)] 
x86/module: Revert "Deal with GOT based stack cookie load on Clang < 17"

Now that the minimum supported version of LLVM for building the kernel
has been raised to 17.0.1, the workaround added by

  78c4374ef8b8 ("x86/module: Deal with GOT based stack cookie load on Clang < 17")

will never be included, as the final clause in the preprocessor
conditional is always false. Revert the change to clean up the dead
code.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://patch.msgid.link/20260517-bump-minimum-supported-llvm-version-to-17-v2-12-b3b8cda46bdd@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2 weeks agox86/build: Drop unnecessary '-ffreestanding' addition to KBUILD_CFLAGS
Nathan Chancellor [Sun, 17 May 2026 23:05:14 +0000 (13:05 -1000)] 
x86/build: Drop unnecessary '-ffreestanding' addition to KBUILD_CFLAGS

Now that the minimum supported version of LLVM for building the kernel
has been raised to 17.0.1, the addition of '-ffreestanding' to
KBUILD_CFLAGS for 32-bit x86 is unnecessary, as the linked LLVM bug is
resolved in all supported LLVM versions.

  16cb16e0d285 ("x86/build: Remove -ffreestanding on i386 with GCC")

intended to make the addition of '-ffreestanding' clang only but due to
a bug in the adjusted check from

  d70da12453ac ("hardening: Enable i386 FORTIFY_SOURCE on Clang 16+")

it has been applied for all versions of GCC and clang < 16.0.0. There
are no known problems with removing this for GCC but if one surfaces, it
can be restored under a CONFIG_CC_IS_GCC block.

Link: https://patch.msgid.link/20260517-bump-minimum-supported-llvm-version-to-17-v2-11-b3b8cda46bdd@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2 weeks agoscripts/Makefile.warn: Drop -Wformat handling for clang < 16
Nathan Chancellor [Sun, 17 May 2026 23:05:13 +0000 (13:05 -1000)] 
scripts/Makefile.warn: Drop -Wformat handling for clang < 16

Now that the minimum supported version of LLVM for building the kernel
has been raised to 17.0.1, the block dealing with -Wformat with clang
prior to 16 can be removed since the condition for its inclusion is
always false.

Reviewed-by: Nicolas Schier <nsc@kernel.org>
Link: https://patch.msgid.link/20260517-bump-minimum-supported-llvm-version-to-17-v2-10-b3b8cda46bdd@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2 weeks agoriscv: Drop tautological condition from TOOLCHAIN_NEEDS_OLD_ISA_SPEC
Nathan Chancellor [Sun, 17 May 2026 23:05:12 +0000 (13:05 -1000)] 
riscv: Drop tautological condition from TOOLCHAIN_NEEDS_OLD_ISA_SPEC

Now that the minimum supported version of LLVM for building the kernel
has been raised to 17.0.1, the Clang dependency part of
CONFIG_TOOLCHAIN_NEEDS_OLD_ISA_SPEC is always false, so it can be
removed. Adjust the help text to remove mention of Clang < 17, as it is
irrelevant for the kernel after the minimum supported bump.

Acked-by: Paul Walmsley <pjw@kernel.org> # arch/riscv
Link: https://patch.msgid.link/20260517-bump-minimum-supported-llvm-version-to-17-v2-9-b3b8cda46bdd@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2 weeks agoriscv: Remove tautological condition from selection of ARCH_SUPPORTS_CFI
Nathan Chancellor [Sun, 17 May 2026 23:05:11 +0000 (13:05 -1000)] 
riscv: Remove tautological condition from selection of ARCH_SUPPORTS_CFI

Now that the minimum supported version of LLVM for building the kernel
has been raised to 17.0.1, the condition of the selection of
CONFIG_ARCH_SUPPORTS_CFI is always true, so it can be removed.

Acked-by: Paul Walmsley <pjw@kernel.org> # arch/riscv
Link: https://patch.msgid.link/20260517-bump-minimum-supported-llvm-version-to-17-v2-8-b3b8cda46bdd@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2 weeks agoARM: Drop tautological ld.lld conditions from ARCH_MULTI_V4{,T}
Nathan Chancellor [Sun, 17 May 2026 23:05:10 +0000 (13:05 -1000)] 
ARM: Drop tautological ld.lld conditions from ARCH_MULTI_V4{,T}

Now that the minimum supported version of LLVM for building the kernel
has been raised to 17.0.1, the '!ld.lld || ld.lld >= 16' dependency of
CONFIG_ARCH_MULTI_V4{,T} is always true, so it can be removed from both
symbols.

Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20260517-bump-minimum-supported-llvm-version-to-17-v2-7-b3b8cda46bdd@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2 weeks agoarch/Kconfig: Remove tautological condition from AUTOFDO_CLANG
Nathan Chancellor [Sun, 17 May 2026 23:05:09 +0000 (13:05 -1000)] 
arch/Kconfig: Remove tautological condition from AUTOFDO_CLANG

Now that the minimum supported version of LLVM for building the kernel
has been raised to 17.0.1, the clang version check in
CONFIG_AUTOFDO_CLANG can be removed because it is always true.

Reviewed-by: Rong Xu <xur@google.com>
Link: https://patch.msgid.link/20260517-bump-minimum-supported-llvm-version-to-17-v2-6-b3b8cda46bdd@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2 weeks agoarch/Kconfig: Remove tautological conditions from HAS_LTO_CLANG
Nathan Chancellor [Sun, 17 May 2026 23:05:08 +0000 (13:05 -1000)] 
arch/Kconfig: Remove tautological conditions from HAS_LTO_CLANG

Now that the minimum supported version of LLVM for building the kernel
has been raised to 17.0.1, two dependency lines in CONFIG_HAS_LTO_CLANG
are always true because Clang will always be newer than 17.0.0, so they
can be removed.

Reviewed-by: Nicolas Schier <nsc@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20260517-bump-minimum-supported-llvm-version-to-17-v2-5-b3b8cda46bdd@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2 weeks agosecurity/Kconfig.hardening: Remove tautological condition from CC_HAS_RANDSTRUCT
Nathan Chancellor [Sun, 17 May 2026 23:05:07 +0000 (13:05 -1000)] 
security/Kconfig.hardening: Remove tautological condition from CC_HAS_RANDSTRUCT

Now that the minimum supported version of LLVM for building the kernel
has been raised to 17.0.1, the '!Clang || Clang >= 16' dependency for
CONFIG_CC_HAS_RANDSTRUCT is always true, so it can be removed.

Reviewed-by: Nicolas Schier <nsc@kernel.org>
Link: https://patch.msgid.link/20260517-bump-minimum-supported-llvm-version-to-17-v2-4-b3b8cda46bdd@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2 weeks agosecurity/Kconfig.hardening: Remove tautological condition from FORTIFY_SOURCE
Nathan Chancellor [Sun, 17 May 2026 23:05:06 +0000 (13:05 -1000)] 
security/Kconfig.hardening: Remove tautological condition from FORTIFY_SOURCE

Now that the minimum supported version of LLVM for building the kernel
has been raised to 17.0.1, the '!X86_32 || !Clang || Clang > 16'
dependency of CONFIG_FORTIFY_SOURCE is always true, so it can be
removed.

Reviewed-by: Nicolas Schier <nsc@kernel.org>
Link: https://patch.msgid.link/20260517-bump-minimum-supported-llvm-version-to-17-v2-3-b3b8cda46bdd@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2 weeks agosecurity/Kconfig.hardening: Remove tautological condition from CC_HAS_ZERO_CALL_USED_REGS
Nathan Chancellor [Sun, 17 May 2026 23:05:05 +0000 (13:05 -1000)] 
security/Kconfig.hardening: Remove tautological condition from CC_HAS_ZERO_CALL_USED_REGS

Now that the minimum supported version of LLVM for building the kernel
has been raised to 17.0.1, the '!Clang || Clang > 15.0.6' dependency for
CONFIG_CC_HAS_ZERO_CALL_USED_REGS is always true, so it can be removed.

Reviewed-by: Nicolas Schier <nsc@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20260517-bump-minimum-supported-llvm-version-to-17-v2-2-b3b8cda46bdd@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2 weeks agokbuild: Bump minimum version of LLVM for building the kernel to 17.0.1
Nathan Chancellor [Sun, 17 May 2026 23:05:04 +0000 (13:05 -1000)] 
kbuild: Bump minimum version of LLVM for building the kernel to 17.0.1

The current minimum version of LLVM for building the kernel is 15.0.0.
However, there are two deficiencies compared to GCC that were fixed in
LLVM 17 that are starting to become more noticeable.

The first was a bug in LLVM's scope checker [1], where all labels in a
function were validated as potential targets of an asm goto statement,
even if they were not listed in the asm goto statement as targets. This
becomes particularly problematic when the cleanup attribute is used, as

  asm goto(... : label_a);
  ...
label_a:
  ...
  int var __free(foo);
  asm goto(... : label_b);
  ...
label_b:
  ...

will trigger an error since the scope checker will complain that the
cleanup variable would be skipped when jumping from the first asm goto
to label_b (which obviously cannot happen). This issue was the catalyst
for commit e2ffa15b9baa ("kbuild: Disable CC_HAS_ASM_GOTO_OUTPUT on
clang < 17"). Unfortunately, this issue is reproducible with regular asm
goto in addition to asm goto with outputs, so that change was not
entirely sufficient to avoid the issue altogether. As asm goto has
effectively been required since commit a0a12c3ed057 ("asm goto:
eradicate CC_HAS_ASM_GOTO") and the usage of the cleanup attribute
continues to grow across the tree, raising the minimum to a version that
avoids this issue altogether is a better long term solution than
attempting to workaround it at every spot where it happens.

The second issue is an incompatibility with GCC 8.1+ around variables
marked with const being valid constant expressions for _Static_assert
and other macros [2]. With GCC 8.1 being the minimum supported version
since commit 118c40b7b503 ("kbuild: require gcc-8 and binutils-2.30"),
this incompatibility becomes more of a maintenance burden since only
clang-15 and clang-16 are affected by it.

Looking at the clang version of various major distributions through
Docker images, no one should be left behind as a result of this bump, as
the old ones cannot clear the current minimum of 15.0.0.

  archlinux:latest              clang version 22.1.3
  debian:oldoldstable-slim      Debian clang version 11.0.1-2
  debian:oldstable-slim         Debian clang version 14.0.6
  debian:stable-slim            Debian clang version 19.1.7 (3+b1)
  debian:testing-slim           Debian clang version 21.1.8 (3+b1)
  debian:unstable-slim          Debian clang version 21.1.8 (7+b1)
  fedora:42                     clang version 20.1.8 (Fedora 20.1.8-4.fc42)
  fedora:latest                 clang version 21.1.8 (Fedora 21.1.8-4.fc43)
  fedora:44                     clang version 22.1.1 (Fedora 22.1.1-2.fc44)
  fedora:rawhide                clang version 22.1.3 (Fedora 22.1.3-1.fc45)
  opensuse/leap:latest          clang version 17.0.6
  opensuse/tumbleweed:latest    clang version 21.1.8
  ubuntu:jammy                  Ubuntu clang version 14.0.0-1ubuntu1.1
  ubuntu:noble                  Ubuntu clang version 18.1.3 (1ubuntu1)
  ubuntu:questing               Ubuntu clang version 20.1.8 (0ubuntu4)
  ubuntu:resolute               Ubuntu clang version 21.1.8 (6ubuntu1)

17.0.1 is chosen as the minimum instead of 17.0.0 to ensure that the
particular version of LLVM 17 has the two aforementioned bugs fixed, as
the second was fixed during the 17.0.0 release candidate phase and it
was not until LLVM 18 that LLVM adopted the scheme of x.0.0 being a
prerelease version and x.1.0 is a release version [3] to help with
scenarios such as this.

Link: https://github.com/llvm/llvm-project/commit/f023f5cdb2e6c19026f04a15b5a935c041835d14
Link: https://github.com/llvm/llvm-project/commit/0b2d5b967d98375793897295d651f58f6fbd3034
Link: https://github.com/llvm/llvm-project/commit/4532617ae420056bf32f6403dde07fb99d276a49
Acked-by: Nicolas Schier <nsc@kernel.org>
Link: https://patch.msgid.link/20260517-bump-minimum-supported-llvm-version-to-17-v2-1-b3b8cda46bdd@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2 weeks agosmb: client: fix uninitialized variable in smb2_writev_callback
Steve French [Fri, 22 May 2026 23:28:49 +0000 (18:28 -0500)] 
smb: client: fix uninitialized variable in smb2_writev_callback

compiling with W=2 pointed out that "written may be used uninitialized"

Fixes: 20d72b00ca81 ("netfs: Fix the request's work item to not require a ref")
Cc: stable@vger.kernel.org
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2 weeks agosmb: client: detect short folioq copy in cifs_copy_folioq_to_iter()
Jeremy Erazo [Wed, 20 May 2026 18:23:31 +0000 (18:23 +0000)] 
smb: client: detect short folioq copy in cifs_copy_folioq_to_iter()

cifs_copy_folioq_to_iter() copies a requested number of bytes from
a folio queue into the destination iterator.  Since the encrypted
SMB2 READ path was changed to pass the server-declared payload
length (data_len) instead of the larger folioq buffer length, the
caller can ask for fewer bytes than the folio queue holds.

In that case the helper continues walking the remaining folios after
data_size has reached zero and calls copy_folio_to_iter() with
len = 0, which is unnecessary work.

The helper also returns 0 (success) when the folio queue is
exhausted before data_size bytes have been copied.  The caller has
no way to distinguish that from a full copy and the reported
transfer count ends up larger than the amount of data placed in the
iterator.

Add an early exit when data_size reaches zero, and return an error
when the folio queue is exhausted before all requested bytes have
been copied.

Signed-off-by: Jeremy Erazo <mendozayt13@gmail.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2 weeks agomshv: Add conditional VMBus dependency
Michael Kelley [Tue, 26 May 2026 14:13:04 +0000 (07:13 -0700)] 
mshv: Add conditional VMBus dependency

When the VMBus driver is not part of the kernel (CONFIG_HYPERV_VMBUS=n),
the MSHV root driver fails to link:

ERROR: modpost: "hv_vmbus_exists" [drivers/hv/mshv_root.ko] undefined!

Fix this while meeting these requirements:
* It must be possible to include the MSHV root driver without the
  VMBus driver. In such case, the MSHV root driver can be built-in
  to the kernel image, or it can be built as a separate module.
* If both the MSHV root driver and the VMBus driver are present, the
  MSHV root driver and VMBus driver can both be built-in, or they can
  both be separate modules. Or the MSHV root driver can be a module
  while the VMBus driver can be built-in, but the reverse is
  disallowed. Regardless of the build choices, the VMBus driver must
  be loaded before the MSHV driver in order for the SynIC to be
  managed properly (see comments in the MSHV SynIC code).

The fix has two parts:
* Add a Kconfig entry for MSHV_ROOT to depend on HYPERV_VMBUS if
  HYPERV_VMBUS is present. The entry disallows MSHV_ROOT being
  built-in when HYPERV_VMBUS is a module, but without requiring that
  HYPERV_VMBUS be built.
* Add a stub implementation of hv_vmbus_exists() for when the
  VMBus driver is not present so that the MSHV root driver has
  no module dependency on VMBus. When the VMBus driver *is*
  present, the module dependency ensures that the VMBus driver
  loads first when both are built as modules.

Existing code ensures that the VMBus driver loads first if it is
built-in. The VMBus driver uses subsys_initcall(), which is
initcall level 4. The MSHV root driver uses module_init(), which
becomes device_init() when built-in, and device_init() is
initcall level 6.

Reported-by: Arnd Bergmann <arnd@arndb.de>
Closes: https://lore.kernel.org/all/20260520074044.923728-1-arnd@kernel.org/
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jork Loeser <jloeser@linux.microsoft.com>
Reviewed-by: Hardik Garg <hargar@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2 weeks agohyperv: Clean up and fix the guest ID comment in hvgdk.h
Dexuan Cui [Wed, 27 May 2026 19:21:01 +0000 (12:21 -0700)] 
hyperv: Clean up and fix the guest ID comment in hvgdk.h

Change the "64 bit" to "64-bit", and the "Os" to "OS".

Remove the obsolete paragraph since the guideline has been
published in the Hypervisor Top Level Functional Specification
for many years.

The "OS Type" is 0x1 for Linux, not 0x100.

No functional change.

Fixes: 83ba0c4f3f31 ("Drivers: hv: Cleanup the guest ID computation")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2 weeks agoBluetooth: hci_qca: Use 100 ms SSR delay for rampatch and NVM loading
Shuai Zhang [Mon, 25 May 2026 06:51:56 +0000 (14:51 +0800)] 
Bluetooth: hci_qca: Use 100 ms SSR delay for rampatch and NVM loading

When bt_en is pulled high by hardware, the host does not re-download
the firmware after SSR. The controller loads the rampatch and NVM
internally.

On HMT chip, the rampatch is ~264 KB and the NVM is ~9.4 KB. The
loading process takes approximately 70 ms. The previous 50 ms delay is
too short, causing the controller to not respond to the reset command
sent by the host, which leads to BT initialization failure:

 Bluetooth: hci0: QCA memdump Done, received 458752, total 458752
 Bluetooth: hci0: mem_dump_status: 2
 Bluetooth: hci0: Opcode 0x0c03 failed: -110

Increase the delay to 100 ms, which was confirmed as a safe value by
the controller, to ensure the controller has finished loading the
firmware before the host sends commands.

Steps to reproduce:
1. Trigger SSR and wait for SSR to complete:
   hcitool cmd 0x3f 0c 26
2. Run "bluetoothctl power on" and observe that BT fails to start.

Fixes: fce1a9244a0f ("Bluetooth: hci_qca: Fix SSR (SubSystem Restart) fail when BT_EN is pulled up by hw")
Cc: stable@vger.kernel.org
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Shuai Zhang <shuai.zhang@oss.qualcomm.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2 weeks agoBluetooth: hci_sync: fix UAF in hci_le_create_cis_sync
Doruk Tan Ozturk [Mon, 25 May 2026 16:24:38 +0000 (18:24 +0200)] 
Bluetooth: hci_sync: fix UAF in hci_le_create_cis_sync

hci_le_create_cis_sync() dereferences conn->conn_timeout after releasing
both rcu_read_lock() and hci_dev_lock(hdev).  The conn pointer was
obtained from an RCU-protected iteration over hdev->conn_hash.list and
is not valid once these locks are dropped.  A concurrent disconnect can
free the hci_conn between the unlock and the dereference, causing a
use-after-free read.

The cancellation mechanism in hci_conn_del() cannot prevent this because
hci_le_create_cis_pending() queues hci_create_cis_sync with data=NULL:

    hci_cmd_sync_queue(hdev, hci_create_cis_sync, NULL, NULL);

While hci_conn_del() dequeues with data=conn:

    hci_cmd_sync_dequeue(hdev, NULL, conn, NULL);

Since NULL != conn, the lookup in _hci_cmd_sync_lookup_entry() never
matches, and the pending work item is not cancelled.

Fix this by saving conn->conn_timeout into a local variable while the
locks are still held, so the stale conn pointer is never dereferenced
after unlock.

This is the same class of bug as the one fixed by commit 035c25007c9e
("Bluetooth: hci_sync: Fix UAF on le_read_features_complete") which
addressed the identical pattern in a different function.

This vulnerability was identified using 0sec.ai, an open-source
automated security auditing platform (https://github.com/0sec-labs).

Fixes: c09b80be6ffc ("Bluetooth: hci_conn: Fix not waiting for HCI_EVT_LE_CIS_ESTABLISHED")
Cc: stable@vger.kernel.org
Reported-by: Doruk Tan Ozturk <doruk@0sec.ai>
Signed-off-by: Doruk Tan Ozturk <doruk@0sec.ai>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2 weeks agoBluetooth: 6lowpan: check skb_clone() return value in send_mcast_pkt()
Zhao Dongdong [Tue, 26 May 2026 03:21:39 +0000 (11:21 +0800)] 
Bluetooth: 6lowpan: check skb_clone() return value in send_mcast_pkt()

The skb_clone() function can return NULL if memory allocation fails.
send_mcast_pkt() calls skb_clone() without checking the return value, which
can lead to a NULL pointer dereference in send_pkt() when it dereferences
skb->data.
Add a NULL check after skb_clone() and skip the peer if the clone fails.

Fixes: 18722c247023 ("Bluetooth: Enable 6LoWPAN support for BT LE devices")
Signed-off-by: Zhao Dongdong <zhaodongdong@kylinos.cn>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2 weeks agoBluetooth: btusb: Allow firmware re-download when version matches
Shuai Zhang [Thu, 21 May 2026 05:25:47 +0000 (13:25 +0800)] 
Bluetooth: btusb: Allow firmware re-download when version matches

The Bluetooth host decides whether to download firmware by reading the
controller firmware download completion flag and firmware version
information.

If a USB error occurs during the firmware download process (for example
due to a USB disconnect), the download is aborted immediately. An
incomplete firmware transfer does not cause the controller to set the
download completion flag, but the firmware version information may be
updated at an early stage of the download process.

In this case, after USB reconnection, the host attempts to re-download
the firmware because the download completion flag is not set. However,
since the controller reports the same firmware version as the target
firmware, the download is skipped. This ultimately results in the
firmware not being properly updated on the controller.

This change removes the restriction that skips firmware download when
the versions are equal. It covers scenarios where the USB connection
can be disconnected at any time and ensures that firmware download can
be retriggered after USB reconnection, allowing the Bluetooth firmware
to be correctly and completely updated.

Fixes: 3267c884cefa ("Bluetooth: btusb: Add support for QCA ROME chipset family")
Cc: stable@vger.kernel.org
Signed-off-by: Shuai Zhang <shuai.zhang@oss.qualcomm.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2 weeks agoBluetooth: HIDP: fix missing length checks in hidp_input_report()
Muhammad Bilal [Wed, 20 May 2026 22:56:43 +0000 (18:56 -0400)] 
Bluetooth: HIDP: fix missing length checks in hidp_input_report()

hidp_input_report() reads keyboard and mouse payload data from an skb
without first verifying that skb->len contains enough data.

hidp_recv_intr_frame() pulls the 1-byte HIDP header before dispatching
to hidp_input_report(). If a paired device sends a truncated packet,
the handler reads beyond the valid skb data, resulting in an
out-of-bounds read of skb data. The OOB bytes may be interpreted as
phantom key presses or spurious mouse movement.

Replace the open-coded length tracking and pointer arithmetic with
skb_pull_data() calls. skb_pull_data() returns NULL if the requested
bytes are not present, eliminating the need for a manual size variable
and the separate skb->len guard.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Muhammad Bilal <meatuni001@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2 weeks agoBluetooth: L2CAP: use chan timer to close channels in cleanup_listen()
Siwei Zhang [Thu, 21 May 2026 02:12:20 +0000 (22:12 -0400)] 
Bluetooth: L2CAP: use chan timer to close channels in cleanup_listen()

l2cap_chan_close() removes the channel from conn->chan_l, which
must be done under conn->lock.  cleanup_listen() runs under the
parent sk_lock, so acquiring conn->lock would invert the
established conn->lock -> chan->lock -> sk_lock order.

Instead of calling l2cap_chan_close() directly, schedule
l2cap_chan_timeout with delay 0 to close the channel
asynchronously.  The timeout handler already acquires conn->lock
and chan->lock in the correct order.

The timer is only armed when chan->conn is still set: if it is
already NULL, l2cap_conn_del() has already processed this channel
(l2cap_chan_del + l2cap_sock_teardown_cb + l2cap_sock_close_cb),
so there is nothing left to do.  If l2cap_conn_del() races in
after the timer is armed, __clear_chan_timer() inside
l2cap_chan_del() cancels it; if the timer has already fired, the
handler returns harmlessly because chan->conn was cleared.

Fixes: 3df91ea20e74 ("Bluetooth: Revert to mutexes from RCU list")
Cc: <stable@vger.kernel.org> # 0b58004: Bluetooth: fix UAF in l2cap_sock_cleanup_listen() vs l2cap_conn_del()
Signed-off-by: Siwei Zhang <oss@fourdim.xyz>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2 weeks agoBluetooth: L2CAP: fix chan ref leak in l2cap_chan_timeout() on !conn
Siwei Zhang [Thu, 21 May 2026 02:30:36 +0000 (22:30 -0400)] 
Bluetooth: L2CAP: fix chan ref leak in l2cap_chan_timeout() on !conn

__set_chan_timer() takes a l2cap_chan reference via l2cap_chan_hold()
before scheduling the delayed work.  The normal path in
l2cap_chan_timeout() drops this reference with l2cap_chan_put() at the
end, but the early return when chan->conn is NULL skips the put,
leaking the reference.

Add the missing l2cap_chan_put() before the early return.

Fixes: adf0398cee86 ("Bluetooth: l2cap: fix null-ptr-deref in l2cap_chan_timeout")
Cc: stable@vger.kernel.org
Signed-off-by: Siwei Zhang <oss@fourdim.xyz>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2 weeks agoBluetooth: hci_conn: Fix memory leak in hci_le_big_terminate()
Pavitra Jha [Thu, 21 May 2026 08:04:14 +0000 (04:04 -0400)] 
Bluetooth: hci_conn: Fix memory leak in hci_le_big_terminate()

hci_le_big_terminate() allocates iso_list_data via kzalloc_obj but
returns 0 without freeing it when neither pa_sync_term nor big_sync_term
flags are set after evaluating the PA and BIG sync connection state.

This early-return path was introduced when hci_le_big_terminate() was
refactored to take struct hci_conn instead of raw u8 parameters, adding
PA/BIG flag evaluation logic. The existing kfree() on hci_cmd_sync_queue
failure does not cover this path.

Fixes: a7bcffc673de ("Bluetooth: Add PA_LINK to distinguish BIG sync and PA sync connections")
Cc: stable@vger.kernel.org
Signed-off-by: Pavitra Jha <jhapavitra98@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2 weeks agotools/sched_ext: Fix scx_show_state per-scheduler state reads
Zicheng Qu [Wed, 27 May 2026 09:38:50 +0000 (17:38 +0800)] 
tools/sched_ext: Fix scx_show_state per-scheduler state reads

scx_show_state.py still reads scx_aborting and scx_bypass_depth as
global symbols. Those symbols no longer exist after the state was moved
into struct scx_sched, so the drgn script fails when it reaches either
field.

Read aborting and bypass_depth from scx_root instead. This preserves the
script's current root-scheduler view: with sub-scheduler support, the
reported values are for the root scheduler and sub-schedulers are not
enumerated.

Fixes: 5c8d98a1b4de ("sched_ext: Move bypass state into scx_sched")
Fixes: c1743da43cf5 ("sched_ext: Move aborting flag to per-scheduler field")
Signed-off-by: Zicheng Qu <quzicheng@huawei.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2 weeks agocgroup/cpuset: Add test cases for sibling CPU exclusion on partition update
Sun Shaojie [Wed, 27 May 2026 07:05:09 +0000 (15:05 +0800)] 
cgroup/cpuset: Add test cases for sibling CPU exclusion on partition update

When sibling CPU exclusion occurs, a partition's effective_xcpus may be
a subset of its user_xcpus. The partcmd_update path must use
effective_xcpus instead of user_xcpus when calculating CPUs to return
to or request from the parent.

Add two test cases to verify this behavior:

  1) Narrowing cpuset.cpus to only the sibling-excluded CPUs should not
     return CPUs to parent that the partition never actually owned.

  2) Expanding cpuset.cpus after a sibling becomes a member should
     correctly request the additional CPUs from parent.

Co-developed-by: Zhang Guopeng <zhangguopeng@kylinos.cn>
Signed-off-by: Zhang Guopeng <zhangguopeng@kylinos.cn>
Signed-off-by: Sun Shaojie <sunshaojie@kylinos.cn>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2 weeks agocgroup/cpuset: Use effective_xcpus in partcmd_update add/del mask calculation
Sun Shaojie [Wed, 27 May 2026 06:43:28 +0000 (14:43 +0800)] 
cgroup/cpuset: Use effective_xcpus in partcmd_update add/del mask calculation

When sibling CPU exclusion occurs, a partition's user_xcpus may contain
CPUs that were never actually granted to it. These CPUs are present in
user_xcpus(cs) but not in cs->effective_xcpus.

The partcmd_update path in update_parent_effective_cpumask() uses
user_xcpus(cs) (via the local variable xcpus) to compute the addmask
(CPUs to return to parent) and delmask (CPUs to request from parent).
This is incorrect:

 1) When newmask removes a CPU that was previously excluded by a
    sibling, addmask incorrectly includes that CPU and tries to return
    it to the parent even though the partition never actually owned it,
    causing CPU overlap with sibling partitions and triggering warnings
    in generate_sched_domains().

 2) When newmask adds a previously excluded CPU that is now available,
    delmask fails to request it from the parent because user_xcpus(cs)
    already includes it.

Fix this by using cs->effective_xcpus instead of user_xcpus(cs) in all
partcmd_update paths that calculate addmask or delmask, including the
PERR_NOCPUS error handling paths.

Reproducers:

  Example 1 - Removing a sibling-excluded CPU incorrectly returns it:

    # cd /sys/fs/cgroup
    # echo "0-1" > a1/cpuset.cpus
    # echo "root" > a1/cpuset.cpus.partition
    # echo "0-2" > b1/cpuset.cpus
    # echo "root" > b1/cpuset.cpus.partition
    # echo "2" > b1/cpuset.cpus
    # cat cpuset.cpus.effective
    # Actual: 0-1,3    Expected: 3

  Example 2 - Expanding to a previously excluded CPU fails to request it:

    # cd /sys/fs/cgroup
    # echo "0-1" > a1/cpuset.cpus
    # echo "root" > a1/cpuset.cpus.partition
    # echo "0-2" > b1/cpuset.cpus
    # echo "root" > b1/cpuset.cpus.partition
    # echo "member" > a1/cpuset.cpus.partition
    # echo "1-2" > b1/cpuset.cpus
    # cat cpuset.cpus.effective
    # Actual: 0-1,3    Expected: 0,3

Fixes: 2a3602030d80 ("cgroup/cpuset: Don't invalidate sibling partitions on cpuset.cpus conflict")
Cc: stable@vger.kernel.org # v7.0+
Suggested-by: Zhang Guopeng <zhangguopeng@kylinos.cn>
Signed-off-by: Sun Shaojie <sunshaojie@kylinos.cn>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2 weeks agox86/mm: Fix freeing of PMD-sized vmemmap pages
David Hildenbrand (Arm) [Wed, 29 Apr 2026 10:49:14 +0000 (12:49 +0200)] 
x86/mm: Fix freeing of PMD-sized vmemmap pages

Commit bf9e4e30f353 ("x86/mm: use pagetable_free()"), switched from
freeing non-boot page tables through __free_pages() to
pagetable_free().

However, the function is also called to free vmemmap pages.

Given that vmemmap pages are not page tables, already the page_ptdesc(page)
is wrong. But worse, pagetable_free() calls:

__free_pages(page, compound_order(page));

Since vmemmap pages are not compound pages (see vmemmap_alloc_block())
-- except for HVO, which doesn't apply here -- only first page of a
PMD-sized vmemmap page is freed, leaking the other ones.

Fix it by properly decoupling pagetable and vmemmap freeing.
free_pagetable() no longer has to mess with SECTION_INFO, as only the
vmemmap is marked like that in register_page_bootmem_memmap().

The indentation in remove_pmd_table() is messed up. Fix that while
touching it.

Bootmem info handling will soon be fixed up. For now, handle it
similar to free_pagetable(), just avoiding the ifdef.

[ dhansen: changelog munging. More imperative voice ]

Fixes: bf9e4e30f353 ("x86/mm: use pagetable_free()")
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Tested-by: Lance Yang <lance.yang@linux.dev>
Link: https://lore.kernel.org/20260429-vmemmap-v2-1-8dfcacffd877@kernel.org
Link: https://patch.msgid.link/20260429-vmemmap-v2-1-8dfcacffd877@kernel.org
Cc: stable@vger.kernel.org
2 weeks agohfs: rework hfsplus_readdir() logic
Viacheslav Dubeyko [Tue, 19 May 2026 22:28:12 +0000 (15:28 -0700)] 
hfs: rework hfsplus_readdir() logic

The xfstests' test-case generic/637 fails with error:

FSTYP -- hfs
PLATFORM -- Linux/x86_64 kvm-xfstests 6.15.0-rc4-xfstests-g00b827f0cffa #1 SMP PREEMPT_DYNAMIC Fri May 25
MKFS_OPTIONS -- /dev/vdc
MOUNT_OPTIONS -- /dev/vdc /vdc

QA output created by 637
entries 7 and 8 have duplicate d_off 8
Found unlinked files in open dir (see xfstests-dev/results//generic/637.full for details)

Likewise HFS+, currently, HFS has very complicated and
fragile logic of rd->file->f_pos correction in hfs_delete_cat().
This patch removes this logic and it stores the current
pos into hfs_readdir_data. Finally, if rd->pos == ctx->pos
then hfs_readdir() tries to find the position in
b-tree's node by means of hfs_cat_key. This position is
used to re-start the folder's content traversal.

sudo ./check generic/637
FSTYP         -- hfs
PLATFORM      -- Linux/x86_64 hfsplus-testing-0001 7.1.0-rc1+ #55 SMP PREEMPT_DYNAMIC Tue May 19 15:18:02 PDT 2026
MKFS_OPTIONS  -- /dev/loop51
MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch

generic/637  32s ...  31s
Ran: generic/637
Passed all 1 tests

Closes: https://github.com/hfs-linux-kernel/hfs-linux-kernel/issues/65
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Link: https://lore.kernel.org/r/20260519222811.1311071-2-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
2 weeks agoACPICA: add boundary checks in two places
ikaros [Wed, 27 May 2026 18:10:18 +0000 (20:10 +0200)] 
ACPICA: add boundary checks in two places

Add boundary checks in acpi_ps_get_next_namestring() and
acpi_ps_peek_opcode() to prevent out-of-bounds access.

Link: https://github.com/acpica/acpica/commit/cfdc96896d8d
Signed-off-by: ikaros <void0red@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/5180044.0VBMTVartN@rafael.j.wysocki
2 weeks agoACPICA: Add package limit checks in parser functions
ikaros [Wed, 27 May 2026 18:09:24 +0000 (20:09 +0200)] 
ACPICA: Add package limit checks in parser functions

Add package limit checks in parser functions to prevent out-of-bounds
access.

Link: https://github.com/acpica/acpica/commit/b31b45af2122
Signed-off-by: ikaros <void0red@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3212937.CbtlEUcBR6@rafael.j.wysocki
2 weeks agoACPICA: Update version to 20260408
Saket Dumbre [Wed, 27 May 2026 18:08:44 +0000 (20:08 +0200)] 
ACPICA: Update version to 20260408

Update ACPI_CA_VERSION to match the 20260408 upstream release.

Link: https://github.com/acpica/acpica/commit/232ff3f8ae1a
Signed-off-by: Saket Dumbre <saket.dumbre@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/1881459.TLkxdtWsSY@rafael.j.wysocki
2 weeks agoACPICA: Update the copyright year to 2026
Pawel Chmielewski [Wed, 27 May 2026 18:08:02 +0000 (20:08 +0200)] 
ACPICA: Update the copyright year to 2026

Update copyright notices in all ACPICA files.

Link: https://github.com/acpica/acpica/commit/9def02549a9c
Signed-off-by: Pawel Chmielewski <pawel.chmielewski@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/4379132.1IzOArtZ34@rafael.j.wysocki
2 weeks agoACPICA: Remove spurious precision from format used to dump parse trees
David Laight [Wed, 27 May 2026 18:07:08 +0000 (20:07 +0200)] 
ACPICA: Remove spurious precision from format used to dump parse trees

The debug code in acpi_ps_delete_parse_tree() uses ("%*.s", level * 4, " ")
to indent traces.

POSIX requires the empty precision be treated as zero, but the kernel treats
is as 'no precision specified'.

Change to ("%*s", level * 4, "") since there is additional whitespace and no
reason to indent by one space when level is zero.

Link: https://github.com/acpica/acpica/commit/a87038098af6
Signed-off-by: David Laight <david.laight.linux@gmail.com>
Signed-off-by: Pawel Chmielewski <pawel.chmielewski@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2914242.BEx9A2HvPv@rafael.j.wysocki