The trace event has not recorded the right data since it was introduced at
commit c8b360031218 ("mm: add alloc_contig_migrate_range allocation
statistics"). Remove it.
Link: https://lkml.kernel.org/r/20250722194649.4135191-1-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202507220742.P3SaKlI6-lkp@intel.com/ Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Brendan Jackman <jackmanb@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Martin Liu <liumartin@google.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Richard Chang <richardycc@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Enze Li [Fri, 18 Jul 2025 06:42:17 +0000 (14:42 +0800)]
selftests/damon: introduce _common.sh to host shared function
The current test scripts contain duplicated root permission checks in
multiple locations. This patch consolidates these checks into _common.sh
to eliminate code redundancy.
SeongJae Park [Sun, 20 Jul 2025 17:16:51 +0000 (10:16 -0700)]
selftests/damon/sysfs.py: test non-default parameters runtime commit
sysfs.py is testing only the default and minimum DAMON parameters. Add
another test case for more non-default additional DAMON parameters
commitment on runtime.
DAMON context commitment assertion is hard-coded for a specific test case.
Split it out into a general version that can be reused for different test
cases.
DAMON monitoring attributes commitment assertion is hard-coded for a
specific test case. Split it out into a general version that can be
reused for different test cases.
DAMOS schemes commitment assertion is hard-coded for a specific test case.
Split it out into a general version that can be reused for different test
cases.
DAMOS scheme commitment assertion is hard-coded for a specific test case.
Split it out into a general version that can be reused for different test
cases.
DamosQuota commitment assertion is hard-coded for a specific test case.
Split it out into a general version that can be reused for different test
cases.
DamosWatermarks commitment assertion is hard-coded for a specific test
case. Split it out into a general version that can be reused for
different test cases.
drgn_dump_damon_status.py is a script for dumping DAMON internal status in
json format. It is being used for seeing if DAMON parameters that are set
using _damon_sysfs.py are actually passed to DAMON in the kernel space.
It is, however, not dumping full DAMON internal status, and it makes
increasing test coverage difficult. Add damos filters dumping for more
tests.
drgn_dump_damon_status.py is a script for dumping DAMON internal status in
json format. It is being used for seeing if DAMON parameters that are set
using _damon_sysfs.py are actually passed to DAMON in the kernel space.
It is, however, not dumping full DAMON internal status, and it makes
increasing test coverage difficult. Add ctx->ops.id dumping for more
tests.
drgn_dump_damon_status.py is a script for dumping DAMON internal status in
json format. It is being used for seeing if DAMON parameters that are set
using _damon_sysfs.py are actually passed to DAMON in the kernel space.
It is, however, not dumping full DAMON internal status, and it makes
increasing test coverage difficult. Add damos->migrate_dests dumping for
more tests.
SeongJae Park [Sun, 20 Jul 2025 17:16:33 +0000 (10:16 -0700)]
selftests/damon/_damon_sysfs: support monitoring intervals goal setup
_damon_sysfs.py contains code for test-purpose DAMON sysfs interface
control. Add support of the monitoring intervals auto-tune goal setup for
more tests.
SeongJae Park [Sun, 20 Jul 2025 17:16:31 +0000 (10:16 -0700)]
selftests/damon/_damon_sysfs: support DAMOS watermarks setup
Patch series "selftests/damon/sysfs.py: test all parameters".
sysfs.py tests if DAMON sysfs interface is passing the user-requested
parameters to DAMON as expected. But only the default (minimum)
parameters are being tested. This is partially because _damon_sysfs.py,
which is the library for making the parameter requests, is not supporting
the entire parameters. The internal DAMON status dump script
(drgn_dump_damon_status.py) is also not dumping entire parameters. Extend
the test coverage by updating parameters input and status dumping scripts
to support all parameters, and writing additional tests using those.
This increased test coverage actually found one real bug
(https://lore.kernel.org/20250719181932.72944-1-sj@kernel.org).
First seven patches (1-7) extend _damon_sysfs.py for all parameters setup.
The eight patch (8) fixes _damon_sysfs.py to use correct max nr_acceses
and age values for their type. Following three patches (9-11) extend
drgn_dump_damon_status.py to dump full DAMON parameters. Following nine
patches (12-20) refactor sysfs.py for general testing code reuse, and
extend it for full parameters check. Finally, two patches (21 and 22) add
test cases in sysfs.py for full parameters testing.
This patch (of 22):
_damon_sysfs.py contains code for test-purpose DAMON sysfs interface
control. Add support of DAMOS watermarks setup for more tests.
SeongJae Park [Tue, 22 Jul 2025 06:03:30 +0000 (23:03 -0700)]
selftests/damon/sysfs.py: stop DAMON for dumping failures
Commit 4ece01897627 ("selftests/damon: add python and drgn-based DAMON
sysfs test") in mm-stable tree introduced sysfs.py that runs drgn for
dumping DAMON status. When the DAMON status dumping fails for reasons
including drgn uninstalled environment, the test fails without stopping
DAMON. Following DAMON selftests that assumes DAMON is not running when
they executed therefore fail. Catch dumping failures and stop DAMON for
that case.
Link: https://lkml.kernel.org/r/20250722060330.56068-1-sj@kernel.org Fixes: 4ece01897627 ("selftests/damon: add python and drgn-based DAMON sysfs test") Signed-off-by: SeongJae Park <sj@kernel.org> Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202507220707.9c5d6247-lkp@intel.com Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
sched/task_stack: Add missing const qualifier to end_of_stack()
Add missing const qualifier to the non-CONFIG_THREAD_INFO_IN_TASK
version of end_of_stack() to match the CONFIG_THREAD_INFO_IN_TASK
version. Fixes a warning with CONFIG_KSTACK_ERASE=y on archs that don't
select THREAD_INFO_IN_TASK (such as LoongArch):
error: passing 'const struct task_struct *' to parameter of type 'struct task_struct *' discards qualifiers
The stackleak_task_low_bound() function correctly uses a const task
parameter, but the legacy end_of_stack() prototype didn't like that.
Build tested on loongarch (with CONFIG_KSTACK_ERASE=y) and m68k
(with CONFIG_DEBUG_STACK_USAGE=y).
kstack_erase: Add -mgeneral-regs-only to silence Clang warnings
Once CONFIG_KSTACK_ERASE is enabled with Clang on i386, the build warns:
kernel/kstack_erase.c:168:2: warning: function with attribute 'no_caller_saved_registers' should only call a function with attribute 'no_caller_saved_registers' or be compiled with '-mgeneral-regs-only' [-Wexcessive-regsave]
Add -mgeneral-regs-only for the kstack_erase handler, to make Clang feel
better (it is effectively a no-op flag for the kernel). No binary
changes encountered.
Build & boot tested with Clang 21 on x86_64, and i386.
Build tested with GCC 14.2.0 on x86_64, i386, arm64, and arm.
init.h: Disable sanitizer coverage for __init and __head
While __noinstr already contained __no_sanitize_coverage, it needs to
be added to __init and __head section markings to support the Clang
implementation of CONFIG_KSTACK_ERASE. This is to make sure the stack
depth tracking callback is not executed in unsupported contexts.
The other sanitizer coverage options (trace-pc and trace-cmp) aren't
needed in __head nor __init either ("We are interested in code coverage
as a function of a syscall inputs"[1]), so this is fine to disable for
them as well.
kstack_erase: Disable kstack_erase for all of arm compressed boot code
When building with CONFIG_KSTACK_ERASE=y and CONFIG_ARM_ATAG_DTB_COMPAT=y,
the compressed boot environment encounters an undefined symbol error:
ld.lld: error: undefined symbol: __sanitizer_cov_stack_depth
>>> referenced by atags_to_fdt.c:135
This occurs because the compiler instruments the atags_to_fdt() function
with sanitizer coverage calls, but the minimal compressed boot environment
lacks access to sanitizer runtime support.
The compressed boot environment already disables stack protector with
-fno-stack-protector. Similarly disable sanitizer coverage by adding
$(DISABLE_KSTACK_ERASE) to the general compiler flags (and remove it
from the one place it was noticed before), which contains the appropriate
flags to prevent sanitizer instrumentation.
This follows the same pattern used in other early boot contexts where
sanitizer runtime support is unavailable.
Merge tag 'i2c-for-6.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
- qup: avoid potential hang when waiting for bus idle
- tegra: improve ACPI reset error handling
- virtio: use interruptible wait to prevent hang during transfer
* tag 'i2c-for-6.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: qup: jump out of the loop in case of timeout
i2c: virtio: Avoid hang by using interruptible completion wait
i2c: tegra: Fix reset error handling with ACPI
The private stack is allocated in bpf_int_jit_compile() with 16-byte
alignment. It includes additional guard regions to detect stack
overflows and underflows at runtime.
On detection of an overflow or underflow, the kernel emits messages
like:
BPF private stack overflow/underflow detected for prog <prog_name>
After commit bd737fcb6485 ("bpf, arm64: Get rid of fpb"), Jited BPF
programs use the stack in two ways:
1. Via the BPF frame pointer (top of stack), using negative offsets.
2. Via the stack pointer (bottom of stack), using positive offsets in
LDR/STR instructions.
When a private stack is used, ARM64 callee-saved register x27 replaces
the stack pointer. The BPF frame pointer usage remains unchanged; but
it now points to the top of the private stack.
bpf, arm64: Fix fp initialization for exception boundary
In the ARM64 BPF JIT when prog->aux->exception_boundary is set for a BPF
program, find_used_callee_regs() is not called because for a program
acting as exception boundary, all callee saved registers are saved.
find_used_callee_regs() sets `ctx->fp_used = true;` when it sees FP
being used in any of the instructions.
For programs acting as exception boundary, ctx->fp_used remains false
even if frame pointer is used by the program and therefore, FP is not
set-up for such programs in the prologue. This can cause the kernel to
crash due to a pagefault.
Fix it by setting ctx->fp_used = true for exception boundary programs as
fp is always saved in such programs.
Ivan Vecera [Sat, 26 Jul 2025 18:41:45 +0000 (20:41 +0200)]
dpll: zl3073x: Fix build failure
If CONFIG_ZL3073X is enabled but both CONFIG_ZL3073X_I2C and
CONFIG_ZL3073X_SPI are disabled, the compilation may fail because
CONFIG_REGMAP is not enabled.
Fix the issue by selecting CONFIG_REGMAP when CONFIG_ZL3073X is enabled.
Jakub Kicinski [Sat, 26 Jul 2025 15:53:49 +0000 (08:53 -0700)]
selftests: bpf: fix legacy netfilter options
Recent commit to add NETFILTER_XTABLES_LEGACY missed setting
a couple of configs to y. They are still enabled but as modules
which appears to have upset BPF CI, e.g.:
test_bpf_nf_ct:FAIL:iptables-legacy -t raw -A PREROUTING -j CONNMARK --set-mark 42/0 unexpected error: 768 (errno 0)
Thomas Weißschuh [Mon, 21 Jul 2025 09:04:42 +0000 (11:04 +0200)]
umd: Remove usermode driver framework
The code is unused since 98e20e5e13d2 ("bpfilter: remove bpfilter"),
therefore remove it.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Link: https://lore.kernel.org/bpf/20250721-remove-usermode-driver-v1-2-0d0083334382@linutronix.de
Merge in late fixes to prepare for the 6.17 net-next PR.
Conflicts:
net/core/neighbour.c 1bbb76a89948 ("neighbour: Fix null-ptr-deref in neigh_flush_dev().") 13a936bb99fb ("neighbour: Protect tbl->phash_buckets[] with a dedicated mutex.") 03dc03fa0432 ("neighbor: Add NTF_EXT_VALIDATED flag for externally validated entries")
Adjacent changes:
drivers/net/usb/usbnet.c 0d9cfc9b8cb1 ("net: usbnet: Avoid potential RCU stall on LINK_CHANGE event") 2c04d279e857 ("net: usb: Convert tasklet API to new bottom half workqueue mechanism")
net/ipv6/route.c 31d7d67ba127 ("ipv6: annotate data-races around rt->fib6_nsiblings") 1caf27297215 ("ipv6: adopt dst_dev() helper") 3b3ccf9ed05e ("net: Remove unnecessary NULL check for lwtunnel_fill_encap()")
====================
ipv6: f6i->fib6_siblings and rt->fib6_nsiblings fixes
Series based on an internal syzbot report with a repro.
After fixing (in the first patch) the original minor issue,
I found that syzbot repro was able to trigger a second
more serious bug in rt6_nlmsg_size().
Code review then led to the two final patches.
I have not released the syzbot bug, because other issues
still need investigations.
====================
Eric Dumazet [Fri, 25 Jul 2025 14:07:24 +0000 (14:07 +0000)]
ipv6: fix possible infinite loop in fib6_info_uses_dev()
fib6_info_uses_dev() seems to rely on RCU without an explicit
protection.
Like the prior fix in rt6_nlmsg_size(),
we need to make sure fib6_del_route() or fib6_add_rt2node()
have not removed the anchor from the list, or we risk an infinite loop.
This is because fib6_del_route() and fib6_add_rt2node()
uses list_del_rcu(), which can confuse rcu readers,
because they might no longer see the head of the list.
Eric Dumazet [Fri, 25 Jul 2025 14:07:22 +0000 (14:07 +0000)]
ipv6: add a retry logic in net6_rt_notify()
inet6_rt_notify() can be called under RCU protection only.
This means the route could be changed concurrently
and rt6_fill_node() could return -EMSGSIZE.
Re-size the skb when this happens and retry, removing
one WARN_ON() that syzbot was able to trigger:
Fixes: 169fd62799e8 ("ipv6: Get rid of RTNL for SIOCADDRT and RTM_NEWROUTE.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://patch.msgid.link/20250725140725.3626540-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simon Horman [Fri, 25 Jul 2025 09:56:47 +0000 (10:56 +0100)]
net/sched: taprio: align entry index attr validation with mqprio
Both taprio and mqprio have code to validate respective entry index
attributes. The validation is indented to ensure that the attribute is
present, and that it's value is in range, and that each value is only
used once.
The purpose of this patch is to align the implementation of taprio with
that of mqprio as there seems to be no good reason for them to differ.
For one thing, this way, bugs will be present in both or neither.
As a follow-up some consideration could be given to a common function
used by both sch.
No functional change intended.
Except of tdc run: the results of the taprio tests
# ok 81 ba39 - Add taprio Qdisc to multi-queue device (8 queues)
# ok 82 9462 - Add taprio Qdisc with multiple sched-entry
# ok 83 8d92 - Add taprio Qdisc with txtime-delay
# ok 84 d092 - Delete taprio Qdisc with valid handle
# ok 85 8471 - Show taprio class
# ok 86 0a85 - Add taprio Qdisc to single-queue device
# ok 87 6f62 - Add taprio Qdisc with too short interval
# ok 88 831f - Add taprio Qdisc with too short cycle-time
# ok 89 3e1e - Add taprio Qdisc with an invalid cycle-time
# ok 90 39b4 - Reject grafting taprio as child qdisc of software taprio
# ok 91 e8a1 - Reject grafting taprio as child qdisc of offloaded taprio
# ok 92 a7bf - Graft cbs as child of software taprio
# ok 93 6a83 - Graft cbs as child of offloaded taprio
Alexander Stein [Fri, 25 Jul 2025 05:56:13 +0000 (07:56 +0200)]
net: fsl_pq_mdio: use dev_err_probe
Silence deferred probes using dev_err_probe(). This can happen when
the ethernet PHY uses an IRQ line attached to a i2c GPIO expander. If the
i2c bus is not yet ready, a probe deferral can occur.
Xiumei Mu [Fri, 25 Jul 2025 03:50:28 +0000 (11:50 +0800)]
selftests: rtnetlink.sh: remove esp4_offload after test
The esp4_offload module, loaded during IPsec offload tests, should
be reset to its default settings after testing.
Otherwise, leaving it enabled could unintentionally affect subsequence
test cases by keeping offload active.
Jason Xing [Wed, 23 Jul 2025 14:23:27 +0000 (22:23 +0800)]
igb: xsk: solve negative overflow of nb_pkts in zerocopy mode
There is no break time in the while() loop, so every time at the end of
igb_xmit_zc(), negative overflow of nb_pkts will occur, which renders
the return value always false. But theoretically, the result should be
set after calling xsk_tx_peek_release_desc_batch(). We can take
i40e_xmit_zc() as a good example.
Returning false means we're not done with transmission and we need one
more poll, which is exactly what igb_xmit_zc() always did before this
patch. After this patch, the return value depends on the nb_pkts value.
Two cases might happen then:
1. if (nb_pkts < budget), it means we process all the possible data, so
return true and no more necessary poll will be triggered because of
this.
2. if (nb_pkts == budget), it means we might have more data, so return
false to let another poll run again.
Fixes: f8e284a02afc ("igb: Add AF_XDP zero-copy Tx support") Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Link: https://patch.msgid.link/20250723142327.85187-3-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jason Xing [Wed, 23 Jul 2025 14:23:26 +0000 (22:23 +0800)]
stmmac: xsk: fix negative overflow of budget in zerocopy mode
A negative overflow can happen when the budget number of descs are
consumed. as long as the budget is decreased to zero, it will again go
into while (budget-- > 0) statement and get decreased by one, so the
overflow issue can happen. It will lead to returning true whereas the
expected value should be false.
In this case where all the budget is used up, it means zc function
should return false to let the poll run again because normally we
might have more data to process. Without this patch, zc function would
return true instead.
Fixes: 132c32ee5bc0 ("net: stmmac: Add TX via XDP zero-copy socket") Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Link: https://patch.msgid.link/20250723142327.85187-2-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux
Pull ARM fixes from Russell King:
- use an absolute path for asm/unified.h in KBUILD_AFLAGS to solve a
regression caused by commit d5c8d6e0fa61 ("kbuild: Update assembler
calls to use proper flags and language target")
- fix dead code elimination binutils version check again
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
ARM: 9450/1: Fix allowing linker DCE with binutils < 2.36
ARM: 9448/1: Use an absolute path to unified.h in KBUILD_AFLAGS
Merge tag 'soc-fixes-6.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC fixes from Arnd Bergmann:
"These are two fixes that came in late, one addresses a regression on a
rockchips based board, the other is for ensuring a consistent dt
binding for a device added in 6.16 before the incorrect one makes it
into a release"
* tag 'soc-fixes-6.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
arm64: dts: rockchip: Drop netdev led-triggers on NanoPi R5S
arm64: dts: allwinner: a523: Rename emac0 to gmac0
Oliver Upton [Sat, 26 Jul 2025 15:50:06 +0000 (08:50 -0700)]
Merge branch 'kvm-arm64/gcie-legacy' into kvmarm/next
* kvm-arm64/gcie-legacy:
: Support for GICv3 emulation on GICv5, courtesy of Sascha Bischoff
:
: FEAT_GCIE_LEGACY adds the necessary hardware for GICv5 systems to
: support the legacy GICv3 for VMs, including a backwards-compatible VGIC
: implementation that we all know and love.
:
: As a starting point for GICv5 enablement in KVM, enable + use the
: GICv3-compatible feature when running VMs on GICv5 hardware.
KVM: arm64: gic-v5: Probe for GICv5
KVM: arm64: gic-v5: Support GICv3 compat
arm64/sysreg: Add ICH_VCTLR_EL2
irqchip/gic-v5: Populate struct gic_kvm_info
irqchip/gic-v5: Skip deactivate for forwarded PPI interrupts
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Oliver Upton [Sat, 26 Jul 2025 15:47:22 +0000 (08:47 -0700)]
Merge branch 'kvm-arm64/doublefault2' into kvmarm/next
* kvm-arm64/doublefault2: (33 commits)
: NV Support for FEAT_RAS + DoubleFault2
:
: Delegate the vSError context to the guest hypervisor when in a nested
: state, including registers related to ESR propagation. Additionally,
: catch up KVM's external abort infrastructure to the architecture,
: implementing the effects of FEAT_DoubleFault2.
:
: This has some impact on non-nested guests, as SErrors deemed unmasked at
: the time they're made pending are now immediately injected with an
: emulated exception entry rather than using the VSE bit.
KVM: arm64: Make RAS registers UNDEF when RAS isn't advertised
KVM: arm64: Filter out HCR_EL2 bits when running in hypervisor context
KVM: arm64: Check for SYSREGS_ON_CPU before accessing the CPU state
KVM: arm64: Commit exceptions from KVM_SET_VCPU_EVENTS immediately
KVM: arm64: selftests: Test ESR propagation for vSError injection
KVM: arm64: Populate ESR_ELx.EC for emulated SError injection
KVM: arm64: selftests: Catch up set_id_regs with the kernel
KVM: arm64: selftests: Add SCTLR2_EL1 to get-reg-list
KVM: arm64: selftests: Test SEAs are taken to SError vector when EASE=1
KVM: arm64: selftests: Add basic SError injection test
KVM: arm64: Don't retire MMIO instruction w/ pending (emulated) SError
KVM: arm64: Advertise support for FEAT_DoubleFault2
KVM: arm64: Advertise support for FEAT_SCTLR2
KVM: arm64: nv: Enable vSErrors when HCRX_EL2.TMEA is set
KVM: arm64: nv: Honor SError routing effects of SCTLR2_ELx.NMEA
KVM: arm64: nv: Take "masked" aborts to EL2 when HCRX_EL2.TMEA is set
KVM: arm64: Route SEAs to the SError vector when EASE is set
KVM: arm64: nv: Ensure Address size faults affect correct ESR
KVM: arm64: Factor out helper for selecting exception target EL
KVM: arm64: Describe SCTLR2_ELx RESx masks
...
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Oliver Upton [Sat, 26 Jul 2025 15:47:02 +0000 (08:47 -0700)]
Merge branch 'kvm-arm64/cacheable-pfnmap' into kvmarm/next
* kvm-arm64/cacheable-pfnmap:
: Cacheable PFNMAP support at stage-2, courtesy of Ankit Agrawal
:
: For historical reasons, KVM only allows cacheable mappings at stage-2
: when a kernel alias exists in the direct map for the memory region. On
: hardware without FEAT_S2FWB, this is necessary as KVM must do cache
: maintenance to keep guest/host accesses coherent.
:
: This is unnecessarily restrictive on systems with FEAT_S2FWB and
: CTR_EL0.DIC, as KVM no longer needs to perform cache maintenance to
: maintain correctness.
:
: Allow cacheable mappings at stage-2 on supporting hardware when the
: corresponding VMA has cacheable memory attributes and advertise a
: capability to userspace such that a VMM can determine if a stage-2
: mapping can be established (e.g. VFIO device).
KVM: arm64: Expose new KVM cap for cacheable PFNMAP
KVM: arm64: Allow cacheable stage 2 mapping using VMA flags
KVM: arm64: Block cacheable PFNMAP mapping
KVM: arm64: Assume non-PFNMAP/MIXEDMAP VMAs can be mapped cacheable
KVM: arm64: Rename the device variable to s2_force_noncacheable
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
KVM allows userspace to control GICD_IIDR.Revision and
GICD_TYPER2.nASSGIcap prior to initialization for the sake of
provisioning the guest-visible feature set. Document the userspace
expectations surrounding accesses to these registers.
KVM: arm64: selftests: Add test for nASSGIcap attribute
Extend vgic_init to test the nASSGIcap attribute, asserting that it is
configurable (within reason) prior to initializing the VGIC.
Additionally, check that userspace cannot set the attribute after the
VGIC has been initialized.
KVM: arm64: vgic-v3: Allow userspace to write GICD_TYPER2.nASSGIcap
KVM unconditionally advertises GICD_TYPER2.nASSGIcap (which internally
implies vSGIs) on GICv4.1 systems. Allow userspace to change whether a
VM supports the feature. Only allow changes prior to VGIC initialization
as at that point vPEs need to be allocated for the VM.
For convenience, bundle support for vLPIs and vSGIs behind this feature,
allowing userspace to control vPE allocation for VMs in environments
that may be constrained on vPE IDs.
Oliver Upton [Thu, 24 Jul 2025 06:28:02 +0000 (23:28 -0700)]
KVM: arm64: vgic-v3: Allow access to GICD_IIDR prior to initialization
KVM allows userspace to write GICD_IIDR for backwards-compatibility with
older kernels, where new implementation revisions have new features.
Unfortunately this is allowed to happen at runtime, and ripping features
out from underneath a running guest is a terrible idea.
While we can't do anything about the ABI, prepare for more ID-like
registers by allowing access to GICD_IIDR prior to VGIC initialization.
Hoist initializaiton of the default value to kvm_vgic_create() and
discard the incorrect comment that assumed userspace could access the
register before initialization (until now).
Subsequent changes will allow the VMM to further provision the GIC
feature set, e.g. the presence of nASSGIcap.
Consolidate the duplicated handling of the VGICv3 maintenance IRQ
attribute as a regular GICv3 attribute, as it is neither a register nor
a common attribute. As this is now handled separately from the VGIC
registers, the locking is relaxed to only acquire the intended
config_lock.
Oliver Upton [Thu, 24 Jul 2025 06:28:00 +0000 (23:28 -0700)]
KVM: arm64: Disambiguate support for vSGIs v. vLPIs
vgic_supports_direct_msis() is a bit of a misnomer, as it returns true
if either vSGIs or vLPIs are supported. Pick it apart into a few
predicates and replace some open-coded checks for vSGIs, including an
opportunistic fix to always check if the CPUIF is capable of handling
vSGIs.
We have a lot of more or less useful vgic tests, but none of them
tracks the availability of GICv3 system registers, which is a bit
annoying.
Add one such test, which covers both EL1 and EL2 registers.
Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com> Reviewed-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/r/20250718111154.104029-5-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Marc Zyngier [Fri, 18 Jul 2025 11:11:52 +0000 (12:11 +0100)]
KVM: arm64: Clarify the check for reset callback in check_sysreg_table()
check_sysreg_table() has a wonky 'is_32" parameter, which is really
an indication that we should enforce the presence of a reset helper.
Clean this up by naming the variable accordingly and inverting the
condition. Contrary to popular belief, system instructions don't
have a reset value (duh!), and therefore do not need to be checked
for reset (they escaped the check through luck...).
Marc Zyngier [Fri, 18 Jul 2025 11:11:51 +0000 (12:11 +0100)]
KVM: arm64: vgic-v3: Fix ordering of ICH_HCR_EL2
The sysreg tables are supposed to be sorted so that a binary search
can easily find them. However, ICH_HCR_EL2 is obviously at the wrong
spot.
Move it where it belongs.
Fixes: 9fe9663e47e21 ("KVM: arm64: Expose GICv3 EL2 registers via KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS") Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/r/20250718111154.104029-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Ben Hutchings [Thu, 17 Jul 2025 14:47:32 +0000 (16:47 +0200)]
sh: Do not use hyphen in exported variable name
arch/sh/Makefile defines and exports ld-bfd to be used by
arch/sh/boot/compressed/Makefile and arch/sh/boot/romimage/Makefile.
However some shells, including dash, will not pass through environment
variables whose name includes a hyphen. Usually GNU make does not use
a shell to recurse, but if e.g. $(srctree) contains '~' it will use a
shell here.
Other instances of this problem were previously fixed by commits 2bfbe7881ee0 "kbuild: Do not use hyphen in exported variable name"
and 82977af93a0d "sh: rename suffix-y to suffix_y".
Rename the variable to ld_bfd.
References: https://buildd.debian.org/status/fetch.php?pkg=linux&arch=sh4&ver=4.13%7Erc5-1%7Eexp1&stamp=1502943967&raw=0 Fixes: 7b022d07a0fd ("sh: Tidy up the ldscript output format specifier.") Signed-off-by: Ben Hutchings <benh@debian.org> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Clicking the Back button may navigate to a non-menu hierarchy level.
[Example]
menu "menu1"
config A
bool "A"
default y
config B
bool "B"
depends on A
default y
menu "menu2"
depends on B
config C
bool "C"
default y
endmenu
endmenu
After being re-parented by menu_finalize(), the menu tree is structured
like follows:
menu "menu1"
\-- A
\-- B
\-- menu "menu2"
\-- C
In Single view, visit "menu2" and click the Back button. It should go up
to "menu1" and show A, B and "menu2", but instead goes up to A and show
only B and "menu2". This is a bug in on_back_clicked().
PCI: pnv_php: Enable third attention indicator state
The PCIe specification allows three attention indicator states, on, off,
and blink. Enable all three states instead of basic on / off control.
This changes the userspace API (writes to the sysfs "attention" file) to
match the behavior of pciehp. Here's the comparison of previous and new
indicator behavior:
Value Previous New Behavior
----- -------- ------------------------
0 off (reserved, so undefined)
1 on on
2 on blink
3 on off
PCI: pnv_php: Fix surprise plug detection and recovery
The existing PowerNV hotplug code did not handle surprise plug events
correctly, leading to a complete failure of the hotplug system after device
removal and a required reboot to detect new devices.
This comes down to two issues:
1) When a device is surprise removed, often the bridge upstream
port will cause a PE freeze on the PHB. If this freeze is not
cleared, the MSI interrupts from the bridge hotplug notification
logic will not be received by the kernel, stalling all plug events
on all slots associated with the PE.
2) When a device is removed from a slot, regardless of surprise or
programmatic removal, the associated PHB/PE ls left frozen.
If this freeze is not cleared via a fundamental reset, skiboot
is unable to clear the freeze and cannot retrain / rescan the
slot. This also requires a reboot to clear the freeze and redetect
the device in the slot.
Issue the appropriate unfreeze and rescan commands on hotplug events,
and don't oops on hotplug if pci_bus_to_OF_node() returns NULL.
PCI: pnv_php: Work around switches with broken presence detection
The Microsemi Switchtec PM8533 PFX 48xG3 [11f8:8533] PCIe switch system
was observed to incorrectly assert the Presence Detect Set bit in its
capabilities when tested on a Raptor Computing Systems Blackbird system,
resulting in the hot insert path never attempting a rescan of the bus
and any downstream devices not being re-detected.
Work around this by additionally checking whether the PCIe data link is
active or not when performing presence detection on downstream switches'
ports, similar to the pciehp_hpc.c driver.
When the root of a nested PCIe bridge configuration is unplugged, the
pnv_php driver leaked the allocated IRQ resources for the child bridges'
hotplug event notifications, resulting in a panic.
Fix this by walking all child buses and deallocating all its IRQ resources
before calling pci_hp_remove_devices().
Also modify the lifetime of the workqueue at struct pnv_php_slot::wq so
that it is only destroyed in pnv_php_free_slot(), instead of
pnv_php_disable_irq(). This is required since pnv_php_disable_irq() will
now be called by workers triggered by hot unplug interrupts, so the
workqueue needs to stay allocated.
The abridged kernel panic that occurs without this patch is as follows:
Jeremy Linton [Mon, 14 Jul 2025 22:29:23 +0000 (17:29 -0500)]
scripts: add zboot support to extract-vmlinux
Zboot compressed kernel images are used for arm64 kernels on various
distros.
extract-vmlinux fails with those kernels because the wrapped image is
another PE. While this could be a bit confusing, the tools primary
purpose of unwrapping and decompressing the contained kernel image
makes it the obvious place for this functionality.
Add a 'file' check in check_vmlinux() that detects a contained PE
image before trying readelf. Recent (FILES_39, Jun/2020) file
implementations output something like:
Which is also a stronger statement than readelf provides so drop that
part of the comment. At the same time this means that kernel images
which don't appear to contain a compressed image will be returned
rather than reporting an error. Which matches the behavior for
existing ELF files.
The extracted PE image can then be inspected, or used as would any
other kernel PE.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
When writing symtypes information, we iterate through the entire hash
table containing type expansions. The key order varies unpredictably
as new entries are added, making it harder to compare symtypes between
builds.
Resolve this by sorting the type expansions by name before output.
Masahiro Yamada [Sun, 29 Jun 2025 18:48:31 +0000 (03:48 +0900)]
kconfig: add a function to dump all menu entries in a tree-like format
This is useful for debugging purposes. menu_finalize() re-parents menu
entries, and this function can be used to dump the final structure of
the menu tree.