Palmer Dabbelt [Thu, 5 Jun 2025 18:11:21 +0000 (11:11 -0700)]
Merge tag 'riscv-mw1-6.16-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/alexghiti/linux into for-next
riscv patches for 6.16-rc1
* Implement atomic patching support for ftrace which finally allows to
get rid of stop_machine().
* Support for kexec_file_load() syscall
* Improve module loading time by changing the algorithm that counts the
number of plt/got entries in a module.
* Zicbop is now used in the kernel to prefetch instructions
[Palmer: There's been two rounds of surgery on this one, so as a result
it's a bit different than the PR.]
* alex-pr: (734 commits)
riscv: Improve Kconfig help for RISCV_ISA_V_PREEMPTIVE
MAINTAINERS: Update Atish's email address
riscv: hwprobe: export Zabha extension
riscv: Make regs_irqs_disabled() more clear
perf symbols: Ignore mapping symbols on riscv
RISC-V: Kconfig: Fix help text of CMDLINE_EXTEND
riscv: module: Optimize PLT/GOT entry counting
riscv: Add support for PUD THP
riscv: xchg: Prefetch the destination word for sc.w
riscv: Add ARCH_HAS_PREFETCH[W] support with Zicbop
riscv: Add support for Zicbop
riscv: Introduce Zicbop instructions
riscv/kexec_file: Fix comment in purgatory relocator
riscv: kexec_file: Support loading Image binary file
riscv: kexec_file: Split the loading of kernel and others
riscv: Documentation: add a description about dynamic ftrace
riscv: ftrace: support direct call using call_ops
riscv: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
riscv: ftrace: support PREEMPT
riscv: add a data fence for CMODX in the kernel mode
...
Alexandre Ghiti [Tue, 6 May 2025 08:19:57 +0000 (08:19 +0000)]
Merge patch series "riscv: Add Zicbop & prefetchw support"
Alexandre Ghiti <alexghiti@rivosinc.com> says:
I found this lost series developed by Guo so here is a respin with the
comments on v2 applied.
This patch series adds Zicbop support and then enables the Linux
prefetch features.
* patches from https://lore.kernel.org/r/20250421142441.395849-1-alexghiti@rivosinc.com:
riscv: xchg: Prefetch the destination word for sc.w
riscv: Add ARCH_HAS_PREFETCH[W] support with Zicbop
riscv: Add support for Zicbop
riscv: Introduce Zicbop instructions
RISCV ELF use mapping symbols with special names $x, $d to
identify regions of RISCV code or code with different ISAs[1].
These symbols don't identify functions, so will confuse the
perf output.
The patch filters out these symbols at load time, similar to
"4886f2ca perf symbols: Ignore mapping symbols on aarch64".
Palmer Dabbelt [Wed, 28 May 2025 00:22:40 +0000 (17:22 -0700)]
Merge patch series "riscv: kexec_file: Support loading Image binary file"
Björn Töpel <bjorn@kernel.org> says:
From: Björn Töpel <bjorn@rivosinc.com>
Hi!
For over a year ago, Daniel and I was testing the V2 of Song's series.
I also promised to take the V2, that had been sitting on the lists for
too long, to rebase it on a new kernel, and re-test it.
One year later, here's the V3! ;-)
There are no changes from V2 other, than some simple checkpatch
cleanups.
Song's original cover:
| This series makes the kexec_file_load() syscall support to load
| Image binary file. At the same time, corresponding support for
| kexec-tools had been pushed to my repo[2].
|
| Now, we can leverage that kexec-tools and this series to use the
| kexec_load() or kexec_file_load() syscall to boot both vmlinux and
| Image file, as seen in these combo tests:
|
| ```
| 1. kexec -l vmlinux
| 2. kexec -l Image
| 3. kexec -s -l vmlinux
| 4. kexec -s -l Image
| ```
Notably, kexec-tools has still not made it upstream. I've prepared a
branch on my GH [3], that I indend to post ASAP. That branch is a
collection of fixes/features, including Song's userland Image loading.
The V2 is here [2], and V1 [1].
I've tested the kexec-file/Image on qemu-rv64, with following
combinations:
* ACPI/UEFI
* DT/UEFI
* DT
both "regular" kexec (-s + -e), and crashkernels (-p).
Note that there are two purgatory patches that has to be present (part
of -rc1, so all good):
commit 28093cfef5dd ("riscv/kexec_file: Handle R_RISCV_64 in purgatory relocator")
commit 3f7023171df4 ("riscv/purgatory: 4B align purgatory_start")
* patches from https://lore.kernel.org/r/20250409193004.643839-1-bjorn@kernel.org:
riscv: kexec_file: Support loading Image binary file
riscv: kexec_file: Split the loading of kernel and others
Samuel Holland [Wed, 9 Apr 2025 17:14:51 +0000 (10:14 -0700)]
riscv: module: Optimize PLT/GOT entry counting
perf reports that 99.63% of the cycles from `modprobe amdgpu` are spent
inside module_frob_arch_sections(). This is because amdgpu.ko contains
about 300000 relocations in its .rela.text section, and the algorithm in
count_max_entries() takes quadratic time.
Apply two optimizations from the arm64 code, which together reduce the
total execution time by 99.58%. First, sort the relocations so duplicate
entries are adjacent. Second, reduce the number of relocations that must
be sorted by filtering to only relocations that need PLT/GOT entries, as
done in commit d4e0340919fb ("arm64/module: Optimize module load time by
optimizing PLT counting").
Unlike the arm64 code, here the filtering and sorting is done in a
scratch buffer, because the HI20 relocation search optimization in
apply_relocate_add() depends on the original order of the relocations.
This allows accumulating PLT/GOT relocations across sections so sorting
and counting is only done once per module.
Merge patch series "riscv: ftrace: atmoic patching and preempt improvements"
Andy Chiu <andybnac@gmail.com> says:
This series makes atomic code patching in ftrace possible and eliminates
the need of the stop_machine dance. The major difference of this version
is that we merge the CALL_OPS support from Puranjay [1] and make direct
calls available for practical uses such as BPF. Thanks for the time
reviewing the series and suggestions, we hope this version gets a step
closer to happening in the upstream.
Please reference the link to v3 below for more introductory view of the
implementation [2]
* patches from https://lore.kernel.org/r/20250407180838.42877-1-andybnac@gmail.com:
riscv: Documentation: add a description about dynamic ftrace
riscv: ftrace: support direct call using call_ops
riscv: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
riscv: ftrace: support PREEMPT
riscv: add a data fence for CMODX in the kernel mode
riscv: vector: Support calling schedule() for preemptible Vector
riscv: ftrace: do not use stop_machine to update code
riscv: ftrace: prepare ftrace for atomic code patching
kernel: ftrace: export ftrace_sync_ipi
riscv: ftrace: align patchable functions to 4 Byte boundary
riscv: ftrace factor out code defined by !WITH_ARG
riscv: ftrace: support fastcc in Clang for WITH_ARGS
riscv: xchg: Prefetch the destination word for sc.w
The cost of changing a cacheline from shared to exclusive state can be
significant, especially when this is triggered by an exclusive store,
since it may result in having to retry the transaction.
This patch makes use of prefetch.w to prefetch cachelines for write
prior to lr/sc loops when using the xchg_small atomic routine.
This patch is inspired by commit 0ea366f5e1b6 ("arm64: atomics:
prefetch the destination word for write prior to stxr").
Yao Zi [Wed, 26 Mar 2025 07:34:51 +0000 (07:34 +0000)]
riscv/kexec_file: Fix comment in purgatory relocator
Apparently sec_base doesn't mean relocated symbol value, which seems a
copy-pasting error in the comment. Assigned with the address of section
indexed by sym->st_shndx, it should represent base address of the
relevant section. Let's fix the comment to avoid possible confusion.
Fixes: 838b3e28488f ("RISC-V: Load purgatory in kexec_file") Signed-off-by: Yao Zi <ziyao@disroot.org> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20250326073450.57648-2-ziyao@disroot.org Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
Song Shuai [Wed, 9 Apr 2025 19:29:58 +0000 (21:29 +0200)]
riscv: kexec_file: Split the loading of kernel and others
This is the preparative patch for kexec_file_load Image support.
It separates the elf_kexec_load() as two parts:
- the first part loads the vmlinux (or Image)
- the second part loads other segments (e.g. initrd,fdt,purgatory)
And the second part is exported as the load_extra_segments() function
which would be used in both kexec-elf.c and kexec-image.c.
No functional change intended.
Signed-off-by: Song Shuai <songshuaishuai@tinylab.org> Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20250409193004.643839-2-bjorn@kernel.org Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
This patch enables support for DYNAMIC_FTRACE_WITH_CALL_OPS on RISC-V.
This allows each ftrace callsite to provide an ftrace_ops to the common
ftrace trampoline, allowing each callsite to invoke distinct tracer
functions without the need to fall back to list processing or to
allocate custom trampolines for each callsite. This significantly speeds
up cases where multiple distinct trace functions are used and callsites
are mostly traced by a single tracer.
The idea and most of the implementation is taken from the ARM64's
implementation of the same feature. The idea is to place a pointer to
the ftrace_ops as a literal at a fixed offset from the function entry
point, which can be recovered by the common ftrace trampoline.
We use -fpatchable-function-entry to reserve 8 bytes above the function
entry by emitting 2 4 byte or 4 2 byte nops depending on the presence of
CONFIG_RISCV_ISA_C. These 8 bytes are patched at runtime with a pointer
to the associated ftrace_ops for that callsite. Functions are aligned to
8 bytes to make sure that the accesses to this literal are atomic.
This approach allows for directly invoking ftrace_ops::func even for
ftrace_ops which are dynamically-allocated (or part of a module),
without going via ftrace_ops_list_func.
We've benchamrked this with the ftrace_ops sample module on Spacemit K1
Jupiter:
Note: per-call overhead is estimated relative to the baseline case with
0 relevant tracers and 0 irrelevant tracers.
As can be seen from the above:
a) Whenever there is a single relevant tracer function associated with a
tracee, the overhead of invoking the tracer is constant, and does not
scale with the number of tracers which are *not* associated with that
tracee.
b) The overhead for a single relevant tracer has dropped to ~1/3 of the
overhead prior to this series (from 122ns to 38ns). This is largely
due to permitting calls to dynamically-allocated ftrace_ops without
going through ftrace_ops_list_func.
Andy Chiu [Mon, 7 Apr 2025 18:08:32 +0000 (02:08 +0800)]
riscv: add a data fence for CMODX in the kernel mode
RISC-V spec explicitly calls out that a local fence.i is not enough for
the code modification to be visble from a remote hart. In fact, it
states:
To make a store to instruction memory visible to all RISC-V harts, the
writing hart also has to execute a data FENCE before requesting that all
remote RISC-V harts execute a FENCE.I.
Although current riscv drivers for IPI use ordered MMIO when sending IPIs
in order to synchronize the action between previous csd writes, riscv
does not restrict itself to any particular flavor of IPI. Any driver or
firmware implementation that does not order data writes before the IPI
may pose a risk for code-modifying race.
Thus, add a fence here to order data writes before making the IPI.
Signed-off-by: Andy Chiu <andybnac@gmail.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20250407180838.42877-8-andybnac@gmail.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
Andy Chiu [Mon, 7 Apr 2025 18:08:31 +0000 (02:08 +0800)]
riscv: vector: Support calling schedule() for preemptible Vector
Each function entry implies a call to ftrace infrastructure. And it may
call into schedule in some cases. So, it is possible for preemptible
kernel-mode Vector to implicitly call into schedule. Since all V-regs
are caller-saved, it is possible to drop all V context when a thread
voluntarily call schedule(). Besides, we currently don't pass argument
through vector register, so we don't have to save/restore V-regs in
ftrace trampoline.
Andy Chiu [Mon, 7 Apr 2025 18:08:29 +0000 (02:08 +0800)]
riscv: ftrace: prepare ftrace for atomic code patching
We use an AUIPC+JALR pair to jump into a ftrace trampoline. Since
instruction fetch can break down to 4 byte at a time, it is impossible
to update two instructions without a race. In order to mitigate it, we
initialize the patchable entry to AUIPC + NOP4. Then, the run-time code
patching can change NOP4 to JALR to eable/disable ftrcae from a
function. This limits the reach of each ftrace entry to +-2KB displacing
from ftrace_caller.
Starting from the trampoline, we add a level of indirection for it to
reach ftrace caller target. Now, it loads the target address from a
memory location, then perform the jump. This enable the kernel to update
the target atomically.
The new don't-stop-the-world text patching on change only one RISC-V
instruction:
This means that f+0x0 is fixed, and should not be claimed by ftrace,
e.g. kprobe should be able to put a probe in f+0x0. Thus, we adjust the
offset and MCOUNT_INSN_SIZE accordingly.
[ alex: Fix build errors with !CONFIG_DYNAMIC_FTRACE ]
Co-developed-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Link: https://lore.kernel.org/r/20250407180838.42877-5-andybnac@gmail.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
Andy Chiu [Mon, 7 Apr 2025 18:08:28 +0000 (02:08 +0800)]
kernel: ftrace: export ftrace_sync_ipi
The following ftrace patch for riscv uses a data store to update ftrace
function. Therefore, a romote fence is required to order it against
function_trace_op updates. The mechanism is similar to the fence between
function_trace_op and update_ftrace_func in the generic ftrace, so we
leverage the same ftrace_sync_ipi function.
[ alex: Fix build warning when !CONFIG_DYNAMIC_FTRACE ]
Andy Chiu [Mon, 7 Apr 2025 18:08:27 +0000 (02:08 +0800)]
riscv: ftrace: align patchable functions to 4 Byte boundary
We are changing ftrace code patching in order to remove dependency from
stop_machine() and enable kernel preemption. This requires us to align
functions entry at a 4-B align address.
However, -falign-functions on older versions of GCC alone was not strong
enoungh to align all functions. In fact, cold functions are not aligned
after turning on optimizations. We consider this is a bug in GCC and
turn off guess-branch-probility as a workaround to align all functions.
The option -fmin-function-alignment is able to align all functions
properly on newer versions of gcc. So, we add a cc-option to test if
the toolchain supports it.
Andy Chiu [Mon, 7 Apr 2025 18:08:26 +0000 (02:08 +0800)]
riscv: ftrace factor out code defined by !WITH_ARG
DYNAMIC_FTRACE selects DYNAMIC_FTRACE_WITH_ARGS and mcount-dyn.S in
riscv, so we can remove ifdef jargons of WITH_ARG when it is known that
DYNAMIC_FTRACE is true.
Andy Chiu [Mon, 7 Apr 2025 18:08:25 +0000 (02:08 +0800)]
riscv: ftrace: support fastcc in Clang for WITH_ARGS
Some caller-saved registers which are not defined as function arguments
in the ABI can still be passed as arguments when the kernel is compiled
with Clang. As a result, we must save and restore those registers to
prevent ftrace from clobbering them.
- [1]: https://reviews.llvm.org/D68559
Reported-by: Evgenii Shatokhin <e.shatokhin@yadro.com> Closes: https://lore.kernel.org/linux-riscv/7e7c7914-445d-426d-89a0-59a9199c45b1@yadro.com/ Fixes: 7caa9765465f ("ftrace: riscv: move from REGS to ARGS") Acked-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20250407180838.42877-1-andybnac@gmail.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
Palmer Dabbelt [Thu, 8 May 2025 18:01:47 +0000 (11:01 -0700)]
Merge patch series "riscv: Add vendor extensions support for SiFive"
Cyan Yang <cyan.yang@sifive.com> says:
This patch set adds four vendor-specific ISA extensions from SiFive:
"xsfvqmaccdod", "xsfvqmaccqoq", "xsfvfnrclipxfqf", and "xsfvfwmaccqqq".
Additionally, a new hwprobe key, RISCV_HWPROBE_KEY_VENDOR_EXT_SIFIVE_0,
has been added to query which SiFive vendor extensions are supported on
the current platform.
Signed-off-by: Cyan Yang <cyan.yang@sifive.com> Link: https://lore.kernel.org/r/20250418053239.4351-1-cyan.yang@sifive.com
* b4-shazam-merge:
riscv: hwprobe: Add SiFive xsfvfwmaccqqq vendor extension
riscv: hwprobe: Document SiFive xsfvfwmaccqqq vendor extension
riscv: Add SiFive xsfvfwmaccqqq vendor extension
dt-bindings: riscv: Add xsfvfwmaccqqq ISA extension description
riscv: hwprobe: Add SiFive xsfvfnrclipxfqf vendor extension
riscv: hwprobe: Document SiFive xsfvfnrclipxfqf vendor extension
riscv: Add SiFive xsfvfnrclipxfqf vendor extension
dt-bindings: riscv: Add xsfvfnrclipxfqf ISA extension description
riscv: hwprobe: Add SiFive vendor extension support and probe for xsfqmaccdod and xsfqmaccqoq
riscv: hwprobe: Document SiFive xsfvqmaccdod and xsfvqmaccqoq vendor extensions
riscv: Add SiFive xsfvqmaccdod and xsfvqmaccqoq vendor extensions
dt-bindings: riscv: Add xsfvqmaccdod and xsfvqmaccqoq ISA extension description
Cyan Yang [Fri, 18 Apr 2025 05:32:31 +0000 (13:32 +0800)]
riscv: hwprobe: Add SiFive vendor extension support and probe for xsfqmaccdod and xsfqmaccqoq
Add a new hwprobe key "RISCV_HWPROBE_KEY_VENDOR_EXT_SIFIVE_0" which allows
userspace to probe for the new vendor extensions from SiFive. Also, add
new hwprobe for SiFive "xsfvqmaccdod" and "xsfvqmaccqoq" vendor
extensions.
Cyan Yang [Fri, 18 Apr 2025 05:32:30 +0000 (13:32 +0800)]
riscv: hwprobe: Document SiFive xsfvqmaccdod and xsfvqmaccqoq vendor extensions
Document the support for sifive vendor extensions using the key
RISCV_HWPROBE_KEY_VENDOR_EXT_SIFIVE_0 and two vendor extensions for SiFive
Int8 Matrix Multiplication Instructions using
RISCV_HWPROBE_VENDOR_EXT_XSFVQMACCDOD and
RISCV_HWPROBE_VENDOR_EXT_XSFVQMACCQOQ.
Xi Ruoyao [Mon, 24 Feb 2025 11:20:40 +0000 (19:20 +0800)]
riscv: vDSO: Remove --hash-style=both
When RISC-V borned, DT_GNU_HASH had already became the de-facto
standard so DT_HASH is just wasting storage space. Remove the explicit
--hash-style=both setting and rely on the distro toolchain default,
which is most likely "gnu" (i.e. generating only DT_GNU_HASH, no
DT_HASH).
Following the logic of commit 48f6430505c0
("arm64/vdso: Remove --hash-style=sysv").
Palmer Dabbelt [Thu, 8 May 2025 17:01:02 +0000 (10:01 -0700)]
Merge patch series "riscv: uaccess: optimisations"
Cyril Bur <cyrilbur@tenstorrent.com> says:
This series tries to optimize riscv uaccess by allowing the use of
user_access_begin() and user_access_end() which permits grouping user accesses
and avoiding the CSR write penalty for each access.
The error path can also be optimised using asm goto which patches 3 and 4
achieve. This will speed up jumping to labels by avoiding the need of an
intermediary error type variable within the uaccess macros
I did read the discussion this series generated. It isn't clear to me
which direction to take the patches, if any.
* b4-shazam-merge:
riscv: uaccess: use 'asm_goto_output' for get_user()
riscv: uaccess: use 'asm goto' for put_user()
riscv: uaccess: use input constraints for ptr of __put_user()
riscv: implement user_access_begin() and families
riscv: save the SR_SUM status over switches
With 'asm goto' we don't need to test the error etc, the exception just
jumps to the error handling directly.
Because there are no output clobbers which could trigger gcc bugs [1]
the use of asm_goto_output() macro is not necessary here. Not using
asm_goto_output() is desirable as the generated output asm will be
cleaner.
Use of the volatile keyword is redundant as per gcc 14.2.0 manual section
6.48.2.7 Goto Labels:
> Also note that an asm goto statement is always implicitly considered
volatile.
riscv: uaccess: use input constraints for ptr of __put_user()
Putting ptr in the inputs as opposed to output may seem incorrect but
this is done for a few reasons:
- Not having it in the output permits the use of asm goto in a
subsequent patch. There are bugs in gcc [1] which would otherwise
prevent it.
- Since the output memory is userspace there isn't any real benefit from
telling the compiler about the memory clobber.
- x86, arm and powerpc all use this technique.
The problem is that we may have a sleeping function as argument which
could clear SR_SUM causing the panic above. This was fixed by
evaluating the argument of the put_user() macro outside the user-enabled
section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
enabling user access")"
In order for riscv to take advantage of unsafe_get/put_XXX() macros and
to avoid the same issue we had with put_user() and sleeping functions we
must ensure code flow can go through switch_to() from within a region of
code with SR_SUM enabled and come back with SR_SUM still enabled. This
patch addresses the problem allowing future work to enable full use of
unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
on every access. Make switch_to() save and restore SR_SUM.
The C sequence points are complicated things, and gcc-15 has apparently
added a warning for the case where an object is both used and modified
multiple times within the same sequence point.
That's a great warning.
Or rather, it would be a great warning, except gcc-15 seems to not
really be very exact about it, and doesn't notice that the modification
are to two entirely different members of the same object: the array
counter and the array entries.
So that seems kind of silly.
That said, the code that gcc complains about is unnecessarily
complicated, so moving the array counter update into a separate
statement seems like the most straightforward fix for these warnings:
drivers/net/wireless/intel/iwlwifi/mld/d3.c: In function ‘iwl_mld_set_netdetect_info’:
drivers/net/wireless/intel/iwlwifi/mld/d3.c:1102:66: error: operation on ‘netdetect_info->n_matches’ may be undefined [-Werror=sequence-point]
1102 | netdetect_info->matches[netdetect_info->n_matches++] = match;
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~
drivers/net/wireless/intel/iwlwifi/mld/d3.c:1120:58: error: operation on ‘match->n_channels’ may be undefined [-Werror=sequence-point]
1120 | match->channels[match->n_channels++] =
| ~~~~~~~~~~~~~~~~~^~
side note: the code at that second warning is actively buggy, and only
works on little-endian machines that don't do strict alignment checks.
The code casts an array of integers into an array of unsigned long in
order to use our bitmap iterators. That happens to work fine on any
sane architecture, but it's still wrong.
This does *not* fix that more serious problem. This only splits the two
assignments into two statements and fixes the compiler warning. I need
to get rid of the new warnings in order to be able to actually do any
build testing.
All of these cases are perfectly valid and good traditional C, but hit
by the "you're not NUL-terminating your byte array" warning.
And none of the cases want any terminating NUL character.
Mark them __nonstring to shut up gcc-15 (and in the case of the ak8974
magnetometer driver, I just removed the explicit array size and let gcc
expand the 3-byte and 6-byte arrays by one extra byte, because it was
the simpler change).
gcc-15: get rid of misc extra NUL character padding
This removes two cases of explicit NUL padding that now causes warnings
because of '-Wunterminated-string-initialization' being part of -Wextra
in gcc-15.
Gcc is being silly in this case when it says that it truncates a NUL
terminator, because in these cases there were _multiple_ NUL characters.
But we can get rid of the warning by just simplifying the two
initializers that trigger the warning for me, so this does exactly that.
I'm not sure why the power supply code did that odd
.attr_name = #_name "\0",
pattern: it was introduced in commit 2cabeaf15129 ("power: supply: core:
Cleanup power supply sysfs attribute list"), but that 'attr_name[]'
field is an explicitly sized character array in a statically initialized
variable, and a string initializer always has a terminating NUL _and_
statically initialized character arrays are zero-padded anyway, so it
really seems to be rather extraneous belt-and-suspenders.
The zero_uuid[16] initialization in drivers/md/bcache/super.c makes
perfect sense, but it isn't necessary for the same reasons, and not
worth the new gcc warning noise.
gcc-15: acpi: sprinkle random '__nonstring' crumbles around
This is not great: I'd much rather introduce a typedef that is a "ACPI
name byte buffer", and use that to mark these special 4-byte ACPI names
that do not use NUL termination.
But as noted in the previous commit ("gcc-15: make 'unterminated string
initialization' just a warning") gcc doesn't actually seem to support
that notion, so instead you have to just mark every single array
declaration individually.
So this is not pretty, but this gets rid of the bulk of the annoying
warnings during an allmodconfig build for me.
and we use this all over the kernel. And the warning is fine, but gcc
developers apparently never made a reasonable way to disable it. As is
(sadly) tradition with these things.
Yes, there's "__attribute__((nonstring))", and we have a macro to make
that absolutely disgusting syntax more palatable (ie the kernel syntax
for that monstrosity is just "__nonstring").
But that attribute is misdesigned. What you'd typically want to do is
tell the compiler that you are using a type that isn't a string but a
byte array, but that doesn't work at all:
warning: ‘nonstring’ attribute does not apply to types [-Wattributes]
and because of this fundamental mis-design, you then have to mark each
instance of that pattern.
This is particularly noticeable in our ACPI code, because ACPI has this
notion of a 4-byte "type name" that gets used all over, and is exactly
this kind of byte array.
This is a sad oversight, because the warning is useful, but really would
be so much better if gcc had also given a sane way to indicate that we
really just want a byte array type at a type level, not the broken "each
and every array definition" level.
So now instead of creating a nice "ACPI name" type using something like
typedef char acpi_name_t[4] __nonstring;
we have to do things like
char name[ACPI_NAMESEG_SIZE] __nonstring;
in every place that uses this concept and then happens to have the
typical initializers.
This is annoying me mainly because I think the warning _is_ a good
warning, which is why I'm not just turning it off in disgust. But it is
hampered by this bad implementation detail.
[ And obviously I'm doing this now because system upgrades for me are
something that happen in the middle of the release cycle: don't do it
before or during travel, or just before or during the busy merge
window period. ]
Merge tag 'mm-hotfixes-stable-2025-04-19-21-24' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc hotfixes from Andrew Morton:
"16 hotfixes. 2 are cc:stable and the remainder address post-6.14
issues or aren't considered necessary for -stable kernels.
All patches are basically for MM although five are alterations to
MAINTAINERS"
[ Basic counting skills are clearly not a strictly necessary requirement
for kernel maintainers. - Linus ]
* tag 'mm-hotfixes-stable-2025-04-19-21-24' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
MAINTAINERS: add section for locking of mm's and VMAs
mm: vmscan: fix kswapd exit condition in defrag_mode
mm: vmscan: restore high-cpu watermark safety in kswapd
MAINTAINERS: add Pedro as reviewer to the MEMORY MAPPING section
mm/memory: move sanity checks in do_wp_page() after mapcount vs. refcount stabilization
mm, hugetlb: increment the number of pages to be reset on HVO
writeback: fix false warning in inode_to_wb()
docs: ABI: replace mcroce@microsoft.com with new Meta address
mm/gup: fix wrongly calculated returned value in fault_in_safe_writeable()
MAINTAINERS: add memory advice section
MAINTAINERS: add mmap trace events to MEMORY MAPPING
mm: memcontrol: fix swap counter leak from offline cgroup
MAINTAINERS: add MM subsection for the page allocator
MAINTAINERS: update SLAB ALLOCATOR maintainers
fs/dax: fix folio splitting issue by resetting old folio order + _nr_pages
mm/page_alloc: fix deadlock on cpu_hotplug_lock in __accept_page()
Merge tag 'vfs-6.15-rc3.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Revert the hfs{plus} deprecation warning that's also included in this
pull request. The commit introducing the deprecation warning resides
rather early in this branch. So simply dropping it would've rebased
all other commits which I decided to avoid. Hence the revert in the
same branch
[ Background - the deprecation warning discussion resulted in people
stepping up, and so hfs{plus} will have a maintainer taking care of
it after all.. - Linus ]
- Switch CONFIG_SYSFS_SYCALL default to n and decouple from
CONFIG_EXPERT
- Fix an audit bug caused by changes to our kernel path lookup helpers
this cycle. Audit needs the parent path even if the dentry it tried
to look up is negative
- Ensure that the kernel path lookup helpers leave the passed in path
argument clean when they return an error. This is consistent with all
our other helpers
- Ensure that vfs_getattr_nosec() calls bdev_statx() so the relevant
information is available to kernel consumers as well
- Don't set a timer and call schedule() if the timer will expire
immediately in epoll
- Make netfs lookup tables with __nonstring
* tag 'vfs-6.15-rc3.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
Revert "hfs{plus}: add deprecation warning"
fs: move the bdex_statx call to vfs_getattr_nosec
netfs: Mark __nonstring lookup tables
eventpoll: Set epoll timeout if it's in the future
fs: ensure that *path_locked*() helpers leave passed path pristine
fs: add kern_path_locked_negative()
hfs{plus}: add deprecation warning
Kconfig: switch CONFIG_SYSFS_SYCALL default to n
* tag 'i2c-for-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: atr: Fix wrong include
i2c: cros-ec-tunnel: defer probe if parent EC is not present
Merge tag 'trace-v6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Initialize hash variables in ftrace subops logic
The fix that simplified the ftrace subops logic opened a path where
some variables could be used without being initialized, and done
subtly where the compiler did not catch it. Initialize those
variables to the EMPTY_HASH, which is the default hash.
- Reinitialize the hash pointers after they are freed
Some of the hash pointers in the subop logic were freed but may still
be referenced later. To prevent use-after-free bugs, initialize them
back to the EMPTY_HASH.
- Free the ftrace hashes when they are replaced
The fix that simplified the subops logic updated some hash pointers,
but left the original hash that they were pointing to where they are
no longer used. This caused a memory leak. Free the hashes that are
pointed to by the pointers when they are replaced.
- Fix size initialization of ftrace direct function hash
The ftrace direct function hash used by BPF initialized the hash size
incorrectly. It checked the size of items to a hard coded 32, which
made the hash bit size of 5. The hash size is supposed to be limited
by the bit size of the hash, as the bitmask is allowed to be greater
than 5. Rework the size check to first pass the number of elements to
fls() and then compare that to FTRACE_HASH_MAX_BITS before allocating
the hash.
- Fix format output of ftrace_graph_ent_entry event
The field depth of the ftrace_graph_ent_entry event is of size 4 but
the output showed it as unsigned long and use "%lu". Change it to
unsigned int and use "%u" in the print format that is displayed to
user space.
- Fix the trace event filter on strings
Events can be filtered on numbers or string values. The return value
checked from strncpy_from_kernel_nofault() and
strncpy_from_user_nofault() was used to determine if reading the
strings would fault or not. It would return fault if the value was
non zero, which is basically meant that it was always considering the
read as a fault.
- Add selftest to test trace event string filtering
In order to catch the breakage of the string filtering, add a self
test to make sure that it continues to work.
* tag 'trace-v6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: selftests: Add testing a user string to filters
tracing: Fix filter string testing
ftrace: Fix type of ftrace_graph_ent_entry.depth
ftrace: fix incorrect hash size in register_ftrace_direct()
ftrace: Free ftrace hashes after they are replaced in the subops code
ftrace: Reinitialize hash to EMPTY_HASH after freeing
ftrace: Initialize variables for ftrace_startup/shutdown_subops()
Merge tag 'nfsd-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- v6.15 libcrc clean-up makes invalid configurations possible
- Fix a potential deadlock introduced during the v6.15 merge window
* tag 'nfsd-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: decrease sc_count directly if fail to queue dl_recall
nfs: add missing selections of CONFIG_CRC32
Merge tag 'rust-fixes-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Pull rust fixes from Miguel Ojeda:
"Toolchain and infrastructure:
- Fix missing KASAN LLVM flags on first build (and fix spurious
rebuilds) by skipping '--target'
- Fix Make < 4.3 build error by using '$(pound)'
- Fix UML build error by removing 'volatile' qualifier from io
helpers
- Fix UML build error by adding 'dma_{alloc,free}_attrs()' helpers
- Clean gendwarfksyms warnings by avoiding to export '__pfx' symbols
- Clean objtool warning by adding a new 'noreturn' function for
1.86.0
- Disable 'needless_continue' Clippy lint due to new 1.86.0 warnings
- Add missing 'ffi' crate to 'generate_rust_analyzer.py'
'pin-init' crate:
- Import a couple fixes from upstream"
* tag 'rust-fixes-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
rust: helpers: Add dma_alloc_attrs() and dma_free_attrs()
rust: helpers: Remove volatile qualifier from io helpers
rust: kbuild: use `pound` to support GNU Make < 4.3
objtool/rust: add one more `noreturn` Rust function for Rust 1.86.0
rust: kasan/kbuild: fix missing flags on first build
rust: disable `clippy::needless_continue`
rust: kbuild: Don't export __pfx symbols
rust: pin-init: use Markdown autolinks in Rust comments
rust: pin-init: alloc: restrict `impl ZeroableOption` for `Box` to `T: Sized`
scripts: generate_rust_analyzer: Add ffi crate
Merge tag 'drm-fixes-2025-04-19' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Easter rc3 pull request, fixes in all the usuals, amdgpu, xe, msm,
with some i915/ivpu/mgag200/v3d fixes, then a couple of bits in
dma-buf/gem.
Hopefully has no easter eggs in it.
dma-buf:
- Correctly decrement refcounter on errors
i915:
- Fix DP DSC configurations that require 3 DSC engines per pipe
xe:
- Fix LRC address being written too late for GuC
- Fix notifier vs folio deadlock
- Fix race betwen dma_buf unmap and vram eviction
- Fix debugfs handling PXP terminations unconditionally
msm:
- Display:
- Fix to call dpu_plane_atomic_check_pipe() for both SSPPs in
case of multi-rect
- Fix to validate plane_state pointer before using it in
dpu_plane_virtual_atomic_check()
- Fix to make sure dereferencing dpu_encoder_phys happens after
making sure it is valid in _dpu_encoder_trigger_start()
- Remove the remaining intr_tear_rd_ptr which we initialized to
-1 because NO_IRQ indices start from 0 now
- GPU:
- Fix IB_SIZE overflow
ivpu:
- Fix debugging
- Fixes to frequency
- Support firmware API 3.28.3
- Flush jobs upon reset
mgag200:
- Set vblank start to correct values
v3d:
- Fix Indirect Dispatch"
* tag 'drm-fixes-2025-04-19' of https://gitlab.freedesktop.org/drm/kernel: (26 commits)
drm/msm/a6xx+: Don't let IB_SIZE overflow
drm/xe/pxp: do not queue unneeded terminations from debugfs
drm/xe/dma_buf: stop relying on placement in unmap
drm/xe/userptr: fix notifier vs folio deadlock
drm/xe: Set LRC addresses before guc load
drm/mgag200: Fix value in <VBLKSTR> register
drm/gem: Internally test import_attach for imported objects
drm/amdgpu: Use the right function for hdp flush
drm/amd/display/dml2: use vzalloc rather than kzalloc
drm/amdgpu: Add back JPEG to video caps for carrizo and newer
drm/amdgpu: fix warning of drm_mm_clean
drm/amd: Forbid suspending into non-default suspend states
drm/amdgpu: use a dummy owner for sysfs triggered cleaner shaders v4
drm/i915/dp: Check for HAS_DSC_3ENGINES while configuring DSC slices
drm/i915/display: Add macro for checking 3 DSC engines
dma-buf/sw_sync: Decrement refcount on error in sw_sync_ioctl_get_deadline()
accel/ivpu: Add cmdq_id to job related logs
accel/ivpu: Show NPU frequency in sysfs
accel/ivpu: Fix the NPU's DPU frequency calculation
accel/ivpu: Update FW Boot API to version 3.28.3
...
Dave Airlie [Sat, 19 Apr 2025 05:09:29 +0000 (15:09 +1000)]
Merge tag 'drm-msm-fixes-2025-04-18' of https://gitlab.freedesktop.org/drm/msm into drm-fixes
Fixes for v6.15-rc3
Display:
- Fix to call dpu_plane_atomic_check_pipe() for both SSPPs in
case of multi-rect
- Fix to validate plane_state pointer before using it in
dpu_plane_virtual_atomic_check()
- Fix to make sure dereferencing dpu_encoder_phys happens after
making sure it is valid in _dpu_encoder_trigger_start()
- Remove the remaining intr_tear_rd_ptr which we initialized
to -1 because NO_IRQ indices start from 0 now
Dave Airlie [Sat, 19 Apr 2025 04:59:47 +0000 (14:59 +1000)]
Merge tag 'drm-xe-fixes-2025-04-18' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- Fix LRC address being written too late for GuC
- Fix notifier vs folio deadlock
- Fix race betwen dma_buf unmap and vram eviction
- Fix debugfs handling PXP terminations unconditionally
Merge tag '6.15-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- Fix hard link lease key problem when close is deferred
- Revert the socket lockdep/refcount workarounds done in cifs.ko now
that it is fixed at the socket layer
* tag '6.15-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
Revert "smb: client: fix TCP timers deadlock after rmmod"
Revert "smb: client: Fix netns refcount imbalance causing leaks and use-after-free"
smb3 client: fix open hardlink on deferred close file error
Rob Clark [Mon, 17 Mar 2025 15:00:06 +0000 (08:00 -0700)]
drm/msm/a6xx+: Don't let IB_SIZE overflow
IB_SIZE is only b0..b19. Starting with a6xx gen3, additional fields
were added above the IB_SIZE. Accidentially setting them can cause
badness. Fix this by properly defining the CP_INDIRECT_BUFFER packet
and using the generated builder macro to ensure unintended bits are not
set.
v2: add missing type attribute for IB_BASE
v3: fix offset attribute in xml
Reported-by: Connor Abbott <cwabbott0@gmail.com> Fixes: a83366ef19ea ("drm/msm/a6xx: add A640/A650 to gpulist") Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/643396/
Merge tag 'x86-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:
- Fix hypercall detection on Xen guests
- Extend the AMD microcode loader SHA check to Zen5, to block loading
of any unreleased standalone Zen5 microcode patches
- Add new Intel CPU model number for Bartlett Lake
- Fix the workaround for AMD erratum 1054
- Fix buggy early memory acceptance between SEV-SNP guests and the EFI
stub
* tag 'x86-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/sev: Avoid shared GHCB page for early memory acceptance
x86/cpu/amd: Fix workaround for erratum 1054
x86/cpu: Add CPU model number for Bartlett Lake CPUs with Raptor Cove cores
x86/microcode/AMD: Extend the SHA check to Zen5, block loading of any unreleased standalone Zen5 microcode patches
x86/xen: Fix __xen_hypercall_setfunc()
Merge tag 'timers-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
"Fix a lockdep false positive in the i8253 driver"
* tag 'timers-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/i8253: Call clockevent_i8253_disable() with interrupts disabled
Merge tag 'perf-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf event fixes from Ingo Molnar:
"Miscellaneous fixes and a hardware-enabling change:
- Fix Intel uncore PMU IIO free running counters on SPR, ICX and SNR
systems
- Fix Intel PEBS buffer overflow handling
- Fix skid in Intel PEBS sampling of user-space general purpose
registers
- Enable Panther Lake PMU support - similar to Lunar Lake"
* tag 'perf-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Add Panther Lake support
perf/x86/intel: Allow to update user space GPRs from PEBS records
perf/x86/intel: Don't clear perf metrics overflow bit unconditionally
perf/x86/intel/uncore: Fix the scale of IIO free running counters on SPR
perf/x86/intel/uncore: Fix the scale of IIO free running counters on ICX
perf/x86/intel/uncore: Fix the scale of IIO free running counters on SNR
Merge tag 'irq-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc irq fixes from Ingo Molnar:
- Fix BCM2712 irqchip driver Kconfig dependencies required on the
Raspberry PI5
- Fix spurious interrupts on RZ/G3E SMARC EVK systems
- Fix crash regression on Sun/NIU hardware
- Apply MSI driver quirk for Sun Neptune chips
* tag 'irq-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/irq-bcm2712-mip: Enable driver when ARCH_BCM2835 is enabled
irqchip/renesas-rzv2h: Prevent TINT spurious interrupt
net/niu: Niu requires MSIX ENTRY_DATA fields touch before entry reads
PCI/MSI: Add an option to write MSIX ENTRY_DATA before any reads
Merge tag 'core-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc core fixes from Ingo Molnar:
"Fix a genksyms related bug, triggered by recent changes to the percpu
code, and update the .clang-format file to not include obsolete
function names"
* tag 'core-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genksyms: Handle typeof_unqual keyword and __seg_{fs,gs} qualifiers
clang-format: Update the ForEachMacros list for v6.15-rc1
Merge tag 'hardening-v6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:
- lib/prime_numbers: KUnit test should not select PRIME_NUMBERS (Geert
Uytterhoeven)
- ubsan: Fix panic from test_ubsan_out_of_bounds (Mostafa Saleh)
- ubsan: Remove 'default UBSAN' from UBSAN_INTEGER_WRAP (Nathan
Chancellor)
- string: Add load_unaligned_zeropad() code path to sized_strscpy()
(Peter Collingbourne)
- kasan: Add strscpy() test to trigger tag fault on arm64 (Vincenzo
Frascino)
- Disable GCC randstruct for COMPILE_TEST
* tag 'hardening-v6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
lib/prime_numbers: KUnit test should not select PRIME_NUMBERS
ubsan: Fix panic from test_ubsan_out_of_bounds
lib/Kconfig.ubsan: Remove 'default UBSAN' from UBSAN_INTEGER_WRAP
hardening: Disable GCC randstruct for COMPILE_TEST
kasan: Add strscpy() test to trigger tag fault on arm64
string: Add load_unaligned_zeropad() code path to sized_strscpy()
Merge tag 'gpio-fixes-for-v6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fix from Bartosz Golaszewski:
- check for both the new AND old (deprecated) setter callback when
changing GPIO direction to output
* tag 'gpio-fixes-for-v6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpiolib: Allow to use setters with return value for output-only gpios
Merge tag 'thermal-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fixes from Rafael Wysocki:
"Add missing DVFS support flags for the Lunar Lake and Panther Lake
platforms to the int340x Intel thermal driver and fix DLVR support
for Panther Lake in it (Srinivas Pandruvada)"
* tag 'thermal-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: intel: int340x: Fix Panther Lake DLVR support
thermal: intel: int340x: Add missing DVFS support flags
Merge tag 'pm-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These are mostly cpufreq fixes, some of which address recent
regressions and some address older issues that have come to light
during the last two weeks, and a runtime PM documentation correction:
- Fix the performance-to-frequency scaling factor computation on
systems using HWP in the intel_pstate driver after a recent
incorrect update of it (Rafael Wysocki)
- Fix the usage of the CPUFREQ_NEED_UPDATE_LIMITS cpufreq driver flag
in the schedutil cpufreq governor after a recent update of it that
has caused frequency limits changes to be missed sometimes (Rafael
Wysocki)
- Address some recently discovered synchronization issues related to
frequency limits changes in the schedutil cpufreq governor and in
the cpufreq core (Rafael Wysocki)
- Fix ITMT support in the amd-pstate cpufreq driver so that it is
enabled after asym priorities have been correctly initialized for
all CPUs (K Prateek Nayak)
- Fix changing min/max limits in the amd-pstate cpufreq driver while
on the performance governor (Dhananjay Ugwekar)
- Fix a function name in the runtime PM documentation that was
previously incorrectly updated by mistake (Sakari Ailus)"
* tag 'pm-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: Avoid using inconsistent policy->min and policy->max
cpufreq/sched: Set need_freq_update in ignore_dl_rate_limit()
cpufreq/sched: Explicitly synchronize limits_changed flag handling
cpufreq/sched: Fix the usage of CPUFREQ_NEED_UPDATE_LIMITS
Documentation: PM: runtime: Fix a reference to pm_runtime_autosuspend()
cpufreq: intel_pstate: Fix hwp_get_cpu_scaling()
cpufreq/amd-pstate: Enable ITMT support after initializing core rankings
cpufreq/amd-pstate: Fix min_limit perf and freq updation for performance governor
Merge tag 'riscv-for-linus-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix for an issue where C instructions ended up in non-C builds, due
to some broken inline assembly in the KGDB breakpoint insertion code
- A fix to avoid spurious printk messages about misaligned access
performance probing
- A fix for a handful of issues with /proc/iomem's reserved region
handling
- A pair of fixes for module relocation processing
- A few build-time fixes
* tag 'riscv-for-linus-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: KGDB: Remove ".option norvc/.option rvc" for kgdb_compiled_break
riscv: KGDB: Do not inline arch_kgdb_breakpoint()
riscv: Avoid fortify warning in syscall_get_arguments()
riscv: Provide all alternative macros all the time
riscv: module: Allocate PLT entries for R_RISCV_PLT32
riscv: module: Fix out-of-bounds relocation access
riscv: Properly export reserved regions in /proc/iomem
riscv: Fix unaligned access info messages
riscv: Avoid fortify warning in syscall_get_arguments()
Documentation: riscv: Fix typo MIMPLID -> MIMPID
riscv: Use kvmalloc_array on relocation_hashtable
Merge tag 'linux_kselftest-fixes-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fix from Shuah Khan:
"Fixes dynevent_limitations.tc test failure on dash by detecting and
handling bash and dash differences in evaluating \\"
* tag 'linux_kselftest-fixes-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/ftrace: Differentiate bash and dash in dynevent_limitations.tc
Merge patch series "riscv: misaligned: Add ZCB handling and fix sleeping function"
Nylon Chen <nylon.chen@sifive.com> says:
1. Adds support for ZCB compressed instructions (C.LHU, C.LH, C.SH).
2. Fixes a bug where copy_from/to_user() calls in non-sleepable contexts
triggered attempts to sleep.
Signed-off-by: Zong Li <zong.li@sifive.com> Signed-off-by: Nylon Chen nylon.chen@sifive.com
Nylon Chen (2):
riscv: misaligned: Add handling for ZCB instructions
riscv: misaligned: fix sleeping function called during misaligned
access handling
* b4-shazam-merge:
riscv: misaligned: fix sleeping function called during misaligned access handling
riscv: misaligned: Add handling for ZCB instructions
Merge tag 'v6.15-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- Fix integer overflow in server disconnect deadtime calculation
- Three fixes for potential use after frees: one for oplocks, and one
for leases and one for kerberos authentication
- Fix to prevent attempted write to directory
- Fix locking warning for durable scavenger thread
* tag 'v6.15-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: Prevent integer overflow in calculation of deadtime
ksmbd: fix the warning from __kernel_write_iter
ksmbd: fix use-after-free in smb_break_all_levII_oplock()
ksmbd: fix use-after-free in __smb2_lease_break_noti()
ksmbd: fix WARNING "do not call blocking ops when !TASK_RUNNING"
ksmbd: Fix dangling pointer in krb_authenticate
Merge tag 'block-6.15-20250417' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- MD pull via Yu:
- fix raid10 missing discard IO accounting (Yu Kuai)
- fix bitmap stats for bitmap file (Zheng Qixing)
- fix oops while reading all member disks failed during
check/repair (Meir Elisha)
- NVMe pull via Christoph:
- fix scan failure for non-ANA multipath controllers (Hannes
Reinecke)
- fix multipath sysfs links creation for some cases (Hannes
Reinecke)
- PCIe endpoint fixes (Damien Le Moal)
- use NULL instead of 0 in the auth code (Damien Le Moal)
- Various ublk fixes:
- Slew of selftest additions
- Improvements and fixes for IO cancelation
- Tweak to Kconfig verbiage
- Fix for page dirtying for blk integrity mapped pages
* tag 'block-6.15-20250417' of git://git.kernel.dk/linux: (38 commits)
selftests: ublk: add generic_06 for covering fault inject
ublk: simplify aborting ublk request
ublk: remove __ublk_quiesce_dev()
ublk: improve detection and handling of ublk server exit
ublk: move device reset into ublk_ch_release()
ublk: rely on ->canceling for dealing with ublk_nosrv_dev_should_queue_io
ublk: add ublk_force_abort_dev()
ublk: properly serialize all FETCH_REQs
selftests: ublk: move creating UBLK_TMP into _prep_test()
selftests: ublk: add test_stress_05.sh
selftests: ublk: support user recovery
selftests: ublk: support target specific command line
selftests: ublk: increase max nr_queues and queue depth
selftests: ublk: set queue pthread's cpu affinity
selftests: ublk: setup ring with IORING_SETUP_SINGLE_ISSUER/IORING_SETUP_DEFER_TASKRUN
selftests: ublk: add two stress tests for zero copy feature
selftests: ublk: run stress tests in parallel
selftests: ublk: make sure _add_ublk_dev can return in sub-shell
selftests: ublk: cleanup backfile automatically
selftests: ublk: add io_uring uapi header
...
Merge tag 'io_uring-6.15-20250418' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
- Correctly cap iov_iter->nr_segs for imports of registered buffers,
both kbuf and normal ones.
Three cleanups to make it saner first, then two fixes for each of the
buffer types.
This fixes a performance regression where partial buffer usage
doesn't trim the tail number of segments, leading the block layer to
iterate the IOs to check if it needs splitting.
- Two patches tweaking the newly introduced zero-copy rx API, mostly to
keep the API consistent once we add multiple interface queues per
ring support in the 6.16 release.
- zc rx unmapping fix for a dead device
* tag 'io_uring-6.15-20250418' of git://git.kernel.dk/linux:
io_uring/zcrx: fix late dma unmap for a dead dev
io_uring/rsrc: ensure segments counts are correct on kbuf buffers
io_uring/rsrc: send exact nr_segs for fixed buffer
io_uring/rsrc: refactor io_import_fixed
io_uring/rsrc: separate kbuf offset adjustments
io_uring/rsrc: don't skip offset calculation
io_uring/zcrx: add pp to ifq conversion helper
io_uring/zcrx: return ifq id to the user
x86/boot/sev: Avoid shared GHCB page for early memory acceptance
Communicating with the hypervisor using the shared GHCB page requires
clearing the C bit in the mapping of that page. When executing in the
context of the EFI boot services, the page tables are owned by the
firmware, and this manipulation is not possible.
So switch to a different API for accepting memory in SEV-SNP guests, one
which is actually supported at the point during boot where the EFI stub
may need to accept memory, but the SEV-SNP init code has not executed
yet.
For simplicity, also switch the memory acceptance carried out by the
decompressor when not booting via EFI - this only involves the
allocation for the decompressed kernel, and is generally only called
after kexec, as normal boot will jump straight into the kernel from the
EFI stub.
Sandipan Das [Fri, 18 Apr 2025 06:19:40 +0000 (11:49 +0530)]
x86/cpu/amd: Fix workaround for erratum 1054
Erratum 1054 affects AMD Zen processors that are a part of Family 17h
Models 00-2Fh and the workaround is to not set HWCR[IRPerfEn]. However,
when X86_FEATURE_ZEN1 was introduced, the condition to detect unaffected
processors was incorrectly changed in a way that the IRPerfEn bit gets
set only for unaffected Zen 1 processors.
Ensure that HWCR[IRPerfEn] is set for all unaffected processors. This
includes a subset of Zen 1 (Family 17h Models 30h and above) and all
later processors. Also clear X86_FEATURE_IRPERF on affected processors
so that the IRPerfCount register is not used by other entities like the
MSR PMU driver.
Pavel Begunkov [Fri, 18 Apr 2025 12:02:27 +0000 (13:02 +0100)]
io_uring/zcrx: fix late dma unmap for a dead dev
There is a problem with page pools not dma-unmapping immediately when
the device is going down, and delaying it until the page pool is
destroyed, which is not allowed (see links). That just got fixed for
normal page pools, and we need to address memory providers as well.
Unmap pages in the memory provider uninstall callback, and protect it
with a new lock. There is also a gap between when a dma mapping is
created and the mp is installed, so if the device is killed in between,
io_uring would be holding on to dma mappings to a dead device with no
one to call ->uninstall. Move it to page pool init and rely on
->is_mapped to make sure it's only done once.
Lorenzo Stoakes [Wed, 16 Apr 2025 10:38:37 +0000 (11:38 +0100)]
MAINTAINERS: add section for locking of mm's and VMAs
We place this under memory mapping as related to memory mapping
abstractions in the form of mm_struct and vm_area_struct (VMA). Now we
have separated out mmap/vma locking logic into the mmap_lock.c and
mmap_lock.h files, so this should encapsulate the majority of the mm
locking logic in the kernel.
Suren is best placed to maintain this logic as the core architect of VMA
locking as a whole.
Link: https://lkml.kernel.org/r/e6ed679a184ca444b20dfa77af96913fd8b5efa0.1744799282.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Johannes Weiner [Wed, 16 Apr 2025 13:45:40 +0000 (09:45 -0400)]
mm: vmscan: fix kswapd exit condition in defrag_mode
Vlastimil points out an issue with kswapd in defrag_mode not waking up
kcompactd reliably.
Background: When kswapd is woken for any higher-order request, it
initially checks those high-order watermarks to decide if work is
necesary. However, it cannot (efficiently) meet the contiguity goal of
such a request by itself. So once it has reclaimed a compaction gap, it
adjusts the request down to check for free order-0 pages, then wakes
kcompactd to coalesce them into larger blocks.
In defrag_mode, the initial watermark check needs to be analogously
against free pageblocks. However, once kswapd drops the high-order to
hand off contiguity work, it also needs to fall back to base page
watermarks - otherwise it'll keep reclaiming until blocks are freed.
While it appears kcompactd is woken up frequently enough to do most of the
compaction work, kswapd ends up overreclaiming by quite a bit:
Johannes Weiner [Wed, 16 Apr 2025 13:45:39 +0000 (09:45 -0400)]
mm: vmscan: restore high-cpu watermark safety in kswapd
Vlastimil points out that commit a211c6550efc ("mm: page_alloc:
defrag_mode kswapd/kcompactd watermarks") switched kswapd from
zone_watermark_ok_safe() to the standard, percpu-cached version of reading
free pages, thus dropping the watermark safety precautions for systems
with high CPU counts (e.g. >212 cpus on 64G). Restore them.
Since zone_watermark_ok_safe() is no longer the right interface, and this
was the last caller of the function anyway, open-code the
zone_page_state_snapshot() conditional and delete the function.
Lorenzo Stoakes [Wed, 16 Apr 2025 13:53:01 +0000 (14:53 +0100)]
MAINTAINERS: add Pedro as reviewer to the MEMORY MAPPING section
Pedro has offered to review memory mapping code. He has good experience
in this area and has provided excellent feedback on memory mapping series
in the past so I feel he'll be a great addition.
Link: https://lkml.kernel.org/r/20250416135301.43513-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Pedro Falcato <pfalcato@suse.de> Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm/memory: move sanity checks in do_wp_page() after mapcount vs. refcount stabilization
In __folio_remove_rmap() for RMAP_LEVEL_PMD/RMAP_LEVEL_PUD and with
CONFIG_PAGE_MAPCOUNT we first decrement the folio mapcount (and recompute
mapped shared vs. mapped exclusively) to then adjust the entire mapcount.
This means that another process might stumble in do_wp_page() over a
PTE-mapped PMD folio that is indicated as "exclusively mapped", but still
has an entire mapcount (PMD mapping), because it is racing with the
process that is unmapping the folio (PMD mapping). Note that do_wp_page()
will back off once it detects the remaining folio reference from the
process that is in the process of unmapping the folio.
This will trigger the early VM_WARN_ON_ONCE(folio_entire_mapcount(folio))
check in do_wp_page(), that can easily be reproduced by looping a couple
of times over allocating a PMD THP, forking a child where we immediately
unmap it again, and writing in the parent concurrently to the THP.
While we could adjust the sequence in __folio_remove_rmap(), let's rater
move the mapcount sanity checks after the mapcount vs. refcount
stabilization phase. With this fix, a simple reproducer is happy.
While at it, convert the two VM_WARN_ON_ONCE() we are moving to
VM_WARN_ON_ONCE_FOLIO().
Link: https://lkml.kernel.org/r/20250415095007.569836-1-david@redhat.com Fixes: 1da190f4d0a6 ("mm: Copy-on-Write (COW) reuse support for PTE-mapped THP") Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: syzbot+5e8feb543ca8e12e0ede@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/67fab4fe.050a0220.2c5fcf.0011.GAE@google.com Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>