s390/ap: Restrict driver_override versus apmask and aqmask use
Introduce a restriction for the driver_override feature versus apmask
and aqmask:
- driver_override is only allowed when the apmask and aqmask values
both are default (=0xffff..ffff).
- apmask and aqmask modifications are only allowed when there is no
driver_override on any AP device active.
So in the end the user is restricted to choose to either use
apmask/apmask to divide the AP devices into host owned and vfio owned
or use the driver_override feature but not mix these two approaches.
s390/ap: Rename mutex ap_perms_mutex to ap_attr_mutex
The mutex ap_perms_mutex was already used not only for protection
of the struct ap_perms ap_perms variable but also for an consistent
update of the AP bus sysfs attributes apmask and aqmask.
So rename this mutex to ap_attr_mutex which better reflects the
current use. This is also a preparation for an upcoming patch which
will use this mutex to lock updates on a new sysfs attribute.
s390/ap: Support driver_override for AP queue devices
Add a new sysfs attribute driver_override the AP queue's
directory. Writing in a string overrides the default driver
determination and the drivers are matched against this string
instead. This overrules the driver binding determined by the
apmask/aqmask bitmask fields.
According to the common understanding of how the driver_override
behavior shall work, there is no further checking done. Neither about
the string which is given as override driver nor if this device is
currently in use by an mdev device. Another patch may limit this
behavior to refuse a mixed usage of the driver_override and
apmask/aqmask feature.
As there exists some tooling for this kind of driver_override
(see package driverctl) the AP bus behavior for re-binding
should be compatible to this. The steps for a driver_override are:
1) unbind the current driver from the device. For example
echo "17.0005" > /sys/devices/ap/card17/17.0005/driver/unbind
2) set the new driver for this device in the sysfs
driver_override attribute. For example
echo "vfio_ap" > /sys//devices/ap/card17/17.0005/driver_override
3) trigger a bus reprobe of this device. For example
echo "17.0005" > /sys/bus/ap/drivers_probe
With the driverctl package this is more comfortable and
the settings get persisted:
driverctl -b ap set-override 17.0005 vfio_ap
and unset with
driverctl -b ap unset-override 17.0005
s390/ap: Use all-bits-one apmask/aqmask for vfio in_use() checks
For the in_use() check of an updated apmask the host's aqmask
was provided to the vfio function. Similar on an update of the
aqmask the host's apmask was provided to the vfio in_use()
function. This led to false results on the check for apmask or
aqmask updates. For example with only one APQN when exactly
this card is tried to be re-assigned back to the host, the
in_use() check did not complain.
The correct behavior is achieved with providing a full mask
for aqmask when an adapter is to be checked and similar a full
mask for aqmask when a domain is to be checked for usage.
s390/debug: Update description of resize operation
With commit 1204777867e8 ("s390/debug: keep debug data on resize")
the behavior of a debug area resize operation was changed. Update the
associated documentation to reflect this change.
Fixes: 1204777867e8 ("s390/debug: keep debug data on resize") Reported-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Mon, 17 Nov 2025 10:11:48 +0000 (11:11 +0100)]
Merge branch 'compat-removal'
Heiko Carstens says:
====================
Remove s390 compat support to allow for code simplification and especially
reduced test effort. To the best of our knowledge there aren't any 31 bit
binaries out in the world anymore that would matter for newer kernels or
newer distributions.
Distributions do not provide compat packages since quite some time or even
have CONFIG_COMPAT disabled.
Instead of adding deprecation warnings to config option, or adding kernel
messages, just remove the code. Deprecation warnings haven't proven to be
useful. If it turns out there is still a reason to keep the compat support
this series can be reverted at any time in the future.
Heiko Carstens [Mon, 10 Nov 2025 18:54:40 +0000 (19:54 +0100)]
s390/syscalls: Switch to generic system call table generation
The s390 syscall.tbl format differs slightly from most others, and
therefore requires an s390 specific system call table generation
script.
With compat support gone use the opportunity to switch to generic
system call table generation. The abi for all 64 bit system calls is
now common, since there is no need to specify if system call entry
points are only for 64 bit anymore.
Furthermore create the system call table in C instead of assembler
code in order to get type checking for all system call functions
contained within the table.
Heiko Carstens [Mon, 10 Nov 2025 18:54:39 +0000 (19:54 +0100)]
s390/syscalls: Remove system call table pointer from thread_struct
With compat support gone there is only one system call table
left. Therefore remove the sys_call_table pointer from
thread_struct and use the sys_call_table directly.
Heiko Carstens [Mon, 10 Nov 2025 18:54:35 +0000 (19:54 +0100)]
s390/syscalls: Add pt_regs parameter to SYSCALL_DEFINE0() syscall wrapper
All system call wrappers should match the sys_call_ptr_t type. This is not
the case for system calls without parameters. Add the missing pt_regs
parameter there too.
Note: this is currently not a problem, since the parameter is unused.
However it prevents to create a correctly typed system call table in
C. With the current assembler implementation this works because of
missing type checking.
Heiko Carstens [Mon, 10 Nov 2025 18:54:34 +0000 (19:54 +0100)]
s390/kvm: Use psw32_t instead of psw_compat_t
kvm_s390_handle_lpsw() make use of the psw_compat_t type even though
the code has nothing to do with CONFIG_COMPAT, for which the type is
supposed to be used. Use psw32_t instead.
Heiko Carstens [Tue, 4 Nov 2025 10:48:57 +0000 (11:48 +0100)]
s390/fault: Print unmodified PSW address on protection exception
In case of a kernel crash caused by a protection exception, print the
unmodified PSW address as reported by the CPU. The protection exception
handler modifies the PSW address in order to keep fault handling easy,
however that leads to misleading call traces.
Therefore restore the original PSW address before printing it.
Before this change the output in case of a protection exception looks like
this:
Note that with this change the PSW address points to the instruction behind
the instruction which caused the exception like it is expected for
protection exceptions.
This also replaces the '#' marker in the disassembly with '*', which allows
to distinguish between new and old behavior.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Tue, 4 Nov 2025 10:48:56 +0000 (11:48 +0100)]
s390/uprobes: Use __forward_psw() instead of private implementation
With adjust_psw_addr() the uprobes code contains more or less a private
__forward_psw() implementation. Switch it to use __forward_psw(), and
remove adjust_psw_addr().
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Tue, 4 Nov 2025 10:48:55 +0000 (11:48 +0100)]
s390/processor: Add __forward_psw() helper
Similar to __rewind_psw() add the counter part __forward_psw(). This
helps to make code more readable if a PSW address has to be forwarded,
since it is more natural to write
addr = __forward_psw(psw, ilen);
instead of
addr = __rewind_psw(psw, -ilen);
This renames also the ilc parameter of __rewind_psw() to ilen, since
the parameter reflects an instruction length, and not an instruction
length code. Also change the type of ilen from unsigned long to long
so it reflects that lengths can be negative or positive.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
s390/fpu: Fix false-positive kmsan report in fpu_vstl()
A false-positive kmsan report is detected when running ping command.
An inline assembly instruction 'vstl' can write varied amount of bytes
depending on value of 'index' argument. If 'index' > 0, 'vstl' writes
at least 2 bytes.
clang generates kmsan write helper call depending on inline assembly
constraints. Constraints are evaluated compile-time, but value of
'index' argument is known only at runtime.
clang currently generates call to __msan_instrument_asm_store with 1 byte
as size. Manually call kmsan function to indicate correct amount of bytes
written and fix false-positive report.
Thomas Richter [Thu, 6 Nov 2025 13:23:33 +0000 (14:23 +0100)]
s390/pai: Calculate size of reserved PAI extension control block area
The PAI extension 1 control block area is 512 bytes in total.
It currently contains three address pointer which refer to counter
memory blocks followed by a reserved area.
Calculate the reserved area instead of hardcoding its size. This
makes the code more readable and maintainable.
No functional chance.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Suggested-by: Jan Polensky <japo@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Mon, 3 Nov 2025 15:25:33 +0000 (16:25 +0100)]
s390/mm: Let dump_fault_info() print additional information
Let dump_fault_info() print additional information to make debugging
easier:
Print "FSI" if the access-exception-fetch/store-indication facility is
installed. If it is installed the TEID may also indicate if an exception
happened because of a fetch or a store operation.
Print "SOP", "ESOP-1", or "ESOP-2" depending on the type of the installed
Suppression-on-Protection facility. This also gives additional information
about the validity and meaning of the TEID bits.
Heiko Carstens [Mon, 3 Nov 2025 15:25:32 +0000 (16:25 +0100)]
s390/mm: Change comment and die() message if teid.b61 is zero
The comments in do_protection() give the impression that a TEID, where bit
61 is zero, indicates a low address protection exception. This is not
necessarily true, and it depends on the type of Suppression-on-Protection
facility of the machine (see Princples of Operation) what this means.
Rework the comments and the die() message to reflect this. This may also
help to avoid confusion.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Fri, 14 Nov 2025 10:30:50 +0000 (11:30 +0100)]
Merge branch 'pai-pmu-merge'
Thomas Richter says:
====================
The PAI PMUs pai_crypto and pai_ext both operate on memory
mapped counters supported by z16 and follow on machines.
These memory mapped counters have a lot in common, like:
- validation, installing and removing events
- starting and stopping events
- retrieving counter values
- collecting sample data.
However both PMU drivers have slightly different parameters,
for example:
- different mapped memory size
- different number of supported counters
- different counter numbers and names
- different bits in the CR0 register
- different anchor address in lowcore
Due to these different parameters, two independent
PMUs have been developed. However both PMU drivers
have very much in common and most of the PMU call back
functions look very similar and are sometimes identical.
This patch set combines both independent PMU device drivers
perf_pai_crypto.c and per_pai_ext.c into one device driver.
The new device driver operations on a table which contains
the different parameters and uses common functions for
event operations.
Result is one PAI PMU driver which supports both PMUs.
It is also extendable to support new PAI PMUs.
Thomas Richter [Wed, 5 Nov 2025 14:39:02 +0000 (15:39 +0100)]
s390/pai: Rename perf_pai_crypto.c to perf_pai.c
Rename perf_pai_crypto.c to perf_pai.c. The new perf_pai.c
contains both PAI device drivers:
- pai_crypto for PAI crypto counter set
- pai_ext for PAI NNPA counter set
The rename reflects this common driver supporting both PMUs.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thomas Richter [Wed, 5 Nov 2025 14:39:01 +0000 (15:39 +0100)]
s390/pai_crypto: Merge pai_ext PMU into pai_crypto
Combine PAI cryptography and PAI extension (NNPA) PMUs in one driver.
Remove file perf_pai_ext.c and registration of PMU "pai_ext"
from perf_pai_crypto.c.
Includes:
- Shared alloc/free and sched_task handling
- NNPA events with exclude_kernel enforced, exclude_user rejected
- Setup CR0 bits for both PMUs
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thomas Richter [Wed, 5 Nov 2025 14:38:59 +0000 (15:38 +0100)]
s390/pai_crypto: Make pai_root per-PMU and unify naming
Prepare the common PAI PMU driver to handle multiple PMUs.
Convert pai_root into an array indexed by PAI_PMU_IDX(event)
so that per-CPU state becomes per-PMU. Adjust all call sites
accordingly. Rename KMSG_COMPONENT and the s390dbf buffer from
"pai_crypto" to "pai" for consistent naming.
No functional change intended beyond log identifiers.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thomas Richter [Wed, 5 Nov 2025 14:38:58 +0000 (15:38 +0100)]
s390/pai_crypto: Rename paicrypt_copy() to pai_copy()
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Rename paicrypt_copy() to pai_copy() to indicate its common usage.
No functional change.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thomas Richter [Wed, 5 Nov 2025 14:38:57 +0000 (15:38 +0100)]
s390/pai_crypto: Add common pai_del() function
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Add a common usable function pai_stop() for the event on a CPU.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thomas Richter [Wed, 5 Nov 2025 14:38:56 +0000 (15:38 +0100)]
s390/pai_crypto: Add common pai_stop() function
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Add a common usable function pai_stop() for the event on a CPU.
Call this common pai_stop() from paicrypt_del().
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thomas Richter [Wed, 5 Nov 2025 14:38:55 +0000 (15:38 +0100)]
s390/pai_crypto: Add common pai_add() function
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Add a common usable function pai_add() for the event on a CPU.
Call this common pai_add() from paicrypt_add().
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thomas Richter [Wed, 5 Nov 2025 14:38:54 +0000 (15:38 +0100)]
s390/pai_crypto: Add common pai_start() function
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Add a common usable function pai_start() to the event on a CPU.
The function expects a PAI PMU specific read function as second
parameter to read out the start value for an event.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thomas Richter [Wed, 5 Nov 2025 14:38:53 +0000 (15:38 +0100)]
s390/pai_crypto: Add common pai_read() function
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Add a common usable function pai_read() to read counter values.
The function expects a PAI PMU specific read function as second
parameter.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thomas Richter [Wed, 5 Nov 2025 14:38:52 +0000 (15:38 +0100)]
s390/pai_crypto: Unify sample push logic and update context handling
Unify naming and logic for PAI PMU drivers to support both PMUs
pai_crypto and pai_ext.
Rename paicrypt_push_sample() to pai_push_sample() to reflect
its common usage. Add detailed comments about invocation context
and scheduler callbacks. Use struct pai_pmu to determine area_size
instead of PAGE_SIZE for counter backup.
Remove obsolete variable paicrypt_cnt.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thomas Richter [Wed, 5 Nov 2025 14:38:51 +0000 (15:38 +0100)]
s390/pai_crypto: Rename paicrypt_have_samples() to pai_have_samples()
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Rename paicrypt_have_samples() to pai_have_samples() to reflect
its common usage. No functional change.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thomas Richter [Wed, 5 Nov 2025 14:38:50 +0000 (15:38 +0100)]
s390/pai_crypto: Rename paicrypt_getctr() to pai_getctr()
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Rename paicrypt_getctr() to pai_getctr() to reflect is common
purpose. pai_getctr() now uses pai_pmu table to extract PAI PMU
characteristics such as kernel_offset inside the counter area page.
Also rename paicrypt_have_sample() to pai_have_sample().
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thomas Richter [Wed, 5 Nov 2025 14:38:49 +0000 (15:38 +0100)]
s390/pai_crypto: Rename paicrypt_getdata() to pai_getdata()
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Rename paicrypt_getdata() to pai_getdata(). Use the PAI PMU
characteristics in the pai_pmu table to determine the number
of counters to be extracted.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thomas Richter [Wed, 5 Nov 2025 14:38:48 +0000 (15:38 +0100)]
s390/pai_crypto: Rename some function for common usage.
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Rename functions
- paicrypt_free() -> pai_free()
- paicrypt_destroy_event() -> pai_destroy_event()
- paicrypt_destroy_event_cpu() -> pai_destroy_event_cpu()
to reflect their future common usage.
No functional change.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thomas Richter [Wed, 5 Nov 2025 14:38:47 +0000 (15:38 +0100)]
s390/pai_crypto: Introduce generic event init using pai_pmu[]
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Rework PAI crypto event initialization. Add a common
function for event initialization. It uses the PAI characteristics
stored in the pai_pmu table instead of hardcoded values.
Enlarge pai_event_valid() to check all event validation aspects.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thomas Richter [Wed, 5 Nov 2025 14:38:46 +0000 (15:38 +0100)]
s390/pai_crypto: Add PAI crypto characteristics table for parameters
Create and add a PMU characteristics table to store the parameters
of the PAI crypto PMU. This table contains PMU details such as
- number of available counters
- name of these counters to export to /sysfs
- Size of the memory mapped counter area
- base number of first counter
- etc
Also define a PMU specific initialization function to be called when
a PAI PMU feature is supported. At device driver initialization
test these features and if available use instruction qpaci to
retrieve the number of available counters. Also export these counter
names to /sysfs and register this PMU.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thomas Richter [Wed, 5 Nov 2025 14:38:45 +0000 (15:38 +0100)]
s390/pai_crypto: Rename paicrypt_root_alloc() and paicrypt_root_free()
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Rename functions paicrypt_root_alloc() and paicrypt_root_free()
to pai_root_alloc() and pai_root_free().
No functional change.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thomas Richter [Wed, 5 Nov 2025 14:38:44 +0000 (15:38 +0100)]
s390/pai_crypto: Rename structure paicrypt_root
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Rename structure paicrypt_root to pai_root.
No functional change.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thomas Richter [Wed, 5 Nov 2025 14:38:43 +0000 (15:38 +0100)]
s390/pai_crypto: Rename structure paicrypt_map to pai_map
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Rename structure paicrypt_map to pai_map.
No functional change.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thomas Richter [Wed, 5 Nov 2025 14:38:42 +0000 (15:38 +0100)]
s390/pai_crypto: Rename structure paicrypt_mapptr to pai_mapptr
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Rename structure paicrypt_mapptr to pai_mapptr.
No functional change.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thomas Richter [Wed, 5 Nov 2025 14:38:41 +0000 (15:38 +0100)]
s390/pai_crypto: Rename member paicrypt_map::page
Rename member page in struct paicrypt_map to area. This rename
creates consistent naming for both PMU drivers paicrypto and PMU
paiext. No functional change.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
s390/sclp_mem: Consider global memory_hotplug.memmap_on_memory setting
When the global kernel command line parameter
memory_hotplug.memmap_on_memory is set to false, per-memory-block
memmap_on_memory setting can still be set to true. However, when
configuring memory block, add_memory_resource() would configure it
without memmap_on_memory.
i.e.
Even if the MHP_MEMMAP_ON_MEMORY flag is set,
mhp_supports_memmap_on_memory() returns false unless the kernel command
line parameter "memory_hotplug.memmap_on_memory" is enabled. When both
the flag and the cmdline parameter are set, the memory block can be
configured with or without memmap_on_memory support.
To ensure consistent behavior, permit configuring per-memory-block
memmap_on_memory only when the memory_hotplug.memmap_on_memory kernel
command line parameter is enabled.
This is similar to commit 73954d379efd ("dax: add a sysfs knob to
control memmap_on_memory behavior")
Fixes: ff18dcb19aab ("s390/sclp: Add support for dynamic (de)configuration of memory") Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Mete Durlu [Wed, 5 Nov 2025 11:15:34 +0000 (12:15 +0100)]
s390/hiperdispatch: Decrease steal time threshold
Higher steal time thresholds favor low utilization scenarios, which is not
the common case for s390. Set steal time threshold to a lower value to
prioritize vertical high and medium CPUs sooner and allow high utilization
scenarios to benefit from it.
Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Mete Durlu <meted@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thorsten Blum [Thu, 30 Oct 2025 13:42:41 +0000 (14:42 +0100)]
s390/smp: Mark pcpu_delegate() and smp_call_ipl_cpu() as __noreturn
pcpu_delegate() never returns to its caller. If the target CPU is the
current CPU, it calls __pcpu_delegate(), whose delegate function is not
supposed to return. In any case, even if __pcpu_delegate() unexpectedly
returns, pcpu_delegate() sends SIGP_STOP to the current CPU and waits
in an infinite loop. Annotate pcpu_delegate() with the __noreturn
attribute to improve compiler optimizations.
Also annotate smp_call_ipl_cpu() accordingly since it always calls
pcpu_delegate().
Thorsten Blum [Mon, 27 Oct 2025 08:47:25 +0000 (09:47 +0100)]
s390/nmi: Annotate s390_handle_damage() with __noreturn
s390_handle_damage() ends by calling the non-returning function
disabled_wait() and therefore also never returns. Annotate it with the
__noreturn compiler attribute to improve compiler optimizations.
Heiko Carstens [Thu, 6 Nov 2025 13:14:00 +0000 (14:14 +0100)]
Merge branch 'dat-enhancement-1'
Heiko Carstens says:
====================
Add the Dat-Enhancement facility 1 to the list of facilities which are
required to start the kernel. The facility provides the CSPG and IDTE
instructions. In particular the CSPG instruction can be used to replace a
valid page table entry with a different page table entry, which also
differs in the page frame real address.
Without the CSPG instruction it is possible to use the CSP instruction to
change valid page table entries, however it only allows to change the lower
or higher 32 bits of such entries, which means it cannot be used to change
the page frame real address of valid page table entries.
Given that there is code around (e.g. HugeTLB vmemmap optimization) which
requires to change valid page table entries of the kernel mapping, without
the detour over an invalid page table entry, make the CSPG instruction
unconditionally available.
The Dat-Enhancement facility 1 is available since z990, which is older than
the currently supported minimum architecture (z10). Therefore adding this
the architecture level set shouldn't cause any problems.
Heiko Carstens [Mon, 3 Nov 2025 13:39:28 +0000 (14:39 +0100)]
s390/mm: Replace the CSP instruction with CSPG
The CSPG instruction is part of the Dat-Enhancement facility 1, which
is always available. Given that it can be used everywhere where also
the CSP instruction can be used, replace CSP with CSPG everywhere.
This allows to remove the csp() inline assembly. Also remove the
unused gmap_pmdp_csp() function.
Heiko Carstens [Mon, 3 Nov 2025 13:39:27 +0000 (14:39 +0100)]
s390/mm: Remove cpu_has_idte()
Remove cpu_has_idte(). The IDTE instruction is part of the
Dat-Enhancement facility 1, which is always available.
Therefore remove the helper and now superfluous code.
Heiko Carstens [Mon, 3 Nov 2025 13:39:26 +0000 (14:39 +0100)]
s390: Add Dat-Enhancement facility 1 to architecture level set
Add the Dat-Enhancement facility 1 to the list of facilities which are
required to start the kernel. The facility provides the CSPG and IDTE
instructions. In particular the CSPG instruction can be used to replace a
valid page table entry with a different page table entry, which also
differs in the page frame real address.
Without the CSPG instruction it is possible to use the CSP instruction to
change valid page table entries, however it only allows to change the lower
or higher 32 bits of such entries, which means it cannot be used to change
the page frame real address of valid page table entries.
Given that there is code around (e.g. HugeTLB vmemmap optimization) which
requires to change valid page table entries of the kernel mapping, without
the detour over an invalid page table entry, make the CSPG instruction
unconditionally available.
The Dat-Enhancement facility 1 is available since z990, which is older than
the currently supported minimum architecture (z10). Therefore adding this
to the architecture level set shouldn't cause any problems.
Jens Remus [Thu, 23 Oct 2025 09:23:58 +0000 (11:23 +0200)]
s390/ptrace: Explicitly include <linux/typecheck.h>
The psw_bits() macro makes use of typecheck() without that typecheck.h
is included. Add the missing include to avoid potential future compile
problems.
s390/ap: Expose ap_bindings_complete_count counter via sysfs
The AP bus udev event BINDINGS=complete is sent out when the
first time all devices detected by the AP bus scan have been
bound to device drivers. This is the ideal time to for example
change the AP bus masks apmask and aqmask to re-establish a
persistent change on the decision about which cards/domains
should be available for the host and which should go into the
pool for kvm guests.
However, if exactly this initial udev event is sent out early
in the boot process a udev rule may not have been established
yet and thus this event will never be recognized. To have
some indication about if the AP bus binding complete has
already happened, the internal ap_bindings_complete_count
counter is exposed via sysfs with this patch.
Suggested-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Mon, 20 Oct 2025 14:17:54 +0000 (16:17 +0200)]
s390/smp: Fix fallback CPU detection
In case SCLP CPU detection does not work a fallback mechanism using SIGP is
in place. Since a cleanup this does not work correctly anymore: new CPUs
are only considered if their type matches the boot CPU.
Before the cleanup the information if a CPU type should be considered was
also part of a structure generated by the fallback mechanism and indicated
that a CPU type should not be considered when adding CPUs.
Since the rework a global SCLP state is used instead. If the global SCLP
state indicates that the CPU type should be considered and the fallback
mechanism is used, there may be a mismatch with CPU types if CPUs are
added. This can lead to a system with only a single CPU even tough there
are many more CPUs.
Address this by simply copying the boot cpu type into the generated data
structure from the fallback mechanism.
Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com> Fixes: d08d94306e90 ("s390/smp: cleanup core vs. cpu in the SCLP interface") Reviewed-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Gerd Bayer [Tue, 21 Oct 2025 08:57:16 +0000 (10:57 +0200)]
s390/pci: Highlight failure to enable PCI function
Emit an error log when a PCI function cannot be enabled for use, despite
being reported as configured to the system.
This brings to attention situations where functions might go missing
without notice. Going unnoticed is less likely when functions are added
to the system through hotplug, but will produce the same error log.
Heiko Carstens [Tue, 21 Oct 2025 09:10:55 +0000 (11:10 +0200)]
Merge branch 'ap-bus-trace-events'
Harald Freudenberger says:
====================
Investigations related to runtime of crypto requests has revealed a lack of
performance or runtime information with crypto requests. There are the two
zcrypt ioctl trace events covering the entry and exit of an ioctl with
crypto requests giving the overall runtime within the kernel. However,
there is no way to figure out the time where a request is enqueued into the
AP bus queue but not pushed into the firmware queue. Then there is no
information about the runtime of an request during processing in the
firmware. And finally some info about pulling the reply from the firmware
and delivering it into user space is missing.
This series is aiming to provide a way to collect measurements which can be
used to cover these runtime information for each crypto request/reply.
s390/ap: Introduce new AP nqap and dqap trace events
Introduce two new AP bus related tracepoint events:
- There is a tracepoint s390_ap_nqap event immediately after a request
has been pushed into the AP firmware queue with the NQAP AP command.
- The other tracepoint s390_ap_dqap event fires immediately after a
reply has been pulled out of the AP firmware queue via DQAP AP
command.
Both events are triggered unconditional and may need filtering.
Filtering can be done based on the status value which is part of
the nqap and dqap trace. So for example a
echo "!(status & 0x00ff0000)" >.../s390_ap_dqap/filter
filters out all trace events which have a response_code != 0
leaving just the successful nqap and dqap invocations.
The idea of these two trace events focuses on performance to measure
the runtime of a crypto request/reply as close as possible at the
firmware level. In combination with the two zcrypt tracepoints (see
the zcrypt.h trace event definition file) this gives measurement data
about the runtime of a request/reply within the zcrpyt and AP bus
layer. However, with having the status of these AP commands in hand
also other usage may be possible.
s390/ap: Extend struct ap_queue_status with some convenience fields
Sometimes there is a different view of the AP status word
needed. So here is slight rework of the struct ap_queue_status
to open up the possibility to have different ways of accessing
the AP status bits and fields.
The new struct ap_queue_status
struct ap_queue_status {
union {
unsigned int value : 32;
struct {
unsigned int status_bits : 8;
unsigned int rc : 8;
unsigned int : 16;
};
struct {
unsigned int queue_empty : 1;
unsigned int replies_waiting : 1;
unsigned int queue_full : 1;
unsigned int : 3;
unsigned int async : 1;
unsigned int irq_enabled : 1;
unsigned int response_code : 8;
unsigned int : 16;
};
};
};
comprises the old struct ap_queue_status but extends it
to have this also accessible as an unsigned int required
for example for a simple print or trace of the whole value.
Note that this rework is fully backward compatible to the
existing code exploiting the struct ap_queue_status.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
s390/zcrypt: Rework zcrypt request and reply trace event definition
This is a slight rework of the s390_zcrypt_req and s390_zcrypt_rep
trace event:
- the psmid has been added to the s390_zcrypt_rep
- "dev" renamed to "card"
- "domain" renamed to "dom"
The motivation of these changes is to make these traces more
aligned to new upcoming traces for AP bus related trace events.
Additionally the psmid is needed to match the reply (and thus
indirect the request) to AP bus related trace events where only
the psmid is unique identifying AP messages.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
s390/ptdump: Use seq_puts() in pt_dump_seq_puts() macro
The pt_dump_seq_puts() macro incorrectly uses seq_printf() instead of
seq_puts(). This is both a performance issue and conceptually wrong,
as the macro name suggests plain string output (puts) but the
implementation uses formatted output (printf).
The macro is used in dump_pagetables.c:67-68 and 131 to output
constant strings. Using seq_printf() adds unnecessary overhead for
format string parsing.
Heiko Carstens [Tue, 21 Oct 2025 08:26:26 +0000 (10:26 +0200)]
Merge branch 'tape-block-sizes'
Jan Höppner says:
====================
The tape device driver is limited to a block size of 65535 bytes since a
single CCW can only transfer up to 64K-1 bytes (The count field is a
16bit value). This series introduces data chaining for all read/write
functions to support block sizes larger than 65535.
The tape device type 3490 (emulated) and 3590/3592 can handle up to
256K. [1]
Jan Höppner [Thu, 16 Oct 2025 07:47:18 +0000 (09:47 +0200)]
s390/tape: Add support for bigger block sizes
The tape device type 3590/3592 and emulated 3490 VTS can handle a block
size of up to 256K bytes. Currently the tape device driver is limited to
a block size of 65535 bytes (64K-1). This limitation stems from the
maximum of 65535 bytes of data that can be transferred with one
Channel-Command Word (CCW).
To work around this limitation data chaining is used which uses several
CCW to transfer an entire 256K block of data. A single CCW holds a
maximum of 65535 bytes of data.
Set MAX_BLOCKSIZE to 262144 (= 256K) to allow for data transfers with
larger block sizes. The read_block() and write_block() discipline
functions calculate the number of CCWs required based on the IDAL buffer
array size that was created for a given block size. If there is more
than one CCW required for the data transfer, the new helper function
tape_ccw_dc_idal() is used to build the data chain accordingly.
The Interruption-Repsonse Block (irb) is added to the tape_request
struct so that the tapechar_read/write() functions can analyze what data
was read or written accordingly.
Jan Höppner [Thu, 16 Oct 2025 07:47:17 +0000 (09:47 +0200)]
s390/tape: Introduce idal buffer array
The tape device driver uses a single idal_buffer for I/O. While the
buffer itself can be arbitrary big, the limit for data transfer for a
single Channel-Command Word is at 65535 bytes (64K-1) since the count
field specifying the amount of data designated by the CCW is a 16-bit
unsigned value.
Provide functionality that allocates an array of multiple IDAL buffer
with the limitation mentioned above in mind.
A call to idal_buffer_array_alloc() allocates an array with a certain
amount of IDAL buffers which is determined based on the total size of
@size. Each individual buffer is limited to a size of CCW_MAX_BYTE_COUNT
(65535 bytes).
Add helper functions that determine the size (# of elements) and the
total data size covered by the array as well.
Current users of the single IDAL buffer are adapted to use the new
functions with one buffer to allocate.
The single IDAL buffer is removed from the tape_char_data struct.
Jan Höppner [Thu, 16 Oct 2025 07:47:16 +0000 (09:47 +0200)]
s390/tape: Move idal allocation to core functions
Currently tapechar_check_idalbuffer() is part of tape_char.c and is used
to ensure the idal buffer is big enough for the requested I/O and
reallocates a new one if required. The same is done in tape_std.c when a
fixed block size is set using the mtsetblk command. This is essentially
duplicate code.
The allocation of the buffer that is required for I/O can be considered
core functionality. Move the idal buffer allocation to tape_core.c,
make it generally available, and reduce code duplication.
Jan Höppner [Thu, 16 Oct 2025 07:47:15 +0000 (09:47 +0200)]
s390/tape: Fix return value of ccw helper functions
In contrast to all other helper functions used to build CCW chains,
tape_ccw_cc_idal() and tape_ccw_end_idal() return values using
post-increments, which results in returning the same CCW pointer.
Though, the intent of the CCW helper functions is to return the _next_
CCW in the chain, which can then be processed.
There is currently no actual issue, as tape_ccw_cc_idal() is not used
yet and tape_ccw_end_idal() is only used at the end of a chain.
Change both functions return statement to ccw + 1 and bring them in line
with the other helper functions.
Jan Höppner [Thu, 16 Oct 2025 07:47:14 +0000 (09:47 +0200)]
s390/tape: Remove extra CCW allocation for error recovery
The Read Opposite error recovery code required 2 extra CCWs to be
allocated in order to transform the request. As this error recovery code
for both 34xx and 3590 was removed the additional allocation isn't
required anymore. Reduce it to two.
On old native type 3590 tape devices a Read Opposite error recovery
procedure on Error Recovery Action Code (ERA) 26 was issued if a Read
Forward command failed. This recovery procedure was implemented with the
Read Backward command. This is no longer supported.
Remove 3590 ERA 26 and Read Backward related recovery code.
On old native type 3490 tape devices a Read Opposite error recovery
procedure on Error Recovery Action Code (ERA) 26 was issued if a Read
Forward command failed. This recovery procedure was implemented with the
Read Backward command.
As a preparation for a subsequent commit, that adds support for bigger
block sizes, remove the 34xx ERA 26 related recovery code. The recovery
code would need to be adapted to the bigger block sizes, without any
possibility to be tested, as modern Virtual Tape Servers (VTS) do
neither report ERA 26 on a Read Forward command failure nor support the
error recovery procedure anymore.
Heiko Carstens [Tue, 21 Oct 2025 08:19:30 +0000 (10:19 +0200)]
Merge branch 'memory-hotplug'
Sumanth Korikkar says:
====================
Provide a new interface for dynamic configuration and
deconfiguration of hotplug memory on s390, allowing with/without
memmap_on_memory support. It is a follow up on the discussion with David
when introducing memmap_on_memory support for s390 and support dynamic
(de)configuration of memory:
https://lore.kernel.org/all/ee492da8-74b4-4a97-8b24-73e07257f01d@redhat.com/
https://lore.kernel.org/all/20241202082732.3959803-1-sumanthk@linux.ibm.com/
The original motivation for introducing memmap_on_memory on s390 was to
avoid using online memory to store struct pages metadata, particularly
for standby memory blocks. This became critical in cases where there was
an imbalance between standby and online memory, potentially leading to
boot failures due to insufficient memory for metadata allocation.
To address this, memmap_on_memory was utilized on s390. However, in its
current form, it adds struct pages metadata at the start of each memory
block at the time of addition (only standby memory), and this
configuration is static. It cannot be changed at runtime (When the user
needs continuous physical memory).
Inorder to provide more flexibility to the user and overcome the above
limitation, add an option to dynamically configure and deconfigure
hotpluggable memory block with/without memmap_on_memory.
With the new interface, s390 will not add all possible hotplug memory in
advance, like before, to make it visible in sysfs for online/offline
actions. Instead, before memory block can be set online, it has to be
configured via a new interface in /sys/firmware/memory/memoryX/config,
which makes s390 similar to others. i.e. Adding of hotpluggable memory is
controlled by the user instead of adding it at boottime.
s390 kernel sysfs interface to configure/deconfigure memory with
memmap_on_memory (with upcoming lsmem changes):
* Initial memory layout:
lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x7fffffff 2G online 0-15 yes no
0x80000000-0xffffffff 2G offline 16-31 no yes
* Configure memory
echo 1 > /sys/firmware/memory/memory16/config
lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x7fffffff 2G online 0-15 yes no
0x80000000-0x87ffffff 128M offline 16 yes yes
0x88000000-0xffffffff 1.9G offline 17-31 no yes
* Deconfigure memory
echo 0 > /sys/firmware/memory/memory16/config
lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x7fffffff 2G online 0-15 yes no
0x80000000-0xffffffff 2G offline 16-31 no yes
* Enable memmap_on_memory and online it.
(Deconfigure first)
echo 0 > /sys/devices/system/memory/memory5/online
echo 0 > /sys/firmware/memory/memory5/config
lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x27ffffff 640M online 0-4 yes no
0x28000000-0x2fffffff 128M offline 5 no no
0x30000000-0x7fffffff 1.3G online 6-15 yes no
0x80000000-0xffffffff 2G offline 16-31 no yes
lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x7fffffff 2G online 0-15 yes no
0x80000000-0xffffffff 2G offline 16-31 no yes
* Userspace changes:
lsmem/chmem tool is also changed to use the new interface. I will send
it to util-linux soon.
Patch 1 adds support for removal of boot-allocated memory blocks.
Patch 2 provides option to dynamically configure and deconfigure memory
with/without memmap_on_memory.
Patch 3 removes MHP_OFFLINE_INACCESSIBLE from s390. The mhp flag was
used to mark memory as not accessible until memory hotplug online phase
begins. However, with patch 2, it is no longer essential. Memory can be
brought to accessible state before adding memory, as the memory is added
during runttime now instead of boottime.
Patch 4 removes the MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers. It
is no longer needed. Memory can be brought to accessible state before
adding memory now, with runtime (de)configuration of memory.
Note: The patches apply to the linux-next branch.
v3:
Thanks David
* Avoid goto label in create_standby_sclp_mems().
* Use unsigned long instead of u64.
* Add Acked-by.
v2:
Thanks David
* Rename struct mblock/mblock_arg with struct sclp_mem/sclp_mem_arg.
* Rename all mblocks/mblock references with sclp_mems/sclp_mem -
structures, functions.
* Rename create_online_mblock() with create_configured_sclp_mem().
* Rename config_mblock_show()/config_mblock_store() with
config_sclp_mem_show()/config_sclp_mem_store().
* Remove contains_standby_increment() and
sclp_mem_notifier. sclp mem state change is performed when
adding/removing memory. sclp memory notifier - no longer needed with
this patchset.
* Recover sclp mem state when add_memory() fails.
* Refactor and add function init_sclp_mem().
* Use unsigned long instead of unsigned long long.
* Simplify and correct kobj handling. Thanks Heiko.
s390/sysinfo: Replace sprintf() with snprintf() for buffer safety
Replace sprintf() with snprintf() when formatting symlink target name
to prevent potential buffer overflow. The link_to buffer is only 10
bytes, and using snprintf() ensures proper bounds checking if the
topology nesting limit value is unexpectedly large.
s390/extmem: Replace sprintf() with snprintf() for buffer safety
Replace unsafe sprintf() calls with snprintf() in segment_save() to
prevent potential buffer overflows. The function builds command strings
by repeatedly appending to a fixed-size buffer, which could overflow if
segment ranges are numerous or values are large.
s390/cmm: Replace sprintf() with scnprintf() for buffer safety
Replace sprintf() with scnprintf() in cmm_timeout_handler() to prevent
potential buffer overflow. The scnprintf() function ensures we don't
write beyond the buffer size and provides safer string formatting.
Linus Torvalds [Sun, 19 Oct 2025 14:59:43 +0000 (04:59 -1000)]
Merge tag 'sched_urgent_for_v6.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Borislav Petkov:
- Make sure the check for lost pelt idle time is done unconditionally
to have correct lost idle time accounting
- Stop the deadline server task before a CPU goes offline
* tag 'sched_urgent_for_v6.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix pelt lost idle time detection
sched/deadline: Stop dl_server before CPU goes offline
Linus Torvalds [Sun, 19 Oct 2025 14:54:08 +0000 (04:54 -1000)]
Merge tag 'perf_urgent_for_v6.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Make sure perf reporting works correctly in setups using
overlayfs or FUSE
- Move the uprobe optimization to a better location logically
* tag 'perf_urgent_for_v6.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix MMAP2 event device with backing files
perf/core: Fix MMAP event path names with backing files
perf/core: Fix address filter match with backing files
uprobe: Move arch_uprobe_optimize right after handlers execution
Linus Torvalds [Sun, 19 Oct 2025 14:41:27 +0000 (04:41 -1000)]
Merge tag 'x86_urgent_for_v6.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Reset the why-the-system-rebooted register on AMD to avoid stale bits
remaining from previous boots
- Add a missing barrier in the TLB flushing code to prevent erroneously
not flushing a TLB generation
- Make sure cpa_flush() does not overshoot when computing the end range
of a flush region
- Fix resctrl bandwidth counting on AMD systems when the amount of
monitoring groups created exceeds the number the hardware can track
* tag 'x86_urgent_for_v6.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/CPU/AMD: Prevent reset reasons from being retained across reboot
x86/mm: Fix SMP ordering in switch_mm_irqs_off()
x86/mm: Fix overflow in __cpa_addr()
x86/resctrl: Fix miscount of bandwidth event when reactivating previously unavailable RMID
Linus Torvalds [Sat, 18 Oct 2025 20:05:13 +0000 (10:05 -1000)]
Merge tag 'rust-rustfmt' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Pull rustfmt fixes from Miguel Ojeda:
"Rust 'rustfmt' cleanup
'rustfmt', by default, formats imports in a way that is prone to
conflicts while merging and rebasing, since in some cases it condenses
several items into the same line.
Document in our guidelines that we will handle this for the moment
with the trailing empty comment workaround and make the tree
'rustfmt'-clean again"
* tag 'rust-rustfmt' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
rust: bitmap: fix formatting
rust: cpufreq: fix formatting
rust: alloc: employ a trailing comment to keep vertical layout
docs: rust: add section on imports formatting
Linus Torvalds [Sat, 18 Oct 2025 18:38:28 +0000 (08:38 -1000)]
Merge tag 'tpmdd-next-v6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm fix from Jarkko Sakkinen:
"Correct the state transitions for ARM FF-A to match the spec and how
tpm_crb behaves on other platforms"
* tag 'tpmdd-next-v6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
tpm_crb: Add idle support for the Arm FF-A start method
Linus Torvalds [Sat, 18 Oct 2025 18:35:09 +0000 (08:35 -1000)]
Merge tag 'pci-v6.18-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fixes from Bjorn Helgaas:
- Search for MSI Capability with correct ID to fix an MSI regression on
platforms with Cadence IP (Hans Zhang)
- Revert early bridge resource set up to fix resource assignment
failures that broke at least alpha boot and Snapdragon ath12k WiFi
(Ilpo Järvinen)
- Implement VMD .irq_startup()/.irq_shutdown() to fix IRQ issues that
caused boot crashes and broken devices below VMD (Inochi Amaoto)
- Select CONFIG_SCREEN_INFO on X86 to fix black screen on boot when
SCREEN_INFO not selected (Mario Limonciello)
* tag 'pci-v6.18-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI/VGA: Select SCREEN_INFO on X86
PCI: vmd: Override irq_startup()/irq_shutdown() in vmd_init_dev_msi_info()
PCI: Revert early bridge resource set up
PCI: cadence: Search for MSI Capability with correct ID
Linus Torvalds [Sat, 18 Oct 2025 18:22:07 +0000 (08:22 -1000)]
Merge tag 'cxl-fixes-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull Compute Express Link fixes from Dave Jiang:
"A small collection of CXL fixes. In addition to some misc fixes for
the CXL subsystem, a number of fixes for CXL extended linear cache
support are included to make it functional again.
- Avoid missing port component registers setup due to dport
enumeration failure
- Add check for no entries in cxl_feature_info to address accessing
invalid pointer.
- Use %pa printk format to emit resource_size_t in
validate_region_offset()
CXL extended linear cache support fixes:
- Fix setup of memory resource in cxl_acpi_set_cache_size()
- Set range param for region_res_match_cxl_range() as const
(addresses a compile warning for match_region_by_range() fix)
- Fix match_region_by_range() to use region_res_match_cxl_range()
- Subtract to find an hpa_alias0 in cxl_poison events to correct the
alias math calculation"
* tag 'cxl-fixes-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/trace: Subtract to find an hpa_alias0 in cxl_poison events
cxl/region: Use %pa printk format to emit resource_size_t
cxl: Fix match_region_by_range() to use region_res_match_cxl_range()
cxl: Set range param for region_res_match_cxl_range() as const
cxl/acpi: Fix setup of memory resource in cxl_acpi_set_cache_size()
cxl/features: Add check for no entries in cxl_feature_info
cxl/port: Avoid missing port component registers setup
Linus Torvalds [Sat, 18 Oct 2025 18:00:43 +0000 (08:00 -1000)]
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
- Replace bpf_map_kmalloc_node() with kmalloc_nolock() to fix kmemleak
imbalance in tracking of bpf_async_cb structures (Alexei Starovoitov)
- Make selftests/bpf arg_parsing.c more robust to errors (Andrii
Nakryiko)
- Fix redefinition of 'off' as different kind of symbol when I40E
driver is builtin (Brahmajit Das)
- Do not disable preemption in bpf_test_run (Sahil Chandna)
- Fix memory leak in __lookup_instance error path (Shardul Bankar)
- Ensure test data is flushed to disk before reading it (Xing Guo)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Fix redefinition of 'off' as different kind of symbol
bpf: Do not disable preemption in bpf_test_run().
bpf: Fix memory leak in __lookup_instance error path
selftests: arg_parsing: Ensure data is flushed to disk before reading.
bpf: Replace bpf_map_kmalloc_node() with kmalloc_nolock() to allocate bpf_async_cb structures.
selftests/bpf: make arg_parsing.c more robust to crashes
bpf: test_run: Fix ctx leak in bpf_prog_test_run_xdp error path
Linus Torvalds [Sat, 18 Oct 2025 17:23:59 +0000 (07:23 -1000)]
Merge tag 'exfat-for-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
Pull exfat fixes from Namjae Jeon:
- Fix out-of-bounds in FS_IOC_SETFSLABEL
- Add validation for stream entry size to prevent infinite loop
* tag 'exfat-for-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: fix out-of-bounds in exfat_nls_to_ucs2()
exfat: fix improper check of dentry.stream.valid_size
Linus Torvalds [Sat, 18 Oct 2025 17:18:48 +0000 (07:18 -1000)]
Merge tag 'nfs-for-6.18-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
- Fix for FlexFiles mirror->dss allocation
- Apply delay_retrans to async operations
- Check if suid/sgid is cleared after a write when needed
- Fix setting the state renewal timer for early mounts after a reboot
* tag 'nfs-for-6.18-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFS4: Fix state renewals missing after boot
NFS: check if suid/sgid was cleared after a write as needed
NFS4: Apply delay_retrans to async operations
NFSv4/flexfiles: fix to allocate mirror->dss before use
Linus Torvalds [Sat, 18 Oct 2025 17:11:32 +0000 (07:11 -1000)]
Merge tag '6.18-rc1-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
"smb client fixes, security and smbdirect improvements, and some minor cleanup:
- Important OOB DFS fix
- Fix various potential tcon refcount leaks
- smbdirect (RDMA) fixes (following up from test event a few weeks
ago):
- Fixes to improve and simplify handling of memory lifetime of
smbdirect_mr_io structures, when a connection gets disconnected
- Make sure we really wait to reach SMBDIRECT_SOCKET_DISCONNECTED
before destroying resources
- Make sure the send/recv submission/completion queues are large
enough to avoid ib_post_send() from failing under pressure
- convert cifs.ko to use the recommended crypto libraries (instead of
crypto_shash), this also can improve performance
- Three small cleanup patches"
* tag '6.18-rc1-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: (24 commits)
smb: client: Consolidate cmac(aes) shash allocation
smb: client: Remove obsolete crypto_shash allocations
smb: client: Use HMAC-MD5 library for NTLMv2
smb: client: Use MD5 library for SMB1 signature calculation
smb: client: Use MD5 library for M-F symlink hashing
smb: client: Use HMAC-SHA256 library for SMB2 signature calculation
smb: client: Use HMAC-SHA256 library for key generation
smb: client: Use SHA-512 library for SMB3.1.1 preauth hash
cifs: parse_dfs_referrals: prevent oob on malformed input
smb: client: Fix refcount leak for cifs_sb_tlink
smb: client: let smbd_destroy() wait for SMBDIRECT_SOCKET_DISCONNECTED
smb: move some duplicate definitions to common/cifsglob.h
smb: client: let destroy_mr_list() keep smbdirect_mr_io memory if registered
smb: client: let destroy_mr_list() call ib_dereg_mr() before ib_dma_unmap_sg()
smb: client: call ib_dma_unmap_sg if mr->sgt.nents is not 0
smb: client: improve logic in smbd_deregister_mr()
smb: client: improve logic in smbd_register_mr()
smb: client: improve logic in allocate_mr_list()
smb: client: let destroy_mr_list() remove locked from the list
smb: client: let destroy_mr_list() call list_del(&mr->list)
...
Linus Torvalds [Sat, 18 Oct 2025 17:07:14 +0000 (07:07 -1000)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Fix the handling of ZCR_EL2 in NV VMs
- Pick the correct translation regime when doing a PTW on the back of
a SEA
- Prevent userspace from injecting an event into a vcpu that isn't
initialised yet
- Move timer save/restore to the sysreg handling code, fixing EL2
timer access in the process
- Add FGT-based trapping of MDSCR_EL1 to reduce the overhead of debug
- Fix trapping configuration when the host isn't GICv3
- Improve the detection of HCR_EL2.E2H being RES1
- Drop a spurious 'break' statement in the S1 PTW
- Don't try to access SPE when owned by EL3
Documentation updates:
- Document the failure modes of event injection
- Document that a GICv3 guest can be created on a GICv5 host with
FEAT_GCIE_LEGACY
Selftest improvements:
- Add a selftest for the effective value of HCR_EL2.AMO
- Address build warning in the timer selftest when building with
clang
- Teach irqfd selftests about non-x86 architectures
- Add missing sysregs to the set_id_regs selftest
- Fix vcpu allocation in the vgic_lpi_stress selftest
- Correctly enable interrupts in the vgic_lpi_stress selftest
x86:
- Expand the KVM_PRE_FAULT_MEMORY selftest to add a regression test
for the bug fixed by commit 3ccbf6f47098 ("KVM: x86/mmu: Return
-EAGAIN if userspace deletes/moves memslot during prefault")
- Don't try to get PMU capabilities from perf when running a CPU with
hybrid CPUs/PMUs, as perf will rightly WARN.
guest_memfd:
- Rework KVM_CAP_GUEST_MEMFD_MMAP (newly introduced in 6.18) into a
more generic KVM_CAP_GUEST_MEMFD_FLAGS
- Add a guest_memfd INIT_SHARED flag and require userspace to
explicitly set said flag to initialize memory as SHARED,
irrespective of MMAP.
The behavior merged in 6.18 is that enabling mmap() implicitly
initializes memory as SHARED, which would result in an ABI
collision for x86 CoCo VMs as their memory is currently always
initialized PRIVATE.
- Allow mmap() on guest_memfd for x86 CoCo VMs, i.e. on VMs with
private memory, to enable testing such setups, i.e. to hopefully
flush out any other lurking ABI issues before 6.18 is officially
released.
- Add testcases to the guest_memfd selftest to cover guest_memfd
without MMAP, and host userspace accesses to mmap()'d private
memory"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (46 commits)
arm64: Revamp HCR_EL2.E2H RES1 detection
KVM: arm64: nv: Use FGT write trap of MDSCR_EL1 when available
KVM: arm64: Compute per-vCPU FGTs at vcpu_load()
KVM: arm64: selftests: Fix misleading comment about virtual timer encoding
KVM: arm64: selftests: Add an E2H=0-specific configuration to get_reg_list
KVM: arm64: selftests: Make dependencies on VHE-specific registers explicit
KVM: arm64: Kill leftovers of ad-hoc timer userspace access
KVM: arm64: Fix WFxT handling of nested virt
KVM: arm64: Move CNT*CT_EL0 userspace accessors to generic infrastructure
KVM: arm64: Move CNT*_CVAL_EL0 userspace accessors to generic infrastructure
KVM: arm64: Move CNT*_CTL_EL0 userspace accessors to generic infrastructure
KVM: arm64: Add timer UAPI workaround to sysreg infrastructure
KVM: arm64: Make timer_set_offset() generally accessible
KVM: arm64: Replace timer context vcpu pointer with timer_id
KVM: arm64: Introduce timer_context_to_vcpu() helper
KVM: arm64: Hide CNTHV_*_EL2 from userspace for nVHE guests
Documentation: KVM: Update GICv3 docs for GICv5 hosts
KVM: arm64: gic-v3: Only set ICH_HCR traps for v2-on-v3 or v3 guests
KVM: arm64: selftests: Actually enable IRQs in vgic_lpi_stress
KVM: arm64: selftests: Allocate vcpus with correct size
...
Linus Torvalds [Sat, 18 Oct 2025 17:02:28 +0000 (07:02 -1000)]
Merge tag 'powerpc-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Madhavan Srinivasan:
- Fix to handle NULL pointer dereference at irq domain teardown
- Fix for handling extraction of struct xive_irq_data
- Fix to skip parameter area allocation when fadump disabled
Thanks to Ganesh Goudar, Hari Bathini, Nam Cao, Ritesh Harjani (IBM),
Sourabh Jain, and Venkat Rao Bagalkote,
* tag 'powerpc-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/fadump: skip parameter area allocation when fadump is disabled
powerpc, ocxl: Fix extraction of struct xive_irq_data
powerpc/pseries/msi: Fix NULL pointer dereference at irq domain teardown
Linus Torvalds [Sat, 18 Oct 2025 16:59:25 +0000 (06:59 -1000)]
Merge tag 'slab-for-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fixes from Vlastimil Babka:
- Fixes for two bugs that can be triggered when debugging options are
enabled (Hao Ge, Vlastimil Babka)
* tag 'slab-for-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
slab: reset slab->obj_ext when freeing and it is OBJEXTS_ALLOC_FAIL
slab: fix clearing freelist in free_deferred_objects()