git.ipfire.org Git - thirdparty/kernel/linux.git/log

s390/cmpxchg: Provide arch_cmpxchg128_local()

Just like x86 and arm64 provide a trivial arch_cmpxchg128_local()
implementation by mapping it to arch_cmpxchg128().

Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/cmpxchg: Implement arch_xchg() with arch_try_cmpxchg()

Get rid of the arch_xchg() inline assemblies by converting the inline
assemblies to C functions which make use of arch_try_cmpxchg().

With flag output operand support the generated code is at least as good as
the previous version. Without it is slightly worse, however getting rid of
all the inline assembly code is worth it.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/cmpxchg: Provide arch_try_cmpxchg()

Since gcc 14 flag output operands are supported also for s390.

Provide an arch_try_cmpxchg() implementation so that all existing
try_cmpxchg() variants provide slightly better code, if compiled
with gcc 14 or newer.

Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/cmpxchg: Convert one and two byte case inline assemblies to C

Rewrite __cmpxchg() in order to get rid of the large inline
assemblies. Convert the one and two byte inline assemblies to
C functions.

The generated code of the new implementation is nearly as good or bad as
the old variant, but easier to read.

Note that the new variants are quite close to the generic cmpxchg_emu_u8()
implementation, however a conversion to the generic variant will not follow
since with mm/vmstat.c there is heavy user of one byte cmpxchg(). A not
inlined variant would have a negative performance impact.

Also note that the calls within __arch_cmpxchg() come with rather pointless
"& 0xff..." operations. They exist only to avoid false positive sparse
warnings like "warning: cast truncates bits from constant value ...".

Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/dump: Add firmware sysfs attribute for dump area size

Dump tools from s390-tools such as zipl need to know the correct dump area
size of the machine they run on in order to be able to create valid
standalone dumper images. Therefore, allow it to be obtained through
the new sysfs read-only attribute /sys/firmware/dump/dump_area_size.

Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Suggested-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/con3270: Use NULL instead of 0 for pointers

Get rid of sparse warnings:
CHECK   drivers/s390/char/con3270.c
  drivers/s390/char/con3270.c:531:15: warning: Using plain integer as NULL pointer
  drivers/s390/char/con3270.c:749:15: warning: Using plain integer as NULL pointer

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/pci: Add header guards and includes to internal headers

While pci_iov.h has include guards it doesn't include the necessary
header for struct zpci_dev, pci_bus.h on the other hand lacks even basic
include guards. This isn't only fragile and breaks convention but also
confuses tooling such as clang-analyzer. Add both include guards and the
necessary includes.

Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/uvdevice: Fix and slightly improve kernel-doc comment

Fix incorrect kernel-doc comment style, add missing return statement, fix
incorrect parameter name, and add some additional consistency across all
kernel-doc comments.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/uvdevice: Support longer secret lists

Enable the list IOCTL to provide lists longer than one page (85 entries).
The list IOCTL now accepts any argument length in page granularity.
It fills the argument up to this length with entries until the list
ends. User space unaware of this enhancement will still receive one page
of data and an uv_rc 0x0100.

Signed-off-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20241104153609.1361388-1-seiden@linux.ibm.com
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20241104153609.1361388-1-seiden@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/sparsemem: Provide phys_to_target_node() with CONFIG_NUMA

Add a trival phys_to_target_node() implementation which always returns 0 if
CONFIG_NUMA is enabled, since the s390 NUMA implementation only supports
node 0.
This is similar to memory_add_physaddr_to_nid() in order to avoid runtime
warnings.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/configs: Enable CONFIG_VIRTIO_MEM

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

Merge branch 'virtio-mem' into features

David Hildenbrand says:

====================
virtio-mem: s390 support

Let's finally add s390 support for virtio-mem; my last RFC was sent
4 years ago, and a lot changed in the meantime.

The latest QEMU series is available at [1], which contains some more
details and a usage example on s390 (last patch).

There is not too much in here: The biggest part is querying a new diag(500)
STORAGE_LIMIT hypercall to obtain the proper "max_physmem_end".

The last three patches are not strictly required but certainly nice-to-have.

Note that -- in contrast to standby memory -- virtio-mem memory must be
configured to be automatically onlined as soon as hotplugged. The easiest
approach is using the "memhp_default_state=" kernel parameter or by using
proper udev rules. More details can be found at [2].

I have reviving+upstreaming a systemd service to handle configuring
that on my todo list, but for some reason I keep getting distracted ...

I tested various things, including:
* Various memory hotplug/hotunplug combinations
* Device hotplug/hotunplug
* /proc/iomem output
* reboot
* kexec
* kdump: make sure we properly enter the "kdump mode" in the virtio-mem
   driver

kdump support for virtio-mem memory on s390 will be sent out separately.

v2 -> v3
* "s390/kdump: make is_kdump_kernel() consistently return "true" in kdump
   environments only"
-> Sent out separately [3]
* "s390/physmem_info: query diag500(STORAGE LIMIT) to support QEMU/KVM memory
   devices"
-> No query function for diag500 for now.
-> Update comment above setup_ident_map_size().
-> Optimize/rewrite diag500_storage_limit() [Heiko]
-> Change handling in detect_physmem_online_ranges [Alexander]
-> Improve documentation.
* "s390/sparsemem: provide memory_add_physaddr_to_nid() with CONFIG_NUMA"
-> Added after testing on systems with CONFIG_NUMA=y

v1 -> v2:
* Document the new diag500 subfunction
* Use "s390" instead of "s390x" consistently

[1] https://lkml.kernel.org/r/20241008105455.2302628-1-david@redhat.com
[2] https://virtio-mem.gitlab.io/user-guide/user-guide-linux.html
[3] https://lkml.kernel.org/r/20241023090651.1115507-1-david@redhat.com
====================

Link: https://lore.kernel.org/r/20241025141453.1210600-1-david@redhat.com/
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/sparsemem: Provide memory_add_physaddr_to_nid() with CONFIG_NUMA

virtio-mem uses memory_add_physaddr_to_nid() to determine the NID to use
for memory it adds.

We currently fallback to the dummy implementation in mm/numa.c with
CONFIG_NUMA, which will end up triggering an undesired pr_info_once():

Unknown online node for memory at 0x100000000, assuming node 0

On s390, we map all cpus and memory to node 0, so let's add a simple
memory_add_physaddr_to_nid() implementation that does exactly that,
but without complaining.

Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20241025141453.1210600-8-david@redhat.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/sparsemem: Reduce section size to 128 MiB

Ever since commit 421c175c4d609 ("[S390] Add support for memory hot-add.")
we've been using a section size of 256 MiB on s390 and 32 MiB on s390.
Before that, we were using a section size of 32 MiB on both
architectures.

Now that we have a new mechanism to expose additional memory to a VM --
virtio-mem -- reduce the section size to 128 MiB to allow for more
flexibility and reduce the metadata overhead when dealing with hot(un)plug
granularity smaller than 256 MiB.

128 MiB has been used by x86-64 since the very beginning. arm64 with 4k
base pages switched to 128 MiB as well: it's just big enough on these
architectures to allows for using a huge page (2 MiB) in the vmemmap in
sane setups with sizeof(struct page) == 64 bytes and a huge page mapping
in the direct mapping, while still allowing for small hot(un)plug
granularity.

For s390, we could even switch to a 64 MiB section size, as our huge page
size is 1 MiB: but the smaller the section size, the more sections we'll
have to manage especially on bigger machines. Making it consistent with
x86-64 and arm64 feels like the right thing for now.

Note that the smallest memory hot(un)plug granularity is also limited by
the memory block size, determined by extracting the memory increment
size from SCLP. Under QEMU/KVM, implementing virtio-mem, we expose 0;
therefore, we'll end up with a memory block size of 128 MiB with a
128 MiB section size.

Tested-by: Mario Casquero <mcasquer@redhat.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20241025141453.1210600-7-david@redhat.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

lib/Kconfig.debug: Default STRICT_DEVMEM to "y" on s390

virtio-mem currently depends on !DEVMEM | STRICT_DEVMEM. Let's default
STRICT_DEVMEM to "y" just like we do for arm64 and x86.

There could be ways in the future to filter access to virtio-mem device
memory even without STRICT_DEVMEM, but for now let's just keep it
simple.

Tested-by: Mario Casquero <mcasquer@redhat.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20241025141453.1210600-6-david@redhat.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

virtio-mem: s390 support

Now that s390 code is prepared for memory devices that reside above the
maximum storage increment exposed through SCLP, everything is in place
to unlock virtio-mem support.

As virtio-mem in Linux currently supports logically onlining/offlining
memory in pageblock granularity, we have an effective hot(un)plug
granularity of 1 MiB on s390.

As virito-mem adds/removes individual Linux memory blocks (256MB), we
will currently never use gigantic pages in the identity mapping.

It is worth noting that neither storage keys nor storage attributes (e.g.,
data / nodat) are touched when onlining memory blocks, which is good
because we are not supposed to touch these parts for unplugged device
blocks that are logically offline in Linux.

We will currently never initialize storage keys for virtio-mem
memory -- IOW, storage_key_init_range() is never called. It could be added
in the future when plugging device blocks. But as that function
essentially does nothing without modifying the code (changing
PAGE_DEFAULT_ACC), that's just fine for now.

kexec should work as intended and just like on other architectures that
support virtio-mem: we will never place kexec binaries on virtio-mem
memory, and never indicate virtio-mem memory to the 2nd kernel. The
device driver in the 2nd kernel can simply reset the device --
turning all memory unplugged, to then start plugging memory and adding
them to Linux, without causing trouble because the memory is already
used elsewhere.

The special s390 kdump mode, whereby the 2nd kernel creates the ELF
core header, won't currently dump virtio-mem memory. The virtio-mem
driver has a special kdump mode, from where we can detect memory ranges
to dump. Based on this, support for dumping virtio-mem memory can be
added in the future fairly easily.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20241025141453.1210600-5-david@redhat.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/physmem_info: Query diag500(STORAGE LIMIT) to support QEMU/KVM memory devices

To support memory devices under QEMU/KVM, such as virtio-mem,
we have to prepare our kernel virtual address space accordingly and
have to know the highest possible physical memory address we might see
later: the storage limit. The good old SCLP interface is not suitable for
this use case.

In particular, memory owned by memory devices has no relationship to
storage increments, it is always detected using the device driver, and
unaware OSes (no driver) must never try making use of that memory.
Consequently this memory is located outside of the "maximum storage
increment"-indicated memory range.

Let's use our new diag500 STORAGE_LIMIT subcode to query this storage
limit that can exceed the "maximum storage increment", and use the
existing interfaces (i.e., SCLP) to obtain information about the initial
memory that is not owned+managed by memory devices.

If a hypervisor does not support such memory devices, the address exposed
through diag500 STORAGE_LIMIT will correspond to the maximum storage
increment exposed through SCLP.

To teach kdump on s390 to include memory owned by memory devices, there
will be ways to query the relevant memory ranges from the device via a
driver running in special kdump mode (like virtio-mem already implements
to filter /proc/vmcore access so we don't end up reading from unplugged
device blocks).

Update setup_ident_map_size(), to clarify that there can be more than
just online and standby memory.

Tested-by: Mario Casquero <mcasquer@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20241025141453.1210600-4-david@redhat.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

Documentation: s390-diag.rst: Document diag500(STORAGE LIMIT) subfunction

Let's document our new diag500 subfunction that can be implemented by
userspace.

Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20241025141453.1210600-3-david@redhat.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

Documentation: s390-diag.rst: Make diag500 a generic KVM hypercall

Let's make it a generic KVM hypercall, allowing other subfunctions to
be more independent of virtio.

While at it, document that unsupported/unimplemented subfunctions result
in a SPECIFICATION exception.

This is a preparation for documenting a new subfunction.

Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20241025141453.1210600-2-david@redhat.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/kvm: Mask extra bits from program interrupt code

The program interrupt code has some extra bits that are sometimes set
by hardware for various reasons; those bits should be ignored when the
program interrupt number is needed for interrupt handling.

Fixes: 05066cafa925 ("s390/mm/fault: Handle guest-related program interrupts in KVM")
Reported-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20241031120316.25462-1-imbrenda@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/kvm: Initialize uninitialized flags variable

The flags variable was being used uninitialized.
Initialize it to 0 as expected.

For some reason neither gcc nor clang reported a warning.

Fixes: 05066cafa925 ("s390/mm/fault: Handle guest-related program interrupts in KVM")
Reported-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/r/20241030161906.85476-1-imbrenda@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/cpum_sf: Fix and protect memory allocation of SDBs with mutex

Reservation of the PMU hardware is done at first event creation
and is protected by a pair of mutex_lock() and mutex_unlock().
After reservation of the PMU hardware the memory
required for the PMUs the event is to be installed on is
allocated by allocate_buffers() and alloc_sampling_buffer().
This done outside of the mutex protection.
Without mutex protection two or more concurrent invocations of
perf_event_init() may run in parallel.
This can lead to allocation of Sample Data Blocks (SDBs)
multiple times for the same PMU.
Prevent this and protect memory allocation of SDBs by
mutex.

Fixes: 8a6fe8f21ec4 ("s390/cpum_sf: Use refcount_t instead of atomic_t")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/mm: Cleanup fault error handling

Combine the two VM_FAULT_ERROR checks in do_exception() and move them
to the exit path, similar to x86. Also remove a random blank line.

Suggested-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/mm: Convert to LOCK_MM_AND_FIND_VMA

With the gmap code gone s390 can be easily converted to
LOCK_MM_AND_FIND_VMA like it has been done for most other
architectures.

Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20241022120601.167009-12-imbrenda@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/mm: Get rid of fault type switch statements

With GMAP_FAULT fault type gone, there are only KERNEL_FAULT and
USER_FAULT fault types left. Therefore there is no need for any fault
type switch statements left.

Rename get_fault_type() into is_kernel_fault() and let it return a
boolean value. Change all switch statements to if statements. This
removes quite a bit of code.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20241022120601.167009-11-imbrenda@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/mm: Simplify get_fault_type()

With the gmap code gone get_fault_type() can be simplified:

- every fault with user_mode(regs) == true must be a fault in user address
space

- every fault with user_mode(regs) == false is only a fault in user
address space if the used address space is the secondary address space

- every other fault is within the kernel address space

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20241022120601.167009-10-imbrenda@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390: Remove gmap pointer from lowcore

Remove the gmap pointer from lowcore, since it is not used anymore.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20241022120601.167009-9-imbrenda@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/mm/gmap: Remove gmap_{en,dis}able()

Remove gmap_enable(), gmap_disable(), and gmap_get_enabled() since they do
not have any users anymore.

Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20241022120601.167009-8-imbrenda@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/kvm: Stop using gmap_{en,dis}able()

Stop using gmap_enable(), gmap_disable(), gmap_get_enabled().

The correct guest ASCE is passed as a parameter of sie64a(), there is
no need to save the current gmap in lowcore.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20241022120601.167009-7-imbrenda@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/mm/fault: Handle guest-related program interrupts in KVM

Any program interrupt that happens in the host during the execution of
a KVM guest will now short circuit the fault handler and return to KVM
immediately. Guest fault handling (including pfault) will happen
entirely inside KVM.

When sie64a() returns zero, current->thread.gmap_int_code will contain
the program interrupt number that caused the exit, or zero if the exit
was not caused by a host program interrupt.

KVM will now take care of handling all guest faults in vcpu_post_run().

Since gmap faults will not be visible by the rest of the kernel, remove
GMAP_FAULT, the linux fault handlers for secure execution faults, the
exception table entries for the sie instruction, the nop padding after
the sie instruction, and all other references to guest faults from the
s390 code.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Co-developed-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20241022120601.167009-6-imbrenda@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/mm/gmap: Fix __gmap_fault() return code

Errors in fixup_user_fault() were masked and -EFAULT was returned for
any error, including out of memory.

Fix this by returning the correct error code. This means that in many
cases the error code will be propagated all the way to userspace.

Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20241022120601.167009-5-imbrenda@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/mm/gmap: Refactor gmap_fault() and add support for pfault

When specifying FAULT_FLAG_RETRY_NOWAIT as flag for gmap_fault(), the
gmap fault will be processed only if it can be resolved quickly and
without sleeping. This will be needed for pfault.

Refactor gmap_fault() to improve readability.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20241022120601.167009-4-imbrenda@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/kvm: Remove kvm_arch_fault_in_page()

kvm_arch_fault_in_page() is a useless wrapper around gmap_fault(); just
use gmap_fault() directly instead.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20241022120601.167009-3-imbrenda@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/entry: Remove __GMAP_ASCE and use _PIF_GUEST_FAULT again

Now that the guest ASCE is passed as a parameter to __sie64a(),
_PIF_GUEST_FAULT can be used again to determine whether the fault was a
guest or host fault.

Since the guest ASCE will not be taken from the gmap pointer in lowcore
anymore, __GMAP_ASCE can be removed. For the same reason the guest
ASCE needs now to be saved into the cr1 save area unconditionally.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20241022120601.167009-2-imbrenda@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/cpum_sf: Rework call to sf_disable()

Setup_pmc_cpu() function body consists of one single switch
statement with two cases PMC_INIT and PMC_RELEASE.
In both of these cases sf_disable() is invoked to turn off the
CPU Measurement sampling facility.
Move sf_disable() out of the switch statement.
No functional change.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/crypto: Add hardware acceleration for full AES-XTS mode

Extend the existing paes cipher to exploit the full AES-XTS hardware
acceleration introduced with message-security assist extension 10.

The full AES-XTS mode requires a protected key of type
PKEY_KEYTYPE_AES_XTS_128 or PKEY_KEYTYPE_AES_XTS_256.

Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/crypto: Postpone the key split to key conversion

Store the input key material of paes-xts in a single key_blob
structure. The split of the input key material is postponed to the key
conversion. Split the key material only, if the returned protected
keytype requires a second protected key.

For clear key pairs, prepare a clearkey token for each key and convert
them separately to protected keys. Store the concatenated conversion
results as input key in the context. All other input keys are stored
as is.

Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/crypto: Introduce function for tokenize clearkeys

Move the conversion of a clearkey blob to token into a separate
function.

The functionality of the paes module is not affected by this commit.

Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/crypto: Generalize parameters for key conversion

As a preparation for multiple key tokens in a key_blob structure, use
separate pointer and length parameters for __paes_keyblob2pkey()
instead a pointer to the struct key_blob.

The functionality of the paes module is not affected by this commit.

Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/crypto: Use module-local structures for protected keys

The paes module uses only AES related structures and constants of the
pkey module. As pkey also supports protected keys other than AES keys,
the structures and size constants of the pkey module may be
changed. Use module-local structures and size constants for paes to
prevent any unwanted side effect by such a change.

The struct pkey_protkey is used to store the protected key blob
together with its length and type. The structure is only used locally,
it is not required for any pkey API call. So define the module-local
structure struct paes_protkey instead.

While at it, unify the names of struct paes_protkey variables on
stack.

The functionality of the paes module is not affected by this commit.

Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/crypto: Convert to reverse x-mas tree, rename ret to rc

Reverse x-mas tree order for stack variables in paes module. While at
it, rename stack variables ret to rc.

The functionality of the paes module is not affected by this commit.

Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/pkey: Tolerate larger key blobs

The pkey handlers should only check, if the length of a key blob is big
enough for holding a key. Larger blobs should be tolerated.

Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/cpum_sf: Handle CPU hotplug remove during sampling

CPU hotplug remove handling triggers the following function
call sequence:

   CPUHP_AP_PERF_S390_SF_ONLINE  --> s390_pmu_sf_offline_cpu()
   ...
   CPUHP_AP_PERF_ONLINE          --> perf_event_exit_cpu()

The s390 CPUMF sampling CPU hotplug handler invokes:

s390_pmu_sf_offline_cpu()
+-->  cpusf_pmu_setup()
       +--> setup_pmc_cpu()
            +--> deallocate_buffers()

This function de-allocates all sampling data buffers (SDBs) allocated
for that CPU at event initialization. It also clears the
PMU_F_RESERVED bit. The CPU is gone and can not be sampled.

With the event still being active on the removed CPU, the CPU event
hotplug support in kernel performance subsystem triggers the
following function calls on the removed CPU:

  perf_event_exit_cpu()
  +--> perf_event_exit_cpu_context()
       +--> __perf_event_exit_context()
    +--> __perf_remove_from_context()
         +--> event_sched_out()
              +--> cpumsf_pmu_del()
                   +--> cpumsf_pmu_stop()
                                +--> hw_perf_event_update()

to stop and remove the event. During removal of the event, the
sampling device driver tries to read out the remaining samples from
the sample data buffers (SDBs). But they have already been freed
(and may have been re-assigned). This may lead to a use after free
situation in which case the samples are most likely invalid. In the
best case the memory has not been reassigned and still contains
valid data.

Remedy this situation and check if the CPU is still in reserved
state (bit PMU_F_RESERVED set). In this case the SDBs have not been
released an contain valid data. This is always the case when
the event is removed (and no CPU hotplug off occured).
If the PMU_F_RESERVED bit is not set, the SDB buffers are gone.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/cpum_sf: Fix format string in pr_err()

Fix format string in pr_err() and use the built-in
hexadecimal prefix %#x to display a number with a leading
hexadecimal indicator 0x.
No functional change.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/cpum_sf: Use sf_buffer_available()

Use sf_buffer_available() consistently throughtout the code
to test for the existence of sampling buffer.
No functional change.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/cpum_sf: Consistently use goto out for function exit

When the sampling buffer allocation fails in
__hw_perf_event_init(), jump to the end of the function
and return the result. This is consistent with the other
error handling and return conditions in this function.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Reviewed-By: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/cpum_sf: Do not re-enable event after deletion

Event delete removes an event from the event list, but common
code invokes the PMU's enable function later on. This happens
in event_sched_out() and leads to the following call sequence:

  event_sched_out()
  +--> cpumsf_pmu_del()
  +--> cpumsf_pmu_enable()

In cpumsf_pmu_enable() return immediately when the event is not
active. Also remove an unneeded if clause. That if() statement
is only reached when flag PMU_F_IN_USE has been set in
cpumsf_pmu_add(). And this function also sets cpuhw->event
to a valid value.

Remove WARN_ON_ONCE() statement which never triggered.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

MAINTAINERS: Update and add s390 crypto related entries

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Acked-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/pkey: Add new pkey handler module pkey-uv

This new pkey handler module supports the conversion of
Ultravisor retrievable secrets to protected keys.
The new module pkey-uv.ko is able to retrieve and verify
protected keys backed up by the Ultravisor layer which is
only available within protected execution environment.

The module is only automatically loaded if there is the
UV CPU feature flagged as available. Additionally on module
init there is a check for protected execution environment
and for UV supporting retrievable secrets. Also if the kernel
is not running as a protected execution guest, the module
unloads itself with errno ENODEV.

The pkey UV module currently supports these Ultravisor
secrets and is able to retrieve a protected key for these
UV secret types:
  - UV_SECRET_AES_128
  - UV_SECRET_AES_192
  - UV_SECRET_AES_256
  - UV_SECRET_AES_XTS_128
  - UV_SECRET_AES_XTS_256
  - UV_SECRET_HMAC_SHA_256
  - UV_SECRET_HMAC_SHA_512
  - UV_SECRET_ECDSA_P256
  - UV_SECRET_ECDSA_P384
  - UV_SECRET_ECDSA_P521
  - UV_SECRET_ECDSA_ED25519
  - UV_SECRET_ECDSA_ED448

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/pkey: Build module name array selectively based on kernel config options

There is a static array of pkey handler kernel module names
used in case the pkey_handler_request_modules() is invoked.
This static array is walked through and if the module is not
already loaded a module_request() is performed.

This patch reworks the code to instead of unconditionally
building up a list of module names into the array, only the
pkey handler modules available based on the current kernel
config options are inserted.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/pkey: Fix checkpatch findings in pkey header file

Fix all the complains from checkpatch for the pkey header file:

CHECK: No space is necessary after a cast
+ PKEY_TYPE_CCA_DATA = (__u32) 1,

CHECK: Please use a blank line after function/struct/union/enum declarations
+};
+#define PKEY_GENSECK _IOWR(PKEY_IOCTL_MAGIC, 0x01, struct pkey_genseck)

Suggested-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/pkey: Rework pkey verify for protected keys

Rework the verification of protected keys by simple check
for the correct AES wrapping key verification pattern.

A protected key always carries the AES wrapping key
verification pattern within the blob. The old code really
used the protected key for an en/decrypt operation and by
doing so, verified the AES WK VP. But a much simpler and
more generic way is to extract the AES WK VP value from the
key and compare it with AES WK VP from a freshly created
dummy protected key. This also eliminates the limitation to
only be able to verify AES protected keys. With this change
any kind of known protected key can be verified.

Suggested-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/pkey: Simplify protected key length calculation code

The calculation of the length of a protected key based on
the protected key type is scattered over certain places within
the pkey code. By introducing a new inline function
pkey_keytype_to_size() this can be centralized and the calling
code can be reduced and simplified.

With this also comes a slight rework of the generation of
protected keys. Now the pkey_pckmo module is able to generate
all but ECC keys.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/zcrypt: Cleanup include zcrypt_api.h

Move include statement for zcrypt_api.h from the
codefiles to the zcrypt_ccamis.h header file.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Suggested-by: Holger Dengler <dengler@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/uv: Retrieve UV secrets sysfs support

Reflect the updated content in the query information UVC to the sysfs at
/sys/firmware/query

* new UV-query sysfs entry for the maximum number of retrievable
  secrets the UV can store for one secure guest.
* new UV-query sysfs entry for the maximum number of association
  secrets the UV can store for one secure guest.
* max_secrets contains the sum of max association and max retrievable
  secrets.

Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Signed-off-by: Steffen Eiden <seiden@linux.ibm.com>
Link: https://lore.kernel.org/r/20241024062638.1465970-7-seiden@linux.ibm.com
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/uvdevice: Increase indent in IOCTL definitions

Increase the indentations in the IOCTL defines so that we will not have
problems with upcoming, longer constant names.
While at it, fix a minor typo.

Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Signed-off-by: Steffen Eiden <seiden@linux.ibm.com>
Link: https://lore.kernel.org/r/20241024062638.1465970-5-seiden@linux.ibm.com
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/uvdevice: Add Retrieve Secret IOCTL

Add a new IOCL number to support the new Retrieve Secret UVC for
user-space.
User-space provides the index of the secret (u16) to retrieve.
The uvdevice calls the Retrieve Secret UVC and copies the secret into
the provided buffer if it fits. To get the secret type, index, and size
user-space needs to call the List UVC first.

Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Steffen Eiden <seiden@linux.ibm.com>
Link: https://lore.kernel.org/r/20241024062638.1465970-4-seiden@linux.ibm.com
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/uv: Retrieve UV secrets support

Provide a kernel API to retrieve secrets from the UV secret store.
Add two new functions:
* `uv_get_secret_metadata` - get metadata for a given secret identifier
* `uv_retrieve_secret` - get the secret value for the secret index

With those two functions one can extract the secret for a given secret
id, if the secret is retrievable.

Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Signed-off-by: Steffen Eiden <seiden@linux.ibm.com>
Link: https://lore.kernel.org/r/20241024084107.2418186-1-seiden@linux.ibm.com
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/uv: Use a constant for more-data rc

Add a define for the UVC rc 0x0100 that indicates that a UV-call was
successful but may serve more data if called with a larger buffer
again.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Signed-off-by: Steffen Eiden <seiden@linux.ibm.com>
Link: https://lore.kernel.org/r/20241024062638.1465970-2-seiden@linux.ibm.com
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/uv: Provide host-key hashes in sysfs

Utilize the new Query Ultravisor Keys UVC to give user space the
information which host-keys are installed on the system.

Create a new sysfs directory 'firmware/uv/keys' that contains the hash
of the host-key and the backup host-key of that system. Additionally,
the file 'all' contains the response from the UVC possibly containing
more key-hashes than currently known.

Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Steffen Eiden <seiden@linux.ibm.com>
Link: https://lore.kernel.org/r/20241023075529.2561384-1-seiden@linux.ibm.com
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/uv: Refactor uv-sysfs creation

Streamline the sysfs generation to make it more extensible.
Add a function to create a sysfs entry in the uv-sysfs dir.
Use this function for the query directory.

Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Signed-off-by: Steffen Eiden <seiden@linux.ibm.com>
Link: https://lore.kernel.org/r/20241015113940.3088249-2-seiden@linux.ibm.com
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/netiucv: Switch over to sysfs_emit()

Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the netiucv code.

Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/vfio-ap: Switch over to sysfs_emit()

Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the vfio_ap code.

Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Tested-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/vmur: Switch over to sysfs_emit()

Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the char/vmur code.

Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/sclp_cpi: Switch over to sysfs_emit()

Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the char/sclp_cpi_sys code.

Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/sclp_ocf: Switch over to sysfs_emit()

Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the char/sclp_ocf code.

Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/vmlogrdr: Switch over to sysfs_emit()

Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the char/vmlogrdr code.

Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/tape: Switch over to sysfs_emit()

Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the char/tape_core code.

Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Reviewed-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/dcssblk: Switch over to sysfs_emit()

Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the block/dcssblk code.

Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/cio/scm: Switch over to sysfs_emit()

Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the cio/scm code.

Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Acked-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/cio/css: Switch over to sysfs_emit()

Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the cio/css code.

Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Acked-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Tested-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/cio/ccwgroup: Switch over to sysfs_emit()

Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the cio/ccwgroup code.

Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Acked-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Tested-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/cio/cmf: Switch over to sysfs_emit()

Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the cio/cmf code.

Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/cio/device: Switch over to sysfs_emit()

Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the cio/device code.

Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Acked-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Tested-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/cio/chp: Switch over to sysfs_emit()

Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the cio/chp code.

Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Tested-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Acked-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

scsi: zfcp: Switch over to sysfs_emit()

Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the zfcp code.

Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Tested-by: Benjamin Block <bblock@linux.ibm.com>
Acked-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/crypto: Switch over to sysfs_emit()

Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the s390/crypto code.

Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/ipl: Switch over to sysfs_emit()

Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred over
sprintf for presenting attributes to user space. Convert the left-over
uses in the s390/ipl code.

Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/nospec: Switch over to sysfs_emit()

Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred over
sprintf for presenting attributes to user space. Convert the left-over
uses in the s390/nospec-sysfs code.

Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/perf_event: Switch over to sysfs_emit()

Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred over
sprintf for presenting attributes to user space. Convert the left-over
uses in the s390/perf_event code.

Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/smp: Switch over to sysfs_emit()

Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred over
sprintf for presenting attributes to user space. Convert the left-over
uses in the s390/smp code.

Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/time: Switch over to sysfs_emit()

Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred over
sprintf for presenting attributes to user space. Convert the left-over
uses in the s390/time code.

Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/topology: Switch over to sysfs_emit()

Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred over
sprintf for presenting attributes to user space. Convert the left-over
uses in the s390/topology code.

Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/kdump: Provide is_kdump_kernel() implementation

s390 sets "elfcorehdr_addr = ELFCORE_ADDR_MAX;" early during
setup_arch() to deactivate the "elfcorehdr= kernel" parameter,
resulting in is_kdump_kernel() returning "false".

During vmcore_init()->elfcorehdr_alloc(), if on a dump kernel and
allocation succeeded, elfcorehdr_addr will be set to a valid address
and is_kdump_kernel() will consequently return "true".

is_kdump_kernel() should return a consistent result during all boot
stages, and properly return "true" if in a kdump environment - just
like it is done on powerpc where "false" is indicated in fadump
environments, as added in commit b098f1c32365 ("powerpc/fadump: make
is_kdump_kernel() return false when fadump is active").

Similarly provide a custom is_kdump_kernel() implementation that will only
return "true" in kdump environments, and will do so consistently during
boot.

Update the documentation of dump_available().

Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Link: https://lore.kernel.org/r/20241023090651.1115507-1-david@redhat.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/pageattr: Implement missing kernel_page_present()

kernel_page_present() was intentionally not implemented when adding
ARCH_HAS_SET_DIRECT_MAP support, since it was only used for suspend/resume
which is not supported anymore on s390.

A new bpf use case led to a compile error specific to s390. Even though
this specific use case went away implement kernel_page_present(), so that
the API is complete and potential future users won't run into this problem.

Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Closes: https://lore.kernel.org/all/045de961-ac69-40cc-b141-ab70ec9377ec@iogearbox.net
Fixes: 0490d6d7ba0a ("s390/mm: enable ARCH_HAS_SET_DIRECT_MAP")
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390: Fix various typos

Run codespell on arch/s390 and drivers/s390 and fix all typos.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/cio: Do not unregister the subchannel based on DNV

Starting with commit 2297791c92d0 ("s390/cio: dont unregister
subchannel from child-drivers"), CIO does not unregister subchannels
when the attached device is invalid or unavailable. Instead, it
allows subchannels to exist without a connected device. However, if
the DNV value is 0, such as, when all the CHPIDs of a subchannel are
configured in standby state, the subchannel is unregistered, which
contradicts the current subchannel specification.

Update the logic so that subchannels are not unregistered based
on the DNV value. Also update the SCHIB information even if the
DNV bit is zero.

Suggested-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Fixes: 2297791c92d0 ("s390/cio: dont unregister subchannel from child-drivers")
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/pci: Align prototypes of zpci IO memcpy functions

The generic memcpy_{from,to}io and memset_io functions have a different
prototype than the zpci_memcpy_{from,to}io and zpci_memset_io functions.
But in driver code zpci functions are used as IO memcpy directly. So,
align their prototypes.

Signed-off-by: Julian Vetter <jvetter@kalrayinc.com>
Link: https://lore.kernel.org/r/20241010130100.710005-2-jvetter@kalrayinc.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/pci: Expose FIDPARM attribute in sysfs

This attribute will be used to communicate function type specific
firmware controlled flag bits.

Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/cpum_cf: Correct typo CYLCE

Signed-off-by: Antonia Jonas <antonia@toertel.de>
Link: https://lore.kernel.org/r/20241003115648.26188-1-antonia@toertel.de
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/cio: Correct some typos in comments

Fixed some confusing typos that were currently identified with codespell,
the details are as follows:

-in the code comments:
drivers/s390/cio/chsc.c:379: EBCIDC ==> EBCDIC
drivers/s390/cio/cio.h:22: sublass ==> subclass
drivers/s390/cio/cmf.c:49: exended ==> extended
drivers/s390/cio/cmf.c:138: sinlge ==> single
drivers/s390/cio/cmf.c:1230: Reenable ==> Re-enable

Signed-off-by: Shen Lichuan <shenlichuan@vivo.com>
Link: https://lore.kernel.org/r/20240929080353.11690-1-shenlichuan@vivo.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/sclp: Allow user-space to provide PCI reports for optical modules

The new SCLP action qualifier 3 is used by user-space code to provide
optical module monitoring data to the platform.

Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/virtio_ccw: Fix dma_parm pointer not set up

At least since commit 334304ac2bac ("dma-mapping: don't return errors
from dma_set_max_seg_size") setting up device.dma_parms is basically
mandated by the DMA API. As of now Channel (CCW) I/O in general does not
utilize the DMA API, except for virtio. For virtio-ccw however the
common virtio DMA infrastructure is such that most of the DMA stuff
hinges on the virtio parent device, which is a CCW device.

So lets set up the dma_parms pointer for the CCW parent device and hope
for the best!

Fixes: 334304ac2bac ("dma-mapping: don't return errors from dma_set_max_seg_size")
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Tested-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Link: https://lore.kernel.org/r/20241007201030.204028-1-pasic@linux.ibm.com
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/facilities: Fix warning about shadow of global variable

Compiling the kernel with clang W=2 produces a warning that the
parameter declarations in some routines would shadow the definition of
the global variable stfle_fac_list. Address this warning by renaming the
parameters to fac_list.

Fixes: 17e89e1340a3 ("s390/facilities: move stfl information from lowcore to global data")
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/pci: Switch over to sysfs_emit

Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred over
sprintf for presenting attributes to user space. Convert over two
left-over uses in the s390 PCI code.

Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/cpum_sf: Set bit PMU_F_ENABLED enabled after lpp() invocation

Set PMU enabled bit after lpp() call. This ensures the proper
task PID is loaded by the llp instruction into Program Parameter
register from where it is copied to each sampling data buffer
(SDB) sample entry after the sampling was enabled.

The barrier() instruction is not needed as cpumsf_pmu_enable()
changes a CPU specific variable. Only the CPU that task is
running on changes structure members.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/pci: Ignore RID for isolated VFs

Ensure that VFs used in isolation, that is with their parent PF not
visible to the configuration but with their RID exposed, are treated
compatibly with existing isolated VF use cases without exposed RID
including RoCE Express VFs. This allows creating configurations where
one LPAR manages PFs while their child VFs are used by other LPARs. This
gives the LPAR managing the PFs a role analogous to that of the
hypervisor in a typical use case of passing child VFs to guests.

Instead of creating a multifunction struct zpci_bus whenever a PCI
function with RID exposed is discovered only create such a bus for
configured physical functions and only consider multifunction busses
when searching for an existing bus. Additionally only set zdev->devfn to
the devfn part of the RID once the function is added to a multifunction
bus.

This also fixes probing of more than 7 such isolated VFs from the same
physical bus. This is because common PCI code in pci_scan_slot() only
looks for more functions when pdev->multifunction is set which somewhat
counter intutively is not the case for VFs.

Note that PFs are looked at before their child VFs is guaranteed because
we sort the zpci_list by RID ascending.

Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/pci: Use topology ID for multi-function devices

The newly introduced topology ID (TID) field in the CLP Query PCI
Function explicitly identifies groups of PCI functions whose RIDs belong
to the same (sub-)topology. When available use the TID instead of the
PCHID to match zPCI busses/domains for multi-function devices. Note that
currently only a single PCI bus per TID is supported. This change is
required because in future machines the PCHID will not identify a PCI
card but a specific port in the case of some multi-port NICs while from
a PCI point of view the entire card is a subtopology.

Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/pci: Sort PCI functions prior to creating virtual busses

Instead of relying on the observed but not architected firmware behavior
that PCI functions from the same card are listed in ascending RID order
in clp_list_pci() ensure this by sorting. To allow for sorting separate
the initial clp_list_pci() and creation of the virtual PCI busses.

Note that fundamentally in our per-PCI function hotplug design non RID
order of discovery is still possible. For example when the two PFs of
a two port NIC are hotplugged after initial boot and in descending RID
order. In this case the virtual PCI bus would be created by the second
PF using that PF's UID as domain number instead of that of the first PF.
Thus the domain number would then change from the UID of the second PF
to that of the first PF on reboot but there is really nothing we can do
about that since changing domain numbers at runtime seems even worse.
This only impacts the domain number as the RIDs are consistent and thus
even with just the second PF visible it will show up in the correct
position on the virtual bus.

Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

Linux 6.12-rc2