Heiko Carstens [Wed, 6 Nov 2024 10:03:08 +0000 (11:03 +0100)]
s390/cmpxchg: Implement arch_xchg() with arch_try_cmpxchg()
Get rid of the arch_xchg() inline assemblies by converting the inline
assemblies to C functions which make use of arch_try_cmpxchg().
With flag output operand support the generated code is at least as good as
the previous version. Without it is slightly worse, however getting rid of
all the inline assembly code is worth it.
Heiko Carstens [Wed, 6 Nov 2024 10:03:07 +0000 (11:03 +0100)]
s390/cmpxchg: Provide arch_try_cmpxchg()
Since gcc 14 flag output operands are supported also for s390.
Provide an arch_try_cmpxchg() implementation so that all existing
try_cmpxchg() variants provide slightly better code, if compiled
with gcc 14 or newer.
Heiko Carstens [Wed, 6 Nov 2024 10:03:06 +0000 (11:03 +0100)]
s390/cmpxchg: Convert one and two byte case inline assemblies to C
Rewrite __cmpxchg() in order to get rid of the large inline
assemblies. Convert the one and two byte inline assemblies to
C functions.
The generated code of the new implementation is nearly as good or bad as
the old variant, but easier to read.
Note that the new variants are quite close to the generic cmpxchg_emu_u8()
implementation, however a conversion to the generic variant will not follow
since with mm/vmstat.c there is heavy user of one byte cmpxchg(). A not
inlined variant would have a negative performance impact.
Also note that the calls within __arch_cmpxchg() come with rather pointless
"& 0xff..." operations. They exist only to avoid false positive sparse
warnings like "warning: cast truncates bits from constant value ...".
s390/dump: Add firmware sysfs attribute for dump area size
Dump tools from s390-tools such as zipl need to know the correct dump area
size of the machine they run on in order to be able to create valid
standalone dumper images. Therefore, allow it to be obtained through
the new sysfs read-only attribute /sys/firmware/dump/dump_area_size.
Suggested-by: Heiko Carstens <hca@linux.ibm.com> Suggested-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Thu, 7 Nov 2024 16:18:44 +0000 (17:18 +0100)]
s390/con3270: Use NULL instead of 0 for pointers
Get rid of sparse warnings:
CHECK drivers/s390/char/con3270.c
drivers/s390/char/con3270.c:531:15: warning: Using plain integer as NULL pointer
drivers/s390/char/con3270.c:749:15: warning: Using plain integer as NULL pointer
Niklas Schnelle [Tue, 5 Nov 2024 16:28:16 +0000 (17:28 +0100)]
s390/pci: Add header guards and includes to internal headers
While pci_iov.h has include guards it doesn't include the necessary
header for struct zpci_dev, pci_bus.h on the other hand lacks even basic
include guards. This isn't only fragile and breaks convention but also
confuses tooling such as clang-analyzer. Add both include guards and the
necessary includes.
Steffen Eiden [Mon, 4 Nov 2024 15:36:09 +0000 (16:36 +0100)]
s390/uvdevice: Support longer secret lists
Enable the list IOCTL to provide lists longer than one page (85 entries).
The list IOCTL now accepts any argument length in page granularity.
It fills the argument up to this length with entries until the list
ends. User space unaware of this enhancement will still receive one page
of data and an uv_rc 0x0100.
Heiko Carstens [Mon, 4 Nov 2024 10:03:41 +0000 (11:03 +0100)]
s390/sparsemem: Provide phys_to_target_node() with CONFIG_NUMA
Add a trival phys_to_target_node() implementation which always returns 0 if
CONFIG_NUMA is enabled, since the s390 NUMA implementation only supports
node 0.
This is similar to memory_add_physaddr_to_nid() in order to avoid runtime
warnings.
Heiko Carstens [Thu, 7 Nov 2024 09:28:10 +0000 (10:28 +0100)]
Merge branch 'virtio-mem' into features
David Hildenbrand says:
====================
virtio-mem: s390 support
Let's finally add s390 support for virtio-mem; my last RFC was sent
4 years ago, and a lot changed in the meantime.
The latest QEMU series is available at [1], which contains some more
details and a usage example on s390 (last patch).
There is not too much in here: The biggest part is querying a new diag(500)
STORAGE_LIMIT hypercall to obtain the proper "max_physmem_end".
The last three patches are not strictly required but certainly nice-to-have.
Note that -- in contrast to standby memory -- virtio-mem memory must be
configured to be automatically onlined as soon as hotplugged. The easiest
approach is using the "memhp_default_state=" kernel parameter or by using
proper udev rules. More details can be found at [2].
I have reviving+upstreaming a systemd service to handle configuring
that on my todo list, but for some reason I keep getting distracted ...
I tested various things, including:
* Various memory hotplug/hotunplug combinations
* Device hotplug/hotunplug
* /proc/iomem output
* reboot
* kexec
* kdump: make sure we properly enter the "kdump mode" in the virtio-mem
driver
kdump support for virtio-mem memory on s390 will be sent out separately.
v2 -> v3
* "s390/kdump: make is_kdump_kernel() consistently return "true" in kdump
environments only"
-> Sent out separately [3]
* "s390/physmem_info: query diag500(STORAGE LIMIT) to support QEMU/KVM memory
devices"
-> No query function for diag500 for now.
-> Update comment above setup_ident_map_size().
-> Optimize/rewrite diag500_storage_limit() [Heiko]
-> Change handling in detect_physmem_online_ranges [Alexander]
-> Improve documentation.
* "s390/sparsemem: provide memory_add_physaddr_to_nid() with CONFIG_NUMA"
-> Added after testing on systems with CONFIG_NUMA=y
v1 -> v2:
* Document the new diag500 subfunction
* Use "s390" instead of "s390x" consistently
s390/sparsemem: Provide memory_add_physaddr_to_nid() with CONFIG_NUMA
virtio-mem uses memory_add_physaddr_to_nid() to determine the NID to use
for memory it adds.
We currently fallback to the dummy implementation in mm/numa.c with
CONFIG_NUMA, which will end up triggering an undesired pr_info_once():
Unknown online node for memory at 0x100000000, assuming node 0
On s390, we map all cpus and memory to node 0, so let's add a simple
memory_add_physaddr_to_nid() implementation that does exactly that,
but without complaining.
Ever since commit 421c175c4d609 ("[S390] Add support for memory hot-add.")
we've been using a section size of 256 MiB on s390 and 32 MiB on s390.
Before that, we were using a section size of 32 MiB on both
architectures.
Now that we have a new mechanism to expose additional memory to a VM --
virtio-mem -- reduce the section size to 128 MiB to allow for more
flexibility and reduce the metadata overhead when dealing with hot(un)plug
granularity smaller than 256 MiB.
128 MiB has been used by x86-64 since the very beginning. arm64 with 4k
base pages switched to 128 MiB as well: it's just big enough on these
architectures to allows for using a huge page (2 MiB) in the vmemmap in
sane setups with sizeof(struct page) == 64 bytes and a huge page mapping
in the direct mapping, while still allowing for small hot(un)plug
granularity.
For s390, we could even switch to a 64 MiB section size, as our huge page
size is 1 MiB: but the smaller the section size, the more sections we'll
have to manage especially on bigger machines. Making it consistent with
x86-64 and arm64 feels like the right thing for now.
Note that the smallest memory hot(un)plug granularity is also limited by
the memory block size, determined by extracting the memory increment
size from SCLP. Under QEMU/KVM, implementing virtio-mem, we expose 0;
therefore, we'll end up with a memory block size of 128 MiB with a
128 MiB section size.
Tested-by: Mario Casquero <mcasquer@redhat.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: David Hildenbrand <david@redhat.com> Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Link: https://lore.kernel.org/r/20241025141453.1210600-7-david@redhat.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Now that s390 code is prepared for memory devices that reside above the
maximum storage increment exposed through SCLP, everything is in place
to unlock virtio-mem support.
As virtio-mem in Linux currently supports logically onlining/offlining
memory in pageblock granularity, we have an effective hot(un)plug
granularity of 1 MiB on s390.
As virito-mem adds/removes individual Linux memory blocks (256MB), we
will currently never use gigantic pages in the identity mapping.
It is worth noting that neither storage keys nor storage attributes (e.g.,
data / nodat) are touched when onlining memory blocks, which is good
because we are not supposed to touch these parts for unplugged device
blocks that are logically offline in Linux.
We will currently never initialize storage keys for virtio-mem
memory -- IOW, storage_key_init_range() is never called. It could be added
in the future when plugging device blocks. But as that function
essentially does nothing without modifying the code (changing
PAGE_DEFAULT_ACC), that's just fine for now.
kexec should work as intended and just like on other architectures that
support virtio-mem: we will never place kexec binaries on virtio-mem
memory, and never indicate virtio-mem memory to the 2nd kernel. The
device driver in the 2nd kernel can simply reset the device --
turning all memory unplugged, to then start plugging memory and adding
them to Linux, without causing trouble because the memory is already
used elsewhere.
The special s390 kdump mode, whereby the 2nd kernel creates the ELF
core header, won't currently dump virtio-mem memory. The virtio-mem
driver has a special kdump mode, from where we can detect memory ranges
to dump. Based on this, support for dumping virtio-mem memory can be
added in the future fairly easily.
Acked-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Mario Casquero <mcasquer@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Link: https://lore.kernel.org/r/20241025141453.1210600-5-david@redhat.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
s390/physmem_info: Query diag500(STORAGE LIMIT) to support QEMU/KVM memory devices
To support memory devices under QEMU/KVM, such as virtio-mem,
we have to prepare our kernel virtual address space accordingly and
have to know the highest possible physical memory address we might see
later: the storage limit. The good old SCLP interface is not suitable for
this use case.
In particular, memory owned by memory devices has no relationship to
storage increments, it is always detected using the device driver, and
unaware OSes (no driver) must never try making use of that memory.
Consequently this memory is located outside of the "maximum storage
increment"-indicated memory range.
Let's use our new diag500 STORAGE_LIMIT subcode to query this storage
limit that can exceed the "maximum storage increment", and use the
existing interfaces (i.e., SCLP) to obtain information about the initial
memory that is not owned+managed by memory devices.
If a hypervisor does not support such memory devices, the address exposed
through diag500 STORAGE_LIMIT will correspond to the maximum storage
increment exposed through SCLP.
To teach kdump on s390 to include memory owned by memory devices, there
will be ways to query the relevant memory ranges from the device via a
driver running in special kdump mode (like virtio-mem already implements
to filter /proc/vmcore access so we don't end up reading from unplugged
device blocks).
Update setup_ident_map_size(), to clarify that there can be more than
just online and standby memory.
Tested-by: Mario Casquero <mcasquer@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Link: https://lore.kernel.org/r/20241025141453.1210600-4-david@redhat.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Claudio Imbrenda [Thu, 31 Oct 2024 12:03:16 +0000 (13:03 +0100)]
s390/kvm: Mask extra bits from program interrupt code
The program interrupt code has some extra bits that are sometimes set
by hardware for various reasons; those bits should be ignored when the
program interrupt number is needed for interrupt handling.
Fixes: 05066cafa925 ("s390/mm/fault: Handle guest-related program interrupts in KVM") Reported-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com> Link: https://lore.kernel.org/r/20241031120316.25462-1-imbrenda@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thomas Richter [Wed, 30 Oct 2024 11:37:18 +0000 (12:37 +0100)]
s390/cpum_sf: Fix and protect memory allocation of SDBs with mutex
Reservation of the PMU hardware is done at first event creation
and is protected by a pair of mutex_lock() and mutex_unlock().
After reservation of the PMU hardware the memory
required for the PMUs the event is to be installed on is
allocated by allocate_buffers() and alloc_sampling_buffer().
This done outside of the mutex protection.
Without mutex protection two or more concurrent invocations of
perf_event_init() may run in parallel.
This can lead to allocation of Sample Data Blocks (SDBs)
multiple times for the same PMU.
Prevent this and protect memory allocation of SDBs by
mutex.
Fixes: 8a6fe8f21ec4 ("s390/cpum_sf: Use refcount_t instead of atomic_t") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Tue, 22 Oct 2024 12:06:00 +0000 (14:06 +0200)]
s390/mm: Get rid of fault type switch statements
With GMAP_FAULT fault type gone, there are only KERNEL_FAULT and
USER_FAULT fault types left. Therefore there is no need for any fault
type switch statements left.
Rename get_fault_type() into is_kernel_fault() and let it return a
boolean value. Change all switch statements to if statements. This
removes quite a bit of code.
Claudio Imbrenda [Tue, 22 Oct 2024 12:05:55 +0000 (14:05 +0200)]
s390/mm/fault: Handle guest-related program interrupts in KVM
Any program interrupt that happens in the host during the execution of
a KVM guest will now short circuit the fault handler and return to KVM
immediately. Guest fault handling (including pfault) will happen
entirely inside KVM.
When sie64a() returns zero, current->thread.gmap_int_code will contain
the program interrupt number that caused the exit, or zero if the exit
was not caused by a host program interrupt.
KVM will now take care of handling all guest faults in vcpu_post_run().
Since gmap faults will not be visible by the rest of the kernel, remove
GMAP_FAULT, the linux fault handlers for secure execution faults, the
exception table entries for the sie instruction, the nop padding after
the sie instruction, and all other references to guest faults from the
s390 code.
Claudio Imbrenda [Tue, 22 Oct 2024 12:05:53 +0000 (14:05 +0200)]
s390/mm/gmap: Refactor gmap_fault() and add support for pfault
When specifying FAULT_FLAG_RETRY_NOWAIT as flag for gmap_fault(), the
gmap fault will be processed only if it can be resolved quickly and
without sleeping. This will be needed for pfault.
Claudio Imbrenda [Tue, 22 Oct 2024 12:05:51 +0000 (14:05 +0200)]
s390/entry: Remove __GMAP_ASCE and use _PIF_GUEST_FAULT again
Now that the guest ASCE is passed as a parameter to __sie64a(),
_PIF_GUEST_FAULT can be used again to determine whether the fault was a
guest or host fault.
Since the guest ASCE will not be taken from the gmap pointer in lowcore
anymore, __GMAP_ASCE can be removed. For the same reason the guest
ASCE needs now to be saved into the cr1 save area unconditionally.
Thomas Richter [Tue, 29 Oct 2024 07:09:15 +0000 (08:09 +0100)]
s390/cpum_sf: Rework call to sf_disable()
Setup_pmc_cpu() function body consists of one single switch
statement with two cases PMC_INIT and PMC_RELEASE.
In both of these cases sf_disable() is invoked to turn off the
CPU Measurement sampling facility.
Move sf_disable() out of the switch statement.
No functional change.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Holger Dengler [Fri, 25 Oct 2024 15:12:48 +0000 (17:12 +0200)]
s390/crypto: Postpone the key split to key conversion
Store the input key material of paes-xts in a single key_blob
structure. The split of the input key material is postponed to the key
conversion. Split the key material only, if the returned protected
keytype requires a second protected key.
For clear key pairs, prepare a clearkey token for each key and convert
them separately to protected keys. Store the concatenated conversion
results as input key in the context. All other input keys are stored
as is.
Holger Dengler [Fri, 25 Oct 2024 15:12:46 +0000 (17:12 +0200)]
s390/crypto: Generalize parameters for key conversion
As a preparation for multiple key tokens in a key_blob structure, use
separate pointer and length parameters for __paes_keyblob2pkey()
instead a pointer to the struct key_blob.
The functionality of the paes module is not affected by this commit.
Holger Dengler [Fri, 25 Oct 2024 15:12:45 +0000 (17:12 +0200)]
s390/crypto: Use module-local structures for protected keys
The paes module uses only AES related structures and constants of the
pkey module. As pkey also supports protected keys other than AES keys,
the structures and size constants of the pkey module may be
changed. Use module-local structures and size constants for paes to
prevent any unwanted side effect by such a change.
The struct pkey_protkey is used to store the protected key blob
together with its length and type. The structure is only used locally,
it is not required for any pkey API call. So define the module-local
structure struct paes_protkey instead.
While at it, unify the names of struct paes_protkey variables on
stack.
The functionality of the paes module is not affected by this commit.
This function de-allocates all sampling data buffers (SDBs) allocated
for that CPU at event initialization. It also clears the
PMU_F_RESERVED bit. The CPU is gone and can not be sampled.
With the event still being active on the removed CPU, the CPU event
hotplug support in kernel performance subsystem triggers the
following function calls on the removed CPU:
to stop and remove the event. During removal of the event, the
sampling device driver tries to read out the remaining samples from
the sample data buffers (SDBs). But they have already been freed
(and may have been re-assigned). This may lead to a use after free
situation in which case the samples are most likely invalid. In the
best case the memory has not been reassigned and still contains
valid data.
Remedy this situation and check if the CPU is still in reserved
state (bit PMU_F_RESERVED set). In this case the SDBs have not been
released an contain valid data. This is always the case when
the event is removed (and no CPU hotplug off occured).
If the PMU_F_RESERVED bit is not set, the SDB buffers are gone.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thomas Richter [Thu, 24 Oct 2024 11:33:56 +0000 (13:33 +0200)]
s390/cpum_sf: Fix format string in pr_err()
Fix format string in pr_err() and use the built-in
hexadecimal prefix %#x to display a number with a leading
hexadecimal indicator 0x.
No functional change.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thomas Richter [Thu, 24 Oct 2024 11:33:54 +0000 (13:33 +0200)]
s390/cpum_sf: Consistently use goto out for function exit
When the sampling buffer allocation fails in
__hw_perf_event_init(), jump to the end of the function
and return the result. This is consistent with the other
error handling and return conditions in this function.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Reviewed-By: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thomas Richter [Tue, 8 Oct 2024 12:51:10 +0000 (14:51 +0200)]
s390/cpum_sf: Do not re-enable event after deletion
Event delete removes an event from the event list, but common
code invokes the PMU's enable function later on. This happens
in event_sched_out() and leads to the following call sequence:
In cpumsf_pmu_enable() return immediately when the event is not
active. Also remove an unneeded if clause. That if() statement
is only reached when flag PMU_F_IN_USE has been set in
cpumsf_pmu_add(). And this function also sets cpuhw->event
to a valid value.
Remove WARN_ON_ONCE() statement which never triggered.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
This new pkey handler module supports the conversion of
Ultravisor retrievable secrets to protected keys.
The new module pkey-uv.ko is able to retrieve and verify
protected keys backed up by the Ultravisor layer which is
only available within protected execution environment.
The module is only automatically loaded if there is the
UV CPU feature flagged as available. Additionally on module
init there is a check for protected execution environment
and for UV supporting retrievable secrets. Also if the kernel
is not running as a protected execution guest, the module
unloads itself with errno ENODEV.
The pkey UV module currently supports these Ultravisor
secrets and is able to retrieve a protected key for these
UV secret types:
- UV_SECRET_AES_128
- UV_SECRET_AES_192
- UV_SECRET_AES_256
- UV_SECRET_AES_XTS_128
- UV_SECRET_AES_XTS_256
- UV_SECRET_HMAC_SHA_256
- UV_SECRET_HMAC_SHA_512
- UV_SECRET_ECDSA_P256
- UV_SECRET_ECDSA_P384
- UV_SECRET_ECDSA_P521
- UV_SECRET_ECDSA_ED25519
- UV_SECRET_ECDSA_ED448
s390/pkey: Build module name array selectively based on kernel config options
There is a static array of pkey handler kernel module names
used in case the pkey_handler_request_modules() is invoked.
This static array is walked through and if the module is not
already loaded a module_request() is performed.
This patch reworks the code to instead of unconditionally
building up a list of module names into the array, only the
pkey handler modules available based on the current kernel
config options are inserted.
s390/pkey: Fix checkpatch findings in pkey header file
Fix all the complains from checkpatch for the pkey header file:
CHECK: No space is necessary after a cast
+ PKEY_TYPE_CCA_DATA = (__u32) 1,
CHECK: Please use a blank line after function/struct/union/enum declarations
+};
+#define PKEY_GENSECK _IOWR(PKEY_IOCTL_MAGIC, 0x01, struct pkey_genseck)
Rework the verification of protected keys by simple check
for the correct AES wrapping key verification pattern.
A protected key always carries the AES wrapping key
verification pattern within the blob. The old code really
used the protected key for an en/decrypt operation and by
doing so, verified the AES WK VP. But a much simpler and
more generic way is to extract the AES WK VP value from the
key and compare it with AES WK VP from a freshly created
dummy protected key. This also eliminates the limitation to
only be able to verify AES protected keys. With this change
any kind of known protected key can be verified.
The calculation of the length of a protected key based on
the protected key type is scattered over certain places within
the pkey code. By introducing a new inline function
pkey_keytype_to_size() this can be centralized and the calling
code can be reduced and simplified.
With this also comes a slight rework of the generation of
protected keys. Now the pkey_pckmo module is able to generate
all but ECC keys.
Steffen Eiden [Thu, 24 Oct 2024 06:26:38 +0000 (08:26 +0200)]
s390/uv: Retrieve UV secrets sysfs support
Reflect the updated content in the query information UVC to the sysfs at
/sys/firmware/query
* new UV-query sysfs entry for the maximum number of retrievable
secrets the UV can store for one secure guest.
* new UV-query sysfs entry for the maximum number of association
secrets the UV can store for one secure guest.
* max_secrets contains the sum of max association and max retrievable
secrets.
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Signed-off-by: Steffen Eiden <seiden@linux.ibm.com> Link: https://lore.kernel.org/r/20241024062638.1465970-7-seiden@linux.ibm.com Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Steffen Eiden [Thu, 24 Oct 2024 06:26:35 +0000 (08:26 +0200)]
s390/uvdevice: Add Retrieve Secret IOCTL
Add a new IOCL number to support the new Retrieve Secret UVC for
user-space.
User-space provides the index of the secret (u16) to retrieve.
The uvdevice calls the Retrieve Secret UVC and copies the secret into
the provided buffer if it fits. To get the secret type, index, and size
user-space needs to call the List UVC first.
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Steffen Eiden <seiden@linux.ibm.com> Link: https://lore.kernel.org/r/20241024062638.1465970-4-seiden@linux.ibm.com Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Steffen Eiden [Thu, 24 Oct 2024 08:41:07 +0000 (10:41 +0200)]
s390/uv: Retrieve UV secrets support
Provide a kernel API to retrieve secrets from the UV secret store.
Add two new functions:
* `uv_get_secret_metadata` - get metadata for a given secret identifier
* `uv_retrieve_secret` - get the secret value for the secret index
With those two functions one can extract the secret for a given secret
id, if the secret is retrievable.
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Signed-off-by: Steffen Eiden <seiden@linux.ibm.com> Link: https://lore.kernel.org/r/20241024084107.2418186-1-seiden@linux.ibm.com Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Steffen Eiden [Wed, 23 Oct 2024 07:55:28 +0000 (09:55 +0200)]
s390/uv: Provide host-key hashes in sysfs
Utilize the new Query Ultravisor Keys UVC to give user space the
information which host-keys are installed on the system.
Create a new sysfs directory 'firmware/uv/keys' that contains the hash
of the host-key and the backup host-key of that system. Additionally,
the file 'all' contains the response from the UVC possibly containing
more key-hashes than currently known.
Steffen Eiden [Tue, 15 Oct 2024 11:39:39 +0000 (13:39 +0200)]
s390/uv: Refactor uv-sysfs creation
Streamline the sysfs generation to make it more extensible.
Add a function to create a sysfs entry in the uv-sysfs dir.
Use this function for the query directory.
Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Signed-off-by: Steffen Eiden <seiden@linux.ibm.com> Link: https://lore.kernel.org/r/20241015113940.3088249-2-seiden@linux.ibm.com Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Mete Durlu [Wed, 23 Oct 2024 12:11:15 +0000 (14:11 +0200)]
s390/netiucv: Switch over to sysfs_emit()
Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the netiucv code.
Mete Durlu [Wed, 23 Oct 2024 12:11:14 +0000 (14:11 +0200)]
s390/vfio-ap: Switch over to sysfs_emit()
Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the vfio_ap code.
Signed-off-by: Mete Durlu <meted@linux.ibm.com> Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com> Tested-by: Anthony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Mete Durlu [Wed, 23 Oct 2024 12:11:13 +0000 (14:11 +0200)]
s390/vmur: Switch over to sysfs_emit()
Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the char/vmur code.
Mete Durlu [Wed, 23 Oct 2024 12:11:12 +0000 (14:11 +0200)]
s390/sclp_cpi: Switch over to sysfs_emit()
Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the char/sclp_cpi_sys code.
Mete Durlu [Wed, 23 Oct 2024 12:11:11 +0000 (14:11 +0200)]
s390/sclp_ocf: Switch over to sysfs_emit()
Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the char/sclp_ocf code.
Mete Durlu [Wed, 23 Oct 2024 12:11:10 +0000 (14:11 +0200)]
s390/vmlogrdr: Switch over to sysfs_emit()
Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the char/vmlogrdr code.
Mete Durlu [Wed, 23 Oct 2024 12:11:09 +0000 (14:11 +0200)]
s390/tape: Switch over to sysfs_emit()
Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the char/tape_core code.
Signed-off-by: Mete Durlu <meted@linux.ibm.com> Reviewed-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Mete Durlu [Wed, 23 Oct 2024 12:11:08 +0000 (14:11 +0200)]
s390/dcssblk: Switch over to sysfs_emit()
Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the block/dcssblk code.
Mete Durlu [Wed, 23 Oct 2024 12:11:07 +0000 (14:11 +0200)]
s390/cio/scm: Switch over to sysfs_emit()
Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the cio/scm code.
Mete Durlu [Wed, 23 Oct 2024 12:11:06 +0000 (14:11 +0200)]
s390/cio/css: Switch over to sysfs_emit()
Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the cio/css code.
Mete Durlu [Wed, 23 Oct 2024 12:11:05 +0000 (14:11 +0200)]
s390/cio/ccwgroup: Switch over to sysfs_emit()
Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the cio/ccwgroup code.
Mete Durlu [Wed, 23 Oct 2024 12:11:04 +0000 (14:11 +0200)]
s390/cio/cmf: Switch over to sysfs_emit()
Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the cio/cmf code.
Mete Durlu [Wed, 23 Oct 2024 12:11:03 +0000 (14:11 +0200)]
s390/cio/device: Switch over to sysfs_emit()
Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the cio/device code.
Mete Durlu [Wed, 23 Oct 2024 12:11:02 +0000 (14:11 +0200)]
s390/cio/chp: Switch over to sysfs_emit()
Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the cio/chp code.
Mete Durlu [Wed, 23 Oct 2024 12:11:01 +0000 (14:11 +0200)]
scsi: zfcp: Switch over to sysfs_emit()
Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the zfcp code.
Signed-off-by: Mete Durlu <meted@linux.ibm.com> Tested-by: Benjamin Block <bblock@linux.ibm.com> Acked-by: Benjamin Block <bblock@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Mete Durlu [Wed, 23 Oct 2024 12:11:00 +0000 (14:11 +0200)]
s390/crypto: Switch over to sysfs_emit()
Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the s390/crypto code.
Mete Durlu [Wed, 23 Oct 2024 12:10:59 +0000 (14:10 +0200)]
s390/ipl: Switch over to sysfs_emit()
Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred over
sprintf for presenting attributes to user space. Convert the left-over
uses in the s390/ipl code.
Mete Durlu [Wed, 23 Oct 2024 12:10:58 +0000 (14:10 +0200)]
s390/nospec: Switch over to sysfs_emit()
Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred over
sprintf for presenting attributes to user space. Convert the left-over
uses in the s390/nospec-sysfs code.
Mete Durlu [Wed, 23 Oct 2024 12:10:57 +0000 (14:10 +0200)]
s390/perf_event: Switch over to sysfs_emit()
Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred over
sprintf for presenting attributes to user space. Convert the left-over
uses in the s390/perf_event code.
Mete Durlu [Wed, 23 Oct 2024 12:10:56 +0000 (14:10 +0200)]
s390/smp: Switch over to sysfs_emit()
Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred over
sprintf for presenting attributes to user space. Convert the left-over
uses in the s390/smp code.
Mete Durlu [Wed, 23 Oct 2024 12:10:55 +0000 (14:10 +0200)]
s390/time: Switch over to sysfs_emit()
Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred over
sprintf for presenting attributes to user space. Convert the left-over
uses in the s390/time code.
Mete Durlu [Wed, 23 Oct 2024 12:10:54 +0000 (14:10 +0200)]
s390/topology: Switch over to sysfs_emit()
Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred over
sprintf for presenting attributes to user space. Convert the left-over
uses in the s390/topology code.
s390/kdump: Provide is_kdump_kernel() implementation
s390 sets "elfcorehdr_addr = ELFCORE_ADDR_MAX;" early during
setup_arch() to deactivate the "elfcorehdr= kernel" parameter,
resulting in is_kdump_kernel() returning "false".
During vmcore_init()->elfcorehdr_alloc(), if on a dump kernel and
allocation succeeded, elfcorehdr_addr will be set to a valid address
and is_kdump_kernel() will consequently return "true".
is_kdump_kernel() should return a consistent result during all boot
stages, and properly return "true" if in a kdump environment - just
like it is done on powerpc where "false" is indicated in fadump
environments, as added in commit b098f1c32365 ("powerpc/fadump: make
is_kdump_kernel() return false when fadump is active").
Similarly provide a custom is_kdump_kernel() implementation that will only
return "true" in kdump environments, and will do so consistently during
boot.
kernel_page_present() was intentionally not implemented when adding
ARCH_HAS_SET_DIRECT_MAP support, since it was only used for suspend/resume
which is not supported anymore on s390.
A new bpf use case led to a compile error specific to s390. Even though
this specific use case went away implement kernel_page_present(), so that
the API is complete and potential future users won't run into this problem.
Reported-by: Daniel Borkmann <daniel@iogearbox.net> Closes: https://lore.kernel.org/all/045de961-ac69-40cc-b141-ab70ec9377ec@iogearbox.net Fixes: 0490d6d7ba0a ("s390/mm: enable ARCH_HAS_SET_DIRECT_MAP") Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
s390/cio: Do not unregister the subchannel based on DNV
Starting with commit 2297791c92d0 ("s390/cio: dont unregister
subchannel from child-drivers"), CIO does not unregister subchannels
when the attached device is invalid or unavailable. Instead, it
allows subchannels to exist without a connected device. However, if
the DNV value is 0, such as, when all the CHPIDs of a subchannel are
configured in standby state, the subchannel is unregistered, which
contradicts the current subchannel specification.
Update the logic so that subchannels are not unregistered based
on the DNV value. Also update the SCHIB information even if the
DNV bit is zero.
Suggested-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com> Fixes: 2297791c92d0 ("s390/cio: dont unregister subchannel from child-drivers") Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Julian Vetter [Thu, 10 Oct 2024 13:01:00 +0000 (15:01 +0200)]
s390/pci: Align prototypes of zpci IO memcpy functions
The generic memcpy_{from,to}io and memset_io functions have a different
prototype than the zpci_memcpy_{from,to}io and zpci_memset_io functions.
But in driver code zpci functions are used as IO memcpy directly. So,
align their prototypes.
Halil Pasic [Mon, 7 Oct 2024 20:10:30 +0000 (22:10 +0200)]
s390/virtio_ccw: Fix dma_parm pointer not set up
At least since commit 334304ac2bac ("dma-mapping: don't return errors
from dma_set_max_seg_size") setting up device.dma_parms is basically
mandated by the DMA API. As of now Channel (CCW) I/O in general does not
utilize the DMA API, except for virtio. For virtio-ccw however the
common virtio DMA infrastructure is such that most of the DMA stuff
hinges on the virtio parent device, which is a CCW device.
So lets set up the dma_parms pointer for the CCW parent device and hope
for the best!
Fixes: 334304ac2bac ("dma-mapping: don't return errors from dma_set_max_seg_size") Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Tested-by: Marc Hartmayer <mhartmay@linux.ibm.com> Link: https://lore.kernel.org/r/20241007201030.204028-1-pasic@linux.ibm.com Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
s390/facilities: Fix warning about shadow of global variable
Compiling the kernel with clang W=2 produces a warning that the
parameter declarations in some routines would shadow the definition of
the global variable stfle_fac_list. Address this warning by renaming the
parameters to fac_list.
Fixes: 17e89e1340a3 ("s390/facilities: move stfl information from lowcore to global data") Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred over
sprintf for presenting attributes to user space. Convert over two
left-over uses in the s390 PCI code.
Thomas Richter [Mon, 7 Oct 2024 05:45:16 +0000 (07:45 +0200)]
s390/cpum_sf: Set bit PMU_F_ENABLED enabled after lpp() invocation
Set PMU enabled bit after lpp() call. This ensures the proper
task PID is loaded by the llp instruction into Program Parameter
register from where it is copied to each sampling data buffer
(SDB) sample entry after the sampling was enabled.
The barrier() instruction is not needed as cpumsf_pmu_enable()
changes a CPU specific variable. Only the CPU that task is
running on changes structure members.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Ensure that VFs used in isolation, that is with their parent PF not
visible to the configuration but with their RID exposed, are treated
compatibly with existing isolated VF use cases without exposed RID
including RoCE Express VFs. This allows creating configurations where
one LPAR manages PFs while their child VFs are used by other LPARs. This
gives the LPAR managing the PFs a role analogous to that of the
hypervisor in a typical use case of passing child VFs to guests.
Instead of creating a multifunction struct zpci_bus whenever a PCI
function with RID exposed is discovered only create such a bus for
configured physical functions and only consider multifunction busses
when searching for an existing bus. Additionally only set zdev->devfn to
the devfn part of the RID once the function is added to a multifunction
bus.
This also fixes probing of more than 7 such isolated VFs from the same
physical bus. This is because common PCI code in pci_scan_slot() only
looks for more functions when pdev->multifunction is set which somewhat
counter intutively is not the case for VFs.
Note that PFs are looked at before their child VFs is guaranteed because
we sort the zpci_list by RID ascending.
s390/pci: Use topology ID for multi-function devices
The newly introduced topology ID (TID) field in the CLP Query PCI
Function explicitly identifies groups of PCI functions whose RIDs belong
to the same (sub-)topology. When available use the TID instead of the
PCHID to match zPCI busses/domains for multi-function devices. Note that
currently only a single PCI bus per TID is supported. This change is
required because in future machines the PCHID will not identify a PCI
card but a specific port in the case of some multi-port NICs while from
a PCI point of view the entire card is a subtopology.
s390/pci: Sort PCI functions prior to creating virtual busses
Instead of relying on the observed but not architected firmware behavior
that PCI functions from the same card are listed in ascending RID order
in clp_list_pci() ensure this by sorting. To allow for sorting separate
the initial clp_list_pci() and creation of the virtual PCI busses.
Note that fundamentally in our per-PCI function hotplug design non RID
order of discovery is still possible. For example when the two PFs of
a two port NIC are hotplugged after initial boot and in descending RID
order. In this case the virtual PCI bus would be created by the second
PF using that PF's UID as domain number instead of that of the first PF.
Thus the domain number would then change from the UID of the second PF
to that of the first PF on reboot but there is really nothing we can do
about that since changing domain numbers at runtime seems even worse.
This only impacts the domain number as the RIDs are consistent and thus
even with just the second PF visible it will show up in the correct
position on the virtual bus.