Christophe Leroy [Tue, 17 Nov 2020 05:07:58 +0000 (05:07 +0000)]
powerpc: Retire e200 core (mpc555x processor)
There is no defconfig selecting CONFIG_E200, and no platform.
e200 is an earlier version of booke, a predecessor of e500,
with some particularities like an unified cache instead of both an
instruction cache and a data cache.
Christophe Leroy [Thu, 22 Oct 2020 09:29:21 +0000 (09:29 +0000)]
powerpc: Fix update form addressing in inline assembly
In several places, inline assembly uses the "%Un" modifier
to enable the use of instruction with update form addressing,
but the associated "<>" constraint is missing.
As mentioned in previous patch, this fails with gcc 4.9, so
"<>" can't be used directly.
Use UPD_CONSTR macro everywhere %Un modifier is used.
powerpc: Fix incorrect stw{, ux, u, x} instructions in __set_pte_at
The placeholder for instruction selection should use the second
argument's operand, which is %1, not %0. This could generate incorrect
assembly code if the memory addressing of operand %0 is a different
form from that of operand %1.
Also remove the %Un placeholder because having %Un placeholders
for two operands which are based on the same local var (ptep) doesn't
make much sense. By the way, it doesn't change the current behaviour
because "<>" constraint is missing for the associated "=m".
[chleroy: revised commit log iaw segher's comments and removed %U0]
powerpc/powernv/npu: Do not attempt NPU2 setup on POWER8NVL NPU
We execute certain NPU2 setup code (such as mapping an LPID to a device
in NPU2) unconditionally if an Nvlink bridge is detected. However this
cannot succeed on POWER8NVL machines and errors appear in dmesg. This is
harmless as skiboot returns an error and the only place we check it is
vfio-pci but that code does not get called on P8+ either.
This adds a check if pnv_npu2_xxx helpers are called on a machine with
NPU2 which initializes pnv_phb::npu in pnv_npu2_init();
pnv_phb::npu==NULL on POWER8/NVL (Naples).
While at this, fix NULL derefencing in pnv_npu_peers_take_ownership/
pnv_npu_peers_release_ownership which occurs when GPUs on mentioned P8s
cause EEH which happens if "vfio-pci" disables devices using
the D3 power state; the vfio-pci's disable_idle_d3 module parameter
controls this and must be set on Naples. The EEH handling clears
the entire pnv_ioda_pe struct in pnv_ioda_free_pe() hence
the NULL derefencing. We cannot recover from that but at least we stop
crashing.
powernv/pci: Print an error when device enable is blocked
If the platform decides to block enabling the device nothing is printed
currently. This can lead to some confusion since the dmesg output will
usually print an error with no context e.g.
e1000e: probe of 0022:01:00.0 failed with error -22
This shouldn't be spammy since pci_enable_device() already prints a
messages when it succeeds.
powerpc/pci: Remove LSI mappings on device teardown
When a passthrough IO adapter is removed from a pseries machine using hash
MMU and the XIVE interrupt mode, the POWER hypervisor expects the guest OS
to clear all page table entries related to the adapter. If some are still
present, the RTAS call which isolates the PCI slot returns error 9001
"valid outstanding translations" and the removal of the IO adapter fails.
This is because when the PHBs are scanned, Linux maps automatically the
INTx interrupts in the Linux interrupt number space but these are never
removed.
This problem can be fixed by adding the corresponding unmap operation when
the device is removed. There's no pcibios_* hook for the remove case, but
the same effect can be achieved using a bus notifier.
Because INTx are shared among PHBs (and potentially across the system),
this adds tracking of virq to unmap them only when the last user is gone.
[aik: added refcounter]
Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Tested-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201202005222.5477-1-aik@ozlabs.ru
It's sometimes handy to have a config that boots a bit like a system
under secure boot (forcing lockdown=integrity, without needing any
extra stuff like a command line option).
This config file allows that, and also turns on a few assorted security
and hardening options for good measure.
Christophe Leroy [Wed, 25 Nov 2020 07:10:52 +0000 (07:10 +0000)]
powerpc/32s: Use SPRN_SPRG_SCRATCH2 in DSI prolog
Use SPRN_SPRG_SCRATCH2 as an alternative scratch register in
the early part of DSI prolog in order to avoid clobbering
SPRN_SPRG_SCRATCH0/1 used by other prologs.
The 603 doesn't like a jump from DataLoadTLBMiss to the 10 nops
that are now in the beginning of DSI exception as a result of
the feature section. To workaround this, add a jump as alternative.
It also avoids fetching 10 nops for nothing.
Christophe Leroy [Wed, 25 Nov 2020 07:10:46 +0000 (07:10 +0000)]
powerpc/32s: Always map kernel text and rodata with BATs
Since commit 2b279c0348af ("powerpc/32s: Allow mapping with BATs with
DEBUG_PAGEALLOC"), there is no real situation where mapping without
BATs is required.
In order to simplify memory handling, always map kernel text
and rodata with BATs even when "nobats" kernel parameter is set.
Also fix the 603 TLB miss exceptions that don't require anymore
kernel page table if DEBUG_PAGEALLOC.
Add invalidate_range mmu notifier, when required (ATSD access of MMIO
registers is available), to initiate TLB invalidation commands.
For the time being, the ATSD0 set of registers is used by default.
The pasid and bdf values have to be configured in the Process Element
Entry.
The PEE must be set up to match the BDF/PASID of the AFU.
To complete the MMIO based mechanism, the fields: PASID, bus, device and
function of the Process Element Entry have to be filled. (See
OpenCAPI Power Platform Architecture document)
Hypervisor Process Element Entry
Word
0 1 .... 7 8 ...... 12 13 ..15 16.... 19 20 ........... 31
0 OSL Configuration State (0:31)
1 OSL Configuration State (32:63)
2 PASID | Reserved
3 Bus | Device |Function | Reserved
4 Reserved
5 Reserved
6 ....
When a TLB Invalidate is required for the Logical Partition, the following
sequence has to be performed:
1. Load MMIO ATSD AVA register with the necessary value, if required.
2. Write the MMIO ATSD launch register to initiate the TLB Invalidate
command.
3. Poll the MMIO ATSD status register to determine when the TLB Invalidate
has been completed.
ocxl: Assign a register set to a Logical Partition
Platform specific function to assign a register set to a Logical Partition.
The "ibm,mmio-atsd" property, provided by the firmware, contains the 16
base ATSD physical addresses (ATSD0 through ATSD15) of the set of MMIO
registers (XTS MMIO ATSDx LPARID/AVA/launch/status register).
For the time being, the ATSD0 set of registers is used by default.
Athira Rajeev [Thu, 26 Nov 2020 16:54:44 +0000 (11:54 -0500)]
powerpc/perf: MMCR0 control for PMU registers under PMCC=00
PowerISA v3.1 introduces new control bit (PMCCEXT) for restricting
access to group B PMU registers in problem state when
MMCR0 PMCC=0b00. In problem state and when MMCR0 PMCC=0b00,
setting the Monitor Mode Control Register bit 54 (MMCR0 PMCCEXT),
will restrict read permission on Group B Performance Monitor
Registers (SIER, SIAR, SDAR and MMCR1). When this bit is set to zero,
group B registers will be readable. In other platforms (like power9),
the older behaviour is retained where group B PMU SPRs are readable.
Patch adds support for MMCR0 PMCCEXT bit in power10 by enabling
this bit during boot and during the PMU event enable/disable callback
functions.
Athira Rajeev [Thu, 26 Nov 2020 16:54:42 +0000 (11:54 -0500)]
powerpc/perf: Fix to update generic event codes for power10
Fix the event code for events: branch-instructions (to PM_BR_FIN),
branch-misses (to PM_MPRED_BR_FIN) and cache-misses (to
PM_LD_DEMAND_MISS_L1_FIN) for power10 PMU. Update the
list of generic events with this modified event code.
Athira Rajeev [Thu, 26 Nov 2020 16:54:41 +0000 (11:54 -0500)]
powerpc/perf: Add generic and cache event list for power10 DD1
There are event code updates for some of the generic events
and cache events for power10. Inorder to maintain the current
event codes work with DD1 also, create a new array of generic_events,
cache_events and pmu_attr_groups with suffix _dd1, example,
power10_events_attr_dd1. So that further updates to event codes
can be made in the original list, ie, power10_events_attr. Update the
power10 pmu init code to pick the dd1 list while registering
the power PMU, based on the pvr (Processor Version Register) value.
Athira Rajeev [Thu, 26 Nov 2020 16:54:40 +0000 (11:54 -0500)]
powerpc/perf: Fix the PMU group constraints for threshold events in power10
The PMU group constraints mask for threshold events covers
all thresholding bits which includes threshold control value
(start/stop), select value as well as thresh_cmp value (MMCRA[9:18].
In power9, thresh_cmp bits were part of the event code. But in case
of power10, thresh_cmp bits are not part of event code due to
inclusion of MMCR3 bits. Hence thresh_cmp is not valid for
group constraints for power10.
Fix the PMU group constraints checking for threshold events in
power10 by using constraint mask and value for only threshold control
and select bits.
Athira Rajeev [Thu, 26 Nov 2020 16:54:39 +0000 (11:54 -0500)]
powerpc/perf: Update the PMU group constraints for l2l3 events in power10
In Power9, L2/L3 bus events are always available as a
"bank" of 4 events. To obtain the counts for any of the
l2/l3 bus events in a given bank, the user will have to
program PMC4 with corresponding l2/l3 bus event for that
bank.
Commit 59029136d750 ("powerpc/perf: Add constraints for power9 l2/l3 bus events")
enforced this rule in Power9. But this is not valid for
Power10, since in Power10 Monitor Mode Control Register2
(MMCR2) has bits to configure l2/l3 event bits. Hence remove
this PMC4 constraint check from power10.
Since the l2/l3 bits in MMCR2 are not per-pmc, patch handles
group constrints checks for l2/l3 bits in MMCR2.
Athira Rajeev [Thu, 26 Nov 2020 16:54:38 +0000 (11:54 -0500)]
powerpc/perf: Fix to update radix_scope_qual in power10
power10 uses bit 9 of the raw event code as RADIX_SCOPE_QUAL.
This bit is used for enabling the radix process events.
Patch fixes the PMU counter support functions to program bit
18 of MMCR1 ( Monitor Mode Control Register1 ) with the
RADIX_SCOPE_QUAL bit value. Since this field is not per-pmc,
add this to PMU group constraints to make sure events in a
group will have same bit value for this field. Use bit 21 as
constraint bit field for radix_scope_qual. Patch also updates
the power10 raw event encoding layout information, format field
and constraints bit layout to include the radix_scope_qual bit.
powerpc/book3s64/pkeys: Optimize KUAP and KUEP feature disabled case
If FTR_BOOK3S_KUAP is disabled, kernel will continue to run with the same AMR
value with which it was entered. Hence there is a high chance that
we can return without restoring the AMR value. This also helps the case
when applications are not using the pkey feature. In this case, different
applications will have the same AMR values and hence we can avoid restoring
AMR in this case too.
Also avoid isync() if not really needed.
Do the same for IAMR.
null-syscall benchmark results:
With smap/smep disabled:
Without patch:
957.95 ns 2778.17 cycles
With patch:
858.38 ns 2489.30 cycles
With smap/smep enabled:
Without patch:
1017.26 ns 2950.36 cycles
With patch:
1021.51 ns 2962.44 cycles
powerpc/book3s64/kuap: Restrict access to userspace based on userspace AMR
If an application has configured address protection such that read/write is
denied using pkey even the kernel should receive a FAULT on accessing the same.
This patch use user AMR value stored in pt_regs.amr to achieve the same.
powerpc/book3s64/pkeys: Inherit correctly on fork.
Child thread.kuap value is inherited from the parent in copy_thread_tls. We still
need to make sure when the child returns from a fork in the kernel we start with the kernel
default AMR value.
powerpc/book3s64/pkeys: Store/restore userspace AMR/IAMR correctly on entry and exit from kernel
This prepare kernel to operate with a different value than userspace AMR/IAMR.
For this, AMR/IAMR need to be saved and restored on entry and return from the
kernel.
With KUAP we modify kernel AMR when accessing user address from the kernel
via copy_to/from_user interfaces. We don't need to modify IAMR value in
similar fashion.
If MMU_FTR_PKEY is enabled we need to save AMR/IAMR in pt_regs on entering
kernel from userspace. If not we can assume that AMR/IAMR is not modified
from userspace.
We need to save AMR if we have MMU_FTR_BOOK3S_KUAP feature enabled and we are
interrupted within kernel. This is required so that if we get interrupted
within copy_to/from_user we continue with the right AMR value.
If we hae MMU_FTR_BOOK3S_KUEP enabled we need to restore IAMR on
return to userspace beause kernel will be running with a different
IAMR value.
In later patches during exec, we would like to access default regs.amr to
control access to the user mapping. Having thread.regs set early makes the
code changes simpler.
powerpc/book3s64/kuap: Use Key 3 for kernel mapping with hash translation
This patch updates kernel hash page table entries to use storage key 3
for its mapping. This implies all kernel access will now use key 3 to
control READ/WRITE. The patch also prevents the allocation of key 3 from
userspace and UAMOR value is updated such that userspace cannot modify key 3.
powerpc/book3s64/kuap: Rename MMU_FTR_RADIX_KUAP and MMU_FTR_KUEP
This is in preparation to adding support for kuap with hash translation.
In preparation for that rename/move kuap related functions to
non radix names. Also move the feature bit closer to MMU_FTR_KUEP.
MMU_FTR_KUEP is renamed to MMU_FTR_BOOK3S_KUEP to indicate the feature
is only relevant to BOOK3S_64
powerpc/book3s64/kuap/kuep: Move uamor setup to pkey init
This patch consolidates UAMOR update across pkey, kuap and kuep features.
The boot cpu initialize UAMOR via pkey init and both radix/hash do the
secondary cpu UAMOR init in early_init_mmu_secondary.
We don't check for mmu_feature in radix secondary init because UAMOR
is a supported SPRN with all CPUs supporting radix translation.
The old code was not updating UAMOR if we had smap disabled and smep enabled.
This change handles that case.
powerpc/book3s64/kuap/kuep: Add PPC_PKEY config on book3s64
The config CONFIG_PPC_PKEY is used to select the base support that is
required for PPC_MEM_KEYS, KUAP, and KUEP. Adding this dependency
reduces the code complexity(in terms of #ifdefs) and enables us to
move some of the initialization code to pkeys.c
With power7 and above we expect the cpu to support keys. The
number of keys are firmware controlled based on device tree.
PR KVM do not expose key details via device tree. Hence when running with PR KVM
we do run with MMU_FTR_KEY support disabled. But we can still
get updates on UAMOR. Hence ignore access to them and for mfstpr return
0 indicating no AMR/IAMR update is no allowed.
Nicholas Piggin [Sat, 28 Nov 2020 07:07:27 +0000 (17:07 +1000)]
powerpc/64s: Remove "Host" from MCE logging
"Host" caused machine check is printed when the kernel sees a MCE
hit in this kernel or userspace, and "Guest" if it hit one of its
guests. This is confusing when a guest kernel handles a hypervisor-
delivered MCE, it also prints "Host".
Just remove "Host". "Guest" is adequate to make the distinction.
Harmless HMI errors can be triggered by guests in some cases, and don't
contain much useful information anyway. Ratelimit these to avoid
flooding the console/logs.
Nicholas Piggin [Sat, 28 Nov 2020 07:07:23 +0000 (17:07 +1000)]
KVM: PPC: Book3S HV: Don't attempt to recover machine checks for FWNMI enabled guests
Guests that can deal with machine checks would actually prefer the
hypervisor not to try recover for them. For example if SLB multi-hits
are recovered by the hypervisor by clearing the SLB then the guest
will not be able to log the contents and debug its programming error.
If guests don't register for FWNMI, they may not be so capable and so
the hypervisor will continue to recover for those.
Nicholas Piggin [Sat, 28 Nov 2020 07:07:22 +0000 (17:07 +1000)]
powerpc/64s/powernv: Allow KVM to handle guest machine check details
KVM has strategies to perform machine check recovery. If a MCE hits
in a guest, have the low level handler just decode and save the MCE
but not try to recover anything, so KVM can deal with it.
The host does not own SLBs and does not need to report the SLB state
in case of a multi-hit for example, or know about the virtual memory
map of the guest.
UE and memory poisoning of guest pages in the host is one thing that
is possibly not completely robust at the moment, but this too needs
to go via KVM (possibly via the guest and back out to host via hcall)
rather than being handled at a low level in the host handler.
Uwe Kleine-König [Thu, 26 Nov 2020 16:59:50 +0000 (17:59 +0100)]
powerpc/ps3: make system bus's remove and shutdown callbacks return void
The driver core ignores the return value of struct device_driver::remove
because there is only little that can be done. For the shutdown callback
it's ps3_system_bus_shutdown() which ignores the return value.
To simplify the quest to make struct device_driver::remove return void,
let struct ps3_system_bus_driver::remove return void, too. All users
already unconditionally return 0, this commit makes it obvious that
returning an error code is a bad idea and ensures future users behave
accordingly.
Uwe Kleine-König [Thu, 26 Nov 2020 16:59:49 +0000 (17:59 +0100)]
ALSA: ppc: drop if block with always false condition
The remove callback is only called for devices that were probed
successfully before. As the matching probe function cannot complete
without error if dev->match_id != PS3_MATCH_ID_SOUND, we don't have to
check this here.
powerpc/paravirt: Use is_kvm_guest() in vcpu_is_preempted()
If its a shared LPAR but not a KVM guest, then see if the vCPU is
related to the calling vCPU. On PowerVM, only cores can be preempted.
So if one vCPU is a non-preempted state, we can decipher that all
other vCPUs sharing the same core are in non-preempted state.
Performance results:
$ perf stat -r 5 -a perf bench sched pipe -l 10000000 (lesser time is better)
powerpc: Reintroduce is_kvm_guest() as a fast-path check
Introduce a static branch that would be set during boot if the OS
happens to be a KVM guest. Subsequent checks to see if we are on KVM
will rely on this static branch. This static branch would be used in
vcpu_is_preempted() in a subsequent patch.
powerpc: Rename is_kvm_guest() to check_kvm_guest()
We want to reuse the is_kvm_guest() name in a subsequent patch but
with a new body. Hence rename is_kvm_guest() to check_kvm_guest(). No
additional changes.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: kernel test robot <lkp@intel.com> # int -> bool fix
[mpe: Fold in fix from lkp to use true/false not 0/1] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201202050456.164005-3-srikar@linux.vnet.ibm.com
Nicholas Piggin [Sat, 7 Nov 2020 02:33:05 +0000 (12:33 +1000)]
powerpc: show registers when unwinding interrupt frames
It's often useful to know the register state for interrupts in
the stack frame. In the below example (with this patch applied),
the important information is the state of the page fault.
A blatant case like this probably rather should have the page
fault regs passed down to the warning, but quite often there are
less obvious cases where an interrupt shows up that might give
some more clues.
The downside is longer and more complex bug output.
Athira Rajeev [Tue, 1 Dec 2020 09:28:00 +0000 (04:28 -0500)]
powerpc/perf: Invoke per-CPU variable access with disabled interrupts
The power_pmu_event_init() callback access per-cpu variable
(cpu_hw_events) to check for event constraints and Branch Stack
(BHRB). Current usage is to disable preemption when accessing the
per-cpu variable, but this does not prevent timer callback from
interrupting event_init. Fix this by using 'local_irq_save/restore'
to make sure the code path is invoked with disabled interrupts.
This change is tested in mambo simulator to ensure that, if a timer
interrupt comes in during the per-cpu access in event_init, it will be
soft masked and replayed later. For testing purpose, introduced a
udelay() in power_pmu_event_init() to make sure a timer interrupt arrives
while in per-cpu variable access code between local_irq_save/resore.
As expected the timer interrupt was replayed later during local_irq_restore
called from power_pmu_event_init. This was confirmed by adding
breakpoint in mambo and checking the backtrace when timer_interrupt
was hit.
Patch fixes uninitialized variable warning in bad_accesses test
which causes the selftests build to fail in older distibutions
bad_accesses.c: In function ‘bad_access’:
bad_accesses.c:52:9: error: ‘x’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
printf("Bad - no SEGV! (%c)\n", x);
^
cc1: all warnings being treated as errors
Daniel Axtens [Tue, 1 Dec 2020 14:43:44 +0000 (01:43 +1100)]
powerpc/feature-fixups: use a semicolon rather than a comma
In a bunch of our security flushes, we use a comma rather than
a semicolon to 'terminate' an assignment. Nothing breaks, but
checkpatch picks it up if you copy it into another flush.
Jordan Niethe [Tue, 1 Dec 2020 00:52:03 +0000 (11:52 +1100)]
powerpc: Allow relative pointers in bug table entries
This enables GENERIC_BUG_RELATIVE_POINTERS on Power so that 32-bit
offsets are stored in the bug entries rather than 64-bit pointers.
While this doesn't save space for 32-bit machines, use it anyway so
there is only one code path.
Jordan Niethe [Mon, 30 Nov 2020 00:44:04 +0000 (11:44 +1100)]
powerpc/64: Fix an EMIT_BUG_ENTRY in head_64.S
Commit 63ce271b5e37 ("powerpc/prom: convert PROM_BUG() to standard
trap") added an EMIT_BUG_ENTRY for the trap after the branch to
start_kernel(). The EMIT_BUG_ENTRY was for the address "0b", however the
trap was not labeled with "0". Hence the address used for bug is in
relative_toc() where the previous "0" label is. Label the trap as "0" so
the correct address is used.
The VDSO datapage and the text pages are always located immediately
next to each other, so it can be hardcoded without an indirection
through __kernel_datapage_offset