Naveen N Rao [Mon, 19 Jun 2023 09:47:34 +0000 (15:17 +0530)]
powerpc/ftrace: Add support for -fpatchable-function-entry
GCC v13.1 updated support for -fpatchable-function-entry on ppc64le to
emit nops after the local entry point, rather than before it. This
allows us to use this in the kernel for ftrace purposes. A new script is
added under arch/powerpc/tools/ to help detect if nops are emitted after
the function local entry point, or before the global entry point.
With -fpatchable-function-entry, we no longer have the profiling
instructions generated at function entry, so we only need to validate
the presence of two nops at the ftrace location in ftrace_init_nop(). We
patch the preceding instruction with 'mflr r0' to match the
-mprofile-kernel ABI for subsequent ftrace use.
This changes the profiling instructions used on ppc32. The default -pg
option emits an additional 'stw' instruction after 'mflr r0' and before
the branch to _mcount 'bl _mcount'. This is very similar to the original
-mprofile-kernel implementation on ppc64le, where an additional 'std'
instruction was used to save LR to its save location in the caller's
stackframe. Subsequently, this additional store was removed in later
compiler versions for performance reasons. The same reasons apply for
ppc32 so we only patch in a 'mflr r0'.
Naveen N Rao [Mon, 19 Jun 2023 09:47:33 +0000 (15:17 +0530)]
powerpc/ftrace: Implement ftrace_replace_code()
Implement ftrace_replace_code() to consolidate logic from the different
ftrace patching routines: ftrace_make_nop(), ftrace_make_call() and
ftrace_modify_call(). Note that ftrace_make_call() is still required
primarily to handle patching modules during their load time. The other
two routines should no longer be called.
This lays the groundwork to enable better control in patching ftrace
locations, including the ability to nop-out preceding profiling
instructions when ftrace is disabled.
Naveen N Rao [Mon, 19 Jun 2023 09:47:31 +0000 (15:17 +0530)]
powerpc/ftrace: Simplify ftrace_modify_call()
Now that we validate the ftrace location during initialization in
ftrace_init_nop(), we can simplify ftrace_modify_call() to patch-in the
updated branch instruction without worrying about the instructions
surrounding the ftrace location. Note that we continue to ensure we
have the expected branch instruction at the ftrace location before
patching it with the updated branch destination.
Naveen N Rao [Mon, 19 Jun 2023 09:47:30 +0000 (15:17 +0530)]
powerpc/ftrace: Simplify ftrace_make_call()
Now that we validate the ftrace location during initialization in
ftrace_init_nop(), we can simplify ftrace_make_call() to replace the nop
without worrying about the instructions surrounding the ftrace location.
Note that we continue to ensure that we have a nop at the ftrace
location before patching it.
Naveen N Rao [Mon, 19 Jun 2023 09:47:29 +0000 (15:17 +0530)]
powerpc/ftrace: Simplify ftrace_make_nop()
Now that we validate the ftrace location during initialization in
ftrace_init_nop(), we can simplify ftrace_make_nop() to patch-in the nop
without worrying about the instructions surrounding the ftrace location.
Note that we continue to ensure that we have a bl to
ftrace_[regs_]caller at the ftrace location before nop-ing it out.
Naveen N Rao [Mon, 19 Jun 2023 09:47:28 +0000 (15:17 +0530)]
powerpc/ftrace: Add separate ftrace_init_nop() with additional validation
Currently, we validate instructions around the ftrace location every
time we have to enable/disable ftrace. Introduce ftrace_init_nop() to
instead perform all the validation during ftrace initialization. This
allows us to simply patch the necessary instructions during
enabling/disabling ftrace.
Naveen N Rao [Mon, 19 Jun 2023 09:47:27 +0000 (15:17 +0530)]
powerpc/ftrace: Stop re-purposing linker generated long branches for ftrace
Commit 67361cf8071286 ("powerpc/ftrace: Handle large kernel configs")
added ftrace support for ppc64 kernel images with a text section larger
than 32MB. The patch did two things:
1. Add stubs at the end of .text to branch into ftrace_[regs_]caller for
functions that were out of branch range.
2. Re-purpose linker-generated long branches to _mcount to instead branch
to ftrace_[regs_]caller.
Before that, we only supported kernel .text up to ~32MB. With the above,
we now support up to ~96MB:
- The first 32MB of kernel text can branch directly into
ftrace_[regs_]caller since that symbol is usually at the beginning.
- The modified long_branch from (2) above is used by the next 32MB of
kernel text.
- The next 32MB of kernel text can use the stub at the end of text to
branch back to ftrace_[regs_]caller.
While re-purposing the long branch works in practice, it still restricts
ftrace to kernel text up to ~96MB. The stub at the end of kernel text
from (1) already enables us to extend ftrace support for kernel text
up to 64MB, which fulfils the original requirement. Further, once we
switch to -fpatchable-function-entry, there will not be a long branch
that we can use.
Stop re-purposing the linker-generated long branches for ftrace to
simplify the code. If there are good reasons to support ftrace on
kernels beyond 64MB, we can consider adding support by using
-fpatchable-function-entry.
Naveen N Rao [Mon, 19 Jun 2023 09:47:25 +0000 (15:17 +0530)]
powerpc/ftrace: Consolidate ftrace support into fewer files
ftrace_low.S has just the _mcount stub and return_to_handler(). Merge
this back into ftrace_mprofile.S and ftrace_64_pg.S to keep all ftrace
code together, and to allow those to evolve independently.
ftrace_mprofile.S is also not an entirely accurate name since this also
holds ppc32 code. This will be all the more incorrect once support for
-fpatchable-function-entry is added. Rename files here to more
accurately describe the code:
- ftrace_mprofile.S is renamed to ftrace_entry.S
- ftrace_pg.c is renamed to ftrace_64_pg.c
- ftrace_64_pg.S is rename to ftrace_64_pg_entry.S
Naveen N Rao [Mon, 19 Jun 2023 09:47:24 +0000 (15:17 +0530)]
powerpc/ftrace: Extend ftrace support for large kernels to ppc32
Commit 67361cf8071286 ("powerpc/ftrace: Handle large kernel configs")
added ftrace support for ppc64 kernel images with a text section larger
than 32MB. The approach itself isn't specific to ppc64, so extend the
same to also work on ppc32.
While at it, reduce the space reserved for the stub from 64 bytes to 32
bytes since the different stub variants are all less than 8
instructions.
To reduce use of #ifdef, a stub implementation is provided for
kernel_toc_address() and -SZ_2G is cast to 'long long' to prevent
errors on ppc32.
Naveen N Rao [Mon, 19 Jun 2023 09:47:21 +0000 (15:17 +0530)]
powerpc64/ftrace: Move ELFv1 and -pg support code into a separate file
ELFv1 support is deprecated and on the way out. Pre -mprofile-kernel
ftrace support (-pg only) is very limited and is retained primarily for
clang builds. It won't be necessary once clang lands support for
-fpatchable-function-entry.
Copy the existing ftrace code supporting these into ftrace_pg.c.
ftrace.c can then be refactored and enhanced with a focus on ppc32 and
ppc64 ELFv2.
.ftrace.tramp section is not used for any purpose. This code was added
all the way back in the original commit introducing support for dynamic
ftrace on ppc64 modules. Remove it.
Naveen N Rao [Mon, 19 Jun 2023 09:47:19 +0000 (15:17 +0530)]
powerpc/ftrace: Fix dropping weak symbols with older toolchains
The minimum level of gcc supported for building the kernel is v5.1.
v5.x releases of gcc emitted a three instruction sequence for
-mprofile-kernel:
mflr r0
std r0, 16(r1)
bl _mcount
It is only with the v6.x releases that gcc started emitting the two
instruction sequence for -mprofile-kernel, omitting the second store
instruction.
With the older three instruction sequence, the actual ftrace location
can be the 5th instruction into a function. Update the allowed offset
for ftrace location from 12 to 16 to accommodate the same.
Christophe Leroy [Fri, 18 Aug 2023 08:59:44 +0000 (10:59 +0200)]
powerpc/perf: Convert fsl_emb notifier to state machine callbacks
CC arch/powerpc/perf/core-fsl-emb.o
arch/powerpc/perf/core-fsl-emb.c:675:6: error: no previous prototype for 'hw_perf_event_setup' [-Werror=missing-prototypes]
675 | void hw_perf_event_setup(int cpu)
| ^~~~~~~~~~~~~~~~~~~
Looks like fsl_emb was completely missed by commit 3f6da3905398 ("perf:
Rework and fix the arch CPU-hotplug hooks")
So, apply same changes as commit 3f6da3905398 ("perf: Rework and fix
the arch CPU-hotplug hooks") then commit 57ecde42cc74 ("powerpc/perf:
Convert book3s notifier to state machine callbacks")
While at it, also fix following error:
arch/powerpc/perf/core-fsl-emb.c: In function 'perf_event_interrupt':
arch/powerpc/perf/core-fsl-emb.c:648:13: error: variable 'found' set but not used [-Werror=unused-but-set-variable]
648 | int found = 0;
| ^~~~~
PCI: rpaphp: Error out on busy status from get-sensor-state
When certain PHB HW failure causes pHyp to recover PHB, it marks the PE
state as temporarily unavailable until recovery is complete. This also
triggers an EEH handler in Linux which needs to notify drivers, and perform
recovery. But before notifying the driver about the PCI error it uses
get_adapter_status()->rpaphp_get_sensor_state()->rtas_call(get-sensor-state)
operation of the hotplug_slot to determine if the slot contains a device or
not. If the slot is empty, the recovery is skipped entirely.
However on certain PHB failures, the RTAS call rtas_call(get-sensor-state)
returns extended busy error (9902) until PHB is recovered by pHyp. Once PHB
is recovered, the rtas_call(get-sensor-state) returns success with correct
presence status. The RTAS call interface rtas_get_sensor() loops over the
RTAS call on extended delay return code (9902) until the return value is
either success (0) or error (-1). This causes the EEH handler to get stuck
for ~6 seconds before it could notify that the PCI error has been detected
and stop any active operations. Hence with running I/O traffic, during this
6 seconds, the network driver continues its operation and hits a timeout
(netdev watchdog).
On timeouts, network driver starts dumping debug information to console
(e.g bnx2 driver calls bnx2x_panic_dump()), and go into recovery path while
pHyp is still recovering the PHB. As part of recovery, the driver tries to
reset the device and it keeps failing since every PCI read/write returns
ff's. And when EEH recovery kicks-in, the driver is unable to recover the
device. This impacts the ssh connection and leads to the system being
inaccessible. To get the NIC working again it needs a reboot or re-assign
the I/O adapter from HMC.
Hence, it becomes important to inform driver about the PCI error detection
as early as possible, so that driver is aware of PCI error and waits for
EEH handler's next action for successful recovery.
Current implementation uses rtas_get_sensor() API which blocks the slot
check state until RTAS call returns success. To avoid this, fix the PCI
hotplug driver (rpaphp) to return an error (-EBUSY) if the slot presence
state can not be detected immediately while PE is in EEH recovery state.
Change rpaphp_get_sensor_state() to invoke rtas_call(get-sensor-state)
directly only if the respective PE is in EEH recovery state, and take
actions based on RTAS return status. This way EEH handler will not be
blocked on rpaphp_get_sensor_state() and can immediately notify driver
about the PCI error and stop any active operations.
In normal cases (non-EEH case) rpaphp_get_sensor_state() will continue to
invoke rtas_get_sensor() as it was earlier with no change in existing
behavior.
Vaibhav Jain [Fri, 18 Aug 2023 05:07:37 +0000 (10:37 +0530)]
powerpc/idle: Add support for nohlt
This patch enables config option GENERIC_IDLE_POLL_SETUP for arch
powerpc. This adds support for kernel param 'nohlt'.
Powerpc kernel also supports another kernel boot-time param called
'powersave' which can also be used to disable all cpu idle-states and
forces CPU to an idle-loop similar to what cpu_idle_poll() does. This
patch however makes powerpc kernel-parameters better aligned to the
generic boot-time parameters.
Hari Bathini [Fri, 9 Jun 2023 07:14:04 +0000 (12:44 +0530)]
powerpc/fadump: invoke ibm,os-term with rtas_call_unlocked()
Invoke ibm,os-term call with rtas_call_unlocked(), without using the
RTAS spinlock, to avoid deadlock in the unlikely event of a machine
crash while making an RTAS call.
Parse the device tree in early init to find the memory block size to be
used by the kernel. Consolidate the memory block size device tree parsing
to one helper and use that on both powernv and pseries. We still want to
use machine-specific callback because on all machine types other than
powernv and pseries we continue to return MIN_MEMORY_BLOCK_SIZE.
pseries_memory_block_size used to look for the second memory
block (memory@x) to determine the memory_block_size value. This patch
changed that to look at all memory blocks and make sure we can map them all
correctly using the computed memory block size value.
Add workaround to force 256MB memory block size if device driver managed
memory such as GPU memory is present. This helps to add GPU memory
that is not aligned to 1G.
powerpc/fadump: reset dump area size if fadump memory reserve fails
In case fadump_reserve_mem() fails to reserve memory, the
reserve_dump_area_size variable will retain the reserve area size. This
will lead to /sys/kernel/fadump/mem_reserved node displaying an incorrect
memory reserved by fadump.
To fix this problem, reserve dump area size variable is set to 0 if fadump
failed to reserve memory.
Fixes: 8255da95e545 ("powerpc/fadump: release all the memory above boot memory size") Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230704050715.203581-1-sourabhjain@linux.ibm.com
Christophe Leroy [Thu, 17 Aug 2023 12:40:49 +0000 (14:40 +0200)]
powerpc/4xx: Remove WatchdogHandler() to fix no previous prototype error
Building ppc40x_defconfig throws the following error:
CC arch/powerpc/kernel/traps.o
arch/powerpc/kernel/traps.c:2232:29: warning: no previous prototype for 'WatchdogHandler' [-Wmissing-prototypes]
2232 | void __attribute__ ((weak)) WatchdogHandler(struct pt_regs *regs)
| ^~~~~~~~~~~~~~~
This function was imported by commit 14cf11af6cf6 ("powerpc: Merge
enough to start building in arch/powerpc.") as a weak function but
never defined and/or called outside traps.c
As it has only one caller fold it inside its caller and remove it.
Christophe Leroy [Thu, 17 Aug 2023 12:26:45 +0000 (14:26 +0200)]
powerpc/8xx: Remove init_internal_rtc() to fix no previous prototype error
A W=1 build of mpc885_ads_defconfig throws the following error:
CC arch/powerpc/platforms/8xx/m8xx_setup.o
arch/powerpc/platforms/8xx/m8xx_setup.c:41:1: error: no previous prototype for 'init_internal_rtc' [-Werror=missing-prototypes]
41 | init_internal_rtc(void)
| ^~~~~~~~~~~~~~~~~
init_internal_rtc() was introduced by commit df34403dcaac ("[POWERPC]
8xx: Add mpc885ads support and common mpc8xx files") as a weak
function but has never been defined and/or used outside m8xx_setup.c
As it is called only once there, just fold it into its caller and
remove it.
Justin Stitt [Wed, 16 Aug 2023 21:39:24 +0000 (21:39 +0000)]
powerpc/ps3: refactor strncpy usage
`strncpy` is deprecated for use on NUL-terminated destination strings [1].
`make_first_field()` should use similar implementation to `make_field()`
due to memcpy having more obvious behavior here. The end result yields
the same behavior as the previous `strncpy`-based implementation
including the NUL-padding.
Benjamin Gray [Tue, 1 Aug 2023 01:17:44 +0000 (11:17 +1000)]
perf/hw_breakpoint: Remove arch breakpoint hooks
PowerPC was the only user of these hooks, and has been refactored to no
longer require them. There is no need to keep them around, so remove
them to reduce complexity.
ptrace and perf watchpoints were considered incompatible in
commit 29da4f91c0c1 ("powerpc/watchpoint: Don't allow concurrent perf
and ptrace events"), but the logic in that commit doesn't really apply.
Ptrace doesn't automatically single step; the ptracer must request this
explicitly. And the ptracer can do so regardless of whether a
ptrace/perf watchpoint triggered or not: it could single step every
instruction if it wanted to. Whatever stopped the ptracee before
executing the instruction that would trigger the perf watchpoint is no
longer relevant by this point.
To get correct behaviour when perf and ptrace are watching the same
data we must ignore the perf watchpoint. After all, ptrace has
before-execute semantics, and perf is after-execute, so perf doesn't
actually care about the watchpoint trigger at this point in time.
Pausing before execution does not mean we will actually end up executing
the instruction.
Importantly though, we don't remove the perf watchpoint yet. This is
key.
The ptracer is free to do whatever it likes right now. E.g., it can
continue the process, single step. or even set the child PC somewhere
completely different.
If it does try to execute the instruction though, without reinserting
the watchpoint (in which case we go back to the start of this example),
the perf watchpoint would immediately trigger. This time there is no
ptrace watchpoint, so we can safely perform a single step and increment
the perf counter. Upon receiving the single step exception, the existing
code already handles propagating or consuming it based on whether
another subsystem (e.g. ptrace) requested a single step. Again, this is
needed with or without perf/ptrace exclusion, because ptrace could be
single stepping this instruction regardless of if a watchpoint is
involved.
Benjamin Gray [Tue, 1 Aug 2023 01:17:40 +0000 (11:17 +1000)]
powerpc/watchpoints: Track perf single step directly on the breakpoint
There is a bug in the current watchpoint tracking logic, where the
teardown in arch_unregister_hw_breakpoint() uses bp->ctx->task, which it
does not have a reference of and parallel threads may be in the process
of destroying. This was partially addressed in commit fb822e6076d9
("powerpc/hw_breakpoint: Fix oops when destroying hw_breakpoint event"),
but the underlying issue of accessing a struct member in an unknown
state still remained. Syzkaller managed to trigger a null pointer
derefernce due to the race between the task destructor and checking the
pointer and dereferencing it in the loop.
While this null pointer dereference could be fixed by using READ_ONCE
to access the task up front, that just changes the error to manipulating
possbily freed memory.
Instead, the breakpoint logic needs to be reworked to remove any
dependency on a context or task struct during breakpoint removal.
The reason we have this currently is to clear thread.last_hit_ubp. This
member is used to differentiate the perf DAWR single-step sequence from
other causes of single-step, such as userspace just calling
ptrace(PTRACE_SINGLESTEP, ...). We need to differentiate them because,
when the single step interrupt is received, we need to know whether to
re-insert the DAWR breakpoint (perf) or not (ptrace / other).
arch_unregister_hw_breakpoint() needs to clear this information to
prevent dangling pointers to possibly freed memory. These pointers are
dereferenced in single_step_dabr_instruction() without a way to check
their validity.
This patch moves the tracking of this information to the breakpoint
itself. This means we no longer have to do anything special to clean up.
Benjamin Gray [Tue, 1 Aug 2023 01:17:39 +0000 (11:17 +1000)]
powerpc/watchpoints: Don't track info persistently
info is cheap to retrieve, and is likely optimised by the compiler
anyway. On the other hand, propagating it across the functions makes it
possible to be inconsistent and adds needless complexity.
Remove it, and invoke counter_arch_bp() when we need to work with it.
As we don't persist it, we just use the local bp array to track whether
we are ignoring a breakpoint.
Benjamin Gray [Tue, 1 Aug 2023 01:17:38 +0000 (11:17 +1000)]
powerpc/watchpoints: Explain thread_change_pc() more
The behaviour of the thread_change_pc() function is a bit cryptic
without being more familiar with how the watchpoint logic handles
perf's after-execute semantics.
Expand the comment to explain why we can re-insert the breakpoint and
unset the perf_single_step flag.
powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show affinity domain via partition information
The hcall H_GET_PERF_COUNTER_INFO with counter request value as
AFFINITY_DOMAIN_INFORMATION_BY_PARTITION(0XB1), can be used to get
the system affinity domain via partition information. To expose the system
affinity domain via partition information, patch adds sysfs file called
"affinity_domain_via_partition" to the "/sys/devices/hv_gpci/interface/"
of hv_gpci pmu driver.
Add new entry for AFFINITY_DOMAIN_VIA_PAR in sysinfo_counter_request
array, which points to the counter request value
"affinity_domain_via_partition" in hv-gpci.c file. Also add a
new function called "affinity_domain_via_partition_result_parse" to parse
the hcall result and store it in output buffer.
The affinity_domain_via_partition sysfs file is only available for power10
and above platforms. Add a macro called
INTERFACE_AFFINITY_DOMAIN_VIA_PAR_ATTR, which points to the index of NULL
placeholder, for affinity_domain_via_partition attribute in
interface_attrs array. Also updated the value of INTERFACE_NULL_ATTR
macro in hv-gpci.c file.
powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show affinity domain via domain information
The hcall H_GET_PERF_COUNTER_INFO with counter request value as
AFFINITY_DOMAIN_INFORMATION_BY_DOMAIN(0XB0), can be used to get
the system affinity domain via domain information. To expose the system
affinity domain via domain information, patch adds sysfs file called
"affinity_domain_via_domain" to the "/sys/devices/hv_gpci/interface/"
of hv_gpci pmu driver.
Add new entry for AFFINITY_DOMAIN_VIA_DOM in sysinfo_counter_request
array, which points to the counter request value
"affinity_domain_via_domain" in hv-gpci.c file.
The affinity_domain_via_domain sysfs file is only available for power10
and above platforms. Add a macro called
INTERFACE_AFFINITY_DOMAIN_VIA_DOM_ATTR, which points to the index of NULL
placeholder, for affinity_domain_via_domain attribute in interface_attrs
array. Also updated the value of INTERFACE_NULL_ATTR macro in hv-gpci.c
file.
powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show affinity domain via virtual processor information
The hcall H_GET_PERF_COUNTER_INFO with counter request value as
AFFINITY_DOMAIN_INFORMATION_BY_VIRTUAL_PROCESSOR(0XA0), can be used to get
the system affinity domain via virtual processor information. To expose
the system affinity domain via virtual processor information, patch adds
sysfs file called "affinity_domain_via_virtual_processor" to the
"/sys/devices/hv_gpci/interface/" of hv_gpci pmu driver.
The affinity_domain_via_virtual_processor sysfs file is only available for
power10 and above platforms. Add a macro called
INTERFACE_AFFINITY_DOMAIN_VIA_VP_ATTR, which points to the index of NULL
placeholder, for affinity_domain_via_virtual_processor attribute in
interface_attrs array. Also updated the value of INTERFACE_NULL_ATTR macro
in hv-gpci.c file.
powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show processor config information
The hcall H_GET_PERF_COUNTER_INFO with counter request value as
PROCESSOR_CONFIG(0X90), can be used to get the system
processor configuration information. To expose the system
processor config information, patch adds sysfs file called
"processor_config" to the "/sys/devices/hv_gpci/interface/"
of hv_gpci pmu driver.
Add enum and sysinfo_counter_request array to get required
counter request value in hv-gpci.c file.
Also add a new function called "sysinfo_device_attr_create",
which will create and return required device attribute to the
add_sysinfo_interface_files function.
The processor_config sysfs file is only available for power10
and above platforms. Add a new macro called
INTERFACE_PROCESSOR_CONFIG_ATTR, which points to the index of
NULL placefolder, for processor_config attribute in the interface_attrs
array. Also add macro INTERFACE_NULL_ATTR which points to index of NULL
attribute in interface_attrs array.
powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show processor bus topology information
The hcall H_GET_PERF_COUNTER_INFO with counter request value as
PROCESSOR_BUS_TOPOLOGY(0XD0), can be used to get the system
topology information. To expose the system topology information,
patch adds sysfs file called "processor_bus_topology" to the
"/sys/devices/hv_gpci/interface/" of hv_gpci pmu driver.
Add macro for PROCESSOR_BUS_TOPOLOGY counter request value
in hv-gpci.c file. Also add a new function called
"systeminfo_gpci_request", to make the H_GET_PERF_COUNTER_INFO hcall
with added macro and populates the output buffer.
The processor_bus_topology sysfs file is only available for power10
and above platforms. Add a new function called
"add_sysinfo_interface_files", which will add processor_bus_topology
attribute in the interface_attrs array, only for power10 and
above platforms.
Also add macro INTERFACE_PROCESSOR_BUS_TOPOLOGY_ATTR in hv-gpci.c
file, which points to the index of NULL placefolder, for
processor_bus_topology attribute.
Linus Walleij [Wed, 9 Aug 2023 08:07:13 +0000 (10:07 +0200)]
powerpc: Make virt_to_pfn() a static inline
Making virt_to_pfn() a static inline taking a strongly typed
(const void *) makes the contract of a passing a pointer of that
type to the function explicit and exposes any misuse of the
macro virt_to_pfn() acting polymorphic and accepting many types
such as (void *), (unitptr_t) or (unsigned long) as arguments
without warnings.
Move the virt_to_pfn() and related functions below the
declaration of __pa() so it compiles.
For symmetry do the same with pfn_to_kaddr().
As the file is included right into the linker file, we need
to surround the functions with ifndef __ASSEMBLY__ so we
don't cause compilation errors.
The conversion moreover exposes the fact that pmd_page_vaddr()
was returning an unsigned long rather than a const void * as
could be expected, so all the sites defining pmd_page_vaddr()
had to be augmented as well.
Finally the KVM code in book3s_64_mmu_hv.c was passing an
unsigned int to virt_to_phys() so fix that up with a cast so the
result compiles.
Xiongfeng Wang [Fri, 4 Aug 2023 08:04:35 +0000 (16:04 +0800)]
powerpc/powernv/pci: use pci_dev_id() to simplify the code
PCI core API pci_dev_id() can be used to get the BDF number for a pci
device. We don't need to compose it mannually. Use pci_dev_id() to
simplify the code a little bit.
Remove an unnecessary piece of code that does an endianness conversion
but does not use the result. The following warning was reported by
Clang's static analyzer:
arch/powerpc/sysdev/xics/ics-opal.c:114:2: warning: Value stored to
'server' is never read [deadcode.DeadStores]
server = be16_to_cpu(oserver);
'server' was used as a parameter to opal_get_xive() in commit 5c7c1e9444d8 ("powerpc/powernv: Add OPAL ICS backend") when it was
introduced. 'server' was also used in an error message for the call to
opal_get_xive().
'server' was always later set by a call to ics_opal_mangle_server()
before being used.
Commit bf8e0f891a32 ("powerpc/powernv: Fix endian issues in OPAL ICS
backend") used a new variable 'oserver' as the parameter to
opal_get_xive() instead of 'server' for endian correctness. It also
removed 'server' from the error message for the call to opal_get_xive().
Fix the warning by removing the server variable assignment.
Fixes: bf8e0f891a32 ("powerpc/powernv: Fix endian issues in OPAL ICS backend") Reviewed-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Gautam Menghani <gautam@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230731115543.36991-1-gautam@linux.ibm.com
ruanjinjie [Thu, 10 Nov 2022 01:19:29 +0000 (09:19 +0800)]
powerpc/pseries: fix possible memory leak in ibmebus_bus_init()
If device_register() returns error in ibmebus_bus_init(), name of kobject
which is allocated in dev_set_name() called in device_add() is leaked.
As comment of device_add() says, it should call put_device() to drop
the reference count that was set in device_initialize() when it fails,
so the name can be freed in kobject_cleanup().
Clang didn't recognize the instruction tlbilxlpid. This was fixed in
clang-18 [0] then backported to clang-17 [1]. To support clang-16 and
older, rather than using that instruction bare in inline asm, add it to
ppc-opcode.h and use that macro as is done elsewhere for other
instructions.
Xiongfeng Wang [Fri, 4 Aug 2023 07:56:30 +0000 (15:56 +0800)]
cxl: Use pci_find_vsec_capability() to simplify the code
PCI core add pci_find_vsec_capability() to query VSEC. We can use that
core API to simplify the code.
The only logical change is that pci_find_vsec_capability check the
Vendor ID before finding the VSEC.
PCI spec rev 5.0 says in 7.9.5.2 Vendor-Specific Header:
VSEC ID - This field is a vendor-defined ID number that indicates the
nature and format of the VSEC structure
Software must qualify the Vendor ID before interpreting this field.
Christophe Leroy [Wed, 21 Jun 2023 10:40:50 +0000 (12:40 +0200)]
powerpc/reg: Remove #ifdef around mtspr macro
That ifdef was introduced by commit 1458dd951f7c ("powerpc/8xx:
Handle CPU6 ERRATA directly in mtspr() macro") and left over by
commit 2a45addd21de ("powerpc/8xx: Remove CPU6 ERRATA Workaround")
Since commit 449012daa92a ("[POWERPC] cpm2: Infrastructure code
cleanup.") cpm2_map() is just returning cpm2_immr pointer and
cpm2_unmap() does nothing.
We already have parts of code that use cpm2_immr directly so get rid
of cpm2_map() and cpm2_unmap() by using cpm2_immr directly. And avoid
going through local pointers that hide the pointed structure.
Since commit fb533d0c5a97 ("[POWERPC] 8xx: Infrastructure code cleanup.")
immr_map() is just returning mpc8xxx_immr pointer and immr_unmap()
do nothing.
We already have parts of code that use mpc8xxx_immr directly so get rid
of immr_map() and immr_unmap() by using mpc8xxx_immr directly. And avoid
going through local pointers that hide the pointed structure.
Zheng Zengkai [Fri, 11 Aug 2023 10:20:39 +0000 (18:20 +0800)]
ocxl: Use pci_dev_id() to simplify the code
PCI core API pci_dev_id() can be used to get the BDF number for a pci
device. We don't need to compose it mannually. Use pci_dev_id() to
simplify the code a little bit.
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> Acked-by: Andrew Donnellan <ajd@linux.ibm.com> Acked-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230811102039.17257-1-zhengzengkai@huawei.com
Don't use kernel-doc "/**" comment format for non-kernel-doc comments.
This prevents a kernel-doc warning:
arch/powerpc/platforms/pseries/plpks.c:186: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Label is combination of label attributes + name.
Fixes: 2454a7af0f2a ("powerpc/pseries: define driver for Platform KeyStore") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com>
Link: lore.kernel.org/r/202308040430.GxmPAnwZ-lkp@intel.com Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230810000740.23756-1-rdunlap@infradead.org
powerpc/radix: Move some functions into #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
With skiboot_defconfig, Clang reports:
CC arch/powerpc/mm/book3s64/radix_tlb.o
arch/powerpc/mm/book3s64/radix_tlb.c:419:20: error: unused function '_tlbie_pid_lpid' [-Werror,-Wunused-function]
static inline void _tlbie_pid_lpid(unsigned long pid, unsigned long lpid,
^
arch/powerpc/mm/book3s64/radix_tlb.c:663:20: error: unused function '_tlbie_va_range_lpid' [-Werror,-Wunused-function]
static inline void _tlbie_va_range_lpid(unsigned long start, unsigned long end,
^
This is because those functions are only called from functions
enclosed in a #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
Move below functions inside that #ifdef
* __tlbie_pid_lpid(unsigned long pid,
* __tlbie_va_lpid(unsigned long va, unsigned long pid,
* fixup_tlbie_pid_lpid(unsigned long pid, unsigned long lpid)
* _tlbie_pid_lpid(unsigned long pid, unsigned long lpid,
* fixup_tlbie_va_range_lpid(unsigned long va,
* __tlbie_va_range_lpid(unsigned long start, unsigned long end,
* _tlbie_va_range_lpid(unsigned long start, unsigned long end,
Fixes: f0c6fbbb9050 ("KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202307260802.Mjr99P5O-lkp@intel.com/ Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/3d72efd39f986ee939d068af69fdce28bd600766.1691568093.git.christophe.leroy@csgroup.eu
Arnd Bergmann [Wed, 9 Aug 2023 13:10:09 +0000 (15:10 +0200)]
powerpc: xmon: remove unused variables
Randconfig testing with W=1 showed up these warnings that I'd like to enable
by default:
arch/powerpc/xmon/xmon.c: In function 'dump_tlb_book3e':
arch/powerpc/xmon/xmon.c:3833:42: error: variable 'lrat' set but not used [-Werror=unused-but-set-variable]
3833 | int i, tlb, ntlbs, pidsz, lpidsz, rasz, lrat = 0;
| ^~~~
arch/powerpc/xmon/xmon.c:3831:23: error: variable 'lpidmask' set but not used [-Werror=unused-but-set-variable]
3831 | u32 mmucfg, pidmask, lpidmask;
| ^~~~~~~~
arch/powerpc/xmon/xmon.c:3831:14: error: variable 'pidmask' set but not used [-Werror=unused-but-set-variable]
3831 | u32 mmucfg, pidmask, lpidmask;
| ^~~~~~~
Just remove these as they have been unused since the code was added in 2010.
Fixes: 03247157f7391 ("powerpc/book3e: Add TLB dump in xmon for Book3E") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230809131024.2039647-2-arnd@kernel.org
Arnd Bergmann [Wed, 9 Aug 2023 13:10:08 +0000 (15:10 +0200)]
powerpc: mark more local variables as volatile
A while ago I created a2305e3de8193 ("powerpc: mark local variables
around longjmp as volatile") in order to allow building powerpc with
-Wextra enabled on gcc-11.
I tried this again with gcc-13 and found two more of the same issues,
presumably based on slightly different optimization paths being taken
here:
arch/powerpc/xmon/xmon.c:3306:27: error: variable 'mm' might be clobbered by 'longjmp' or 'vfork' [-Werror=clobbered]
arch/powerpc/kexec/crash.c:353:22: error: variable 'i' might be clobbered by 'longjmp' or 'vfork' [-Werror=clobbered]
I checked a bunch of randconfigs and found only these two, so just
address them the same way as the others.
Add support for HOTPLUG_SMT, which enables the generic sysfs SMT support
files in /sys/devices/system/cpu/smt, as well as the "nosmt" boot
parameter.
Implement the recently added hooks to allow partial SMT states, allow
any number of threads per core.
Tie the config symbol to HOTPLUG_CPU, which enables it on the major
platforms that support SMT. If there are other platforms that want the
SMT support that can be tweaked in future.
powerpc/pseries: Initialise CPU hotplug callbacks earlier
As part of the generic HOTPLUG_SMT code, there is support for disabling
secondary SMT threads at boot time, by passing "nosmt" on the kernel
command line.
The way that is implemented is the secondary threads are brought partly
online, and then taken back offline again. That is done to support x86
CPUs needing certain initialisation done on all threads. However powerpc
has similar needs, see commit d70a54e2d085 ("powerpc/powernv: Ignore
smt-enabled on Power8 and later").
For that to work the powerpc CPU hotplug callbacks need to be registered
before secondary CPUs are brought online, otherwise __cpu_disable()
fails due to smp_ops->cpu_disable being NULL.
So split the basic initialisation into pseries_cpu_hotplug_init() which
can be called early from setup_arch(). The DLPAR related initialisation
can still be done later, because it needs to do allocations.
Instead of resorting to BUG() ensure that the driver isn't unbound by
suppressing its bind and unbind sysfs attributes. As the driver is
built-in there is no way to remove a device once bound.
As a nice side effect this allows to drop the remove function.
There are a few warnings in powerpc64 defconfig builds after -Wmissing-prototypes
gets promoted from W=1 to the default warning set:
arch/powerpc/mm/book3s64/pgtable.c:422:6: error: no previous prototype for 'arch_report_meminfo' [-Werror=missing-prototypes]
arch/powerpc/platforms/cell/ras.c:275:5: error: no previous prototype for 'cbe_sysreset_hack' [-Werror=missing-prototypes]
arch/powerpc/platforms/cell/spu_manage.c:29:21: error: no previous prototype for 'spu_devnode' [-Werror=missing-prototypes]
arch/powerpc/platforms/pasemi/time.c:12:17: error: no previous prototype for 'pas_get_boot_time' [-Werror=missing-prototypes]
arch/powerpc/platforms/powermac/feature.c:1532:13: error: no previous prototype for 'g5_phy_disable_cpu1' [-Werror=missing-prototypes]
arch/powerpc/platforms/86xx/pic.c:28:13: error: no previous prototype for 'mpc86xx_init_irq' [-Werror=missing-prototypes]
drivers/pci/pci-sysfs.c:936:13: error: no previous prototype for 'pci_adjust_legacy_attr' [-Werror=missing-prototypes]
Address these by including the right header files or marking the
functions static. The audit.c one is a bit tricky since compat_audit.h
cannot include regular kernel headers tht have conflicting types on
32-bit powerpc.
Benjamin Gray [Tue, 25 Jul 2023 00:58:41 +0000 (10:58 +1000)]
selftests/powerpc/ptrace: Declare test temporary variables as volatile
While the target is volatile, the temporary variables used to access the
target cast away the volatile. This is undefined behaviour, and a
compiler may optimise away/reorder these accesses, breaking the test.
This was observed with GCC 13.1.1, but it can be difficult to reproduce
because of the dependency on compiler behaviour.
Benjamin Gray [Tue, 25 Jul 2023 00:58:40 +0000 (10:58 +1000)]
selftests/powerpc/ptrace: Fix typo in pid_max search error
pid_max_addr() searches for the 'pid_max' symbol in /proc/kallsyms, and
prints an error if it cannot find it. The error message has a typo,
calling it pix_max.
Benjamin Gray [Tue, 25 Jul 2023 00:58:39 +0000 (10:58 +1000)]
selftests/powerpc/ptrace: Explain why tests are skipped
Many tests require specific hardware features/configurations that a
typical machine might not have. As a result, it's common to see a test
is skipped. But it is tedious to find out why a test is skipped
when all it gives is the file location of the skip macro.
Convert SKIP_IF() to SKIP_IF_MSG(), with appropriate descriptions of why
the test is being skipped. This gives a general idea of why a test is
skipped, which can be looked into further if it doesn't make sense.
Rob Herring [Mon, 24 Jul 2023 21:02:42 +0000 (15:02 -0600)]
powerpc: Explicitly include correct DT includes
The DT of_device.h and of_platform.h date back to the separate
of_platform_bus_type before it as merged into the regular platform bus.
As part of that merge prepping Arm DT support 13 years ago, they
"temporarily" include each other. They also include platform_device.h
and of.h. As a result, there's a pretty much random mix of those include
files used throughout the tree. In order to detangle these headers and
replace the implicit includes with struct declarations, users need to
explicitly include the correct includes.
Nicholas Piggin [Wed, 24 May 2023 06:08:21 +0000 (16:08 +1000)]
powerpc/64s/radix: combine final TLB flush and lazy tlb mm shootdown IPIs
This performs lazy tlb mm shootdown when doing the exit TLB flush when
all mm users go away and user mappings are removed, which avoids having
to do the lazy tlb mm shootdown IPIs on the final mmput when all kernel
references disappear.
powerpc/64s uses a broadcast TLBIE for the exit TLB flush if remote CPUs
need to be invalidated (unless TLBIE is disabled), so this doesn't
necessarily save IPIs but it does avoid a broadcast TLBIE which is quite
expensive.
Nicholas Piggin [Wed, 24 May 2023 06:08:20 +0000 (16:08 +1000)]
powerpc: Add mm_cpumask warning when context switching
When context switching away from an mm, add a CONFIG_DEBUG_VM warning
check to ensure this CPU is still set in the mask. This could catch
bugs where the mask is improperly trimmed while the CPU is still using
the mm.