git.ipfire.org Git - thirdparty/linux.git/log

Thadeu Lima de Souza Cascardo [Tue, 28 Jul 2020 15:50:39 +0000 (12:50 -0300)]

selftests/powerpc: Return skip code for spectre_v2

When running under older versions of qemu, or under newer versions with
old machine types, some security features will not be reported to the
guest. This will lead the guest OS to consider itself Vulnerable to
spectre_v2.

So the spectre_v2 test fails in such cases when the host is mitigated and
mispredictions cannot be detected as the test expects.

Make it return the skip code instead, for this particular case. We
don't want to miss the case when the test fails and the system reports
as mitigated or not affected. But it is not a problem to miss failures
when the system reports as Vulnerable.
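
The decision described above can be sketched as follows. This is a minimal illustration, not the selftest's code: the enum names, the function spectre_v2_outcome() and the exit-code values are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical exit codes; the real selftest harness defines its own. */
enum { TEST_PASS = 0, TEST_FAIL = 1, TEST_SKIP = 4 };

static_assert(TEST_SKIP != TEST_PASS && TEST_SKIP != TEST_FAIL,
	      "the skip code must be distinguishable from pass and fail");

/* What the guest kernel reports for spectre_v2 (illustrative subset). */
enum reported_state { STATE_VULNERABLE, STATE_MITIGATED, STATE_NOT_AFFECTED };

/*
 * Decide the test outcome from the reported state and whether the
 * misprediction measurement matched what that state predicts.
 */
int spectre_v2_outcome(enum reported_state state, bool matches_expectation)
{
	if (matches_expectation)
		return TEST_PASS;
	/* A mitigated host can hide mispredictions from a "Vulnerable" guest. */
	if (state == STATE_VULNERABLE)
		return TEST_SKIP;
	/* Mitigated or not affected, yet the measurement disagrees. */
	return TEST_FAIL;
}
```

Only the Vulnerable case is downgraded to a skip; a mismatch while the system claims to be mitigated or not affected still counts as a failure.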

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200728155039.401445-1-cascardo@canonical.com

Balamuruhan S [Tue, 28 Jul 2020 13:03:08 +0000 (18:33 +0530)]

powerpc/test_emulate_step: Add testcases for divde[.] and divdeu[.] instructions

Add testcases for the emulated divde, divde., divdeu and divdeu.
instructions to cover a few scenarios:
  - same dividend and divisor, which leaves RT undefined
    for divdeu[.]
  - divide by zero, which leaves RT undefined for both
    divde[.] and divdeu[.]
  - negative dividend, to cover -|divisor| < r <= 0 if
    the dividend is negative for divde[.]
  - normal case with proper dividend and divisor for both
    divde[.] and divdeu[.]
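
To see why equal operands and a zero divisor leave RT undefined for divdeu, here is a small user-space model. It is a sketch only, not the kernel's emulation code: divdeu_model() is a made-up name, and it relies on unsigned __int128, a GCC/Clang extension available on 64-bit targets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Model of divdeu: RT = (RA || 64 zero bits) / RB, unsigned.
 * Returns false when RT would be undefined: division by zero, or a
 * quotient that does not fit in 64 bits (which happens iff RA >= RB).
 */
bool divdeu_model(uint64_t ra, uint64_t rb, uint64_t *rt)
{
	unsigned __int128 dividend;

	assert(rt != NULL);
	if (rb == 0 || ra >= rb)
		return false;
	dividend = (unsigned __int128)ra << 64;
	*rt = (uint64_t)(dividend / rb);
	return true;
}
```

With RA equal to RB the quotient is exactly 2^64, one more than fits in a doubleword, which is the "same dividend and divisor" case listed above.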

Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Reviewed-by: Sandipan Das <sandipan@linux.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200728130308.1790982-4-bala24@linux.ibm.com

Balamuruhan S [Tue, 28 Jul 2020 13:03:07 +0000 (18:33 +0530)]

powerpc/sstep: Add support for divde[.] and divdeu[.] instructions

This patch adds emulation support for the divde and divdeu instructions:
- Divide Doubleword Extended (divde[.])
- Divide Doubleword Extended Unsigned (divdeu[.])

Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Reviewed-by: Sandipan Das <sandipan@linux.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200728130308.1790982-3-bala24@linux.ibm.com

Balamuruhan S [Tue, 28 Jul 2020 13:03:06 +0000 (18:33 +0530)]

powerpc/ppc-opcode: Add divde and divdeu opcodes

Include instruction opcodes for divde and divdeu as macros.

Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Reviewed-by: Sandipan Das <sandipan@linux.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200728130308.1790982-2-bala24@linux.ibm.com

Harish [Tue, 9 Jun 2020 08:14:23 +0000 (13:44 +0530)]

selftests/powerpc: Fix CPU affinity for child process

On systems with a large number of CPUs, the test fails when it tries to
set affinity by calling sched_setaffinity() with an affinity mask that is
too small. Fix this by making the size of the allocated affinity mask
depend on the number of CPUs reported by get_nprocs().
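
A dynamically sized affinity mask can be sketched with glibc's CPU_ALLOC family of macros. This is an illustration under assumptions, not the patch itself: the helper names affinity_mask_size() and pin_to_cpu() are invented for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <sys/sysinfo.h>

/* Bytes needed for an affinity mask that can describe ncpus CPUs. */
size_t affinity_mask_size(int ncpus)
{
	assert(ncpus > 0);
	return CPU_ALLOC_SIZE(ncpus);
}

/*
 * Pin the calling thread to one CPU using a mask sized from
 * get_nprocs() rather than a fixed-size cpu_set_t.
 * Returns 0 on success, -1 on failure.
 */
int pin_to_cpu(int cpu)
{
	int ncpus = get_nprocs();
	size_t size = affinity_mask_size(ncpus);
	cpu_set_t *mask = CPU_ALLOC(ncpus);
	int rc;

	if (mask == NULL)
		return -1;
	CPU_ZERO_S(size, mask);
	CPU_SET_S(cpu, size, mask);
	rc = sched_setaffinity(0, size, mask);
	CPU_FREE(mask);
	return rc;
}
```

A fixed cpu_set_t in glibc covers 1024 CPUs, so sizing the mask at run time is what lets the same code work on larger machines.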

Fixes: 00b7ec5c9cf3 ("selftests/powerpc: Import Anton's context_switch2 benchmark")
Reported-by: Shirisha Ganta <shiganta@in.ibm.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Harish <harish@linux.ibm.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Reviewed-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200609081423.529664-1-harish@linux.ibm.com

Wei Yongjun [Mon, 27 Jul 2020 17:11:12 +0000 (01:11 +0800)]

powerpc/powernv/sriov: Remove unused but set variable 'phb'

GCC reports the following warning:

arch/powerpc/platforms/powernv/pci-sriov.c:602:25: warning:
 variable 'phb' set but not used [-Wunused-but-set-variable]
  602 |  struct pnv_phb *phb;
      |                  ^~~

This variable is not used, so remove it.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200727171112.2781-1-weiyongjun1@huawei.com

Qinglang Miao [Tue, 28 Jul 2020 02:28:07 +0000 (10:28 +0800)]

powerpc: use for_each_child_of_node() macro

Use for_each_child_of_node() macro instead of open coding it.
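
The difference between an open-coded child walk and an iterator macro can be shown with a toy model. Everything below (struct toy_node, toy_next_child(), toy_for_each_child()) is a hypothetical stand-in written for this example; the kernel's real helper additionally manages node reference counts, which the toy omits.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a device-tree node: first child plus next sibling. */
struct toy_node {
	struct toy_node *child;
	struct toy_node *sibling;
};

/* Return the child after prev, or the first child when prev is NULL. */
struct toy_node *toy_next_child(const struct toy_node *parent,
				struct toy_node *prev)
{
	assert(parent != NULL);
	return prev ? prev->sibling : parent->child;
}

/* Iterator macro modelled on the shape of for_each_child_of_node(). */
#define toy_for_each_child(parent, child)			\
	for ((child) = toy_next_child((parent), NULL);		\
	     (child) != NULL;					\
	     (child) = toy_next_child((parent), (child)))

/* Count children with the macro instead of an open-coded loop. */
int toy_count_children(const struct toy_node *parent)
{
	struct toy_node *child;
	int n = 0;

	toy_for_each_child(parent, child)
		n++;
	return n;
}
```

The macro hides the initial NULL argument and the advance step, which are the parts an open-coded loop has to repeat at every call site.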

Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200728022807.87815-1-miaoqinglang@huawei.com

Nicholas Piggin [Tue, 3 Mar 2020 01:27:48 +0000 (11:27 +1000)]

powerpc/build: vdso linker warning for orphan sections

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200303012748.4190929-1-npiggin@gmail.com

Gustavo A. R. Silva [Mon, 27 Jul 2020 22:42:01 +0000 (17:42 -0500)]

powerpc: Use fallthrough pseudo-keyword

Replace the existing /* fall through */ comments and their variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings where applicable.
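
For readers unfamiliar with the pseudo-keyword, a stand-alone sketch of how it reads in a switch statement follows. The macro definition here is a user-space approximation that assumes a compiler with the fallthrough statement attribute (recent GCC or Clang) and degrades to an empty statement otherwise; steps_from() is an invented example function.

```c
#include <assert.h>

/* User-space approximation of the kernel's fallthrough pseudo-keyword. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do { } while (0)
#endif

/* Number of setup steps that run when starting from a given stage. */
int steps_from(int stage)
{
	int steps = 0;

	assert(stage >= 0);
	switch (stage) {
	case 0:
		steps++;
		fallthrough;
	case 1:
		steps++;
		fallthrough;
	case 2:
		steps++;
		break;
	default:
		break;
	}
	return steps;
}
```

Unlike a comment, the statement form is visible to the compiler, so an accidental missing break elsewhere can still be diagnosed.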

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200727224201.GA10133@embeddedor

Aneesh Kumar K.V [Mon, 27 Jul 2020 08:59:08 +0000 (14:29 +0530)]

powerpc/book3s64/radix: Add kernel command line option to disable radix GTSE

This adds a kernel command line option that can be used to disable GTSE support.
Disabling GTSE means the kernel will make hcalls to invalidate TLB entries.

This was done so that we can do VM migration between configs that enable/disable
GTSE support via hypervisor. To migrate a VM from a system that supports
GTSE to a system that doesn't, we can boot the guest with
radix_hcall_invalidate=on, thereby forcing the guest to use hcalls for TLB
invalidates.

The check for hcall availability is done in pSeries_setup_arch so that
the panic message appears on the console. This should only happen on
a hypervisor that doesn't force the guest to hash translation even
though it can't handle the radix GTSE=0 request via CAS. With
radix_hcall_invalidate=on, if the hypervisor doesn't support the
hcall_rpt_invalidate hcall, it should force the LPAR to hash translation.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Tested-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200727085908.420806-1-aneesh.kumar@linux.ibm.com

Aneesh Kumar K.V [Mon, 13 Jul 2020 15:07:49 +0000 (20:37 +0530)]

powerpc/kvm/cma: Improve kernel log during boot

Current kernel gives:

[    0.000000] cma: Reserved 26224 MiB at 0x0000007959000000
[    0.000000] hugetlb_cma: reserve 65536 MiB, up to 16384 MiB per node
[    0.000000] cma: Reserved 16384 MiB at 0x0000001800000000

With the fix:

[    0.000000] kvm_cma_reserve: reserving 26214 MiB for global area
[    0.000000] cma: Reserved 26224 MiB at 0x0000007959000000
[    0.000000] hugetlb_cma: reserve 65536 MiB, up to 16384 MiB per node
[    0.000000] cma: Reserved 16384 MiB at 0x0000001800000000

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200713150749.25245-2-aneesh.kumar@linux.ibm.com

Aneesh Kumar K.V [Mon, 13 Jul 2020 15:07:48 +0000 (20:37 +0530)]

powerpc/hugetlb/cma: Allocate gigantic hugetlb pages using CMA

Commit cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
added support for allocating gigantic hugepages using CMA. This patch
enables the same for powerpc.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200713150749.25245-1-aneesh.kumar@linux.ibm.com

Balamuruhan S [Mon, 30 Mar 2020 07:59:54 +0000 (13:29 +0530)]

powerpc/xmon: Use `dcbf` inplace of `dcbi` instruction for 64bit Book3S

The Data Cache Block Invalidate (dcbi) instruction was implemented back in
PowerPC architecture version 2.03. But according to the Power Processor
Users Manual it is obsolete and not supported by POWER8/POWER9 cores.
Attempting to use this illegal instruction results in a hypervisor emulation
assistance interrupt. So ifdef out the option `i` in xmon for 64-bit Book3S.

  0:mon> fi
  cpu 0x0: Vector: 700 (Program Check) at [c000000003be74a0]
      pc: c000000000102030: cacheflush+0x180/0x1a0
      lr: c000000000101f3c: cacheflush+0x8c/0x1a0
      sp: c000000003be7730
     msr: 8000000000081033
    current = 0xc0000000035e5c00
    paca    = 0xc000000001910000   irqmask: 0x03   irq_happened: 0x01
      pid   = 1025, comm = bash
  Linux version 5.6.0-rc5-g5aa19adac (root@ltc-wspoon6) (gcc version 7.4.0
  (Ubuntu 7.4.0-1ubuntu1~18.04.1)) #1 SMP Tue Mar 10 04:38:41 CDT 2020
  cpu 0x0: Exception 700 (Program Check) in xmon, returning to main loop
  [c000000003be7c50] c00000000084abb0 __handle_sysrq+0xf0/0x2a0
  [c000000003be7d00] c00000000084b3c0 write_sysrq_trigger+0xb0/0xe0
  [c000000003be7d30] c0000000004d1edc proc_reg_write+0x8c/0x130
  [c000000003be7d60] c00000000040dc7c __vfs_write+0x3c/0x70
  [c000000003be7d80] c000000000410e70 vfs_write+0xd0/0x210
  [c000000003be7dd0] c00000000041126c ksys_write+0xdc/0x130
  [c000000003be7e20] c00000000000b9d0 system_call+0x5c/0x68
  --- Exception: c01 (System Call) at 00007fffa345e420
  SP (7ffff0b08ab0) is in userspace

Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200330075954.538773-1-bala24@linux.ibm.com

Michael Ellerman [Fri, 24 Jul 2020 13:17:28 +0000 (23:17 +1000)]

powerpc: Drop old comment about CONFIG_POWER

There's a comment in time.h referring to CONFIG_POWER, which doesn't
exist. That confuses scripts/checkkconfigsymbols.py.

Presumably the comment was referring to a CONFIG_POWER vs CONFIG_PPC,
in which case for CONFIG_POWER we would #define __USE_RTC to 1. But
instead we have CONFIG_PPC_BOOK3S_601, and these days we have
IS_ENABLED().

So the comment is no longer relevant, drop it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131728.1643966-9-mpe@ellerman.id.au

Michael Ellerman [Fri, 24 Jul 2020 13:17:27 +0000 (23:17 +1000)]

powerpc/kvm: Use correct CONFIG symbol in comment

This comment refers to the non-existent CONFIG_PPC_BOOK3S_XX, which
confuses scripts/checkkconfigsymbols.py.

Change it to use the correct symbol.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131728.1643966-8-mpe@ellerman.id.au

Michael Ellerman [Fri, 24 Jul 2020 13:17:26 +0000 (23:17 +1000)]

powerpc/boot: Fix CONFIG_PPC_MPC52XX references

Commit 866bfc75f40e ("powerpc: conditionally compile platform-specific
serial drivers") made some code depend on CONFIG_PPC_MPC52XX, which
doesn't exist.

Fix it to use CONFIG_PPC_MPC52xx.

Fixes: 866bfc75f40e ("powerpc: conditionally compile platform-specific serial drivers")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131728.1643966-7-mpe@ellerman.id.au

Michael Ellerman [Fri, 24 Jul 2020 13:17:25 +0000 (23:17 +1000)]

powerpc/32s: Remove TAUException wart in traps.c

All 32 and 64-bit builds that don't have CONFIG_TAU_INT enabled (all
of them), get a definition of TAUException() in traps.c.

On 64-bit it's completely useless, and just wastes ~120 bytes of text.
On 32-bit it allows the kernel to link because head_32.S calls it
unconditionally.

Instead follow the example of altivec_assist_exception(), and if
CONFIG_TAU_INT is not enabled just point it at unknown_exception using
the preprocessor.
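
The preprocessor aliasing described above can be illustrated in isolation. This is a toy, not the kernel change: DEMO_TAU_INT stands in for the Kconfig option, and the two handler names are invented for the example.

```c
#include <assert.h>

/* Toy handler standing in for the generic unknown-exception path. */
int unknown_exception_demo(int vector)
{
	assert(vector >= 0);
	return -vector;	/* arbitrary marker value for the demo */
}

/*
 * When the feature is compiled out, alias the specific handler name to
 * the generic one instead of building a separate stub function.
 */
#ifdef DEMO_TAU_INT
int tau_exception_demo(int vector);	/* real handler lives elsewhere */
#else
#define tau_exception_demo unknown_exception_demo
#endif

/* Callers keep using the specific name unconditionally. */
int dispatch_tau(int vector)
{
	return tau_exception_demo(vector);
}
```

The caller still links when the feature is disabled, but no extra function body is emitted for the unused handler.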

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131728.1643966-6-mpe@ellerman.id.au

Michael Ellerman [Fri, 24 Jul 2020 13:17:24 +0000 (23:17 +1000)]

powerpc/32s: Fix CONFIG_BOOK3S_601 uses

We have two uses of CONFIG_BOOK3S_601, which doesn't exist. Fix them
to use CONFIG_PPC_BOOK3S_601 which is the correct symbol.

Fixes: 12c3f1fd87bf ("powerpc/32s: get rid of CPU_FTR_601 feature")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131728.1643966-5-mpe@ellerman.id.au

Michael Ellerman [Fri, 24 Jul 2020 13:17:23 +0000 (23:17 +1000)]

powerpc/64e: Drop dead BOOK3E_MMU_TLB_STATS code

This code was merged 11 years ago in commit 13363ab9b9d0 ("powerpc:
Add definitions used by exception handling on 64-bit Book3E") but was
never able to be built because CONFIG_BOOK3E_MMU_TLB_STATS never
existed. Remove it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131728.1643966-4-mpe@ellerman.id.au

Michael Ellerman [Fri, 24 Jul 2020 13:17:22 +0000 (23:17 +1000)]

powerpc/52xx: Fix comment about CONFIG_BDI*

There's a comment in lite5200_sleep.S that refers to "CONFIG_BDI*".

This confuses scripts/checkkconfigsymbols.py, which thinks it should
be able to find CONFIG_BDI.

Change the comment to refer to CONFIG_BDI_SWITCH which is presumably
roughly what it was referring to. AFAICS there never has been a
CONFIG_BDI.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131728.1643966-3-mpe@ellerman.id.au

Michael Ellerman [Fri, 24 Jul 2020 13:17:21 +0000 (23:17 +1000)]

powerpc/configs: Remove dead symbols

Remove references to symbols that no longer exist as reported by
scripts/checkkconfigsymbols.py.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131728.1643966-2-mpe@ellerman.id.au

Michael Ellerman [Fri, 24 Jul 2020 13:17:20 +0000 (23:17 +1000)]

powerpc/configs: Drop old symbols from ppc6xx_defconfig

ppc6xx_defconfig refers to quite a few symbols that no longer exist,
as reported by scripts/checkkconfigsymbols.py. Remove them.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131728.1643966-1-mpe@ellerman.id.au

Bharata B Rao [Mon, 27 Jul 2020 09:57:04 +0000 (15:27 +0530)]

powerpc/mm: Limit resize_hpt_for_hotplug() call to hash guests only

During memory hotplug and unplug, resize_hpt_for_hotplug() gets called
for both hash and radix guests but it should be called only for hash
guests. Though the call does nothing in the radix guest case, it is
cleaner to push this call into hash specific memory hotplug routines.

Reported-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200727095704.1432916-1-bharata@linux.ibm.com

Michael Ellerman [Fri, 24 Jul 2020 09:25:28 +0000 (19:25 +1000)]

selftests/powerpc: Remove powerpc special cases from stack expansion test

Now that the powerpc code behaves the same as other architectures we
can drop the special cases we had.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724092528.1578671-5-mpe@ellerman.id.au

Michael Ellerman [Fri, 24 Jul 2020 09:25:27 +0000 (19:25 +1000)]

powerpc/mm: Remove custom stack expansion checking

We have powerpc specific logic in our page fault handling to decide if
an access to an unmapped address below the stack pointer should expand
the stack VMA.

The logic aims to prevent userspace from doing bad accesses below the
stack pointer. However as long as the stack is < 1MB in size, we allow
all accesses without further checks. Adding some debug I see that I
can do a full kernel build and LTP run, and not a single process has
used more than 1MB of stack. So for the majority of processes the
logic never even fires.

We also recently found a nasty bug in this code which could cause
userspace programs to be killed during signal delivery. It went
unnoticed presumably because most processes use < 1MB of stack.

The generic mm code has also grown support for stack guard pages since
this code was originally written, so the most heinous case of the
stack expanding into other mappings is now handled for us.

Finally although some other arches have special logic in this path,
from what I can tell none of x86, arm64, arm and s390 impose any extra
checks other than those in expand_stack().

So drop our complicated logic and, like other architectures, just let
the stack expand as long as it's within the rlimit.
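
The resulting policy is simple enough to model in a few lines. This is a deliberately reduced sketch: may_expand_stack() is an invented name, and it ignores details the generic mm code also handles, such as guard gaps and page alignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model of the simplified policy: a fault below the current stack
 * mapping may grow the stack as long as the grown size stays within the
 * stack rlimit. Stacks grow down, so stack_top is the highest address.
 */
bool may_expand_stack(uint64_t stack_top, uint64_t fault_addr,
		      uint64_t rlimit_stack)
{
	assert(fault_addr <= stack_top);
	return stack_top - fault_addr <= rlimit_stack;
}
```

Note that the stack pointer does not appear in the check at all, which is the point of dropping the custom logic.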

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Daniel Axtens <dja@axtens.net>
Link: https://lore.kernel.org/r/20200724092528.1578671-4-mpe@ellerman.id.au

Michael Ellerman [Fri, 24 Jul 2020 09:25:26 +0000 (19:25 +1000)]

selftests/powerpc: Update the stack expansion test

Update the stack expansion load/store test to take into account the
new allowance of 4224 bytes below the stack pointer.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724092528.1578671-3-mpe@ellerman.id.au

Michael Ellerman [Fri, 24 Jul 2020 09:25:25 +0000 (19:25 +1000)]

powerpc: Allow 4224 bytes of stack expansion for the signal frame

We have powerpc specific logic in our page fault handling to decide if
an access to an unmapped address below the stack pointer should expand
the stack VMA.

The code was originally added in 2004 "ported from 2.4". The rough
logic is that the stack is allowed to grow to 1MB with no extra
checking. Over 1MB the access must be within 2048 bytes of the stack
pointer, or be from a user instruction that updates the stack pointer.

The 2048 byte allowance below the stack pointer is there to cover the
288 byte "red zone" as well as the "about 1.5kB" needed by the signal
delivery code.
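
The rough logic can be modelled as below. This is a sketch of the description above, not the kernel function: the name legacy_stack_access_ok() and its parameters are assumptions, and whether the faulting instruction updates the stack pointer is passed in as a plain flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STACK_FREE_GROWTH	(1024 * 1024)	/* first 1MB grows unchecked */
#define SP_ALLOWANCE		2048		/* red zone + signal frame */

/*
 * Returns true when the access should be allowed to expand the stack.
 * stack_top is the highest stack address, sp the user stack pointer.
 */
bool legacy_stack_access_ok(uint64_t stack_top, uint64_t sp,
			    uint64_t fault_addr, bool updates_sp)
{
	assert(fault_addr <= stack_top);
	/* Within the first 1MB below the top of the stack: no extra checks. */
	if (stack_top - fault_addr <= STACK_FREE_GROWTH)
		return true;
	/* Otherwise the access must be close to the stack pointer... */
	if (fault_addr + SP_ALLOWANCE >= sp)
		return true;
	/* ...or come from an instruction that updates the stack pointer. */
	return updates_sp;
}
```

With a 2MB stack, an access 2048 bytes below the stack pointer passes, while one 4224 bytes below it is rejected unless the instruction updates the stack pointer.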

Unfortunately since then the signal frame has expanded, and is now
4224 bytes on 64-bit kernels with transactional memory enabled. This
means if a process has consumed more than 1MB of stack, and its stack
pointer lies less than 4224 bytes from the next page boundary, signal
delivery will fault when trying to expand the stack and the process
will see a SEGV.

The total size of the signal frame is the size of struct rt_sigframe
(which includes the red zone) plus __SIGNAL_FRAMESIZE (128 bytes on
64-bit).

The 2048 byte allowance was correct until 2008 as the signal frame
was:

struct rt_sigframe {
        struct ucontext    uc;                           /*     0  1440 */
        /* --- cacheline 11 boundary (1408 bytes) was 32 bytes ago --- */
        long unsigned int          _unused[2];           /*  1440    16 */
        unsigned int               tramp[6];             /*  1456    24 */
        struct siginfo *           pinfo;                /*  1480     8 */
        void *                     puc;                  /*  1488     8 */
        struct siginfo     info;                         /*  1496   128 */
        /* --- cacheline 12 boundary (1536 bytes) was 88 bytes ago --- */
        char                       abigap[288];          /*  1624   288 */

        /* size: 1920, cachelines: 15, members: 7 */
        /* padding: 8 */
};

1920 + 128 = 2048

Then in commit ce48b2100785 ("powerpc: Add VSX context save/restore,
ptrace and signal support") (Jul 2008) the signal frame expanded to
2304 bytes:

struct rt_sigframe {
        struct ucontext    uc;                           /*     0  1696 */ <--
        /* --- cacheline 13 boundary (1664 bytes) was 32 bytes ago --- */
        long unsigned int          _unused[2];           /*  1696    16 */
        unsigned int               tramp[6];             /*  1712    24 */
        struct siginfo *           pinfo;                /*  1736     8 */
        void *                     puc;                  /*  1744     8 */
        struct siginfo     info;                         /*  1752   128 */
        /* --- cacheline 14 boundary (1792 bytes) was 88 bytes ago --- */
        char                       abigap[288];          /*  1880   288 */

        /* size: 2176, cachelines: 17, members: 7 */
        /* padding: 8 */
};

2176 + 128 = 2304

At this point we should have been exposed to the bug, though as far as
I know it was never reported. I no longer have a system old enough to
easily test on.

Then in 2010 commit 320b2b8de126 ("mm: keep a guard page below a
grow-down stack segment") caused our stack expansion code to never
trigger, as there was always a VMA found for a write up to PAGE_SIZE
below r1.

That meant the bug was hidden as we continued to expand the signal
frame in commit 2b0a576d15e0 ("powerpc: Add new transactional memory
state to the signal context") (Feb 2013):

struct rt_sigframe {
        struct ucontext    uc;                           /*     0  1696 */
        /* --- cacheline 13 boundary (1664 bytes) was 32 bytes ago --- */
        struct ucontext    uc_transact;                  /*  1696  1696 */ <--
        /* --- cacheline 26 boundary (3328 bytes) was 64 bytes ago --- */
        long unsigned int          _unused[2];           /*  3392    16 */
        unsigned int               tramp[6];             /*  3408    24 */
        struct siginfo *           pinfo;                /*  3432     8 */
        void *                     puc;                  /*  3440     8 */
        struct siginfo     info;                         /*  3448   128 */
        /* --- cacheline 27 boundary (3456 bytes) was 120 bytes ago --- */
        char                       abigap[288];          /*  3576   288 */

        /* size: 3872, cachelines: 31, members: 8 */
        /* padding: 8 */
        /* last cacheline: 32 bytes */
};

3872 + 128 = 4000

And commit 573ebfa6601f ("powerpc: Increase stack redzone for 64-bit
userspace to 512 bytes") (Feb 2014):

struct rt_sigframe {
        struct ucontext    uc;                           /*     0  1696 */
        /* --- cacheline 13 boundary (1664 bytes) was 32 bytes ago --- */
        struct ucontext    uc_transact;                  /*  1696  1696 */
        /* --- cacheline 26 boundary (3328 bytes) was 64 bytes ago --- */
        long unsigned int          _unused[2];           /*  3392    16 */
        unsigned int               tramp[6];             /*  3408    24 */
        struct siginfo *           pinfo;                /*  3432     8 */
        void *                     puc;                  /*  3440     8 */
        struct siginfo     info;                         /*  3448   128 */
        /* --- cacheline 27 boundary (3456 bytes) was 120 bytes ago --- */
        char                       abigap[512];          /*  3576   512 */ <--

        /* size: 4096, cachelines: 32, members: 8 */
        /* padding: 8 */
};

4096 + 128 = 4224

Then finally in 2017, commit 1be7107fbe18 ("mm: larger stack guard
gap, between vmas") exposed us to the existing bug, because it changed
the stack VMA to be the correct/real size, meaning our stack expansion
code is now triggered.

Fix it by increasing the allowance to 4224 bytes.
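
The arithmetic across the four layouts can be checked mechanically. The structure sizes below are copied from the layouts quoted above; the table, the helper names and the short era labels are invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIGNAL_FRAMESIZE_64	128	/* __SIGNAL_FRAMESIZE on 64-bit */

/* sizeof(struct rt_sigframe) at each step of the history. */
struct frame_era {
	const char *change;
	unsigned int rt_sigframe_size;
};

const struct frame_era frame_history[] = {
	{ "original layout (until 2008)",	1920 },
	{ "VSX state added (Jul 2008)",		2176 },
	{ "transactional memory (Feb 2013)",	3872 },
	{ "512-byte red zone (Feb 2014)",	4096 },
};

/* Total size of the signal frame for one era. */
unsigned int total_frame_size(const struct frame_era *era)
{
	assert(era != NULL);
	return era->rt_sigframe_size + SIGNAL_FRAMESIZE_64;
}

/* Does the whole frame fit inside a given allowance below the SP? */
bool frame_fits(const struct frame_era *era, unsigned int allowance)
{
	return total_frame_size(era) <= allowance;
}
```

Only the first entry fits in the old 2048-byte allowance; the last one needs exactly 4224 bytes, the new value.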

Hard-coding 4224 is obviously unsafe against future expansions of the
signal frame in the same way as the existing code. We can't easily use
sizeof() because the signal frame structure is not in a header. We
will either fix that, or rip out all the custom stack expansion
checking logic entirely.

Fixes: ce48b2100785 ("powerpc: Add VSX context save/restore, ptrace and signal support")
Cc: stable@vger.kernel.org # v2.6.27+
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Tested-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724092528.1578671-2-mpe@ellerman.id.au

Michael Ellerman [Fri, 24 Jul 2020 09:25:24 +0000 (19:25 +1000)]

selftests/powerpc: Add test of stack expansion logic

We have custom stack expansion checks that it turns out are extremely
badly tested and contain bugs, surprise. So add some tests that
exercise the code and capture the current boundary conditions.

The signal test currently fails on 64-bit kernels because the 2048
byte allowance for the signal frame is too small; we will fix that in
a subsequent patch.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724092528.1578671-1-mpe@ellerman.id.au

Oliver O'Halloran [Mon, 27 Jul 2020 01:01:27 +0000 (11:01 +1000)]

selftests/powerpc: Squash spurious errors due to device removal

For drivers that don't have the error handling callbacks we implement
recovery by removing the device and re-probing it. This causes the sysfs
directory for the PCI device to be removed which causes the following
spurious error to be printed when checking the PE state:

Breaking 0005:03:00.0...
./eeh-basic.sh: line 13: can't open /sys/bus/pci/devices/0005:03:00.0/eeh_pe_state: no such file
0005:03:00.0, waited 0/60
0005:03:00.0, waited 1/60
0005:03:00.0, waited 2/60
0005:03:00.0, waited 3/60
0005:03:00.0, waited 4/60
0005:03:00.0, waited 5/60
0005:03:00.0, waited 6/60
0005:03:00.0, waited 7/60
0005:03:00.0, Recovered after 8 seconds

We currently try to avoid this by checking if the PE state file exists
before reading from it. This is however inherently racy so re-work the
state checking so that we only read from the file once, and we squash any
errors that occur while reading.
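
The selftest itself is a shell script, but the "read once and squash errors" idea is the same in any language. The C sketch below illustrates it under assumptions: read_state_once() is an invented helper, not part of the test.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read a one-line state file in a single attempt. Any failure (file
 * missing, removed before the read, empty) is reported as -1 without
 * printing an error, so a device that vanished is not treated as noise.
 */
int read_state_once(const char *path, char *buf, size_t len)
{
	FILE *f;
	char *ok;

	assert(buf != NULL && len > 0);
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	ok = fgets(buf, (int)len, f);
	fclose(f);
	if (ok == NULL)
		return -1;
	buf[strcspn(buf, "\n")] = '\0';	/* strip the trailing newline */
	return 0;
}
```

There is no separate existence check before the open, so there is no window in which the file can disappear between the check and the read.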

Fixes: 85d86c8aa52e ("selftests/powerpc: Add basic EEH selftest")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200727010127.23698-1-oohall@gmail.com

Sandipan Das [Mon, 27 Jul 2020 04:00:40 +0000 (09:30 +0530)]

selftests/powerpc: Add test for pkey siginfo verification

Commit c46241a370a61 ("powerpc/pkeys: Check vma before
returning key fault error to the user") fixes a bug which
causes the kernel to set the wrong pkey in siginfo when a
pkey fault occurs after two competing threads that have
allocated different pkeys, one fully permissive and the
other restrictive, attempt to protect a common page at the
same time. This adds a test to detect the bug.

Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ce40b6ee270bda52e8f4088578ed2faf7d1d509a.1595821792.git.sandipan@linux.ibm.com

Sandipan Das [Mon, 27 Jul 2020 04:00:39 +0000 (09:30 +0530)]

selftests/powerpc: Add wrapper for gettid

The gettid() syscall wrapper was first introduced in
glibc 2.30. This adds a wrapper for use in distros
running older versions.
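
Such a wrapper typically issues the raw system call. A minimal sketch, assuming Linux; the name sys_gettid() is illustrative and was chosen so that it cannot clash with a gettid() declared by a newer glibc.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback for C libraries without a gettid() wrapper. */
pid_t sys_gettid(void)
{
	long tid = syscall(SYS_gettid);

	assert(tid > 0);	/* a thread ID is always positive */
	return (pid_t)tid;
}
```

In the main thread of a process the value equals getpid(); in other threads it differs, which is why tests that track threads need it.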

Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8ca3b0eeda989707815d1cf337cc33f090408965.1595821792.git.sandipan@linux.ibm.com

Sandipan Das [Mon, 27 Jul 2020 04:00:38 +0000 (09:30 +0530)]

selftests/powerpc: Add helper to exit on failure

This adds a helper similar to FAIL_IF() which lets a
program exit with code 1 (to indicate failure) when
the given condition is true.
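
A helper of this kind might look like the sketch below. The macro name EXIT_ON_FAILURE() and the message format are assumptions made for the example; exit_status_for() exists only to demonstrate the behaviour by running the check in a child process.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Exit the whole program with status 1 when cond is true. */
#define EXIT_ON_FAILURE(cond)						\
	do {								\
		if (cond) {						\
			fprintf(stderr, "[FAIL] line %d\n", __LINE__);	\
			_exit(1);					\
		}							\
	} while (0)

/* Run the check in a child so the caller can observe the exit status. */
int exit_status_for(int cond)
{
	int status = 0;
	pid_t pid = fork();

	assert(pid >= 0);
	if (pid == 0) {
		EXIT_ON_FAILURE(cond);
		_exit(0);
	}
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

Exiting instead of returning is useful in forked children and helper threads, where a return value would never reach the test harness.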

Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/dac282d5c2e96e7816dc522e4e20d56d7c79c898.1595821792.git.sandipan@linux.ibm.com

Sandipan Das [Mon, 27 Jul 2020 04:00:37 +0000 (09:30 +0530)]

selftests/powerpc: Harden test for execute-disabled pkeys

Commit 192b6a7805989 ("powerpc/book3s64/pkeys: Fix
pkey_access_permitted() for execute disable pkey") fixed a
bug that caused repetitive faults for pkeys with no execute
rights alongside some combination of read and write rights.

This removes the last two cases of the test, which check
the behaviour of pkeys with read, write but no execute
rights and all the rights, in favour of checking all the
possible combinations of read, write and execute rights
to be able to detect bugs like the one mentioned above.

Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/db467500f8af47727bba6b35796e8974a78b71e5.1595821792.git.sandipan@linux.ibm.com

Sandipan Das [Mon, 27 Jul 2020 04:00:36 +0000 (09:30 +0530)]

selftests/powerpc: Add pkey helpers for rights

This adds some new pkey-related helpers to print
the access rights of a pkey in the "rwx" format and
to generate different valid combinations of pkey
rights starting from a given combination.
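
Helpers along these lines can be sketched as follows. The function names, the DEMO_ constants (assumed here to mirror powerpc's disable-access, disable-write and disable-execute bits) and the simple wrap-around enumeration are all illustrative, not the selftest's actual code.

```c
#include <assert.h>
#include <string.h>

/* Rights-disable bits, assumed values for the demo. */
#define DEMO_PKEY_DISABLE_ACCESS	0x1
#define DEMO_PKEY_DISABLE_WRITE		0x2
#define DEMO_PKEY_DISABLE_EXECUTE	0x4

/* Render a rights mask as "rwx", with '-' for each right that is denied. */
void pkey_rights_str(unsigned long rights, char out[4])
{
	unsigned long no_write = DEMO_PKEY_DISABLE_ACCESS |
				 DEMO_PKEY_DISABLE_WRITE;

	assert(out != NULL);
	out[0] = (rights & DEMO_PKEY_DISABLE_ACCESS) ? '-' : 'r';
	out[1] = (rights & no_write) ? '-' : 'w';
	out[2] = (rights & DEMO_PKEY_DISABLE_EXECUTE) ? '-' : 'x';
	out[3] = '\0';
}

/*
 * Step to the next combination of the three disable bits, wrapping
 * after the last one, so a loop can visit every combination once.
 */
unsigned long next_pkey_rights(unsigned long rights)
{
	return (rights + 1) & 0x7;
}
```

Starting from any mask and calling next_pkey_rights() eight times returns to the starting point, having passed through every combination of the three bits.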

Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6cc1c7d1f686618668a3e090f1d0c2a4cd9dea3f.1595821792.git.sandipan@linux.ibm.com

Sandipan Das [Mon, 27 Jul 2020 04:00:35 +0000 (09:30 +0530)]

selftests/powerpc: Move pkey helpers to headers

This moves all the pkey-related helpers to a new header
file and also a helper to print error messages in signal
handlers to the existing utils header file.

Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/28e633fa9ec1a6500c12188e09ea1887b10a10c1.1595821792.git.sandipan@linux.ibm.com

Nicholas Piggin [Sun, 26 Jul 2020 03:51:55 +0000 (13:51 +1000)]

powerpc/pseries: Add KVM guest doorbell restrictions

KVM guests have certain restrictions and performance quirks when using
doorbells. This patch moves the EPAPR KVM guest test so it can be shared
with PSERIES, uses that in the doorbell setup code to apply the KVM
guest quirks, and improves IPI performance in two cases:

- PowerVM guests may now use doorbells even if they are secure.

- KVM guests no longer use doorbells if XIVE is available.

There is a valid complaint that "KVM guest" is not a very reasonable
thing to test for, it's preferable for the hypervisor to advertise
particular behaviours to the guest so they could change if the
hypervisor implementation or configuration changes. However in this case
we were already assuming a KVM guest worst case, so this patch is about
containing those quirks. If KVM later advertises fast doorbells, we
should test for that and override the quirks.
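
The resulting policy can be summarised as a small decision function. This is a reading of the description above, not the kernel code: struct ipi_env, its fields and use_doorbells() are invented for the sketch, and the treatment of secure guests under KVM is an assumption inferred from the PowerVM bullet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Inputs the policy depends on; all field names are illustrative. */
struct ipi_env {
	bool has_doorbells;	/* CPU offers a doorbell facility */
	bool kvm_guest;		/* running under KVM rather than PowerVM */
	bool xive;		/* XIVE interrupt controller available */
	bool secure_guest;	/* guest runs as a secure guest */
};

/* Should IPIs use doorbells in this environment? */
bool use_doorbells(const struct ipi_env *env)
{
	assert(env != NULL);
	if (!env->has_doorbells)
		return false;
	if (env->kvm_guest) {
		/* Doorbells have quirks under KVM, so prefer XIVE if present. */
		if (env->xive)
			return false;
		/* Assumed: secure guests under KVM keep avoiding doorbells. */
		if (env->secure_guest)
			return false;
	}
	/* PowerVM guests use doorbells, secure or not. */
	return true;
}
```

Keeping the KVM-specific checks in one branch is what "containing those quirks" amounts to: everything outside that branch takes the fast doorbell path.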

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726035155.1424103-4-npiggin@gmail.com

Nicholas Piggin [Sun, 26 Jul 2020 03:51:54 +0000 (13:51 +1000)]

powerpc/pseries: Use doorbells even if XIVE is available

KVM supports msgsndp in guests by trapping and emulating the
instruction, so it was decided to always use XIVE for IPIs if it is
available. However on PowerVM systems, msgsndp can be used and gives
better performance. On large systems, high XIVE interrupt rates can
have sub-linear scaling, and using msgsndp can reduce the load on
the interrupt controller.

So switch to using core local doorbells even if XIVE is available.
This reduces performance for KVM guests with an SMT topology by
about 50% for ping-pong context switching between SMT vCPUs. An
option vector (or dt-cpu-ftrs) could be defined to disable msgsndp
to get KVM performance back.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726035155.1424103-3-npiggin@gmail.com

commit | commitdiff | tree

Nicholas Piggin [Sun, 26 Jul 2020 03:51:53 +0000 (13:51 +1000)]

powerpc: Inline doorbell sending functions

These are only called in one place for a given platform, so inline
them for performance.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
[mpe: Fix build errors related to KVM]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726035155.1424103-2-npiggin@gmail.com

commit | commitdiff | tree

Athira Rajeev [Wed, 29 Jul 2020 04:16:54 +0000 (00:16 -0400)]

powerpc/perf: Fix MMCRA_BHRB_DISABLE define for binutils < 2.28

Commit 9908c826d5ed ("powerpc/perf: Add Power10 PMU feature to DT CPU
features") defines MMCRA_BHRB_DISABLE as `0x2000000000UL`. Binutils
versions older than 2.28 don't support the UL suffix.

  arch/powerpc/kernel/cpu_setup_power.S: Assembler messages:
  arch/powerpc/kernel/cpu_setup_power.S:250: Error: found 'L', expected: ')'
  arch/powerpc/kernel/cpu_setup_power.S:250: Error: junk at end of line, first unrecognized character is `L'
  arch/powerpc/kernel/cpu_setup_power.S:250: Error: found 'L', expected: ')'
  arch/powerpc/kernel/cpu_setup_power.S:250: Error: found 'L', expected: ')'
  arch/powerpc/kernel/cpu_setup_power.S:250: Error: junk at end of line, first unrecognized character is `L'
  arch/powerpc/kernel/cpu_setup_power.S:250: Error: found 'L', expected: ')'
  arch/powerpc/kernel/cpu_setup_power.S:250: Error: found 'L', expected: ')'
  arch/powerpc/kernel/cpu_setup_power.S:250: Error: operand out of range (0x0000002000000000 is not between 0xffffffffffff8000 and 0x000000000000ffff)

Fix this by wrapping it with the `_UL` macro.
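
As an illustration of why the macro helps: the kernel's `_UL()` helper
(include/uapi/linux/const.h) pastes the suffix on only when the header is
compiled as C, and leaves the bare number when it is preprocessed for the
assembler. The sketch below reproduces that pattern in a standalone C file.
It only exercises the C branch, and the `static_assert` is added here purely
for illustration.

```c
#include <assert.h>

/* Pattern reproduced from include/uapi/linux/const.h. */
#ifdef __ASSEMBLY__
#define _AC(X, Y)	X		/* assembler: bare number, no suffix */
#else
#define __AC(X, Y)	(X##Y)		/* C: paste the suffix on */
#define _AC(X, Y)	__AC(X, Y)
#endif
#define _UL(x)		(_AC(x, UL))

/* The value from the commit message, written without a literal suffix. */
#define MMCRA_BHRB_DISABLE	_UL(0x2000000000)

/* Illustration only: 0x2000000000 is a single bit, 1 << 37. */
static_assert(MMCRA_BHRB_DISABLE == (1ULL << 37), "unexpected value");
```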

Fixes: 9908c826d5ed ("Add Power10 PMU feature to DT CPU features")
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1595996214-5833-1-git-send-email-atrajeev@linux.vnet.ibm.com

commit | commitdiff | tree

Michael Ellerman [Mon, 27 Jul 2020 03:44:36 +0000 (13:44 +1000)]

powerpc/fadump: Fix build error with CONFIG_PRESERVE_FA_DUMP=y

skiroot_defconfig fails:

arch/powerpc/kernel/fadump.c:48:17: error: ‘cpus_in_fadump’ defined but not used
48 | static atomic_t cpus_in_fadump;

Fix it by moving the definition into the #ifdef where it's used.

Fixes: ba608c4fa12c ("powerpc/fadump: fix race between pstore write and fadump crash trigger")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200727070341.595634-1-mpe@ellerman.id.au

commit | commitdiff | tree

Randy Dunlap [Sun, 26 Jul 2020 00:38:09 +0000 (17:38 -0700)]

powerpc/powernv/pci.h: delete duplicated word

Drop the repeated word "for".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726003809.20454-10-rdunlap@infradead.org

commit | commitdiff | tree

Randy Dunlap [Sun, 26 Jul 2020 00:38:08 +0000 (17:38 -0700)]

powerpc/smu.h: delete duplicated word

Drop the repeated word "the".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726003809.20454-9-rdunlap@infradead.org

commit | commitdiff | tree

Randy Dunlap [Sun, 26 Jul 2020 00:38:07 +0000 (17:38 -0700)]

powerpc/reg.h: delete duplicated word

Drop the repeated word "a".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726003809.20454-8-rdunlap@infradead.org

commit | commitdiff | tree

Randy Dunlap [Sun, 26 Jul 2020 00:38:06 +0000 (17:38 -0700)]

powerpc/ppc_asm.h: delete duplicated word

Drop the repeated word "in".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726003809.20454-7-rdunlap@infradead.org

commit | commitdiff | tree

Randy Dunlap [Sun, 26 Jul 2020 00:38:05 +0000 (17:38 -0700)]

powerpc/hw_breakpoint.h: delete duplicated word

Drop the repeated word "the".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726003809.20454-6-rdunlap@infradead.org

commit | commitdiff | tree

Randy Dunlap [Sun, 26 Jul 2020 00:38:04 +0000 (17:38 -0700)]

powerpc/epapr_hcalls.h: delete duplicated words

Drop the repeated words "file" and "the".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726003809.20454-5-rdunlap@infradead.org

commit | commitdiff | tree

Randy Dunlap [Sun, 26 Jul 2020 00:38:03 +0000 (17:38 -0700)]

powerpc/cputime.h: delete duplicated word

Drop the repeated word "use".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726003809.20454-4-rdunlap@infradead.org

commit | commitdiff | tree

Randy Dunlap [Sun, 26 Jul 2020 00:38:02 +0000 (17:38 -0700)]

powerpc/book3s/radix-4k.h: delete duplicated word

Drop the repeated word "per".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726003809.20454-3-rdunlap@infradead.org

commit | commitdiff | tree

Randy Dunlap [Sun, 26 Jul 2020 00:38:01 +0000 (17:38 -0700)]

powerpc/book3s/mmu-hash.h: delete duplicated word

Drop the repeated word "below".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726003809.20454-2-rdunlap@infradead.org

commit | commitdiff | tree

Li RongQing [Fri, 26 Apr 2019 11:36:30 +0000 (19:36 +0800)]

powerpc/lib: remove memcpy_flushcache redundant return

Align it with other architectures; none of the callers is
interested in its return value.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1556278590-14727-1-git-send-email-lirongqing@baidu.com

commit | commitdiff | tree

Christophe Leroy [Mon, 29 Jun 2020 11:17:19 +0000 (11:17 +0000)]

powerpc/ptdump: Refactor update of pg_state

In note_page(), the pg_state is updated the same way in two places.

Add note_page_update_state() to do it.

Also include the display of boundary markers there, as it is missing
from the "no level" leg, leading to a mismatch when the first two
markers are at the same address and the first displayed area uses that
address.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a284a809f01c705bbaab303b06fda216f147a99a.1593429426.git.christophe.leroy@csgroup.eu

commit | commitdiff | tree

Christophe Leroy [Mon, 29 Jun 2020 11:17:18 +0000 (11:17 +0000)]

powerpc/ptdump: Refactor update of st->last_pa

st->last_pa is always updated in note_page() so it can
be done outside the if/elseif/else block.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/610d6b1a60ad0bedef865a90153c1110cfaa507e.1593429426.git.christophe.leroy@csgroup.eu

commit | commitdiff | tree

Christophe Leroy [Mon, 29 Jun 2020 11:15:26 +0000 (11:15 +0000)]

powerpc/32s: Use dedicated segment for modules with STRICT_KERNEL_RWX

When STRICT_KERNEL_RWX is set, we want to set NX bit on vmalloc
segments. But modules require exec.

Use a dedicated segment for modules. There is not much space
above the kernel, and we don't want to waste vmalloc space on alignment.
Therefore, we take the segment before PAGE_OFFSET for modules.
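
A minimal sketch of the address arithmetic, assuming 256 MiB segments (the
segment size on 32-bit Book3S) and an assumed PAGE_OFFSET of 0xc0000000.
The macro names mirror the kernel's, but the values and the segment_of()
helper are illustrative and not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_256M		0x10000000u	/* one segment on 32-bit Book3S */
#define PAGE_OFFSET	0xc0000000u	/* assumed value, for illustration */

/* Round x down to a multiple of a (a must be a power of two). */
#define ALIGN_DOWN(x, a)	((x) & ~((a) - 1u))

/* The segment immediately below PAGE_OFFSET is set aside for modules. */
#define MODULES_END	ALIGN_DOWN(PAGE_OFFSET, SZ_256M)
#define MODULES_VADDR	(MODULES_END - SZ_256M)

static_assert((MODULES_END - MODULES_VADDR) == SZ_256M,
	      "modules get exactly one segment in this sketch");

/* Hypothetical helper: segment index (top four address bits) of an address. */
static inline unsigned int segment_of(uint32_t addr)
{
	return addr >> 28;
}
```

With these assumed values the module area is 0xb0000000..0xbfffffff, all
inside segment 11, so NX can stay unset on that one segment only.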

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/eb8faba9148b6cf17c696ba776b4e8ee2f6313bf.1593428200.git.christophe.leroy@csgroup.eu

commit | commitdiff | tree

Christophe Leroy [Mon, 29 Jun 2020 11:15:24 +0000 (11:15 +0000)]

powerpc/32s: Kernel space starts at TASK_SIZE

Kernel space starts at TASK_SIZE. Select kernel page table
when address is over TASK_SIZE.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/893425e32cd0a003539573b2d115e0ffa98bc26c.1593428200.git.christophe.leroy@csgroup.eu

commit | commitdiff | tree

Christophe Leroy [Mon, 29 Jun 2020 11:15:23 +0000 (11:15 +0000)]

powerpc/32: Set user/kernel boundary at TASK_SIZE instead of PAGE_OFFSET

User space stops at TASK_SIZE. At the moment, kernel space starts
at PAGE_OFFSET.

In order to use space between TASK_SIZE and PAGE_OFFSET for modules,
make TASK_SIZE the limit between user and kernel space.

Note that fault.c already considers TASK_SIZE as the boundary between
user and kernel space.
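
The effect of moving the boundary can be sketched as below. The TASK_SIZE
and PAGE_OFFSET values are assumptions chosen so that a gap exists between
them, and the two helper names are hypothetical, used only to contrast the
old and new checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TASK_SIZE	0xb0000000u	/* assumed: end of user space */
#define PAGE_OFFSET	0xc0000000u	/* assumed: start of the linear map */

static_assert(TASK_SIZE <= PAGE_OFFSET, "user space must end below PAGE_OFFSET");

/* Old rule: only addresses from PAGE_OFFSET up count as kernel space. */
static inline bool is_kernel_addr_old(uint32_t addr)
{
	return addr >= PAGE_OFFSET;
}

/* New rule: everything from TASK_SIZE up is kernel space, including
 * the gap below PAGE_OFFSET that modules can now use. */
static inline bool is_kernel_addr_new(uint32_t addr)
{
	return addr >= TASK_SIZE;
}
```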

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b38b52cd8dabbb56fbd6f9219d6f3cdccbb43b44.1593428200.git.christophe.leroy@csgroup.eu

commit | commitdiff | tree

Christophe Leroy [Mon, 29 Jun 2020 11:15:22 +0000 (11:15 +0000)]

powerpc/32s: Only leave NX unset on segments used for modules

Instead of leaving NX unset on all segments above the start
of vmalloc space, only leave NX unset on segments used for
modules.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7172c0f5253419315e434a1816ee3d6ed6505bc0.1593428200.git.christophe.leroy@csgroup.eu

commit | commitdiff | tree

Christophe Leroy [Mon, 29 Jun 2020 11:15:21 +0000 (11:15 +0000)]

powerpc: Use MODULES_VADDR if defined

In order to allow allocation of modules outside of vmalloc space,
use MODULES_VADDR and MODULES_END when MODULES_VADDR is defined.

Redefine module_alloc() when MODULES_VADDR is defined.
Unmap corresponding KASAN shadow memory.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7ecf5fff1eef67d450e73fc412b6ec3818483d75.1593428200.git.christophe.leroy@csgroup.eu

commit | commitdiff | tree

Christophe Leroy [Mon, 29 Jun 2020 11:15:20 +0000 (11:15 +0000)]

powerpc/lib: Prepare code-patching for modules allocated outside vmalloc space

Use is_vmalloc_or_module_addr() instead of is_vmalloc_addr()
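
A rough userspace sketch of why the wider check is needed once modules can
live outside vmalloc space. The address ranges are made up for illustration
and the helper names are hypothetical stand-ins, not the kernel functions
themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative layout only; real values depend on the platform. */
#define MODULES_VADDR	0xb0000000u
#define MODULES_END	0xc0000000u
#define VMALLOC_START	0xf1000000u
#define VMALLOC_END	0xff000000u

static_assert(MODULES_END <= VMALLOC_START,
	      "sketch assumes the module area sits below vmalloc space");

/* Narrow check: misses module text placed in the dedicated area. */
static bool in_vmalloc(uint32_t addr)
{
	return addr >= VMALLOC_START && addr < VMALLOC_END;
}

/* Wider check: also accepts the module area when one is defined. */
static bool in_vmalloc_or_module(uint32_t addr)
{
#ifdef MODULES_VADDR
	if (addr >= MODULES_VADDR && addr < MODULES_END)
		return true;
#endif
	return in_vmalloc(addr);
}
```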

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7d884db0e5a6f521331639d8c0f13e520d5a4fef.1593428200.git.christophe.leroy@csgroup.eu

commit | commitdiff | tree

Wei Yongjun [Sat, 25 Jul 2020 09:19:49 +0000 (17:19 +0800)]

powerpc/papr_scm: Make some symbols static

The sparse tool complains as follows:

arch/powerpc/platforms/pseries/papr_scm.c:97:1: warning:
symbol 'papr_nd_regions' was not declared. Should it be static?
arch/powerpc/platforms/pseries/papr_scm.c:98:1: warning:
symbol 'papr_ndr_lock' was not declared. Should it be static?

Those variables are not used outside of papr_scm.c, so this
commit marks them static.

Fixes: 85343a8da2d9 ("powerpc/papr/scm: Add bad memory ranges to nvdimm bad ranges")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200725091949.75234-1-weiyongjun1@huawei.com

commit | commitdiff | tree

Bill Wendling [Fri, 24 Jul 2020 22:49:01 +0000 (15:49 -0700)]

powerpc/64s: allow for clang's objdump differences

Clang's objdump emits slightly different output from GNU's objdump,
causing a list of warnings to be emitted during relocatable builds.
E.g., clang's objdump emits this:

   c000000000000004: 2c 00 00 48  b  0xc000000000000030
   ...
   c000000000005c6c: 10 00 82 40  bf 2, 0xc000000000005c7c

while GNU objdump emits:

   c000000000000004: 2c 00 00 48  b    c000000000000030 <__start+0x30>
   ...
   c000000000005c6c: 10 00 82 40  bne  c000000000005c7c <masked_interrupt+0x3c>

Adjust llvm-objdump's output to remove the extraneous '0x' and convert
'bf' and 'bt' to 'bne' and 'beq' resp. to more closely match GNU
objdump's output.

Note that clang's objdump doesn't yet output the relocation symbols on
PPC.
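
The two adjustments can be sketched in C as below. This is only an
illustration of the text transformation described above, with hypothetical
helper names; it is not the code the patch uses. The bf/bt mapping is the
one stated in this message and matches the condition-bit-2 example shown.

```c
#include <assert.h>
#include <string.h>

/* Map llvm-objdump's bf/bt spellings to the GNU mnemonics named above. */
static const char *normalize_mnemonic(const char *m)
{
	assert(m != NULL);
	if (strcmp(m, "bf") == 0)
		return "bne";
	if (strcmp(m, "bt") == 0)
		return "beq";
	return m;	/* anything else is left untouched */
}

/* Skip a leading "0x" so a branch target reads like GNU objdump's. */
static const char *strip_hex_prefix(const char *target)
{
	assert(target != NULL);
	if (strncmp(target, "0x", 2) == 0)
		return target + 2;
	return target;
}
```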

Signed-off-by: Bill Wendling <morbo@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/191c67db31264b69cf6b566fd69851beb3dd0abb.1595630874.git.morbo@google.com

commit | commitdiff | tree

Nicholas Piggin [Fri, 24 Jul 2020 13:14:23 +0000 (23:14 +1000)]

powerpc: Implement smp_cond_load_relaxed()

This implements smp_cond_load_relaxed() with the slowpath busy loop
using the preferred SMT priority pattern.
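
The general shape of such a loop, as a userspace sketch with C11 atomics.
The spin_begin()/spin_end() bodies are empty placeholders here (on powerpc
they would lower and restore SMT priority), and the function name and the
fixed "non-zero" condition are assumptions made to keep the example small.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Placeholders: no-ops in this sketch. */
static inline void spin_begin(void) { }
static inline void spin_end(void) { }

/* Spin with relaxed loads until *ptr is non-zero; return the value seen. */
static inline int cond_load_relaxed_nonzero(atomic_int *ptr)
{
	int val;

	assert(ptr != NULL);
	val = atomic_load_explicit(ptr, memory_order_relaxed);
	if (val == 0) {
		/* Slow path: entered only when the first load fails. */
		spin_begin();
		do {
			val = atomic_load_explicit(ptr, memory_order_relaxed);
		} while (val == 0);
		spin_end();
	}
	return val;
}
```

Keeping the priority changes out of the first load means the common case,
where the condition already holds, pays for a single load only.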

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Waiman Long <longman@redhat.com>
[mpe: Make it 64-bit only to fix build errors on 32-bit]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131423.1362108-7-npiggin@gmail.com

commit | commitdiff | tree

Nicholas Piggin [Fri, 24 Jul 2020 13:14:22 +0000 (23:14 +1000)]

powerpc/qspinlock: Optimised atomic_try_cmpxchg_lock() that adds the lock hint

This brings the behaviour of the uncontended fast path back to roughly
equivalent to simple spinlocks -- a single atomic op with lock hint.
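
A portable sketch of what "a single atomic op" means for the uncontended
path. The lock hint itself is a powerpc instruction detail that C11 atomics
cannot express, so it is not shown; the names and the LOCKED_VAL encoding
are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define LOCKED_VAL	1	/* hypothetical encoding of "held, no waiters" */

/* Uncontended fast path: one acquire compare-and-swap from 0 to LOCKED_VAL. */
static inline bool lock_fast_path(atomic_int *lock)
{
	int expected = 0;

	assert(lock != NULL);
	return atomic_compare_exchange_strong_explicit(lock, &expected,
						       LOCKED_VAL,
						       memory_order_acquire,
						       memory_order_relaxed);
}

/* Release the lock with a single release store. */
static inline void lock_release(atomic_int *lock)
{
	assert(lock != NULL);
	atomic_store_explicit(lock, 0, memory_order_release);
}
```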

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131423.1362108-6-npiggin@gmail.com

commit | commitdiff | tree

Nicholas Piggin [Fri, 24 Jul 2020 13:14:21 +0000 (23:14 +1000)]

powerpc/pseries: Implement paravirt qspinlocks for SPLPAR

This implements the generic paravirt qspinlocks using H_PROD and
H_CONFER to kick and wait.

This uses an un-directed yield to any CPU rather than the directed
yield to a pre-empted lock holder that paravirtualised simple
spinlocks use, which requires no kick hcall. This is something that
could be investigated and improved in future.

Performance results can be found in the commit which added queued
spinlocks.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131423.1362108-5-npiggin@gmail.com

commit | commitdiff | tree

Nicholas Piggin [Fri, 24 Jul 2020 13:14:20 +0000 (23:14 +1000)]

powerpc/64s: Implement queued spinlocks and rwlocks

These have shown significantly improved performance and fairness when
spinlock contention is moderate to high on very large systems.

With this series including subsequent patches, on a 16 socket 1536
thread POWER9, a stress test such as same-file open/close from all
CPUs gets big speedups, 11620op/s aggregate with simple spinlocks vs
384158op/s (33x faster), where the difference in throughput between
the fastest and slowest thread goes from 7x to 1.4x.

Thanks to the fast path being identical in terms of atomics and
barriers (after a subsequent optimisation patch), single threaded
performance is not changed (no measurable difference).

On smaller systems, performance and fairness seem to be generally
improved. Using dbench on tmpfs as a test (that starts to run into
kernel spinlock contention), a 2-socket OpenPOWER POWER9 system was
tested with bare metal and KVM guest configurations. Results can be
found here:

https://github.com/linuxppc/issues/issues/305#issuecomment-663487453

Observations are:

- Queued spinlocks are equal when contention is insignificant, as
  expected and as measured with microbenchmarks.

- When there is contention, on bare metal queued spinlocks have better
  throughput and max latency at all points.

- When virtualised, queued spinlocks are slightly worse approaching
  peak throughput, but have significantly better throughput and max
  latency at all points beyond peak, until queued spinlock maximum
  latency rises when clients are 2x vCPUs.

The regressions haven't been analysed very well yet; there are a lot
of things that can be tuned, particularly the paravirtualised locking,
but the numbers already look like a good net win even on relatively
small systems.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131423.1362108-4-npiggin@gmail.com

commit | commitdiff | tree

Nicholas Piggin [Fri, 24 Jul 2020 13:14:19 +0000 (23:14 +1000)]

powerpc: Move spinlock implementation to simple_spinlock

Prepare for queued spinlocks. This is a simple rename, apart from
updating the preprocessor guard name and a file reference.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131423.1362108-3-npiggin@gmail.com

commit | commitdiff | tree

Nicholas Piggin [Fri, 24 Jul 2020 13:14:18 +0000 (23:14 +1000)]

powerpc/pseries: Move some PAPR paravirt functions to their own file

These functions will be used by the queued spinlock implementation,
and may be useful elsewhere too, so move them out of spinlock.h.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131423.1362108-2-npiggin@gmail.com

commit | commitdiff | tree

Srikar Dronamraju [Fri, 24 Jul 2020 10:58:09 +0000 (16:28 +0530)]

powerpc/numa: Limit possible nodes to within num_possible_nodes

MAX_NUMNODES is the theoretical maximum number of nodes that is
supported by the kernel. Device tree properties expose the number of
possible nodes on the current platform. The kernel detects this
and uses it for most of its resource allocations. If the platform
later increases the number of nodes beyond what was already exposed,
it may lead to inconsistencies. Hence limit it to the already exposed nodes.
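
The idea can be sketched as a bounds check against the number of nodes
advertised at boot rather than against the compile-time maximum. The helper
name and its parameters are hypothetical; only the NUMA_NO_NODE value of -1
follows the kernel's convention.

```c
#include <assert.h>

#define NUMA_NO_NODE	(-1)

/* Hypothetical helper: reject node ids beyond what was advertised at boot. */
static int sanitize_nid(int nid, int nr_possible_nodes)
{
	assert(nr_possible_nodes > 0);
	if (nid < 0 || nid >= nr_possible_nodes)
		return NUMA_NO_NODE;
	return nid;
}
```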

Suggested-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724105809.24733-1-srikar@linux.vnet.ibm.com

commit | commitdiff | tree

Finn Thain [Sun, 28 Jun 2020 04:23:12 +0000 (14:23 +1000)]

macintosh/via-macii: Clarify definition of macii_init()

The function prototype correctly specifies the 'static' storage class.
Let the function definition match the declaration for better readability.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c025aed0b1506399b73ff1d1bfa40ed641fcb3e3.1593318192.git.fthain@telegraphics.com.au

commit | commitdiff | tree

Finn Thain [Sun, 28 Jun 2020 04:23:12 +0000 (14:23 +1000)]

macintosh/via-macii: Use the stack for reset request storage

The adb_request struct can be stored on the stack because the request
is synchronous and is completed before the function returns.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a40f80dde90991757007b6962c386a208c970586.1593318192.git.fthain@telegraphics.com.au

commit | commitdiff | tree

Finn Thain [Sun, 28 Jun 2020 04:23:12 +0000 (14:23 +1000)]

macintosh/via-macii: Use unsigned type for autopoll_devs variable

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ca5be30ba745c08c2b7a1f0618f99c61b303e983.1593318192.git.fthain@telegraphics.com.au

commit | commitdiff | tree

Finn Thain [Sun, 28 Jun 2020 04:23:12 +0000 (14:23 +1000)]

macintosh/via-macii: Use bool type for reading_reply variable

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/779551219a11b19e574dfcd87e4ef60af08c4fc3.1593318192.git.fthain@telegraphics.com.au

commit | commitdiff | tree

Finn Thain [Sun, 28 Jun 2020 04:23:12 +0000 (14:23 +1000)]

macintosh/via-macii: Handle poll replies correctly

Userspace applications may use /dev/adb to send Talk requests. Such
requests always have req->reply_expected == 1. The same is true of Talk
requests sent by the kernel, except for poll requests queued internally
by the via-macii driver. Those requests have req->reply_expected == 0.

Consequently, poll reply packets get treated like autopoll reply packets.
(It doesn't make sense to try to distinguish them.) Always enter 'reading'
state after a poll request, so that the reply gets collected and passed
to adb_input(), and none go missing.

All Talk replies passed to adb_input() come from polling or autopolling,
so call adb_input() with the autopoll parameter set to 1.

Fixes: d95fd5fce88f0 ("m68k: Mac II ADB fixes") # v5.0+
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/754cddfa045e5bfa53e5da199831de02e7d2f27f.1593318192.git.fthain@telegraphics.com.au

commit | commitdiff | tree

Finn Thain [Sun, 28 Jun 2020 04:23:12 +0000 (14:23 +1000)]

macintosh/via-macii: Remove read_done state

The driver state machine may enter the 'read_done' state when leaving the
'idle' or 'reading' state. This transition is pointless, as is the extra
interrupt it requires. The interrupt is produced by the transceiver
(even when it has no data to send) because an extra EVEN/ODD toggle
was signalled by the driver. Drop the extra state to simplify the code.

Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2") # v5.0+
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0253194363af4426f9788796811a6a29fb87c713.1593318192.git.fthain@telegraphics.com.au

commit | commitdiff | tree

Finn Thain [Sun, 28 Jun 2020 04:23:12 +0000 (14:23 +1000)]

macintosh/via-macii: Handle /CTLR_IRQ signal correctly

I'm told that the /CTLR_IRQ signal from the ADB transceiver gets
interpreted by MacOS to mean SRQ, bus timeout or end-of-packet depending
on the circumstances, and that Linux's via-macii driver does not
correctly interpret this signal.

Instead, the via-macii driver interprets certain received byte values
(0x00 and 0xFF) as signalling end of packet and bus timeout
(respectively). Problem is, those values can also appear under other
circumstances.

This patch changes the bus timeout, end of packet and SRQ detection logic
to bring it closer to the logic that MacOS reportedly uses.

Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2") # v5.0+
Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6541fda1d8db3ae87c3abe17d189a10dc96e2382.1593318192.git.fthain@telegraphics.com.au

commit | commitdiff | tree

Finn Thain [Sun, 28 Jun 2020 04:23:12 +0000 (14:23 +1000)]

macintosh/via-macii: Poll the device most likely to respond

Poll the most recently polled device by default, rather than the lowest
device address that happens to be enabled in autopoll_devs. This improves
input latency. Re-use macii_queue_poll() rather than duplicate that logic.
This eliminates a static struct and function.

Fixes: d95fd5fce88f0 ("m68k: Mac II ADB fixes") # v5.0+
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5836f80886ebcfbe5be5fb7e0dc49feed6469712.1593318192.git.fthain@telegraphics.com.au

commit | commitdiff | tree

Finn Thain [Sun, 28 Jun 2020 04:23:12 +0000 (14:23 +1000)]

macintosh/via-macii: Access autopoll_devs when inside lock

The interrupt handler should be excluded when accessing the autopoll_devs
variable.

Fixes: d95fd5fce88f0 ("m68k: Mac II ADB fixes") # v5.0+
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5952dd8a9bc9de90f1acc4790c51dd42b4c98065.1593318192.git.fthain@telegraphics.com.au

commit | commitdiff | tree

Finn Thain [Sat, 30 May 2020 23:17:03 +0000 (09:17 +1000)]

macintosh/adb-iop: Implement SRQ autopolling

The adb_driver.autopoll method is needed during ADB bus scan and device
address assignment. Implement this method so that the IOP's list of
device addresses can be updated. When the list is empty, disable SRQ
autopolling.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0fb7fdcd99d7820bb27faf1f27f7f6f1923914ef.1590880623.git.fthain@telegraphics.com.au

commit | commitdiff | tree

Finn Thain [Sat, 30 May 2020 23:17:03 +0000 (09:17 +1000)]

macintosh/adb-iop: Implement sending -> idle state transition

On leaving the 'sending' state, proceed to the 'idle' state if no reply is
expected. Drop redundant test for adb_iop_state == sending && current_req.
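
The transition described above, as a small sketch. The state names follow
the ones used in this series, but the request struct and the after_send()
helper are hypothetical stand-ins rather than the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum adb_iop_state { idle, sending, awaiting_reply };

/* Hypothetical stand-in for the relevant field of struct adb_request. */
struct request_sketch {
	bool reply_expected;
};

/* State to enter once the outgoing message has been sent. */
static enum adb_iop_state after_send(const struct request_sketch *req)
{
	assert(req != NULL);
	return req->reply_expected ? awaiting_reply : idle;
}
```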

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6991996dd4aaf0b52cfd650172bf0f6fbe37a452.1590880623.git.fthain@telegraphics.com.au

commit | commitdiff | tree

Finn Thain [Sat, 30 May 2020 23:17:03 +0000 (09:17 +1000)]

macintosh/adb-iop: Implement idle -> sending state transition

In the present algorithm, the 'idle' state transition does not take
place until there's a bus timeout. Once idle, the driver does not
automatically proceed with the next request.

Change the algorithm so that queued ADB requests will be sent as soon as
the driver becomes idle. This is to take place after the current IOP
message is completed.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/dedcdfc62f43e85cc4c2a8d211a7e2fec7bc7c1a.1590880623.git.fthain@telegraphics.com.au

commit | commitdiff | tree

Finn Thain [Sat, 30 May 2020 23:17:03 +0000 (09:17 +1000)]

macintosh/adb-iop: Resolve static checker warnings

drivers/macintosh/adb-iop.c:215:28: warning: Using plain integer as NULL pointer
drivers/macintosh/adb-iop.c:170:5: warning: symbol 'adb_iop_probe' was not declared. Should it be static?
drivers/macintosh/adb-iop.c:177:5: warning: symbol 'adb_iop_init' was not declared. Should it be static?
drivers/macintosh/adb-iop.c:184:5: warning: symbol 'adb_iop_send_request' was not declared. Should it be static?
drivers/macintosh/adb-iop.c:230:5: warning: symbol 'adb_iop_autopoll' was not declared. Should it be static?
drivers/macintosh/adb-iop.c:236:6: warning: symbol 'adb_iop_poll' was not declared. Should it be static?
drivers/macintosh/adb-iop.c:241:5: warning: symbol 'adb_iop_reset_bus' was not declared. Should it be static?

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/25edf4450abd20e002b166ba3a11189dc1efa906.1590880623.git.fthain@telegraphics.com.au

commit | commitdiff | tree

Finn Thain [Sat, 30 May 2020 23:17:03 +0000 (09:17 +1000)]

macintosh/adb-iop: Access current_req and adb_iop_state when inside lock

Drop the redundant local_irq_save/restore() from adb_iop_start() because
the caller has to do it anyway. This is the pattern used in via-macii.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bbe32b087c7e04d68e2425f6a2df4a414d167c32.1590880623.git.fthain@telegraphics.com.au

commit | commitdiff | tree

Finn Thain [Sat, 30 May 2020 23:17:03 +0000 (09:17 +1000)]

macintosh/adb-iop: Adopt bus reset algorithm from via-macii driver

This algorithm is slightly shorter and avoids the surprising
adb_iop_start() call in adb_iop_poll().

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b63d56ecb6e75f11a0bf02231f3b2db656a528a3.1590880623.git.fthain@telegraphics.com.au

commit | commitdiff | tree

Finn Thain [Sat, 30 May 2020 23:17:03 +0000 (09:17 +1000)]

macintosh/adb-iop: Correct comment text

This patch improves comment style and corrects some misunderstandings
in the text.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/996f835d2f3d90baaaf9ee954e252d06e8886c6f.1590880623.git.fthain@telegraphics.com.au

commit | commitdiff | tree

Finn Thain [Sat, 30 May 2020 23:17:03 +0000 (09:17 +1000)]

macintosh/adb-iop: Remove dead and redundant code

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7720ffb559c334504e16b24d9c2f3b8973d2d674.1590880623.git.fthain@telegraphics.com.au

commit | commitdiff | tree

Athira Rajeev [Thu, 23 Jul 2020 07:32:37 +0000 (03:32 -0400)]

powerpc/perf: Initialize power10 PMU registers in cpu setup routine

Initialize Monitor Mode Control Register 3 (MMCR3)
SPR which is new in power10. For PowerISA v3.1, BHRB disable
is controlled via Monitor Mode Control Register A (MMCRA) bit,
namely "BHRB Recording Disable (BHRBRD)". This patch also initializes
MMCRA BHRBRD to disable BHRB feature at boot for power10.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1595489557-2047-1-git-send-email-atrajeev@linux.vnet.ibm.com

commit | commitdiff | tree

Oliver O'Halloran [Wed, 22 Jul 2020 06:57:15 +0000 (16:57 +1000)]

powerpc/powernv/sriov: Remove vfs_expanded

Previously iov->vfs_expanded was used for two purposes.

1) To work out by how much we need to multiply the per-VF BAR size to
figure out the total space required for the IOV BAR.

2) To indicate that IOV is not usable with this device (vfs_expanded == 0).

We don't really need the field for either since the multiple in 1) is
always the number of PEs supported by the PHB. Similarly, we don't really need
it in 2) either since the IOV data field will be NULL if we can't use IOV
with the device.
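
As a rough illustration of point 1), the total IOV BAR space can be computed from the PHB's PE count instead of a cached field. This is only a sketch: `iov_bar_total_size()` and its parameter names are hypothetical and do not appear in the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): the space needed for an
 * IOV BAR is the per-VF BAR size multiplied by the number of PEs the
 * PHB supports, so no separate "vfs_expanded" value has to be stored.
 */
static uint64_t iov_bar_total_size(uint64_t per_vf_bar_size,
				   unsigned int phb_total_pes)
{
	return per_vf_bar_size * phb_total_pes;
}
```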

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-16-oohall@gmail.com

commit | commitdiff | tree

Oliver O'Halloran [Wed, 22 Jul 2020 06:57:14 +0000 (16:57 +1000)]

powerpc/powernv/sriov: Make single PE mode a per-BAR setting

Using single PE BARs to map an SR-IOV BAR is really a choice about what
strategy to use when mapping a BAR. It doesn't make much sense for this to
be a global setting since a device might have one large BAR which needs to
be mapped with single PE windows and another smaller BAR that can be mapped
with a regular segmented window. Make the segmented vs single decision a
per-BAR setting and clean up the logic that decides which mode to use.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-15-oohall@gmail.com

commit | commitdiff | tree

Oliver O'Halloran [Wed, 22 Jul 2020 06:57:13 +0000 (16:57 +1000)]

powerpc/powernv/sriov: Refactor M64 BAR setup

Split up the logic so that we have one branch that handles setting up a
segmented window and another that handles setting up single PE windows for
each VF.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-14-oohall@gmail.com

commit | commitdiff | tree

Oliver O'Halloran [Wed, 22 Jul 2020 06:57:12 +0000 (16:57 +1000)]

powerpc/powernv/sriov: Move M64 BAR allocation into a helper

I want to refactor the loop this code is currently inside of. Hoist it on
out.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-13-oohall@gmail.com

commit | commitdiff | tree

Oliver O'Halloran [Wed, 22 Jul 2020 06:57:11 +0000 (16:57 +1000)]

powerpc/powernv/sriov: De-indent setup and teardown

Remove the IODA2 PHB checks. We already assume IODA2 in several places so
there's not much point in wrapping most of the setup and teardown process
in an if block.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-12-oohall@gmail.com

commit | commitdiff | tree

Oliver O'Halloran [Wed, 22 Jul 2020 06:57:10 +0000 (16:57 +1000)]

powerpc/powernv/sriov: Drop iov->pe_num_map[]

Currently the iov->pe_num_map[] does one of two things depending on
whether single PE mode is being used or not. When it is, this contains an
array which maps a vf_index to the corresponding PE number. When single PE
mode is not being used this contains a scalar which is the base PE for the
set of enabled VFs (the PE for VFn is base + n).

The array was necessary because when calling pnv_ioda_alloc_pe() there is
no guarantee that the allocated PEs would be contiguous. We can now
allocate contiguous blocks of PEs so this is no longer an issue. This
allows us to drop the if (single_mode) {} .. else {} block scattered
through the SR-IOV code which is a nice clean up.

This also fixes a bug in pnv_pci_sriov_disable() which uses the non-atomic
bitmap_clear() to manipulate the PE allocation map. Other users of the map
assume it will be accessed with atomic ops.
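
A minimal sketch of the idea, assuming a contiguous block of PEs and a 64-bit allocation map. The struct, the helper names and the map width are hypothetical, and C11 atomics stand in for the kernel's atomic bit ops.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-device IOV state. */
struct iov_sketch {
	unsigned int base_pe;	/* first PE of the contiguous block */
	unsigned int num_vfs;	/* number of enabled VFs */
};

/* With contiguous PEs, the PE for VFn is simply base + n. */
static int vf_to_pe(const struct iov_sketch *iov, unsigned int vf_index)
{
	if (vf_index >= iov->num_vfs)
		return -1;
	return (int)(iov->base_pe + vf_index);
}

/* Release every VF PE with an atomic clear, one bit at a time. */
static void release_vf_pes(_Atomic uint64_t *pe_map,
			   const struct iov_sketch *iov)
{
	for (unsigned int n = 0; n < iov->num_vfs; n++) {
		unsigned int pe = iov->base_pe + n;

		if (pe < 64)	/* the sketch only models a 64-entry map */
			atomic_fetch_and(pe_map, ~(UINT64_C(1) << pe));
	}
}
```

Because every VF maps to base + n, no per-VF lookup table and no mode-specific branch is needed in this sketch.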

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-11-oohall@gmail.com

commit | commitdiff | tree

Oliver O'Halloran [Wed, 22 Jul 2020 06:57:09 +0000 (16:57 +1000)]

powerpc/powernv/pci: Refactor pnv_ioda_alloc_pe()

Rework the PE allocation logic to allow allocating blocks of PEs rather
than individually. We'll use this to allocate contiguous blocks of PEs for
the SR-IOVs.

This patch also adds code to pnv_ioda_alloc_pe() and pnv_ioda_reserve_pe() to
use the existing, but unused, phb->pe_alloc_mutex. Currently these functions
use atomic bit ops to release a currently allocated PE number. However,
the pnv_ioda_alloc_pe() wants to have exclusive access to the bit map while
scanning for a hole large enough to accommodate the allocation size.
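
The scan for a hole can be sketched roughly as follows. This is an illustration under stated assumptions, not the kernel code: `alloc_pe_block()`, `SKETCH_NUM_PES` and the bool array are hypothetical, the scan direction is arbitrary, and the caller is assumed to hold a lock (the role played by phb->pe_alloc_mutex) for the whole scan.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NUM_PES 16	/* assumption: small map for illustration */

/*
 * Find the first run of `count` free entries, mark it used and return
 * its first PE number, or -1 if no hole is large enough. The caller
 * must hold the allocation lock so the map cannot change mid-scan.
 */
static int alloc_pe_block(bool used[SKETCH_NUM_PES], size_t count)
{
	if (count == 0 || count > SKETCH_NUM_PES)
		return -1;

	for (size_t start = 0; start + count <= SKETCH_NUM_PES; start++) {
		size_t i;

		for (i = 0; i < count; i++)
			if (used[start + i])
				break;

		if (i == count) {	/* whole run is free: claim it */
			for (i = 0; i < count; i++)
				used[start + i] = true;
			return (int)start;
		}
	}
	return -1;
}
```

Checking and claiming several bits is not a single atomic operation, which is why exclusive access is needed during the scan.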

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-10-oohall@gmail.com

commit | commitdiff | tree

Oliver O'Halloran [Wed, 22 Jul 2020 06:57:08 +0000 (16:57 +1000)]

powerpc/powernv/sriov: Factor out M64 BAR setup

The sequence required to use the single PE BAR mode is kinda janky and
requires a little explanation. The API was designed with P7-IOC style
windows where the setup process is something like:

1. Configure the window start / end address
2. Enable the window
3. Map the segments of each window to the PE

For Single PE BARs the process is:

1. Set the PE for segment zero on a disabled window
2. Set the range
3. Enable the window

Move the OPAL calls into their own helper functions where the quirks can be
contained.
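
The two orderings above can be sketched as a pair of helpers. Everything here is hypothetical: the step names and the log structure only record the order of operations and stand in for the real OPAL calls, which this message does not name.

```c
/* Hypothetical step identifiers; each stands in for one firmware call. */
enum m64_step {
	STEP_SET_RANGE,		/* configure window start / end */
	STEP_ENABLE_WINDOW,
	STEP_MAP_SEGMENTS,	/* map segments to PEs */
	STEP_SET_PE_SEG0,	/* set the PE for segment zero */
};

struct step_log {
	enum m64_step steps[4];
	int n;
};

static void record(struct step_log *log, enum m64_step step)
{
	if (log->n < 4)
		log->steps[log->n++] = step;
}

/* P7-IOC style (segmented) window: range, enable, then map. */
static void setup_segmented_window(struct step_log *log)
{
	record(log, STEP_SET_RANGE);
	record(log, STEP_ENABLE_WINDOW);
	record(log, STEP_MAP_SEGMENTS);
}

/* Single PE BAR: the PE is set first, while the window is disabled. */
static void setup_single_pe_window(struct step_log *log)
{
	record(log, STEP_SET_PE_SEG0);
	record(log, STEP_SET_RANGE);
	record(log, STEP_ENABLE_WINDOW);
}
```

Keeping each ordering in its own helper means callers never have to remember which sequence applies to which window type.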

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-9-oohall@gmail.com

commit | commitdiff | tree

Oliver O'Halloran [Wed, 22 Jul 2020 06:57:07 +0000 (16:57 +1000)]

powerpc/powernv/sriov: Simplify used window tracking

No need for the multi-dimensional arrays, just use a bitmap.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-8-oohall@gmail.com

commit | commitdiff | tree

Oliver O'Halloran [Wed, 22 Jul 2020 06:57:06 +0000 (16:57 +1000)]

powerpc/powernv/sriov: Rename truncate_iov

This prevents SR-IOV being used by making the SR-IOV BAR resources
unallocatable. Rename it to reflect what it actually does.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-7-oohall@gmail.com

commit | commitdiff | tree

Oliver O'Halloran [Wed, 22 Jul 2020 06:57:05 +0000 (16:57 +1000)]

powerpc/powernv/sriov: Explain how SR-IOV works on PowerNV

SR-IOV support on PowerNV is a byzantine maze of hooks. I have no idea
how anyone is supposed to know how it works except through a lot of
suffering. Write up some docs about the overall story to help out
the next sucker^Wperson who needs to tinker with it.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-6-oohall@gmail.com

commit | commitdiff | tree

Oliver O'Halloran [Wed, 22 Jul 2020 06:57:04 +0000 (16:57 +1000)]

powerpc/powernv/sriov: Move SR-IOV into a separate file

pci-ioda.c is getting a bit unwieldy due to the amount of stuff jammed in
there. The SR-IOV support can be extracted easily enough and is mostly
standalone, so move it into a separate file.

This patch also moves the PowerNV SR-IOV specific fields out of pci_dn and
into a platform specific structure. I'm not sure how they ended
up in there in the first place, but leaking platform specifics into common
code has proven to be a terrible idea so far so let's stop doing that.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-5-oohall@gmail.com

commit | commitdiff | tree

Oliver O'Halloran [Wed, 22 Jul 2020 06:57:03 +0000 (16:57 +1000)]

powerpc/powernv/pci: Initialise M64 for IODA1 as a 1-1 window

We pre-configure the m64 window for IODA1 as a 1-1 segment-PE mapping,
similar to PHB3. Currently the actual mapping of segments occurs in
pnv_ioda_pick_m64_pe(), but we can move it into pnv_ioda1_init_m64() and
drop the IODA1 specific code paths in the PE setup / teardown.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-4-oohall@gmail.com

commit | commitdiff | tree

Oliver O'Halloran [Wed, 22 Jul 2020 06:57:02 +0000 (16:57 +1000)]

powerpc/powernv/pci: Add explicit tracking of the DMA setup state

There's an optimisation in the PE setup which skips performing DMA
setup for a PE if we only have bridges in a PE. The assumption being
that only "real" devices will DMA to system memory, which is probably
fair. However, if we start off with only bridge devices in a PE and then
add a non-bridge device, the new device won't be able to use DMA because
we never configured it.

Fix this (admittedly pretty weird) edge case by tracking whether we've done
the DMA setup for the PE or not. If a non-bridge device is added to the PE
(via rescan or hotplug, or whatever) we can set up DMA on demand.
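
A minimal sketch of the on-demand approach, assuming a per-PE flag. The struct, field and function names are hypothetical and only illustrate the state tracking described above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a PE; only the fields the sketch needs. */
struct pe_sketch {
	bool dma_setup_done;	/* has DMA been configured for this PE? */
	int dma_setup_calls;	/* counts setups, for demonstration only */
};

static void pe_setup_dma(struct pe_sketch *pe)
{
	pe->dma_setup_done = true;
	pe->dma_setup_calls++;
}

/*
 * Called whenever a device joins the PE (initial scan, rescan or
 * hotplug). Bridges do not trigger DMA setup; the first non-bridge
 * device does, and later devices reuse the existing configuration.
 */
static void pe_add_device(struct pe_sketch *pe, bool is_bridge)
{
	if (is_bridge)
		return;
	if (!pe->dma_setup_done)
		pe_setup_dma(pe);
}
```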

This also means the only remaining user of the old "DMA Weight" code is
the IODA1 DMA setup code that it was originally added for, which is good.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-3-oohall@gmail.com

commit | commitdiff | tree

Oliver O'Halloran [Wed, 22 Jul 2020 06:57:01 +0000 (16:57 +1000)]

powerpc/powernv/pci: Always tear down DMA windows on PE release

Currently we have these two functions:

pnv_pci_ioda2_release_dma_pe(), and
pnv_pci_ioda2_release_pe_dma()

The first is used when tearing down VF PEs and the other is used for normal
devices. There's very little difference between the two though. The latter
(non-VF) will skip a call to pnv_pci_ioda2_unset_window() unless
CONFIG_IOMMU_API=y is set. There's no real point in doing this so fold the
two together.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-2-oohall@gmail.com

A mirror of Linus' kernel repository