git.ipfire.org Git - thirdparty/kernel/stable-queue.git/commitdiff
5.1-stable patches
author Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sat, 6 Jul 2019 05:12:36 +0000 (07:12 +0200)
committer Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sat, 6 Jul 2019 05:12:36 +0000 (07:12 +0200)
added patches:
kvm-lapic-fix-pending-interrupt-in-irr-blocked-by-software-disable-lapic.patch
nfsd-fix-overflow-causing-non-working-mounts-on-1-tb-machines.patch
svcrdma-ignore-source-port-when-computing-drc-hash.patch

queue-5.1/kvm-lapic-fix-pending-interrupt-in-irr-blocked-by-software-disable-lapic.patch [new file with mode: 0644]
queue-5.1/nfsd-fix-overflow-causing-non-working-mounts-on-1-tb-machines.patch [new file with mode: 0644]
queue-5.1/series
queue-5.1/svcrdma-ignore-source-port-when-computing-drc-hash.patch [new file with mode: 0644]

diff --git a/queue-5.1/kvm-lapic-fix-pending-interrupt-in-irr-blocked-by-software-disable-lapic.patch b/queue-5.1/kvm-lapic-fix-pending-interrupt-in-irr-blocked-by-software-disable-lapic.patch
new file mode 100644
index 0000000..1273e9f
--- /dev/null
+++ b/queue-5.1/kvm-lapic-fix-pending-interrupt-in-irr-blocked-by-software-disable-lapic.patch
@@ -0,0 +1,138 @@
+From bb34e690e9340bc155ebed5a3d75fc63ff69e082 Mon Sep 17 00:00:00 2001
+From: Wanpeng Li <wanpengli@tencent.com>
+Date: Tue, 2 Jul 2019 17:25:02 +0800
+Subject: KVM: LAPIC: Fix pending interrupt in IRR blocked by software disable LAPIC
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+From: Wanpeng Li <wanpengli@tencent.com>
+
+commit bb34e690e9340bc155ebed5a3d75fc63ff69e082 upstream.
+
+Thomas reported that:
+
+ | Background:
+ |
+ |    In preparation of supporting IPI shorthands I changed the CPU offline
+ |    code to software disable the local APIC instead of just masking it.
+ |    That's done by clearing the APIC_SPIV_APIC_ENABLED bit in the APIC_SPIV
+ |    register.
+ |
+ | Failure:
+ |
+ |    When the CPU comes back online the startup code triggers occasionally
+ |    the warning in apic_pending_intr_clear(). That complains that the IRRs
+ |    are not empty.
+ |
+ |    The offending vector is the local APIC timer vector whose IRR bit is set
+ |    and stays set.
+ |
+ | It took me quite some time to reproduce the issue locally, but now I can
+ | see what happens.
+ |
+ | It requires apicv_enabled=0, i.e. full apic emulation. With apicv_enabled=1
+ | (and hardware support) it behaves correctly.
+ |
+ | Here is the series of events:
+ |
+ |     Guest CPU
+ |
+ |     goes down
+ |
+ |       native_cpu_disable()
+ |
+ |                     apic_soft_disable();
+ |
+ |     play_dead()
+ |
+ |     ....
+ |
+ |     startup()
+ |
+ |       if (apic_enabled())
+ |         apic_pending_intr_clear()   <- Not taken
+ |
+ |      enable APIC
+ |
+ |         apic_pending_intr_clear()   <- Triggers warning because IRR is stale
+ |
+ | When this happens, the deadline timer or the regular APIC timer (it
+ | happens with both) has fired shortly before the APIC is disabled, but the
+ | interrupt was not serviced because the guest CPU was in an interrupt
+ | disabled region at that point.
+ |
+ | The state of the timer vector ISR/IRR bits:
+ |
+ |                               ISR     IRR
+ | before apic_soft_disable()    0       1
+ | after apic_soft_disable()     0       1
+ |
+ | On startup                    0       1
+ |
+ | Now one would assume that the IRR is cleared after the INIT reset, but this
+ | happens only on CPU0.
+ |
+ | Why?
+ |
+ | Because our CPU0 hotplug is just for testing to make sure nothing breaks
+ | and goes through an NMI wakeup vehicle because INIT would send it through
+ | the bootstrap code which is not really working if that CPU was not
+ | physically unplugged.
+ |
+ | Now looking at a real world APIC the situation in that case is:
+ |
+ |                               ISR     IRR
+ | before apic_soft_disable()    0       1
+ | after apic_soft_disable()     0       1
+ |
+ | On startup                    0       0
+ |
+ | Why?
+ |
+ | Once the dying CPU reenables interrupts the pending interrupt gets
+ | delivered as a spurious interrupt and then the state is clear.
+ |
+ | While that CPU0 hotplug test case is surely an esoteric issue, the APIC
+ | emulation is still wrong. Even if the play_dead() code did not enable
+ | interrupts, the pending IRR bit would turn into an ISR .. interrupt
+ | when the APIC is reenabled on startup.
+
+From SDM 10.4.7.2 Local APIC State After It Has Been Software Disabled
+* Pending interrupts in the IRR and ISR registers are held and require
+  masking or handling by the CPU.
+
+In Thomas's testing, a hardware CPU does not honor a software-disabled
+LAPIC when the IRR is already set or an APICv posted interrupt is in flight.
+We can therefore skip the soft-disable check when clearing IRR and setting
+ISR, and continue to respect soft disable when attempting to set IRR.
+
+Reported-by: Rong Chen <rong.a.chen@intel.com>
+Reported-by: Feng Tang <feng.tang@intel.com>
+Reported-by: Thomas Gleixner <tglx@linutronix.de>
+Tested-by: Thomas Gleixner <tglx@linutronix.de>
+Cc: Paolo Bonzini <pbonzini@redhat.com>
+Cc: Radim Krčmář <rkrcmar@redhat.com>
+Cc: Thomas Gleixner <tglx@linutronix.de>
+Cc: Rong Chen <rong.a.chen@intel.com>
+Cc: Feng Tang <feng.tang@intel.com>
+Cc: stable@vger.kernel.org
+Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
+Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ arch/x86/kvm/lapic.c |    2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/arch/x86/kvm/lapic.c
++++ b/arch/x86/kvm/lapic.c
+@@ -2331,7 +2331,7 @@ int kvm_apic_has_interrupt(struct kvm_vc
+       struct kvm_lapic *apic = vcpu->arch.apic;
+       u32 ppr;
+ 
+-      if (!apic_enabled(apic))
++      if (!kvm_apic_hw_enabled(apic))
+               return -1;
+ 
+       __apic_update_ppr(apic, &ppr);
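The effect of the one-line change above can be illustrated with a toy model. This is only a sketch, not the KVM code: `struct toy_lapic` and every `toy_*` function are hypothetical stand-ins, the old check is assumed to require both the software and the hardware enable, PPR handling is left out, and vector 0xec is just an example timer vector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Enable bits as defined for the SPIV register and the APIC base MSR. */
#define APIC_SPIV_APIC_ENABLED    (1u << 8)   /* software enable */
#define MSR_IA32_APICBASE_ENABLE  (1u << 11)  /* hardware enable */

/* Hypothetical, heavily reduced stand-in for struct kvm_lapic. */
struct toy_lapic {
	uint64_t apic_base;   /* IA32_APIC_BASE MSR value */
	uint32_t spiv;        /* spurious interrupt vector register */
	uint32_t irr[8];      /* 256 pending-interrupt bits */
};

static bool toy_hw_enabled(const struct toy_lapic *apic)
{
	return (apic->apic_base & MSR_IA32_APICBASE_ENABLE) != 0;
}

static bool toy_sw_enabled(const struct toy_lapic *apic)
{
	return (apic->spiv & APIC_SPIV_APIC_ENABLED) != 0;
}

/* Highest vector set in IRR, or -1 when nothing is pending. */
static int toy_find_highest_irr(const struct toy_lapic *apic)
{
	for (int vec = 255; vec >= 0; vec--)
		if (apic->irr[vec / 32] & (1u << (vec % 32)))
			return vec;
	return -1;
}

/* Old behaviour: a soft-disabled APIC hides whatever is pending in IRR. */
static int toy_has_interrupt_old(const struct toy_lapic *apic)
{
	if (!(toy_sw_enabled(apic) && toy_hw_enabled(apic)))
		return -1;
	return toy_find_highest_irr(apic);
}

/* New behaviour: only the hardware enable gates pending IRR bits. */
static int toy_has_interrupt_new(const struct toy_lapic *apic)
{
	if (!toy_hw_enabled(apic))
		return -1;
	return toy_find_highest_irr(apic);
}

/* The timer fires (IRR bit set), then the guest soft-disables the APIC. */
static struct toy_lapic toy_soft_disabled_with_pending(int vec)
{
	struct toy_lapic apic = {
		.apic_base = MSR_IA32_APICBASE_ENABLE,
		.spiv = APIC_SPIV_APIC_ENABLED,
	};

	assert(vec >= 0 && vec <= 255);
	apic.irr[vec / 32] |= 1u << (vec % 32);
	apic.spiv &= ~APIC_SPIV_APIC_ENABLED;
	return apic;
}
```

With a state built by `toy_soft_disabled_with_pending(0xec)`, the old check reports no interrupt, so the IRR bit stays stale, while the new check reports vector 0xec, which lets the pending interrupt be delivered as described in the commit message.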
diff --git a/queue-5.1/nfsd-fix-overflow-causing-non-working-mounts-on-1-tb-machines.patch b/queue-5.1/nfsd-fix-overflow-causing-non-working-mounts-on-1-tb-machines.patch
new file mode 100644
index 0000000..ec70ccc
--- /dev/null
+++ b/queue-5.1/nfsd-fix-overflow-causing-non-working-mounts-on-1-tb-machines.patch
@@ -0,0 +1,68 @@
+From 3b2d4dcf71c4a91b420f835e52ddea8192300a3b Mon Sep 17 00:00:00 2001
+From: Paul Menzel <pmenzel@molgen.mpg.de>
+Date: Wed, 3 Jul 2019 13:28:15 +0200
+Subject: nfsd: Fix overflow causing non-working mounts on 1 TB machines
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+From: Paul Menzel <pmenzel@molgen.mpg.de>
+
+commit 3b2d4dcf71c4a91b420f835e52ddea8192300a3b upstream.
+
+Since commit 10a68cdf10 (nfsd: fix performance-limiting session
+calculation) (Linux 5.1-rc1 and 4.19.31), shares from NFS servers with
+1 TB of memory cannot be mounted anymore. The mount just hangs on the
+client.
+
+The gist of commit 10a68cdf10 is the change below.
+
+    -avail = clamp_t(int, avail, slotsize, avail/3);
+    +avail = clamp_t(int, avail, slotsize, total_avail/3);
+
+Here are the macros.
+
+    #define min_t(type, x, y)       __careful_cmp((type)(x), (type)(y), <)
+    #define clamp_t(type, val, lo, hi) min_t(type, max_t(type, val, lo), hi)
+
+`total_avail` is 8,434,659,328 on the 1 TB machine. `clamp_t()` casts
+the values to `int`, which for 32-bit integers can only hold values
+−2,147,483,648 (−2^31) through 2,147,483,647 (2^31 − 1).
+
+`avail` (in the function signature) is just 65536, so no overflow
+occurred. Before the commit, the assignment resulted in 21845,
+and `num = 4`.
+
+With `total_avail`, the assignment instead yields
+18446744072226137429 (printed as %lu), and `num` is then 4164608182.
+
+My guess is that `nfsd_drc_mem_used` is then exceeded, and the
+server thinks there is no memory available any more for this client.
+
+Updating the arguments of `clamp_t()` and `min_t()` to `unsigned long`
+fixes the issue.
+
+Now, `avail = 65536` (before commit 10a68cdf10 `avail = 21845`), but
+`num = 4` remains the same.
+
+Fixes: c54f24e338ed (nfsd: fix performance-limiting session calculation)
+Cc: stable@vger.kernel.org
+Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
+Signed-off-by: J. Bruce Fields <bfields@redhat.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ fs/nfsd/nfs4state.c |    2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/fs/nfsd/nfs4state.c
++++ b/fs/nfsd/nfs4state.c
+@@ -1562,7 +1562,7 @@ static u32 nfsd4_get_drc_mem(struct nfsd
+        * Never use more than a third of the remaining memory,
+        * unless it's the only way to give this client a slot:
+        */
+-      avail = clamp_t(int, avail, slotsize, total_avail/3);
++      avail = clamp_t(unsigned long, avail, slotsize, total_avail/3);
+       num = min_t(int, num, avail / slotsize);
+       nfsd_drc_mem_used += num * slotsize;
+       spin_unlock(&nfsd_drc_lock);
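The numbers quoted in the commit message above can be reproduced in user space. The following is a sketch under stated assumptions: the macros are simplified stand-ins for the kernel's `min_t()`/`max_t()`/`clamp_t()` (no `__careful_cmp`), `int32_t`/`uint64_t` stand in for `int`/`unsigned long` on a 64-bit kernel, the `drc_avail_*` helpers are hypothetical, and the slot size of 2048 is an arbitrary illustrative value that the message does not state.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins: both operands are cast to `type` before the
 * comparison, which is what makes the chosen type matter. */
#define min_t(type, x, y)  ((type)(x) < (type)(y) ? (type)(x) : (type)(y))
#define max_t(type, x, y)  ((type)(x) > (type)(y) ? (type)(x) : (type)(y))
#define clamp_t(type, val, lo, hi)  min_t(type, max_t(type, val, lo), hi)

/* Before commit 10a68cdf10: the upper bound is avail / 3. */
static uint64_t drc_avail_before(uint64_t avail, uint64_t slotsize)
{
	return clamp_t(int32_t, avail, slotsize, avail / 3);
}

/* After it: the upper bound is total_avail / 3, truncated to 32 bits.
 * The narrowing cast is implementation-defined in C; GCC and Clang wrap
 * modulo 2^32, so a large bound becomes negative and always "wins". */
static uint64_t drc_avail_broken(uint64_t avail, uint64_t slotsize,
				 uint64_t total_avail)
{
	return clamp_t(int32_t, avail, slotsize, total_avail / 3);
}

/* With the fix: the comparison happens in the full 64-bit unsigned type. */
static uint64_t drc_avail_fixed(uint64_t avail, uint64_t slotsize,
				uint64_t total_avail)
{
	return clamp_t(uint64_t, avail, slotsize, total_avail / 3);
}

/* Values from the commit message; slotsize is an assumption. */
void drc_self_check(void)
{
	const uint64_t avail = 65536, slotsize = 2048;
	const uint64_t total_avail = 8434659328ULL;

	assert(drc_avail_before(avail, slotsize) == 21845);
	assert(drc_avail_broken(avail, slotsize, total_avail) ==
	       18446744072226137429ULL);
	assert(drc_avail_fixed(avail, slotsize, total_avail) == 65536);
}
```

Here `total_avail / 3` is 2,811,553,109, which does not fit in a signed 32-bit integer and wraps to −1,483,414,187. Converted back to a 64-bit unsigned value, that is exactly the 18446744072226137429 reported above.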
diff --git a/queue-5.1/series b/queue-5.1/series
index 2f3a267a62bfc2b56e1c633091e606dac3dacb86..8836ea5ee0e4894576f015f94c8f0474bef4bca7 100644
--- a/queue-5.1/series
+++ b/queue-5.1/series
@@ -83,3 +83,6 @@ btrfs-ensure-replaced-device-doesn-t-have-pending-chunk-allocation.patch
 tty-rocket-fix-incorrect-forward-declaration-of-rp_i.patch
 s390-mm-fix-pxd_bad-with-folded-page-tables.patch
 kvm-x86-degrade-warn-to-pr_warn_ratelimited.patch
+kvm-lapic-fix-pending-interrupt-in-irr-blocked-by-software-disable-lapic.patch
+nfsd-fix-overflow-causing-non-working-mounts-on-1-tb-machines.patch
+svcrdma-ignore-source-port-when-computing-drc-hash.patch
diff --git a/queue-5.1/svcrdma-ignore-source-port-when-computing-drc-hash.patch b/queue-5.1/svcrdma-ignore-source-port-when-computing-drc-hash.patch
new file mode 100644
index 0000000..0f5920f
--- /dev/null
+++ b/queue-5.1/svcrdma-ignore-source-port-when-computing-drc-hash.patch
@@ -0,0 +1,54 @@
+From 1e091c3bbf51d34d5d96337a59ce5ab2ac3ba2cc Mon Sep 17 00:00:00 2001
+From: Chuck Lever <chuck.lever@oracle.com>
+Date: Tue, 11 Jun 2019 11:01:16 -0400
+Subject: svcrdma: Ignore source port when computing DRC hash
+
+From: Chuck Lever <chuck.lever@oracle.com>
+
+commit 1e091c3bbf51d34d5d96337a59ce5ab2ac3ba2cc upstream.
+
+The DRC appears to be effectively empty after an RPC/RDMA transport
+reconnect. The problem is that each connection uses a different
+source port, which defeats the DRC hash.
+
+Clients always have to disconnect before they send retransmissions
+to reset the connection's credit accounting, thus every retransmit
+on NFS/RDMA will miss the DRC.
+
+An NFS/RDMA client's IP source port is meaningless for RDMA
+transports. The transport layer typically sets the source port value
+on the connection to a random ephemeral port. The server already
+ignores it for the "secure port" check. See commit 16e4d93f6de7
+("NFSD: Ignore client's source port on RDMA transports").
+
+The Linux NFS server's DRC resolves XID collisions from the same
+source IP address by using the checksum of the first 200 bytes of
+the RPC call header.
+
+Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
+Cc: stable@vger.kernel.org # v4.14+
+Signed-off-by: J. Bruce Fields <bfields@redhat.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ net/sunrpc/xprtrdma/svc_rdma_transport.c |    7 ++++++-
+ 1 file changed, 6 insertions(+), 1 deletion(-)
+
+--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
++++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
+@@ -211,9 +211,14 @@ static void handle_connect_req(struct rd
+       /* Save client advertised inbound read limit for use later in accept. */
+       newxprt->sc_ord = param->initiator_depth;
+ 
+-      /* Set the local and remote addresses in the transport */
+       sa = (struct sockaddr *)&newxprt->sc_cm_id->route.addr.dst_addr;
+       svc_xprt_set_remote(&newxprt->sc_xprt, sa, svc_addr_len(sa));
++      /* The remote port is arbitrary and not under the control of the
++       * client ULP. Set it to a fixed value so that the DRC continues
++       * to be effective after a reconnect.
++       */
++      rpc_set_port((struct sockaddr *)&newxprt->sc_xprt.xpt_remote, 0);
++
+       sa = (struct sockaddr *)&newxprt->sc_cm_id->route.addr.src_addr;
+       svc_xprt_set_local(&newxprt->sc_xprt, sa, svc_addr_len(sa));
+ 
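Why a changing source port defeats the duplicate reply cache can be shown with a reduced model of a cache key. This is a sketch, not the nfsd implementation: `struct toy_drc_key`, its field layout, and the `toy_*` helpers are hypothetical, and the port numbers are made up. The only assumption carried over from the patch is that the client port is part of what identifies a cached request, and that forcing it to 0 removes it from the comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced cache key; the real DRC key has more fields. */
struct toy_drc_key {
	uint32_t xid;    /* RPC transaction ID */
	uint32_t addr;   /* client IPv4 address */
	uint16_t port;   /* client source port */
	uint32_t csum;   /* checksum over the start of the RPC call */
};

/* A retransmission hits the cache only if every field matches. */
static bool toy_drc_match(const struct toy_drc_key *a,
			  const struct toy_drc_key *b)
{
	return a->xid == b->xid && a->addr == b->addr &&
	       a->port == b->port && a->csum == b->csum;
}

/* Build the key for a request. With normalize_port set, the port is
 * forced to 0, mirroring the effect of rpc_set_port(..., 0) on the
 * remote address stored for the transport. */
static struct toy_drc_key toy_drc_key_for(uint32_t xid, uint32_t addr,
					  uint16_t src_port, uint32_t csum,
					  bool normalize_port)
{
	struct toy_drc_key key = {
		.xid = xid,
		.addr = addr,
		.port = normalize_port ? 0 : src_port,
		.csum = csum,
	};
	return key;
}

/* Same request sent twice, before and after a reconnect that picked a
 * different ephemeral source port. */
void toy_drc_self_check(void)
{
	struct toy_drc_key first, retry;

	first = toy_drc_key_for(7, 0x0a000001, 40001, 0xabcd, false);
	retry = toy_drc_key_for(7, 0x0a000001, 51234, 0xabcd, false);
	assert(!toy_drc_match(&first, &retry));   /* miss: ports differ */

	first = toy_drc_key_for(7, 0x0a000001, 40001, 0xabcd, true);
	retry = toy_drc_key_for(7, 0x0a000001, 51234, 0xabcd, true);
	assert(toy_drc_match(&first, &retry));    /* hit: port ignored */
}
```

Requests from a different address, or with a different checksum, still miss in this model, which matches the commit message's point that XID collisions from the same source IP address are resolved by the checksum.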