From: Greg Kroah-Hartman
Date: Sat, 6 Jul 2019 05:12:36 +0000 (+0200)
Subject: 5.1-stable patches
X-Git-Tag: v5.1.17~11
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=7dedd6dfadfdb9698489ee0c10181f32c6d2a64e;p=thirdparty%2Fkernel%2Fstable-queue.git

5.1-stable patches

added patches:
	kvm-lapic-fix-pending-interrupt-in-irr-blocked-by-software-disable-lapic.patch
	nfsd-fix-overflow-causing-non-working-mounts-on-1-tb-machines.patch
	svcrdma-ignore-source-port-when-computing-drc-hash.patch
---

diff --git a/queue-5.1/kvm-lapic-fix-pending-interrupt-in-irr-blocked-by-software-disable-lapic.patch b/queue-5.1/kvm-lapic-fix-pending-interrupt-in-irr-blocked-by-software-disable-lapic.patch
new file mode 100644
index 00000000000..1273e9fbfb3
--- /dev/null
+++ b/queue-5.1/kvm-lapic-fix-pending-interrupt-in-irr-blocked-by-software-disable-lapic.patch
@@ -0,0 +1,138 @@
+From bb34e690e9340bc155ebed5a3d75fc63ff69e082 Mon Sep 17 00:00:00 2001
+From: Wanpeng Li
+Date: Tue, 2 Jul 2019 17:25:02 +0800
+Subject: KVM: LAPIC: Fix pending interrupt in IRR blocked by software disable LAPIC
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+From: Wanpeng Li
+
+commit bb34e690e9340bc155ebed5a3d75fc63ff69e082 upstream.
+
+Thomas reported that:
+
+ | Background:
+ |
+ | In preparation of supporting IPI shorthands I changed the CPU offline
+ | code to software disable the local APIC instead of just masking it.
+ | That's done by clearing the APIC_SPIV_APIC_ENABLED bit in the APIC_SPIV
+ | register.
+ |
+ | Failure:
+ |
+ | When the CPU comes back online the startup code triggers occasionally
+ | the warning in apic_pending_intr_clear(). That complains that the IRRs
+ | are not empty.
+ |
+ | The offending vector is the local APIC timer vector who's IRR bit is set
+ | and stays set.
+ |
+ | It took me quite some time to reproduce the issue locally, but now I can
+ | see what happens.
+ |
+ | It requires apicv_enabled=0, i.e. full apic emulation. With apicv_enabled=1
+ | (and hardware support) it behaves correctly.
+ |
+ | Here is the series of events:
+ |
+ |     Guest CPU
+ |
+ |     goes down
+ |
+ |       native_cpu_disable()
+ |
+ |         apic_soft_disable();
+ |
+ |       play_dead()
+ |
+ |       ....
+ |
+ |       startup()
+ |
+ |         if (apic_enabled())
+ |           apic_pending_intr_clear()    <- Not taken
+ |
+ |         enable APIC
+ |
+ |           apic_pending_intr_clear()    <- Triggers warning because IRR is stale
+ |
+ | When this happens then the deadline timer or the regular APIC timer -
+ | happens with both, has fired shortly before the APIC is disabled, but the
+ | interrupt was not serviced because the guest CPU was in an interrupt
+ | disabled region at that point.
+ |
+ | The state of the timer vector ISR/IRR bits:
+ |
+ |                              ISR     IRR
+ | before apic_soft_disable()    0       1
+ | after apic_soft_disable()     0       1
+ |
+ | On startup                    0       1
+ |
+ | Now one would assume that the IRR is cleared after the INIT reset, but this
+ | happens only on CPU0.
+ |
+ | Why?
+ |
+ | Because our CPU0 hotplug is just for testing to make sure nothing breaks
+ | and goes through an NMI wakeup vehicle because INIT would send it through
+ | the boots-trap code which is not really working if that CPU was not
+ | physically unplugged.
+ |
+ | Now looking at a real world APIC the situation in that case is:
+ |
+ |                              ISR     IRR
+ | before apic_soft_disable()    0       1
+ | after apic_soft_disable()     0       1
+ |
+ | On startup                    0       0
+ |
+ | Why?
+ |
+ | Once the dying CPU reenables interrupts the pending interrupt gets
+ | delivered as a spurious interupt and then the state is clear.
+ |
+ | While that CPU0 hotplug test case is surely an esoteric issue, the APIC
+ | emulation is still wrong, Even if the play_dead() code would not enable
+ | interrupts then the pending IRR bit would turn into an ISR .. interrupt
+ | when the APIC is reenabled on startup.
+
+From SDM 10.4.7.2 Local APIC State After It Has Been Software Disabled
+* Pending interrupts in the IRR and ISR registers are held and require
+  masking or handling by the CPU.
+
+In Thomas's testing, hardware cpu will not respect soft disable LAPIC
+when IRR has already been set or APICv posted-interrupt is in flight,
+so we can skip soft disable APIC checking when clearing IRR and set ISR,
+continue to respect soft disable APIC when attempting to set IRR.
+
+Reported-by: Rong Chen
+Reported-by: Feng Tang
+Reported-by: Thomas Gleixner
+Tested-by: Thomas Gleixner
+Cc: Paolo Bonzini
+Cc: Radim Krčmář
+Cc: Thomas Gleixner
+Cc: Rong Chen
+Cc: Feng Tang
+Cc: stable@vger.kernel.org
+Signed-off-by: Wanpeng Li
+Signed-off-by: Paolo Bonzini
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ arch/x86/kvm/lapic.c |    2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/arch/x86/kvm/lapic.c
++++ b/arch/x86/kvm/lapic.c
+@@ -2331,7 +2331,7 @@ int kvm_apic_has_interrupt(struct kvm_vc
+ 	struct kvm_lapic *apic = vcpu->arch.apic;
+ 	u32 ppr;
+ 
+-	if (!apic_enabled(apic))
++	if (!kvm_apic_hw_enabled(apic))
+ 		return -1;
+ 
+ 	__apic_update_ppr(apic, &ppr);
diff --git a/queue-5.1/nfsd-fix-overflow-causing-non-working-mounts-on-1-tb-machines.patch b/queue-5.1/nfsd-fix-overflow-causing-non-working-mounts-on-1-tb-machines.patch
new file mode 100644
index 00000000000..ec70cccbada
--- /dev/null
+++ b/queue-5.1/nfsd-fix-overflow-causing-non-working-mounts-on-1-tb-machines.patch
@@ -0,0 +1,68 @@
+From 3b2d4dcf71c4a91b420f835e52ddea8192300a3b Mon Sep 17 00:00:00 2001
+From: Paul Menzel
+Date: Wed, 3 Jul 2019 13:28:15 +0200
+Subject: nfsd: Fix overflow causing non-working mounts on 1 TB machines
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+From: Paul Menzel
+
+commit 3b2d4dcf71c4a91b420f835e52ddea8192300a3b upstream.
+
+Since commit 10a68cdf10 (nfsd: fix performance-limiting session
+calculation) (Linux 5.1-rc1 and 4.19.31), shares from NFS servers with
+1 TB of memory cannot be mounted anymore. The mount just hangs on the
+client.
+
+The gist of commit 10a68cdf10 is the change below.
+
+    -avail = clamp_t(int, avail, slotsize, avail/3);
+    +avail = clamp_t(int, avail, slotsize, total_avail/3);
+
+Here are the macros.
+
+    #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
+    #define clamp_t(type, val, lo, hi) min_t(type, max_t(type, val, lo), hi)
+
+`total_avail` is 8,434,659,328 on the 1 TB machine. `clamp_t()` casts
+the values to `int`, which for 32-bit integers can only hold values
+−2,147,483,648 (−2^31) through 2,147,483,647 (2^31 − 1).
+
+`avail` (in the function signature) is just 65536, so that no overflow
+was happening. Before the commit the assignment would result in 21845,
+and `num = 4`.
+
+When using `total_avail`, it is causing the assignment to be
+18446744072226137429 (printed as %lu), and `num` is then 4164608182.
+
+My next guess is, that `nfsd_drc_mem_used` is then exceeded, and the
+server thinks there is no memory available any more for this client.
+
+Updating the arguments of `clamp_t()` and `min_t()` to `unsigned long`
+fixes the issue.
+
+Now, `avail = 65536` (before commit 10a68cdf10 `avail = 21845`), but
+`num = 4` remains the same.
+
+Fixes: c54f24e338ed (nfsd: fix performance-limiting session calculation)
+Cc: stable@vger.kernel.org
+Signed-off-by: Paul Menzel
+Signed-off-by: J. Bruce Fields
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ fs/nfsd/nfs4state.c |    2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/fs/nfsd/nfs4state.c
++++ b/fs/nfsd/nfs4state.c
+@@ -1562,7 +1562,7 @@ static u32 nfsd4_get_drc_mem(struct nfsd
+ 	 * Never use more than a third of the remaining memory,
+ 	 * unless it's the only way to give this client a slot:
+ 	 */
+-	avail = clamp_t(int, avail, slotsize, total_avail/3);
++	avail = clamp_t(unsigned long, avail, slotsize, total_avail/3);
+ 	num = min_t(int, num, avail / slotsize);
+ 	nfsd_drc_mem_used += num * slotsize;
+ 	spin_unlock(&nfsd_drc_lock);
diff --git a/queue-5.1/series b/queue-5.1/series
index 2f3a267a62b..8836ea5ee0e 100644
--- a/queue-5.1/series
+++ b/queue-5.1/series
@@ -83,3 +83,6 @@ btrfs-ensure-replaced-device-doesn-t-have-pending-chunk-allocation.patch
 tty-rocket-fix-incorrect-forward-declaration-of-rp_i.patch
 s390-mm-fix-pxd_bad-with-folded-page-tables.patch
 kvm-x86-degrade-warn-to-pr_warn_ratelimited.patch
+kvm-lapic-fix-pending-interrupt-in-irr-blocked-by-software-disable-lapic.patch
+nfsd-fix-overflow-causing-non-working-mounts-on-1-tb-machines.patch
+svcrdma-ignore-source-port-when-computing-drc-hash.patch
diff --git a/queue-5.1/svcrdma-ignore-source-port-when-computing-drc-hash.patch b/queue-5.1/svcrdma-ignore-source-port-when-computing-drc-hash.patch
new file mode 100644
index 00000000000..0f5920fe612
--- /dev/null
+++ b/queue-5.1/svcrdma-ignore-source-port-when-computing-drc-hash.patch
@@ -0,0 +1,54 @@
+From 1e091c3bbf51d34d5d96337a59ce5ab2ac3ba2cc Mon Sep 17 00:00:00 2001
+From: Chuck Lever
+Date: Tue, 11 Jun 2019 11:01:16 -0400
+Subject: svcrdma: Ignore source port when computing DRC hash
+
+From: Chuck Lever
+
+commit 1e091c3bbf51d34d5d96337a59ce5ab2ac3ba2cc upstream.
+
+The DRC appears to be effectively empty after an RPC/RDMA transport
+reconnect. The problem is that each connection uses a different
+source port, which defeats the DRC hash.
+
+Clients always have to disconnect before they send retransmissions
+to reset the connection's credit accounting, thus every retransmit
+on NFS/RDMA will miss the DRC.
+
+An NFS/RDMA client's IP source port is meaningless for RDMA
+transports. The transport layer typically sets the source port value
+on the connection to a random ephemeral port. The server already
+ignores it for the "secure port" check. See commit 16e4d93f6de7
+("NFSD: Ignore client's source port on RDMA transports").
+
+The Linux NFS server's DRC resolves XID collisions from the same
+source IP address by using the checksum of the first 200 bytes of
+the RPC call header.
+
+Signed-off-by: Chuck Lever
+Cc: stable@vger.kernel.org # v4.14+
+Signed-off-by: J. Bruce Fields
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ net/sunrpc/xprtrdma/svc_rdma_transport.c |    7 ++++++-
+ 1 file changed, 6 insertions(+), 1 deletion(-)
+
+--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
++++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
+@@ -211,9 +211,14 @@ static void handle_connect_req(struct rd
+ 	/* Save client advertised inbound read limit for use later in accept. */
+ 	newxprt->sc_ord = param->initiator_depth;
+ 
+-	/* Set the local and remote addresses in the transport */
+ 	sa = (struct sockaddr *)&newxprt->sc_cm_id->route.addr.dst_addr;
+ 	svc_xprt_set_remote(&newxprt->sc_xprt, sa, svc_addr_len(sa));
++	/* The remote port is arbitrary and not under the control of the
++	 * client ULP. Set it to a fixed value so that the DRC continues
++	 * to be effective after a reconnect.
++	 */
++	rpc_set_port((struct sockaddr *)&newxprt->sc_xprt.xpt_remote, 0);
++
+ 	sa = (struct sockaddr *)&newxprt->sc_cm_id->route.addr.src_addr;
+ 	svc_xprt_set_local(&newxprt->sc_xprt, sa, svc_addr_len(sa));
+ 