From: Greg Kroah-Hartman
Date: Wed, 4 Jun 2014 05:43:27 +0000 (-0700)
Subject: 3.14-stable patches
X-Git-Tag: v3.14.6~51
X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=9545df791b898cb1fa01d6e642cad52ce2ff80bf;p=thirdparty%2Fkernel%2Fstable-queue.git

3.14-stable patches

added patches:
	arm-mvebu-mvebu-soc-id-add-missing-clk_put-call.patch
	arm-mvebu-mvebu-soc-id-keep-clock-enabled-if-pcie-unit-is-enabled.patch
	bus-mvebu-mbus-allow-several-windows-with-the-same-target-attribute.patch
	memory-mvebu-devbus-fix-the-conversion-of-the-bus-width.patch
	nfsd-call-rpc_destroy_wait_queue-from-free_client.patch
	nfsd-call-set_acl-with-a-null-acl-structure-if-no-entries.patch
	nfsd-move-default-initialisers-from-create_client-to.patch
	nfsd4-remove-lockowner-when-removing-lock-stateid.patch
	nfsd4-warn-on-finding-lockowner-without-stateid-s.patch
	pci-mvebu-fix-off-by-one-in-the-computed-size-of-the-mbus-windows.patch
	pci-mvebu-split-pcie-bars-into-multiple-mbus-windows-when-needed.patch
	percpu-make-pcpu_alloc_chunk-use-pcpu_mem_free-instead-of-kfree.patch
	workqueue-fix-a-possible-race-condition-between-rescuer-and-pwq-release.patch
	workqueue-fix-bugs-in-wq_update_unbound_numa-failure-path.patch
	workqueue-make-rescuer_thread-empty-wq-maydays-list-before-exiting.patch
---
diff --git a/queue-3.14/arm-mvebu-mvebu-soc-id-add-missing-clk_put-call.patch b/queue-3.14/arm-mvebu-mvebu-soc-id-add-missing-clk_put-call.patch
new file mode 100644
index 00000000000..581cba42d99
--- /dev/null
+++ b/queue-3.14/arm-mvebu-mvebu-soc-id-add-missing-clk_put-call.patch
@@ -0,0 +1,39 @@
+From 42a18d1cf484d02e23afadfa5dc09356e6bef9fa Mon Sep 17 00:00:00 2001
+From: Thomas Petazzoni
+Date: Mon, 12 May 2014 16:11:39 +0200
+Subject: ARM: mvebu: mvebu-soc-id: add missing clk_put() call
+
+From: Thomas Petazzoni
+
+commit 42a18d1cf484d02e23afadfa5dc09356e6bef9fa upstream.
+
+The mvebu-soc-id code in mach-mvebu/ needs to enable a clock to read
+the SoC device ID and revision number. To do so, it does a clk_get(),
+then a clk_prepare_enable(), reads the value, and disables the clock
+with clk_disable_unprepare(). However, it forgets to clk_put() the
+clock. This commit fixes this issue.
+
+Signed-off-by: Thomas Petazzoni
+Link: https://lkml.kernel.org/r/1399903900-29977-2-git-send-email-thomas.petazzoni@free-electrons.com
+Fixes: af8d1c63afcb ("ARM: mvebu: Add support to get the ID and the revision of a SoC")
+Acked-by: Gregory CLEMENT
+Tested-by: Gregory CLEMENT
+Tested-by: Andrew Lunn
+Tested-by: Willy Tarreau
+Signed-off-by: Jason Cooper
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ arch/arm/mach-mvebu/mvebu-soc-id.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+--- a/arch/arm/mach-mvebu/mvebu-soc-id.c
++++ b/arch/arm/mach-mvebu/mvebu-soc-id.c
+@@ -108,6 +108,7 @@ static int __init mvebu_soc_id_init(void
+ 
+ res_ioremap:
+ 	clk_disable_unprepare(clk);
++	clk_put(clk);
+ 
+ clk_err:
+ 	of_node_put(child);
diff --git a/queue-3.14/arm-mvebu-mvebu-soc-id-keep-clock-enabled-if-pcie-unit-is-enabled.patch b/queue-3.14/arm-mvebu-mvebu-soc-id-keep-clock-enabled-if-pcie-unit-is-enabled.patch
new file mode 100644
index 00000000000..acf415cf93a
--- /dev/null
+++ b/queue-3.14/arm-mvebu-mvebu-soc-id-keep-clock-enabled-if-pcie-unit-is-enabled.patch
@@ -0,0 +1,79 @@
+From b25bcf1bcaf6687991ae08dd76cd784bf9fe3d05 Mon Sep 17 00:00:00 2001
+From: Thomas Petazzoni
+Date: Mon, 12 May 2014 16:11:40 +0200
+Subject: ARM: mvebu: mvebu-soc-id: keep clock enabled if PCIe unit is enabled
+
+From: Thomas Petazzoni
+
+commit b25bcf1bcaf6687991ae08dd76cd784bf9fe3d05 upstream.
+
+Since the mvebu-soc-id code in mach-mvebu/ was introduced, several
+users have noticed a regression: the PCIe card connected in the first
+PCIe interface is not detected properly.
+
+This is due to the fact that the mvebu-soc-id code enables the PCIe
+clock of the first PCIe interface, reads the SoC device ID and
+revision number (yes, this information is made available as part of
+PCIe registers), and then disables the clock. However, by doing this,
+we gate the clock and therefore lose the complex PCIe configuration
+that was done by the bootloader.
+
+Unfortunately, as of today, the kernel is not capable of doing this
+complex configuration by itself, so we really need to keep the PCIe
+clock enabled. However, we don't want to keep it enabled
+unconditionally: if the PCIe interface is not enabled or PCI support
+is not compiled into the kernel, there is no reason to keep the PCIe
+clock running.
+
+This issue was discussed with Kevin Hilman, and the suggested solution
+was to make the mvebu-soc-id code keep the clock enabled in case it
+will be needed for PCIe. This is therefore the solution implemented in
+this patch.
+
+Long term, we hope to make the kernel more capable in terms of PCIe
+configuration for this platform, which will anyway be needed to
+support the compilation of the PCIe host controller driver as a
+module. In the meantime, however, we don't have much other choice than
+to implement the currently proposed solution.
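The two mvebu-soc-id patches are about keeping clk_get()/clk_put() and clk_prepare_enable()/clk_disable_unprepare() balanced, except where the clock is deliberately left running. The following is a minimal userspace sketch of that bookkeeping, not kernel code: struct toy_clk and the toy_clk_* helpers are hypothetical stand-ins for the clk API, and the two booleans stand in for of_device_is_available(child) and IS_ENABLED(CONFIG_PCI_MVEBU).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a clock handle: it only counts calls. */
struct toy_clk {
	int refs;    /* +1 per toy_clk_get(), -1 per toy_clk_put() */
	int enables; /* +1 per prepare_enable, -1 per disable_unprepare */
};

static struct toy_clk *toy_clk_get(struct toy_clk *c)
{
	c->refs++;
	return c;
}

static void toy_clk_put(struct toy_clk *c)
{
	assert(c->refs > 0);
	c->refs--;
}

static void toy_clk_prepare_enable(struct toy_clk *c)
{
	c->enables++;
}

static void toy_clk_disable_unprepare(struct toy_clk *c)
{
	assert(c->enables > 0);
	c->enables--;
}

/*
 * Models the exit path of mvebu_soc_id_init() with both patches applied:
 * the clock is disabled and released unless the PCIe unit is available
 * and the PCIe driver is built, in which case it is left running.
 */
static void read_soc_id(struct toy_clk *hw, bool pcie_available,
			bool pci_mvebu_enabled)
{
	struct toy_clk *clk = toy_clk_get(hw);

	toy_clk_prepare_enable(clk);
	/* ... the SoC device ID and revision would be read here ... */

	if (!pcie_available || !pci_mvebu_enabled) {
		toy_clk_disable_unprepare(clk);
		toy_clk_put(clk); /* the call added by the first patch */
	}
}
```

With PCIe unavailable, both counters return to zero; with PCIe available and the driver enabled, one reference and one enable stay outstanding, which is the intentional imbalance the second patch introduces.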
+
+Reported-by: Neil Greatorex
+Cc: Neil Greatorex
+Cc: Jason Gunthorpe
+Cc: Kevin Hilman
+Signed-off-by: Thomas Petazzoni
+Link: https://lkml.kernel.org/r/1399903900-29977-3-git-send-email-thomas.petazzoni@free-electrons.com
+Fixes: af8d1c63afcb ("ARM: mvebu: Add support to get the ID and the revision of a SoC")
+Acked-by: Gregory CLEMENT
+Tested-by: Gregory CLEMENT
+Tested-by: Andrew Lunn
+Tested-by: Willy Tarreau
+Signed-off-by: Jason Cooper
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ arch/arm/mach-mvebu/mvebu-soc-id.c | 14 ++++++++++++--
+ 1 file changed, 12 insertions(+), 2 deletions(-)
+
+--- a/arch/arm/mach-mvebu/mvebu-soc-id.c
++++ b/arch/arm/mach-mvebu/mvebu-soc-id.c
+@@ -107,8 +107,18 @@ static int __init mvebu_soc_id_init(void
+ 	iounmap(pci_base);
+ 
+ res_ioremap:
+-	clk_disable_unprepare(clk);
+-	clk_put(clk);
++	/*
++	 * If the PCIe unit is actually enabled and we have PCI
++	 * support in the kernel, we intentionally do not release the
++	 * reference to the clock. We want to keep it running since
++	 * the bootloader does some PCIe link configuration that the
++	 * kernel is for now unable to do, and gating the clock would
++	 * make us loose this precious configuration.
++	 */
++	if (!of_device_is_available(child) || !IS_ENABLED(CONFIG_PCI_MVEBU)) {
++		clk_disable_unprepare(clk);
++		clk_put(clk);
++	}
+ 
+ clk_err:
+ 	of_node_put(child);
diff --git a/queue-3.14/bus-mvebu-mbus-allow-several-windows-with-the-same-target-attribute.patch b/queue-3.14/bus-mvebu-mbus-allow-several-windows-with-the-same-target-attribute.patch
new file mode 100644
index 00000000000..a7c290c179d
--- /dev/null
+++ b/queue-3.14/bus-mvebu-mbus-allow-several-windows-with-the-same-target-attribute.patch
@@ -0,0 +1,40 @@
+From b566e782be32145664d96ada3e389f17d32742e5 Mon Sep 17 00:00:00 2001
+From: Thomas Petazzoni
+Date: Fri, 18 Apr 2014 14:19:52 +0200
+Subject: bus: mvebu-mbus: allow several windows with the same target/attribute
+
+From: Thomas Petazzoni
+
+commit b566e782be32145664d96ada3e389f17d32742e5 upstream.
+
+Having multiple windows with the same target and attribute is actually
+legal, and can be useful for PCIe windows, when PCIe BARs have a size
+that isn't a power of two, and we therefore need to create several
+MBus windows to cover the PCIe BAR for a given PCIe interface.
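The splitting described here can be sketched outside the kernel. The code below is a userspace approximation, under stated assumptions, of the loop used by mvebu_pcie_add_windows() in the split-BARs patch of this series: it covers a region with contiguous power-of-two chunks, largest first. struct win, highest_pow2() and split_windows() are hypothetical names; highest_pow2() plays the role of the kernel expression 1 << (fls(size) - 1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one MBus-style window. */
struct win {
	uint64_t base;
	uint64_t size; /* always a power of two */
};

/* Largest power of two that is <= x; stands in for 1 << (fls(x) - 1). */
static uint64_t highest_pow2(uint64_t x)
{
	uint64_t p = 1;

	assert(x != 0);
	while (p <= x / 2)
		p <<= 1;
	return p;
}

/*
 * Cover [base, base + size) with contiguous power-of-two windows,
 * largest first. Returns how many entries of out[] were filled.
 */
static size_t split_windows(uint64_t base, uint64_t size,
			    struct win *out, size_t max)
{
	size_t n = 0;

	while (size != 0 && n < max) {
		uint64_t sz = highest_pow2(size);

		out[n].base = base;
		out[n].size = sz;
		n++;
		base += sz;
		size -= sz;
	}
	return n;
}
```

A 3 MB region becomes a 2 MB window followed by a 1 MB window, both for the same target and attribute, which is why the conflict check on identical target/attribute pairs has to go. Because the first (largest) window must start at a multiple of its own size, the same series also rounds the resource start up to rounddown_pow_of_two(size) rather than to the full size.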
+
+Fixes: fddddb52a6c4 ('bus: introduce an Marvell EBU MBus driver')
+Signed-off-by: Thomas Petazzoni
+Link: https://lkml.kernel.org/r/1397823593-1932-7-git-send-email-thomas.petazzoni@free-electrons.com
+Tested-by: Neil Greatorex
+Signed-off-by: Jason Cooper
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ drivers/bus/mvebu-mbus.c | 6 ------
+ 1 file changed, 6 deletions(-)
+
+--- a/drivers/bus/mvebu-mbus.c
++++ b/drivers/bus/mvebu-mbus.c
+@@ -222,12 +222,6 @@ static int mvebu_mbus_window_conflicts(s
+ 		 */
+ 		if ((u64)base < wend && end > wbase)
+ 			return 0;
+-
+-		/*
+-		 * Check if target/attribute conflicts
+-		 */
+-		if (target == wtarget && attr == wattr)
+-			return 0;
+ 	}
+ 
+ 	return 1;
diff --git a/queue-3.14/memory-mvebu-devbus-fix-the-conversion-of-the-bus-width.patch b/queue-3.14/memory-mvebu-devbus-fix-the-conversion-of-the-bus-width.patch
new file mode 100644
index 00000000000..41fca4e9325
--- /dev/null
+++ b/queue-3.14/memory-mvebu-devbus-fix-the-conversion-of-the-bus-width.patch
@@ -0,0 +1,64 @@
+From ce965c3d2e68c5325dd5624eb101d70423022fef Mon Sep 17 00:00:00 2001
+From: Thomas Petazzoni
+Date: Mon, 14 Apr 2014 17:29:18 +0200
+Subject: memory: mvebu-devbus: fix the conversion of the bus width
+
+From: Thomas Petazzoni
+
+commit ce965c3d2e68c5325dd5624eb101d70423022fef upstream.
+
+According to the Armada 370 and Armada XP datasheets, the part of the
+Device Bus register that configures the bus width should contain 0 for
+an 8-bit bus width, and 1 for a 16-bit bus width (other values are
+unsupported/reserved).
+
+However, the current conversion done in the driver to convert from a
+bus width in bits to the value expected by the register leads to
+setting the register to 1 for an 8-bit bus, and 2 for a 16-bit bus.
+
+This mistake was compensated by a mistake in the existing Device Tree
+files for Armada 370/XP platforms: they were declaring an 8-bit bus
+width, while the hardware in fact uses a 16-bit bus width.
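The two compensating mistakes are easiest to see side by side. This is a standalone sketch with hypothetical function names: old_bus_width_field() mirrors the removed r.bus_width /= 8, new_bus_width_field() mirrors the new mapping (returning -1 where the driver logs an error and returns -EINVAL), and decode_bus_width_field() models how the register field is interpreted, using the datasheet encoding quoted in the commit message.

```c
#include <assert.h>

/* Old driver conversion: bits -> bytes (8 -> 1, 16 -> 2). */
static int old_bus_width_field(int bits)
{
	return bits / 8;
}

/* Fixed conversion: 8 -> 0, 16 -> 1, anything else rejected (-1). */
static int new_bus_width_field(int bits)
{
	switch (bits) {
	case 8:
		return 0;
	case 16:
		return 1;
	default:
		return -1; /* the driver logs an error and returns -EINVAL */
	}
}

/* How the controller interprets the field: 0 -> 8 bits, 1 -> 16 bits. */
static int decode_bus_width_field(int field)
{
	assert(field == 0 || field == 1); /* other values are reserved */
	return field == 0 ? 8 : 16;
}
```

An 8-bit value from the Device Tree went through the old conversion as field 1, which the controller treats as a 16-bit bus; that matched the real hardware only because the Device Tree value was itself wrong.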
+
+This commit fixes that by adjusting the conversion logic.
+
+This patch fixes a bug that was introduced in
+3edad321b1bd2e6c8b5f38146c115c8982438f06 ('drivers: memory: Introduce
+Marvell EBU Device Bus driver'), which was merged in v3.11.
+
+Signed-off-by: Thomas Petazzoni
+Link: https://lkml.kernel.org/r/1397489361-5833-2-git-send-email-thomas.petazzoni@free-electrons.com
+Fixes: 3edad321b1bd ('drivers: memory: Introduce Marvell EBU Device Bus driver')
+Acked-by: Ezequiel Garcia
+Acked-by: Gregory CLEMENT
+Signed-off-by: Jason Cooper
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ drivers/memory/mvebu-devbus.c | 15 +++++++++++++--
+ 1 file changed, 13 insertions(+), 2 deletions(-)
+
+--- a/drivers/memory/mvebu-devbus.c
++++ b/drivers/memory/mvebu-devbus.c
+@@ -108,8 +108,19 @@ static int devbus_set_timing_params(stru
+ 			node->full_name);
+ 		return err;
+ 	}
+-	/* Convert bit width to byte width */
+-	r.bus_width /= 8;
++
++	/*
++	 * The bus width is encoded into the register as 0 for 8 bits,
++	 * and 1 for 16 bits, so we do the necessary conversion here.
++	 */
++	if (r.bus_width == 8)
++		r.bus_width = 0;
++	else if (r.bus_width == 16)
++		r.bus_width = 1;
++	else {
++		dev_err(devbus->dev, "invalid bus width %d\n", r.bus_width);
++		return -EINVAL;
++	}
+ 
+ 	err = get_timing_param_ps(devbus, node, "devbus,badr-skew-ps",
+ 				  &r.badr_skew);
diff --git a/queue-3.14/nfsd-call-rpc_destroy_wait_queue-from-free_client.patch b/queue-3.14/nfsd-call-rpc_destroy_wait_queue-from-free_client.patch
new file mode 100644
index 00000000000..f3f5ed7b1a5
--- /dev/null
+++ b/queue-3.14/nfsd-call-rpc_destroy_wait_queue-from-free_client.patch
@@ -0,0 +1,29 @@
+From 4cb57e3032d4e4bf5e97780e9907da7282b02b0c Mon Sep 17 00:00:00 2001
+From: Trond Myklebust
+Date: Fri, 18 Apr 2014 14:43:57 -0400
+Subject: NFSd: call rpc_destroy_wait_queue() from free_client()
+
+From: Trond Myklebust
+
+commit 4cb57e3032d4e4bf5e97780e9907da7282b02b0c upstream.
+
+Mainly to ensure that we don't leave any hanging timers.
+
+Signed-off-by: Trond Myklebust
+Signed-off-by: J. Bruce Fields
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ fs/nfsd/nfs4state.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+--- a/fs/nfsd/nfs4state.c
++++ b/fs/nfsd/nfs4state.c
+@@ -1107,6 +1107,7 @@ free_client(struct nfs4_client *clp)
+ 		WARN_ON_ONCE(atomic_read(&ses->se_ref));
+ 		free_session(ses);
+ 	}
++	rpc_destroy_wait_queue(&clp->cl_cb_waitq);
+ 	free_svc_cred(&clp->cl_cred);
+ 	kfree(clp->cl_name.data);
+ 	idr_destroy(&clp->cl_stateids);
diff --git a/queue-3.14/nfsd-call-set_acl-with-a-null-acl-structure-if-no-entries.patch b/queue-3.14/nfsd-call-set_acl-with-a-null-acl-structure-if-no-entries.patch
new file mode 100644
index 00000000000..f9c96e4866e
--- /dev/null
+++ b/queue-3.14/nfsd-call-set_acl-with-a-null-acl-structure-if-no-entries.patch
@@ -0,0 +1,140 @@
+From aa07c713ecfc0522916f3cd57ac628ea6127c0ec Mon Sep 17 00:00:00 2001
+From: Kinglong Mee
+Date: Fri, 18 Apr 2014 20:49:04 +0800
+Subject: NFSD: Call ->set_acl with a NULL ACL structure if no entries
+
+From: Kinglong Mee
+
+commit aa07c713ecfc0522916f3cd57ac628ea6127c0ec upstream.
+
+After setting an ACL on a directory, I got two problems that were caused
+by the cached zero-length default posix acl.
+
+This patch makes sure nfsd4_set_nfs4_acl calls ->set_acl
+with a NULL ACL structure if there are no entries.
+
+Thanks to Christoph Hellwig for his advice.
+
+First problem:
+............ hang ...........
+
+Second problem:
+[ 1610.167668] ------------[ cut here ]------------
+[ 1610.168320] kernel BUG at /root/nfs/linux/fs/nfsd/nfs4acl.c:239!
+[ 1610.168320] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
+[ 1610.168320] Modules linked in: nfsv4(OE) nfs(OE) nfsd(OE)
+rpcsec_gss_krb5 fscache ip6t_rpfilter ip6t_REJECT cfg80211 xt_conntrack
+rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
+ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
+ip6table_mangle ip6table_security ip6table_raw ip6table_filter
+ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
+nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw
+auth_rpcgss nfs_acl snd_intel8x0 ppdev lockd snd_ac97_codec ac97_bus
+snd_pcm snd_timer e1000 pcspkr parport_pc snd parport serio_raw joydev
+i2c_piix4 sunrpc(OE) microcode soundcore i2c_core ata_generic pata_acpi
+[last unloaded: nfsd]
+[ 1610.168320] CPU: 0 PID: 27397 Comm: nfsd Tainted: G OE
+3.15.0-rc1+ #15
+[ 1610.168320] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
+VirtualBox 12/01/2006
+[ 1610.168320] task: ffff88005ab653d0 ti: ffff88005a944000 task.ti:
+ffff88005a944000
+[ 1610.168320] RIP: 0010:[] []
+_posix_to_nfsv4_one+0x3cd/0x3d0 [nfsd]
+[ 1610.168320] RSP: 0018:ffff88005a945b00 EFLAGS: 00010293
+[ 1610.168320] RAX: 0000000000000001 RBX: ffff88006700bac0 RCX:
+0000000000000000
+[ 1610.168320] RDX: 0000000000000000 RSI: ffff880067c83f00 RDI:
+ffff880068233300
+[ 1610.168320] RBP: ffff88005a945b48 R08: ffffffff81c64830 R09:
+0000000000000000
+[ 1610.168320] R10: ffff88004ea85be0 R11: 000000000000f475 R12:
+ffff880068233300
+[ 1610.168320] R13: 0000000000000003 R14: 0000000000000002 R15:
+ffff880068233300
+[ 1610.168320] FS: 0000000000000000(0000) GS:ffff880077800000(0000)
+knlGS:0000000000000000
+[ 1610.168320] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
+[ 1610.168320] CR2: 00007f5bcbd3b0b9 CR3: 0000000001c0f000 CR4:
+00000000000006f0
+[ 1610.168320] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
+0000000000000000
+[ 1610.168320] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
+0000000000000400
+[ 1610.168320] Stack:
+[ 1610.168320] ffffffff00000000 0000000b67c83500 000000076700bac0
+0000000000000000
+[ 1610.168320] ffff88006700bac0 ffff880068233300 ffff88005a945c08
+0000000000000002
+[ 1610.168320] 0000000000000000 ffff88005a945b88 ffffffffa034e2d5
+000000065a945b68
+[ 1610.168320] Call Trace:
+[ 1610.168320] [] nfsd4_get_nfs4_acl+0x95/0x150 [nfsd]
+[ 1610.168320] [] nfsd4_encode_fattr+0x646/0x1e70 [nfsd]
+[ 1610.168320] [] ? kmemleak_alloc+0x4e/0xb0
+[ 1610.168320] [] ?
+nfsd_setuser_and_check_port+0x52/0x80 [nfsd]
+[ 1610.168320] [] ? selinux_cred_prepare+0x1b/0x30
+[ 1610.168320] [] nfsd4_encode_getattr+0x5a/0x60 [nfsd]
+[ 1610.168320] [] nfsd4_encode_operation+0x67/0x110
+[nfsd]
+[ 1610.168320] [] nfsd4_proc_compound+0x21d/0x810 [nfsd]
+[ 1610.168320] [] nfsd_dispatch+0xbb/0x200 [nfsd]
+[ 1610.168320] [] svc_process_common+0x46d/0x6d0 [sunrpc]
+[ 1610.168320] [] svc_process+0x103/0x170 [sunrpc]
+[ 1610.168320] [] nfsd+0xbf/0x130 [nfsd]
+[ 1610.168320] [] ? nfsd_destroy+0x80/0x80 [nfsd]
+[ 1610.168320] [] kthread+0xd2/0xf0
+[ 1610.168320] [] ? insert_kthread_work+0x40/0x40
+[ 1610.168320] [] ret_from_fork+0x7c/0xb0
+[ 1610.168320] [] ? insert_kthread_work+0x40/0x40
+[ 1610.168320] Code: 78 02 e9 e7 fc ff ff 31 c0 31 d2 31 c9 66 89 45 ce
+41 8b 04 24 66 89 55 d0 66 89 4d d2 48 8d 04 80 49 8d 5c 84 04 e9 37 fd
+ff ff <0f> 0b 90 0f 1f 44 00 00 55 8b 56 08 c7 07 00 00 00 00 8b 46 0c
+[ 1610.168320] RIP [] _posix_to_nfsv4_one+0x3cd/0x3d0
+[nfsd]
+[ 1610.168320] RSP
+[ 1610.257313] ---[ end trace 838254e3e352285b ]---
+
+Signed-off-by: Kinglong Mee
+Signed-off-by: J. Bruce Fields
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ fs/nfsd/nfs4acl.c | 17 +++++++++--------
+ 1 file changed, 9 insertions(+), 8 deletions(-)
+
+--- a/fs/nfsd/nfs4acl.c
++++ b/fs/nfsd/nfs4acl.c
+@@ -402,8 +402,10 @@ sort_pacl(struct posix_acl *pacl)
+ 	 * by uid/gid. */
+ 	int i, j;
+ 
+-	if (pacl->a_count <= 4)
+-		return; /* no users or groups */
++	/* no users or groups */
++	if (!pacl || pacl->a_count <= 4)
++		return;
++
+ 	i = 1;
+ 	while (pacl->a_entries[i].e_tag == ACL_USER)
+ 		i++;
+@@ -530,13 +532,12 @@ posix_state_to_acl(struct posix_acl_stat
+ 
+ 	/*
+ 	 * ACLs with no ACEs are treated differently in the inheritable
+-	 * and effective cases: when there are no inheritable ACEs, we
+-	 * set a zero-length default posix acl:
++	 * and effective cases: when there are no inheritable ACEs,
++	 * calls ->set_acl with a NULL ACL structure.
+ 	 */
+-	if (state->empty && (flags & NFS4_ACL_TYPE_DEFAULT)) {
+-		pacl = posix_acl_alloc(0, GFP_KERNEL);
+-		return pacl ? pacl : ERR_PTR(-ENOMEM);
+-	}
++	if (state->empty && (flags & NFS4_ACL_TYPE_DEFAULT))
++		return NULL;
++
+ 	/*
+ 	 * When there are no effective ACEs, the following will end
+ 	 * up setting a 3-element effective posix ACL with all
diff --git a/queue-3.14/nfsd-move-default-initialisers-from-create_client-to.patch b/queue-3.14/nfsd-move-default-initialisers-from-create_client-to.patch
new file mode 100644
index 00000000000..0d8219b9a93
--- /dev/null
+++ b/queue-3.14/nfsd-move-default-initialisers-from-create_client-to.patch
@@ -0,0 +1,72 @@
+From 5694c93e6c4954fa9424c215f75eeb919bddad64 Mon Sep 17 00:00:00 2001
+From: Trond Myklebust
+Date: Fri, 18 Apr 2014 14:43:56 -0400
+Subject: NFSd: Move default initialisers from create_client() to
+ alloc_client()
+
+From: Trond Myklebust
+
+commit 5694c93e6c4954fa9424c215f75eeb919bddad64 upstream.
+
+Aside from making it clearer what is non-trivial in create_client(), it
+also fixes a bug whereby we can call free_client() before idr_init()
+has been called.
+
+Signed-off-by: Trond Myklebust
+Signed-off-by: J. Bruce Fields
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ fs/nfsd/nfs4state.c | 24 ++++++++++++------------
+ 1 file changed, 12 insertions(+), 12 deletions(-)
+
+--- a/fs/nfsd/nfs4state.c
++++ b/fs/nfsd/nfs4state.c
+@@ -1078,6 +1078,18 @@ static struct nfs4_client *alloc_client(
+ 		return NULL;
+ 	}
+ 	clp->cl_name.len = name.len;
++	INIT_LIST_HEAD(&clp->cl_sessions);
++	idr_init(&clp->cl_stateids);
++	atomic_set(&clp->cl_refcount, 0);
++	clp->cl_cb_state = NFSD4_CB_UNKNOWN;
++	INIT_LIST_HEAD(&clp->cl_idhash);
++	INIT_LIST_HEAD(&clp->cl_openowners);
++	INIT_LIST_HEAD(&clp->cl_delegations);
++	INIT_LIST_HEAD(&clp->cl_lru);
++	INIT_LIST_HEAD(&clp->cl_callbacks);
++	INIT_LIST_HEAD(&clp->cl_revoked);
++	spin_lock_init(&clp->cl_lock);
++	rpc_init_wait_queue(&clp->cl_cb_waitq, "Backchannel slot table");
+ 	return clp;
+ }
+ 
+@@ -1347,7 +1359,6 @@ static struct nfs4_client *create_client
+ 	if (clp == NULL)
+ 		return NULL;
+ 
+-	INIT_LIST_HEAD(&clp->cl_sessions);
+ 	ret = copy_cred(&clp->cl_cred, &rqstp->rq_cred);
+ 	if (ret) {
+ 		spin_lock(&nn->client_lock);
+@@ -1355,20 +1366,9 @@ static struct nfs4_client *create_client
+ 		spin_unlock(&nn->client_lock);
+ 		return NULL;
+ 	}
+-	idr_init(&clp->cl_stateids);
+-	atomic_set(&clp->cl_refcount, 0);
+-	clp->cl_cb_state = NFSD4_CB_UNKNOWN;
+-	INIT_LIST_HEAD(&clp->cl_idhash);
+-	INIT_LIST_HEAD(&clp->cl_openowners);
+-	INIT_LIST_HEAD(&clp->cl_delegations);
+-	INIT_LIST_HEAD(&clp->cl_lru);
+-	INIT_LIST_HEAD(&clp->cl_callbacks);
+-	INIT_LIST_HEAD(&clp->cl_revoked);
+-	spin_lock_init(&clp->cl_lock);
+ 	nfsd4_init_callback(&clp->cl_cb_null);
+ 	clp->cl_time = get_seconds();
+ 	clear_bit(0, &clp->cl_cb_slot_busy);
+-	rpc_init_wait_queue(&clp->cl_cb_waitq, "Backchannel slot table");
+ 	copy_verf(clp, verf);
+ 	rpc_copy_addr((struct sockaddr *) &clp->cl_addr, sa);
+ 	gen_confirm(clp);
diff --git a/queue-3.14/nfsd4-remove-lockowner-when-removing-lock-stateid.patch b/queue-3.14/nfsd4-remove-lockowner-when-removing-lock-stateid.patch
new file mode 100644
index 00000000000..bf04782eeb5
--- /dev/null
+++ b/queue-3.14/nfsd4-remove-lockowner-when-removing-lock-stateid.patch
@@ -0,0 +1,48 @@
+From a1b8ff4c97b4375d21b6d6c45d75877303f61b3b Mon Sep 17 00:00:00 2001
+From: "J. Bruce Fields"
+Date: Tue, 20 May 2014 15:55:21 -0400
+Subject: nfsd4: remove lockowner when removing lock stateid
+
+From: "J. Bruce Fields"
+
+commit a1b8ff4c97b4375d21b6d6c45d75877303f61b3b upstream.
+
+The nfsv4 state code has always assumed a one-to-one correspondence
+between lock stateid's and lockowners even if it appears not to in some
+places.
+
+We may actually change that, but for now when FREE_STATEID releases a
+lock stateid it also needs to release the parent lockowner.
+
+Symptoms were a subsequent LOCK crashing in find_lockowner_str when it
+calls same_lockowner_ino on a lockowner that unexpectedly has an empty
+so_stateids list.
+
+Signed-off-by: J. Bruce Fields
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ fs/nfsd/nfs4state.c | 11 +++++++++--
+ 1 file changed, 9 insertions(+), 2 deletions(-)
+
+--- a/fs/nfsd/nfs4state.c
++++ b/fs/nfsd/nfs4state.c
+@@ -3714,9 +3714,16 @@ out:
+ static __be32
+ nfsd4_free_lock_stateid(struct nfs4_ol_stateid *stp)
+ {
+-	if (check_for_locks(stp->st_file, lockowner(stp->st_stateowner)))
++	struct nfs4_lockowner *lo = lockowner(stp->st_stateowner);
++
++	if (check_for_locks(stp->st_file, lo))
+ 		return nfserr_locks_held;
+-	release_lock_stateid(stp);
++	/*
++	 * Currently there's a 1-1 lock stateid<->lockowner
++	 * correspondance, and we have to delete the lockowner when we
++	 * delete the lock stateid:
++	 */
++	unhash_lockowner(lo);
+ 	return nfs_ok;
+ }
+ 
diff --git a/queue-3.14/nfsd4-warn-on-finding-lockowner-without-stateid-s.patch b/queue-3.14/nfsd4-warn-on-finding-lockowner-without-stateid-s.patch
new file mode 100644
index 00000000000..23d9e9ff647
--- /dev/null
+++ b/queue-3.14/nfsd4-warn-on-finding-lockowner-without-stateid-s.patch
@@ -0,0 +1,32 @@
+From 27b11428b7de097c42f205beabb1764f4365443b Mon Sep 17 00:00:00 2001
+From: "J. Bruce Fields"
+Date: Thu, 8 May 2014 11:19:41 -0400
+Subject: nfsd4: warn on finding lockowner without stateid's
+
+From: "J. Bruce Fields"
+
+commit 27b11428b7de097c42f205beabb1764f4365443b upstream.
+
+The current code assumes a one-to-one lockowner<->lock stateid
+correspondence.
+
+Signed-off-by: J. Bruce Fields
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ fs/nfsd/nfs4state.c | 4 ++++
+ 1 file changed, 4 insertions(+)
+
+--- a/fs/nfsd/nfs4state.c
++++ b/fs/nfsd/nfs4state.c
+@@ -4156,6 +4156,10 @@ static bool same_lockowner_ino(struct nf
+ 
+ 	if (!same_owner_str(&lo->lo_owner, owner, clid))
+ 		return false;
++	if (list_empty(&lo->lo_owner.so_stateids)) {
++		WARN_ON_ONCE(1);
++		return false;
++	}
+ 	lst = list_first_entry(&lo->lo_owner.so_stateids,
+ 				struct nfs4_ol_stateid, st_perstateowner);
+ 	return lst->st_file->fi_inode == inode;
diff --git a/queue-3.14/pci-mvebu-fix-off-by-one-in-the-computed-size-of-the-mbus-windows.patch b/queue-3.14/pci-mvebu-fix-off-by-one-in-the-computed-size-of-the-mbus-windows.patch
new file mode 100644
index 00000000000..b5ae17d88ee
--- /dev/null
+++ b/queue-3.14/pci-mvebu-fix-off-by-one-in-the-computed-size-of-the-mbus-windows.patch
@@ -0,0 +1,53 @@
+From b6d07e0273d3296cfbdc88145b8a00ddbefb310a Mon Sep 17 00:00:00 2001
+From: Willy Tarreau
+Date: Fri, 18 Apr 2014 14:19:50 +0200
+Subject: PCI: mvebu: fix off-by-one in the computed size of the mbus windows
+
+From: Willy Tarreau
+
+commit b6d07e0273d3296cfbdc88145b8a00ddbefb310a upstream.
+
+mvebu_pcie_handle_membase_change() and
+mvebu_pcie_handle_iobase_change() do not correctly compute the window
+size. PCI uses an inclusive start/end address pair, which requires a
++1 when converting to size.
+
+This only worked because a bug in the mbus driver allowed it to
+silently accept and round up bogus sizes.
+
+Fix this by adding one to the computed size.
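The off-by-one follows from the bridge window being an inclusive [base, limit] pair, so the size is limit - base + 1. The sketch below reproduces the memory-window arithmetic of mvebu_pcie_handle_membase_change() as a standalone function; taking the two 16-bit register values as plain parameters is an assumption made for illustration, and mem_window_size() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Memory window size from 16-bit bridge membase/memlimit values.
 * Bits 15:4 of each register hold address bits 31:20; the limit names
 * the last 1 MB block of the window, so its low 20 bits are all ones.
 */
static uint32_t mem_window_size(uint16_t membase, uint16_t memlimit)
{
	uint32_t base = (uint32_t)(membase & 0xFFF0) << 16;
	uint32_t last = ((uint32_t)(memlimit & 0xFFF0) << 16) | 0xFFFFF;

	assert(last >= base);
	return last - base + 1; /* inclusive end -> size */
}
```

For membase == memlimit == 0xE000 the window is exactly 1 MB. Without the + 1 it would come out as 0xFFFFF bytes, which is not a power of two; per the commit message, the MBus driver had been silently rounding such sizes up.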
+
+Fixes: 45361a4fe446 ('PCIe driver for Marvell Armada 370/XP systems')
+Signed-off-by: Willy Tarreau
+Reviewed-By: Jason Gunthorpe
+Signed-off-by: Thomas Petazzoni
+Link: https://lkml.kernel.org/r/1397823593-1932-5-git-send-email-thomas.petazzoni@free-electrons.com
+Tested-by: Neil Greatorex
+Acked-by: Bjorn Helgaas
+Signed-off-by: Jason Cooper
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ drivers/pci/host/pci-mvebu.c | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+--- a/drivers/pci/host/pci-mvebu.c
++++ b/drivers/pci/host/pci-mvebu.c
+@@ -329,7 +329,7 @@ static void mvebu_pcie_handle_iobase_cha
+ 	port->iowin_base = port->pcie->io.start + iobase;
+ 	port->iowin_size = ((0xFFF | ((port->bridge.iolimit & 0xF0) << 8) |
+ 			    (port->bridge.iolimitupper << 16)) -
+-			    iobase);
++			    iobase) + 1;
+ 
+ 	mvebu_mbus_add_window_remap_by_id(port->io_target, port->io_attr,
+ 					  port->iowin_base, port->iowin_size,
+@@ -362,7 +362,7 @@ static void mvebu_pcie_handle_membase_ch
+ 	port->memwin_base = ((port->bridge.membase & 0xFFF0) << 16);
+ 	port->memwin_size =
+ 		(((port->bridge.memlimit & 0xFFF0) << 16) | 0xFFFFF) -
+-		port->memwin_base;
++		port->memwin_base + 1;
+ 
+ 	mvebu_mbus_add_window_by_id(port->mem_target, port->mem_attr,
+ 				    port->memwin_base, port->memwin_size);
diff --git a/queue-3.14/pci-mvebu-split-pcie-bars-into-multiple-mbus-windows-when-needed.patch b/queue-3.14/pci-mvebu-split-pcie-bars-into-multiple-mbus-windows-when-needed.patch
new file mode 100644
index 00000000000..bf38f3204de
--- /dev/null
+++ b/queue-3.14/pci-mvebu-split-pcie-bars-into-multiple-mbus-windows-when-needed.patch
@@ -0,0 +1,175 @@
+From 398f5d5e10b6b917cd9d35ef21d545b0afbada22 Mon Sep 17 00:00:00 2001
+From: Thomas Petazzoni
+Date: Fri, 18 Apr 2014 14:19:53 +0200
+Subject: PCI: mvebu: split PCIe BARs into multiple MBus windows when needed
+
+From: Thomas Petazzoni
+
+commit 398f5d5e10b6b917cd9d35ef21d545b0afbada22 upstream.
+
+MBus windows are used on Marvell platforms to map certain peripherals
+in the physical address space. In the PCIe context, MBus windows are
+needed to map PCIe I/O and memory regions in the physical address space.
+
+However, those MBus windows can only have power of two sizes, while
+PCIe BARs do not necessarily guarantee this. For this reason, the
+current pci-mvebu breaks on platforms where PCIe devices have BARs
+that don't sum up to a power of two size at the emulated bridge level.
+
+This commit fixes this by allowing the pci-mvebu driver to create
+multiple contiguous MBus windows (each having a power of two size) to
+cover a given PCIe BAR.
+
+To achieve this, two functions are added: mvebu_pcie_add_windows() and
+mvebu_pcie_del_windows() to respectively add and remove all the MBus
+windows that are needed to map the provided PCIe region base and
+size. The emulated PCI bridge code now calls those functions, instead
+of directly calling the mvebu-mbus driver functions.
+
+Fixes: 45361a4fe446 ('pci: PCIe driver for Marvell Armada 370/XP systems')
+Signed-off-by: Thomas Petazzoni
+Link: https://lkml.kernel.org/r/1397823593-1932-8-git-send-email-thomas.petazzoni@free-electrons.com
+Tested-by: Neil Greatorex
+Acked-by: Bjorn Helgaas
+Signed-off-by: Jason Cooper
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ drivers/pci/host/pci-mvebu.c | 88 ++++++++++++++++++++++++++++++++++++-------
+ 1 file changed, 74 insertions(+), 14 deletions(-)
+
+--- a/drivers/pci/host/pci-mvebu.c
++++ b/drivers/pci/host/pci-mvebu.c
+@@ -291,6 +291,58 @@ static int mvebu_pcie_hw_wr_conf(struct
+ 	return PCIBIOS_SUCCESSFUL;
+ }
+ 
++/*
++ * Remove windows, starting from the largest ones to the smallest
++ * ones.
++ */
++static void mvebu_pcie_del_windows(struct mvebu_pcie_port *port,
++				   phys_addr_t base, size_t size)
++{
++	while (size) {
++		size_t sz = 1 << (fls(size) - 1);
++
++		mvebu_mbus_del_window(base, sz);
++		base += sz;
++		size -= sz;
++	}
++}
++
++/*
++ * MBus windows can only have a power of two size, but PCI BARs do not
++ * have this constraint. Therefore, we have to split the PCI BAR into
++ * areas each having a power of two size. We start from the largest
++ * one (i.e highest order bit set in the size).
++ */
++static void mvebu_pcie_add_windows(struct mvebu_pcie_port *port,
++				   unsigned int target, unsigned int attribute,
++				   phys_addr_t base, size_t size,
++				   phys_addr_t remap)
++{
++	size_t size_mapped = 0;
++
++	while (size) {
++		size_t sz = 1 << (fls(size) - 1);
++		int ret;
++
++		ret = mvebu_mbus_add_window_remap_by_id(target, attribute, base,
++							sz, remap);
++		if (ret) {
++			dev_err(&port->pcie->pdev->dev,
++				"Could not create MBus window at 0x%x, size 0x%x: %d\n",
++				base, sz, ret);
++			mvebu_pcie_del_windows(port, base - size_mapped,
++					       size_mapped);
++			return;
++		}
++
++		size -= sz;
++		size_mapped += sz;
++		base += sz;
++		if (remap != MVEBU_MBUS_NO_REMAP)
++			remap += sz;
++	}
++}
++
+ static void mvebu_pcie_handle_iobase_change(struct mvebu_pcie_port *port)
+ {
+ 	phys_addr_t iobase;
+@@ -302,8 +354,8 @@ static void mvebu_pcie_handle_iobase_cha
+ 
+ 	/* If a window was configured, remove it */
+ 	if (port->iowin_base) {
+-		mvebu_mbus_del_window(port->iowin_base,
+-				      port->iowin_size);
++		mvebu_pcie_del_windows(port, port->iowin_base,
++				       port->iowin_size);
+ 		port->iowin_base = 0;
+ 		port->iowin_size = 0;
+ 	}
+@@ -331,9 +383,9 @@ static void mvebu_pcie_handle_iobase_cha
+ 			    (port->bridge.iolimitupper << 16)) -
+ 			    iobase) + 1;
+ 
+-	mvebu_mbus_add_window_remap_by_id(port->io_target, port->io_attr,
+-					  port->iowin_base, port->iowin_size,
+-					  iobase);
++	mvebu_pcie_add_windows(port, port->io_target, port->io_attr,
++			       port->iowin_base, port->iowin_size,
++			       iobase);
+ }
+ 
+ static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
+@@ -344,8 +396,8 @@ static void mvebu_pcie_handle_membase_ch
+ 
+ 	/* If a window was configured, remove it */
+ 	if (port->memwin_base) {
+-		mvebu_mbus_del_window(port->memwin_base,
+-				      port->memwin_size);
++		mvebu_pcie_del_windows(port, port->memwin_base,
++				       port->memwin_size);
+ 		port->memwin_base = 0;
+ 		port->memwin_size = 0;
+ 	}
+@@ -364,8 +416,9 @@ static void mvebu_pcie_handle_membase_ch
+ 		(((port->bridge.memlimit & 0xFFF0) << 16) | 0xFFFFF) -
+ 		port->memwin_base + 1;
+ 
+-	mvebu_mbus_add_window_by_id(port->mem_target, port->mem_attr,
+-				    port->memwin_base, port->memwin_size);
++	mvebu_pcie_add_windows(port, port->mem_target, port->mem_attr,
++			       port->memwin_base, port->memwin_size,
++			       MVEBU_MBUS_NO_REMAP);
+ }
+ 
+ /*
+@@ -721,14 +774,21 @@ static resource_size_t mvebu_pcie_align_
+ 
+ 	/*
+ 	 * On the PCI-to-PCI bridge side, the I/O windows must have at
+-	 * least a 64 KB size and be aligned on their size, and the
+-	 * memory windows must have at least a 1 MB size and be
+-	 * aligned on their size
++	 * least a 64 KB size and the memory windows must have at
++	 * least a 1 MB size. Moreover, MBus windows need to have a
++	 * base address aligned on their size, and their size must be
++	 * a power of two. This means that if the BAR doesn't have a
++	 * power of two size, several MBus windows will actually be
++	 * created. We need to ensure that the biggest MBus window
++	 * (which will be the first one) is aligned on its size, which
++	 * explains the rounddown_pow_of_two() being done here.
++	 */
+ 	if (res->flags & IORESOURCE_IO)
+-		return round_up(start, max_t(resource_size_t, SZ_64K, size));
++		return round_up(start, max_t(resource_size_t, SZ_64K,
++					     rounddown_pow_of_two(size)));
+ 	else if (res->flags & IORESOURCE_MEM)
+-		return round_up(start, max_t(resource_size_t, SZ_1M, size));
++		return round_up(start, max_t(resource_size_t, SZ_1M,
++					     rounddown_pow_of_two(size)));
+ 	else
+ 		return start;
+ }
diff --git a/queue-3.14/percpu-make-pcpu_alloc_chunk-use-pcpu_mem_free-instead-of-kfree.patch b/queue-3.14/percpu-make-pcpu_alloc_chunk-use-pcpu_mem_free-instead-of-kfree.patch
new file mode 100644
index 00000000000..1a37854fe4c
--- /dev/null
+++ b/queue-3.14/percpu-make-pcpu_alloc_chunk-use-pcpu_mem_free-instead-of-kfree.patch
@@ -0,0 +1,42 @@
+From 5a838c3b60e3a36ade764cf7751b8f17d7c9c2da Mon Sep 17 00:00:00 2001
+From: Jianyu Zhan
+Date: Mon, 14 Apr 2014 13:47:40 +0800
+Subject: percpu: make pcpu_alloc_chunk() use pcpu_mem_free() instead of kfree()
+
+From: Jianyu Zhan
+
+commit 5a838c3b60e3a36ade764cf7751b8f17d7c9c2da upstream.
+
+pcpu_chunk_struct_size = sizeof(struct pcpu_chunk) +
+	BITS_TO_LONGS(pcpu_unit_pages) * sizeof(unsigned long)
+
+It could hardly ever be bigger than PAGE_SIZE, even on large-scale machines,
+but for consistency with its counterpart pcpu_mem_zalloc(),
+use pcpu_mem_free() instead.
+
+Commit b4916cb17c26 ("percpu: make pcpu_free_chunk() use
+pcpu_mem_free() instead of kfree()") addressed this problem, but
+missed this one.
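The underlying rule is that memory must be released by the counterpart of the routine that allocated it. Assuming, as in kernels of this era, that pcpu_mem_zalloc() uses kzalloc() up to PAGE_SIZE and vzalloc() above it, with pcpu_mem_free() making the matching kfree()/vfree() choice, a userspace analogue could look like this. mem_zalloc() and mem_free() are hypothetical names, and calloc()/mmap() merely stand in for the two kernel allocators.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict C modes */
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical analogue of pcpu_mem_zalloc(): requests up to one page
 * come from calloc(), larger ones from an anonymous mmap().
 */
static void *mem_zalloc(size_t size)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	void *p;

	assert(size != 0);
	if (size <= page)
		return calloc(1, size);

	/* Anonymous mappings are zero-filled by the kernel. */
	p = mmap(NULL, size, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical analogue of pcpu_mem_free(): same size-based decision. */
static void mem_free(void *ptr, size_t size)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);

	if (!ptr)
		return;
	if (size <= page)
		free(ptr);         /* counterpart of calloc() */
	else
		munmap(ptr, size); /* counterpart of mmap() */
}
```

Passing the mmap()ed block to free() would be undefined behaviour, the userspace equivalent of calling kfree() on vzalloc()ed memory. In pcpu_alloc_chunk() the chunk struct is in practice small enough that kfree() happened to work, which is why the commit message frames the change as one of consistency.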
+ +tj: commit message updated + +Signed-off-by: Jianyu Zhan +Signed-off-by: Tejun Heo +Fixes: 099a19d91ca4 ("percpu: allow limited allocation before slab is online) +Signed-off-by: Greg Kroah-Hartman + +--- + mm/percpu.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +--- a/mm/percpu.c ++++ b/mm/percpu.c +@@ -612,7 +612,7 @@ static struct pcpu_chunk *pcpu_alloc_chu + chunk->map = pcpu_mem_zalloc(PCPU_DFL_MAP_ALLOC * + sizeof(chunk->map[0])); + if (!chunk->map) { +- kfree(chunk); ++ pcpu_mem_free(chunk, pcpu_chunk_struct_size); + return NULL; + } + diff --git a/queue-3.14/series b/queue-3.14/series index 920e89d26a6..d7d6b674e31 100644 --- a/queue-3.14/series +++ b/queue-3.14/series @@ -145,3 +145,18 @@ input-synaptics-add-a-matches_pnp_id-helper-function.patch input-synaptics-change-min-max-quirk-table-to-pnp-id-matching.patch alsa-hda-hdmi-set-converter-channel-count-even-without-sink.patch alsa-hda-fix-onboard-audio-on-intel-h97-z97-chipsets.patch +nfsd-move-default-initialisers-from-create_client-to.patch +nfsd-call-rpc_destroy_wait_queue-from-free_client.patch +nfsd-call-set_acl-with-a-null-acl-structure-if-no-entries.patch +nfsd4-warn-on-finding-lockowner-without-stateid-s.patch +nfsd4-remove-lockowner-when-removing-lock-stateid.patch +workqueue-fix-bugs-in-wq_update_unbound_numa-failure-path.patch +workqueue-fix-a-possible-race-condition-between-rescuer-and-pwq-release.patch +workqueue-make-rescuer_thread-empty-wq-maydays-list-before-exiting.patch +memory-mvebu-devbus-fix-the-conversion-of-the-bus-width.patch +pci-mvebu-fix-off-by-one-in-the-computed-size-of-the-mbus-windows.patch +bus-mvebu-mbus-allow-several-windows-with-the-same-target-attribute.patch +pci-mvebu-split-pcie-bars-into-multiple-mbus-windows-when-needed.patch +arm-mvebu-mvebu-soc-id-add-missing-clk_put-call.patch +arm-mvebu-mvebu-soc-id-keep-clock-enabled-if-pcie-unit-is-enabled.patch +percpu-make-pcpu_alloc_chunk-use-pcpu_mem_free-instead-of-kfree.patch diff --git 
a/queue-3.14/workqueue-fix-a-possible-race-condition-between-rescuer-and-pwq-release.patch b/queue-3.14/workqueue-fix-a-possible-race-condition-between-rescuer-and-pwq-release.patch new file mode 100644 index 00000000000..3f3936b412d --- /dev/null +++ b/queue-3.14/workqueue-fix-a-possible-race-condition-between-rescuer-and-pwq-release.patch @@ -0,0 +1,60 @@ +From 77668c8b559e4fe2acf2a0749c7c83cde49a5025 Mon Sep 17 00:00:00 2001 +From: Lai Jiangshan +Date: Fri, 18 Apr 2014 11:04:16 -0400 +Subject: workqueue: fix a possible race condition between rescuer and pwq-release + +From: Lai Jiangshan + +commit 77668c8b559e4fe2acf2a0749c7c83cde49a5025 upstream. + +There is a race condition between rescuer_thread() and +pwq_unbound_release_workfn(). + +Even after a pwq is scheduled for rescue, the associated work items +may be consumed by any worker. If all of them are consumed before the +rescuer gets to them and the pwq's base ref was put due to attribute +change, the pwq may be released while still being linked on +@wq->maydays list making the rescuer dereference already freed pwq +later. + +Make send_mayday() pin the target pwq until the rescuer is done with +it. + +tj: Updated comment and patch description. + +Signed-off-by: Lai Jiangshan +Signed-off-by: Tejun Heo +Signed-off-by: Greg Kroah-Hartman + +--- + kernel/workqueue.c | 12 ++++++++++++ + 1 file changed, 12 insertions(+) + +--- a/kernel/workqueue.c ++++ b/kernel/workqueue.c +@@ -1909,6 +1909,12 @@ static void send_mayday(struct work_stru + + /* mayday mayday mayday */ + if (list_empty(&pwq->mayday_node)) { ++ /* ++ * If @pwq is for an unbound wq, its base ref may be put at ++ * any time due to an attribute change. Pin @pwq until the ++ * rescuer is done with it. ++ */ ++ get_pwq(pwq); + list_add_tail(&pwq->mayday_node, &wq->maydays); + wake_up_process(wq->rescuer->task); + } +@@ -2438,6 +2444,12 @@ repeat: + process_scheduled_works(rescuer); + + /* ++ * Put the reference grabbed by send_mayday(). 
@pool won't ++ * go away while we're holding its lock. ++ */ ++ put_pwq(pwq); ++ ++ /* + * Leave this pool. If keep_working() is %true, notify a + * regular worker; otherwise, we end up with 0 concurrency + * and stalling the execution. diff --git a/queue-3.14/workqueue-fix-bugs-in-wq_update_unbound_numa-failure-path.patch b/queue-3.14/workqueue-fix-bugs-in-wq_update_unbound_numa-failure-path.patch new file mode 100644 index 00000000000..0268fb26e83 --- /dev/null +++ b/queue-3.14/workqueue-fix-bugs-in-wq_update_unbound_numa-failure-path.patch @@ -0,0 +1,43 @@ +From 77f300b198f93328c26191b52655ce1b62e202cf Mon Sep 17 00:00:00 2001 +From: Daeseok Youn +Date: Wed, 16 Apr 2014 14:32:29 +0900 +Subject: workqueue: fix bugs in wq_update_unbound_numa() failure path + +From: Daeseok Youn + +commit 77f300b198f93328c26191b52655ce1b62e202cf upstream. + +wq_update_unbound_numa() failure path has the following two bugs. + +- alloc_unbound_pwq() is called without holding wq->mutex; however, if + the allocation fails, it jumps to out_unlock which tries to unlock + wq->mutex. + +- The function should switch to dfl_pwq on failure but didn't do so + after alloc_unbound_pwq() failure. + +Fix it by regrabbing wq->mutex and jumping to use_dfl_pwq on +alloc_unbound_pwq() failure. 
+ +Signed-off-by: Daeseok Youn +Acked-by: Lai Jiangshan +Signed-off-by: Tejun Heo +Fixes: 4c16bd327c74 ("workqueue: implement NUMA affinity for unbound workqueues") +Signed-off-by: Greg Kroah-Hartman + +--- + kernel/workqueue.c | 3 ++- + 1 file changed, 2 insertions(+), 1 deletion(-) + +--- a/kernel/workqueue.c ++++ b/kernel/workqueue.c +@@ -4093,7 +4093,8 @@ static void wq_update_unbound_numa(struc + if (!pwq) { + pr_warning("workqueue: allocation failed while updating NUMA affinity of \"%s\"\n", + wq->name); +- goto out_unlock; ++ mutex_lock(&wq->mutex); ++ goto use_dfl_pwq; + } + + /* diff --git a/queue-3.14/workqueue-make-rescuer_thread-empty-wq-maydays-list-before-exiting.patch b/queue-3.14/workqueue-make-rescuer_thread-empty-wq-maydays-list-before-exiting.patch new file mode 100644 index 00000000000..d24d87f7360 --- /dev/null +++ b/queue-3.14/workqueue-make-rescuer_thread-empty-wq-maydays-list-before-exiting.patch @@ -0,0 +1,78 @@ +From 4d595b866d2c653dc90a492b9973a834eabfa354 Mon Sep 17 00:00:00 2001 +From: Lai Jiangshan +Date: Fri, 18 Apr 2014 11:04:16 -0400 +Subject: workqueue: make rescuer_thread() empty wq->maydays list before exiting + +From: Lai Jiangshan + +commit 4d595b866d2c653dc90a492b9973a834eabfa354 upstream. + +After a @pwq is scheduled for emergency execution, other workers may +consume the affected work items before the rescuer gets to them. This +means that a workqueue may have pwqs queued on @wq->maydays list +while not having any work item pending or in-flight. If +destroy_workqueue() executes in such condition, the rescuer may exit +without emptying @wq->maydays. + +This currently doesn't cause any actual harm. destroy_workqueue() can +safely destroy all the involved data structures whether @wq->maydays +is populated or not as nobody accesses the list once the rescuer exits. + +However, this is nasty and makes future development difficult.
Let's +update rescuer_thread() so that it empties @wq->maydays after seeing +should_stop to guarantee that the list is empty on rescuer exit. + +tj: Updated comment and patch description. + +Signed-off-by: Lai Jiangshan +Signed-off-by: Tejun Heo +Signed-off-by: Greg Kroah-Hartman + +--- + kernel/workqueue.c | 21 ++++++++++++++++----- + 1 file changed, 16 insertions(+), 5 deletions(-) + +--- a/kernel/workqueue.c ++++ b/kernel/workqueue.c +@@ -2397,6 +2397,7 @@ static int rescuer_thread(void *__rescue + struct worker *rescuer = __rescuer; + struct workqueue_struct *wq = rescuer->rescue_wq; + struct list_head *scheduled = &rescuer->scheduled; ++ bool should_stop; + + set_user_nice(current, RESCUER_NICE_LEVEL); + +@@ -2408,11 +2409,15 @@ static int rescuer_thread(void *__rescue + repeat: + set_current_state(TASK_INTERRUPTIBLE); + +- if (kthread_should_stop()) { +- __set_current_state(TASK_RUNNING); +- rescuer->task->flags &= ~PF_WQ_WORKER; +- return 0; +- } ++ /* ++ * By the time the rescuer is requested to stop, the workqueue ++ * shouldn't have any work pending, but @wq->maydays may still have ++ * pwq(s) queued. This can happen by non-rescuer workers consuming ++ * all the work items before the rescuer got to them. Go through ++ * @wq->maydays processing before acting on should_stop so that the ++ * list is always empty on exit. ++ */ ++ should_stop = kthread_should_stop(); + + /* see whether any pwq is asking for help */ + spin_lock_irq(&wq_mayday_lock); +@@ -2464,6 +2469,12 @@ repeat: + + spin_unlock_irq(&wq_mayday_lock); + ++ if (should_stop) { ++ __set_current_state(TASK_RUNNING); ++ rescuer->task->flags &= ~PF_WQ_WORKER; ++ return 0; ++ } ++ + /* rescuers should never participate in concurrency management */ + WARN_ON_ONCE(!(rescuer->flags & WORKER_NOT_RUNNING)); + schedule();