From: Greg Kroah-Hartman Date: Fri, 14 May 2021 07:49:19 +0000 (+0200) Subject: fix up md patch that quilt messed up X-Git-Tag: v5.10.37~1 X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=1f732abf740ed1be13ec8aef7b2ce5db81b5f2a5;p=thirdparty%2Fkernel%2Fstable-queue.git fix up md patch that quilt messed up --- diff --git a/queue-4.14/md-md_open-returns-ebusy-when-entering-racing-area.patch b/queue-4.14/md-md_open-returns-ebusy-when-entering-racing-area.patch index 67b09e36806..3dbe4fcde4f 100644 --- a/queue-4.14/md-md_open-returns-ebusy-when-entering-racing-area.patch +++ b/queue-4.14/md-md_open-returns-ebusy-when-entering-racing-area.patch @@ -23,13 +23,115 @@ md_open should call new mddev_find (it only does searching job). For more detail, please refer with Christoph's "split mddev_find" patch in later commits. +*** env *** +kvm-qemu VM 2C1G with 2 iscsi luns +kernel should be non-preempt + +*** script *** + +about trigger every time with below script + +``` +1 node1="mdcluster1" +2 node2="mdcluster2" +3 +4 mdadm -Ss +5 ssh ${node2} "mdadm -Ss" +6 wipefs -a /dev/sda /dev/sdb +7 mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda \ + /dev/sdb --assume-clean +8 +9 for i in {1..10}; do +10 echo ==== $i ====; +11 +12 echo "test ...." +13 ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb" +14 sleep 1 +15 +16 echo "clean ....." +17 ssh ${node2} "mdadm -Ss" +18 done +``` + +I use mdcluster env to trigger soft lockup, but it isn't mdcluster +speical bug. To stop md array in mdcluster env will do more jobs than +non-cluster array, which will leave enough time/gap to allow kernel to +run md_open. + +*** stack *** + +``` +[ 884.226509] mddev_put+0x1c/0xe0 [md_mod] +[ 884.226515] md_open+0x3c/0xe0 [md_mod] +[ 884.226518] __blkdev_get+0x30d/0x710 +[ 884.226520] ? bd_acquire+0xd0/0xd0 +[ 884.226522] blkdev_get+0x14/0x30 +[ 884.226524] do_dentry_open+0x204/0x3a0 +[ 884.226531] path_openat+0x2fc/0x1520 +[ 884.226534] ? seq_printf+0x4e/0x70 +[ 884.226536] do_filp_open+0x9b/0x110 +[ 884.226542] ? md_release+0x20/0x20 [md_mod] +[ 884.226543] ? seq_read+0x1d8/0x3e0 +[ 884.226545] ? kmem_cache_alloc+0x18a/0x270 +[ 884.226547] ? do_sys_open+0x1bd/0x260 +[ 884.226548] do_sys_open+0x1bd/0x260 +[ 884.226551] do_syscall_64+0x5b/0x1e0 +[ 884.226554] entry_SYSCALL_64_after_hwframe+0x44/0xa9 +``` + +*** rootcause *** + +"mdadm -A" (or other array assemble commands) will start a daemon "mdadm +--monitor" by default. When "mdadm -Ss" is running, the stop action will +wakeup "mdadm --monitor". The "--monitor" daemon will immediately get +info from /proc/mdstat. This time mddev in kernel still exist, so +/proc/mdstat still show md device, which makes "mdadm --monitor" to open +/dev/md0. + +The previously "mdadm -Ss" is removing action, the "mdadm --monitor" +open action will trigger md_open which is creating action. Racing is +happening. + +``` +: "mdadm -Ss" +md_release + mddev_put deletes mddev from all_mddevs + queue_work for mddev_delayed_delete + at this time, "/dev/md0" is still available for opening + +: "mdadm --monitor ..." +md_open + + mddev_find can't find mddev of /dev/md0, and create a new mddev and + | return. + + trigger "if (mddev->gendisk != bdev->bd_disk)" and return + -ERESTARTSYS. +``` + +In non-preempt kernel, is occupying on current CPU. and +mddev_delayed_delete which was created in also can't be +schedule. + +In preempt kernel, it can also trigger above racing. But kernel doesn't +allow one thread running on a CPU all the time. 
after running +some time, the later "mdadm -A" (refer above script line 13) will call +md_alloc to alloc a new gendisk for mddev. it will break md_open +statement "if (mddev->gendisk != bdev->bd_disk)" and return 0 to caller, +the soft lockup is broken. + +Cc: stable@vger.kernel.org +Reviewed-by: Christoph Hellwig +Signed-off-by: Zhao Heming +Signed-off-by: Song Liu +Signed-off-by: Greg Kroah-Hartman --- - drivers/md/md.c | 3 +-- + drivers/md/md.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) +diff --git a/drivers/md/md.c b/drivers/md/md.c +index 368cad6cd53a..464cca5d5952 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c -@@ -7451,8 +7451,7 @@ static int md_open(struct block_device * +@@ -7821,8 +7821,7 @@ static int md_open(struct block_device *bdev, fmode_t mode) /* Wait until bdev->bd_disk is definitely gone */ if (work_pending(&mddev->del_work)) flush_workqueue(md_misc_wq); @@ -39,3 +141,6 @@ in later commits. } BUG_ON(mddev != bdev->bd_disk->private_data); +-- +2.31.1 + diff --git a/queue-4.19/md-md_open-returns-ebusy-when-entering-racing-area.patch b/queue-4.19/md-md_open-returns-ebusy-when-entering-racing-area.patch index 7019f26b65b..3dbe4fcde4f 100644 --- a/queue-4.19/md-md_open-returns-ebusy-when-entering-racing-area.patch +++ b/queue-4.19/md-md_open-returns-ebusy-when-entering-racing-area.patch @@ -23,13 +23,115 @@ md_open should call new mddev_find (it only does searching job). For more detail, please refer with Christoph's "split mddev_find" patch in later commits. +*** env *** +kvm-qemu VM 2C1G with 2 iscsi luns +kernel should be non-preempt + +*** script *** + +about trigger every time with below script + +``` +1 node1="mdcluster1" +2 node2="mdcluster2" +3 +4 mdadm -Ss +5 ssh ${node2} "mdadm -Ss" +6 wipefs -a /dev/sda /dev/sdb +7 mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda \ + /dev/sdb --assume-clean +8 +9 for i in {1..10}; do +10 echo ==== $i ====; +11 +12 echo "test ...." +13 ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb" +14 sleep 1 +15 +16 echo "clean ....." +17 ssh ${node2} "mdadm -Ss" +18 done +``` + +I use mdcluster env to trigger soft lockup, but it isn't mdcluster +speical bug. To stop md array in mdcluster env will do more jobs than +non-cluster array, which will leave enough time/gap to allow kernel to +run md_open. + +*** stack *** + +``` +[ 884.226509] mddev_put+0x1c/0xe0 [md_mod] +[ 884.226515] md_open+0x3c/0xe0 [md_mod] +[ 884.226518] __blkdev_get+0x30d/0x710 +[ 884.226520] ? bd_acquire+0xd0/0xd0 +[ 884.226522] blkdev_get+0x14/0x30 +[ 884.226524] do_dentry_open+0x204/0x3a0 +[ 884.226531] path_openat+0x2fc/0x1520 +[ 884.226534] ? seq_printf+0x4e/0x70 +[ 884.226536] do_filp_open+0x9b/0x110 +[ 884.226542] ? md_release+0x20/0x20 [md_mod] +[ 884.226543] ? seq_read+0x1d8/0x3e0 +[ 884.226545] ? kmem_cache_alloc+0x18a/0x270 +[ 884.226547] ? do_sys_open+0x1bd/0x260 +[ 884.226548] do_sys_open+0x1bd/0x260 +[ 884.226551] do_syscall_64+0x5b/0x1e0 +[ 884.226554] entry_SYSCALL_64_after_hwframe+0x44/0xa9 +``` + +*** rootcause *** + +"mdadm -A" (or other array assemble commands) will start a daemon "mdadm +--monitor" by default. When "mdadm -Ss" is running, the stop action will +wakeup "mdadm --monitor". The "--monitor" daemon will immediately get +info from /proc/mdstat. This time mddev in kernel still exist, so +/proc/mdstat still show md device, which makes "mdadm --monitor" to open +/dev/md0. + +The previously "mdadm -Ss" is removing action, the "mdadm --monitor" +open action will trigger md_open which is creating action. 
Racing is +happening. + +``` +: "mdadm -Ss" +md_release + mddev_put deletes mddev from all_mddevs + queue_work for mddev_delayed_delete + at this time, "/dev/md0" is still available for opening + +: "mdadm --monitor ..." +md_open + + mddev_find can't find mddev of /dev/md0, and create a new mddev and + | return. + + trigger "if (mddev->gendisk != bdev->bd_disk)" and return + -ERESTARTSYS. +``` + +In non-preempt kernel, is occupying on current CPU. and +mddev_delayed_delete which was created in also can't be +schedule. + +In preempt kernel, it can also trigger above racing. But kernel doesn't +allow one thread running on a CPU all the time. after running +some time, the later "mdadm -A" (refer above script line 13) will call +md_alloc to alloc a new gendisk for mddev. it will break md_open +statement "if (mddev->gendisk != bdev->bd_disk)" and return 0 to caller, +the soft lockup is broken. + +Cc: stable@vger.kernel.org +Reviewed-by: Christoph Hellwig +Signed-off-by: Zhao Heming +Signed-off-by: Song Liu +Signed-off-by: Greg Kroah-Hartman --- - drivers/md/md.c | 3 +-- + drivers/md/md.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) +diff --git a/drivers/md/md.c b/drivers/md/md.c +index 368cad6cd53a..464cca5d5952 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c -@@ -7481,8 +7481,7 @@ static int md_open(struct block_device * +@@ -7821,8 +7821,7 @@ static int md_open(struct block_device *bdev, fmode_t mode) /* Wait until bdev->bd_disk is definitely gone */ if (work_pending(&mddev->del_work)) flush_workqueue(md_misc_wq); @@ -39,3 +141,6 @@ in later commits. } BUG_ON(mddev != bdev->bd_disk->private_data); +-- +2.31.1 + diff --git a/queue-4.4/md-md_open-returns-ebusy-when-entering-racing-area.patch b/queue-4.4/md-md_open-returns-ebusy-when-entering-racing-area.patch index d0c6ae9e02d..3dbe4fcde4f 100644 --- a/queue-4.4/md-md_open-returns-ebusy-when-entering-racing-area.patch +++ b/queue-4.4/md-md_open-returns-ebusy-when-entering-racing-area.patch @@ -23,13 +23,115 @@ md_open should call new mddev_find (it only does searching job). For more detail, please refer with Christoph's "split mddev_find" patch in later commits. +*** env *** +kvm-qemu VM 2C1G with 2 iscsi luns +kernel should be non-preempt + +*** script *** + +about trigger every time with below script + +``` +1 node1="mdcluster1" +2 node2="mdcluster2" +3 +4 mdadm -Ss +5 ssh ${node2} "mdadm -Ss" +6 wipefs -a /dev/sda /dev/sdb +7 mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda \ + /dev/sdb --assume-clean +8 +9 for i in {1..10}; do +10 echo ==== $i ====; +11 +12 echo "test ...." +13 ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb" +14 sleep 1 +15 +16 echo "clean ....." +17 ssh ${node2} "mdadm -Ss" +18 done +``` + +I use mdcluster env to trigger soft lockup, but it isn't mdcluster +speical bug. To stop md array in mdcluster env will do more jobs than +non-cluster array, which will leave enough time/gap to allow kernel to +run md_open. + +*** stack *** + +``` +[ 884.226509] mddev_put+0x1c/0xe0 [md_mod] +[ 884.226515] md_open+0x3c/0xe0 [md_mod] +[ 884.226518] __blkdev_get+0x30d/0x710 +[ 884.226520] ? bd_acquire+0xd0/0xd0 +[ 884.226522] blkdev_get+0x14/0x30 +[ 884.226524] do_dentry_open+0x204/0x3a0 +[ 884.226531] path_openat+0x2fc/0x1520 +[ 884.226534] ? seq_printf+0x4e/0x70 +[ 884.226536] do_filp_open+0x9b/0x110 +[ 884.226542] ? md_release+0x20/0x20 [md_mod] +[ 884.226543] ? seq_read+0x1d8/0x3e0 +[ 884.226545] ? kmem_cache_alloc+0x18a/0x270 +[ 884.226547] ? 
do_sys_open+0x1bd/0x260 +[ 884.226548] do_sys_open+0x1bd/0x260 +[ 884.226551] do_syscall_64+0x5b/0x1e0 +[ 884.226554] entry_SYSCALL_64_after_hwframe+0x44/0xa9 +``` + +*** rootcause *** + +"mdadm -A" (or other array assemble commands) will start a daemon "mdadm +--monitor" by default. When "mdadm -Ss" is running, the stop action will +wakeup "mdadm --monitor". The "--monitor" daemon will immediately get +info from /proc/mdstat. This time mddev in kernel still exist, so +/proc/mdstat still show md device, which makes "mdadm --monitor" to open +/dev/md0. + +The previously "mdadm -Ss" is removing action, the "mdadm --monitor" +open action will trigger md_open which is creating action. Racing is +happening. + +``` +: "mdadm -Ss" +md_release + mddev_put deletes mddev from all_mddevs + queue_work for mddev_delayed_delete + at this time, "/dev/md0" is still available for opening + +: "mdadm --monitor ..." +md_open + + mddev_find can't find mddev of /dev/md0, and create a new mddev and + | return. + + trigger "if (mddev->gendisk != bdev->bd_disk)" and return + -ERESTARTSYS. +``` + +In non-preempt kernel, is occupying on current CPU. and +mddev_delayed_delete which was created in also can't be +schedule. + +In preempt kernel, it can also trigger above racing. But kernel doesn't +allow one thread running on a CPU all the time. after running +some time, the later "mdadm -A" (refer above script line 13) will call +md_alloc to alloc a new gendisk for mddev. it will break md_open +statement "if (mddev->gendisk != bdev->bd_disk)" and return 0 to caller, +the soft lockup is broken. + +Cc: stable@vger.kernel.org +Reviewed-by: Christoph Hellwig +Signed-off-by: Zhao Heming +Signed-off-by: Song Liu +Signed-off-by: Greg Kroah-Hartman --- - drivers/md/md.c | 3 +-- + drivers/md/md.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) +diff --git a/drivers/md/md.c b/drivers/md/md.c +index 368cad6cd53a..464cca5d5952 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c -@@ -7046,8 +7046,7 @@ static int md_open(struct block_device * +@@ -7821,8 +7821,7 @@ static int md_open(struct block_device *bdev, fmode_t mode) /* Wait until bdev->bd_disk is definitely gone */ if (work_pending(&mddev->del_work)) flush_workqueue(md_misc_wq); @@ -39,3 +141,6 @@ in later commits. } BUG_ON(mddev != bdev->bd_disk->private_data); +-- +2.31.1 + diff --git a/queue-4.9/md-md_open-returns-ebusy-when-entering-racing-area.patch b/queue-4.9/md-md_open-returns-ebusy-when-entering-racing-area.patch index f49f3e9f285..3dbe4fcde4f 100644 --- a/queue-4.9/md-md_open-returns-ebusy-when-entering-racing-area.patch +++ b/queue-4.9/md-md_open-returns-ebusy-when-entering-racing-area.patch @@ -23,13 +23,115 @@ md_open should call new mddev_find (it only does searching job). For more detail, please refer with Christoph's "split mddev_find" patch in later commits. +*** env *** +kvm-qemu VM 2C1G with 2 iscsi luns +kernel should be non-preempt + +*** script *** + +about trigger every time with below script + +``` +1 node1="mdcluster1" +2 node2="mdcluster2" +3 +4 mdadm -Ss +5 ssh ${node2} "mdadm -Ss" +6 wipefs -a /dev/sda /dev/sdb +7 mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda \ + /dev/sdb --assume-clean +8 +9 for i in {1..10}; do +10 echo ==== $i ====; +11 +12 echo "test ...." +13 ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb" +14 sleep 1 +15 +16 echo "clean ....." +17 ssh ${node2} "mdadm -Ss" +18 done +``` + +I use mdcluster env to trigger soft lockup, but it isn't mdcluster +speical bug. 
To stop md array in mdcluster env will do more jobs than +non-cluster array, which will leave enough time/gap to allow kernel to +run md_open. + +*** stack *** + +``` +[ 884.226509] mddev_put+0x1c/0xe0 [md_mod] +[ 884.226515] md_open+0x3c/0xe0 [md_mod] +[ 884.226518] __blkdev_get+0x30d/0x710 +[ 884.226520] ? bd_acquire+0xd0/0xd0 +[ 884.226522] blkdev_get+0x14/0x30 +[ 884.226524] do_dentry_open+0x204/0x3a0 +[ 884.226531] path_openat+0x2fc/0x1520 +[ 884.226534] ? seq_printf+0x4e/0x70 +[ 884.226536] do_filp_open+0x9b/0x110 +[ 884.226542] ? md_release+0x20/0x20 [md_mod] +[ 884.226543] ? seq_read+0x1d8/0x3e0 +[ 884.226545] ? kmem_cache_alloc+0x18a/0x270 +[ 884.226547] ? do_sys_open+0x1bd/0x260 +[ 884.226548] do_sys_open+0x1bd/0x260 +[ 884.226551] do_syscall_64+0x5b/0x1e0 +[ 884.226554] entry_SYSCALL_64_after_hwframe+0x44/0xa9 +``` + +*** rootcause *** + +"mdadm -A" (or other array assemble commands) will start a daemon "mdadm +--monitor" by default. When "mdadm -Ss" is running, the stop action will +wakeup "mdadm --monitor". The "--monitor" daemon will immediately get +info from /proc/mdstat. This time mddev in kernel still exist, so +/proc/mdstat still show md device, which makes "mdadm --monitor" to open +/dev/md0. + +The previously "mdadm -Ss" is removing action, the "mdadm --monitor" +open action will trigger md_open which is creating action. Racing is +happening. + +``` +: "mdadm -Ss" +md_release + mddev_put deletes mddev from all_mddevs + queue_work for mddev_delayed_delete + at this time, "/dev/md0" is still available for opening + +: "mdadm --monitor ..." +md_open + + mddev_find can't find mddev of /dev/md0, and create a new mddev and + | return. + + trigger "if (mddev->gendisk != bdev->bd_disk)" and return + -ERESTARTSYS. +``` + +In non-preempt kernel, is occupying on current CPU. and +mddev_delayed_delete which was created in also can't be +schedule. + +In preempt kernel, it can also trigger above racing. But kernel doesn't +allow one thread running on a CPU all the time. after running +some time, the later "mdadm -A" (refer above script line 13) will call +md_alloc to alloc a new gendisk for mddev. it will break md_open +statement "if (mddev->gendisk != bdev->bd_disk)" and return 0 to caller, +the soft lockup is broken. + +Cc: stable@vger.kernel.org +Reviewed-by: Christoph Hellwig +Signed-off-by: Zhao Heming +Signed-off-by: Song Liu +Signed-off-by: Greg Kroah-Hartman --- - drivers/md/md.c | 3 +-- + drivers/md/md.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) +diff --git a/drivers/md/md.c b/drivers/md/md.c +index 368cad6cd53a..464cca5d5952 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c -@@ -7112,8 +7112,7 @@ static int md_open(struct block_device * +@@ -7821,8 +7821,7 @@ static int md_open(struct block_device *bdev, fmode_t mode) /* Wait until bdev->bd_disk is definitely gone */ if (work_pending(&mddev->del_work)) flush_workqueue(md_misc_wq); @@ -39,3 +141,6 @@ in later commits. } BUG_ON(mddev != bdev->bd_disk->private_data); +-- +2.31.1 + diff --git a/queue-5.10/md-md_open-returns-ebusy-when-entering-racing-area.patch b/queue-5.10/md-md_open-returns-ebusy-when-entering-racing-area.patch index 33d9dd12196..3dbe4fcde4f 100644 --- a/queue-5.10/md-md_open-returns-ebusy-when-entering-racing-area.patch +++ b/queue-5.10/md-md_open-returns-ebusy-when-entering-racing-area.patch @@ -23,13 +23,115 @@ md_open should call new mddev_find (it only does searching job). For more detail, please refer with Christoph's "split mddev_find" patch in later commits. 
+*** env ***
+kvm-qemu VM (2 CPUs, 1G RAM) with 2 iscsi luns
+kernel should be non-preempt
+
+*** script ***
+
+The soft lockup triggers almost every time with the script below:
+
+```
+1  node1="mdcluster1"
+2  node2="mdcluster2"
+3
+4  mdadm -Ss
+5  ssh ${node2} "mdadm -Ss"
+6  wipefs -a /dev/sda /dev/sdb
+7  mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda \
+   /dev/sdb --assume-clean
+8
+9  for i in {1..10}; do
+10    echo ==== $i ====;
+11
+12    echo "test ...."
+13    ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb"
+14    sleep 1
+15
+16    echo "clean ....."
+17    ssh ${node2} "mdadm -Ss"
+18 done
+```
+
+I use an mdcluster env to trigger the soft lockup, but it is not an
+mdcluster-specific bug. Stopping an md array in a cluster env does more
+work than stopping a non-cluster array, which leaves a large enough
+window for the kernel to run md_open.
+
+*** stack ***
+
+```
+[ 884.226509] mddev_put+0x1c/0xe0 [md_mod]
+[ 884.226515] md_open+0x3c/0xe0 [md_mod]
+[ 884.226518] __blkdev_get+0x30d/0x710
+[ 884.226520] ? bd_acquire+0xd0/0xd0
+[ 884.226522] blkdev_get+0x14/0x30
+[ 884.226524] do_dentry_open+0x204/0x3a0
+[ 884.226531] path_openat+0x2fc/0x1520
+[ 884.226534] ? seq_printf+0x4e/0x70
+[ 884.226536] do_filp_open+0x9b/0x110
+[ 884.226542] ? md_release+0x20/0x20 [md_mod]
+[ 884.226543] ? seq_read+0x1d8/0x3e0
+[ 884.226545] ? kmem_cache_alloc+0x18a/0x270
+[ 884.226547] ? do_sys_open+0x1bd/0x260
+[ 884.226548] do_sys_open+0x1bd/0x260
+[ 884.226551] do_syscall_64+0x5b/0x1e0
+[ 884.226554] entry_SYSCALL_64_after_hwframe+0x44/0xa9
+```
+
+*** rootcause ***
+
+"mdadm -A" (and other array assemble commands) starts an "mdadm
+--monitor" daemon by default. While "mdadm -Ss" is running, the stop
+action wakes up "mdadm --monitor", which immediately reads
+/proc/mdstat. At this point the mddev still exists in the kernel, so
+/proc/mdstat still shows the md device, which makes "mdadm --monitor"
+open /dev/md0.
+
+The earlier "mdadm -Ss" is a removing action, while the "mdadm
+--monitor" open triggers md_open, a creating action. The two race:
+
+```
+stop thread: "mdadm -Ss"
+md_release
+  mddev_put deletes mddev from all_mddevs
+  queue_work for mddev_delayed_delete
+  at this time, "/dev/md0" is still available for opening
+
+open thread: "mdadm --monitor ..."
+md_open
+ + mddev_find can't find the mddev of /dev/md0, creates a new mddev
+ |   and returns.
+ + hits "if (mddev->gendisk != bdev->bd_disk)" and returns
+     -ERESTARTSYS, so the open is retried from the top.
+```
+
+On a non-preempt kernel the open thread keeps occupying the current
+CPU, and mddev_delayed_delete, which was queued by the stop thread,
+never gets scheduled.
+
+A preempt kernel can hit the same race, but it does not let one thread
+monopolize a CPU forever: after some time the next "mdadm -A" (script
+line 13 above) calls md_alloc to allocate a new gendisk for the mddev,
+the md_open check "if (mddev->gendisk != bdev->bd_disk)" no longer
+fires, 0 is returned to the caller, and the soft lockup is broken.
+ +Cc: stable@vger.kernel.org +Reviewed-by: Christoph Hellwig +Signed-off-by: Zhao Heming +Signed-off-by: Song Liu +Signed-off-by: Greg Kroah-Hartman --- - drivers/md/md.c | 3 +-- + drivers/md/md.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) +diff --git a/drivers/md/md.c b/drivers/md/md.c +index 368cad6cd53a..464cca5d5952 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c -@@ -7857,8 +7857,7 @@ static int md_open(struct block_device * +@@ -7821,8 +7821,7 @@ static int md_open(struct block_device *bdev, fmode_t mode) /* Wait until bdev->bd_disk is definitely gone */ if (work_pending(&mddev->del_work)) flush_workqueue(md_misc_wq); @@ -39,3 +141,6 @@ in later commits. } BUG_ON(mddev != bdev->bd_disk->private_data); +-- +2.31.1 + diff --git a/queue-5.11/md-md_open-returns-ebusy-when-entering-racing-area.patch b/queue-5.11/md-md_open-returns-ebusy-when-entering-racing-area.patch index eea7a51b9b6..3dbe4fcde4f 100644 --- a/queue-5.11/md-md_open-returns-ebusy-when-entering-racing-area.patch +++ b/queue-5.11/md-md_open-returns-ebusy-when-entering-racing-area.patch @@ -23,13 +23,115 @@ md_open should call new mddev_find (it only does searching job). For more detail, please refer with Christoph's "split mddev_find" patch in later commits. +*** env *** +kvm-qemu VM 2C1G with 2 iscsi luns +kernel should be non-preempt + +*** script *** + +about trigger every time with below script + +``` +1 node1="mdcluster1" +2 node2="mdcluster2" +3 +4 mdadm -Ss +5 ssh ${node2} "mdadm -Ss" +6 wipefs -a /dev/sda /dev/sdb +7 mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda \ + /dev/sdb --assume-clean +8 +9 for i in {1..10}; do +10 echo ==== $i ====; +11 +12 echo "test ...." +13 ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb" +14 sleep 1 +15 +16 echo "clean ....." +17 ssh ${node2} "mdadm -Ss" +18 done +``` + +I use mdcluster env to trigger soft lockup, but it isn't mdcluster +speical bug. To stop md array in mdcluster env will do more jobs than +non-cluster array, which will leave enough time/gap to allow kernel to +run md_open. + +*** stack *** + +``` +[ 884.226509] mddev_put+0x1c/0xe0 [md_mod] +[ 884.226515] md_open+0x3c/0xe0 [md_mod] +[ 884.226518] __blkdev_get+0x30d/0x710 +[ 884.226520] ? bd_acquire+0xd0/0xd0 +[ 884.226522] blkdev_get+0x14/0x30 +[ 884.226524] do_dentry_open+0x204/0x3a0 +[ 884.226531] path_openat+0x2fc/0x1520 +[ 884.226534] ? seq_printf+0x4e/0x70 +[ 884.226536] do_filp_open+0x9b/0x110 +[ 884.226542] ? md_release+0x20/0x20 [md_mod] +[ 884.226543] ? seq_read+0x1d8/0x3e0 +[ 884.226545] ? kmem_cache_alloc+0x18a/0x270 +[ 884.226547] ? do_sys_open+0x1bd/0x260 +[ 884.226548] do_sys_open+0x1bd/0x260 +[ 884.226551] do_syscall_64+0x5b/0x1e0 +[ 884.226554] entry_SYSCALL_64_after_hwframe+0x44/0xa9 +``` + +*** rootcause *** + +"mdadm -A" (or other array assemble commands) will start a daemon "mdadm +--monitor" by default. When "mdadm -Ss" is running, the stop action will +wakeup "mdadm --monitor". The "--monitor" daemon will immediately get +info from /proc/mdstat. This time mddev in kernel still exist, so +/proc/mdstat still show md device, which makes "mdadm --monitor" to open +/dev/md0. + +The previously "mdadm -Ss" is removing action, the "mdadm --monitor" +open action will trigger md_open which is creating action. Racing is +happening. + +``` +: "mdadm -Ss" +md_release + mddev_put deletes mddev from all_mddevs + queue_work for mddev_delayed_delete + at this time, "/dev/md0" is still available for opening + +: "mdadm --monitor ..." 
+md_open + + mddev_find can't find mddev of /dev/md0, and create a new mddev and + | return. + + trigger "if (mddev->gendisk != bdev->bd_disk)" and return + -ERESTARTSYS. +``` + +In non-preempt kernel, is occupying on current CPU. and +mddev_delayed_delete which was created in also can't be +schedule. + +In preempt kernel, it can also trigger above racing. But kernel doesn't +allow one thread running on a CPU all the time. after running +some time, the later "mdadm -A" (refer above script line 13) will call +md_alloc to alloc a new gendisk for mddev. it will break md_open +statement "if (mddev->gendisk != bdev->bd_disk)" and return 0 to caller, +the soft lockup is broken. + +Cc: stable@vger.kernel.org +Reviewed-by: Christoph Hellwig +Signed-off-by: Zhao Heming +Signed-off-by: Song Liu +Signed-off-by: Greg Kroah-Hartman --- - drivers/md/md.c | 3 +-- + drivers/md/md.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) +diff --git a/drivers/md/md.c b/drivers/md/md.c +index 368cad6cd53a..464cca5d5952 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c -@@ -7856,8 +7856,7 @@ static int md_open(struct block_device * +@@ -7821,8 +7821,7 @@ static int md_open(struct block_device *bdev, fmode_t mode) /* Wait until bdev->bd_disk is definitely gone */ if (work_pending(&mddev->del_work)) flush_workqueue(md_misc_wq); @@ -39,3 +141,6 @@ in later commits. } BUG_ON(mddev != bdev->bd_disk->private_data); +-- +2.31.1 + diff --git a/queue-5.12/md-md_open-returns-ebusy-when-entering-racing-area.patch b/queue-5.12/md-md_open-returns-ebusy-when-entering-racing-area.patch index 0de59842de4..3dbe4fcde4f 100644 --- a/queue-5.12/md-md_open-returns-ebusy-when-entering-racing-area.patch +++ b/queue-5.12/md-md_open-returns-ebusy-when-entering-racing-area.patch @@ -23,13 +23,115 @@ md_open should call new mddev_find (it only does searching job). For more detail, please refer with Christoph's "split mddev_find" patch in later commits. +*** env *** +kvm-qemu VM 2C1G with 2 iscsi luns +kernel should be non-preempt + +*** script *** + +about trigger every time with below script + +``` +1 node1="mdcluster1" +2 node2="mdcluster2" +3 +4 mdadm -Ss +5 ssh ${node2} "mdadm -Ss" +6 wipefs -a /dev/sda /dev/sdb +7 mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda \ + /dev/sdb --assume-clean +8 +9 for i in {1..10}; do +10 echo ==== $i ====; +11 +12 echo "test ...." +13 ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb" +14 sleep 1 +15 +16 echo "clean ....." +17 ssh ${node2} "mdadm -Ss" +18 done +``` + +I use mdcluster env to trigger soft lockup, but it isn't mdcluster +speical bug. To stop md array in mdcluster env will do more jobs than +non-cluster array, which will leave enough time/gap to allow kernel to +run md_open. + +*** stack *** + +``` +[ 884.226509] mddev_put+0x1c/0xe0 [md_mod] +[ 884.226515] md_open+0x3c/0xe0 [md_mod] +[ 884.226518] __blkdev_get+0x30d/0x710 +[ 884.226520] ? bd_acquire+0xd0/0xd0 +[ 884.226522] blkdev_get+0x14/0x30 +[ 884.226524] do_dentry_open+0x204/0x3a0 +[ 884.226531] path_openat+0x2fc/0x1520 +[ 884.226534] ? seq_printf+0x4e/0x70 +[ 884.226536] do_filp_open+0x9b/0x110 +[ 884.226542] ? md_release+0x20/0x20 [md_mod] +[ 884.226543] ? seq_read+0x1d8/0x3e0 +[ 884.226545] ? kmem_cache_alloc+0x18a/0x270 +[ 884.226547] ? 
do_sys_open+0x1bd/0x260 +[ 884.226548] do_sys_open+0x1bd/0x260 +[ 884.226551] do_syscall_64+0x5b/0x1e0 +[ 884.226554] entry_SYSCALL_64_after_hwframe+0x44/0xa9 +``` + +*** rootcause *** + +"mdadm -A" (or other array assemble commands) will start a daemon "mdadm +--monitor" by default. When "mdadm -Ss" is running, the stop action will +wakeup "mdadm --monitor". The "--monitor" daemon will immediately get +info from /proc/mdstat. This time mddev in kernel still exist, so +/proc/mdstat still show md device, which makes "mdadm --monitor" to open +/dev/md0. + +The previously "mdadm -Ss" is removing action, the "mdadm --monitor" +open action will trigger md_open which is creating action. Racing is +happening. + +``` +: "mdadm -Ss" +md_release + mddev_put deletes mddev from all_mddevs + queue_work for mddev_delayed_delete + at this time, "/dev/md0" is still available for opening + +: "mdadm --monitor ..." +md_open + + mddev_find can't find mddev of /dev/md0, and create a new mddev and + | return. + + trigger "if (mddev->gendisk != bdev->bd_disk)" and return + -ERESTARTSYS. +``` + +In non-preempt kernel, is occupying on current CPU. and +mddev_delayed_delete which was created in also can't be +schedule. + +In preempt kernel, it can also trigger above racing. But kernel doesn't +allow one thread running on a CPU all the time. after running +some time, the later "mdadm -A" (refer above script line 13) will call +md_alloc to alloc a new gendisk for mddev. it will break md_open +statement "if (mddev->gendisk != bdev->bd_disk)" and return 0 to caller, +the soft lockup is broken. + +Cc: stable@vger.kernel.org +Reviewed-by: Christoph Hellwig +Signed-off-by: Zhao Heming +Signed-off-by: Song Liu +Signed-off-by: Greg Kroah-Hartman --- - drivers/md/md.c | 3 +-- + drivers/md/md.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) +diff --git a/drivers/md/md.c b/drivers/md/md.c +index 368cad6cd53a..464cca5d5952 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c -@@ -7841,8 +7841,7 @@ static int md_open(struct block_device * +@@ -7821,8 +7821,7 @@ static int md_open(struct block_device *bdev, fmode_t mode) /* Wait until bdev->bd_disk is definitely gone */ if (work_pending(&mddev->del_work)) flush_workqueue(md_misc_wq); @@ -39,3 +141,6 @@ in later commits. } BUG_ON(mddev != bdev->bd_disk->private_data); +-- +2.31.1 +
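
All seven branch patches above carry the same functional change in md_open():
when an open races with md_release()/mddev_put() and mddev->gendisk no longer
matches bdev->bd_disk, the open now fails with -EBUSY instead of returning
-ERESTARTSYS and retrying from the top. The sketch below shows roughly what
the racing branch looks like once one of these patches is applied. It is a
simplified reconstruction from the hunk context and the changelog, not a
verbatim copy of any branch's drivers/md/md.c; in particular the
mddev_find(bdev->bd_dev) call and the tail of the function are paraphrased
assumptions, and surrounding code and line numbers differ between 4.4 and
5.12.

```c
/*
 * Simplified sketch of md_open()'s racing branch after this series is
 * applied (reconstructed from the hunks and changelog above, not copied
 * verbatim from any one stable branch).
 */
static int md_open(struct block_device *bdev, fmode_t mode)
{
	struct mddev *mddev = mddev_find(bdev->bd_dev);

	if (!mddev)
		return -ENODEV;

	if (mddev->gendisk != bdev->bd_disk) {
		/*
		 * We are racing with md_release()/mddev_put(): the old
		 * mddev was already removed from all_mddevs, so
		 * mddev_find() created a fresh one whose gendisk does not
		 * match the bdev being opened.
		 */
		mddev_put(mddev);
		/* Wait until bdev->bd_disk is definitely gone */
		if (work_pending(&mddev->del_work))
			flush_workqueue(md_misc_wq);
		/*
		 * Old behaviour: return -ERESTARTSYS and retry the open
		 * from the top.  On a non-preempt kernel that retry loop
		 * keeps the CPU busy and mddev_delayed_delete() never
		 * runs, producing the soft lockup.  New behaviour: fail
		 * the open immediately.
		 */
		return -EBUSY;
	}
	BUG_ON(mddev != bdev->bd_disk->private_data);

	/* ... normal open path continues here ... */
	return 0;
}
```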