git.ipfire.org Git - thirdparty/mdadm.git/commit

author	Martin Wilck <mwilck@arcor.de>
	Tue, 30 Jul 2013 21:18:33 +0000 (23:18 +0200)
committer	NeilBrown <neilb@suse.de>
	Wed, 31 Jul 2013 03:00:46 +0000 (13:00 +1000)
commit	6ca1e6eccb2c6661ec111a455bcc2f3f5593cb06
tree	7b9872615725c2d76261a4dee6b51467a0616faf
parent	30b83120ede68eb28f28118e3af4ff9c1de91fa0

mdmon: manage_member: fix race condition during slow meta data writes

In order to track kernel state changes, the monitor needs to
notice changes in sysfs. If the changes are transient and the
monitor is busy writing metadata, the changes can be missed.
The metadata are then inconsistent with the real state of
the array.
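
The following is a minimal sketch of why a slow observer can miss a
transient state. It is not mdadm code: the enum, the timeline array and
the function name are hypothetical, and the "stride" parameter merely
stands in for the time the monitor spends writing metadata between two
looks at sysfs.

```c
#include <stddef.h>

/* Hypothetical states; not taken from mdadm. */
enum sketch_state { SKETCH_IDLE, SKETCH_RECOVER };

/*
 * Sample the timeline at every `stride`-th tick and count how many
 * samples show SKETCH_RECOVER. A larger stride models a monitor that
 * is busy for longer between two observations.
 */
static int samples_showing_recover(const enum sketch_state *timeline,
                                   size_t len, size_t stride)
{
    int seen = 0;

    if (stride == 0)
        return 0;   /* no valid sampling interval */

    for (size_t i = 0; i < len; i += stride)
        if (timeline[i] == SKETCH_RECOVER)
            seen++;

    return seen;
}
```

With a timeline of idle, recover, idle, idle, sampling every tick
observes the recover state once, while sampling every second tick never
observes it.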

I can reproduce this in a test scenario with a DDF container and
two subarrays, where I set a disk to "failed" and then add a global
hot-spare. On a typical MD test setup with loop devices, I can
reliably reproduce a failure where the metadata show degraded members
although the kernel finished the recovery successfully.

This patch fixes the problem with two changes. First, when
a metadata update is queued, wait until it is certain that the monitor
has actually applied it (the for loop is needed to avoid failures
completely in my test case). Second, after triggering the
recovery, set prev_state of the changed array to "recover", in case
the monitor misses the transient "recover" state.
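
The two changes can be illustrated with a small, self-contained sketch.
It is a simplified model under stated assumptions, not the patch itself:
every type and function name below is hypothetical, the queue is a plain
counter, and the monitor is simulated by a direct call instead of a
separate thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names are taken from mdadm. */
enum sketch_action { SKETCH_ACT_IDLE, SKETCH_ACT_RECOVER };

struct sketch_queue {
    int pending;    /* updates queued by the manager, not yet applied */
};

struct sketch_array {
    enum sketch_action prev_action;  /* last action the monitor believes it saw */
};

/* Stand-in for one monitor pass: applies at most one queued update. */
static void sketch_monitor_apply_one(struct sketch_queue *q)
{
    if (q->pending > 0)
        q->pending--;
}

/*
 * Change 1: after queueing, keep polling until the queue is drained
 * (or max_polls is reached). In a real daemon the monitor runs in its
 * own thread and the caller would sleep between polls; here the monitor
 * pass is called directly. Returns the number of polls performed.
 */
static int sketch_wait_for_updates(struct sketch_queue *q, int max_polls)
{
    int polls = 0;

    assert(q != NULL);
    while (q->pending > 0 && polls < max_polls) {
        sketch_monitor_apply_one(q);
        polls++;
    }
    return polls;
}

/*
 * Change 2: when recovery is triggered, record "recover" as the
 * previous action right away. A real implementation would first ask
 * the kernel to start the recovery.
 */
static void sketch_start_recovery(struct sketch_array *a)
{
    a->prev_action = SKETCH_ACT_RECOVER;
}

/*
 * Monitor-side edge detection: recovery counts as finished when the
 * previous action was "recover" and the current one is "idle".
 */
static bool sketch_recovery_finished(struct sketch_array *a,
                                     enum sketch_action now)
{
    bool finished = (a->prev_action == SKETCH_ACT_RECOVER &&
                     now == SKETCH_ACT_IDLE);

    a->prev_action = now;
    return finished;
}
```

In this model, an array whose prev_action was pre-set by
sketch_start_recovery() is reported as finished even if the monitor
only ever observes "idle", whereas an array left at "idle" is not.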

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>