mdcheck: simplify start / continue logic and add "--restart" option
The current logic of "mdcheck" is susceptible to races when multiple
mdcheck instances run simultaneously, as checks can be initiated from both
"mdcheck_start.service" and "mdcheck_continue.service".
The previous commit
8aa4ea95db35 ("systemd: start mdcheck_continue.timer
before mdcheck_start.timer") fixed this for the default configuration by
reversing the order of the two timers. But users can customize the timer
settings, which can cause the race to reappear.
This patch avoids this kind of race entirely by changing the logic as
follows (see the shell sketch after the list):
* When `mdcheck` has finished checking a RAID array, it creates a marker
file `/var/lib/mdcheck/Checked_$UUID`.
* A new option `--restart` is introduced. `mdcheck --restart` removes all
`/var/lib/mdcheck/Checked_*` markers.
This is called from `mdcheck_start.service`, which is typically started
by a timer at long intervals (by default once per month).
* `mdcheck --continue` works as it used to. It continues previously started
checks (where the `/var/lib/mdcheck/MD_UUID_$UUID` file exists and
contains a start position for the check).
This usage is *no longer recommended*.
* `mdcheck` with no arguments is like `--continue`, but it also starts new
checks for all arrays for which no check has previously been
started, *except* for arrays for which a marker
`/var/lib/mdcheck/Checked_$UUID` exists.
`mdcheck_continue.service` calls `mdcheck` this way. It is called at
short intervals, by default once per day.
* Combining `--restart` and `--continue` is an error.
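
For illustration, the option handling described above could be sketched
in shell roughly as follows. The helper functions (`get_uuid`,
`resume_check`, `start_check`) and the exact argument parsing are
placeholders, not the actual mdcheck code:

    MDCHECK_DIR=/var/lib/mdcheck

    restart=false
    cont=false
    for arg in "$@"; do
        case $arg in
        --restart)  restart=true ;;
        --continue) cont=true ;;
        esac
    done

    # Combining --restart and --continue is an error.
    if $restart && $cont; then
        echo "mdcheck: --restart and --continue are mutually exclusive" >&2
        exit 1
    fi

    if $restart; then
        # Re-enable checking for all arrays; the next plain "mdcheck" run
        # will start their checks from position 0.
        rm -f "$MDCHECK_DIR"/Checked_*
        exit 0
    fi

    for md in /dev/md?*; do
        [ -b "$md" ] || continue
        uuid=$(get_uuid "$md")
        pos_file=$MDCHECK_DIR/MD_UUID_$uuid
        done_file=$MDCHECK_DIR/Checked_$uuid

        if [ -f "$pos_file" ]; then
            # A previously started check was interrupted: resume it.
            resume_check "$md" "$(cat "$pos_file")"
        elif ! $cont && [ ! -f "$done_file" ]; then
            # Plain "mdcheck" also starts new checks, unless the array
            # was already fully checked in this cycle.
            start_check "$md"
        fi
    done

    # When a check finishes, the position file is replaced by the marker:
    #     rm -f "$pos_file"; touch "$done_file"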
This way, the only systemd service that actually triggers a kernel-level
RAID check is `mdcheck_continue.service`, which avoids races.
When all checks have finished, `mdcheck_continue.service` is a no-op.
When `mdcheck_start.service` runs, the checks are re-enabled and will be
started from position 0 by the next `mdcheck_continue.service` invocation.
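
Assuming the default timer intervals, the resulting invocations over one
check cycle look roughly like this:

    # once per month, from mdcheck_start.service:
    mdcheck --restart   # remove the /var/lib/mdcheck/Checked_* markers

    # once per day, from mdcheck_continue.service:
    mdcheck             # start new checks or resume interrupted ones;
                        # a no-op once every array has its Checked_$UUID marker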
Signed-off-by: Martin Wilck <mwilck@suse.com>