# Locking Block Device Access

*TL;DR: Use BSD file locks
[(`flock(2)`)](http://man7.org/linux/man-pages/man2/flock.2.html) on block
device nodes to synchronize access for partitioning and file system formatting
tools.*

`systemd-udevd` probes all block devices showing up for file system superblock
and partition table information (utilizing `libblkid`). If another program
concurrently modifies a superblock or partition table, the probe may read
half-written data. That is bad in itself, and it may in turn cause undesired
effects in programs subscribing to `udev` events.

Applications manipulating a block device can temporarily stop `systemd-udevd`
from processing rules on it, and thus bar it from probing the device, by
taking a BSD file lock on the block device node. Specifically, whenever
`systemd-udevd` starts processing a block device it takes a `LOCK_SH|LOCK_NB`
lock using [`flock(2)`](http://man7.org/linux/man-pages/man2/flock.2.html) on
the main block device (i.e. never on any partition block device, but on the
device the partition belongs to). If this lock cannot be taken (i.e. `flock()`
fails with `EWOULDBLOCK`, which has the same value as `EAGAIN` on Linux), it
refrains from processing the device. If it manages to take the lock, it keeps
it for the entire time the device is processed.

Note that `systemd-udevd` also watches all block device nodes it manages for
`inotify` `IN_CLOSE_WRITE` events: whenever such an event is seen, it is used
as a trigger to re-run the rule set for the device.

These two concepts allow tools such as disk partitioners or file system
formatting tools to safely and easily take exclusive ownership of a block
device while operating: before starting work on the block device, they should
take a `LOCK_EX` lock on it. This has two effects. First, in case
`systemd-udevd` is still processing the device, the tool will wait for it to
finish. Second, after the lock is taken, the tool can be sure that
`systemd-udevd` will refrain from processing the block device, and thus all
other client applications subscribed to it won't get device notifications from
potentially half-written data either. After the operation is complete, the
partitioner/formatter can simply close the device node. This again has two
effects. First, it implicitly releases the lock, so that `systemd-udevd` can
process events on the device node again. Second, it results in an
`IN_CLOSE_WRITE` event (the tool had the node open for writing), which causes
`systemd-udevd` to immediately re-process the device, seeing all changes the
tool made, and notify subscribed clients about it.

Besides synchronizing block device access between `systemd-udevd` and such
tools, this scheme may also be used to synchronize access between those tools
themselves. However, do note that `flock()` locks are advisory only. This means
that if one tool honours this scheme and another tool does not, they will not
be synchronized properly and might interfere with each other's work.

Note that the file locks follow the usual access semantics of BSD locks: since
`systemd-udevd` never writes to such block devices, it only takes a `LOCK_SH`
*shared* lock. A program intending to make changes to the block device should
take a `LOCK_EX` *exclusive* lock instead. For further details, see the
`flock(2)` man page.

And please keep in mind: BSD file locks (`flock()`) and POSIX file locks
(`lockf()`, `F_SETLK`, …) are different concepts, and in their effect
orthogonal. The scheme discussed above uses the former and not the latter,
because BSD file locks more closely match the required semantics.

Summarizing: it is recommended to take `LOCK_EX` BSD file locks when
manipulating block devices in all tools that change file system block devices
(`mkfs`, `fsck`, …) or partition tables (`fdisk`, `parted`, …), right after
opening the node.