.\" See file COPYING in distribution for details.
.TH MDMON 8 "" v3.2.5
.SH NAME
mdmon \- monitor MD external metadata arrays

.SH SYNOPSIS

.BI mdmon " [--all] [--takeover] [--offroot] CONTAINER"
.SH OVERVIEW
The 2.6.27 kernel brings the ability to support external metadata arrays.
External metadata implies that user space handles all updates to the metadata.
The kernel's responsibility is to notify user space when a "metadata event"
occurs, like disk failures and clean-to-dirty transitions. The kernel, in
important cases, waits for user space to take action on these notifications.
.SH DESCRIPTION
.SS Metadata updates:
To service metadata update requests a daemon,
.IR mdmon ,
is introduced.
.I Mdmon
is tasked with polling the sysfs namespace looking for changes in
.BR array_state ,
.BR sync_action ,
and per disk
.B state
attributes. When a change is detected it calls a per metadata type
handler to make modifications to the metadata. The following actions
are taken:
.RS
.TP
.B array_state \- inactive
Clear the dirty bit for the volume and let the array be stopped.
.TP
.B array_state \- write pending
Set the dirty bit for the array and then set
.B array_state
to
.BR active .
Writes
are blocked until user space writes
.BR active .
.TP
.B array_state \- active-idle
The safe mode timer has expired, so set the array state to clean to block
writes to the array.
.TP
.B array_state \- clean
Clear the dirty bit for the volume.
.TP
.B array_state \- read-only
This is the initial state that all arrays start at.
.I mdmon
takes one of three actions:
.RS
.TP
1/
Transition the array to read-auto keeping the dirty bit clear if the metadata
handler determines that the array does not need resyncing or other modification.
.TP
2/
Transition the array to active if the metadata handler determines a resync or
some other manipulation is necessary.
.TP
3/
Leave the array read\-only if the volume is marked to not be monitored; for
example, the metadata version has been set to "external:\-dev/md127" instead of
"external:/dev/md127".
.RE
.TP
.B sync_action \- resync\-to\-idle
Notify the metadata handler that a resync may have completed. If a resync
process is idled before it completes, this event allows the metadata handler to
checkpoint resync.
.TP
.B sync_action \- recover\-to\-idle
A spare may have completed rebuilding so tell the metadata handler about the
state of each disk. This is the metadata handler's opportunity to clear
any "out-of-sync" bits and clear the volume's degraded status. If a recovery
process is idled before it completes, this event allows the metadata handler to
checkpoint recovery.
.TP
.B <disk>/state \- faulty
A disk failure kicks off a series of events. First, notify the metadata
handler that a disk has failed, and then notify the kernel that it can unblock
writes that were dependent on this disk. After unblocking the kernel this disk
is set to be removed+ from the member array. Finally the disk is marked failed
in all other member arrays in the container.
.IP
+ Note: this behavior differs slightly from native MD arrays where
removal is reserved for an
.B mdadm \-\-remove
event. In the external metadata case the container holds the final
reference on a block device and an
.B mdadm \-\-remove <container> <victim>
call is still required.
.RE
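.PP
The array_state handling listed above can be read as a small decision table.
The following Python sketch is illustrative only: the action labels and
function names are hypothetical and are not taken from the
.I mdmon
source, and it assumes that state names may be spelled with either spaces or
hyphens.
.PP
.nf
```python
# Illustrative model of the array_state handling described above.
# Action labels are hypothetical names, not identifiers from mdmon.
ARRAY_STATE_ACTIONS = {
    "inactive": ["clear dirty bit", "let array stop"],
    "write-pending": ["set dirty bit", "write active to array_state"],
    "active-idle": ["write clean to array_state"],
    "clean": ["clear dirty bit"],
}


def actions_for(array_state):
    """Return the actions for a state; unknown states yield an empty list."""
    key = array_state.strip().replace(" ", "-")
    return list(ARRAY_STATE_ACTIONS.get(key, []))


def next_state_from_read_only(monitored, needs_update):
    """Choose the state an array leaves read-only for.

    monitored: False if the volume is marked as not to be monitored.
    needs_update: True if the handler wants a resync or other change.
    """
    if not monitored:
        return "read-only"
    return "active" if needs_update else "read-auto"
```
.fi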
.SS Containers:
.P
External metadata formats, like DDF, differ from the native MD metadata
formats in that they define a set of disks and a series of sub-arrays
within those disks. MD metadata in comparison defines a 1:1
relationship between a set of block devices and a RAID array. For
example, to create 2 arrays at different RAID levels on a single
set of disks, MD metadata requires the disks be partitioned and then
each array can be created with a subset of those partitions. The
supported external formats perform this disk carving internally.
.P
Container devices simply hold references to all member disks and allow
tools like
.I mdmon
to determine which active arrays belong to which
container. Some array management commands like disk removal and disk
add are now only valid at the container level. Attempts to perform
these actions on member arrays are blocked with error messages like:
.IP
"mdadm: Cannot remove disks from a \'member\' array, perform this
operation on the parent container"
.P
Containers are identified in /proc/mdstat with a metadata version string
"external:<metadata name>". Member devices are identified by
"external:/<container device>/<member index>", or "external:-<container
device>/<member index>" if the array is to remain read-only.
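.PP
The three metadata version string forms above can be told apart by the
character that follows "external:". The Python sketch below shows one way a
script might classify them; the function name and the returned dictionary keys
are hypothetical, and the example strings used with it (such as
"external:/md127/0") are assumed illustrations of the documented pattern.
.PP
.nf
```python
def parse_metadata_version(version):
    """Classify an 'external:...' metadata version string.

    Containers look like external:<metadata name>; members look like
    external:/<container>/<index>, or external:-<container>/<index>
    when the array is to remain read-only.
    """
    prefix = "external:"
    if not version.startswith(prefix):
        raise ValueError("not an external metadata version: %r" % version)
    rest = version[len(prefix):]
    if rest[:1] in ("/", "-"):
        container, _, index = rest[1:].partition("/")
        return {
            "kind": "member",
            "container": container,
            "member_index": index,
            "read_only": rest[0] == "-",
        }
    return {"kind": "container", "metadata_name": rest}
```
.fi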
.SH OPTIONS
.TP
CONTAINER
The
.B container
device to monitor. It can be a full path like /dev/md/container, or a
simple md device name like md127.
.TP
.B \-\-takeover
This instructs
.I mdmon
to replace any active
.I mdmon
which is currently monitoring the array. This is primarily used late
in the boot process to replace any
.I mdmon
which was started from an
.B initramfs
before the root filesystem was mounted. This avoids holding a
reference on that
.B initramfs
indefinitely and ensures that the
.I pid
and
.I sock
files used to communicate with
.I mdmon
are in a standard place.
.TP
.B \-\-all
This tells
.I mdmon
to find any active containers and start monitoring
each of them if appropriate. This is normally used with
.B \-\-takeover
late in the boot sequence.
A separate
.I mdmon
process is started for each container as the
.B \-\-all
argument is overwritten with the name of the container. To allow for
containers with names longer than 5 characters, this argument can be
arbitrarily extended, e.g. to
.BR \-\-all-active-arrays .
.TP
.B \-\-offroot
Set the first character of argv[0] to @ to indicate that
.I mdmon
was launched from an initrd/initramfs and should not be shut down by systemd
as part of the regular shutdown process. This option is normally only used by
the system's initscripts. See the following page for details on how
systemd handles argv[0]:
.IP
.B http://www.freedesktop.org/wiki/Software/systemd/RootStorageDaemons
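.PP
Both
.B \-\-all
and
.B \-\-offroot
rely on rewriting the process's argument strings in place. The Python sketch
below models the two conventions on plain strings; it does not touch a real
process's argv, the function names are hypothetical, and padding the unused
part of the slot with NUL characters is an assumption made for illustration.
.PP
.nf
```python
def overwrite_arg(slot, container):
    """Model writing a container name over a fixed-size argument slot.

    Returns the new slot contents padded with NUL characters to the
    original length, or None if the name does not fit.
    """
    if len(container) > len(slot):
        return None
    return container + chr(0) * (len(slot) - len(container))


def mark_offroot(argv0):
    """Return argv[0] with its first character replaced by '@'."""
    return "@" + argv0[1:] if argv0 else argv0


def launched_offroot(argv0):
    """True if argv[0] starts with '@'."""
    return argv0.startswith("@")
```
.fi
.PP
In this model a five-character name such as md127 fits in the "\-\-all" slot,
while a longer name needs an extended spelling such as
"\-\-all-active-arrays".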
.PP
Note that
.I mdmon
is automatically started by
.I mdadm
when needed and so does not need to be considered when working with
RAID arrays. The only time it is run other than by
.I mdadm
is when the boot scripts need to restart it after mounting the new
root filesystem.
.SH START UP AND SHUTDOWN

As
.I mdmon
needs to be running whenever any filesystem on the monitored device is
mounted there are special considerations when the root filesystem is
mounted from an
.I mdmon
monitored device.
Note that in general
.I mdmon
is needed even if the filesystem is mounted read-only as some
filesystems can still write to the device in those circumstances, for
example to replay a journal after an unclean shutdown.

When the array is assembled by the
.B initramfs
code, mdadm will automatically start
.I mdmon
as required. This means that
.I mdmon
must be installed on the
.B initramfs
and there must be a writable filesystem (typically tmpfs) in which
.B mdmon
can create a
.B .pid
and
.B .sock
file. The particular filesystem to use is given to mdmon at compile
time and defaults to
.BR /run/mdadm .

This filesystem must persist through to shutdown time.
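.PP
A boot script may want to know where to look for these two files. The Python
sketch below builds the expected paths under the assumption that the files are
named after the container's md device name (for example md127.pid and
md127.sock); that naming scheme and the function name are assumptions made for
illustration, and only the default directory comes from the text above.
.PP
.nf
```python
import posixpath


def runtime_files(devname, run_dir="/run/mdadm"):
    """Return (pid_path, sock_path) for a container's md device name.

    Assumes files are named <devname>.pid and <devname>.sock.
    """
    return (
        posixpath.join(run_dir, devname + ".pid"),
        posixpath.join(run_dir, devname + ".sock"),
    )
```
.fi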
.PP
After the final root filesystem has been instantiated (usually with
.BR pivot_root )
.I mdmon
should be run with
.I "\-\-all \-\-takeover"
so that the
.I mdmon
running from the
.B initramfs
can be replaced with one running in the main root, and so the
memory used by the initramfs can be released.

At shutdown time,
.I mdmon
should not be killed along with other processes. Also as it holds a
file (socket actually) open in
.B /dev
(by default) it will not be possible to unmount
.B /dev
if it is a separate filesystem.
.SH EXAMPLES

.B " mdmon \-\-all-active-arrays \-\-takeover"
.br
Any
.I mdmon
which is currently running is killed and a new instance is started.
This should be run during the boot sequence if an initramfs was
used, so that any mdmon running from the initramfs will not hold
the initramfs active.
.SH SEE ALSO
.IR mdadm (8),
.IR md (4).