1 .\" See file COPYING in distribution for details.
2 .TH MDMON 8 "" v3.1.1
3 .SH NAME
4 mdmon \- monitor MD external metadata arrays
5
6 .SH SYNOPSIS
7
8 .BI mdmon " [--all] [--takeover] CONTAINER"
9
10 .SH OVERVIEW
11 The 2.6.27 kernel brings the ability to support external metadata arrays.
12 External metadata implies that user space handles all updates to the metadata.
13 The kernel's responsibility is to notify user space when a "metadata event"
14 occurs, like disk failures and clean-to-dirty transitions. The kernel, in
15 important cases, waits for user space to take action on these notifications.
16
.SH DESCRIPTION
.SS Metadata updates:
To service metadata update requests, a daemon,
.IR mdmon ,
is introduced.
.I Mdmon
is tasked with polling the sysfs namespace looking for changes in
.BR array_state ,
.BR sync_action ,
and per-disk
.B state
attributes. When a change is detected it calls a per-metadata-type
handler to make modifications to the metadata. The following actions
are taken:
.RS
.TP
.B array_state \- inactive
Clear the dirty bit for the volume and let the array be stopped.
.TP
.B array_state \- write pending
Set the dirty bit for the array and then set
.B array_state
to
.BR active .
Writes
are blocked until userspace writes
.BR active .
.TP
.B array_state \- active-idle
The safe mode timer has expired, so set the array state to clean to block writes to the array.
.TP
.B array_state \- clean
Clear the dirty bit for the volume.
.TP
.B array_state \- read-only
This is the initial state that all arrays start at.
.I mdmon
takes one of three actions:
.RS
.TP
1/
Transition the array to read-auto, keeping the dirty bit clear, if the metadata
handler determines that the array does not need resyncing or other modification.
.TP
2/
Transition the array to active if the metadata handler determines a resync or
some other manipulation is necessary.
.TP
3/
Leave the array read\-only if the volume is marked to not be monitored; for
example, the metadata version has been set to "external:\-dev/md127" instead of
"external:/dev/md127".
.RE
.TP
.B sync_action \- resync\-to\-idle
Notify the metadata handler that a resync may have completed. If a resync
process is idled before it completes, this event allows the metadata handler to
checkpoint resync.
.TP
.B sync_action \- recover\-to\-idle
A spare may have completed rebuilding, so tell the metadata handler about the
state of each disk. This is the metadata handler's opportunity to clear
any "out-of-sync" bits and clear the volume's degraded status. If a recovery
process is idled before it completes, this event allows the metadata handler to
checkpoint recovery.
.TP
.B <disk>/state \- faulty
A disk failure kicks off a series of events. First, notify the metadata
handler that a disk has failed, and then notify the kernel that it can unblock
writes that were dependent on this disk. After unblocking the kernel, this disk
is set to be removed+ from the member array. Finally, the disk is marked failed
in all other member arrays in the container.
.IP
+ Note: this behavior differs slightly from native MD arrays, where
removal is reserved for an
.B mdadm --remove
event. In the external metadata case the container holds the final
reference on a block device and an
.B mdadm --remove <container> <victim>
call is still required.
.RE
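.PP
The actions listed above can be summarised as a lookup table. The Python
sketch below is illustrative only. The names ACTIONS, describe_action and
read_only_decision are hypothetical and are not part of mdmon, and the keys
are the labels used in this page, which may not match the literal strings
found in sysfs.
.PP
.nf
```python
# Illustrative summary of the actions described above.
# All names here are hypothetical; this is not mdmon source code.

# (attribute, event label used in this page) -> action taken
ACTIONS = {
    ("array_state", "inactive"): "clear the dirty bit and let the array be stopped",
    ("array_state", "write pending"): "set the dirty bit, then set array_state to active",
    ("array_state", "active-idle"): "set array_state to clean",
    ("array_state", "clean"): "clear the dirty bit",
    ("sync_action", "resync-to-idle"): "notify the handler that a resync may have completed",
    ("sync_action", "recover-to-idle"): "tell the handler the state of each disk",
    ("<disk>/state", "faulty"): "record the failure, unblock the kernel, remove the disk",
}


def describe_action(attribute, event):
    """Return the action for an event, or None if this page lists none."""
    return ACTIONS.get((attribute, event))


def read_only_decision(monitored, needs_resync):
    """Pick the next array state for an array found in the read-only state."""
    if not monitored:
        return "read-only"   # case 3/: volume marked as not monitored
    if needs_resync:
        return "active"      # case 2/: resync or other manipulation needed
    return "read-auto"       # case 1/: nothing to do, dirty bit stays clear
```
.fi
.PP
The read-only case is kept as a separate function because it is the only
event whose outcome depends on what the metadata handler reports.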

.SS Containers:
.P
External metadata formats, like DDF, differ from the native MD metadata
formats in that they define a set of disks and a series of sub-arrays
within those disks. MD metadata in comparison defines a 1:1
relationship between a set of block devices and a raid array. For
example, to create two arrays at different raid levels on a single
set of disks, MD metadata requires the disks to be partitioned and then
each array can be created with a subset of those partitions. The
supported external formats perform this disk carving internally.
.P
Container devices simply hold references to all member disks and allow
tools like
.I mdmon
to determine which active arrays belong to which
container. Some array management commands like disk removal and disk
add are now only valid at the container level. Attempts to perform
these actions on member arrays are blocked with error messages like:
.IP
"mdadm: Cannot remove disks from a \'member\' array, perform this
operation on the parent container"
.P
Containers are identified in /proc/mdstat with a metadata version string
"external:<metadata name>". Member devices are identified by
"external:/<container device>/<member index>", or "external:-<container
device>/<member index>" if the array is to remain read-only.
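.PP
The three string forms above can be told apart mechanically. The following
Python sketch shows one way to do so. The names MetadataVersion and
parse_metadata_version are hypothetical, and the sample values "ddf",
"md127" and "0" are assumptions chosen for illustration. It is not the
parser used by mdadm or mdmon.
.PP
.nf
```python
from typing import NamedTuple, Optional


class MetadataVersion(NamedTuple):
    kind: str                    # "container" or "member"
    name: str                    # metadata name, or container device for members
    member_index: Optional[str]  # None for containers
    read_only: bool              # True for the "external:-..." form


def parse_metadata_version(text):
    """Classify a metadata version string of the forms described above."""
    prefix = "external:"
    if not text.startswith(prefix):
        raise ValueError("not an external metadata version: " + repr(text))
    rest = text[len(prefix):]
    if rest[:1] in ("/", "-"):
        # Member array: <container device>/<member index> follows the marker.
        container, sep, index = rest[1:].partition("/")
        if not (container and sep and index):
            raise ValueError("malformed member string: " + repr(text))
        return MetadataVersion("member", container, index, rest[0] == "-")
    if not rest:
        raise ValueError("missing metadata name: " + repr(text))
    return MetadataVersion("container", rest, None, False)
```
.fi
.PP
With these assumptions, "external:-md127/0" is reported as member 0 of
container md127 that is to remain read-only, while "external:ddf" is
reported as a container.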

.SH OPTIONS
.TP
CONTAINER
The
.B container
device to monitor. It can be a full path like /dev/md/container, or a
simple md device name like md127.
.TP
.B \-\-takeover
This instructs
.I mdmon
to replace any active
.I mdmon
which is currently monitoring the array. This is primarily used late
in the boot process to replace any
.I mdmon
which was started from an
.B initramfs
before the root filesystem was mounted. This avoids holding a
reference on that
.B initramfs
indefinitely and ensures that the
.I pid
and
.I sock
files used to communicate with
.I mdmon
are in a standard place.
.TP
.B \-\-all
This tells mdmon to find any active containers and start monitoring
each of them if appropriate. This is normally used with
.B \-\-takeover
late in the boot sequence.
A separate
.I mdmon
process is started for each container as the
.B \-\-all
argument is overwritten with the name of the container. To allow for
containers with names longer than 5 characters, this argument can be
arbitrarily extended, e.g. to
.BR \-\-all-active-arrays .
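.PP
The 5 character limit mentioned above matches the length of the
.B \-\-all
argument itself. The Python sketch below illustrates that constraint,
assuming that an in-place overwrite cannot grow the argument. The function
name fits_in_argument and the sample container names are hypothetical, and
the exact overwrite mechanics inside mdmon are not shown.
.PP
.nf
```python
def fits_in_argument(argument, container):
    """Return True if the container name can replace the argument in place.

    Assumption: the overwrite reuses the space of the original argument,
    so the name may be at most as long as that argument.
    """
    return len(container) <= len(argument)


# "--all" is 5 characters long, so only short names fit.
short_ok = fits_in_argument("--all", "md127")
long_fails = fits_in_argument("--all", "md_container")
# Extending the argument makes room for the longer name.
long_ok = fits_in_argument("--all-active-arrays", "md_container")
```
.fi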

.PP
Note that
.I mdmon
is automatically started by
.I mdadm
when needed and so does not need to be considered when working with
RAID arrays. The only time it is run other than by
.I mdadm
is when the boot scripts need to restart it after mounting the new
root filesystem.

.SH START UP AND SHUTDOWN

As
.I mdmon
needs to be running whenever any filesystem on the monitored device is
mounted, there are special considerations when the root filesystem is
mounted from an
.I mdmon
monitored device.

When the array is assembled by the
.B initramfs
code, mdadm will automatically start
.I mdmon
as required. This means that
.I mdmon
must be installed on the
.B initramfs
and there must be a writable filesystem (typically tmpfs) on which
.I mdmon
can create a
.B .pid
and
.B .sock
file. The particular filesystem to use is given to mdmon at compile
time and defaults to
.BR /lib/init/rw .

This filesystem must persist through to the end of the boot sequence.

After the final root filesystem has been instantiated (usually with
.BR pivot_root )
and after
.B /var
is mounted writable,
.I mdmon
should be run with
.I "\-\-all \-\-takeover"
so that the
.I mdmon
running from the
.B initramfs
can be replaced with one running in the main root.

At shutdown time,
.I mdmon
should not be killed along with other processes. Also, as it holds a
file (socket actually) open in
.B /var
it will not be possible to unmount
.B /var
if it is a separate filesystem. Rather, the
.B /var
filesystem, like the root filesystem, should be remounted read-only.
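.PP
The ordering rules above can be expressed as two small checks that a boot
or shutdown script might apply. The Python sketch below is a minimal
illustration only. The function names takeover_command and
processes_to_kill are hypothetical, and real boot scripts will differ
between distributions.
.PP
.nf
```python
def takeover_command(root_instantiated, var_writable):
    """Return the argv for the late-boot takeover, or None if it is too early."""
    if root_instantiated and var_writable:
        return ["mdmon", "--all", "--takeover"]
    return None


def processes_to_kill(process_names):
    """Filter a shutdown kill list so that mdmon keeps running."""
    return [name for name in process_names if name != "mdmon"]
```
.fi
.PP
Under these assumptions the takeover command is produced only once both
conditions hold, and mdmon is never included in the list of processes to
kill at shutdown.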

.SH EXAMPLES

.B " mdmon \-\-all-active-arrays \-\-takeover"
.br
Any
.I mdmon
which is currently running is killed and a new instance is started.
This should be run late in the boot sequence and particularly after
.B /var
is mounted and writable.
.SH SEE ALSO
.IR mdadm (8),
.IR md (4).
247 .SH SEE ALSO
248 .IR mdadm (8),
249 .IR md (4).