Commit | Line | Data |
---|---|---|
7675959b | 1 | .\" See file COPYING in distribution for details. |
c61b1c0b | 2 | .TH MDMON 8 "" v3.4 |
7675959b DW |
3 | .SH NAME |
4 | mdmon \- monitor MD external metadata arrays | |
5 | ||
6 | .SH SYNOPSIS | |
7 | ||
03041982 | 8 | .BI mdmon " [--all] [--takeover] [--foreground] CONTAINER" |
7675959b DW |
9 | |
10 | .SH OVERVIEW | |
11 | The 2.6.27 kernel brings the ability to support external metadata arrays. | |
12 | External metadata implies that user space handles all updates to the metadata. | |
13 | The kernel's responsibility is to notify user space when a "metadata event" | |
14 | occurs, like disk failures and clean-to-dirty transitions. The kernel, in | |
15 | important cases, waits for user space to take action on these notifications. | |
16 | ||
17 | .SH DESCRIPTION | |
e0fe762a N |
18 | .SS Metadata updates: |
19 | To service metadata update requests a daemon, | |
20 | .IR mdmon , | |
21 | is introduced. | |
22 | .I Mdmon | |
23 | is tasked with polling the sysfs namespace looking for changes in | |
cd9a8b5c | 24 | .BR array_state , |
7675959b DW |
25 | .BR sync_action , |
26 | and per disk | |
27 | .BR state | |
28 | attributes. When a change is detected it calls a per metadata type | |
29 | handler to make modifications to the metadata. The following actions | |
30 | are taken: | |
31 | .RS | |
32 | .TP | |
33 | .B array_state \- inactive | |
34 | Clear the dirty bit for the volume and let the array be stopped | |
35 | .TP | |
36 | .B array_state \- write pending | |
37 | Set the dirty bit for the array and then set | |
38 | .B array_state | |
39 | to | |
40 | .BR active . | |
41 | Writes | |
42 | are blocked until userspace writes | |
43 | .BR active . | |
44 | .TP | |
45 | .B array_state \- active-idle | |
46 | The safe mode timer has expired so set array state to clean to block writes to the array | |
47 | .TP | |
48 | .B array_state \- clean | |
49 | Clear the dirty bit for the volume | |
50 | .TP | |
51 | .B array_state \- read-only | |
e0fe762a N |
52 | This is the initial state that all arrays start at. |
53 | .I mdmon | |
54 | takes one of three actions: | |
7675959b DW |
55 | .RS |
56 | .TP | |
57 | 1/ | |
58 | Transition the array to read-auto keeping the dirty bit clear if the metadata | |
59 | handler determines that the array does not need resyncing or other modification | |
60 | .TP | |
61 | 2/ | |
62 | Transition the array to active if the metadata handler determines a resync or | |
63 | some other manipulation is necessary | |
64 | .TP | |
65 | 3/ | |
66 | Leave the array read\-only if the volume is marked to not be monitored; for | |
67 | example, the metadata version has been set to "external:\-dev/md127" instead of | |
68 | "external:/dev/md127" | |
69 | .RE | |
70 | .TP | |
71 | .B sync_action \- resync\-to\-idle | |
72 | Notify the metadata handler that a resync may have completed. If a resync | |
73 | process is idled before it completes, this event allows the metadata handler to | |
74 | checkpoint resync. | |
75 | .TP | |
76 | .B sync_action \- recover\-to\-idle | |
77 | A spare may have completed rebuilding so tell the metadata handler about the | |
e0fe762a N |
78 | state of each disk. This is the metadata handler's opportunity to clear |
79 | any "out-of-sync" bits and clear the volume's degraded status. If a recovery | |
7675959b DW |
80 | process is idled before it completes, this event allows the metadata handler to |
81 | checkpoint recovery. | |
82 | .TP | |
83 | .B <disk>/state \- faulty | |
84 | A disk failure kicks off a series of events. First, notify the metadata | |
85 | handler that a disk has failed, and then notify the kernel that it can unblock | |
86 | writes that were dependent on this disk. After unblocking the kernel, this disk | |
e0fe762a | 87 | is set to be removed+ from the member array. Finally the disk is marked failed |
7675959b DW |
88 | in all other member arrays in the container. |
89 | .IP | |
e0fe762a | 90 | + Note: This behavior differs slightly from native MD arrays where |
7675959b DW |
91 | removal is reserved for a |
92 | .B mdadm --remove | |
93 | event. In the external metadata case the container holds the final | |
94 | reference on a block device and a | |
95 | .B mdadm --remove <container> <victim> | |
96 | call is still required. | |
97 | .RE | |
98 | ||
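
The list of actions above can be summarised as a small lookup table. The following is only an illustrative Python sketch, not mdmon's implementation (mdmon itself is written in C). The function names, the dictionary, and the action summaries are hypothetical. It also assumes that sysfs spells the states `write-pending` and `readonly`, where this page writes "write pending" and "read-only", and that the attribute lives at `/sys/block/<md device>/md/array_state`.

```python
from pathlib import Path

# Hypothetical summary of the per-state actions listed in this man page.
# The keys are assumed to follow the sysfs spelling of each state.
ARRAY_STATE_ACTIONS = {
    "inactive": "clear the dirty bit and let the array be stopped",
    "write-pending": "set the dirty bit, then write 'active' to array_state",
    "active-idle": "safe mode timer expired: set array_state to 'clean'",
    "clean": "clear the dirty bit",
    "readonly": "go to read-auto, go to active, or stay read-only",
}


def normalise(state: str) -> str:
    """Map the man page spelling of a state onto the assumed sysfs spelling."""
    state = state.strip().lower().replace(" ", "-")
    return state.replace("read-only", "readonly")


def describe_action(state: str) -> str:
    """Return the summarised action for a state, or a fallback string."""
    return ARRAY_STATE_ACTIONS.get(normalise(state), "no action listed")


def read_array_state(md_name: str, sysfs_root: str = "/sys/block") -> str:
    """Read array_state for one md device (path layout is an assumption)."""
    return Path(sysfs_root, md_name, "md", "array_state").read_text().strip()
```

A caller could combine the two, for example `describe_action(read_array_state("md126"))`, where `md126` is a placeholder device name.
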
e0fe762a | 99 | .SS Containers: |
7675959b DW |
100 | .P |
101 | External metadata formats, like DDF, differ from the native MD metadata | |
102 | formats in that they define a set of disks and a series of sub-arrays | |
103 | within those disks. MD metadata in comparison defines a 1:1 | |
956a13fb CAM |
104 | relationship between a set of block devices and a RAID array. For |
105 | example, to create two arrays at different RAID levels on a single | |
7675959b | 106 | set of disks, MD metadata requires the disks be partitioned and then |
2f48b33d | 107 | each array can be created with a subset of those partitions. The |
7675959b DW |
108 | supported external formats perform this disk carving internally. |
109 | .P | |
110 | Container devices simply hold references to all member disks and allow | |
e0fe762a N |
111 | tools like |
112 | .I mdmon | |
113 | to determine which active arrays belong to which | |
7675959b DW |
114 | container. Some array management commands like disk removal and disk |
115 | add are now only valid at the container level. Attempts to perform | |
116 | these actions on member arrays are blocked with error messages like: | |
117 | .IP | |
118 | "mdadm: Cannot remove disks from a \'member\' array, perform this | |
119 | operation on the parent container" | |
120 | .P | |
121 | Containers are identified in /proc/mdstat with a metadata version string | |
122 | "external:<metadata name>". Member devices are identified by | |
123 | "external:/<container device>/<member index>", or "external:-<container | |
124 | device>/<member index>" if the array is to remain read-only. | |
125 | ||
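
To make the metadata version strings described above concrete, here is a minimal Python sketch that splits such a string into its parts. The `MetadataVersion` type and `parse_metadata_version` function are hypothetical names introduced for illustration. The parsing rules are inferred only from the three forms given in this section and are not taken from mdadm's source.

```python
from typing import NamedTuple, Optional


class MetadataVersion(NamedTuple):
    kind: str                    # "container" or "member"
    name: str                    # metadata name, or the container device
    member_index: Optional[str]  # None for containers
    monitored: Optional[bool]    # False when the "-" (read-only) form is used


def parse_metadata_version(text: str) -> MetadataVersion:
    """Parse an "external:..." metadata version string."""
    prefix = "external:"
    if not text.startswith(prefix):
        raise ValueError(f"not an external metadata version: {text!r}")
    rest = text[len(prefix):]

    if rest.startswith("/"):
        monitored = True       # external:/<container device>/<member index>
    elif rest.startswith("-"):
        monitored = False      # external:-<container device>/<member index>
    else:
        # external:<metadata name> identifies a container
        return MetadataVersion("container", rest, None, None)

    container, sep, index = rest[1:].rpartition("/")
    if not sep or not container or not index:
        raise ValueError(f"malformed member version string: {text!r}")
    return MetadataVersion("member", container, index, monitored)
```

For example, `parse_metadata_version("external:-md127/0")` yields a member entry for container `md127`, index `0`, with `monitored` set to `False`.
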
126 | .SH OPTIONS | |
127 | .TP | |
128 | CONTAINER | |
129 | The | |
130 | .B container | |
b5c727dc N |
131 | device to monitor. It can be a full path like /dev/md/container, or a |
132 | simple md device name like md127. | |
7675959b | 133 | .TP |
03041982 N |
134 | .B \-\-foreground |
135 | Normally, | |
136 | .I mdmon | |
137 | will fork and continue in the background. Adding this option will | |
138 | skip that step and run | |
139 | .I mdmon | |
140 | in the foreground. | |
141 | .TP | |
b5c727dc N |
142 | .B \-\-takeover |
143 | This instructs | |
144 | .I mdmon | |
145 | to replace any active | |
146 | .I mdmon | |
147 | which is currently monitoring the array. This is primarily used late | |
148 | in the boot process to replace any | |
149 | .I mdmon | |
150 | which was started from an | |
151 | .B initramfs | |
152 | before the root filesystem was mounted. This avoids holding a | |
153 | reference on that | |
154 | .B initramfs | |
155 | indefinitely and ensures that the | |
156 | .I pid | |
157 | and | |
158 | .I sock | |
159 | files used to communicate with | |
160 | .I mdmon | |
161 | are in a standard place. | |
5d4d1b26 | 162 | .TP |
b5c727dc N |
163 | .B \-\-all |
164 | This tells mdmon to find any active containers and start monitoring | |
165 | each of them if appropriate. This is normally used with | |
166 | .B \-\-takeover | |
167 | late in the boot sequence. | |
eb49460b LB |
168 | A separate |
169 | .I mdmon | |
170 | process is started for each container as the | |
171 | .B \-\-all | |
172 | argument is overwritten with the name of the container. To allow for | |
173 | containers with names longer than 5 characters, this argument can be | |
174 | arbitrarily extended, e.g. to | |
175 | .BR \-\-all-active-arrays . | |
da827518 | 176 | .TP |
5d4d1b26 | 177 | |
e0fe762a N |
178 | .PP |
179 | Note that | |
180 | .I mdmon | |
181 | is automatically started by | |
182 | .I mdadm | |
183 | when needed and so does not need to be considered when working with | |
2f48b33d | 184 | RAID arrays. The only time it is run other than by |
e0fe762a N |
185 | .I mdadm |
186 | is when the boot scripts need to restart it after mounting the new | |
187 | root filesystem. | |
7675959b | 188 | |
cd9a8b5c N |
189 | .SH START UP AND SHUTDOWN |
190 | ||
191 | As | |
192 | .I mdmon | |
193 | needs to be running whenever any filesystem on the monitored device is | |
194 | mounted, there are special considerations when the root filesystem is | |
195 | mounted from an | |
196 | .I mdmon | |
197 | monitored device. | |
ecdbb368 N |
198 | Note that in general |
199 | .I mdmon | |
200 | is needed even if the filesystem is mounted read-only as some | |
201 | filesystems can still write to the device in those circumstances, for | |
202 | example to replay a journal after an unclean shutdown. | |
cd9a8b5c N |
203 | |
204 | When the array is assembled by the | |
205 | .B initramfs | |
206 | code, mdadm will automatically start | |
207 | .I mdmon | |
208 | as required. This means that | |
209 | .I mdmon | |
210 | must be installed on the | |
211 | .B initramfs | |
9fdcb471 | 212 | and there must be a writable filesystem (typically tmpfs) in which |
cd9a8b5c N |
213 | .B mdmon |
214 | can create a | |
215 | .B .pid | |
216 | and | |
217 | .B .sock | |
9fdcb471 | 218 | file. The particular filesystem to use is given to mdmon at compile |
cd9a8b5c | 219 | time and defaults to |
96fd06ed | 220 | .BR /run/mdadm . |
cd9a8b5c | 221 | |
9fdcb471 | 222 | This filesystem must persist through to shutdown time. |
cd9a8b5c N |
223 | |
224 | After the final root filesystem has been instantiated (usually with | |
225 | .BR pivot_root ) | |
cd9a8b5c N |
226 | .I mdmon |
227 | should be run with | |
228 | .I "\-\-all \-\-takeover" | |
229 | so that the | |
230 | .I mdmon | |
231 | running from the | |
232 | .B initramfs | |
9fdcb471 N |
233 | can be replaced with one running in the main root, and so the |
234 | memory used by the initramfs can be released. | |
cd9a8b5c N |
235 | |
236 | At shutdown time, | |
237 | .I mdmon | |
238 | should not be killed along with other processes. Also as it holds a | |
239 | file (socket actually) open in | |
9fdcb471 N |
240 | .B /dev |
241 | (by default) it will not be possible to unmount | |
242 | .B /dev | |
243 | if it is a separate filesystem. | |
cd9a8b5c | 244 | |
b5c727dc | 245 | .SH EXAMPLES |
5d4d1b26 | 246 | |
eb49460b | 247 | .B " mdmon \-\-all-active-arrays \-\-takeover" |
5d4d1b26 N |
248 | .br |
249 | Any | |
250 | .I mdmon | |
251 | which is currently running is killed and a new instance is started. | |
9fdcb471 N |
252 | This should be run during the boot sequence if an initramfs was | |
253 | used, so that any mdmon running from the initramfs will not hold | |
254 | the initramfs active. | |
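
A boot script written in Python might wrap the takeover command shown above roughly as follows. This is a sketch under stated assumptions: the helper names `takeover_argv` and `run_takeover` are hypothetical, and `mdmon` is assumed to be on `PATH`. Only the `--all`, `--all-active-arrays`, and `--takeover` arguments come from this page.

```python
import subprocess
from typing import List


def takeover_argv(extended: bool = True, mdmon: str = "mdmon") -> List[str]:
    """Build the argument vector for the late-boot takeover call.

    The extended form leaves room for container names longer than
    5 characters, since the argument is overwritten with the name.
    """
    all_flag = "--all-active-arrays" if extended else "--all"
    return [mdmon, all_flag, "--takeover"]


def run_takeover(dry_run: bool = True) -> List[str]:
    """Run the takeover command, or only return it when dry_run is set."""
    argv = takeover_argv()
    if not dry_run:
        # Raises CalledProcessError if mdmon exits with a non-zero status.
        subprocess.run(argv, check=True)
    return argv
```

Leaving `dry_run` at its default lets the command line be logged or inspected without replacing any running monitor.
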
e0fe762a N |
255 | .SH SEE ALSO |
256 | .IR mdadm (8), | |
257 | .IR md (4). |