Commit | Line | Data |
---|---|---|
7675959b | 1 | .\" See file COPYING in distribution for details. |
40bc78f5 | 2 | .TH MDMON 8 "" v3.1.1 |
7675959b DW |
3 | .SH NAME |
4 | mdmon \- monitor MD external metadata arrays | |
5 | ||
6 | .SH SYNOPSIS | |
7 | ||
b5c727dc | 8 | .BI mdmon " [--all] [--takeover] CONTAINER" |
7675959b DW |
9 | |
10 | .SH OVERVIEW | |
11 | The 2.6.27 kernel brings the ability to support external metadata arrays. | |
12 | External metadata implies that user space handles all updates to the metadata. | |
13 | The kernel's responsibility is to notify user space when a "metadata event" | |
14 | occurs, like disk failures and clean-to-dirty transitions. The kernel, in | |
15 | important cases, waits for user space to take action on these notifications. | |
16 | ||
17 | .SH DESCRIPTION | |
e0fe762a N |
18 | .SS Metadata updates: |
19 | To service metadata update requests a daemon, | |
20 | .IR mdmon , | |
21 | is introduced. | |
22 | .I Mdmon | |
23 | is tasked with polling the sysfs namespace looking for changes in | |
7675959b DW |
24 | .BR array_state , |
25 | .BR sync_action , | |
26 | and per disk | |
27 | .BR state | |
28 | attributes. When a change is detected it calls a per metadata type | |
29 | handler to make modifications to the metadata. The following actions | |
30 | are taken: | |
31 | .RS | |
32 | .TP | |
33 | .B array_state \- inactive | |
34 | Clear the dirty bit for the volume and let the array be stopped | |
35 | .TP | |
36 | .B array_state \- write pending | |
37 | Set the dirty bit for the array and then set | |
38 | .B array_state | |
39 | to | |
40 | .BR active . | |
41 | Writes | |
42 | are blocked until userspace writes | |
43 | .BR active . | |
44 | .TP | |
45 | .B array_state \- active-idle | |
46 | The safe mode timer has expired so set array state to clean to block writes to the array | |
47 | .TP | |
48 | .B array_state \- clean | |
49 | Clear the dirty bit for the volume | |
50 | .TP | |
51 | .B array_state \- read-only | |
e0fe762a N |
52 | This is the initial state that all arrays start at. |
53 | .I mdmon | |
54 | takes one of three actions: | |
7675959b DW |
55 | .RS |
56 | .TP | |
57 | 1/ | |
58 | Transition the array to read-auto keeping the dirty bit clear if the metadata | |
59 | handler determines that the array does not need resyncing or other modification | |
60 | .TP | |
61 | 2/ | |
62 | Transition the array to active if the metadata handler determines a resync or | |
63 | some other manipulation is necessary | |
64 | .TP | |
65 | 3/ | |
66 | Leave the array read\-only if the volume is marked to not be monitored; for | |
67 | example, the metadata version has been set to "external:\-dev/md127" instead of | |
68 | "external:/dev/md127" | |
69 | .RE | |
70 | .TP | |
71 | .B sync_action \- resync\-to\-idle | |
72 | Notify the metadata handler that a resync may have completed. If a resync | |
73 | process is idled before it completes, this event allows the metadata handler to | |
74 | checkpoint resync. | |
75 | .TP | |
76 | .B sync_action \- recover\-to\-idle | |
77 | A spare may have completed rebuilding so tell the metadata handler about the | |
e0fe762a N |
78 | state of each disk. This is the metadata handler's opportunity to clear |
79 | any "out-of-sync" bits and clear the volume's degraded status. If a recovery | |
7675959b DW |
80 | process is idled before it completes, this event allows the metadata handler to |
81 | checkpoint recovery. | |
82 | .TP | |
83 | .B <disk>/state \- faulty | |
84 | A disk failure kicks off a series of events. First, notify the metadata | |
85 | handler that a disk has failed, and then notify the kernel that it can unblock | |
86 | writes that were dependent on this disk. After unblocking the kernel this disk | |
e0fe762a | 87 | is set to be removed+ from the member array. Finally the disk is marked failed |
7675959b DW |
88 | in all other member arrays in the container. |
89 | .IP | |
e0fe762a | 90 | + Note: this behavior differs slightly from native MD arrays where |
7675959b DW |
91 | removal is reserved for a |
92 | .B mdadm --remove | |
93 | event. In the external metadata case the container holds the final | |
94 | reference on a block device and a | |
95 | .B mdadm --remove <container> <victim> | |
96 | call is still required. | |
97 | .RE | |
98 | ||
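
The array_state handling listed above can be read as a small decision table. The following is a minimal sketch of that table in Python, not mdmon's actual code. The function name, its arguments, and the returned tuple are hypothetical, and the state spellings ("write-pending", "active-idle", "readonly", "read-auto") are assumed to match the values the kernel exposes in sysfs rather than the spellings used in the headings above.

```python
from typing import Optional, Tuple


def handle_array_state(
    state: str,
    dirty: bool,
    needs_resync: bool = False,
    monitored: bool = True,
) -> Tuple[bool, Optional[str]]:
    """Return (new dirty bit, value to write to array_state or None).

    Hypothetical helper: it only models the actions described in the
    text and performs no I/O.
    """
    if state == "inactive":
        # Clear the dirty bit and let the array be stopped.
        return False, None
    if state == "write-pending":
        # Set the dirty bit, then unblock writes by writing "active".
        return True, "active"
    if state == "active-idle":
        # Safe mode timer expired: move the array to "clean".
        return dirty, "clean"
    if state == "clean":
        # Clear the dirty bit for the volume.
        return False, None
    if state == "readonly":
        if not monitored:
            # Case 3: volume is marked as not monitored, leave it alone.
            return dirty, None
        if needs_resync:
            # Case 2: the text does not say what happens to the dirty
            # bit here, so this sketch leaves it unchanged.
            return dirty, "active"
        # Case 1: no resync needed, keep the dirty bit clear.
        return False, "read-auto"
    # Any other state is outside what the text describes.
    return dirty, None
```

For example, `handle_array_state("write-pending", dirty=False)` yields a set dirty bit together with the request to write `active`, which is the step that unblocks writes.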
e0fe762a | 99 | .SS Containers: |
7675959b DW |
100 | .P |
101 | External metadata formats, like DDF, differ from the native MD metadata | |
102 | formats in that they define a set of disks and a series of sub-arrays | |
103 | within those disks. MD metadata in comparison defines a 1:1 | |
104 | relationship between a set of block devices and a raid array. For | |
105 | example, to create 2 arrays at different raid levels on a single | |
106 | set of disks, MD metadata requires the disks be partitioned and then | |
107 | each array can be created with a subset of those partitions. The | |
108 | supported external formats perform this disk carving internally. | |
109 | .P | |
110 | Container devices simply hold references to all member disks and allow | |
e0fe762a N |
111 | tools like |
112 | .I mdmon | |
113 | to determine which active arrays belong to which | |
7675959b DW |
114 | container. Some array management commands like disk removal and disk |
115 | add are now only valid at the container level. Attempts to perform | |
116 | these actions on member arrays are blocked with error messages like: | |
117 | .IP | |
118 | "mdadm: Cannot remove disks from a \'member\' array, perform this | |
119 | operation on the parent container" | |
120 | .P | |
121 | Containers are identified in /proc/mdstat with a metadata version string | |
122 | "external:<metadata name>". Member devices are identified by | |
123 | "external:/<container device>/<member index>", or "external:-<container | |
124 | device>/<member index>" if the array is to remain read\-only. | |
125 | ||
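
As an illustration of the three metadata version forms described above, here is a small Python sketch that classifies such a string. The `MetadataVersion` type and `parse_metadata_version` function are hypothetical names introduced for this example. The sample strings in the usage note (`external:ddf`, `external:/md127/0`) are assumed values that follow the stated patterns, not output captured from a real system.

```python
from typing import NamedTuple, Optional


class MetadataVersion(NamedTuple):
    kind: str                    # "container" or "member"
    name: str                    # metadata name, or the container device
    member_index: Optional[str]  # None for containers
    read_only: bool              # True for the "external:-" form


def parse_metadata_version(text: str) -> MetadataVersion:
    """Classify an "external:..." metadata version string."""
    prefix = "external:"
    if not text.startswith(prefix):
        raise ValueError("not an external metadata version: %r" % text)
    rest = text[len(prefix):]
    if rest[:1] in ("/", "-"):
        # Member array: "/" means monitored, "-" means keep read-only.
        read_only = rest[0] == "-"
        container, sep, index = rest[1:].rpartition("/")
        if not sep or not container or not index:
            raise ValueError("malformed member string: %r" % text)
        return MetadataVersion("member", container, index, read_only)
    if not rest:
        raise ValueError("missing metadata name: %r" % text)
    # Container: the remainder is the metadata format name.
    return MetadataVersion("container", rest, None, False)
```

Under these assumptions, `parse_metadata_version("external:/md127/0")` reports a monitored member of `md127`, while the same string with `-` in place of the first `/` is reported as read-only.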
126 | .SH OPTIONS | |
127 | .TP | |
128 | CONTAINER | |
129 | The | |
130 | .B container | |
b5c727dc N |
131 | device to monitor. It can be a full path like /dev/md/container, or a |
132 | simple md device name like md127. | |
7675959b | 133 | .TP |
b5c727dc N |
134 | .B \-\-takeover |
135 | This instructs | |
136 | .I mdmon | |
137 | to replace any active | |
138 | .I mdmon | |
139 | which is currently monitoring the array. This is primarily used late | |
140 | in the boot process to replace any | |
141 | .I mdmon | |
142 | which was started from an | |
143 | .B initramfs | |
144 | before the root filesystem was mounted. This avoids holding a | |
145 | reference on that | |
146 | .B initramfs | |
147 | indefinitely and ensures that the | |
148 | .I pid | |
149 | and | |
150 | .I sock | |
151 | files used to communicate with | |
152 | .I mdmon | |
153 | are in a standard place. | |
5d4d1b26 | 154 | .TP |
b5c727dc N |
155 | .B \-\-all |
156 | This tells mdmon to find any active containers and start monitoring | |
157 | each of them if appropriate. This is normally used with | |
158 | .B \-\-takeover | |
159 | late in the boot sequence. | |
5d4d1b26 | 160 | |
e0fe762a N |
161 | .PP |
162 | Note that | |
163 | .I mdmon | |
164 | is automatically started by | |
165 | .I mdadm | |
166 | when needed and so does not need to be considered when working with | |
167 | RAID arrays. The only time it is run other than by | |
168 | .I mdadm | |
169 | is when the boot scripts need to restart it after mounting the new | |
170 | root filesystem. | |
7675959b | 171 | |
b5c727dc | 172 | .SH EXAMPLES |
5d4d1b26 | 173 | |
b5c727dc | 174 | .B " mdmon \-\-all \-\-takeover" |
5d4d1b26 N |
175 | .br |
176 | Any | |
177 | .I mdmon | |
178 | which is currently running is killed and a new instance is started. | |
b5c727dc N |
179 | This should be run late in the boot sequence and particularly after |
180 | .B /var | |
181 | is mounted and writable. | |
e0fe762a N |
182 | .SH SEE ALSO |
183 | .IR mdadm (8), | |
184 | .IR md (4). |