Commit | Line | Data |
---|---|---|
7675959b | 1 | .\" See file COPYING in distribution for details. |
b9d77223 | 2 | .TH MDMON 8 "" v3.0-devel3 |
7675959b DW |
3 | .SH NAME |
4 | mdmon \- monitor MD external metadata arrays | |
5 | ||
6 | .SH SYNOPSIS | |
7 | ||
8 | .BI mdmon " CONTAINER [NEWROOT]" | |
9 | ||
10 | .SH OVERVIEW | |
11 | The 2.6.27 kernel brings the ability to support external metadata arrays. | |
12 | External metadata implies that user space handles all updates to the metadata. | |
13 | The kernel's responsibility is to notify user space when a "metadata event" | |
14 | occurs, such as a disk failure or a clean-to-dirty transition. The kernel, in | |
15 | important cases, waits for user space to take action on these notifications. | |
16 | ||
17 | .SH DESCRIPTION | |
18 | .P | |
19 | .B Metadata updates: | |
20 | .P | |
21 | To service metadata update requests, a daemon, mdmon, is introduced. | |
22 | mdmon polls the sysfs namespace, looking for changes in the | |
23 | .BR array_state , | |
24 | .BR sync_action , | |
25 | and per-disk | |
26 | .BR state | |
27 | attributes. When a change is detected, it calls a per-metadata-type | |
28 | handler to make modifications to the metadata. The following actions | |
29 | are taken: | |
30 | .RS | |
31 | .TP | |
32 | .B array_state \- inactive | |
33 | Clear the dirty bit for the volume and let the array be stopped | |
34 | .TP | |
35 | .B array_state \- write pending | |
36 | Set the dirty bit for the array and then set | |
37 | .B array_state | |
38 | to | |
39 | .BR active . | |
40 | Writes | |
41 | are blocked until user space writes | |
42 | .BR active . | |
43 | .TP | |
44 | .B array_state \- active-idle | |
45 | The safe mode timer has expired, so set the array state to clean to block writes to the array | |
46 | .TP | |
47 | .B array_state \- clean | |
48 | Clear the dirty bit for the volume | |
49 | .TP | |
50 | .B array_state \- read-only | |
51 | This is the initial state of every array. mdmon takes one of three actions: | |
52 | .RS | |
53 | .TP | |
54 | 1/ | |
55 | Transition the array to read\-auto, keeping the dirty bit clear, if the metadata | |
56 | handler determines that the array does not need resyncing or other modification | |
57 | .TP | |
58 | 2/ | |
59 | Transition the array to active if the metadata handler determines a resync or | |
60 | some other manipulation is necessary | |
61 | .TP | |
62 | 3/ | |
63 | Leave the array read\-only if the volume is marked to not be monitored; for | |
64 | example, the metadata version has been set to "external:\-dev/md127" instead of | |
65 | "external:/dev/md127" | |
66 | .RE | |
67 | .TP | |
68 | .B sync_action \- resync\-to\-idle | |
69 | Notify the metadata handler that a resync may have completed. If a resync | |
70 | process is idled before it completes, this event allows the metadata handler to | |
71 | checkpoint resync. | |
72 | .TP | |
73 | .B sync_action \- recover\-to\-idle | |
74 | A spare may have completed rebuilding so tell the metadata handler about the | |
75 | state of each disk. This is the metadata handler’s opportunity to clear any | |
76 | "out-of-sync" bits and clear the volume’s degraded status. If a recovery | |
77 | process is idled before it completes, this event allows the metadata handler to | |
78 | checkpoint recovery. | |
79 | .TP | |
80 | .B <disk>/state \- faulty | |
81 | A disk failure kicks off a series of events. First, notify the metadata | |
82 | handler that a disk has failed, and then notify the kernel that it can unblock | |
83 | writes that were dependent on this disk. After unblocking the kernel, this disk | |
84 | is set to be removed* from the member array. Finally, the disk is marked failed | |
85 | in all other member arrays in the container. | |
86 | .IP | |
87 | * Note: This behavior differs slightly from native MD arrays, where | |
88 | removal is reserved for a | |
89 | .B mdadm --remove | |
90 | event. In the external metadata case the container holds the final | |
91 | reference on a block device and a | |
92 | .B mdadm --remove <container> <victim> | |
93 | call is still required. | |
94 | .RE | |
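.P
The array_state handling listed above can be pictured as a small dispatch
function. This is only an illustrative sketch, not mdmon's implementation: the
function name, the needs_resync and monitored parameters, and the action
strings are hypothetical, and the states are keyed by the labels used in this
page, which may be spelled differently from the literal strings the kernel
reports in sysfs.

```python
from typing import List


def actions_for_array_state(label: str,
                            needs_resync: bool = False,
                            monitored: bool = True) -> List[str]:
    """Map an array_state label from this page to the steps it describes.

    Hypothetical helper: the returned strings are descriptions only,
    nothing is written to sysfs or to the metadata.
    """
    if label == "inactive":
        return ["clear dirty bit", "let array stop"]
    if label == "write pending":
        # Writes stay blocked until user space sets array_state to active.
        return ["set dirty bit", "set array_state to active"]
    if label == "active-idle":
        # Safe mode timer expired: go back to clean to block writes.
        return ["set array_state to clean"]
    if label == "clean":
        return ["clear dirty bit"]
    if label == "read-only":
        if not monitored:
            return []  # case 3/: leave the array read-only
        if needs_resync:
            return ["set array_state to active"]  # case 2/
        return ["set array_state to read-auto"]  # case 1/, dirty bit stays clear
    return []  # labels not covered by this page: no action in this sketch
```

For example, a read-only array that the metadata handler wants resynced maps
to the single step of setting array_state to active.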
95 | ||
96 | .P | |
97 | .B Containers: | |
98 | .P | |
99 | External metadata formats, like DDF, differ from the native MD metadata | |
100 | formats in that they define a set of disks and a series of sub-arrays | |
101 | within those disks. MD metadata, in comparison, defines a 1:1 | |
102 | relationship between a set of block devices and a raid array. For | |
103 | example, to create two arrays at different raid levels on a single | |
104 | set of disks, MD metadata requires that the disks be partitioned so that | |
105 | each array can be created from a subset of those partitions. The | |
106 | supported external formats perform this disk carving internally. | |
107 | .P | |
108 | Container devices simply hold references to all member disks and allow | |
109 | tools like mdmon to determine which active arrays belong to which | |
110 | container. Some array management commands like disk removal and disk | |
111 | add are now only valid at the container level. Attempts to perform | |
112 | these actions on member arrays are blocked with error messages like: | |
113 | .IP | |
114 | "mdadm: Cannot remove disks from a \'member\' array, perform this | |
115 | operation on the parent container" | |
116 | .P | |
117 | Containers are identified in /proc/mdstat with a metadata version string | |
118 | "external:<metadata name>". Member devices are identified by | |
119 | "external:/<container device>/<member index>", or "external:-<container | |
120 | device>/<member index>" if the array is to remain read\-only. | |
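.P
As a rough illustration of the naming scheme just described, the following
sketch splits a metadata version string into its parts. It assumes only the
three forms given above; the MetadataVersion type and the
parse_metadata_version function are hypothetical names and are not part of
mdadm or mdmon.

```python
from typing import NamedTuple, Optional


class MetadataVersion(NamedTuple):
    is_container: bool
    name: str                    # metadata name, or container device for a member
    member_index: Optional[str]  # None for a container
    monitored: bool              # only meaningful for member arrays


def parse_metadata_version(text: str) -> Optional[MetadataVersion]:
    """Parse an "external:..." version string; return None for other input."""
    prefix = "external:"
    if not text.startswith(prefix):
        return None
    rest = text[len(prefix):]
    if rest[:1] in ("/", "-"):
        # Member array: "/" means monitored, "-" means keep it read-only.
        monitored = rest[0] == "/"
        container, sep, index = rest[1:].rpartition("/")
        if not sep or not container or not index:
            return None
        return MetadataVersion(False, container, index, monitored)
    if not rest:
        return None
    # Container: the remainder is the metadata name, for example "ddf".
    return MetadataVersion(True, rest, None, True)
```

With this sketch, "external:/md127/0" is read as member 0 of container md127
with monitoring enabled, while "external:-md127/0" names the same member
with monitoring disabled.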
121 | ||
122 | .SH OPTIONS | |
123 | .TP | |
124 | CONTAINER | |
125 | The | |
126 | .B container | |
127 | device to monitor. It can be a full path like /dev/md/container, a simple md | |
128 | device name like md127, or /proc/mdstat, which tells mdmon to scan for | |
129 | containers and launch an mdmon instance for each one found. | |
130 | .TP | |
131 | [NEWROOT] | |
132 | To support an external metadata raid array as the rootfs, mdmon needs | |
133 | to be started in the initramfs environment. Once the initramfs environment | |
134 | mounts the final rootfs, mdmon needs to be restarted in the new namespace. When | |
135 | NEWROOT is specified, mdmon will terminate any mdmon instances that are running | |
136 | in the current namespace, chroot(2) to NEWROOT, and continue monitoring the | |
137 | container. | |
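.P
The three accepted forms of the CONTAINER argument can be told apart with a
few string checks. The sketch below is an assumption about how a wrapper
script might classify the argument before invoking mdmon; the function name
and the returned labels are hypothetical, and it does not reproduce mdmon's
own argument handling.

```python
def classify_container_arg(arg: str) -> str:
    """Classify a CONTAINER argument as "scan", "path", or "name".

    "scan": the literal /proc/mdstat, meaning scan for all containers
    "path": a full path such as /dev/md/container
    "name": a simple md device name such as md127
    """
    if arg == "/proc/mdstat":
        return "scan"
    if arg.startswith("/"):
        return "path"
    return "name"
```

A caller could use the result to decide whether to start one monitor for a
single container or to rely on the scan behavior described under CONTAINER.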
138 |