.TH MD 4
.SH NAME
md \- Multiple Device driver aka Linux Software Raid
.SH SYNOPSIS
.BI /dev/md n
.br
.BI /dev/md/ n
.SH DESCRIPTION
The
.B md
driver provides virtual devices that are created from one or more
independent underlying devices.  This array of devices often contains
redundancy, and hence the acronym RAID which stands for a Redundant
Array of Independent Devices.
.PP
.B md
supports RAID levels 1 (mirroring), 4 (striped array with parity
device), 5 (striped array with distributed parity information) and 6
(striped array with distributed dual redundancy information).  If some
number of underlying devices fails while using one of these levels,
the array will continue to function; this number is one for RAID
levels 4 and 5, two for RAID level 6, and all but one (N-1) for
RAID level 1.
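The failure tolerance described above can be sketched as a small table
(an illustration only; \fImax_failures\fP is a hypothetical helper, not
part of the md driver):

```python
# Failure tolerance per RAID level, as described in the text above.
# max_failures is a hypothetical helper for illustration, not a kernel API.
def max_failures(level, n_devices):
    """Number of devices that may fail while the array keeps functioning."""
    tolerance = {1: n_devices - 1,  # RAID1: all but one device may fail
                 4: 1,              # RAID4: one device
                 5: 1,              # RAID5: one device
                 6: 2}              # RAID6: any two devices
    return tolerance[level]

print(max_failures(1, 4), max_failures(5, 4), max_failures(6, 4))  # 3 1 2
```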
.PP
.B md
also supports a number of pseudo RAID (non-redundant) configurations
including RAID0 (striped array), LINEAR (catenated array) and
MULTIPATH (a set of different interfaces to the same device).

.SS MD SUPER BLOCK
With the exception of Legacy Arrays described below, each device that
is incorporated into an MD array has a
.I super block
written towards the end of the device.  This superblock records
information about the structure and state of the array so that the
array can be reliably re-assembled after a shutdown.

The superblock is 4K long and is written into a 64K aligned block that
starts at least 64K and less than 128K from the end of the device
(i.e. to get the address of the superblock round the size of the
device down to a multiple of 64K and then subtract 64K).
The available size of each device is the amount of space before the
super block, so between 64K and 128K is lost when a device is
incorporated into an MD array.

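The offset arithmetic above can be sketched as follows (a minimal
illustration; \fIsuperblock_offset\fP is a hypothetical helper, not a
kernel interface):

```python
# Superblock placement as described above: round the device size down to a
# multiple of 64K, then subtract 64K.  superblock_offset is a hypothetical
# helper for illustration only.
K64 = 64 * 1024

def superblock_offset(device_size):
    return (device_size // K64) * K64 - K64

size = 100 * 1024 * 1024          # a 100 MB device
off = superblock_offset(size)
# The superblock starts at least 64K and less than 128K from the end,
# and everything before it is available for data.
print(off, size - off)
```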
The superblock contains, among other things:
.TP
LEVEL
The manner in which the devices are arranged into the array
(linear, raid0, raid1, raid4, raid5, multipath).
.TP
UUID
a 128 bit Universally Unique Identifier that identifies the array that
this device is part of.

.SS LEGACY ARRAYS
Early versions of the
.B md
driver only supported Linear and Raid0 configurations and so
did not use an MD superblock (as there is no state that needs to be
recorded).  While it is strongly recommended that all newly created
arrays utilise a superblock to help ensure that they are assembled
properly, the
.B md
driver still supports legacy linear and raid0 md arrays that
do not have a superblock.

.SS LINEAR

A linear array simply catenates the available space on each
drive together to form one large virtual drive.

One advantage of this arrangement over the more common RAID0
arrangement is that the array may be reconfigured at a later time with
an extra drive, and so the array is made bigger without disturbing the
data that is on the array.  However this cannot be done on a live
array.

.SS RAID0

A RAID0 array (which has zero redundancy) is also known as a
striped array.
A RAID0 array is configured at creation with a
.B "Chunk Size"
which must be a power of two, and at least 4 kibibytes.

The RAID0 driver assigns the first chunk of the array to the first
device, the second chunk to the second device, and so on until all
drives have been assigned one chunk.  This collection of chunks forms
a
.BR stripe .
Further chunks are gathered into stripes in the same way, and are
assigned to the remaining space in the drives.

If devices in the array are not all the same size, then once the
smallest device has been exhausted, the RAID0 driver starts
collecting chunks into smaller stripes that only span the drives which
still have remaining space.

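The chunk-to-device mapping described above can be sketched for the
simple case of equal-sized devices (an illustration; \fIraid0_map\fP is
a hypothetical helper, and the smaller tail stripes of unequal devices
are not modelled):

```python
# RAID0 chunk placement for equal-sized devices, as described above:
# chunks go round-robin across devices, and each pass forms one stripe.
# raid0_map is a hypothetical helper for illustration only.
def raid0_map(chunk_index, n_devices):
    device = chunk_index % n_devices    # round-robin device choice
    stripe = chunk_index // n_devices   # which stripe this chunk lands in
    return device, stripe

# With 3 devices, chunks 0-2 form stripe 0 and chunks 3-5 form stripe 1.
print([raid0_map(i, 3) for i in range(6)])
# [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
```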
.SS RAID1

A RAID1 array is also known as a mirrored set (though mirrors tend to
provide reflected images, which RAID1 does not) or a plex.

Once initialised, each device in a RAID1 array contains exactly the
same data.  Changes are written to all devices in parallel.  Data is
read from any one device.  The driver attempts to distribute read
requests across all devices to maximise performance.

All devices in a RAID1 array should be the same size.  If they are
not, then only the amount of space available on the smallest device is
used.  Any extra space on other devices is wasted.

.SS RAID4

A RAID4 array is like a RAID0 array with an extra device for storing
parity.  This device is the last of the active devices in the
array.  Unlike RAID0, RAID4 also requires that all stripes span all
drives, so extra space on devices that are larger than the smallest is
wasted.

When any block in a RAID4 array is modified, the parity block for that
stripe (i.e. the block in the parity device at the same device offset
as the stripe) is also modified so that the parity block always
contains the "parity" for the whole stripe, i.e. its contents are
equivalent to the result of performing an exclusive-or operation
between all the data blocks in the stripe.

This allows the array to continue to function if one device fails.
The data that was on that device can be calculated as needed from the
parity block and the other data blocks.

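The exclusive-or relationship described above can be sketched directly
(an illustration with hypothetical 4-byte blocks; \fIxor_blocks\fP is
not a kernel interface):

```python
# Parity and reconstruction by exclusive-or, as described above.
# xor_blocks is a hypothetical helper operating on equal-length blocks.
def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xff\x00\xff\x00"]
parity = xor_blocks(data)           # parity block for the stripe

# If the second device fails, its block is the XOR of the parity block
# and the surviving data blocks.
recovered = xor_blocks([parity, data[0], data[2]])
print(recovered == data[1])   # True
```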
.SS RAID5

RAID5 is very similar to RAID4.  The difference is that the parity
blocks for each stripe, instead of being on a single device, are
distributed across all devices.  This allows more parallelism when
writing, as two different block updates will quite possibly affect
parity blocks on different devices, so there is less contention.

This also allows more parallelism when reading, as read requests are
distributed over all the devices in the array instead of all but one.

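One possible rotation of the parity block can be sketched as follows
(an illustration only; the kernel supports several parity layouts and
this is not necessarily the one the md driver uses):

```python
# A simple rotating parity placement: the parity block for stripe s moves
# one device to the left each stripe.  parity_device is a hypothetical
# helper and not necessarily the md driver's actual layout.
def parity_device(stripe, n_devices):
    return (n_devices - 1 - stripe) % n_devices

# With 4 devices, parity visits device 3, 2, 1, 0, then wraps to 3.
print([parity_device(s, 4) for s in range(5)])  # [3, 2, 1, 0, 3]
```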
.SS RAID6

RAID6 is similar to RAID5, but can handle the loss of any \fItwo\fP
devices without data loss.  Accordingly, it requires N+2 drives to
store N drives worth of data.

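The capacity arithmetic above can be sketched as follows
(\fIraid6_capacity\fP is a hypothetical helper for illustration):

```python
# N+2 drives store N drives worth of data, so two drives worth of space
# goes to the dual redundancy information.  raid6_capacity is a
# hypothetical helper for illustration only.
def raid6_capacity(n_drives, drive_size):
    return (n_drives - 2) * drive_size

print(raid6_capacity(6, 500))  # six 500 GB drives give 2000 GB of data space
```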
The performance for RAID6 is slightly lower than, but comparable to,
RAID5 in normal mode and single disk failure mode.  It is very slow in
dual disk failure mode, however.

.SS MULTIPATH

MULTIPATH is not really a RAID at all, as there is only one real device
in a MULTIPATH md array.  However there are multiple access points
(paths) to this device, and one of these paths might fail, so there
are some similarities.

A MULTIPATH array is composed of a number of logically different
devices, often fibre channel interfaces, that all refer to the same
real device.  If one of these interfaces fails (e.g. due to cable
problems), the multipath driver will attempt to redirect requests to
another interface.

.SS UNCLEAN SHUTDOWN

When changes are made to a RAID1, RAID4, RAID5 or RAID6 array there is
a possibility of inconsistency for short periods of time, as each
update requires at least two blocks to be written to different
devices, and these writes probably won't happen at exactly the same
time.
Thus if a system with one of these arrays is shut down in the middle
of a write operation (e.g. due to power failure), the array may not be
consistent.

To handle this situation, the md driver marks an array as "dirty"
before writing any data to it, and marks it as "clean" when the array
is being disabled, e.g. at shutdown.  If the md driver finds an array
to be dirty at startup, it proceeds to correct any possible
inconsistency.  For RAID1, this involves copying the contents of the
first drive onto all other drives.  For RAID4, RAID5 and RAID6 this
involves recalculating the parity for each stripe and making sure that
the parity block has the correct data.  This process, known as
"resynchronising" or "resync", is performed in the background.  The
array can still be used, though possibly with reduced performance.

If a RAID4, RAID5 or RAID6 array is degraded (missing at least one
drive) when it is restarted after an unclean shutdown, it cannot
recalculate parity, and so it is possible that data might be
undetectably corrupted.  The 2.4 md driver
.B does not
alert the operator to this condition.  The 2.5 md driver will fail to
start an array in this condition without manual intervention.

.SS RECOVERY

If the md driver detects any error on a device in a RAID1, RAID4,
RAID5 or RAID6 array, it immediately disables that device (marking it
as faulty) and continues operation on the remaining devices.  If there
is a spare drive, the driver will start recreating the data that was
on the failed drive on one of the spare drives, either by copying from
a working drive in a RAID1 configuration, or by doing calculations
with the parity block on RAID4, RAID5 or RAID6.

While this recovery process is happening, the md driver will monitor
accesses to the array and will slow down the rate of recovery if other
activity is happening, so that normal access to the array will not be
unduly affected.  When no other activity is happening, the recovery
process proceeds at full speed.  The actual speed targets for the two
different situations can be controlled by the
.B speed_limit_min
and
.B speed_limit_max
control files mentioned below.

.SS KERNEL PARAMETERS

The md driver recognises three different kernel parameters.
.TP
.B raid=noautodetect
This will disable the normal detection of md arrays that happens at
boot time.  If a drive is partitioned with MS-DOS style partitions,
then if any of the 4 main partitions has a partition type of 0xFD,
then that partition will normally be inspected to see if it is part of
an MD array, and if any full arrays are found, they are started.  This
kernel parameter disables this behaviour.

.TP
.BI md= n , dev , dev ,...
This tells the md driver to assemble
.B /dev/md n
from the listed devices.  It is only necessary to start the device
holding the root filesystem this way.  Other arrays are best started
once the system is booted.

.TP
.BI md= n , l , c , i , dev...
This tells the md driver to assemble a legacy RAID0 or LINEAR array
without a superblock.
.I n
gives the md device number,
.I l
gives the level, 0 for RAID0 or -1 for LINEAR,
.I c
gives the chunk size as a base-2 logarithm offset by twelve, so 0
means 4K and 1 means 8K, and
.I i
is ignored (legacy support).
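The chunk size encoding above can be sketched as follows
(\fIchunk_size_bytes\fP is a hypothetical helper for illustration):

```python
# c is a base-2 logarithm offset by twelve, so the chunk size in bytes
# is 2**(c + 12).  chunk_size_bytes is a hypothetical helper.
def chunk_size_bytes(c):
    return 1 << (c + 12)

print(chunk_size_bytes(0), chunk_size_bytes(1), chunk_size_bytes(3))
# 4096 (4K), 8192 (8K), 32768 (32K)
```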

.SH FILES
.TP
.B /proc/mdstat
Contains information about the status of currently running arrays.
.TP
.B /proc/sys/dev/raid/speed_limit_min
A readable and writable file that reflects the current goal rebuild
speed for times when non-rebuild activity is current on an array.
The speed is in kibibytes per second, and is a per-device rate, not a
per-array rate (which means that an array with more discs will shuffle
more data for a given speed).  The default is 100.

.TP
.B /proc/sys/dev/raid/speed_limit_max
A readable and writable file that reflects the current goal rebuild
speed for times when no non-rebuild activity is current on an array.
The default is 100,000.

.SH SEE ALSO
.BR mdadm (8),
.BR mkraid (8).