Commit | Line | Data |
---|---|---|
56eb10c0 NB |
1 | .TH MD 4 |
2 | .SH NAME | |
3 | md \- Multiple Device driver aka Linux Software Raid | |
4 | .SH SYNOPSIS | |
5 | .BI /dev/md n | |
6 | .br | |
7 | .BI /dev/md/ n | |
8 | .SH DESCRIPTION | |
9 | The | |
10 | .B md | |
11 | driver provides virtual devices that are created from one or more | |
e0d19036 | 12 | independent underlying devices. This array of devices often contains |
56eb10c0 | 13 | redundancy, and hence the acronym RAID which stands for a Redundant |
e0d19036 | 14 | Array of Independent Devices. |
56eb10c0 NB |
15 | .PP |
16 | .B md | |
2d465520 | 17 | supports RAID levels 1 (mirroring), 4 (striped array with parity |
98c6faba | 18 | device), 5 (striped array with distributed parity information) and 6 |
570c0542 | 19 | (striped array with distributed dual redundancy information). If |
98c6faba NB |
20 | some number of underlying devices fails while using one of these |
21 | levels, the array will continue to function; this number is one for | |
22 | RAID levels 4 and 5, two for RAID level 6, and all but one (N-1) for | |
23 | RAID level 1. | |
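The failure counts above can be restated as a small lookup. This is only an illustrative sketch; the function name `tolerated_failures` is hypothetical and is not part of md or mdadm.

```python
def tolerated_failures(level: int, n_devices: int) -> int:
    """Device failures an array survives, per the description above."""
    if level == 1:
        return n_devices - 1   # all but one mirror may fail
    if level in (4, 5):
        return 1               # single parity
    if level == 6:
        return 2               # dual redundancy
    raise ValueError(f"level {level} is not a redundant level described here")
```

For example, a three-device RAID1 array survives two failures, while a RAID5 array of any size survives one.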
56eb10c0 NB |
24 | .PP |
25 | .B md | |
e0d19036 | 26 | also supports a number of pseudo RAID (non-redundant) configurations |
570c0542 NB |
27 | including RAID0 (striped array), LINEAR (catenated array), |
28 | MULTIPATH (a set of different interfaces to the same device), | |
29 | and FAULTY (a layer over a single device into which errors can be injected). | |
56eb10c0 | 30 | |
11a3e71d | 31 | .SS MD SUPER BLOCK |
570c0542 NB |
32 | Each device in an array may have a |
33 | .I superblock | |
34 | which records information about the structure and state of the array. | |
35 | This allows the array to be reliably re-assembled after a shutdown. | |
56eb10c0 | 36 | |
570c0542 NB |
37 | From Linux kernel version 2.6.10, |
38 | .B md | |
39 | provides support for two different formats of this superblock, and | |
40 | other formats can be added. Prior to this release, only one format was | |
41 | supported. | |
42 | ||
43 | The common format - known as version 0.90 - has | |
44 | a superblock that is 4K long and is written into a 64K aligned block that | |
11a3e71d | 45 | starts at least 64K and less than 128K from the end of the device |
56eb10c0 NB |
46 | (i.e. to get the address of the superblock round the size of the |
47 | device down to a multiple of 64K and then subtract 64K). | |
11a3e71d | 48 | The available size of each device is the amount of space before the |
56eb10c0 NB |
49 | super block, so between 64K and 128K is lost when a device is |
50 | incorporated into an MD array. | |
570c0542 NB |
51 | This superblock stores multi-byte fields in a processor-dependent |
52 | manner, so arrays cannot easily be moved between computers with | |
53 | different processors. | |
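The placement rule for the version 0.90 superblock (round the device size down to a multiple of 64K, then subtract 64K) can be written out as a short sketch. The function name `sb090_offset` is hypothetical, and sizes are taken in bytes as an assumption.

```python
K = 1024

def sb090_offset(device_size: int) -> int:
    """Byte offset of the version 0.90 superblock on a device of this size."""
    if device_size < 64 * K:
        raise ValueError("device too small to hold a 0.90 superblock")
    # Round down to a 64K multiple, then step back one 64K block.
    return (device_size // (64 * K)) * (64 * K) - 64 * K

# The space before the superblock is what the array can use.
size = 1000 * K
usable = sb090_offset(size)   # 896K
lost = size - usable          # 104K, within the 64K..128K range stated above
```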
54 | ||
55 | The new format - known as version 1 - has a superblock that is | |
56 | normally 1K long, but can be longer. It is normally stored between 8K | |
57 | and 12K from the end of the device, on a 4K boundary, though | |
58 | variations can be stored at the start of the device (version 1.1) or 4K from | |
59 | the start of the device (version 1.2). | |
60 | This superblock format stores multibyte data in a | |
61 | processor-independent format and supports up to hundreds of | |
62 | component devices (version 0.90 only supports 28). | |
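One computation that satisfies the version 1 placement described above is sketched below. The text does not give the exact rule for the end-of-device variant, so rounding (size - 8K) down to a 4K boundary is an assumption chosen to match "between 8K and 12K from the end, on a 4K boundary". The label "1.0" for that variant and the function name `v1_sb_offset` are likewise assumptions.

```python
K = 1024

def v1_sb_offset(device_size: int, variant: str = "1.0") -> int:
    """Byte offset of a version 1 superblock for the given variant."""
    if variant == "1.1":
        return 0                  # start of the device
    if variant == "1.2":
        return 4 * K              # 4K from the start of the device
    if variant == "1.0":
        # Assumed rule: step back 8K from the end, then align down to 4K.
        return ((device_size - 8 * K) // (4 * K)) * (4 * K)
    raise ValueError(f"unknown variant {variant!r}")
```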
56eb10c0 NB |
63 | |
64 | The superblock contains, among other things: | |
65 | .TP | |
66 | LEVEL | |
11a3e71d NB |
67 | The manner in which the devices are arranged into the array |
68 | (linear, raid0, raid1, raid4, raid5, multipath). | |
56eb10c0 NB |
69 | .TP |
70 | UUID | |
71 | a 128 bit Universally Unique Identifier that identifies the array that | |
72 | this device is part of. | |
73 | ||
570c0542 NB |
74 | .SS ARRAYS WITHOUT SUPERBLOCKS |
75 | While it is usually best to create arrays with superblocks so that | |
76 | they can be assembled reliably, there are some circumstances where an | |
77 | array without superblocks is preferred. These include: | |
78 | .TP | |
79 | LEGACY ARRAYS | |
11a3e71d NB |
80 | Early versions of the |
81 | .B md | |
570c0542 NB |
82 | driver only supported Linear and Raid0 configurations and did not use |
83 | a superblock (which is less critical with these configurations). | |
84 | While such arrays should be rebuilt with superblocks if possible, | |
11a3e71d | 85 | .B md |
570c0542 NB |
86 | continues to support them. |
87 | .TP | |
88 | FAULTY | |
89 | Being a largely transparent layer over a different device, the FAULTY | |
90 | personality doesn't gain anything from having a superblock. | |
91 | .TP | |
92 | MULTIPATH | |
93 | It is often possible to detect devices which are different paths to | |
94 | the same storage directly rather than having a distinctive superblock | |
95 | written to the device and searched for on all paths. In this case, | |
96 | a MULTIPATH array with no superblock makes sense. | |
97 | .TP | |
98 | RAID1 | |
99 | In some configurations it might be desired to create a raid1 | |
100 | configuration that does not use a superblock, and to maintain the state of | |
101 | the array elsewhere. While not encouraged, this is supported. | |
11a3e71d | 102 | |
56eb10c0 | 103 | .SS LINEAR |
11a3e71d NB |
104 | |
105 | A linear array simply catenates the available space on each | |
106 | drive together to form one large virtual drive. | |
107 | ||
108 | One advantage of this arrangement over the more common RAID0 | |
109 | arrangement is that the array may be reconfigured at a later time with | |
110 | an extra drive and so the array is made bigger without disturbing the | |
111 | data that is on the array. However this cannot be done on a live | |
112 | array. | |
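A minimal sketch of how catenation maps an array offset to a component device follows. The function name `linear_map` and the use of abstract size units are assumptions for illustration only.

```python
from bisect import bisect_right
from itertools import accumulate

def linear_map(sizes, offset):
    """Map an array offset to (device index, offset within that device)."""
    ends = list(accumulate(sizes))      # running end offset of each device
    if not 0 <= offset < ends[-1]:
        raise ValueError("offset outside the array")
    dev = bisect_right(ends, offset)    # first device whose end lies beyond offset
    start = ends[dev - 1] if dev else 0
    return dev, offset - start
```

Because each device occupies a fixed, contiguous range, appending a device leaves every existing mapping unchanged: `linear_map([100, 50, 200], 120)` and `linear_map([100, 50, 200, 80], 120)` give the same result. This is why a linear array can grow without disturbing the data already on it.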
113 | ||
114 | ||
56eb10c0 | 115 | .SS RAID0 |
11a3e71d NB |
116 | |
117 | A RAID0 array (which has zero redundancy) is also known as a | |
118 | striped array. | |
e0d19036 NB |
119 | A RAID0 array is configured at creation with a |
120 | .B "Chunk Size" | |
c913b90e | 121 | which must be a power of two, and at least 4 kibibytes. |
e0d19036 | 122 | |
2d465520 | 123 | The RAID0 driver assigns the first chunk of the array to the first |
e0d19036 | 124 | device, the second chunk to the second device, and so on until all |
2d465520 | 125 | drives have been assigned one chunk. This collection of chunks forms |
e0d19036 NB |
126 | a |
127 | .BR stripe . | |
128 | Further chunks are gathered into stripes in the same way and are | |
129 | assigned to the remaining space in the drives. | |
130 | ||
2d465520 NB |
131 | If devices in the array are not all the same size, then once the |
132 | smallest device has been exhausted, the RAID0 driver starts | |
e0d19036 NB |
133 | collecting chunks into smaller stripes that only span the drives which |
134 | still have remaining space. | |
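The chunk assignment described above, including the narrower stripes used once the smallest device is full, can be sketched as follows. Device sizes are given in chunks. The function name `raid0_map` is hypothetical, and the assumption that devices keep their original order inside each narrower stripe is not confirmed by the text.

```python
def raid0_map(dev_chunks, chunk):
    """Map an array chunk number to (device index, chunk offset on that device)."""
    if chunk < 0:
        raise ValueError("negative chunk number")
    zone_start = 0   # per-device chunk offset where the current zone begins
    base = 0         # first array chunk belonging to the current zone
    for limit in sorted(set(dev_chunks)):
        # Devices that still have space in this zone.
        members = [d for d, n in enumerate(dev_chunks) if n >= limit]
        zone_chunks = (limit - zone_start) * len(members)
        if chunk < base + zone_chunks:
            stripe, pos = divmod(chunk - base, len(members))
            return members[pos], zone_start + stripe
        base += zone_chunks
        zone_start = limit
    raise ValueError("chunk beyond end of array")
```

With devices of 2, 4 and 3 chunks, chunks 0 to 5 form two full stripes across all three devices, chunks 6 and 7 form a stripe across the second and third devices only, and chunk 8 lands on the second device alone.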
135 | ||
136 | ||
56eb10c0 | 137 | .SS RAID1 |
e0d19036 NB |
138 | |
139 | A RAID1 array is also known as a mirrored set (though mirrors tend to | |
5787fa49 | 140 | provide reflected images, which RAID1 does not) or a plex. |
e0d19036 NB |
141 | |
142 | Once initialised, each device in a RAID1 array contains exactly the | |
143 | same data. Changes are written to all devices in parallel. Data is | |
144 | read from any one device. The driver attempts to distribute read | |
145 | requests across all devices to maximise performance. | |
146 | ||
147 | All devices in a RAID1 array should be the same size. If they are | |
148 | not, then only the amount of space available on the smallest device is | |
149 | used. Any extra space on other devices is wasted. | |
150 | ||
56eb10c0 | 151 | .SS RAID4 |
e0d19036 NB |
152 | |
153 | A RAID4 array is like a RAID0 array with an extra device for storing | |
aa88f531 NB |
154 | parity. This device is the last of the active devices in the |
155 | array. Unlike RAID0, RAID4 also requires that all stripes span all | |
e0d19036 NB |
156 | drives, so extra space on devices that are larger than the smallest is |
157 | wasted. | |
158 | ||
159 | When any block in a RAID4 array is modified the parity block for that | |
160 | stripe (i.e. the block in the parity device at the same device offset | |
161 | as the stripe) is also modified so that the parity block always | |
162 | contains the "parity" for the whole stripe, i.e. its contents are | |
163 | equivalent to the result of performing an exclusive-or operation | |
164 | between all the data blocks in the stripe. | |
165 | ||
166 | This allows the array to continue to function if one device fails. | |
167 | The data that was on that device can be calculated as needed from the | |
168 | parity block and the other data blocks. | |
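The exclusive-or relationship can be illustrated with a short sketch. The helper names `xor_blocks` and `update_parity` are hypothetical, and blocks are modelled as equal-length byte strings purely for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise exclusive-or of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def update_parity(old_parity, old_data, new_data):
    """Parity after one data block changes, without reading the other blocks."""
    return xor_blocks([old_parity, old_data, new_data])

data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)

# If the second device fails, its block is the XOR of parity and the survivors.
rebuilt = xor_blocks([data[0], data[2], parity])
```

Here `rebuilt` equals `data[1]`, which is how the contents of a failed device can be calculated as needed.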
169 | ||
56eb10c0 | 170 | .SS RAID5 |
e0d19036 NB |
171 | |
172 | RAID5 is very similar to RAID4. The difference is that the parity | |
173 | blocks for each stripe, instead of being on a single device, are | |
174 | distributed across all devices. This allows more parallelism when | |
175 | writing as two different block updates will quite possibly affect | |
176 | parity blocks on different devices so there is less contention. | |
177 | ||
178 | This also allows more parallelism when reading as read requests are | |
179 | distributed over all the devices in the array instead of all but one. | |
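The text does not say which device holds the parity block for a given stripe. The sketch below assumes a simple rotation that starts at the last device and moves back by one device per stripe; the actual layout used by md may differ, and the function name `parity_device` is hypothetical.

```python
def parity_device(stripe: int, n_devices: int) -> int:
    """Device holding parity for a stripe under an assumed simple rotation."""
    return (n_devices - 1) - (stripe % n_devices)

# Over any n consecutive stripes, each device holds parity exactly once,
# so parity updates are spread across all devices rather than one.
placement = [parity_device(s, 4) for s in range(8)]
```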
180 | ||
98c6faba NB |
181 | .SS RAID6 |
182 | ||
183 | RAID6 is similar to RAID5, but can handle the loss of any \fItwo\fP | |
184 | devices without data loss. Accordingly, it requires N+2 drives to | |
185 | store N drives worth of data. | |
186 | ||
187 | The performance for RAID6 is slightly lower but comparable to RAID5 in | |
188 | normal mode and single disk failure mode. It is very slow in dual | |
189 | disk failure mode, however. | |
190 | ||
11a3e71d | 191 | .SS MULTIPATH |
e0d19036 NB |
192 | |
193 | MULTIPATH is not really a RAID at all as there is only one real device | |
194 | in a MULTIPATH md array. However there are multiple access points | |
195 | (paths) to this device, and one of these paths might fail, so there | |
196 | are some similarities. | |
197 | ||
2d465520 NB |
198 | A MULTIPATH array is composed of a number of logically different |
199 | devices, often fibre channel interfaces, that all refer to the same | |
200 | real device. If one of these interfaces fails (e.g. due to cable | |
201 | problems), the multipath driver will attempt to redirect requests to | |
202 | another interface. | |
e0d19036 | 203 | |
b5e64645 NB |
204 | .SS FAULTY |
205 | The FAULTY md module is provided for testing purposes. A faulty array | |
206 | has exactly one component device and is normally assembled without a | |
207 | superblock, so the md array created provides direct access to all of | |
208 | the data in the component device. | |
209 | ||
210 | The FAULTY module may be requested to simulate faults to allow testing | |
211 | of other md levels or of filesystems. Faults can be chosen to trigger | |
212 | on read requests or write requests, and can be transient (a subsequent | |
213 | read/write at the address will probably succeed) or persistent | |
214 | (subsequent read/write of the same address will fail). Further, read | |
215 | faults can be "fixable" meaning that they persist until a write | |
216 | request at the same address. | |
217 | ||
218 | Fault types can be requested with a period. In this case the fault | |
219 | will recur repeatedly after the given number of requests of the | |
220 | relevant type. For example if persistent read faults have a period of | |
221 | 100, then every 100th read request would generate a fault, and the | |
222 | faulty sector would be recorded so that subsequent reads on that | |
223 | sector would also fail. | |
224 | ||
225 | There is a limit to the number of faulty sectors that are remembered. | |
226 | Faults generated after this limit is exhausted are treated as | |
227 | transient. | |
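The interaction between a fault period, persistent faults and the limit on remembered sectors can be modelled with a toy simulation. This is not the FAULTY module's implementation; the class name, its parameters, and the choice that reads of an already-faulty sector do not advance the period counter are all assumptions.

```python
class PeriodicReadFault:
    """Toy model of persistent read faults that recur with a fixed period."""

    def __init__(self, period, max_remembered):
        self.period = period
        self.max_remembered = max_remembered
        self.count = 0      # reads counted toward the period
        self.bad = set()    # sectors remembered as faulty

    def read(self, sector):
        """Return True if the read succeeds, False if it faults."""
        if sector in self.bad:
            return False                      # persistent fault repeats
        self.count += 1
        if self.count % self.period == 0:
            if len(self.bad) < self.max_remembered:
                self.bad.add(sector)          # remembered, so it persists
            return False                      # otherwise treated as transient
        return True

    def flush(self):
        """Forget all remembered faulty sectors."""
        self.bad.clear()
```

With a period of 3 and room for one remembered sector, the third read faults and keeps failing on that sector, while the sixth read faults only once because the list is already full.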
228 | ||
229 | The list of faulty sectors can be flushed, and the active list of | |
230 | failure modes can be cleared. | |
e0d19036 NB |
231 | |
232 | .SS UNCLEAN SHUTDOWN | |
233 | ||
98c6faba | 234 | When changes are made to a RAID1, RAID4, RAID5 or RAID6 array there is a |
e0d19036 NB |
235 | possibility of inconsistency for short periods of time as each update |
236 | requires at least two blocks to be written to different devices, and | |
237 | these writes probably won't happen at exactly the same time. | |
2d465520 | 238 | Thus if a system with one of these arrays is shut down in the middle of |
e0d19036 NB |
239 | a write operation (e.g. due to power failure), the array may not be |
240 | consistent. | |
241 | ||
2d465520 | 242 | To handle this situation, the md driver marks an array as "dirty" |
e0d19036 | 243 | before writing any data to it, and marks it as "clean" when the array |
98c6faba NB |
244 | is being disabled, e.g. at shutdown. If the md driver finds an array |
245 | to be dirty at startup, it proceeds to correct any possible | |
246 | inconsistency. For RAID1, this involves copying the contents of the | |
247 | first drive onto all other drives. For RAID4, RAID5 and RAID6 this | |
248 | involves recalculating the parity for each stripe and making sure that | |
249 | the parity block has the correct data. This process, known as | |
250 | "resynchronising" or "resync" is performed in the background. The | |
251 | array can still be used, though possibly with reduced performance. | |
252 | ||
253 | If a RAID4, RAID5 or RAID6 array is degraded (missing at least one | |
254 | drive) when it is restarted after an unclean shutdown, it cannot | |
255 | recalculate parity, and so it is possible that data might be | |
256 | undetectably corrupted. The 2.4 md driver | |
e0d19036 | 257 | .B does not |
5787fa49 | 258 | alert the operator to this condition. The 2.5 md driver will fail to |
e0d19036 NB |
259 | start an array in this condition without manual intervention. |
260 | ||
261 | .SS RECOVERY | |
262 | ||
98c6faba NB |
263 | If the md driver detects any error on a device in a RAID1, RAID4, |
264 | RAID5 or RAID6 array, it immediately disables that device (marking it | |
265 | as faulty) and continues operation on the remaining devices. If there | |
266 | is a spare drive, the driver will start recreating on one of the spare | |
267 | drives the data that was on that failed drive, either by copying a | |
268 | working drive in a RAID1 configuration, or by doing calculations with | |
269 | the parity block on RAID4, RAID5 or RAID6. | |
e0d19036 | 270 | |
2d465520 | 271 | While this recovery process is happening, the md driver will monitor |
e0d19036 NB |
272 | accesses to the array and will slow down the rate of recovery if other |
273 | activity is happening, so that normal access to the array will not be | |
274 | unduly affected. When no other activity is happening, the recovery | |
275 | process proceeds at full speed. The actual speed targets for the two | |
276 | different situations can be controlled by the | |
277 | .B speed_limit_min | |
278 | and | |
279 | .B speed_limit_max | |
280 | control files mentioned below. | |
281 | ||
5787fa49 NB |
282 | .SS KERNEL PARAMETERS |
283 | ||
284 | The md driver recognises three different kernel parameters. | |
285 | .TP | |
286 | .B raid=noautodetect | |
287 | This will disable the normal detection of md arrays that happens at | |
288 | boot time. If a drive is partitioned with MS-DOS style partitions, | |
289 | then if any of the 4 main partitions has a partition type of 0xFD, | |
290 | then that partition will normally be inspected to see if it is part of | |
291 | an MD array, and if any full arrays are found, they are started. This | |
292 | kernel parameter disables this behaviour. | |
293 | ||
294 | .TP | |
295 | .BI md= n , dev , dev ,... | |
296 | This tells the md driver to assemble | |
297 | .B /dev/md n | |
298 | from the listed devices. It is only necessary to start the device | |
299 | holding the root filesystem this way. Other arrays are best started | |
300 | once the system is booted. | |
301 | ||
302 | .TP | |
303 | .BI md= n , l , c , i , dev... | |
304 | This tells the md driver to assemble a legacy RAID0 or LINEAR array | |
305 | without a superblock. | |
306 | .I n | |
307 | gives the md device number, | |
308 | .I l | |
309 | gives the level, 0 for RAID0 or -1 for LINEAR, | |
310 | .I c | |
311 | gives the chunk size as a base-2 logarithm offset by twelve, so 0 | |
312 | means 4K, 1 means 8K. | |
313 | .I i | |
314 | is ignored (legacy support). | |
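The encoding of the legacy form, in particular the chunk size as a base-2 logarithm offset by twelve, can be illustrated by decoding such a parameter. The function name `parse_legacy_md` and the device names in the example are hypothetical; this is a sketch of the format as described above, not the kernel's parser.

```python
def parse_legacy_md(arg):
    """Decode 'md=n,l,c,i,dev...' as described above."""
    key, _, value = arg.partition("=")
    if key != "md":
        raise ValueError("not an md= parameter")
    n, level, c, _ignored, *devices = value.split(",")
    levels = {0: "raid0", -1: "linear"}
    return {
        "md": int(n),
        "level": levels[int(level)],
        "chunk_bytes": 4096 << int(c),   # 0 means 4K, 1 means 8K, and so on
        "devices": devices,
    }

# Example with made-up device names: md0 as RAID0 with 16K chunks.
cfg = parse_legacy_md("md=0,0,2,0,/dev/hda1,/dev/hdb1")
```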
e0d19036 | 315 | |
56eb10c0 NB |
316 | .SH FILES |
317 | .TP | |
318 | .B /proc/mdstat | |
319 | Contains information about the status of currently running arrays. | |
320 | .TP | |
321 | .B /proc/sys/dev/raid/speed_limit_min | |
322 | A readable and writable file that reflects the current goal rebuild | |
323 | speed for times when non-rebuild activity is current on an array. | |
324 | The speed is in Kibibytes per second, and is a per-device rate, not a | |
325 | per-array rate (which means that an array with more discs will shuffle | |
326 | more data for a given speed). The default is 100. | |
327 | ||
328 | .TP | |
329 | .B /proc/sys/dev/raid/speed_limit_max | |
330 | A readable and writable file that reflects the current goal rebuild | |
331 | speed for times when no non-rebuild activity is current on an array. | |
332 | The default is 100,000. | |
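A minimal sketch of reading these control files and of what a per-device rate implies for a whole array follows. The helper names are hypothetical, and treating the aggregate as rate times device count is a simplifying assumption drawn from the per-device wording above.

```python
from pathlib import Path

RAID_SYSCTL = Path("/proc/sys/dev/raid")

def parse_limit(text: str) -> int:
    """Parse the contents of a speed_limit file (Kibibytes per second)."""
    return int(text.split()[0])

def read_limit(name: str, base: Path = RAID_SYSCTL) -> int:
    """Read speed_limit_min or speed_limit_max from the given directory."""
    return parse_limit((Path(base) / name).read_text())

def array_rate(per_device_kib_s: int, n_devices: int) -> int:
    """Approximate total data moved per second across all devices."""
    return per_device_kib_s * n_devices
```

On a system with md loaded, `read_limit("speed_limit_min")` would return the current minimum. With the default of 100, a four-device array would move roughly `array_rate(100, 4)` Kibibytes per second in total under this assumption.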
333 | ||
334 | .SH SEE ALSO | |
335 | .BR mdadm (8), | |
336 | .BR mkraid (8). |