.TH MD 4
.SH NAME
md \- Multiple Device driver aka Linux Software Raid
.SH SYNOPSIS
.BI /dev/md n
.br
.BI /dev/md/ n
.SH DESCRIPTION
The
.B md
driver provides virtual devices that are created from one or more
independent underlying devices.  This array of devices often contains
redundancy, and hence the acronym RAID, which stands for a Redundant
Array of Independent Devices.
.PP
.B md
supports RAID levels 1 (mirroring), 4 (striped array with parity
device) and 5 (striped array with distributed parity information).
If a single underlying device fails while using one of these levels,
the array will continue to function.
.PP
.B md
also supports a number of pseudo RAID (non-redundant) configurations
including RAID0 (striped array), LINEAR (catenated array) and
MULTIPATH (a set of different interfaces to the same device).

.SS MD SUPER BLOCK
With the exception of Legacy Arrays described below, each device that
is incorporated into an MD array has a
.I super block
written towards the end of the device.  This superblock records
information about the structure and state of the array so that the
array can be reliably re-assembled after a shutdown.

The superblock is 4K long and is written into a 64K aligned block that
starts at least 64K and less than 128K from the end of the device
(i.e. to get the address of the superblock, round the size of the
device down to a multiple of 64K and then subtract 64K).
The available size of each device is the amount of space before the
superblock, so between 64K and 128K is lost when a device is
incorporated into an MD array.
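The rounding rule above can be sketched as a small calculation (an illustration of the arithmetic described here, not kernel code):

```python
K = 1024

def superblock_offset(device_size_bytes):
    # Round the device size down to a multiple of 64K, then subtract 64K,
    # giving the start of the 64K aligned block holding the superblock.
    return (device_size_bytes // (64 * K)) * (64 * K) - 64 * K

# For a 10,000,000-byte device the superblock starts at byte 9,895,936,
# and the space lost (size minus offset) falls between 64K and 128K.
```

The available size of the device is then exactly the superblock offset, since everything before the superblock is usable.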
42 | ||
43 | The superblock contains, among other things: | |
44 | .TP | |
45 | LEVEL | |
11a3e71d NB |
46 | The manner in which the devices are arranged into the array |
47 | (linear, raid0, raid1, raid4, raid5, multipath). | |
56eb10c0 NB |
48 | .TP |
49 | UUID | |
50 | a 128 bit Universally Unique Identifier that identifies the array that | |
51 | this device is part of. | |
52 | ||
.SS LEGACY ARRAYS
Early versions of the
.B md
driver only supported Linear and Raid0 configurations and so
did not use an MD superblock (as there is no state that needs to be
recorded).  While it is strongly recommended that all newly created
arrays utilise a superblock to help ensure that they are assembled
properly, the
.B md
driver still supports legacy linear and raid0 md arrays that
do not have a superblock.

.SS LINEAR

A linear array simply catenates the available space on each
drive together to form one large virtual drive.

One advantage of this arrangement over the more common RAID0
arrangement is that the array may be reconfigured at a later time with
an extra drive, so the array can be made bigger without disturbing the
data that is on the array.  However this cannot be done on a live
array.

.SS RAID0

A RAID0 array (which has zero redundancy) is also known as a
striped array.
A RAID0 array is configured at creation with a
.B "Chunk Size"
which must be a power of two, and at least 4 kibibytes.

The RAID0 driver assigns the first chunk of the array to the first
device, the second chunk to the second device, and so on until all
drives have been assigned one chunk.  This collection of chunks forms
a
.BR stripe .
Further chunks are gathered into stripes in the same way, and are
assigned to the remaining space in the drives.

If the devices in the array are not all the same size, then once the
smallest device has been exhausted, the RAID0 driver starts
collecting chunks into smaller stripes that only span the drives which
still have remaining space.
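For the simple case where all devices are the same size, the chunk-to-device mapping described above amounts to round-robin striping.  A minimal sketch of that idea (an illustration only, not the driver's actual code):

```python
def raid0_map(chunk_index, n_devices):
    """Map a logical chunk number to (device, chunk offset on that device)."""
    stripe = chunk_index // n_devices   # which stripe the chunk belongs to
    device = chunk_index % n_devices    # chunks rotate across the devices
    return device, stripe

# Chunks 0, 1, 2 of a 3-drive array form stripe 0, one chunk per drive;
# chunks 3, 4, 5 form stripe 1 in the same order.
```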
97 | ||
98 | ||
56eb10c0 | 99 | .SS RAID1 |
e0d19036 NB |
100 | |
101 | A RAID1 array is also known as a mirrored set (though mirrors tend to | |
5787fa49 | 102 | provide reflected images, which RAID1 does not) or a plex. |
e0d19036 NB |
103 | |
104 | Once initialised, each device in a RAID1 array contains exactly the | |
105 | same data. Changes are written to all devices in parallel. Data is | |
106 | read from any one device. The driver attempts to distribute read | |
107 | requests across all devices to maximise performance. | |
108 | ||
109 | All devices in a RAID1 array should be the same size. If they are | |
110 | not, then only the amount of space available on the smallest device is | |
111 | used. Any extra space on other devices is wasted. | |
112 | ||
.SS RAID4

A RAID4 array is like a RAID0 array with an extra device for storing
parity.  Unlike RAID0, RAID4 also requires that all stripes span all
drives, so extra space on devices that are larger than the smallest is
wasted.

When any block in a RAID4 array is modified, the parity block for that
stripe (i.e. the block in the parity device at the same device offset
as the stripe) is also modified so that the parity block always
contains the "parity" for the whole stripe; i.e. its contents are
equivalent to the result of performing an exclusive-or operation
between all the data blocks in the stripe.

This allows the array to continue to function if one device fails.
The data that was on that device can be calculated as needed from the
parity block and the other data blocks.
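The exclusive-or relationship also shows why a lost block is recoverable: XORing the parity block with the surviving data blocks reproduces the missing one.  A minimal sketch:

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR byte strings of equal length together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"\x0f\x0f", b"\xf0\xf0", b"\x55\xaa"]
parity = xor_blocks(data)                 # held on the parity device

# Pretend the second drive failed: rebuild its block from the
# parity block and the other data blocks.
recovered = xor_blocks([parity, data[0], data[2]])
# recovered is equal to data[1]
```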
130 | ||
56eb10c0 | 131 | .SS RAID5 |
e0d19036 NB |
132 | |
133 | RAID5 is very similar to RAID4. The difference is that the parity | |
134 | blocks for each stripe, instead of being on a single device, are | |
135 | distributed across all devices. This allows more parallelism when | |
136 | writing as two different block updates will quite possibly affect | |
137 | parity blocks on different devices so there is less contention. | |
138 | ||
139 | This also allows more parallelism when reading as read requests are | |
140 | distributed over all the devices in the array instead of all but one. | |
141 | ||
.SS MULTIPATH

MULTIPATH is not really a RAID at all, as there is only one real device
in a MULTIPATH md array.  However there are multiple access points
(paths) to this device, and one of these paths might fail, so there
are some similarities.

A MULTIPATH array is composed of a number of logically different
devices, often fibre channel interfaces, that all refer to the same
real device.  If one of these interfaces fails (e.g. due to cable
problems), the multipath driver will attempt to redirect requests to
another interface.

.SS UNCLEAN SHUTDOWN

When changes are made to a RAID1, RAID4, or RAID5 array, there is a
possibility of inconsistency for short periods of time, as each update
requires at least two blocks to be written to different devices, and
these writes probably won't happen at exactly the same time.
Thus if a system with one of these arrays is shut down in the middle of
a write operation (e.g. due to power failure), the array may not be
consistent.

To handle this situation, the md driver marks an array as "dirty"
before writing any data to it, and marks it as "clean" when the array
is being disabled, e.g. at shutdown.
If the md driver finds an array to be dirty at startup, it proceeds to
correct any possible inconsistency.  For RAID1, this involves copying
the contents of the first drive onto all other drives.
For RAID4 or RAID5 this involves recalculating the parity for each
stripe and making sure that the parity block has the correct data.
This process, known as "resynchronising" or "resync", is performed in
the background.  The array can still be used, though possibly with
reduced performance.

If a RAID4 or RAID5 array is degraded (missing one drive) when it is
restarted after an unclean shutdown, it cannot recalculate parity, and
so it is possible that data might be undetectably corrupted.
The 2.4 md driver
.B does not
alert the operator to this condition.  The 2.5 md driver will fail to
start an array in this condition without manual intervention.

.SS RECOVERY

If the md driver detects any error on a device in a RAID1, RAID4, or
RAID5 array, it immediately disables that device (marking it as faulty)
and continues operation on the remaining devices.  If there is a spare
drive, the driver will start recreating the data that was on the
failed drive on one of the spare drives, either by copying from a
working drive in a RAID1 configuration, or by doing calculations with
the parity block on RAID4 and RAID5.

While this recovery process is happening, the md driver will monitor
accesses to the array and will slow down the rate of recovery if other
activity is happening, so that normal access to the array will not be
unduly affected.  When no other activity is happening, the recovery
process proceeds at full speed.  The actual speed targets for the two
different situations can be controlled by the
.B speed_limit_min
and
.B speed_limit_max
control files mentioned below.

.SS KERNEL PARAMETERS

The md driver recognises three different kernel parameters.
.TP
.B raid=noautodetect
This will disable the normal detection of md arrays that happens at
boot time.  If a drive is partitioned with MS-DOS style partitions and
any of the 4 primary partitions has a partition type of 0xFD, then
that partition will normally be inspected to see if it is part of
an MD array, and if any full arrays are found, they are started.  This
kernel parameter disables that behaviour.
218 | ||
219 | .TP | |
220 | .BI md= n , dev , dev ,... | |
221 | This tells the md driver to assemble | |
222 | .B /dev/md n | |
223 | from the listed devices. It is only necessary to start the device | |
224 | holding the root filesystem this way. Other arrays are best started | |
225 | once the system is booted. | |
226 | ||
.TP
.BI md= n , l , c , i , dev...
This tells the md driver to assemble a legacy RAID0 or LINEAR array
without a superblock.
.I n
gives the md device number,
.I l
gives the level, 0 for RAID0 or -1 for LINEAR,
.I c
gives the chunk size as a base-2 logarithm offset by twelve, so 0
means 4K and 1 means 8K, and
.I i
is ignored (legacy support).

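The chunk-size encoding for the legacy parameter is just a shifted power of two, and can be checked with a one-liner (an illustration of the arithmetic described above, not kernel code):

```python
def chunk_size_bytes(c):
    # c is a base-2 logarithm offset by twelve: 0 -> 4K, 1 -> 8K, ...
    return 1 << (c + 12)

# chunk_size_bytes(0) gives 4096 and chunk_size_bytes(1) gives 8192.
```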
.SH FILES
.TP
.B /proc/mdstat
Contains information about the status of currently running arrays.
.TP
.B /proc/sys/dev/raid/speed_limit_min
A readable and writable file that reflects the current goal rebuild
speed for times when non-rebuild activity is occurring on an array.
The speed is in Kibibytes per second, and is a per-device rate, not a
per-array rate (which means that an array with more discs will shuffle
more data for a given speed).  The default is 100.

.TP
.B /proc/sys/dev/raid/speed_limit_max
A readable and writable file that reflects the current goal rebuild
speed for times when no non-rebuild activity is occurring on an array.
The default is 100,000.

.SH SEE ALSO
.BR mdadm (8),
.BR mkraid (8).