.TH MD 4
.SH NAME
md \- Multiple Device driver aka Linux Software RAID
.SH SYNOPSIS
.BI /dev/md n
.br
.BI /dev/md/ n
.SH DESCRIPTION
The
.B md
driver provides virtual devices that are created from one or more
independent underlying devices.  This array of devices often contains
redundancy, hence the acronym RAID, which stands for Redundant
Array of Independent Devices.
.PP
.B md
supports RAID levels 1 (mirroring), 4 (striped array with parity
device) and 5 (striped array with distributed parity information).
If a single underlying device fails while using one of these levels,
the array will continue to function.
.PP
.B md
also supports a number of pseudo RAID (non-redundant) configurations
including RAID0 (striped array), LINEAR (catenated array) and
MULTIPATH (a set of different interfaces to the same device).

.SS MD SUPER BLOCK
With the exception of Legacy Arrays described below, each device that
is incorporated into an MD array has a
.I super block
written towards the end of the device.  This superblock records
information about the structure and state of the array so that the
array can be reliably re-assembled after a shutdown.

The superblock is 4K long and is written into a 64K aligned block that
starts at least 64K and less than 128K from the end of the device
(i.e. to get the address of the superblock, round the size of the
device down to a multiple of 64K and then subtract 64K).
The available size of each device is the amount of space before the
super block, so between 64K and 128K is lost when a device is
incorporated into an MD array.
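As a rough sketch of the rule above (the function name is illustrative,
not a kernel interface), the superblock address can be computed like this:

```python
def md_superblock_offset(device_size: int) -> int:
    """Offset of the 4K MD superblock: round the device size down to a
    multiple of 64K, then subtract 64K, so the superblock starts at
    least 64K and less than 128K from the end of the device."""
    K64 = 64 * 1024
    return (device_size // K64) * K64 - K64

# A device a little over 1 MiB: size rounds down to 1 MiB, so the
# superblock sits at 1 MiB - 64K.
print(md_superblock_offset(1024 * 1024 + 5000))  # 983040
```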

The superblock contains, among other things:
.TP
LEVEL
The manner in which the devices are arranged into the array
(linear, raid0, raid1, raid4, raid5, multipath).
.TP
UUID
a 128 bit Universally Unique Identifier that identifies the array that
this device is part of.

.SS LEGACY ARRAYS
Early versions of the
.B md
driver only supported Linear and RAID0 configurations and so
did not use an MD superblock (as there is no state that needs to be
recorded).  While it is strongly recommended that all newly created
arrays utilise a superblock to help ensure that they are assembled
properly, the
.B md
driver still supports legacy linear and RAID0 md arrays that
do not have a superblock.

.SS LINEAR

A linear array simply catenates the available space on each
drive together to form one large virtual drive.

One advantage of this arrangement over the more common RAID0
arrangement is that the array may be reconfigured at a later time with
an extra drive, and so the array is made bigger without disturbing the
data that is on the array.  However this cannot be done on a live
array.

.SS RAID0

A RAID0 array (which has zero redundancy) is also known as a
striped array.
A RAID0 array is configured at creation with a
.B "Chunk Size"
which must be a power of two, and at least 4 kibibytes.

The RAID0 driver assigns the first chunk of the array to the first
device, the second chunk to the second device, and so on until all
drives have been assigned one chunk.  This collection of chunks forms
a
.BR stripe .
Further chunks are gathered into stripes in the same way, and are
assigned to the remaining space in the drives.

If the devices in the array are not all the same size, then once the
smallest device has been exhausted, the RAID0 driver starts
collecting chunks into smaller stripes that only span the drives which
still have remaining space.
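For the simple case of equal-sized devices, the round-robin layout
described above can be sketched as follows (an illustration of the
scheme, not driver code; the function name is made up):

```python
def raid0_chunk_location(chunk_index: int, n_devices: int) -> tuple:
    """Map a logical chunk number to (device index, chunk offset on that
    device) for a RAID0 array of equal-sized devices: chunks are dealt
    out round-robin, one per device, forming successive stripes."""
    return (chunk_index % n_devices, chunk_index // n_devices)

# With 3 devices, chunks 0, 1, 2 form the first stripe and chunks
# 3, 4, 5 the second.
print(raid0_chunk_location(4, 3))  # (1, 1): second device, second stripe
```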

.SS RAID1

A RAID1 array is also known as a mirrored set (though mirrors tend to
provide reflected images, which RAID1 does not) or a plex.

Once initialised, each device in a RAID1 array contains exactly the
same data.  Changes are written to all devices in parallel.  Data is
read from any one device.  The driver attempts to distribute read
requests across all devices to maximise performance.

All devices in a RAID1 array should be the same size.  If they are
not, then only the amount of space available on the smallest device is
used.  Any extra space on other devices is wasted.

.SS RAID4

A RAID4 array is like a RAID0 array with an extra device for storing
parity.  This device is the last of the active devices in the
array.  Unlike RAID0, RAID4 also requires that all stripes span all
drives, so extra space on devices that are larger than the smallest is
wasted.

When any block in a RAID4 array is modified, the parity block for that
stripe (i.e. the block in the parity device at the same device offset
as the stripe) is also modified so that the parity block always
contains the "parity" for the whole stripe, i.e. its contents are
equivalent to the result of performing an exclusive-or operation
between all the data blocks in the stripe.

This allows the array to continue to function if one device fails.
The data that was on that device can be calculated as needed from the
parity block and the other data blocks.
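The exclusive-or relationship described above can be demonstrated with a
small sketch (toy 4-byte "blocks", purely illustrative):

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# Three data devices, one toy block each.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\x0f\x0f\x0f\x0f"]

# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data)

# If the second device fails, its block is recoverable as the XOR of
# the parity block and the surviving data blocks.
recovered = xor_blocks([parity, data[0], data[2]])
assert recovered == data[1]
```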

.SS RAID5

RAID5 is very similar to RAID4.  The difference is that the parity
blocks for each stripe, instead of being on a single device, are
distributed across all devices.  This allows more parallelism when
writing, as two different block updates will quite possibly affect
parity blocks on different devices, so there is less contention.

This also allows more parallelism when reading, as read requests are
distributed over all the devices in the array instead of all but one.

.SS MULTIPATH

MULTIPATH is not really a RAID at all as there is only one real device
in a MULTIPATH md array.  However there are multiple access points
(paths) to this device, and one of these paths might fail, so there
are some similarities.

A MULTIPATH array is composed of a number of logically different
devices, often fibre channel interfaces, that all refer to the same
real device.  If one of these interfaces fails (e.g. due to cable
problems), the multipath driver will attempt to redirect requests to
another interface.

.SS UNCLEAN SHUTDOWN

When changes are made to a RAID1, RAID4, or RAID5 array there is a
possibility of inconsistency for short periods of time, as each update
requires at least two blocks to be written to different devices, and
these writes probably won't happen at exactly the same time.
Thus if a system with one of these arrays is shutdown in the middle of
a write operation (e.g. due to power failure), the array may not be
consistent.

To handle this situation, the md driver marks an array as "dirty"
before writing any data to it, and marks it as "clean" when the array
is being disabled, e.g. at shutdown.
If the md driver finds an array to be dirty at startup, it proceeds to
correct any possible inconsistency.  For RAID1, this involves copying
the contents of the first drive onto all other drives.
For RAID4 or RAID5 this involves recalculating the parity for each
stripe and making sure that the parity block has the correct data.
This process, known as "resynchronising" or "resync", is performed in
the background.  The array can still be used, though possibly with
reduced performance.

If a RAID4 or RAID5 array is degraded (missing one drive) when it is
restarted after an unclean shutdown, it cannot recalculate parity, and
so it is possible that data might be undetectably corrupted.
The 2.4 md driver
.B does not
alert the operator to this condition.  The 2.5 md driver will fail to
start an array in this condition without manual intervention.

.SS RECOVERY

If the md driver detects any error on a device in a RAID1, RAID4, or
RAID5 array, it immediately disables that device (marking it as faulty)
and continues operation on the remaining devices.  If there is a spare
drive, the driver will start recreating on that spare drive the
data that was on the failed drive, either by copying from a working drive
in a RAID1 configuration, or by doing calculations with the parity
block on RAID4 and RAID5.

While this recovery process is happening, the md driver will monitor
accesses to the array and will slow down the rate of recovery if other
activity is happening, so that normal access to the array will not be
unduly affected.  When no other activity is happening, the recovery
process proceeds at full speed.  The actual speed targets for the two
different situations can be controlled by the
.B speed_limit_min
and
.B speed_limit_max
control files mentioned below.

.SS KERNEL PARAMETERS

The md driver recognises three different kernel parameters.
.TP
.B raid=noautodetect
This will disable the normal detection of md arrays that happens at
boot time.  If a drive is partitioned with MS-DOS style partitions and
any of the 4 main partitions has a partition type of 0xFD,
that partition will normally be inspected to see if it is part of
an MD array, and if any full arrays are found, they are started.  This
kernel parameter disables this behaviour.

.TP
.BI md= n , dev , dev ,...
This tells the md driver to assemble
.B /dev/md n
from the listed devices.  It is only necessary to start the device
holding the root filesystem this way.  Other arrays are best started
once the system is booted.

.TP
.BI md= n , l , c , i , dev...
This tells the md driver to assemble a legacy RAID0 or LINEAR array
without a superblock.
.I n
gives the md device number,
.I l
gives the level, 0 for RAID0 or -1 for LINEAR,
.I c
gives the chunk size as a base-2 logarithm offset by twelve, so 0
means 4K and 1 means 8K.
.I i
is ignored (legacy support).
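The chunk-size encoding above can be sketched as a one-liner (an
illustration of the encoding, not a kernel function):

```python
def chunk_size_bytes(c: int) -> int:
    """Decode the chunk-size field of the md=n,l,c,i,dev... kernel
    parameter: the chunk size is 2 to the power (c + 12) bytes, so
    c=0 means 4K and c=1 means 8K."""
    return 1 << (c + 12)

print(chunk_size_bytes(0))  # 4096
print(chunk_size_bytes(1))  # 8192
```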

.SH FILES
.TP
.B /proc/mdstat
Contains information about the status of currently running arrays.
.TP
.B /proc/sys/dev/raid/speed_limit_min
A readable and writable file that reflects the current goal rebuild
speed for times when non-rebuild activity is current on an array.
The speed is in Kibibytes per second, and is a per-device rate, not a
per-array rate (which means that an array with more discs will shuffle
more data for a given speed).  The default is 100.

.TP
.B /proc/sys/dev/raid/speed_limit_max
A readable and writable file that reflects the current goal rebuild
speed for times when no non-rebuild activity is current on an array.
The default is 100,000.

.SH SEE ALSO
.BR mdadm (8),
.BR mkraid (8).