.TH MD 4
.SH NAME
md \- Multiple Device driver aka Linux Software Raid
.SH SYNOPSIS
.BI /dev/md n
.br
.BI /dev/md/ n
.SH DESCRIPTION
The
.B md
driver provides virtual devices that are created from one or more
independent underlying devices. This array of devices often contains
redundancy, and hence the acronym RAID which stands for a Redundant
Array of Independent Devices.
.PP
.B md
supports RAID levels 1 (mirroring), 4 (striped array with parity
device), 5 (striped array with distributed parity information) and 6
(striped array with distributed dual redundancy information). If
some number of underlying devices fails while using one of these
levels, the array will continue to function; this number is one for
RAID levels 4 and 5, two for RAID level 6, and all but one (N-1) for
RAID level 1.
.PP
.B md
also supports a number of pseudo RAID (non-redundant) configurations
including RAID0 (striped array), LINEAR (catenated array),
MULTIPATH (a set of different interfaces to the same device),
and FAULTY (a layer over a single device into which errors can be injected).

.SS MD SUPER BLOCK
Each device in an array may have a
.I superblock
which records information about the structure and state of the array.
This allows the array to be reliably re-assembled after a shutdown.

From Linux kernel version 2.6.10,
.B md
provides support for two different formats of this superblock, and
other formats can be added. Prior to this release, only one format was
supported.

The common format - known as version 0.90 - has
a superblock that is 4K long and is written into a 64K aligned block that
starts at least 64K and less than 128K from the end of the device
(i.e. to get the address of the superblock, round the size of the
device down to a multiple of 64K and then subtract 64K).
The available size of each device is the amount of space before the
superblock, so between 64K and 128K is lost when a device is
incorporated into an MD array.
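
This offset arithmetic can be checked from the shell. The following is
a minimal sketch (the device name is hypothetical), assuming
.BR blockdev (8),
which reports the size in 512-byte sectors:
.nf
size=$(( $(blockdev --getsz /dev/sda1) / 2 ))  # size in KiB
offset=$(( size / 64 * 64 - 64 ))              # round down to 64K, minus 64K
dd if=/dev/sda1 bs=1k skip=$offset count=4 of=sb.dump
.fi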

This superblock stores multi-byte fields in a processor-dependent
manner, so arrays cannot easily be moved between computers with
different processors.

The new format - known as version 1 - has a superblock that is
normally 1K long, but can be longer. It is normally stored between 8K
and 12K from the end of the device, on a 4K boundary, though
variations can be stored at the start of the device (version 1.1) or 4K from
the start of the device (version 1.2).
This superblock format stores multibyte data in a
processor-independent format and supports up to hundreds of
component devices (version 0.90 only supports 28).

The superblock contains, among other things:
.TP
LEVEL
The manner in which the devices are arranged into the array
(linear, raid0, raid1, raid4, raid5, multipath).
.TP
UUID
a 128 bit Universally Unique Identifier that identifies the array that
this device is part of.

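.PP
The
.BR mdadm (8)
utility can report these superblock fields for a component device; a
minimal sketch (the device name is hypothetical):
.nf
mdadm --examine /dev/sda1
.fi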
.SS ARRAYS WITHOUT SUPERBLOCKS
While it is usually best to create arrays with superblocks so that
they can be assembled reliably, there are some circumstances where an
array without superblocks is preferred. These include:
.TP
LEGACY ARRAYS
Early versions of the
.B md
driver only supported Linear and Raid0 configurations and did not use
a superblock (which is less critical with these configurations).
While such arrays should be rebuilt with superblocks if possible,
.B md
continues to support them.
.TP
FAULTY
Being a largely transparent layer over a different device, the FAULTY
personality doesn't gain anything from having a superblock.
.TP
MULTIPATH
It is often possible to detect devices which are different paths to
the same storage directly rather than having a distinctive superblock
written to the device and searched for on all paths. In this case,
a MULTIPATH array with no superblock makes sense.
.TP
RAID1
In some configurations it might be desired to create a raid1
configuration that does not use a superblock, and to maintain the state of
the array elsewhere. While not encouraged, this is supported.

.SS LINEAR

A linear array simply catenates the available space on each
drive together to form one large virtual drive.

One advantage of this arrangement over the more common RAID0
arrangement is that the array may be reconfigured at a later time with
an extra drive, and so the array is made bigger without disturbing the
data that is on the array. However this cannot be done on a live
array.

.SS RAID0

A RAID0 array (which has zero redundancy) is also known as a
striped array.
A RAID0 array is configured at creation with a
.B "Chunk Size"
which must be a power of two, and at least 4 kibibytes.

The RAID0 driver assigns the first chunk of the array to the first
device, the second chunk to the second device, and so on until all
drives have been assigned one chunk. This collection of chunks forms a
.BR stripe .
Further chunks are gathered into stripes in the same way, and are
assigned to the remaining space in the drives.

If devices in the array are not all the same size, then once the
smallest device has been exhausted, the RAID0 driver starts
collecting chunks into smaller stripes that only span the drives which
still have remaining space.

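For equally sized devices this mapping is simple arithmetic: logical
chunk number c is stored on device (c mod n) at chunk offset (c / n),
where n is the number of devices. A small shell sketch (values are
hypothetical):
.nf
ndev=3 chunk=7   # three devices, the eighth chunk (numbering from 0)
echo "device $(( chunk % ndev )), chunk offset $(( chunk / ndev ))"
.fi
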
.SS RAID1

A RAID1 array is also known as a mirrored set (though mirrors tend to
provide reflected images, which RAID1 does not) or a plex.

Once initialised, each device in a RAID1 array contains exactly the
same data. Changes are written to all devices in parallel. Data is
read from any one device. The driver attempts to distribute read
requests across all devices to maximise performance.

All devices in a RAID1 array should be the same size. If they are
not, then only the amount of space available on the smallest device is
used. Any extra space on other devices is wasted.

.SS RAID4

A RAID4 array is like a RAID0 array with an extra device for storing
parity. This device is the last of the active devices in the
array. Unlike RAID0, RAID4 also requires that all stripes span all
drives, so extra space on devices that are larger than the smallest is
wasted.

When any block in a RAID4 array is modified, the parity block for that
stripe (i.e. the block in the parity device at the same device offset
as the stripe) is also modified so that the parity block always
contains the "parity" for the whole stripe; i.e. its contents are
equivalent to the result of performing an exclusive-or operation
between all the data blocks in the stripe.

This allows the array to continue to function if one device fails.
The data that was on that device can be calculated as needed from the
parity block and the other data blocks.

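Because parity is a plain exclusive-or, any one missing block can be
rebuilt by XOR-ing the surviving blocks with the parity block. A
worked sketch with hypothetical one-byte values:
.nf
d0=$((0x5a)) d1=$((0x3c)) d2=$((0xf0))  # data blocks
p=$(( d0 ^ d1 ^ d2 ))                   # parity block
echo "rebuilt d1 = $(( p ^ d0 ^ d2 ))"  # prints 60, i.e. 0x3c again
.fi
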
.SS RAID5

RAID5 is very similar to RAID4. The difference is that the parity
blocks for each stripe, instead of being on a single device, are
distributed across all devices. This allows more parallelism when
writing as two different block updates will quite possibly affect
parity blocks on different devices so there is less contention.

This also allows more parallelism when reading as read requests are
distributed over all the devices in the array instead of all but one.

.SS RAID6

RAID6 is similar to RAID5, but can handle the loss of any \fItwo\fP
devices without data loss. Accordingly, it requires N+2 drives to
store N drives worth of data.

The performance for RAID6 is slightly lower but comparable to RAID5 in
normal mode and single disk failure mode. It is very slow in dual
disk failure mode, however.

.SS MULTIPATH

MULTIPATH is not really a RAID at all as there is only one real device
in a MULTIPATH md array. However there are multiple access points
(paths) to this device, and one of these paths might fail, so there
are some similarities.

A MULTIPATH array is composed of a number of logically different
devices, often fibre channel interfaces, that all refer to the same
real device. If one of these interfaces fails (e.g. due to cable
problems), the multipath driver will attempt to redirect requests to
another interface.

.SS FAULTY
The FAULTY md module is provided for testing purposes. A faulty array
has exactly one component device and is normally assembled without a
superblock, so the md array created provides direct access to all of
the data in the component device.

The FAULTY module may be requested to simulate faults to allow testing
of other md levels or of filesystems. Faults can be chosen to trigger
on read requests or write requests, and can be transient (a subsequent
read/write at the address will probably succeed) or persistent
(subsequent read/write of the same address will fail). Further, read
faults can be "fixable" meaning that they persist until a write
request at the same address.

Fault types can be requested with a period. In this case the fault
will recur repeatedly after the given number of requests of the
relevant type. For example if persistent read faults have a period of
100, then every 100th read request would generate a fault, and the
faulty sector would be recorded so that subsequent reads on that
sector would also fail.

There is a limit to the number of faulty sectors that are remembered.
Faults generated after this limit is exhausted are treated as
transient.

The list of faulty sectors can be flushed, and the active list of
failure modes can be cleared.
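
A hypothetical
.BR mdadm (8)
session illustrating this, assuming mdadm's faulty layout names (a
mode such as "rp" for read-persistent, optionally followed by a
period); the device names are made up:
.nf
mdadm --create /dev/md1 --level=faulty --raid-devices=1 /dev/sdc1
mdadm --grow /dev/md1 --layout=rp100   # read fault every 100th read
mdadm --grow /dev/md1 --layout=clear   # clear the active failure modes
.fi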
.SS UNCLEAN SHUTDOWN

When changes are made to a RAID1, RAID4, RAID5 or RAID6 array there is a
possibility of inconsistency for short periods of time as each update
requires at least two blocks to be written to different devices, and
these writes probably won't happen at exactly the same time.
Thus if a system with one of these arrays is shut down in the middle of
a write operation (e.g. due to power failure), the array may not be
consistent.

To handle this situation, the md driver marks an array as "dirty"
before writing any data to it, and marks it as "clean" when the array
is being disabled, e.g. at shutdown. If the md driver finds an array
to be dirty at startup, it proceeds to correct any possible
inconsistency. For RAID1, this involves copying the contents of the
first drive onto all other drives. For RAID4, RAID5 and RAID6 this
involves recalculating the parity for each stripe and making sure that
the parity block has the correct data. This process, known as
"resynchronising" or "resync", is performed in the background. The
array can still be used, though possibly with reduced performance.

If a RAID4, RAID5 or RAID6 array is degraded (missing at least one
drive) when it is restarted after an unclean shutdown, it cannot
recalculate parity, and so it is possible that data might be
undetectably corrupted. The 2.4 md driver
.B does not
alert the operator to this condition. The 2.5 md driver will fail to
start an array in this condition without manual intervention.

.SS RECOVERY

If the md driver detects any error on a device in a RAID1, RAID4,
RAID5 or RAID6 array, it immediately disables that device (marking it
as faulty) and continues operation on the remaining devices. If there
is a spare drive, the driver will start recreating on one of the spare
drives the data that was on that failed drive, either by copying a
working drive in a RAID1 configuration, or by doing calculations with
the parity block on RAID4, RAID5 or RAID6.

While this recovery process is happening, the md driver will monitor
accesses to the array and will slow down the rate of recovery if other
activity is happening, so that normal access to the array will not be
unduly affected. When no other activity is happening, the recovery
process proceeds at full speed. The actual speed targets for the two
different situations can be controlled by the
.B speed_limit_min
and
.B speed_limit_max
control files mentioned below.

.SS KERNEL PARAMETERS

The md driver recognises three different kernel parameters.
.TP
.B raid=noautodetect
This will disable the normal detection of md arrays that happens at
boot time. If a drive is partitioned with MS-DOS style partitions,
then if any of the 4 main partitions has a partition type of 0xFD,
then that partition will normally be inspected to see if it is part of
an MD array, and if any full arrays are found, they are started. This
kernel parameter disables this behaviour.

.TP
.BI md= n , dev , dev ,...
This tells the md driver to assemble
.B /dev/md n
from the listed devices. It is only necessary to start the device
holding the root filesystem this way. Other arrays are best started
once the system is booted.
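
For example, a boot loader might pass (device names hypothetical):
.nf
root=/dev/md0 md=0,/dev/sda1,/dev/sdb1 raid=noautodetect
.fi
which assembles /dev/md0 from the two listed partitions at boot and
suppresses autodetection of any other arrays.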

.TP
.BI md= n , l , c , i , dev...
This tells the md driver to assemble a legacy RAID0 or LINEAR array
without a superblock.
.I n
gives the md device number,
.I l
gives the level, 0 for RAID0 or -1 for LINEAR,
.I c
gives the chunk size as a base-2 logarithm offset by twelve, so 0
means 4K, 1 means 8K.
.I i
is ignored (legacy support).
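
As a worked example (device names hypothetical):
.nf
md=0,0,4,0,/dev/sda1,/dev/sdb1
.fi
assembles /dev/md0 as a RAID0 of the two listed devices with a chunk
size of 2^(4+12) bytes, i.e. 64K.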

.SH FILES
.TP
.B /proc/mdstat
Contains information about the status of currently running arrays;
a sample is shown below.
.TP
.B /proc/sys/dev/raid/speed_limit_min
A readable and writable file that reflects the current goal rebuild
speed for times when non-rebuild activity is current on an array.
The speed is in Kibibytes per second, and is a per-device rate, not a
per-array rate (which means that an array with more discs will shuffle
more data for a given speed). The default is 100.

.TP
.B /proc/sys/dev/raid/speed_limit_max
A readable and writable file that reflects the current goal rebuild
speed for times when no non-rebuild activity is current on an array.
The default is 100,000.

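.PP
A sample session; the array shown is illustrative, and the exact
mdstat format varies between kernel versions:
.nf
$ cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      1048512 blocks [2/2] [UU]
unused devices: <none>
$ echo 500 > /proc/sys/dev/raid/speed_limit_min
.fi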
.SH SEE ALSO
.BR mdadm (8),
.BR mkraid (8).