X-Git-Url: http://git.ipfire.org/?a=blobdiff_plain;f=md.4;h=5e796393ed621f080628ab1f977968e3353cfe18;hb=1ae42d9d99713ec40a56b9ba0289e362ace75f87;hp=87f5a35df2c3f3d9e3f22f3f5609b5d5abc529b3;hpb=56eb10c0b6e8f21540af444c8a28aa9e8f138ce6;p=thirdparty%2Fmdadm.git diff --git a/md.4 b/md.4 index 87f5a35d..5e796393 100644 --- a/md.4 +++ b/md.4 @@ -1,77 +1,802 @@ +.\" Copyright Neil Brown and others. +.\" This program is free software; you can redistribute it and/or modify +.\" it under the terms of the GNU General Public License as published by +.\" the Free Software Foundation; either version 2 of the License, or +.\" (at your option) any later version. +.\" See file COPYING in distribution for details. .TH MD 4 .SH NAME -md \- Multiple Device driver aka Linux Software Raid +md \- Multiple Device driver aka Linux Software RAID .SH SYNOPSIS .BI /dev/md n .br .BI /dev/md/ n +.br +.BR /dev/md/ name .SH DESCRIPTION The .B md driver provides virtual devices that are created from one or more -independant underlying devices. This array of devices often contains -redundancy, and hence the acronym RAID which stands for a Redundant -Array of Independant Devices. +independent underlying devices. This array of devices often contains +redundancy and the devices are often disk drives, hence the acronym RAID +which stands for a Redundant Array of Independent Disks. .PP .B md -support RAID levels 1 (mirroring) 4 (striped array with parity device) and 5 -(striped array with distributed parity information. If a single underlying -device fails while using one of these level, they array will continue -to function. +supports RAID levels +1 (mirroring), +4 (striped array with parity device), +5 (striped array with distributed parity information), +6 (striped array with distributed dual redundancy information), and +10 (striped and mirrored). +If some number of underlying devices fails while using one of these +levels, the array will continue to function; this number is one for +RAID levels 4 and 5, two for RAID level 6, and all but one (N-1) for +RAID level 1, and dependent on configuration for level 10. .PP .B md -also supports a number of pseudo RAID (non-redundant) configuations -including RAID0 (striped array), LINEAR (catenated array) and -MULTIPATH (a set of different interfaces to the same device). - -.SS RAID SUPER BLOCK -With the exception of Legacy Arrays described below, each device that -is incorportated into an MD array has a -.I super block -written towards the end of the device. This superblock records -information about the structure and state of the array so that the -array an be reliably re-assembled after a shutdown. - -The superblock is 4K long and is written into a 64K aligned block that -start at least 64K and less than 128K from the end of the device +also supports a number of pseudo RAID (non-redundant) configurations +including RAID0 (striped array), LINEAR (catenated array), +MULTIPATH (a set of different interfaces to the same device), +and FAULTY (a layer over a single device into which errors can be injected). + +.SS MD METADATA +Each device in an array may have some +.I metadata +stored in the device. This metadata is sometimes called a +.BR superblock . +The metadata records information about the structure and state of the array. +This allows the array to be reliably re-assembled after a shutdown. + +From Linux kernel version 2.6.10, +.B md +provides support for two different formats of metadata, and +other formats can be added. Prior to this release, only one format is +supported. 
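+
+As a purely illustrative example (the component device name is
+arbitrary), the metadata format and content recorded on a device can
+be inspected with
+.IR mdadm (8):
+.PP
+.nf
+    mdadm \-\-examine /dev/sda1
+.fi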
+
+The common format \(em known as version 0.90 \(em has
+a superblock that is 4K long and is written into a 64K aligned block that
+starts at least 64K and less than 128K from the end of the device
(i.e. to get the address of the superblock round the size of the
device down to a multiple of 64K and then subtract 64K).
-The available size of each device is the ammount of space before the
+The available size of each device is the amount of space before the
super block, so between 64K and 128K is lost when a device is
incorporated into an MD array.
+This superblock stores multi-byte fields in a processor-dependent
+manner, so arrays cannot easily be moved between computers with
+different processors.

-The superblock contains, among other things:
+The new format \(em known as version 1 \(em has a superblock that is
+normally 1K long, but can be longer. It is normally stored between 8K
+and 12K from the end of the device, on a 4K boundary, though
+variations can be stored at the start of the device (version 1.1) or 4K from
+the start of the device (version 1.2).
+This metadata format stores multibyte data in a
+processor-independent format and supports up to hundreds of
+component devices (version 0.90 only supports 28).
+
+The metadata contains, among other things:
.TP
LEVEL
-The
+The manner in which the devices are arranged into the array
+(linear, raid0, raid1, raid4, raid5, raid10, multipath).
.TP
UUID
a 128 bit Universally Unique Identifier that identifies the array that
-this device is part of.
+contains this device.
+
+.PP
+When a version 0.90 array is being reshaped (e.g. adding extra devices
+to a RAID5), the version number is temporarily set to 0.91. This
+ensures that if the reshape process is stopped in the middle (e.g. by
+a system crash) and the machine boots into an older kernel that does
+not support reshaping, then the array will not be assembled (which
+would cause data corruption) but will be left untouched until a kernel
+that can complete the reshape process is used.
+
+.SS ARRAYS WITHOUT METADATA
+While it is usually best to create arrays with superblocks so that
+they can be assembled reliably, there are some circumstances when an
+array without superblocks is preferred. These include:
+.TP
+LEGACY ARRAYS
+Early versions of the
+.B md
+driver only supported Linear and Raid0 configurations and did not use
+a superblock (which is less critical with these configurations).
+While such arrays should be rebuilt with superblocks if possible,
+.B md
+continues to support them.
+.TP
+FAULTY
+Being a largely transparent layer over a different device, the FAULTY
+personality doesn't gain anything from having a superblock.
+.TP
+MULTIPATH
+It is often possible to detect devices which are different paths to
+the same storage directly rather than having a distinctive superblock
+written to the device and searched for on all paths. In this case,
+a MULTIPATH array with no superblock makes sense.
+.TP
+RAID1
+In some configurations it might be desired to create a raid1
+configuration that does not use a superblock, and to maintain the state of
+the array elsewhere. While not encouraged for general use, it does
+have special-purpose uses and is supported.
+
+.SS ARRAYS WITH EXTERNAL METADATA
+
+From release 2.6.28, the
+.I md
+driver supports arrays with externally managed metadata. That is,
+the metadata is not managed by the kernel but rather by a user-space
+program which is external to the kernel.
This allows support for a
+variety of metadata formats without cluttering the kernel with lots of
+details.
+.PP
+.I md
+is able to communicate with the user-space program through various
+sysfs attributes so that it can make appropriate changes to the
+metadata \- for example to mark a device as faulty. When necessary,
+.I md
+will wait for the program to acknowledge the event by writing to a
+sysfs attribute.
+The manual page for
+.IR mdmon (8)
+contains more detail about this interaction.
+
+.SS CONTAINERS
+Many metadata formats use a single block of metadata to describe a
+number of different arrays which all use the same set of devices.
+In this case it is helpful for the kernel to know about the full set
+of devices as a whole. This set is known to md as a
+.IR container .
+A container is an
+.I md
+array with externally managed metadata and with device offset and size
+so that it just covers the metadata part of the devices. The
+remainder of each device is available to be incorporated into various
+arrays.
.SS LINEAR
+
+A linear array simply catenates the available space on each
+drive to form one large virtual drive.
+
+One advantage of this arrangement over the more common RAID0
+arrangement is that the array may be reconfigured at a later time with
+an extra drive, so the array is made bigger without disturbing the
+data that is on the array. This can even be done on a live
+array.
+
+If a chunksize is given with a LINEAR array, the usable space on each
+device is rounded down to a multiple of this chunksize.
+
.SS RAID0
+
+A RAID0 array (which has zero redundancy) is also known as a
+striped array.
+A RAID0 array is configured at creation with a
+.B "Chunk Size"
+which must be a power of two (prior to Linux 2.6.31), and at least 4
+kibibytes.
+
+The RAID0 driver assigns the first chunk of the array to the first
+device, the second chunk to the second device, and so on until all
+drives have been assigned one chunk. This collection of chunks forms a
+.BR stripe .
+Further chunks are gathered into stripes in the same way, and are
+assigned to the remaining space in the drives.
+
+If devices in the array are not all the same size, then once the
+smallest device has been exhausted, the RAID0 driver starts
+collecting chunks into smaller stripes that only span the drives which
+still have remaining space.
+
+
.SS RAID1
+
+A RAID1 array is also known as a mirrored set (though mirrors tend to
+provide reflected images, which RAID1 does not) or a plex.
+
+Once initialised, each device in a RAID1 array contains exactly the
+same data. Changes are written to all devices in parallel. Data is
+read from any one device. The driver attempts to distribute read
+requests across all devices to maximise performance.
+
+All devices in a RAID1 array should be the same size. If they are
+not, then only the amount of space available on the smallest device is
+used (any extra space on other devices is wasted).
+
+Note that the read balancing done by the driver does not make the RAID1
+performance profile be the same as for RAID0; a single stream of
+sequential input will not be accelerated (e.g. a single dd), but
+multiple sequential streams or a random workload will use more than one
+spindle. In theory, having an N-disk RAID1 will allow N sequential
+threads to read from all disks.
+
+Individual devices in a RAID1 can be marked as "write-mostly".
+These drives are excluded from the normal read balancing and will only
+be read from when there is no other option.
This can be useful for +devices connected over a slow link. + .SS RAID4 + +A RAID4 array is like a RAID0 array with an extra device for storing +parity. This device is the last of the active devices in the +array. Unlike RAID0, RAID4 also requires that all stripes span all +drives, so extra space on devices that are larger than the smallest is +wasted. + +When any block in a RAID4 array is modified, the parity block for that +stripe (i.e. the block in the parity device at the same device offset +as the stripe) is also modified so that the parity block always +contains the "parity" for the whole stripe. I.e. its content is +equivalent to the result of performing an exclusive-or operation +between all the data blocks in the stripe. + +This allows the array to continue to function if one device fails. +The data that was on that device can be calculated as needed from the +parity block and the other data blocks. + .SS RAID5 -.SS REBUILD/RESYNC + +RAID5 is very similar to RAID4. The difference is that the parity +blocks for each stripe, instead of being on a single device, are +distributed across all devices. This allows more parallelism when +writing, as two different block updates will quite possibly affect +parity blocks on different devices so there is less contention. + +This also allows more parallelism when reading, as read requests are +distributed over all the devices in the array instead of all but one. + +.SS RAID6 + +RAID6 is similar to RAID5, but can handle the loss of any \fItwo\fP +devices without data loss. Accordingly, it requires N+2 drives to +store N drives worth of data. + +The performance for RAID6 is slightly lower but comparable to RAID5 in +normal mode and single disk failure mode. It is very slow in dual +disk failure mode, however. + +.SS RAID10 + +RAID10 provides a combination of RAID1 and RAID0, and is sometimes known +as RAID1+0. Every datablock is duplicated some number of times, and +the resulting collection of datablocks are distributed over multiple +drives. + +When configuring a RAID10 array, it is necessary to specify the number +of replicas of each data block that are required (this will normally +be 2) and whether the replicas should be 'near', 'offset' or 'far'. +(Note that the 'offset' layout is only available from 2.6.18). + +When 'near' replicas are chosen, the multiple copies of a given chunk +are laid out consecutively across the stripes of the array, so the two +copies of a datablock will likely be at the same offset on two +adjacent devices. + +When 'far' replicas are chosen, the multiple copies of a given chunk +are laid out quite distant from each other. The first copy of all +data blocks will be striped across the early part of all drives in +RAID0 fashion, and then the next copy of all blocks will be striped +across a later section of all drives, always ensuring that all copies +of any given block are on different drives. + +The 'far' arrangement can give sequential read performance equal to +that of a RAID0 array, but at the cost of reduced write performance. + +When 'offset' replicas are chosen, the multiple copies of a given +chunk are laid out on consecutive drives and at consecutive offsets. +Effectively each stripe is duplicated and the copies are offset by one +device. This should give similar read characteristics to 'far' if a +suitably large chunk size is used, but without as much seeking for +writes. 
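+
+The layout and the number of replicas are chosen when the array is
+created, normally with
+.IR mdadm (8).
+As an illustrative sketch (the device and array names are arbitrary),
+a four-drive array with two 'far' copies of each block might be
+created with:
+.PP
+.nf
+    mdadm \-\-create /dev/md0 \-\-level=10 \-\-layout=f2 \e
+          \-\-raid\-devices=4 /dev/sd[abcd]1
+.fi
+.PP
+Here 'f2' requests two 'far' copies; 'n2' or 'o2' would request the
+'near' or 'offset' arrangements instead.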
+
+It should be noted that the number of devices in a RAID10 array need
+not be a multiple of the number of replicas of each data block; however,
+there must be at least as many devices as replicas.
+
+If, for example, an array is created with 5 devices and 2 replicas,
+then space equivalent to 2.5 of the devices will be available, and
+every block will be stored on two different devices.
+
+Finally, it is possible to have an array with both 'near' and 'far'
+copies. If an array is configured with 2 near copies and 2 far
+copies, then there will be a total of 4 copies of each block, each on
+a different drive. This is an artifact of the implementation and is
+unlikely to be of real value.
+
+.SS MULTIPATH
+
+MULTIPATH is not really a RAID at all as there is only one real device
+in a MULTIPATH md array. However there are multiple access points
+(paths) to this device, and one of these paths might fail, so there
+are some similarities.
+
+A MULTIPATH array is composed of a number of logically different
+devices, often fibre channel interfaces, that all refer to the same
+real device. If one of these interfaces fails (e.g. due to cable
+problems), the multipath driver will attempt to redirect requests to
+another interface.
+
+The MULTIPATH driver is not receiving any ongoing development and
+should be considered a legacy driver. The device-mapper based
+multipath drivers should be preferred for new installations.
+
+.SS FAULTY
+The FAULTY md module is provided for testing purposes. A faulty array
+has exactly one component device and is normally assembled without a
+superblock, so the md array created provides direct access to all of
+the data in the component device.
+
+The FAULTY module may be requested to simulate faults to allow testing
+of other md levels or of filesystems. Faults can be chosen to trigger
+on read requests or write requests, and can be transient (a subsequent
+read/write at the address will probably succeed) or persistent
+(subsequent read/write of the same address will fail). Further, read
+faults can be "fixable" meaning that they persist until a write
+request at the same address.
+
+Fault types can be requested with a period. In this case, the fault
+will recur repeatedly after the given number of requests of the
+relevant type. For example if persistent read faults have a period of
+100, then every 100th read request would generate a fault, and the
+faulty sector would be recorded so that subsequent reads on that
+sector would also fail.
+
+There is a limit to the number of faulty sectors that are remembered.
+Faults generated after this limit is exhausted are treated as
+transient.
+
+The list of faulty sectors can be flushed, and the active list of
+failure modes can be cleared.
+
+.SS UNCLEAN SHUTDOWN
+
+When changes are made to a RAID1, RAID4, RAID5, RAID6, or RAID10 array
+there is a possibility of inconsistency for short periods of time as
+each update requires at least two blocks to be written to different
+devices, and these writes probably won't happen at exactly the same
+time. Thus if a system with one of these arrays is shut down in the
+middle of a write operation (e.g. due to power failure), the array may
+not be consistent.
+
+To handle this situation, the md driver marks an array as "dirty"
+before writing any data to it, and marks it as "clean" when the array
+is being disabled, e.g. at shutdown. If the md driver finds an array
+to be dirty at startup, it proceeds to correct any possible
+inconsistency.
For RAID1, this involves copying the contents of the
+first drive onto all other drives. For RAID4, RAID5 and RAID6 this
+involves recalculating the parity for each stripe and making sure that
+the parity block has the correct data. For RAID10 it involves copying
+one of the replicas of each block onto all the others. This process,
+known as "resynchronising" or "resync", is performed in the background.
+The array can still be used, though possibly with reduced performance.
+
+If a RAID4, RAID5 or RAID6 array is degraded (missing at least one
+drive, two for RAID6) when it is restarted after an unclean shutdown, it cannot
+recalculate parity, and so it is possible that data might be
+undetectably corrupted. The 2.4 md driver
+.B does not
+alert the operator to this condition. The 2.6 md driver will fail to
+start an array in this condition without manual intervention, though
+this behaviour can be overridden by a kernel parameter.
+
+.SS RECOVERY
+
+If the md driver detects a write error on a device in a RAID1, RAID4,
+RAID5, RAID6, or RAID10 array, it immediately disables that device
+(marking it as faulty) and continues operation on the remaining
+devices. If there are spare drives, the driver will start recreating
+on one of the spare drives the data which was on that failed drive,
+either by copying a working drive in a RAID1 configuration, or by
+doing calculations with the parity block on RAID4, RAID5 or RAID6, or
+by finding and copying originals for RAID10.
+
+In kernels prior to about 2.6.15, a read error would cause the same
+effect as a write error. In later kernels, a read-error will instead
+cause md to attempt a recovery by overwriting the bad block. i.e. it
+will find the correct data from elsewhere, write it over the block
+that failed, and then try to read it back again. If either the write
+or the re-read fails, md will treat the error the same way that a write
+error is treated, and will fail the whole device.
+
+While this recovery process is happening, the md driver will monitor
+accesses to the array and will slow down the rate of recovery if other
+activity is happening, so that normal access to the array will not be
+unduly affected. When no other activity is happening, the recovery
+process proceeds at full speed. The actual speed targets for the two
+different situations can be controlled by the
+.B speed_limit_min
+and
+.B speed_limit_max
+control files mentioned below.
+
+.SS SCRUBBING AND MISMATCHES
+
+As storage devices can develop bad blocks at any time it is valuable
+to regularly read all blocks on all devices in an array so as to catch
+such bad blocks early. This process is called
+.IR scrubbing .
+
+md arrays can be scrubbed by writing either
+.I check
+or
+.I repair
+to the file
+.I md/sync_action
+in the
+.I sysfs
+directory for the device.
+
+Requesting a scrub will cause
+.I md
+to read every block on every device in the array, and check that the
+data is consistent. For RAID1 and RAID10, this means checking that the copies
+are identical. For RAID4, RAID5, RAID6 this means checking that the
+parity block is (or blocks are) correct.
+
+If a read error is detected during this process, the normal read-error
+handling causes correct data to be found from other devices and to be
+written back to the faulty device. In many cases this will
+effectively
+.I fix
+the bad block.
+
+If all blocks read successfully but are found to not be consistent,
+then this is regarded as a
+.IR mismatch .
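+
+As an illustration (assuming an array named md0 and
+.I sysfs
+mounted at
+.BR /sys ),
+a scrub can be requested from the shell, and the resulting mismatch
+count (described below) read back, with:
+.PP
+.nf
+    echo check > /sys/block/md0/md/sync_action
+    cat /sys/block/md0/md/mismatch_cnt
+.fi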
+
+If
+.I check
+was used, then no action is taken to handle the mismatch; it is simply
+recorded.
+If
+.I repair
+was used, then a mismatch will be repaired in the same way that
+.I resync
+repairs arrays. For RAID5/RAID6 new parity blocks are written. For RAID1/RAID10,
+all but one block are overwritten with the content of that one block.
+
+A count of mismatches is recorded in the
+.I sysfs
+file
+.IR md/mismatch_cnt .
+This is set to zero when a
+scrub starts and is incremented whenever a sector is
+found that is a mismatch.
+.I md
+normally works in units much larger than a single sector and when it
+finds a mismatch, it does not determine exactly how many actual sectors were
+affected but simply adds the number of sectors in the IO unit that was
+used. So a value of 128 could simply mean that a single 64KB check
+found an error (128 x 512 bytes = 64KB).
+
+If an array is created by
+.I mdadm
+with
+.I \-\-assume\-clean
+then a subsequent check could be expected to find some mismatches.
+
+On a truly clean RAID5 or RAID6 array, any mismatches should indicate
+a hardware problem at some level - software issues should never cause
+such a mismatch.
+
+However on RAID1 and RAID10 it is possible for software issues to
+cause a mismatch to be reported. This does not necessarily mean that
+the data on the array is corrupted. It could simply be that the
+system does not care what is stored on that part of the array - it is
+unused space.
+
+The most likely cause for an unexpected mismatch on RAID1 or RAID10
+occurs if a swap partition or swap file is stored on the array.
+
+When the swap subsystem wants to write a page of memory out, it flags
+the page as 'clean' in the memory manager and requests the swap device
+to write it out. It is quite possible that the memory will be
+changed while the write-out is happening. In that case the 'clean'
+flag will be found to be clear when the write completes and so the
+swap subsystem will simply forget that the swapout had been attempted,
+and will possibly choose a different page to write out.
+
+If the swap device was on RAID1 (or RAID10), then the data is sent
+from memory to a device twice (or more depending on the number of
+devices in the array). Thus it is possible that the memory gets changed
+between the times it is sent, so different data can be written to
+the different devices in the array. This will be detected by
+.I check
+as a mismatch. However it does not reflect any corruption as the
+block where this mismatch occurs is being treated by the swap system as
+being empty, and the data will never be read from that block.
+
+It is conceivable for a similar situation to occur on non-swap files,
+though it is less likely.
+
+Thus the
+.I mismatch_cnt
+value cannot be interpreted very reliably on RAID1 or RAID10,
+especially when the device is used for swap.
+
+
+.SS BITMAP WRITE-INTENT LOGGING
+
+From Linux 2.6.13,
+.I md
+supports a bitmap based write-intent log. If configured, the bitmap
+is used to record which blocks of the array may be out of sync.
+Before any write request is honoured, md will make sure that the
+corresponding bit in the log is set. After a period of time with no
+writes to an area of the array, the corresponding bit will be cleared.
+
+This bitmap is used for two optimisations.
+
+Firstly, after an unclean shutdown, the resync process will consult
+the bitmap and only resync those blocks that correspond to bits in the
+bitmap that are set. This can dramatically reduce resync time.
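+
+Such a bitmap is normally managed with
+.IR mdadm (8);
+as an illustrative example (the array name is arbitrary), an internal
+bitmap can be added to, or later removed from, an active array with:
+.PP
+.nf
+    mdadm \-\-grow \-\-bitmap=internal /dev/md0
+    mdadm \-\-grow \-\-bitmap=none /dev/md0
+.fi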
+
+Secondly, when a drive fails and is removed from the array, md stops
+clearing bits in the intent log. If that same drive is re-added to
+the array, md will notice and will only recover the sections of the
+drive that are covered by bits in the intent log that are set. This
+can allow a device to be temporarily removed and reinserted without
+causing an enormous recovery cost.
+
+The intent log can be stored in a file on a separate device, or it can
+be stored near the superblocks of an array which has superblocks.
+
+It is possible to add an intent log to an active array, or remove an
+intent log if one is present.
+
+In 2.6.13, intent bitmaps are only supported with RAID1. Other levels
+with redundancy are supported from 2.6.15.
+
+.SS WRITE-BEHIND
+
+From Linux 2.6.14,
+.I md
+supports WRITE-BEHIND on RAID1 arrays.
+
+This allows certain devices in the array to be flagged as
+.IR write-mostly .
+MD will only read from such devices if there is no
+other option.
+
+If a write-intent bitmap is also provided, write requests to
+write-mostly devices will be treated as write-behind requests and md
+will not wait for those writes to complete before
+reporting the write as complete to the filesystem.
+
+This allows for a RAID1 with WRITE-BEHIND to be used to mirror data
+over a slow link to a remote computer (providing the link isn't too
+slow). The extra latency of the remote link will not slow down normal
+operations, but the remote system will still have a reasonably
+up-to-date copy of all data.
+
+.SS RESTRIPING
+
+.IR Restriping ,
+also known as
+.IR Reshaping ,
+is the process of re-arranging the data stored in each stripe into a
+new layout. This might involve changing the number of devices in the
+array (so the stripes are wider), changing the chunk size (so stripes
+are deeper or shallower), or changing the arrangement of data and
+parity (possibly changing the raid level, e.g. 1 to 5 or 5 to 6).
+
+As of Linux 2.6.35, md can reshape a RAID4, RAID5, or RAID6 array to
+have a different number of devices (more or fewer) and to have a
+different layout or chunk size. It can also convert between these
+different RAID levels. It can also convert between RAID0 and RAID10,
+and between RAID0 and RAID4 or RAID5.
+Other possibilities may follow in future kernels.
+
+During any restripe process there is a 'critical section' during which
+live data is being overwritten on disk. For the operation of
+increasing the number of drives in a raid5, this critical section
+covers the first few stripes (the number being the product of the old
+and new number of devices). After this critical section is passed,
+data is only written to areas of the array which no longer hold live
+data \(em the live data has already been relocated.
+
+For a reshape which reduces the number of devices, the 'critical
+section' is at the end of the reshape process.
+
+md is not able to ensure data preservation if there is a crash
+(e.g. power failure) during the critical section. If md is asked to
+start an array which failed during a critical section of restriping,
+it will fail to start the array.
+
+To deal with this possibility, a user-space program must
+.IP \(bu 4
+Disable writes to that section of the array (using the
+.B sysfs
+interface),
+.IP \(bu 4
+take a copy of the data somewhere (i.e.
make a backup),
+.IP \(bu 4
+allow the process to continue and invalidate the backup and restore
+write access once the critical section is passed, and
+.IP \(bu 4
+provide for restoring the critical data before restarting the array
+after a system crash.
+.PP
+
+.B mdadm
+versions from 2.4 do this for growing a RAID5 array.
+
+For operations that do not change the size of the array, like simply
+increasing chunk size, or converting RAID5 to RAID6 with one extra
+device, the entire process is the critical section. In this case, the
+restripe will need to progress in stages, as a section is suspended,
+backed up, restriped, and released.
+
+.SS SYSFS INTERFACE
+Each block device appears as a directory in
+.I sysfs
+(which is usually mounted at
+.BR /sys ).
+For MD devices, this directory will contain a subdirectory called
+.B md
+which contains various files for providing access to information about
+the array.
+
+This interface is documented more fully in the file
+.B Documentation/md.txt
+which is distributed with the kernel sources. That file should be
+consulted for full documentation. The following are just a selection
+of attribute files that are available.
+
+.TP
+.B md/sync_speed_min
+This value, if set, overrides the system-wide setting in
+.B /proc/sys/dev/raid/speed_limit_min
+for this array only.
+Writing the value
+.B "system"
+to this file will cause the system-wide setting to have effect.
+
+.TP
+.B md/sync_speed_max
+This is the partner of
+.B md/sync_speed_min
+and overrides
+.B /proc/sys/dev/raid/speed_limit_max
+described below.
+
+.TP
+.B md/sync_action
+This can be used to monitor and control the resync/recovery process of
+MD.
+In particular, writing "check" here will cause the array to read all
+data blocks and check that they are consistent (e.g. parity is correct,
+or all mirror replicas are the same). Any discrepancies found are
+.B NOT
+corrected.
+
+A count of problems found will be stored in
+.BR md/mismatch_cnt .
+
+Alternately, "repair" can be written which will cause the same check
+to be performed, but any errors will be corrected.
+
+Finally, "idle" can be written to stop the check/repair process.
+
+.TP
+.B md/stripe_cache_size
+This is only available on RAID5 and RAID6. It records the size (in
+pages per device) of the stripe cache which is used for synchronising
+all write operations to the array and all read operations if the array
+is degraded. The default is 256. Valid values are 17 to 32768.
+Increasing this number can increase performance in some situations, at
+some cost in system memory. Note, setting this value too high can
+result in an "out of memory" condition for the system.
+
+memory_consumed = system_page_size * nr_disks * stripe_cache_size
+
+.TP
+.B md/preread_bypass_threshold
+This is only available on RAID5 and RAID6. This variable sets the
+number of times MD will service a full-stripe-write before servicing a
+stripe that requires some "prereading". For fairness this defaults to
+1. Valid values are 0 to stripe_cache_size. Setting this to 0
+maximizes sequential-write throughput at the cost of fairness to threads
+doing small or random writes.
+
+.SS KERNEL PARAMETERS
+
+The md driver recognises several different kernel parameters.
+.TP
+.B raid=noautodetect
+This will disable the normal detection of md arrays that happens at
+boot time.
If a drive is partitioned with MS-DOS style partitions, +then if any of the 4 main partitions has a partition type of 0xFD, +then that partition will normally be inspected to see if it is part of +an MD array, and if any full arrays are found, they are started. This +kernel parameter disables this behaviour. + +.TP +.B raid=partitionable +.TP +.B raid=part +These are available in 2.6 and later kernels only. They indicate that +autodetected MD arrays should be created as partitionable arrays, with +a different major device number to the original non-partitionable md +arrays. The device number is listed as +.I mdp +in +.IR /proc/devices . + +.TP +.B md_mod.start_ro=1 +.TP +.B /sys/module/md_mod/parameters/start_ro +This tells md to start all arrays in read-only mode. This is a soft +read-only that will automatically switch to read-write on the first +write request. However until that write request, nothing is written +to any device by md, and in particular, no resync or recovery +operation is started. + +.TP +.B md_mod.start_dirty_degraded=1 +.TP +.B /sys/module/md_mod/parameters/start_dirty_degraded +As mentioned above, md will not normally start a RAID4, RAID5, or +RAID6 that is both dirty and degraded as this situation can imply +hidden data loss. This can be awkward if the root filesystem is +affected. Using this module parameter allows such arrays to be started +at boot time. It should be understood that there is a real (though +small) risk of data corruption in this situation. + +.TP +.BI md= n , dev , dev ,... +.TP +.BI md=d n , dev , dev ,... +This tells the md driver to assemble +.B /dev/md n +from the listed devices. It is only necessary to start the device +holding the root filesystem this way. Other arrays are best started +once the system is booted. + +In 2.6 kernels, the +.B d +immediately after the +.B = +indicates that a partitionable device (e.g. +.BR /dev/md/d0 ) +should be created rather than the original non-partitionable device. + +.TP +.BI md= n , l , c , i , dev... +This tells the md driver to assemble a legacy RAID0 or LINEAR array +without a superblock. +.I n +gives the md device number, +.I l +gives the level, 0 for RAID0 or -1 for LINEAR, +.I c +gives the chunk size as a base-2 logarithm offset by twelve, so 0 +means 4K, 1 means 8K. +.I i +is ignored (legacy support). + .SH FILES .TP .B /proc/mdstat Contains information about the status of currently running array. .TP .B /proc/sys/dev/raid/speed_limit_min -A readable and writable file that reflects the current goal rebuild +A readable and writable file that reflects the current "goal" rebuild speed for times when non-rebuild activity is current on an array. The speed is in Kibibytes per second, and is a per-device rate, not a -per-array rate (which means that an array with more disc will shuffle -more data for a given speed). The default is 100. +per-array rate (which means that an array with more disks will shuffle +more data for a given speed). The default is 1000. .TP .B /proc/sys/dev/raid/speed_limit_max -A readable and writable file that reflects the current goal rebuild +A readable and writable file that reflects the current "goal" rebuild speed for times when no non-rebuild activity is current on an array. -The default is 100,000. +The default is 200,000. .SH SEE ALSO .BR mdadm (8),