+.SS SCRUBBING AND MISMATCHES
+
+As storage devices can develop bad blocks at any time, it is valuable
+to regularly read all blocks on all devices in an array to catch
+such bad blocks early. This process is called
+.IR scrubbing .
+
+md arrays can be scrubbed by writing either
+.I check
+or
+.I repair
+to the file
+.I md/sync_action
+in the
+.I sysfs
+directory for the device.
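For example, a scrub can be requested from the shell (a minimal sketch; the array name md0 is an assumption, adjust for your system):

```shell
# Sketch: request a read-only scrub of a hypothetical array md0.
SYNC=/sys/block/md0/md/sync_action
if [ -w "$SYNC" ]; then
    echo check > "$SYNC"    # or "repair" to also correct mismatches
else
    echo "no writable $SYNC on this system; skipping"
fi
```

Scrub progress can then be watched in /proc/mdstat.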
+
+Requesting a scrub will cause
+.I md
+to read every block on every device in the array, and check that the
+data is consistent. For RAID1 and RAID10, this means checking that the copies
+are identical. For RAID4, RAID5, and RAID6, this means checking that the
+parity block is (or blocks are) correct.
+
+If a read error is detected during this process, the normal read-error
+handling causes correct data to be found from other devices and to be
+written back to the faulty device. In many cases this will
+effectively
+.I fix
+the bad block.
+
+If all blocks read successfully but are found to not be consistent,
+then this is regarded as a
+.IR mismatch .
+
+If
+.I check
+was used, then no action is taken to handle the mismatch; it is
+simply recorded.
+If
+.I repair
+was used, then a mismatch will be repaired in the same way that
+.I resync
+repairs arrays. For RAID5/RAID6 new parity blocks are written. For RAID1/RAID10,
+all but one block are overwritten with the content of that one block.
+
+A count of mismatches is recorded in the
+.I sysfs
+file
+.IR md/mismatch_cnt .
+This is set to zero when a
+scrub starts and is incremented whenever a mismatched sector is
+found.
+.I md
+normally works in units much larger than a single sector and when it
+finds a mismatch, it does not determine exactly how many actual sectors were
+affected but simply adds the number of sectors in the IO unit that was
+used. So a value of 128 could simply mean that a single 64KB check
+found an error (128 x 512 bytes = 64KB).
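The arithmetic above can be checked directly; assuming 512-byte sectors and a 64KB IO unit:

```shell
# mismatch_cnt is incremented by the number of sectors in the IO unit,
# so one mismatching 64KB unit adds 128 to the count.
SECTOR_BYTES=512
IO_UNIT_BYTES=$((64 * 1024))
SECTORS_PER_UNIT=$((IO_UNIT_BYTES / SECTOR_BYTES))
echo "$SECTORS_PER_UNIT"    # 128
```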
+
+If an array is created by
+.I mdadm
+with
+.I \-\-assume\-clean
+then a subsequent check could be expected to find some mismatches.
+
+On a truly clean RAID5 or RAID6 array, any mismatches should indicate
+a hardware problem at some level - software issues should never cause
+such a mismatch.
+
+However on RAID1 and RAID10 it is possible for software issues to
+cause a mismatch to be reported. This does not necessarily mean that
+the data on the array is corrupted. It could simply be that the
+system does not care what is stored on that part of the array - it is
+unused space.
+
+The most likely cause for an unexpected mismatch on RAID1 or RAID10
+occurs if a swap partition or swap file is stored on the array.
+
+When the swap subsystem wants to write a page of memory out, it flags
+the page as 'clean' in the memory manager and requests the swap device
+to write it out. It is quite possible that the memory will be
+changed while the write-out is happening. In that case the 'clean'
+flag will be found to be clear when the write completes and so the
+swap subsystem will simply forget that the swapout had been attempted,
+and will possibly choose a different page to write out.
+
+If the swap device was on RAID1 (or RAID10), then the data is sent
+from memory to a device twice (or more depending on the number of
+devices in the array). Thus it is possible that the memory gets changed
+between the times it is sent, so different data can be written to
+the different devices in the array. This will be detected by
+.I check
+as a mismatch. However it does not reflect any corruption as the
+block where this mismatch occurs is being treated by the swap system as
+being empty, and the data will never be read from that block.
+
+It is conceivable for a similar situation to occur on non-swap files,
+though it is less likely.
+
+Thus the
+.I mismatch_cnt
+value cannot be interpreted very reliably on RAID1 or RAID10,
+especially when the device is used for swap.
+
+
+.SS BITMAP WRITE-INTENT LOGGING
+
+From Linux 2.6.13,
+.I md
+supports a bitmap based write-intent log. If configured, the bitmap
+is used to record which blocks of the array may be out of sync.
+Before any write request is honoured, md will make sure that the
+corresponding bit in the log is set. After a period of time with no
+writes to an area of the array, the corresponding bit will be cleared.
+
+This bitmap is used for two optimisations.
+
+Firstly, after an unclean shutdown, the resync process will consult
+the bitmap and only resync those blocks that correspond to bits in the
+bitmap that are set. This can dramatically reduce resync time.
+
+Secondly, when a drive fails and is removed from the array, md stops
+clearing bits in the intent log. If that same drive is re-added to
+the array, md will notice and will only recover the sections of the
+drive that are covered by bits in the intent log that are set. This
+can allow a device to be temporarily removed and reinserted without
+causing an enormous recovery cost.
+
+The intent log can be stored in a file on a separate device, or it can
+be stored near the superblocks of an array which has superblocks.
+
+It is possible to add an intent log to an active array, or remove an
+intent log if one is present.
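Adding or removing an internal bitmap on a live array is typically done with mdadm's Grow mode (a sketch; /dev/md0 is an assumed device name):

```shell
# Sketch: add, then remove, an internal write-intent bitmap.
DEV=/dev/md0
if command -v mdadm >/dev/null 2>&1 && [ -b "$DEV" ]; then
    mdadm --grow --bitmap=internal "$DEV"   # add an intent bitmap
    mdadm --grow --bitmap=none "$DEV"       # remove it again
else
    echo "mdadm or $DEV not available; skipping sketch"
fi
```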
+
+In 2.6.13, intent bitmaps are only supported with RAID1. Other levels
+with redundancy are supported from 2.6.15.
+
+.SS BAD BLOCK LIST
+
+From Linux 3.5 each device in an
+.I md
+array can store a list of known-bad-blocks. This list is 4K in size
+and usually positioned at the end of the space between the superblock
+and the data.
+
+When a block cannot be read and cannot be repaired by writing data
+recovered from other devices, the address of the block is stored in
+the bad block list. Similarly if an attempt to write a block fails,
+the address will be recorded as a bad block. If attempting to record
+the bad block fails, the whole device will be marked faulty.
+
+Attempting to read from a known bad block will cause a read error.
+Attempting to write to a known bad block will be ignored if any write
+errors have been reported by the device. If there have been no write
+errors then the data will be written to the known bad block and if
+that succeeds, the address will be removed from the list.
+
+This allows an array to fail more gracefully - a few blocks on
+different devices can be faulty without taking the whole array out of
+action.
+
+The list is particularly useful when recovering to a spare. If a few blocks
+cannot be read from the other devices, the bulk of the recovery can
+complete and those few bad blocks will be recorded in the bad block list.
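The recorded list can be inspected through sysfs, where each member device directory exposes a bad_blocks attribute (a sketch; md0 is an assumed array name):

```shell
# Sketch: list recorded bad blocks for each member of a hypothetical md0.
# Each line of bad_blocks is a "start-sector length" pair.
MDDIR=/sys/block/md0/md
found=0
for f in "$MDDIR"/dev-*/bad_blocks; do
    [ -r "$f" ] || continue
    found=1
    echo "$f:"
    cat "$f"
done
[ "$found" -eq 1 ] || echo "no md array at $MDDIR; nothing to show"
```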
+
+.SS WRITE-BEHIND
+
+From Linux 2.6.14,
+.I md
+supports WRITE-BEHIND on RAID1 arrays.
+
+This allows certain devices in the array to be flagged as
+.IR write-mostly .
+MD will only read from such devices if there is no
+other option.
+
+If a write-intent bitmap is also provided, write requests to
+write-mostly devices will be treated as write-behind requests and md
+will not wait for writes to those requests to complete before
+reporting the write as complete to the filesystem.
+
+This allows for a RAID1 with WRITE-BEHIND to be used to mirror data
+over a slow link to a remote computer (providing the link isn't too
+slow). The extra latency of the remote link will not slow down normal
+operations, but the remote system will still have a reasonably
+up-to-date copy of all data.
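Such a mirror might be assembled as follows (a sketch with placeholder device names; write-behind requires a write-intent bitmap, and the slow remote device is the one flagged write-mostly):

```shell
# Sketch: RAID1 mirroring over a slow link. /dev/sdb1 (the remote/slow
# device) is write-mostly; up to $WB writes may be outstanding to it.
WB=256
if command -v mdadm >/dev/null 2>&1 && [ -b /dev/sdb1 ]; then
    mdadm --create /dev/md0 --level=1 --raid-devices=2 \
          --bitmap=internal --write-behind=$WB \
          /dev/sda1 --write-mostly /dev/sdb1
else
    echo "mdadm or devices not present; skipping sketch"
fi
```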
+
+.SS RESTRIPING
+
+.IR Restriping ,
+also known as
+.IR Reshaping ,
+is the process of rearranging the data stored in each stripe into a
+new layout. This might involve changing the number of devices in the
+array (so the stripes are wider), changing the chunk size (so stripes
+are deeper or shallower), or changing the arrangement of data and
+parity (possibly changing the RAID level, e.g. 1 to 5 or 5 to 6).
+
+As of Linux 2.6.35, md can reshape a RAID4, RAID5, or RAID6 array to
+have a different number of devices (more or fewer) and to have a
+different layout or chunk size. It can also convert between these
+different RAID levels. It can also convert between RAID0 and RAID10,
+and between RAID0 and RAID4 or RAID5.
+Other possibilities may follow in future kernels.
+
+During any restripe process there is a 'critical section' during which
+live data is being overwritten on disk. For the operation of
+increasing the number of drives in a RAID5, this critical section
+covers the first few stripes (the number being the product of the old
+and new number of devices). After this critical section is passed,
+data is only written to areas of the array which no longer hold live
+data \(em the live data has already been relocated away.
+
+For a reshape which reduces the number of devices, the 'critical
+section' is at the end of the reshape process.
+
+md is not able to ensure data preservation if there is a crash
+(e.g. power failure) during the critical section. If md is asked to
+start an array which failed during a critical section of restriping,
+it will fail to start the array.
+
+To deal with this possibility, a user-space program must
+.IP \(bu 4
+Disable writes to that section of the array (using the
+.B sysfs
+interface),
+.IP \(bu 4
+take a copy of the data somewhere (i.e. make a backup),
+.IP \(bu 4
+allow the process to continue and, once the critical section is
+passed, invalidate the backup and restore write access, and
+.IP \(bu 4
+provide for restoring the critical data before restarting the array
+after a system crash.
+.PP
+
+.B mdadm
+versions from 2.4 do this for growing a RAID5 array.
+
+For operations that do not change the size of the array, like simply
+increasing chunk size, or converting RAID5 to RAID6 with one extra
+device, the entire process is the critical section. In this case, the
+restripe will need to progress in stages, as a section is suspended,
+backed up, restriped, and released.
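In practice mdadm handles the critical-section backup itself when asked to restripe; for example, growing a 3-device RAID5 to 4 devices might look like this (a sketch; all device and file names are placeholders):

```shell
# Sketch: add a spare, then grow the array by one device. mdadm
# manages the critical-section backup via --backup-file.
DEV=/dev/md0
BACKUP=/root/md0-grow.backup
if command -v mdadm >/dev/null 2>&1 && [ -b "$DEV" ]; then
    mdadm --add "$DEV" /dev/sdd1
    mdadm --grow "$DEV" --raid-devices=4 --backup-file="$BACKUP"
else
    echo "mdadm or $DEV not available; skipping sketch"
fi
```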
+
+.SS SYSFS INTERFACE
+Each block device appears as a directory in
+.I sysfs
+(which is usually mounted at
+.BR /sys ).
+For MD devices, this directory will contain a subdirectory called
+.B md
+which contains various files for providing access to information about
+the array.
+
+This interface is documented more fully in the file
+.B Documentation/md.txt
+which is distributed with the kernel sources. That file should be
+consulted for full documentation. The following are just a selection
+of attribute files that are available.
+
+.TP
+.B md/sync_speed_min
+This value, if set, overrides the system-wide setting in
+.B /proc/sys/dev/raid/speed_limit_min
+for this array only.
+Writing the value
+.B "system"
+to this file will cause the system-wide setting to have effect.
+
+.TP
+.B md/sync_speed_max
+This is the partner of
+.B md/sync_speed_min
+and overrides
+.B /proc/sys/dev/raid/speed_limit_max
+described below.
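For example, the limit can be tightened for one array and then returned to the system-wide value (a sketch; md0 is an assumed array name, values are in KB/s):

```shell
# Sketch: cap resync speed for md0 only, then revert to the
# system-wide speed_limit_max setting.
MD=/sys/block/md0/md
if [ -w "$MD/sync_speed_max" ]; then
    echo 50000 > "$MD/sync_speed_max"   # this array only
    echo system > "$MD/sync_speed_max"  # fall back to system-wide limit
else
    echo "no md array at $MD; skipping"
fi
```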
+
+.TP
+.B md/sync_action
+This can be used to monitor and control the resync/recovery process of
+MD.
+In particular, writing "check" here will cause the array to read all
+data blocks and check that they are consistent (e.g. parity is correct,
+or all mirror replicas are the same). Any discrepancies found are
+.B NOT
+corrected.
+
+A count of problems found will be stored in
+.BR md/mismatch_cnt .
+
+Alternately, "repair" can be written which will cause the same check
+to be performed, but any errors will be corrected.
+
+Finally, "idle" can be written to stop the check/repair process.
+
+.TP
+.B md/stripe_cache_size
+This is only available on RAID5 and RAID6. It records the size (in
+pages per device) of the stripe cache which is used for synchronising
+all write operations to the array and all read operations if the array
+is degraded. The default is 256. Valid values are 17 to 32768.
+Increasing this number can increase performance in some situations, at
+some cost in system memory. Note, setting this value too high can
+result in an "out of memory" condition for the system.
+
+memory_consumed = system_page_size * nr_disks * stripe_cache_size
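A worked instance of the formula: with 4KB pages, a 4-disk array, and the default stripe_cache_size of 256, the cache consumes 4 MiB.

```shell
# Worked example of memory_consumed = page_size * nr_disks * cache_size.
PAGE=4096
DISKS=4
CACHE=256
BYTES=$((PAGE * DISKS * CACHE))
echo "$((BYTES / 1024 / 1024)) MiB"   # prints "4 MiB"
```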
+
+.TP
+.B md/preread_bypass_threshold
+This is only available on RAID5 and RAID6. This variable sets the
+number of times MD will service a full-stripe-write before servicing a
+stripe that requires some "prereading". For fairness this defaults to
+1. Valid values are 0 to stripe_cache_size. Setting this to 0
+maximizes sequential-write throughput at the cost of fairness to threads
+doing small or random writes.
+