.\" the Free Software Foundation; either version 2 of the License, or
.\" (at your option) any later version.
.\" See file COPYING in distribution for details.
-.TH MDADM 8 "" v3.3.4
+.TH MDADM 8 "" v4.3
.SH NAME
mdadm \- manage MD devices
.I aka
Linear and RAID levels 0/1/4/5/6,
changing the RAID level between 0, 1, 5, and 6, and between 0 and 10,
changing the chunk size and layout for RAID 0,4,5,6,10 as well as adding or
-removing a write-intent bitmap.
+removing a write-intent bitmap and changing the array's consistency policy.
.TP
.B "Incremental Assembly"
.B Misc
This is an 'everything else' mode that supports operations on active
arrays, operations on component devices such as erasing old superblocks, and
-information gathering operations.
+information-gathering operations.
.\"This mode allows operations on independent devices such as examine MD
.\"superblocks, erasing old superblocks and stopping active arrays.
.TP
.BR \-h ", " \-\-help
-Display general help message or, after one of the above options, a
+Display a general help message or, after one of the above options, a
mode-specific help message.
.TP
.B \-\-help\-options
-Display more detailed help about command line parsing and some commonly
+Display more detailed help about command-line parsing and some commonly
used options.
.TP
.TP
.BR \-c ", " \-\-config=
-Specify the config file or directory. Default is to use
-.B /etc/mdadm.conf
-and
-.BR /etc/mdadm.conf.d ,
-or if those are missing then
-.B /etc/mdadm/mdadm.conf
-and
-.BR /etc/mdadm/mdadm.conf.d .
+Specify the config file or directory. If not specified, the default config file
+and default conf.d directory will be used. See
+.BR mdadm.conf (5)
+for more details.
+
If the config file given is
.B "partitions"
then nothing will be read, but
which is managed in a similar manner to DDF, and is supported by an
option-rom on some platforms:
.IP
-.B http://www.intel.com/design/chipsets/matrixstorage_sb.htm
+.B https://www.intel.com/content/www/us/en/support/products/122484
.PP
.RE
.B homehost
will be recorded in the metadata. For version-1 superblocks, it will
be prefixed to the array name. For version-0.90 superblocks, part of
-the SHA1 hash of the hostname will be stored in the later half of the
+the SHA1 hash of the hostname will be stored in the latter half of the
UUID.
When reporting information about an array, any array which is tagged
When using Auto-Assemble, only arrays tagged for the given homehost
will be allowed to use 'local' names (i.e. not ending in '_' followed
by a digit string). See below under
-.BR "Auto Assembly" .
+.BR "Auto-Assembly" .
The special name "\fBany\fP" can be used as a wild card. If an array
is created with
.I mdadm
needs to print the name for a device it normally finds the name in
.B /dev
-which refers to the device and is shortest. When a path component is
+which refers to the device and is the shortest. When a path component is
given with
.B \-\-prefer
.I mdadm
and
.BR \-\-monitor .
+.TP
+.B \-\-home\-cluster=
+Specifies the cluster name for the md device. The md device can be assembled
+only on the cluster which matches the name specified. If this option is not
+provided, mdadm tries to detect the cluster name automatically.
+
.SH For create, build, or grow:
.TP
.TP
.BR \-z ", " \-\-size=
-Amount (in Kibibytes) of space to use from each drive in RAID levels 1/4/5/6.
+Amount (in Kilobytes) of space to use from each drive in RAID levels 1/4/5/6/10
+and for RAID 0 on external metadata.
This must be a multiple of the chunk size, and must leave about 128Kb
of space at the end of the drive for the RAID superblock.
If this is not specified
size, though if there is a variance among the drives of greater than 1%, a warning is
issued.
-A suffix of 'M' or 'G' can be given to indicate Megabytes or
-Gigabytes respectively.
+A suffix of 'K', 'M', 'G' or 'T' can be given to indicate Kilobytes,
+Megabytes, Gigabytes or Terabytes respectively.
Sometimes a replacement drive can be a little smaller than the
original drives though this should be minimised by IDEMA standards.
slightly smaller than the smaller device with the aim that it will
still be larger than any replacement.
+This option can be used with
+.B \-\-create
+for determining the initial size of an array. For external metadata,
+it can be used on a volume, but not on a container itself.
+Setting the initial size of a
+.B RAID 0
+array is only valid for external metadata.
+
This value can be set with
.B \-\-grow
-for RAID level 1/4/5/6 though
-.B CONTAINER
-based arrays such as those with IMSM metadata may not be able to
-support this.
+for RAID level 1/4/5/6/10 though
+DDF arrays may not be able to support this.
+RAID 0 array size cannot be changed.
If the array was created with a size smaller than the currently
active drives, the extra space can be accessed using
.BR \-\-grow .
.B "\-\-grow \-\-size="
command.
-This value cannot be used when creating a
-.B CONTAINER
-such as with DDF and IMSM metadata, though it perfectly valid when
-creating an array inside a container.
-
.TP
.BR \-Z ", " \-\-array\-size=
This is only meaningful with
.B "\-\-grow \-\-array\-size="
command.
-A suffix of 'M' or 'G' can be given to indicate Megabytes or
-Gigabytes respectively.
+A suffix of 'K', 'M', 'G' or 'T' can be given to indicate Kilobytes,
+Megabytes, Gigabytes or Terabytes respectively.
A value of
.B max
restores the apparent size of the array to be whatever the real
amount of available space is.
+Clustered arrays do not support this parameter yet.
+
.TP
.BR \-c ", " \-\-chunk=
-Specify chunk size of kibibytes. The default when creating an
+Specify chunk size in kilobytes. The default when creating an
array is 512KB. To ensure compatibility with earlier versions, the
default when building an array with no persistent metadata is 64KB.
This is only meaningful for RAID0, RAID4, RAID5, RAID6, and RAID10.
RAID4, RAID5, RAID6, and RAID10 require the chunk size to be a power
-of 2. In any case it must be a multiple of 4KB.
+of 2, with a minimum chunk size of 4KB.
-A suffix of 'M' or 'G' can be given to indicate Megabytes or
-Gigabytes respectively.
+A suffix of 'K', 'M', 'G' or 'T' can be given to indicate Kilobytes,
+Megabytes, Gigabytes or Terabytes respectively.
.TP
.BR \-\-rounding=
-Specify rounding factor for a Linear array. The size of each
+Specify the rounding factor for a Linear array. The size of each
component will be rounded down to a multiple of this size.
This is a synonym for
.B \-\-chunk
This option configures the fine details of data layout for RAID5, RAID6,
and RAID10 arrays, and controls the failure modes for
.IR faulty .
+It can also be used for working around a kernel bug with RAID0, but generally
+doesn't need to be used explicitly.
The layout of the RAID5 parity block can be one of
.BR left\-asymmetric ,
"clear" or "none" will remove any pending or periodic failure modes,
and "flush" will clear any persistent faults.
-Finally, the layout options for RAID10 are one of 'n', 'o' or 'f' followed
-by a small number. The default is 'n2'. The supported options are:
+The layout options for RAID10 are one of 'n', 'o' or 'f' followed
+by a small number signifying the number of copies of each data block.
+The default is 'n2'. The supported options are:
.I 'n'
signals 'near' copies. Multiple copies of one data block are at
(multiple copies have very different offsets).
See md(4) for more detail about 'near', 'offset', and 'far'.
-The number is the number of copies of each datablock. 2 is normal, 3
+As for the number of copies of each data block, 2 is normal, 3
can be useful. This number can be at most equal to the number of
devices in the array. It does not need to divide evenly into that
number (e.g. it is perfectly legal to have an 'n2' layout for an array
with an odd number of devices).
+A bug introduced in Linux 3.14 means that RAID0 arrays
+.B "with devices of differing sizes"
+started using a different layout. This could lead to
+data corruption. Since Linux 5.4 (and various stable releases that received
+backports), the kernel will not accept such an array unless
+a layout is explicitly set. It can be set to
+.RB ' original '
+or
+.RB ' alternate '.
+When creating a new array,
+.I mdadm
+will select
+.RB ' original '
+by default, so the layout does not normally need to be set.
+An array created for either
+.RB ' original '
+or
+.RB ' alternate '
+will not be recognized by an (unpatched) kernel prior to 5.4. To create
+a RAID0 array with devices of differing sizes that can be used on an
+older kernel, you can set the layout to
+.RB ' dangerous '.
+This will use whichever layout the running kernel supports, so the data
+on the array may become corrupt when changing kernel from pre-3.14 to a
+later kernel.
+
When an array is converted between RAID5 and RAID6 an intermediate
RAID6 layout is used in which the second parity block (Q) is always on
the last device. To convert a RAID5 to RAID6 and leave it in this new
.B "none"
is given with
.B \-\-grow
-mode, then any bitmap that is present is removed.
+mode, then any bitmap that is present is removed. If the word
+.B "clustered"
+is given, the array is created for a clustered environment. One bitmap
+is created for each node as defined by the
+.B \-\-nodes
+parameter and is stored internally.
To help catch typing errors, the filename must contain at least one
slash ('/') if it is a real file (not 'internal' or 'none').
.I mdadm
automatically adds an internal bitmap as it will usually be
beneficial. This can be suppressed with
-.B "\-\-bitmap=none".
+.B "\-\-bitmap=none"
+or by selecting a different consistency policy with
+.BR \-\-consistency\-policy .
.TP
.BR \-\-bitmap\-chunk=
-Set the chunksize of the bitmap. Each bit corresponds to that many
+Set the chunk size of the bitmap. Each bit corresponds to that many
Kilobytes of storage.
-When using a file based bitmap, the default is to use the smallest
-size that is at-least 4 and requires no more than 2^21 chunks.
+When using a file-based bitmap, the default is to use the smallest
+size that is at least 4 and requires no more than 2^21 chunks.
When using an
.B internal
-bitmap, the chunksize defaults to 64Meg, or larger if necessary to
+bitmap, the chunk size defaults to 64Meg, or larger if necessary to
fit the bitmap into the available space.
-A suffix of 'M' or 'G' can be given to indicate Megabytes or
-Gigabytes respectively.
+A suffix of 'K', 'M', 'G' or 'T' can be given to indicate Kilobytes,
+Megabytes, Gigabytes or Terabytes respectively.
.TP
.BR \-W ", " \-\-write\-mostly
.BR \-\-create ,
or
.B \-\-add
-command will be flagged as 'write-mostly'. This is valid for RAID1
+command will be flagged as 'write\-mostly'. This is valid for RAID1
only and means that the 'md' driver will avoid reading from these
devices if at all possible. This can be useful if mirroring over a
slow link.
mode, and write-behind is only attempted on drives marked as
.IR write-mostly .
+.TP
+.BR \-\-failfast
+Subsequent devices listed in a
+.B \-\-create
+or
+.B \-\-add
+command will be flagged as 'failfast'. This is valid for RAID1 and
+RAID10 only. IO requests to these devices will be encouraged to fail
+quickly rather than cause long delays due to error handling. Also no
+attempt is made to repair a read error on these devices.
+
+If an array becomes degraded so that the 'failfast' device is the only
+usable device, the 'failfast' flag will then be ignored and extended
+delays will be preferred to complete failure.
+
+The 'failfast' flag is appropriate for storage arrays which have a
+low probability of true failure, but which may sometimes
+cause unacceptable delays due to internal maintenance functions.
+
.TP
.BR \-\-assume\-clean
Tell
.B \-\-assume\-clean
can be used with that command to avoid the automatic resync.
+.TP
+.BR \-\-write\-zeroes
+When creating an array, send write zeroes requests to all the block
+devices. This should zero the data area on all disks such that the
+initial sync is not necessary and, if successful, will behave
+as if
+.B \-\-assume\-clean
+was specified.
+.IP
+This is intended for use with devices that have hardware offload for
+zeroing, but despite this, zeroing can still take several minutes for
+large disks. Thus a message is printed before and after zeroing and
+each disk is zeroed in parallel with the others.
+.IP
+This is only meaningful with \-\-create.
+
.TP
.BR \-\-backup\-file=
This is needed when
.B \-\-grow
-is used to increase the number of raid-devices in a RAID5 or RAID6 if
+is used to increase the number of raid devices in a RAID5 or RAID6 if
there are no spare devices available, or to shrink, change RAID level
or layout. See the GROW MODE section below on RAID\-DEVICES CHANGES.
The file must be stored on a separate device, not on the RAID array
which computed a different offset.
Setting the offset explicitly over-rides the default. The value given
-is in Kilobytes unless an 'M' or 'G' suffix is given.
+is in Kilobytes unless a suffix of 'K', 'M', 'G' or 'T' is used to explicitly
+indicate Kilobytes, Megabytes, Gigabytes or Terabytes respectively.
Since Linux 3.4,
.B \-\-data\-offset
.B \-\-data\-offset
can be specified as
.BR variable .
-In the case each member device is expected to have a offset appended
+In the case each member device is expected to have an offset appended
to the name, separated by a colon. This makes it possible to recreate
exactly an array which has varying data offsets (as can happen when
different versions of
.BR \-N ", " \-\-name=
Set a
.B name
-for the array. This is currently only effective when creating an
-array with a version-1 superblock, or an array in a DDF container.
-The name is a simple textual string that can be used to identify array
-components when assembling. If name is needed but not specified, it
-is taken from the basename of the device that is being created.
-e.g. when creating
-.I /dev/md/home
-the
-.B name
-will default to
-.IR home .
+for the array. It must be
+.BR "POSIX PORTABLE NAME"
+compatible and cannot be longer than 32 chars. This is effective when creating an array
+with v1 metadata, or an external array.
+
+If name is needed but not specified, it is taken from the basename of the device
+that is being created. See
+.B "DEVICE NAMES"
+for details.
.TP
.BR \-R ", " \-\-run
.I mdadm
accept the geometry and layout specified without question. Normally
.I mdadm
-will not allow creation of an array with only one device, and will try
+will not allow the creation of an array with only one device, and will try
to create a RAID5 array with one missing drive (as this makes the
initial resync work faster). With
.BR \-\-force ,
Start the array
.B read only
rather than read-write as normal. No writes will be allowed to the
-array, and no resync, recovery, or reshape will be started.
+array, and no resync, recovery, or reshape will be started. It works with
+Create, Assemble, Manage and Misc modes.
.TP
.BR \-a ", " "\-\-auto{=yes,md,mdp,part,p}{NN}"
If the md device name is in a 'standard' format as described in DEVICE
NAMES, then it will be created, if necessary, with the appropriate
device number based on that name. If the device name is not in one of these
-formats, then a unused device number will be allocated. The device
+formats, then an unused device number will be allocated. The device
number will be considered unused if there is no active array for that
number, and there is no entry in /dev for that number and with a
non-standard name. Names that are not in 'standard' format are only
.B \-\-add
can be used to add some extra devices to be included in the array.
In most cases this is not needed as the extra devices can be added as
-spares first, and then the number of raid-disks can be changed.
-However for RAID0, it is not possible to add spares. So to increase
+spares first, and then the number of raid disks can be changed.
+However, for RAID0 it is not possible to add spares. So to increase
the number of devices in a RAID0, it is necessary to set the new
number of devices, and to add the new devices, in the same command.
+.TP
+.BR \-\-nodes
+Only works when the array is created for a clustered environment. It specifies
+the maximum number of nodes in the cluster that will use this device
+simultaneously. If not specified, this defaults to 4.
+
+.TP
+.BR \-\-write\-journal
+Specify a journal device for a RAID-4/5/6 array. The journal device
+should be an SSD with a reasonable lifetime.
+
+.TP
+.BR \-k ", " \-\-consistency\-policy=
+Specify how the array maintains consistency in the case of an unexpected shutdown.
+Only relevant for RAID levels with redundancy.
+Currently supported options are:
+.RS
+
+.TP
+.B resync
+Full resync is performed and all redundancy is regenerated when the array is
+started after an unclean shutdown.
+
+.TP
+.B bitmap
+Resync assisted by a write-intent bitmap. Implicitly selected when using
+.BR \-\-bitmap .
+
+.TP
+.B journal
+For RAID levels 4/5/6, the journal device is used to log transactions and replay
+after an unclean shutdown. Implicitly selected when using
+.BR \-\-write\-journal .
+
+.TP
+.B ppl
+For RAID5 only, Partial Parity Log is used to close the write hole and
+eliminate resync. PPL is stored in the metadata region of RAID member drives,
+so no additional journal drive is needed.
+
+.PP
+Can be used with \-\-grow to change the consistency policy of an active array
+in some cases. See CONSISTENCY POLICY CHANGES below.
+.RE
+
+
.SH For assemble:
.TP
.TP
.BR \-N ", " \-\-name=
-Specify the name of the array to assemble. This must be the name
-that was specified when creating the array. It must either match
+Specify the name of the array to assemble. It must be
+.BR "POSIX PORTABLE NAME"
+compatible and cannot be longer than 32 chars. This must be the name
+that was specified when creating the array. It must either match
the name stored in the superblock exactly, or it must match
with the current
.I homehost
.I mdadm
cannot find enough working devices to start the array, but can find
some devices that are recorded as having failed, then it will mark
-those devices as working so that the array can be started.
+those devices as working so that the array can be started. This works only for
+native metadata. For external metadata, it allows a dirty degraded RAID 4, 5, or 6
+array to be started.
An array which requires
.B \-\-force
to be started may contain data corruption. Use it carefully.
.BR summaries ,
.BR uuid ,
.BR name ,
+.BR nodes ,
.BR homehost ,
+.BR home\-cluster ,
.BR resync ,
.BR byteorder ,
.BR devicesize ,
.BR no\-bitmap ,
.BR bbl ,
.BR no\-bbl ,
+.BR ppl ,
+.BR no\-ppl ,
+.BR layout\-original ,
+.BR layout\-alternate ,
+.BR layout\-unspecified ,
.BR metadata ,
or
.BR super\-minor .
reports a different "Preferred Minor" to
.BR \-\-detail .
In some cases this update will be performed automatically
-by the kernel driver. In particular the update happens automatically
+by the kernel driver. In particular, the update happens automatically
at the first write to an array with redundancy (RAID level 1 or
greater) on a 2.6 (or later) kernel.
of the array as stored in the superblock. This is only supported for
version-1 superblocks.
+The
+.B nodes
+option will change the
+.I nodes
+of the array as stored in the bitmap superblock. This option only
+works for a clustered environment.
+
The
.B homehost
option will change the
same as updating the UUID.
For version-1 superblocks, this involves updating the name.
+The
+.B home\-cluster
+option will change the cluster name as recorded in the superblock and
+bitmap. This option only works for a clustered environment.
+
The
.B resync
option will cause the array to be marked
The
.B byteorder
option allows arrays to be moved between machines with different
-byte-order.
+byte-order, such as from a big-endian machine like a Sparc or some
+MIPS machines, to a little-endian x86_64 machine.
When assembling such an array for the first time after a move, giving
.B "\-\-update=byteorder"
will cause
removed. If the bad block list contains entries, this will fail, as
removing the list could cause data corruption.
+The
+.B ppl
+option will enable PPL for a RAID5 array and reserve space for PPL on each
+device. There must be enough free space between the data and superblock, and a
+write-intent bitmap or journal must not be used.
+
+The
+.B no\-ppl
+option will disable PPL in the superblock.
+
+The
+.B layout\-original
+and
+.B layout\-alternate
+options are for RAID0 arrays with non-uniform device sizes that were in
+use before Linux 5.4. If the array was being used with Linux 3.13 or
+earlier, then to assemble the array on a new kernel,
+.B \-\-update=layout\-original
+must be given. If the array was created and used with a kernel from Linux 3.14 to
+Linux 5.3, then
+.B \-\-update=layout\-alternate
+must be given. This only needs to be given once. Subsequent assembly of the array
+will happen normally.
+For more information, see
+.IR md (4).
+
+The
+.B layout\-unspecified
+option reverts the effect of
+.B layout\-original
+or
+.B layout\-alternate
+and allows the array to be again used on a kernel prior to Linux 5.3.
+This option should be used with great caution.
+
.TP
.BR \-\-freeze\-reshape
-Option is intended to be used in start-up scripts during initrd boot phase.
-When array under reshape is assembled during initrd phase, this option
-stops reshape after reshape critical section is being restored. This happens
-before file system pivot operation and avoids loss of file system context.
+This option is intended to be used in start-up scripts during the initrd boot phase.
+When an array under reshape is assembled during the initrd phase, this option
+stops the reshape after the reshape-critical section has been restored. This happens
+before the file system pivot operation and avoids loss of file system context.
Losing file system context would cause reshape to be broken.
Reshape can be continued later using the
If the metadata on the device reports that it is a member of the
array, and the slot that it used is still vacant, then the device will
be added back to the array in the same position. This will normally
-cause the data for that device to be recovered. However based on the
+cause the data for that device to be recovered. However, based on the
event count on the device, the recovery may only require sections that
-are flagged a write-intent bitmap to be recovered or may not require
+are flagged by a write-intent bitmap to be recovered or may not require
any recovery at all.
When used on an array that has no metadata (i.e. it was built with
it will be assumed that bitmap-based recovery is enough to make the
device fully consistent with the array.
-When used with v1.x metadata,
.B \-\-re\-add
-can be accompanied by
+can also be accompanied by
.BR \-\-update=devicesize ,
.BR \-\-update=bbl ", or"
.BR \-\-update=no\-bbl .
-See the description of these option when used in Assemble mode for an
+See the descriptions of these options when used in Assemble mode for an
explanation of their use.
If the device name given is
except that it does not attempt
.B \-\-re\-add
first. The device will be added as a spare even if it looks like it
-could be an recent member of the array.
+could be a recent member of the array.
.TP
.BR \-r ", " \-\-remove
.B set-A
can be given to
.BR \-\-remove .
-The first causes all failed device to be removed. The second causes
+The first causes all failed devices to be removed. The second causes
any device which is no longer connected to the system (i.e an 'open'
returns
.BR ENXIO )
to be removed.
-The third will remove a set as describe below under
+The third will remove a set as described below under
.BR \-\-fail .
.TP
of devices, the devices can be conceptually divided into sets where
each set contains a single complete copy of the data on the array.
Sometimes a RAID10 array will be configured so that these sets are on
-separate controllers. In this case all the devices in one set can be
+separate controllers. In this case, all the devices in one set can be
failed by giving a name like
.B set\-A
or
.B \-\-replace
devices. The devices listed after
.B \-\-with
-will be preferentially used to replace the devices listed after
+will preferentially be used to replace the devices listed after
.BR \-\-replace .
-These device must already be spare devices in the array.
+These devices must already be spare devices in the array.
.TP
.BR \-\-write\-mostly
.BR \-\-readwrite
Subsequent devices that are added or re\-added will have the 'write-mostly'
flag cleared.
+.TP
+.BR \-\-cluster\-confirm
+Confirm the existence of the device. This is issued in response to an \-\-add
+request by a node in a cluster. When a node adds a device it sends a message
+to all nodes in the cluster to look for a device with a UUID. This translates
+to a udev notification with the UUID of the device to be added and the slot
+number. The receiving node must acknowledge this message
+with \-\-cluster\-confirm. Valid arguments are <slot>:<devicename> in case
+the device is found or <slot>:missing in case the device is not found.
+
+.TP
+.BR \-\-add\-journal
+Add a journal to an existing array, or recreate the journal for a RAID-4/5/6 array
+that lost its journal device. To avoid interrupting ongoing write operations,
+.B \-\-add\-journal
+only works for an array in the Read-Only state.
+
+.TP
+.BR \-\-failfast
+Subsequent devices that are added or re\-added will have
+the 'failfast' flag set. This is only valid for RAID1 and RAID10 and
+means that the 'md' driver will avoid long timeouts on error handling
+where possible.
+.TP
+.BR \-\-nofailfast
+Subsequent devices that are re\-added will be re\-added without
+the 'failfast' flag set.
.P
Each of these options requires that the first device listed is the array
.TP
.BR \-\-detail\-platform
Print details of the platform's RAID capabilities (firmware / hardware
-topology) for a given metadata format. If used without argument, mdadm
+topology) for a given metadata format. If used without an argument, mdadm
will scan all controllers looking for their capabilities. Otherwise, mdadm
-will only look at the controller specified by the argument in form of an
+will only look at the controller specified by the argument in the form of an
absolute filepath or a link, e.g.
.IR /sys/devices/pci0000:00/0000:00:1f.2 .
.TP
.B \-\-examine\-badblocks
List the bad-blocks recorded for the device, if a bad-blocks list has
-been configured. Currently only
+been configured. Currently only
.B 1.x
-metadata supports bad-blocks lists.
+and
+.B IMSM
+metadata support bad-blocks lists.
.TP
.BI \-\-dump= directory
the block where the superblock would be is overwritten even if it
doesn't appear to be valid.
+.B Note:
+Be careful when calling \-\-zero\-superblock with clustered raid. Make sure
+the array isn't used or assembled in another cluster node before executing it.
+
.TP
.B \-\-kill\-subarray=
If the device is a container and the argument to \-\-kill\-subarray
is given, arrange for the array to be marked clean as soon as possible.
.I mdadm
will return with success if the array uses external metadata and we
-successfully waited. For native arrays this returns immediately as the
+successfully waited. For native arrays, this returns immediately as the
kernel handles dirty-clean transitions at shutdown. No action is taken
if safe-mode handling is disabled.
.TP
.BR \-\-run ", " \-R
-Run any array assembled as soon as a minimal number of devices are
+Run any array assembled as soon as a minimal number of devices is
available, rather than waiting until all expected devices are present.
.TP
a new device appears at the same location it can be automatically
added to the same array. This allows the failed device to be
automatically replaced by a new device without metadata if it appears
-at specified path. This option is normally only set by a
+at the specified path. This option is normally only set by a
.I udev
script.
.PP
This usage assembles one or more RAID arrays from pre-existing components.
For each array, mdadm needs to know the md device, the identity of the
-array, and a number of component-devices. These can be found in a number of ways.
+array, and a number of component devices. These can be found in a number of ways.
In the first usage example (without the
.BR \-\-scan )
.B \-\-config
or requested with (a possibly implicit)
.BR \-\-scan .
-In the later case,
-.B /etc/mdadm.conf
-or
-.B /etc/mdadm/mdadm.conf
-is used.
+In the latter case, the default config file is used. See
+.BR mdadm.conf (5)
+for more details.
If
.B \-\-scan
.B /dev
itself.
-In Linux kernels prior to version 2.6.28 there were two distinctly
-different types of md devices that could be created: one that could be
+In Linux kernels prior to version 2.6.28 there were two distinct
+types of md devices that could be created: one that could be
partitioned using standard partitioning tools and one that could not.
-Since 2.6.28 that distinction is no longer relevant as both type of
+Since 2.6.28 that distinction is no longer relevant as both types of
devices can be partitioned.
.I mdadm
will normally create the type that originally could not be partitioned
-as it has a well defined major number (9).
+as it has a well-defined major number (9).
Prior to 2.6.28, it is important that mdadm chooses the correct type
of array device to use. This can be controlled with the
.B auto=
on the ARRAY line for the relevant array.
-.SS Auto Assembly
+.SS Auto-Assembly
When
.B \-\-assemble
is used with
.IR mdadm.conf (5)
for further details.
-Note: Auto assembly cannot be used for assembling and activating some
+Note: Auto-assembly cannot be used for assembling and activating some
arrays which are undergoing reshape. In particular as the
.B backup\-file
-cannot be given, any reshape which requires a backup-file to continue
-cannot be started by auto assembly. An array which is growing to more
+cannot be given, any reshape which requires a backup file to continue
+cannot be started by auto-assembly. An array which is growing to more
devices and has passed the critical section can be assembled using
auto-assembly.
.I md-device
.BI \-\-chunk= X
.BI \-\-level= Y
-.br
.BI \-\-raid\-devices= Z
.I devices
.PP
-This usage will initialise a new md array, associate some devices with
+This usage will initialize a new md array, associate some devices with
it, and activate the array.
+.I md-device
+is a new device. This could be a standard name or a chosen name. For details, see
+.BR "DEVICE NAMES" .
+
The named device will normally not exist when
.I "mdadm \-\-create"
is run, but will be created by
.I udev
once the array becomes active.
+The md-device name is limited to a maximum length of 32 characters.
+Some metadata types impose stricter limits
+(for example, IMSM allows only 16 characters).
+For that reason, a long name may be truncated or rejected, depending on the metadata policy.
+
As devices are added, they are checked to see if they contain RAID
superblocks or filesystems. They are also checked to see if the variance in
device size exceeds 1%.
.B \-\-force
option.
-When creating an array with version-1 metadata a name for the array is
-required.
-If this is not given with the
-.B \-\-name
-option,
-.I mdadm
-will choose a name based on the last component of the name of the
-device being created. So if
-.B /dev/md3
-is being created, then the name
-.B 3
-will be chosen.
-If
-.B /dev/md/home
-is being created, then the name
-.B home
-will be used.
-
When creating a partition based array, using
.I mdadm
with version-1.x metadata, the partition type should be set to
.B 0xDA
-(non fs-data). This type selection allows for greater precision since
+(non fs-data). This type of selection allows for greater precision since
using any other [RAID auto-detect (0xFD) or a GNU/Linux partition (0x83)],
might create problems in the event of array recovery through a live cdrom.
setting.
.\"If the
.\".B \-\-size
-.\"option is given, it is not necessary to list any component-devices in this command.
+.\"option is given, it is not necessary to list any component devices in this command.
.\"They can be added later, before a
.\".B \-\-run.
.\"If no
will automatically be added unless some other option is explicitly
requested with the
.B \-\-bitmap
-option. In any case space for a bitmap will be reserved so that one
-can be added layer with
+option or a different consistency policy is selected with the
+.B \-\-consistency\-policy
+option. In any case, space for a bitmap will be reserved so that one
+can be added later with
.BR "\-\-grow \-\-bitmap=internal" .
-If the metadata type supports it (currently only 1.x metadata), space
-will be allocated to store a bad block list. This allows a modest
+If the metadata type supports it (currently only 1.x and IMSM metadata),
+space will be allocated to store a bad block list. This allows a modest
number of bad blocks to be recorded, allowing the drive to remain in
service while only partially functional.
.TP
.B \-\-readonly
-start the array readonly \(em not supported yet.
+start the array in readonly mode.
.SH MANAGE MODE
.HP 12
as faulty in
.B /dev/md0
and will then remove it from the array and finally add it back
-in as a spare. However only one md array can be affected by a single
+in as a spare. However, only one md array can be affected by a single
command.
When a device is added to an active array, mdadm checks to see if it
.B \-U
or
.B \-\-update=
-option. Currently only
-.B name
-is supported.
+option. The supported options are
+.BR name ,
+.BR ppl ,
+.BR no\-ppl ,
+.BR bitmap
+and
+.BR no\-bitmap .
The
.B name
-option updates the subarray name in the metadata, it may not affect the
-device node name or the device node symlink until the subarray is
-re\-assembled. If updating
-.B name
-would change the UUID of an active subarray this operation is blocked,
-and the command will end in an error.
+option updates the subarray name in the metadata. It must be
+.BR "POSIX PORTABLE NAME"
+compatible and cannot be longer than 32 characters. If the update succeeds, the new value takes effect after
+the next assembly.
+
+The
+.B ppl
+and
+.B no\-ppl
+options enable and disable PPL in the metadata. Currently supported only for
+IMSM subarrays.
+
+The
+.B bitmap
+and
+.B no\-bitmap
+options enable and disable the write-intent bitmap in the metadata. Currently supported only for
+IMSM subarrays.
.TP
.B \-\-examine
If the device contains RAID metadata, a file will be created in the
.I directory
and the metadata will be written to it. The file will be the same
-size as the device and have the metadata written in the file at the
-same locate that it exists in the device. However the file will be "sparse" so
+size as the device and will have the metadata written at the
+same location as it exists in the device. However, the file will be "sparse" so
that only those blocks containing metadata will be allocated. The
total space used will be small.
-The file name used in the
+The filename used in the
.I directory
-will be the base name of the device. Further if any links appear in
+will be the base name of the device. Further, if any links appear in
.I /dev/disk/by-id
which point to the device, then hard links to the file will be created
in
.I options... devices...
.PP
-This usage causes
+The monitor can work in two modes:
+.IP \(bu 4
+system-wide mode, which follows all md devices listed in
+.BR /proc/mdstat ,
+.IP \(bu 4
+a mode which follows only the MD devices specified on the command line.
+.PP
+
+.B \-\-scan
+indicates system-wide mode. This option causes the
+.I monitor
+to track all md devices that appear in
+.BR /proc/mdstat .
+If it is not set, then at least one
+.B device
+must be specified.
+
+Monitor usage causes
.I mdadm
to periodically poll a number of md arrays and to report on any events
noticed.
-.I mdadm
-will never exit once it decides that there are arrays to be checked,
-so it should normally be run in the background.
+
+In both modes,
+.I monitor
+will keep running as long as there is an active array with redundancy that it has been told to follow (with
+.B \-\-scan
+every array is followed).
As well as reporting events,
.I mdadm
.B domain
and if the destination array has a failed drive but no spares.
-If any devices are listed on the command line,
-.I mdadm
-will only monitor those devices. Otherwise all arrays listed in the
-configuration file will be monitored. Further, if
-.B \-\-scan
-is given, then any other md devices that appear in
-.B /proc/mdstat
-will also be monitored.
-
The result of monitoring the arrays is the generation of events.
These events are passed to a separate program (if specified) and may
be mailed to a given E-mail address.
If
.B \-\-scan
-is given, then a program or an E-mail address must be specified on the
-command line or in the config file. If neither are available, then
+is given, then a
+.B program
+or an
+.B e-mail
+address must be specified on the
+command line or in the config file. If neither is available, then
.I mdadm
will not monitor anything.
-Without
-.B \-\-scan,
-.I mdadm
-will continue monitoring as long as something was found to monitor. If
-no program or email is given, then each event is reported to
-.BR stdout .
+For devices given directly on the command line, without
+.B program
+or
+.B email
+specified, each event is reported to
+.BR stdout .
+
+Note: For systems where
+.I mdadm monitor
+is configured via systemd,
+.B mdmonitor(mdmonitor.service)
+should be configured. The service is designed to be the primary solution for array monitoring
+and is configured to work in system-wide mode.
+It is automatically started and stopped according to the current state and types of MD arrays in the system.
+The service may require additional configuration, such as
+.B e-mail
+or
+.BR delay .
+That should be done in
+.BR mdadm.conf .
The different events are:
.BI Rebuild NN
Where
.I NN
-is a two-digit number (ie. 05, 48). This indicates that rebuild
-has passed that many percent of the total. The events are generated
-with fixed increment since 0. Increment size may be specified with
-a commandline option (default is 20). (syslog priority: Warning)
+is a two-digit number (e.g. 05, 48). This indicates that the rebuild
+has reached that percentage of the total. The events are generated
+at a fixed increment from 0. The increment size may be specified with
+a command-line option (the default is 20). (syslog priority: Warning)
.TP
.B RebuildFinished
detects that an array in a spare group has fewer active
devices than necessary for the complete array, and has no spare
devices, it will look for another array in the same spare group that
-has a full complement of working drive and a spare. It will then
-attempt to remove the spare from the second drive and add it to the
+has a full complement of working drives and a spare. It will then
+attempt to remove the spare from the second array and add it to the
first.
If the removal succeeds but the adding fails, then it is added back to
the original array.
.SH GROW MODE
The GROW mode is used for changing the size or shape of an active
array.
-For this to work, the kernel must support the necessary change.
-Various types of growth are being added during 2.6 development.
-Currently the supported changes include
+During the kernel 2.6 era the following changes were added:
.IP \(bu 4
change the "size" attribute for RAID1, RAID4, RAID5 and RAID6.
.IP \(bu 4
.IP \(bu 4
add a write-intent bitmap to any array which supports these bitmaps, or
remove a write-intent bitmap from such an array.
+.IP \(bu 4
+change the array's consistency policy.
.PP
Using GROW on containers is currently supported only for Intel's IMSM
increased - which affects all arrays in the container - or an array
in a container can be converted between levels where those levels are
supported by the container, and the conversion is one of those listed
-above. Resizing arrays in an IMSM container with
-.B "--grow --size"
-is not yet supported.
-
-Grow functionality (e.g. expand a number of raid devices) for Intel's
-IMSM container format has an experimental status. It is guarded by the
-.B MDADM_EXPERIMENTAL
-environment variable which must be set to '1' for a GROW command to
-succeed.
-This is for the following reasons:
-
-.IP 1.
-Intel's native IMSM check-pointing is not fully tested yet.
-This can causes IMSM incompatibility during the grow process: an array
-which is growing cannot roam between Microsoft Windows(R) and Linux
-systems.
-
-.IP 2.
-Interrupting a grow operation is not recommended, because it
-has not been fully tested for Intel's IMSM container format yet.
+above.
.PP
-Note: Intel's native checkpointing doesn't use
+Notes:
+.IP \(bu 4
+Intel's native checkpointing doesn't use
.B --backup-file
option and it is transparent for assembly feature.
+.IP \(bu 4
+Roaming between Windows(R) and Linux systems for IMSM metadata is not
+supported during the grow process.
+.IP \(bu 4
+When growing a raid0 device, the new component disk size (or external
+backup size) should be larger than LCM(old, new) * chunk-size * 2,
+where LCM() is the least common multiple of the old and new count of
+component disks, and "* 2" comes from the fact that mdadm refuses to
+use more than half of a spare device for backup space.
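As a rough illustration of the raid0 note above, the bound can be computed as in the following Python sketch. The function name and the example values (2 to 3 disks, 512 KiB chunk) are hypothetical and chosen only for illustration; this is not code taken from mdadm.

```python
from math import gcd


def min_raid0_backup_bytes(old_disks, new_disks, chunk_bytes):
    """Return LCM(old, new) * chunk-size * 2, the bound quoted in the note.

    The new component disk (or external backup) should be larger than this.
    """
    # Least common multiple of the old and new component disk counts.
    lcm = old_disks * new_disks // gcd(old_disks, new_disks)
    # "* 2" because no more than half of a spare is used for backup space.
    return lcm * chunk_bytes * 2


# Hypothetical example: growing from 2 to 3 disks with a 512 KiB chunk.
print(min_raid0_backup_bytes(2, 3, 512 * 1024))
```

For the assumed values this gives 6291456 bytes (6 MiB), so the new disk or external backup would need to be larger than that.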
.SS SIZE CHANGES
Normally when an array is built the "size" is taken from the smallest
-of the drives. If all the small drives in an arrays are, one at a
-time, removed and replaced with larger drives, then you could have an
+of the drives. If all the small drives in an array are, over time,
+removed and replaced with larger drives, then you could have an
array of large drives with only a small amount used. In this
situation, changing the "size" with "GROW" mode will allow the extra
space to start being used. If the size is increased in this way, a
.B prior
to shrinking the array.
-Also the size of an array cannot be changed while it has an active
+Also, the size of an array cannot be changed while it has an active
bitmap. If an array has a bitmap, it must be removed before the size
can be changed. Once the change is complete a new bitmap can be created.
+.PP
+Note:
+.B "--grow --size"
+is not yet supported for external file bitmaps.
+
.SS RAID\-DEVICES CHANGES
A RAID1 array can work with any number of devices from 1 upwards
is required. If the array is not simultaneously being grown or
shrunk, so that the array size will remain the same - for example,
reshaping a 3-drive RAID5 into a 4-drive RAID6 - the backup file will
-be used not just for a "cricital section" but throughout the reshape
+be used not just for a "critical section" but throughout the reshape
operation, as described below under LAYOUT CHANGES.
.SS CHUNK-SIZE AND LAYOUT CHANGES
-Changing the chunk-size of layout without also changing the number of
+Changing the chunk-size or layout without also changing the number of
devices at the same time will involve re-writing all blocks in-place.
To ensure against data loss in the case of a crash, a
.B --backup-file
If the reshape is interrupted for any reason, this backup file must be
made available to
.B "mdadm --assemble"
-so the array can be reassembled. Consequently the file cannot be
+so the array can be reassembled. Consequently, the file cannot be
stored on the device being reshaped.
in a filesystem that is on the RAID array being affected, the system
will deadlock. The bitmap must be on a separate filesystem.
+.SS CONSISTENCY POLICY CHANGES
+
+The consistency policy of an active array can be changed by using the
+.B \-\-consistency\-policy
+option in Grow mode. Currently this works only for the
+.B ppl
+and
+.B resync
+policies and allows the RAID5 Partial Parity Log (PPL) to be enabled or disabled.
+
.SH INCREMENTAL MODE
.HP 12
usually provided by a
.I udev
rules mentioning
-.BR ${DEVLINKS} .
+.BR $env{DEVLINKS} .
.IP +
Does the device have a valid md superblock? If a specific metadata
recovery. You should be aware that interoperability may be
compromised by setting this value.
+These checks can also be suppressed by adding
+.B mdadm.imsm.test=1
+to the kernel command line. This makes it easy to test IMSM
+code in a virtual machine that doesn't have IMSM virtual hardware.
+
.TP
.B MDADM_GROW_ALLOW_OLD
If an array is stopped while it is performing a reshape and that
is given in Misc mode, and to monitor array reconstruction
on Monitor mode.
-.SS /etc/mdadm.conf
+.SS {CONFFILE} (or {CONFFILE2})
-The config file lists which devices may be scanned to see if
-they contain MD super block, and gives identifying information
-(e.g. UUID) about known MD arrays. See
+Default config file. See
.BR mdadm.conf (5)
for more details.
-.SS /etc/mdadm.conf.d
+.SS {CONFFILE}.d (or {CONFFILE2}.d)
-A directory containing configuration files which are read in lexical
-order.
+Default directory containing configuration files. See
+.BR mdadm.conf (5)
+for more details.
.SS {MAP_PATH}
When
.B \-\-incremental
mode is used, this file gets a list of arrays currently being created.
+.SH POSIX PORTABLE NAME
+A valid name can only consist of the characters "A-Za-z0-9.-_".
+The name cannot start with "-" and cannot exceed 255 characters.
+
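The naming rule above could be checked with a short Python sketch like the one below. The function name and the max_len parameter are hypothetical, and rejecting an empty name is an assumption that the text does not state; the 32-character limit used in the example is the one given for md-device and subarray names.

```python
import re

# Characters allowed by the rule: A-Z, a-z, 0-9, ".", "-" and "_".
_PORTABLE_CHARS = re.compile("[A-Za-z0-9._-]+")


def is_portable_name(name, max_len=255):
    """Check a name against the POSIX PORTABLE NAME rule described above."""
    if not name or len(name) > max_len:
        return False  # empty names are rejected here by assumption
    if name.startswith("-"):
        return False  # a leading "-" is not allowed
    return _PORTABLE_CHARS.fullmatch(name) is not None


# Hypothetical usage with the 32-character limit for device names.
print(is_portable_name("home", max_len=32))
print(is_portable_name("-home", max_len=32))
```

The first call accepts the name and the second rejects it because of the leading "-".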
.SH DEVICE NAMES
.I mdadm
.I home
can be given.
+In every style, the raw name must be compatible with
+.BR "POSIX PORTABLE NAME"
+and must be no longer than 32 characters.
+
When
.I mdadm
chooses device names during auto-assembly or incremental assembly, it
since version 3.3 provided they are enabled in
.IR mdadm.conf .
+.SH UNDERSTANDING OUTPUT
+
+.TP
+EXAMINE
+
+.TP
+.B checkpoint
+A checkpoint value is reported when the array is performing an action such as
+resync, recovery or reshape. Checkpoints allow an interrupted action to be
+resumed from a certain point.
+
+The checkpoint is reported as a combination of two values: the current migration
+unit and the number of blocks per unit. Multiplying those values and dividing by
+the array size gives the checkpoint progress percentage, which can be compared
+with the current progress reported in /proc/mdstat. The checkpoint is also
+related to (and sometimes based on) the sysfs entry sync_completed, but depending
+on the action the units may differ. Even if the units are the same, the checkpoint
+and sync_completed should not be expected to match exactly or to be updated
+simultaneously.
+
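The checkpoint arithmetic described above can be sketched in Python as follows. The function and parameter names are hypothetical, and the sketch assumes that the array size is expressed in the same block units as the blocks-per-unit value; the numbers in the example are made up.

```python
def checkpoint_percent(current_unit, blocks_per_unit, array_size_blocks):
    """Return checkpoint progress as a percentage of the array size.

    Assumes array_size_blocks uses the same block unit as blocks_per_unit.
    """
    if array_size_blocks <= 0:
        raise ValueError("array size must be positive")
    # Blocks covered so far = current migration unit * blocks per unit.
    return 100.0 * current_unit * blocks_per_unit / array_size_blocks


# Made-up example: unit 50, 2048 blocks per unit, 204800-block array.
print(checkpoint_percent(50, 2048, 204800))
```

As noted above, the result should only be expected to be close to the figure in /proc/mdstat, not identical to it.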
.SH NOTE
.I mdadm
was previously known as
For further information on mdadm usage, MD and the various levels of
RAID, see:
.IP
-.B http://raid.wiki.kernel.org/
+.B https://raid.wiki.kernel.org/
.PP
(based upon Jakob \(/Ostergaard's Software\-RAID.HOWTO)
.PP
.I mdadm
should always be available from
.IP
-.B http://www.kernel.org/pub/linux/utils/raid/mdadm/
+.B https://www.kernel.org/pub/linux/utils/raid/mdadm/
.PP
Related man pages:
.PP