.\" See file COPYING in distribution for details.
.TH MDMON 8 "" v3.1.1
.SH NAME
mdmon \- monitor MD external metadata arrays

.SH SYNOPSIS

.BI mdmon " [--all] [--takeover] CONTAINER"

.SH OVERVIEW
Linux kernel 2.6.27 introduced support for external metadata arrays.
External metadata implies that user space handles all updates to the metadata.
The kernel's responsibility is to notify user space when a "metadata event"
occurs, like disk failures and clean-to-dirty transitions.  The kernel, in
important cases, waits for user space to take action on these notifications.

.SH DESCRIPTION
.SS Metadata updates:
To service metadata update requests, a daemon,
.IR mdmon ,
is introduced.
.I Mdmon
is tasked with polling the sysfs namespace looking for changes in
.BR array_state , 
.BR sync_action ,
and per-disk
.BR state
attributes.  When a change is detected, it calls a per-metadata-type
handler to make modifications to the metadata.  The following actions
are taken:
.RS
.TP
.B array_state \- inactive
Clear the dirty bit for the volume and let the array be stopped.
.TP
.B array_state \- write pending
Set the dirty bit for the array and then set
.B array_state
to
.BR active .
Writes
are blocked until user space writes
.BR active .
.TP
.B array_state \- active-idle
The safe mode timer has expired, so set the array state to clean to
block writes to the array.
.TP
.B array_state \- clean
Clear the dirty bit for the volume.
.TP
.B array_state \- read-only
This is the initial state that all arrays start at.
.I mdmon
takes one of three actions:
.RS
.TP
1/
Transition the array to read-auto keeping the dirty bit clear if the metadata
handler determines that the array does not need resyncing or other modification
.TP
2/
Transition the array to active if the metadata handler determines a resync or
some other manipulation is necessary
.TP
3/
Leave the array read\-only if the volume is marked to not be monitored; for
example, the metadata version has been set to "external:\-dev/md127" instead of
"external:/dev/md127"
.RE
.TP
.B sync_action \- resync\-to\-idle
Notify the metadata handler that a resync may have completed.  If a resync
process is idled before it completes, this event allows the metadata handler to
checkpoint the resync.
.TP
.B sync_action \- recover\-to\-idle
A spare may have completed rebuilding, so tell the metadata handler about the
state of each disk.  This is the metadata handler's opportunity to clear
any "out-of-sync" bits and clear the volume's degraded status.  If a recovery
process is idled before it completes, this event allows the metadata handler to
checkpoint the recovery.
.TP
.B <disk>/state \- faulty
A disk failure kicks off a series of events.  First, notify the metadata
handler that a disk has failed, and then notify the kernel that it can unblock
writes that were dependent on this disk.  After unblocking the kernel, this disk
is set to be removed (see the note below) from the member array.  Finally, the
disk is marked failed in all other member arrays in the container.
.IP
Note: this behavior differs slightly from native MD arrays, where
removal is reserved for a
.B mdadm \-\-remove
event.  In the external metadata case the container holds the final
reference on a block device and a
.B mdadm \-\-remove <container> <victim>
call is still required.
.RE
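.PP
The event names above can be read as differences between two successive
samples of the monitored attributes.  The following Python sketch
illustrates that reading only; it is not how
.I mdmon
is implemented.  The snapshot layout, the function name detect_events,
and the assumption that a per-disk state is a comma-separated list of
flags are illustrative choices, not taken from the sources.
.nf
```python
def detect_events(prev, curr):
    """Name the events implied by two successive attribute snapshots.

    Each snapshot is a dict with a hypothetical layout:
      {"array_state": str, "sync_action": str, "disks": {name: state}}
    """
    events = []

    # Any change of array_state is reported under its new value.
    if curr["array_state"] != prev["array_state"]:
        events.append("array_state - " + curr["array_state"])

    # A resync or recovery that has gone idle may have completed or may
    # have been interrupted; either way the handler can checkpoint.
    if curr["sync_action"] == "idle" and prev["sync_action"] in ("resync", "recover"):
        events.append("sync_action - " + prev["sync_action"] + "-to-idle")

    # A disk is reported once, when the faulty flag first appears
    # (assumes the state is a comma-separated list of flags).
    for name, state in sorted(curr["disks"].items()):
        earlier = prev["disks"].get(name, "").split(",")
        if "faulty" in state.split(",") and "faulty" not in earlier:
            events.append(name + "/state - faulty")

    return events


# Example snapshots; the device names and flag values are made up.
before = {"array_state": "clean", "sync_action": "recover",
          "disks": {"dev-sda": "in_sync", "dev-sdb": "in_sync"}}
after = {"array_state": "clean", "sync_action": "idle",
         "disks": {"dev-sda": "in_sync", "dev-sdb": "faulty,blocked"}}
print(detect_events(before, after))
```
.fi
.PP
Comparing snapshots rather than looking at single values is what makes
transition events such as recover\-to\-idle expressible at all.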

.SS Containers:
.P
External metadata formats, like DDF, differ from the native MD metadata
formats in that they define a set of disks and a series of sub-arrays
within those disks.  MD metadata, in comparison, defines a 1:1
relationship between a set of block devices and a RAID array.  For
example, to create two arrays at different RAID levels on a single
set of disks, MD metadata requires the disks to be partitioned and then
each array to be created from a subset of those partitions.  The
supported external formats perform this disk carving internally.
.P
Container devices simply hold references to all member disks and allow
tools like
.I mdmon
to determine which active arrays belong to which
container.  Some array management commands, like disk removal and disk
addition, are now only valid at the container level.  Attempts to perform
these actions on member arrays are blocked with error messages like:
.IP
"mdadm: Cannot remove disks from a \(aqmember\(aq array, perform this
operation on the parent container"
.P
Containers are identified in /proc/mdstat with a metadata version string
"external:<metadata name>". Member devices are identified by
"external:/<container device>/<member index>", or "external:-<container
device>/<member index>" if the array is to remain read\-only.
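.P
The three forms of the version string can be told apart by what follows
the "external:" prefix.  The Python sketch below shows one way a script
might classify them.  The function name parse_metadata_version, the
returned dictionary keys, and the treatment of any other string as a
native MD version are assumptions made for illustration; "ddf" and
"md127" are example values only.
.nf
```python
def parse_metadata_version(text):
    """Classify a metadata version string as container, member or native."""
    prefix = "external:"
    if not text.startswith(prefix):
        # Anything else is treated here as a native MD version string.
        return {"kind": "native", "version": text}

    rest = text[len(prefix):]
    if rest[:1] in ("/", "-"):
        # Member array: a marker character, then container and index.
        # A leading "-" marks an array that is to remain read-only.
        container, _, index = rest[1:].rpartition("/")
        return {
            "kind": "member",
            "container": container,
            "index": index,
            "monitored": rest[0] == "/",
        }

    # Container: the remainder is the metadata name.
    return {"kind": "container", "metadata": rest}


for example in ("external:ddf", "external:/md127/0", "external:-md127/1"):
    print(example, parse_metadata_version(example))
```
.fi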

.SH OPTIONS
.TP
CONTAINER
The
.B container
device to monitor.  It can be a full path like /dev/md/container, or a
simple md device name like md127.
.TP
.B \-\-takeover
This instructs
.I mdmon
to replace any active
.I mdmon
which is currently monitoring the array.  This is primarily used late
in the boot process to replace any
.I mdmon
which was started from an
.B initramfs
before the root filesystem was mounted.  This avoids holding a
reference on that
.B initramfs
indefinitely and ensures that the
.I pid
and
.I sock
files used to communicate with
.I mdmon
are in a standard place.
.TP
.B \-\-all
This tells mdmon to find any active containers and start monitoring
each of them if appropriate.  This is normally used with
.B \-\-takeover
late in the boot sequence.

.PP
Note that
.I mdmon
is automatically started by
.I mdadm
when needed and so does not need to be considered when working with
RAID arrays.  The only time it is run other than by
.I  mdadm
is when the boot scripts need to restart it after mounting the new
root filesystem.

.SH EXAMPLES

.B "  mdmon \-\-all \-\-takeover"
.br
Any
.I mdmon
which is currently running is killed and a new instance is started.
This should be run late in the boot sequence and particularly after
.B /var
is mounted and writable.
.SH SEE ALSO
.IR mdadm (8),
.IR md (4).