mdmon.8

.\" See file COPYING in distribution for details.
.TH MDMON 8 "" v3.1.1
.SH NAME
mdmon \- monitor MD external metadata arrays

.SH SYNOPSIS

.BI mdmon " [--all] [--takeover] CONTAINER"

.SH OVERVIEW
The 2.6.27 kernel brings the ability to support external metadata arrays.
External metadata implies that user space handles all updates to the metadata.
The kernel's responsibility is to notify user space when a "metadata event"
occurs, such as a disk failure or a clean-to-dirty transition.  The kernel, in
important cases, waits for user space to take action on these notifications.

.SH DESCRIPTION
.SS Metadata updates:
To service metadata update requests a daemon,
.IR mdmon ,
is introduced.
.I Mdmon
is tasked with polling the sysfs namespace looking for changes in
.BR array_state ,
.BR sync_action ,
and per-disk
.B state
attributes.  When a change is detected it calls a per-metadata-type
handler to make modifications to the metadata.  The following actions
are taken:
.RS
.TP
.B array_state \- inactive
Clear the dirty bit for the volume and let the array be stopped.
.TP
.B array_state \- write pending
Set the dirty bit for the array and then set
.B array_state
to
.BR active .
Writes
are blocked until user space writes
.BR active .
.TP
.B array_state \- active-idle
The safe mode timer has expired, so set the array state to clean to block writes to the array.
.TP
.B array_state \- clean
Clear the dirty bit for the volume.
.TP
.B array_state \- read-only
This is the initial state that all arrays start at.
.I mdmon
takes one of three actions:
.RS
.TP
1/
Transition the array to read-auto, keeping the dirty bit clear, if the metadata
handler determines that the array does not need resyncing or other modification.
.TP
2/
Transition the array to active if the metadata handler determines a resync or
some other manipulation is necessary.
.TP
3/
Leave the array read\-only if the volume is marked to not be monitored; for
example, the metadata version has been set to "external:\-dev/md127" instead of
"external:/dev/md127".
.RE
.TP
.B sync_action \- resync\-to\-idle
Notify the metadata handler that a resync may have completed.  If a resync
process is idled before it completes, this event allows the metadata handler to
checkpoint the resync.
.TP
.B sync_action \- recover\-to\-idle
A spare may have completed rebuilding, so tell the metadata handler about the
state of each disk.  This is the metadata handler's opportunity to clear
any "out-of-sync" bits and clear the volume's degraded status.  If a recovery
process is idled before it completes, this event allows the metadata handler to
checkpoint the recovery.
.TP
.B <disk>/state \- faulty
A disk failure kicks off a series of events.  First, notify the metadata
handler that a disk has failed, and then notify the kernel that it can unblock
writes that were dependent on this disk.  After unblocking the kernel, this disk
is set to be removed from the member array (see the note below).  Finally, the
disk is marked failed in all other member arrays in the container.
.IP
Note: this behavior differs slightly from native MD arrays, where
removal is reserved for a
.B mdadm --remove
event.  In the external metadata case the container holds the final
reference on a block device and a
.B mdadm --remove <container> <victim>
call is still required.
.RE
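.PP
The attributes listed above can also be inspected by hand, which helps when
checking what
.I mdmon
would react to.  The following is only a minimal shell sketch: it assumes
the usual md sysfs layout, in which each array exposes its attributes under
/sys/block/<md device>/md, and the function name
.B show_md_state
is hypothetical.  It only reads the attributes and changes nothing.
.PP
.nf
```shell
# Print the attributes that mdmon watches for one array.
# $1 is the array's md directory, e.g. /sys/block/md126/md
# (assumed layout; md126 is a placeholder device name).
show_md_state() {
    md_dir=$1
    for attr in array_state sync_action; do
        if [ -r "$md_dir/$attr" ]; then
            echo "$attr: $(cat "$md_dir/$attr")"
        fi
    done
    # Each member disk has its own dev-<name>/state attribute.
    for state_file in "$md_dir"/dev-*/state; do
        [ -r "$state_file" ] || continue
        disk=${state_file%/state}
        echo "${disk##*/}/state: $(cat "$state_file")"
    done
}
```
.fi
.PP
Calling "show_md_state /sys/block/md126/md" would print one line per
attribute found for that array.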

.SS Containers:
.P
External metadata formats, like DDF, differ from the native MD metadata
formats in that they define a set of disks and a series of sub-arrays
within those disks.  MD metadata, in comparison, defines a 1:1
relationship between a set of block devices and a RAID array.  For
example, to create 2 arrays at different RAID levels on a single
set of disks, MD metadata requires the disks be partitioned and then
each array can be created with a subset of those partitions.  The
supported external formats perform this disk carving internally.
.P
Container devices simply hold references to all member disks and allow
tools like
.I mdmon
to determine which active arrays belong to which
container.  Some array management commands like disk removal and disk
add are now only valid at the container level.  Attempts to perform
these actions on member arrays are blocked with error messages like:
.IP
"mdadm: Cannot remove disks from a \'member\' array, perform this
operation on the parent container"
.P
Containers are identified in /proc/mdstat with a metadata version string
"external:<metadata name>". Member devices are identified by
"external:/<container device>/<member index>", or "external:-<container
device>/<member index>" if the array is to remain read-only.
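.PP
The naming rules above amount to a prefix match on the metadata version
string.  As a small illustration, the shell sketch below classifies such a
string; the function name
.B classify_metadata_version
and the labels it prints are hypothetical, and any string without the
"external:" prefix is simply assumed to be a native format.
.PP
.nf
```shell
# Classify a metadata version string by its prefix.
# Prints one of: member, member-readonly, container, native
# (these labels are illustrative, not used by mdadm or mdmon).
classify_metadata_version() {
    case $1 in
        external:/*) echo member ;;
        external:-*) echo member-readonly ;;
        external:*)  echo container ;;
        *)           echo native ;;
    esac
}
```
.fi
.PP
The order of the patterns matters: the two member forms are tested before
the general "external:" prefix, which would otherwise match them as well.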

.SH OPTIONS
.TP
CONTAINER
The
.B container
device to monitor.  It can be a full path like /dev/md/container, or a
simple md device name like md127.
.TP
.B \-\-takeover
This instructs
.I mdmon
to replace any active
.I mdmon
which is currently monitoring the array.  This is primarily used late
in the boot process to replace any
.I mdmon
which was started from an
.B initramfs
before the root filesystem was mounted.  This avoids holding a
reference on that
.B initramfs
indefinitely and ensures that the
.I pid
and
.I sock
files used to communicate with
.I mdmon
are in a standard place.
.TP
.B \-\-all
This tells
.I mdmon
to find any active containers and start monitoring
each of them if appropriate.  This is normally used with
.B \-\-takeover
late in the boot sequence.
A separate
.I mdmon
process is started for each container as the
.B \-\-all
argument is overwritten with the name of the container.  To allow for
containers with names longer than 5 characters, this argument can be
arbitrarily extended, e.g. to
.BR \-\-all-active-arrays .

.PP
Note that
.I mdmon
is automatically started by
.I mdadm
when needed and so does not need to be considered when working with
RAID arrays.  The only time it is run other than by
.I mdadm
is when the boot scripts need to restart it after mounting the new
root filesystem.

.SH START UP AND SHUTDOWN

As
.I mdmon
needs to be running whenever any filesystem on the monitored device is
mounted, there are special considerations when the root filesystem is
mounted from an
.I mdmon
monitored device.

When the array is assembled by the
.B initramfs
code,
.I mdadm
will automatically start
.I mdmon
as required.  This means that
.I mdmon
must be installed on the
.B initramfs
and there must be a writable filesystem (typically tmpfs) on which
.I mdmon
can create a
.B .pid
and
.B .sock
file.  The particular filesystem to use is given to
.I mdmon
at compile time and defaults to
.BR /lib/init/rw .

This filesystem must persist through to the end of the boot sequence.

After the final root filesystem has been instantiated (usually with
.BR pivot_root )
and after
.B /var
is mounted writable,
.I mdmon
should be run with
.I "\-\-all \-\-takeover"
so that the
.I mdmon
running from the
.B initramfs
can be replaced with one running in the main root.

At shutdown time,
.I mdmon
should not be killed along with other processes.  Also, as it holds a
file (socket actually) open in
.B /var
it will not be possible to unmount
.B /var
if it is a separate filesystem.  Rather the
.B /var
filesystem, like the root filesystem, should be remounted read-only.

.SH EXAMPLES

.B "  mdmon \-\-all-active-arrays \-\-takeover"
.br
Any
.I mdmon
which is currently running is killed and a new instance is started.
This should be run late in the boot sequence and particularly after
.B /var
is mounted and writable.
.SH SEE ALSO
.IR mdadm (8),
.IR md (4).
 248 .IR mdadm (8),
 249 .IR md (4).