NeilBrown [Tue, 4 Nov 2008 09:51:12 +0000 (20:51 +1100)]
Create: Don't optimise resync as recovery when creating raid5 in a container.
As spares are treated quite differently in containers, we cannot
fake-up a spare to optimise initialisation for a raid5 in a container,
so disable that code for ->external arrays.
NeilBrown [Tue, 4 Nov 2008 09:51:12 +0000 (20:51 +1100)]
mdopen: use small sequence number for uniquifying array names.
Rather than appending the md minor number, we now append a small
sequence number to make sure names in /dev/md/ that aren't LOCAL are
unique. As the map file is locked while we do this, we are sure
of not losing any races.
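A minimal sketch of the idea, not the actual mdopen code: the helper name pick_unique_name, the "_N" suffix format, the limit of 1000 attempts, and the callback used to test whether a name is taken are all assumptions made for illustration. The caller is assumed to hold the map file lock for the whole search.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical callback: returns non-zero if 'name' is already in use. */
typedef int (*name_taken_fn)(const char *name, void *ctx);

/*
 * Write into 'out' the first of base_0, base_1, ... that is not in use.
 * Returns 0 on success, -1 if no free name was found or it did not fit.
 * The caller is assumed to hold the map-file lock, so no other process
 * can claim the chosen name in between.
 */
int pick_unique_name(const char *base, name_taken_fn taken, void *ctx,
                     char *out, size_t outlen)
{
    int seq;

    for (seq = 0; seq < 1000; seq++) {
        int n = snprintf(out, outlen, "%s_%d", base, seq);

        if (n < 0 || (size_t)n >= outlen)
            return -1;
        if (!taken(out, ctx))
            return 0;
    }
    return -1;
}

/* Example checker over a NULL-terminated array of names (illustrative). */
int taken_in_list(const char *name, void *ctx)
{
    const char **list = ctx;

    for (; *list; list++)
        if (strcmp(*list, name) == 0)
            return 1;
    return 0;
}
```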
NeilBrown [Tue, 4 Nov 2008 09:51:12 +0000 (20:51 +1100)]
Assemble: allow members of containers to be assembled and auto-assembled.
Try to treat members of containers much like other arrays for
assembly.
We still look through the list of devices for a match (it will be
the container), then find the relevant 'info' and try to assemble
the array.
Dan Williams [Tue, 4 Nov 2008 09:51:12 +0000 (20:51 +1100)]
Assemble: block attempts to reassemble container members
Attempting to open(O_EXCL) each candidate device usually filters out all
busy raid components. However, containers do not behave like components
and will return container_content that may describe active member
arrays.
This patch just adds a function that will be used to check if a
container member is busy. It will be used shortly.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Dan Williams [Tue, 4 Nov 2008 09:51:11 +0000 (20:51 +1100)]
Assemble: factor out assemble_container_content
Factor out, from Incremental_container, the code for assembling an
array based on information extracted from a container. We will
shortly use this from Assemble too.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 4 Nov 2008 09:50:39 +0000 (20:50 +1100)]
Manage: when stopping an array, delete all names from /dev.
This only applies if udev isn't installed or is disabled
by MDADM_NO_UDEV.
We try to remove partitions too.
We find names to remove by looking in /var/run/mdadm/map.
NeilBrown [Tue, 4 Nov 2008 09:50:39 +0000 (20:50 +1100)]
Generate 'change' uevents when arrays change in non-obvious ways.
When a 'container' gets started, we need udev to notice, but the
kernel has no way of knowing that a KOBJ_CHANGE event is needed. So
send one directly via the 'uevent' sysfs attribute.
Also, uevents don't get generated when md arrays are stopped (prior to
2.6.28) so send 'change' events then too.
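As a rough sketch of sending such an event by hand (the function name send_change_uevent and the example path are illustrative assumptions, not mdadm's own code), writing the string "change" to a device's 'uevent' sysfs attribute asks the kernel to emit the event:

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Ask the kernel to emit a 'change' uevent by writing "change" to a
 * sysfs 'uevent' attribute, e.g. /sys/block/md0/uevent (example path).
 * Returns 0 on success, -1 on failure.
 */
int send_change_uevent(const char *uevent_path)
{
    static const char msg[] = "change";
    size_t len = strlen(msg);
    ssize_t n;
    int fd = open(uevent_path, O_WRONLY);

    if (fd < 0)
        return -1;
    n = write(fd, msg, len);
    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}
```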
NeilBrown [Tue, 4 Nov 2008 09:50:39 +0000 (20:50 +1100)]
detail: --export also provides MD_DEVNAME
MD_NAME is the name of the array extracted directly from the metadata.
MD_DEVNAME is the current working name of the array. It should appear
in /dev/md. It is possibly what the user gave when creating the
array.
We extract it from /var/run/mdadm/map.
NeilBrown [Tue, 4 Nov 2008 09:50:38 +0000 (20:50 +1100)]
Always update mdadm/map when starting an array.
We previously only updated /var/run/mdadm/map when starting an
array with --incremental. However we now make more use of
that file (to pass the dev name to udev) so always update it.
NeilBrown [Tue, 4 Nov 2008 09:50:38 +0000 (20:50 +1100)]
config: Support container=uuid as alternative to container=/dev/name in mdadm.conf
When mdadm.conf is automatically generated, we might not know a
suitable /dev/name. But we do know the uuid of the container.
So allow that as an option.
NeilBrown [Tue, 4 Nov 2008 09:50:38 +0000 (20:50 +1100)]
config: Don't require an array to have a device name.
i.e. in mdadm.conf you can have a line like
ARRAY uuid=whatever
and it will use auto-name-generation to give a name to the array at
assemble-time. This is different from blind auto-assembly in that the
array will be treated as 'local'.
NeilBrown [Mon, 3 Nov 2008 23:35:42 +0000 (10:35 +1100)]
assemble: combine the two create_mddev calls into one.
This delays the create_mddev call even further in the case where
an array device name is given for --assemble. It is now delayed
until the 'name' of the array is also available.
NeilBrown [Mon, 3 Nov 2008 23:35:37 +0000 (10:35 +1100)]
Delay creation of array devices for assemble/build/create
We will shortly be feeding more information into the process of
creating array devices, so delay the creation. Still open them
early if the device already exists.
This involves making sure the autof flag is in the right place
so that it can be found at creation time.
Also, Assemble, Build, and Create now always close 'mdfd'.
NeilBrown [Mon, 3 Nov 2008 23:35:35 +0000 (10:35 +1100)]
Avoid opening md device twice in particular '--assemble' instance.
When
mdadm --assemble /dev/whatever
is given, mdadm will treat it as though '--scan' were given, even
though it wasn't.
In this case, the code opens /dev/whatever twice, which is pointless.
We already know /dev/whatever is open at this point, so remove the
'open' and the tests, and make sure it is always closed afterwards.
NeilBrown [Mon, 3 Nov 2008 23:35:08 +0000 (10:35 +1100)]
Move recently merged /sys/dev/ lookup into stat2devnum.
Both sysfs_init and stat2devnum try to convert stat information
into an md devnum. Combine the logic of both pieces of code
into stat2devnum and have sysfs_init call that.
NeilBrown [Sun, 2 Nov 2008 20:19:37 +0000 (07:19 +1100)]
mapfile: fix bug in testing for /var/run/mdadm/
If /var/run/mdadm/ did not exist as a directory, the map file
should have been created in /var/run/mdadm.map, but due to a bug
it would never get created.
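The intended fallback can be sketched as follows. This is an illustration under assumptions, not the mapfile.c code: the helper name choose_map_path and its signature are hypothetical, and the directory is passed in so the logic can be tried anywhere.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Choose the map file location: <dir>/map if <dir> exists and is a
 * directory, otherwise <dir>.map. For dir = "/var/run/mdadm" this gives
 * /var/run/mdadm/map or /var/run/mdadm.map respectively.
 * Returns 0 on success, -1 if the result does not fit in 'out'.
 */
int choose_map_path(const char *dir, char *out, size_t outlen)
{
    struct stat st;
    int n;

    if (stat(dir, &st) == 0 && S_ISDIR(st.st_mode))
        n = snprintf(out, outlen, "%s/map", dir);
    else
        n = snprintf(out, outlen, "%s.map", dir);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```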
NeilBrown [Sun, 2 Nov 2008 19:39:02 +0000 (06:39 +1100)]
Incremental: change precedence order for autof setting.
It doesn't really make sense for the --auto setting to ever over-ride
the setting on an ARRAY line. That could cause failure if the
ARRAY line has a 'standard' name. So revert to the array line having
precedence over command line, then CREATE line last.
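The resulting precedence can be written down as a small sketch. The function name pick_autof and the convention that 0 means "not set" are assumptions for illustration only; the real code may represent unset values differently.

```c
/*
 * Pick the effective 'auto' setting. Each argument is 0 when that
 * source did not specify anything (an assumption of this sketch).
 * Order: ARRAY line first, then the command line, then the CREATE line.
 */
int pick_autof(int array_line, int command_line, int create_line)
{
    if (array_line)
        return array_line;
    if (command_line)
        return command_line;
    return create_line;
}
```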
NeilBrown [Thu, 30 Oct 2008 05:37:29 +0000 (16:37 +1100)]
Adjust major number testing to allow for extended minor number in 2.6.28
From 2.6.28, normal md devices will be able to have partitions. These
partitions will have a different major number. Sometimes mdadm tests
the major number and so can get confused.
Change these tests to test against get_mdp_major(). mdp does not use
extended minor numbers and so this test will always be accurate.
Also use /sys/dev links to map major/minor to devnum in sysfs.
NeilBrown [Wed, 29 Oct 2008 22:48:18 +0000 (09:48 +1100)]
Incremental: allow assembly of foreign array.
If a foreign (i.e. not known to be local) array is discovered
by --incremental assembly, we now assemble it. However we ignore
any name information in the array so as not to potentially create
a name that conflicts with a 'local' array.
Also, foreign arrays are always assembled 'read-auto' to avoid writing
anything until the array is actually used.
NeilBrown [Wed, 29 Oct 2008 22:34:04 +0000 (09:34 +1100)]
Fix --incremental assembly of partitioned arrays.
If incremental assembly finds an array mentioned in mdadm.conf,
with a 'standard partitioned' name like /dev/md_d0 or /dev/md/d0,
it will not create a partitioned array like it should.
This is because it mishandled the 'devnum' returned by
is_standard.
That is a devnum that does not have the partition-or-not encoded
into it. So we need to check the actual return value of
is_standard and encode the partition-or-not info into the devnum.
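One way to picture the fix is the sketch below. It assumes, purely for illustration, that the is_standard result is positive for a standard non-partitioned name, negative for a standard partitioned name, and zero otherwise, and that partitioned arrays are encoded as negative devnums (-1 - num). The helper names are hypothetical.

```c
/*
 * 'std' mimics the is_standard() result as assumed above:
 *   > 0  standard non-partitioned name (e.g. /dev/md0)
 *   < 0  standard partitioned name     (e.g. /dev/md_d0)
 * 'num' is the raw number parsed from the name, without any
 * partition-or-not information in it.
 */
int encode_devnum(int std, int num)
{
    if (std < 0)
        return -1 - num;  /* partitioned: 0 -> -1, 1 -> -2, ... */
    return num;           /* non-partitioned: unchanged */
}

/* Non-zero if the encoded devnum refers to a partitioned array. */
int devnum_is_partitioned(int devnum)
{
    return devnum < 0;
}

/* Recover the raw number from an encoded devnum. */
int devnum_number(int devnum)
{
    return devnum < 0 ? -1 - devnum : devnum;
}
```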
Doug Ledford [Wed, 29 Oct 2008 19:05:36 +0000 (15:05 -0400)]
Fix NULL pointer oops
RAID10 is the only raid level that uses the avail char array pointer
during the enough() operation, so it was the only one that saw this.
The code in incremental assumes unconditionally that count_active will
allocate the avail char array, that it might be used by enough, and that
it will need to be freed afterward. Once you make count_active actually
do that, then the oops goes away.
Doug Ledford [Wed, 29 Oct 2008 19:05:35 +0000 (15:05 -0400)]
Fix bad metadata formatting
Certain operations (Detail.c mainly) would print out the metadata of
an array in a format that the scan operation in super0.c and super1.c
would later reject as unknown when it was found in the mdadm.conf file.
Use a consistent format, but also modify the super0 and super1 match
methods to accept the other format without complaint.
NeilBrown [Sat, 25 Oct 2008 07:20:49 +0000 (18:20 +1100)]
Allow WRITEMOSTLY to be cleared on --readd using --readwrite.
Previously it was possible to set the WRITEMOSTLY flag when
adding a device to an array, but not to clear the flag when re-adding.
This is now possible with --readwrite.
NeilBrown [Fri, 17 Oct 2008 00:52:38 +0000 (11:52 +1100)]
Remove .UR .UE macros from man page because they don't do what we want.
.UR URL
text
.UE
is meant to create a hyperlink from the 'text' to the 'URL'.
But I wanted just to have the URL, so UR isn't really the right
tool - the URL gets displayed twice.
So just display the URL in bold and assume man2html etc can recognise
it and do the right thing.
Dan Williams [Fri, 3 Oct 2008 05:26:00 +0000 (22:26 -0700)]
mdmon: suicide prevention
mdmon cannot remove the pidfile at shutdown because it needs to stay
running across the "mount -o remount,ro /" event. When it relaunches
after a reboot there is a good chance that the pid will match what was
there previously. The result is that the "take over for unresponsive
mdmon" logic results in self termination.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
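The guard can be sketched as below. This is an illustration rather than the mdmon code: should_kill_old_monitor is a hypothetical helper, and it only covers the check that a stale pidfile entry matching our own pid must not be treated as an unresponsive monitor.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Decide whether the pid read from a stale pidfile names another
 * process that the "take over" logic may signal.
 * Returns 1 if it may be signalled, 0 if it must be left alone
 * (invalid pid, or the pid is our own).
 */
int should_kill_old_monitor(pid_t pidfile_pid, pid_t self)
{
    if (pidfile_pid <= 0)
        return 0;   /* nothing usable in the pidfile */
    if (pidfile_pid == self)
        return 0;   /* stale entry matches us: do not self-terminate */
    return 1;
}
```

A caller would pass getpid() as 'self' before sending any signal.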
Dan Williams [Thu, 2 Oct 2008 22:50:23 +0000 (15:50 -0700)]
mdmon: --switch-root
For raid rootfs we cannot run the array unmonitored for any length of
time. At least XFS will not mount/replay the journal if the underlying
block device is readonly (FIXME it also seems that XFS does not always
honor the ro status of the backing device as I was able to hit the
BUG_ON(mddev->ro == 1) in md_write_start... but I digress).
So we need to start mdmon in the initramfs before '/' is mounted and
then restart it after the real rootfs is available. Upon seeing the
--switch-root option, mdmon will kill any victims in the current
/var/run/mdadm directory and then chroot(2) before continuing.
The option is deliberately called 'switch-root' instead of 'chroot' to
hopefully indicate that this is different than doing "chroot mdmon
/dev/imsm".
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Thu, 2 Oct 2008 22:42:57 +0000 (15:42 -0700)]
mdmon: wait after trying to kill
Now that mdmon handles SIGTERM, if another monitor wants to take over it
should wait until all managed arrays are clean. So make WaitClean()
available to mdmon and teach try_kill_monitor() to wait on each subarray
in the container.
...since we may be communicating with a dying process, we need to
block SIGPIPE earlier.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Thu, 2 Oct 2008 13:32:08 +0000 (06:32 -0700)]
mdmon: terminate clean
We generally don't want mdmon to be terminated, but if a SIGTERM gets
through try to leave the monitored arrays in a clean state, block
attempts to mark the array dirty, and stop servicing the socket.
When we are killed by SIGTERM, don't remove the pidfile; let that be
cleaned up by the next monitor.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Thu, 2 Oct 2008 01:50:44 +0000 (18:50 -0700)]
Treat all devices at the container level as spares
Raid disk and disk number information is not relevant at the container
level, especially for imsm. So arrange for getinfo_super_imsm() to
always publish devices as spares and report the number of spares at
Assemble() time.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Thu, 2 Oct 2008 01:50:43 +0000 (18:50 -0700)]
mdmon: periodically retry to create the socket
If initial socket creation fails with EROFS, set a periodic alarm to wake up
the manager and retry. Include a kernel patch that will wake us up if
the mount flags are changed.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Sun, 28 Sep 2008 19:12:07 +0000 (12:12 -0700)]
trivial warn_unused_result squashing
Made the mistake of recompiling the F9 mdadm rpm which has a patch to
remove -Werror and add "-Wp,-D_FORTIFY_SOURCE -O2" which turns on lots
of errors:
config.c:568: warning: ignoring return value of asprintf
Assemble.c:411: warning: ignoring return value of asprintf
Assemble.c:413: warning: ignoring return value of asprintf
super0.c:549: warning: ignoring return value of posix_memalign
super0.c:742: warning: ignoring return value of posix_memalign
super0.c:812: warning: ignoring return value of posix_memalign
super1.c:692: warning: ignoring return value of posix_memalign
super1.c:1039: warning: ignoring return value of posix_memalign
super1.c:1155: warning: ignoring return value of posix_memalign
super-ddf.c:508: warning: ignoring return value of posix_memalign
super-ddf.c:645: warning: ignoring return value of posix_memalign
super-ddf.c:696: warning: ignoring return value of posix_memalign
super-ddf.c:715: warning: ignoring return value of posix_memalign
super-ddf.c:1476: warning: ignoring return value of posix_memalign
super-ddf.c:1603: warning: ignoring return value of posix_memalign
super-ddf.c:1614: warning: ignoring return value of posix_memalign
super-ddf.c:1842: warning: ignoring return value of posix_memalign
super-ddf.c:2013: warning: ignoring return value of posix_memalign
super-ddf.c:2140: warning: ignoring return value of write
super-ddf.c:2143: warning: ignoring return value of write
super-ddf.c:2147: warning: ignoring return value of write
super-ddf.c:2150: warning: ignoring return value of write
super-ddf.c:2162: warning: ignoring return value of write
super-ddf.c:2169: warning: ignoring return value of write
super-ddf.c:2172: warning: ignoring return value of write
super-ddf.c:2176: warning: ignoring return value of write
super-ddf.c:2181: warning: ignoring return value of write
super-ddf.c:2686: warning: ignoring return value of posix_memalign
super-ddf.c:2690: warning: ignoring return value of write
super-ddf.c:3070: warning: ignoring return value of posix_memalign
super-ddf.c:3254: warning: ignoring return value of posix_memalign
bitmap.c:128: warning: ignoring return value of posix_memalign
mdmon.c:94: warning: ignoring return value of write
mdmon.c:221: warning: ignoring return value of pipe
mdmon.c:327: warning: ignoring return value of write
mdmon.c:330: warning: ignoring return value of chdir
mdmon.c:335: warning: ignoring return value of dup
monitor.c:415: warning: rv may be used uninitialized in this function
...some of these like the write() ones are not so trivial so save those
fixes for the next patch.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Sun, 28 Sep 2008 19:12:07 +0000 (12:12 -0700)]
imsm: determine failed indexes from the most up-to-date disk
load_imsm_disk() currently notices if spares missed their activation
update, but we allow a stale failed disk back in to the array because its
serial number is clobbered in the most up-to-date disk.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Sun, 28 Sep 2008 19:12:07 +0000 (12:12 -0700)]
imsm: manage a list of missing disks
If a drive is removed while mdmon is not running we need a way to
identify what is missing and mark that disk as failed in the metadata.
At ->load_super() time create a list of missing disks defined as a disk
that is marked in-sync yet does not appear in super->disks.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Sun, 28 Sep 2008 19:12:06 +0000 (12:12 -0700)]
imsm: enable checkpointing of migration (resync/rebuild)
When the array is shut down, or when mdadm --wait-clean is called, any
active resync process will be idled allowing mdmon to record the current
resync position.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Sun, 28 Sep 2008 19:12:06 +0000 (12:12 -0700)]
Extend --wait-clean to checkpoint resync
Root file systems backed by external metadata arrays need to be
explicitly checkpointed near the time the rootfs is marked readonly as
userspace will not have an opportunity to react to the final shutdown of
the array.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Sun, 28 Sep 2008 19:12:06 +0000 (12:12 -0700)]
--wait-clean: shorten timeout
Set the safemode timeout to a small value to get the array marked clean as
soon as possible. We don't write 'clean' directly as it may cause mdmon to
miss a 'write-pending' event.
Include a couple fixes to sysfs_set_safemode():
1/ 0 pad the milliseconds field
2/ workaround input truncation in the kernel
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
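To illustrate why the zero padding in fix 1/ matters: without it, a delay of 50 ms would be written as "0.50" and read back as half a second. The sketch below assumes a "seconds.milliseconds" text format; format_safemode_delay is a hypothetical helper, not the actual sysfs_set_safemode(), and it does not attempt the kernel truncation workaround from fix 2/.

```c
#include <stdio.h>

/*
 * Format a delay given in milliseconds as "seconds.milliseconds",
 * zero-padding the fractional part to three digits.
 * Returns 0 on success, -1 if the result does not fit in 'out'.
 */
int format_safemode_delay(unsigned long ms, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%lu.%03lu", ms / 1000, ms % 1000);

    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```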
Dan Williams [Sun, 28 Sep 2008 19:12:06 +0000 (12:12 -0700)]
monitor: protect against CONFIG_LBD=n
md/resync_start reports different terminal values depending on kernel
configuration (~0UL versus ~0ULL). Make detection of the
resync-complete state more robust by comparing against array size.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
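A sketch of the more robust test, under the assumption that both values are in the same units (the helper name resync_is_complete is hypothetical): instead of matching a specific all-ones terminal value, treat any resync_start at or beyond the array size as complete. That covers both the 32-bit and the 64-bit terminal values.

```c
/*
 * Returns 1 if 'resync_start' indicates that resync has finished.
 * Any value at or past the end of the array counts as complete, so
 * the all-ones markers (~0UL or ~0ULL) both qualify.
 * Both arguments are assumed to be in the same units.
 */
int resync_is_complete(unsigned long long resync_start,
                       unsigned long long array_size)
{
    return resync_start >= array_size;
}
```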
Dan Williams [Sun, 28 Sep 2008 19:12:03 +0000 (12:12 -0700)]
imsm: trust sector reservation from metadata
On ich6r the option-rom appears to reserve only 432 sectors rather than
the 418+4096 of newer implementations. For compatibility trust the
metadata in these cases.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
NeilBrown [Wed, 15 Oct 2008 03:34:18 +0000 (14:34 +1100)]
Grow: Fix linear-growth when devices are not all the same size.
If we add a device to a linear array which is a different size
to the other devices in the array then, for v1.x metadata, we need to
make sure the size is correctly reflected in the superblock.
NeilBrown [Mon, 13 Oct 2008 05:15:16 +0000 (16:15 +1100)]
Manage: allow adding device that is just large enough to v1.x array.
When adding a device to an array, we check that it is large enough.
Currently the check makes sure there is also room for a reasonably
sized bitmap. But if the array doesn't have a bitmap, then this test
might be too restrictive.
So when adding, only insist there is enough space for the current
bitmap.
When Creating, still require room for the standard sized bitmap.
Don't try to set_array_info when -I finds new devices for an array.
When -I gets a new device for a container and tries to incrementally
assemble the container array, it calls sysfs_set_array to create the
array without first checking if it already exists. This produces
unpleasant error messages.