NeilBrown [Tue, 4 Nov 2008 09:51:12 +0000 (20:51 +1100)]
Create: Don't optimise resync as recovery when creating raid5 in a container.
As spares are treated quite differently in containers, we cannot
fake-up a spare to optimise initialisation for a raid5 in a container,
so disable that code for ->external arrays.
NeilBrown [Tue, 4 Nov 2008 09:51:12 +0000 (20:51 +1100)]
mdopen: use small sequence number for uniquifying array names.
Rather than appending the md minor number, we now append a small
sequence number to make sure names in /dev/md/ that aren't LOCAL are
unique. As the map file is locked while we do this, we are sure
of not losing any races.
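A minimal sketch of the idea in C, not the code from this commit: try the plain name first, then append a small sequence number until a free name is found. The "_N" suffix format, the function names and the demo callback are assumptions made for illustration, and the map-file locking that makes the check race-free is not shown.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical callback: returns non-zero if 'name' is already taken. */
typedef int (*name_in_use_fn)(const char *name);

/* Demo callback for illustration only: "home" and "home_0" are taken. */
static int demo_in_use(const char *name)
{
    return strcmp(name, "home") == 0 || strcmp(name, "home_0") == 0;
}

/*
 * Write into 'out' the first of "base", "base_0", "base_1", ... that is
 * not in use.  Returns 0 on success, -1 if nothing fits or is free.
 * The caller is assumed to hold whatever lock protects the name list.
 */
static int pick_unique_name(const char *base, name_in_use_fn in_use,
                            char *out, size_t outlen)
{
    int seq;
    int n = snprintf(out, outlen, "%s", base);

    if (n < 0 || (size_t)n >= outlen)
        return -1;
    if (!in_use(out))
        return 0;

    for (seq = 0; seq < 1000; seq++) {
        n = snprintf(out, outlen, "%s_%d", base, seq);
        if (n < 0 || (size_t)n >= outlen)
            return -1;
        if (!in_use(out))
            return 0;
    }
    return -1;
}
```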
NeilBrown [Tue, 4 Nov 2008 09:51:12 +0000 (20:51 +1100)]
Assemble: allow members of containers to be assembled and auto-assembled.
Try to treat members of containers much like other arrays for
assembly.
We still look through the list of devices for a match (it will be
the container), then find the relevant 'info' and try to assemble
the array.
Dan Williams [Tue, 4 Nov 2008 09:51:12 +0000 (20:51 +1100)]
Assemble: block attempts to reassemble container members
Attempting to open(O_EXCL) each candidate device usually filters out all
busy raid components. However, containers do not behave like components
and will return container_content that may describe active member
arrays.
This patch just adds a function that will be used to check if a
container member is busy. It will be used shortly.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
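For context, the open(O_EXCL) filter mentioned above can be sketched roughly as follows. On Linux, opening a block device with O_EXCL fails with EBUSY when the device is already in use. The function name and return convention are assumptions, and this is not the container-member check that the patch itself adds.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Returns 1 if 'path' is busy (open failed with EBUSY), 0 if it could be
 * opened exclusively, -1 on any other error.
 */
static int device_is_busy(const char *path)
{
    int fd = open(path, O_RDONLY | O_EXCL);

    if (fd >= 0) {
        close(fd);
        return 0;
    }
    return errno == EBUSY ? 1 : -1;
}
```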
Dan Williams [Tue, 4 Nov 2008 09:51:11 +0000 (20:51 +1100)]
Assemble: factor out assemble_container_content
Factor out, from Incremental_container, the code for assembling an
array based on information extracted from a container. We will
shortly use this from Assemble too.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 4 Nov 2008 09:50:39 +0000 (20:50 +1100)]
Manage: when stopping an array, delete all names from /dev.
This only applies if udev isn't installed or is disabled
by MDADM_NO_UDEV.
We try to remove partitions too.
We find names to remove by looking in /var/run/mdadm/map.
NeilBrown [Tue, 4 Nov 2008 09:50:39 +0000 (20:50 +1100)]
Generate 'change' uevents when arrays change in non-obvious ways.
When a 'container' gets started, we need udev to notice, but the
kernel has no way of knowing that a KOBJ_CHANGE event is needed. So
send one directly via the 'uevent' sysfs attribute.
Also, uevents don't get generated when md arrays are stopped (prior to
2.6.28) so send 'change' events then too.
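A rough sketch of sending such an event: write the string "change" to the device's 'uevent' attribute. The helper name is an assumption, and the path is passed in by the caller (on a real system it would be something like /sys/block/md0/uevent) so the sketch can be exercised without touching sysfs.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write "change" to the uevent attribute at 'path'.
 * Returns 0 on success, -1 on error.
 */
static int send_change_uevent(const char *path)
{
    static const char event[] = "change";
    ssize_t n;
    int fd = open(path, O_WRONLY);

    if (fd < 0)
        return -1;
    n = write(fd, event, strlen(event));
    close(fd);
    return n == (ssize_t)strlen(event) ? 0 : -1;
}
```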
NeilBrown [Tue, 4 Nov 2008 09:50:39 +0000 (20:50 +1100)]
detail: --export also provides MD_DEVNAME
MD_NAME is the name of the array extracted directly from the metadata.
MD_DEVNAME is the current working name of the array. It should appear
in /dev/md. It is possibly what the user gave when creating the
array.
We extract it from /var/run/mdadm/map.
NeilBrown [Tue, 4 Nov 2008 09:50:38 +0000 (20:50 +1100)]
Always update mdadm/map when starting an array.
We previously only updated /var/run/mdadm/map when starting an
array with --incremental. However we now make more use of
that file (to pass the dev name to udev) so always update it.
NeilBrown [Tue, 4 Nov 2008 09:50:38 +0000 (20:50 +1100)]
config: Support container=uuid as alternative to container=/dev/name in mdadm.conf
When mdadm.conf is automatically generated, we might not know a
suitable /dev/name. But we do know the uuid of the container.
So allow that as an option.
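One way to tell the two forms apart, shown as a sketch only: treat a value starting with '/' as a device name and anything else as a uuid. This rule, the enum and the function name are assumptions for illustration and are not taken from the commit.

```c
/* How a container= value refers to its container (hypothetical enum). */
enum container_ref { CONTAINER_BY_PATH, CONTAINER_BY_UUID };

/*
 * Classify the text after "container=".  Assumption: a leading '/'
 * means a /dev name, anything else is taken to be a uuid.
 */
static enum container_ref classify_container(const char *value)
{
    return value[0] == '/' ? CONTAINER_BY_PATH : CONTAINER_BY_UUID;
}
```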
NeilBrown [Tue, 4 Nov 2008 09:50:38 +0000 (20:50 +1100)]
config: Don't require an array to have a device name.
i.e. in mdadm.conf you can have a line like
ARRAY uuid=whatever
and it will use auto-name-generation to give a name to the array at
assemble-time. This is different from blind auto-assembly in that the
array will be treated as 'local'.
NeilBrown [Mon, 3 Nov 2008 23:35:42 +0000 (10:35 +1100)]
assemble: combine the two create_mddev calls into one.
This delays the create_mddev call even further in the case where
an array device name is given for --assemble. It is now delayed
until the 'name' of the array is also available.
NeilBrown [Mon, 3 Nov 2008 23:35:37 +0000 (10:35 +1100)]
Delay creation of array devices for assemble/build/create
We will shortly be feeding more information into the process of
creating array devices, so delay the creation. Still open them
early if the device already exists.
This involves making sure the autof flag is in the right place
so that it can be found at creation time.
Also, Assemble, Build, and Create now always close 'mdfd'.
NeilBrown [Mon, 3 Nov 2008 23:35:35 +0000 (10:35 +1100)]
Avoid opening md device twice in particular '--assemble' instance.
When
mdadm --assemble /dev/whatever
is given, mdadm will treat it as though '--scan' were given, even
though it wasn't.
In this case, the code opens /dev/whatever twice, which is pointless.
We already know /dev/whatever is open at this point, so remove the
'open' and the tests, and make sure it is always closed afterwards.
NeilBrown [Mon, 3 Nov 2008 23:35:08 +0000 (10:35 +1100)]
Move recently merged /sys/dev/ lookup into stat2devnum.
Both sysfs_init and stat2devnum try to convert stat information
into an md devnum. Combine the useful parts of both pieces of code
into stat2devnum and have sysfs_init call that.
NeilBrown [Sun, 2 Nov 2008 20:19:37 +0000 (07:19 +1100)]
mapfile: fix bug in testing for /var/run/mdadm/
If /var/run/mdadm/ did not exist as a directory, the map file
should have been created in /var/run/mdadm.map, but due to a bug
it would never get created.
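The intended fallback can be sketched like this: use DIR/map when DIR exists as a directory, otherwise use the fallback file. The function name and parameters are assumptions, and the paths are passed in so the sketch does not depend on /var/run.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Choose where the map file lives: "<dir>/map" if 'dir' is a directory,
 * otherwise 'fallback' (for example /var/run/mdadm.map).
 * Returns 0 on success, -1 if the result does not fit in 'out'.
 */
static int choose_map_path(const char *dir, const char *fallback,
                           char *out, size_t outlen)
{
    struct stat st;
    int n;

    if (stat(dir, &st) == 0 && S_ISDIR(st.st_mode))
        n = snprintf(out, outlen, "%s/map", dir);
    else
        n = snprintf(out, outlen, "%s", fallback);

    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```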
NeilBrown [Sun, 2 Nov 2008 19:39:02 +0000 (06:39 +1100)]
Incremental: change precedence order for autof setting.
It doesn't really make sense for the --auto setting to ever over-ride
the setting on an ARRAY line. That could cause failure if the
ARRAY line has a 'standard' name. So revert to the array line having
precedence over command line, then CREATE line last.
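The resulting order (ARRAY line, then command line, then CREATE line) amounts to taking the first value that is set. A sketch, assuming 0 means "not set"; the function name and the integer encoding are hypothetical.

```c
/*
 * Pick the effective 'autof' setting.  0 means "not set" (assumption).
 * Precedence: ARRAY line first, then the command line, then CREATE line.
 */
static int pick_autof(int array_line, int command_line, int create_line)
{
    if (array_line)
        return array_line;
    if (command_line)
        return command_line;
    return create_line;
}
```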
NeilBrown [Thu, 30 Oct 2008 05:37:29 +0000 (16:37 +1100)]
Adjust major number testing to allow for extended minor number in 2.6.28
From 2.6.28, normal md devices will be able to have partitions. These
partitions will have a different major number. Sometimes mdadm tests
the major number and so can get confused.
Change these tests to test against get_mdp_major(). mdp does not use
extended minor number and so this test will always be accurate.
Also use /sys/dev links to map major/minor to devnum in sysfs.
NeilBrown [Wed, 29 Oct 2008 22:48:18 +0000 (09:48 +1100)]
Incremental: allow assembly of foreign array.
If a foreign (i.e. not known to be local) array is discovered
by --incremental assembly, we now assemble it. However we ignore
any name information in the array so as not to potentially create
a name that conflicts with a 'local' array.
Also, foreign arrays are always assembled 'read-auto' to avoid writing
anything until the array is actually used.
NeilBrown [Wed, 29 Oct 2008 22:34:04 +0000 (09:34 +1100)]
Fix --incremental assembly of partitioned arrays.
If incremental assembly finds an array mentioned in mdadm.conf,
with a 'standard partitioned' name like /dev/md_d0 or /dev/md/d0,
it will not create a partitioned array like it should.
This is because it mishandled the 'devnum' returned by
is_standard.
That is a devnum that does not have the partition-or-not encoded
into it. So we need to check the actual return value of
is_standard and encode the partition-or-not info into the devnum.
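A sketch of the encoding step, under two assumptions that are not stated in this message: the "kind" value is negative for a partitioned standard name and positive for a non-partitioned one, and partitioned arrays are represented as the negative devnum -1-N. The helper name is hypothetical.

```c
/*
 * Fold the partition-or-not information into a devnum.
 * Assumed convention: kind < 0 means partitioned, and a partitioned
 * array number N is stored as -1 - N; otherwise N is stored as-is.
 */
static int encode_devnum(int kind, int num)
{
    if (kind < 0)
        return -1 - num;
    return num;
}
```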
Doug Ledford [Wed, 29 Oct 2008 19:05:36 +0000 (15:05 -0400)]
Fix NULL pointer oops
RAID10 is the only raid level that uses the avail char array pointer
during the enough() operation, so it was the only one that saw this.
The code in incremental assumes unconditionally that count_active will
allocate the avail char array, that it might be used by enough, and that
it will need to be freed afterward. Once you make count_active actually
do that, then the oops goes away.
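The shape of the fix can be illustrated with a stand-in function that always allocates the 'avail' map it hands back, so callers can use and free it unconditionally. The name, signature and input array are hypothetical and much simpler than the real count_active.

```c
#include <stdlib.h>

/*
 * Hypothetical stand-in: count the non-zero entries of 'present' and
 * always return a freshly allocated 'avail' map through 'availp'.
 * Returns the count, or -1 if allocation failed.  Caller frees *availp.
 */
static int count_active_sketch(const int *present, int raid_disks,
                               char **availp)
{
    int i, cnt = 0;
    char *avail = calloc(raid_disks > 0 ? (size_t)raid_disks : 1, 1);

    *availp = avail;
    if (!avail)
        return -1;

    for (i = 0; i < raid_disks; i++) {
        if (present[i]) {
            avail[i] = 1;
            cnt++;
        }
    }
    return cnt;
}
```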
Doug Ledford [Wed, 29 Oct 2008 19:05:35 +0000 (15:05 -0400)]
Fix bad metadata formatting
Certain operations (Detail.c mainly) would print out the metadata of
an array in a format that the scan operation in super0.c and super1.c
would later reject as unknown when it was found in the mdadm.conf file.
Use a consistent format, but also modify the super0 and super1 match
methods to accept the other format without complaint.
NeilBrown [Sat, 25 Oct 2008 07:20:49 +0000 (18:20 +1100)]
Allow WRITEMOSTLY to be cleared on --readd using --readwrite.
Previously it was possible to set the WRITEMOSTLY flag when
adding a device to an array, but not to clear the flag when re-adding.
This is now possible with --readwrite.
NeilBrown [Fri, 17 Oct 2008 00:52:38 +0000 (11:52 +1100)]
Remove .UR .UE macros from man page because they don't do what we want.
.UR URL
text
.UE
is meant to create a hyperlink from the 'text' to the 'URL'.
But I wanted just to have the URL, so UR isn't really the right
tool - the URL gets displayed twice.
So just display the URL in bold and assume man2html etc can recognise
it and do the right thing.
Dan Williams [Fri, 3 Oct 2008 05:26:00 +0000 (22:26 -0700)]
mdmon: suicide prevention
mdmon cannot remove the pidfile at shutdown because it needs to stay
running across the "mount -o remount,ro /" event. When it relaunches
after a reboot there is a good chance that the pid will match what was
there previously. The result is that the "take over for unresponsive
mdmon" logic results in self termination.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
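One plausible guard against this, shown only as a sketch and not necessarily what the patch does: compare the pid read from the pidfile with our own pid before trying to take over. The function name is hypothetical.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Decide whether the monitor recorded in the pidfile should be killed.
 * A recorded pid equal to our own is a stale entry from a previous
 * boot, so killing it would terminate ourselves.
 */
static int should_kill_old_monitor(pid_t recorded, pid_t self)
{
    return recorded > 0 && recorded != self;
}
```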
Dan Williams [Thu, 2 Oct 2008 22:50:23 +0000 (15:50 -0700)]
mdmon: --switch-root
For raid rootfs we cannot run the array unmonitored for any length of
time. At least XFS will not mount/replay the journal if the underlying
block device is readonly (FIXME it also seems that XFS does not always
honor the ro status of the backing device as I was able to hit the
BUG_ON(mddev->ro == 1) in md_write_start... but I digress).
So we need to start mdmon in the initramfs before '/' is mounted and
then restart it after the real rootfs is available. Upon seeing the
--switch-root option, mdmon will kill any victims in the current
/var/run/mdadm directory and then chroot(2) before continuing.
The option is deliberately called 'switch-root' instead of 'chroot' to
hopefully indicate that this is different than doing "chroot mdmon
/dev/imsm".
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Thu, 2 Oct 2008 22:42:57 +0000 (15:42 -0700)]
mdmon: wait after trying to kill
Now that mdmon handles sigterm if another monitor wants to take over it
should wait until all managed arrays are clean. So make WaitClean()
available to mdmon and teach try_kill_monitor() to wait on each subarray
in the container.
...since we may be communicating with a dying process, we need to
block SIGPIPE earlier.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>