NeilBrown [Mon, 11 May 2009 05:18:20 +0000 (15:18 +1000)]
Make --brief even briefer.
Because --examine --brief and --detail --brief are
often used to create mdadm.conf, and because people don't want to
have to update their mdadm.conf unnecessarily, we don't want to
include information that might change.
And now that level changing is supported, that is almost everything
but UUID.
So move some more fields into the "Only print with --verbose" class.
NeilBrown [Mon, 11 May 2009 05:17:05 +0000 (15:17 +1000)]
config: support "ARRAY <ignore> ..." lines in mdadm.conf
Sometimes we want to ensure particular arrays are never
assembled automatically. This might include an array made of
devices that are shared between hosts.
To support this, allow ARRAY lines in mdadm.conf to use the word
"ignore" rather than a device name. Arrays which match such lines
are never automatically assembled (though they can still be assembled
by explicitly giving identification information on the mdadm command
line).
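As a rough illustration of the rule above, the sketch below scans mdadm.conf-style lines and reports whether an array's UUID matches an ARRAY line whose device field is the word <ignore>. The function name, the simplified whitespace-based parsing, and the sample UUIDs in the usage note are assumptions for illustration; this is not mdadm's actual parser.

```python
def auto_assembly_blocked(conf_lines, uuid):
    """Return True if an 'ARRAY <ignore> ...' line matches the given UUID."""
    for line in conf_lines:
        words = line.split()
        # Only ARRAY lines whose device field is the literal word <ignore> count.
        if len(words) < 2 or words[0].upper() != "ARRAY" or words[1] != "<ignore>":
            continue
        for word in words[2:]:
            key, _, value = word.partition("=")
            if key.lower() == "uuid" and value.lower() == uuid.lower():
                return True
    return False
```

With the made-up lines "ARRAY <ignore> UUID=aaaa:bbbb" and "ARRAY /dev/md0 UUID=cccc:dddd", only the first UUID would be reported as blocked from automatic assembly.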
NeilBrown [Mon, 11 May 2009 05:16:49 +0000 (15:16 +1000)]
assemble: support arrays created with --homehost=any
If an array is created with --homehost=any, then --assemble and
--incremental will treat it as being local to 'this' host, no matter
what the name of this host is.
This is useful for arrays that will be given unique names and be
moved between machines.
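The locality test described above can be pictured as the following sketch. The function and parameter names are hypothetical, and the real check in mdadm works on the homehost recorded in the metadata rather than on plain strings passed in like this.

```python
def is_local_array(array_homehost, this_host):
    """Treat an array as local if its homehost is 'any' or names this host."""
    # 'any' matches regardless of what this host is called.
    return array_homehost == "any" or array_homehost == this_host
```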
NeilBrown [Mon, 11 May 2009 05:16:47 +0000 (15:16 +1000)]
create_dev - allow array names like mdX and /dev/mdX to appear 'numeric'
When choosing the minor number to use with an array, we currently base
the number on the 'name' stored in the metadata if that name is
numeric.
Extend that so that if it looks like a numbered md device name (/dev/md0
or just md0 or even /dev/md/0), then we use the number at the end to
suggest a minor number.
This means that if someone creates an array with "--name md0" or even
"--name /dev/md0" it will continue to do what they expect.
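A minimal sketch of the name test described above, assuming only the four spellings mentioned in this message (a bare number, mdN, /dev/mdN and /dev/md/N) should yield a suggestion. The function name and the regular expression are illustrative, not taken from create_dev.

```python
import re

# Accepts "0", "md0", "/dev/md0" and "/dev/md/0"; anything else gives no suggestion.
_NUMERIC_NAME = re.compile(r"(?:/dev/md/?|md)?(\d+)")

def suggested_minor(name):
    """Return the minor number suggested by an array name, or None."""
    m = _NUMERIC_NAME.fullmatch(name)
    return int(m.group(1)) if m else None
```

A name such as "home" gives None here, so the caller would fall back to whatever minor-number choice it used before.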
From 2.6.30, /proc/mounts and various /sys files will
probably always return 'readable' to select, so we will need
to wait on POLLPRI to get the 'new data is available' signal.
When using select, this corresponds to an 'exception', so
adjust calls to select accordingly.
In one case we sometimes wait on a socket and sometimes on
/proc/mounts, so we need to test which.
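The select change can be sketched as follows: the descriptor goes into the third ("exceptional") set instead of the read set. This is a Python illustration of the idea rather than the C code in mdadm, and the helper name is made up.

```python
import select

def wait_for_change(fd, timeout=None):
    """Wait until fd reports an exceptional condition, or until timeout expires."""
    # The descriptor is placed in the exceptional set, not the read set,
    # because the file would otherwise always look 'readable'.
    _, _, exceptional = select.select([], [], [fd], timeout)
    return fd in exceptional
```

On Linux one would pass the descriptor of an open /proc/mounts; a socket would still belong in the read set, which is why the caller has to know which kind of descriptor it holds.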
During early boot, /var/run may not exist or be writable.
If that happens, store the mapfile (which is very important for
incremental assembly) in /dev (which should exist for udev).
Thanks to Doug Ledford <dledford@redhat.com> for identifying this
problem and suggesting a solution.
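The fallback can be sketched as "use the first directory that exists and is writable". The helper name and the way candidates are passed in are assumptions; the two default paths are the ones named in this message.

```python
import os

def pick_map_dir(candidates=("/var/run", "/dev")):
    """Return the first candidate directory that exists and is writable, else None."""
    for path in candidates:
        if os.path.isdir(path) and os.access(path, os.W_OK):
            return path
    return None
```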
incremental_container: preserve 'in_sync' flag when adding to existing array.
When building container members with -IR, we need to ensure that
devices added to an active array preserve the 'in_sync' status so they
don't needlessly get rebuilt.
So allow sysfs_add_disk to do this (only works in kernels since
2.6.30) and pass the relevant flag down.
Dan Williams [Sun, 12 Apr 2009 07:58:28 +0000 (00:58 -0700)]
imsm: set array size at Create/Assemble
imsm arrays round down the effective array size to the closest 1
megabyte boundary so teach get_info_super_imsm and sysfs_set_array to
set 'md/array_size' if available (and make sure ddf uses the default
size).
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
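For illustration, rounding a size down to a 1 megabyte boundary looks like this when the size is counted in 512-byte sectors. The unit and the helper name are assumptions; the message does not say which unit the imsm code works in.

```python
SECTORS_PER_MIB = (1024 * 1024) // 512  # 2048 sectors of 512 bytes each

def round_down_to_mib(sectors):
    """Round a sector count down to a multiple of 1 MiB."""
    # Clearing the low bits works because SECTORS_PER_MIB is a power of two.
    return sectors & ~(SECTORS_PER_MIB - 1)
```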
Dan Williams [Sun, 12 Apr 2009 07:58:28 +0000 (00:58 -0700)]
imsm: defend against unsupported migrations (temporary)
Until support for higher order migrations (online capacity expansion,
raid level migration, chunk size migration...) is implemented, do not
allow arrays in these states to be assembled.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Sun, 12 Apr 2009 07:58:27 +0000 (00:58 -0700)]
imsm: add 'verify', 'verify with fixup', and 'general' migration types
imsm distinguishes parity initialization from parity checking in the
metadata. Older option roms marked the repair operation with the
'verify' type and a 'with fixup' flag in the raid device 'status' field.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Sun, 12 Apr 2009 07:58:27 +0000 (00:58 -0700)]
imsm: fix imsm_map.num_domains
'num_domains' is the number of parity domains. I.e. 2 in the raid10
case (2-mirrors), while raid0 through raid5 have 1 parity domain (even
though raid0 does not have parity).
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 8 Apr 2009 18:41:51 +0000 (11:41 -0700)]
imsm: extract right-most whitespace stripped serial number
According to new documentation the metadata expects that all whitespace
(characters <= 0x20) are stripped from the incoming serial number. If
the length remains longer than MAX_RAID_SERIAL_LEN then only the
right-most characters are preserved.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
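The serial number rule above can be sketched as: drop every byte <= 0x20, then keep only the right-most MAX_RAID_SERIAL_LEN bytes. The value 16 used for MAX_RAID_SERIAL_LEN and the function name are assumptions made for this example.

```python
MAX_RAID_SERIAL_LEN = 16  # assumed value, for illustration only

def normalize_serial(raw: bytes) -> bytes:
    """Strip whitespace/control bytes and keep the right-most characters."""
    # Remove every byte <= 0x20, wherever it appears in the serial.
    stripped = bytes(b for b in raw if b > 0x20)
    # If it is still too long, the tail (right-most part) is what survives.
    return stripped[-MAX_RAID_SERIAL_LEN:]
```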
I'm not attaching a patch for this because it's so simple. Long story
short, watching both add and change events in udev rules is bad for md
devices. Specifically, the kernel will generate a change event on
things like array stop, and on things like fdisk close. In the case
of array stop, it can result in the array being assembled again
immediately. In the case of fdisk close, the situation is worse.
Let's say you stop all the md devices on some block device in order to
repartition. You run fdisk, change the partition table, then issue a
write of the table. The write of the table triggers the change event
*before* the kernel updates the partition table in memory for the
block device, causing udev to rerun the incremental rules on the old
partition table and restart all the arrays you just stopped with the
old partition table layout, at which point the kernel is unable to
reread the partition table. So, once you've enabled incremental
assembly, it becomes apparent that what we really want is to only
start devices on add, not on add|change.
ddf: fixed 'working_disks' reported by container_content.
The 'working_disks' number should be the number that is expected, not the
number found so far. This is needed for Incremental assembly to
start the array at the right time.
When reporting "--detail --scan", use names like /dev/md/foo where
available rather than /dev/md/127
This is particularly needed for containers where the member arrays
will report "container=/dev/md/foo" and we want the container to have
the same name.
grow: don't wait forever for critical section to pass.
If an array reshape completed within 1 second, then --grow will not
notice that it has finished and will keep waiting for the critical
section to pass.
NeilBrown [Tue, 10 Mar 2009 05:28:22 +0000 (16:28 +1100)]
mdmon: allow incremental assembly of containers.
If mdmon sees a device added to a container, it should not assume it is
a new spare. It could be a part of the array that just hadn't been
assembled yet. So check first.
NeilBrown [Tue, 10 Mar 2009 05:28:22 +0000 (16:28 +1100)]
Assemble/container: catch errors when starting a partial container.
If we are assembling an array in a container and it isn't complete
enough to start yet, then
- don't start mdmon
- don't say the array is started
- don't wait for the device to appear in /dev
NeilBrown [Tue, 10 Mar 2009 05:28:22 +0000 (16:28 +1100)]
mdopen: be more careful when adding digit to names.
If we need to add digits to a name to make it unique, but don't have
to add '_', we need to avoid adding a digit immediately after a digit.
So if the last character of the name is a digit, add the '_' anyway.
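A minimal sketch of the naming rule, assuming a hypothetical helper that is handed the set of names already in use; it is not the code in mdopen.

```python
def unique_name(name, in_use, need_underscore=False):
    """Append digits to name until it no longer collides with in_use."""
    if name not in in_use:
        return name
    # Force the '_' when the name already ends in a digit, so that
    # "md1" becomes "md1_0" rather than the ambiguous "md10".
    sep = "_" if need_underscore or name[-1:].isdigit() else ""
    n = 0
    while f"{name}{sep}{n}" in in_use:
        n += 1
    return f"{name}{sep}{n}"
```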
NeilBrown [Tue, 10 Mar 2009 05:28:22 +0000 (16:28 +1100)]
Incremental: fix some handling of trustworthy.
1/ if homehost matches, then we need to set trustworthy to 'LOCAL'
2/ if we decide to set trustworthy to 'METADATA' because we have to
use the metadata version name, do that *after* we have checked if
we are going to assemble within a container, as inside the
container there could be different sources of names to use.
NeilBrown [Mon, 9 Mar 2009 00:16:53 +0000 (11:16 +1100)]
Support new raid6 layouts needed for DDF
DDF raid6 layouts are subtly different from the standard 'md' layouts.
From 2.6.30 the kernel knows about these.
Teach mdadm about them, and also allow 'ddf' to set an appropriate default.
NeilBrown [Sun, 8 Mar 2009 23:17:42 +0000 (10:17 +1100)]
super1 - do metadata IO in sector_size units.
If the sector size is > 512, we need to be more careful about
alignment.
The largest known sector size is 4096 and (fortunately) both the
superblock and (in many cases) the bitmap are 4096-byte aligned.
So there should be no data-overlap problems.
The exception is when the bitmap is squeezed into the 3K after the
superblock. This arrangement cannot currently be supported on
4K sector-size devices.
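The alignment concern can be illustrated with a small check. The offsets in the usage note assume a 1K superblock at the start of a 4K block with a 3K bitmap directly after it, which is one reading of "the 3K after the superblock"; the function name is made up.

```python
def io_is_sector_aligned(offset_bytes, length_bytes, sector_size):
    """True if an I/O request starts and ends on sector boundaries."""
    return offset_bytes % sector_size == 0 and length_bytes % sector_size == 0
```

Under that assumption, a 4096-byte read at offset 4096 is aligned for both 512-byte and 4096-byte sectors, while a 3072-byte bitmap starting 1024 bytes later is aligned only for 512-byte sectors.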
NeilBrown [Sun, 8 Mar 2009 22:59:39 +0000 (09:59 +1100)]
super1: make sure max_dev grows enough when adding a device to an array.
There were a few kernel releases where the kernel would shrink max_dev
to be just enough to hold the current number of devices.
More recent kernels never shrink it.
However to be as compatible as possible, if we notice that
max_dev is too small to successfully add a device, increase it.
Dan Williams [Wed, 25 Feb 2009 01:45:57 +0000 (18:45 -0700)]
Incremental: honor --no-degraded to delay assembly
Currently Incremental_container is being called after adding each disk.
In the imsm case where spares are not tracked in the raid_disks field we
can use --no-degraded to block premature assembly.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 25 Feb 2009 01:45:57 +0000 (18:45 -0700)]
mdmon: fix missed 'clean' event
mdmon may miss events because it re-reads state after read_and_act. The
additional read is used to determine dirty status before allowing a
sigterm to proceed. Since read_and_act is in the best position to
determine 'dirty' status and its return value is not used, modify it to
return true if the array is dirty.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 25 Feb 2009 01:45:57 +0000 (18:45 -0700)]
imsm: auto layout
In support of auto-layout:
1/ collect and merge all extents to find the largest common-start free region
2/ verify that we meet the "all volumes must use the same set of disks" constraint
3/ mark the disks to be added in add_to_super_imsm_volume
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 25 Feb 2009 01:45:57 +0000 (18:45 -0700)]
Create: fixup 'insert_point', dependent on 'subdevs', for auto-layout
'subdevs' is read from the container in the auto-layout case so reset
subdevs dependent default values. Without this change 'insert_point'
is always 2, blocking creation of arrays with > 2 raid disks.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 25 Feb 2009 01:45:56 +0000 (18:45 -0700)]
sysfs: allow sysfs_read to detect and drop removed disks
All operations that rely on loading from an existing container (like
--add) will fail after a disk has been removed. Provide an option to
skip missing / offline disks rather than abort. We attempt to do this
in the load_super_{imsm,ddf}_all cases when mdmon is running i.e. we
already have a consistent version of the metadata running in the system.
Otherwise, we fail as normal and let the administrator fix up the
container.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 25 Feb 2009 01:45:56 +0000 (18:45 -0700)]
imsm: fix mark_failure / introduce mark_missing
Actually, rename mark_failure to mark_missing and then implement the
correct mark_failure which according to new documentation is to:
1/ Set the FAILED status bit
2/ Set IMSM_ORD_REBUILD to mark the disk out of sync
3/ Set map->failed_disk_num if this is the first detected failure
(it is ~0 otherwise)
Previously the assumption was that IMSM_ORD_REBUILD only appeared in
map[1], so all routines that care about out-of-sync disks need to be
updated.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
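The three mark_failure steps listed above can be sketched as follows. The bit values, the 8-bit "no failure" marker, and the two small classes are placeholders invented for this example; only the names FAILED, IMSM_ORD_REBUILD and failed_disk_num come from the message.

```python
from dataclasses import dataclass, field

FAILED_DISK = 0x04           # placeholder bit for the FAILED status
IMSM_ORD_REBUILD = 1 << 24   # placeholder flag marking a disk out of sync
NO_FAILED_DISK = 0xFF        # ~0, assuming an 8-bit failed_disk_num field

@dataclass
class Disk:
    status: int = 0

@dataclass
class Map:
    disk_ord_tbl: list = field(default_factory=list)
    failed_disk_num: int = NO_FAILED_DISK

def mark_failure(disk, map_, slot):
    """Apply the three steps to the disk occupying the given slot."""
    disk.status |= FAILED_DISK                    # 1/ set the FAILED status bit
    map_.disk_ord_tbl[slot] |= IMSM_ORD_REBUILD   # 2/ mark the disk out of sync
    if map_.failed_disk_num == NO_FAILED_DISK:    # 3/ record only the first failure
        map_.failed_disk_num = slot
```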
Dan Williams [Tue, 24 Feb 2009 06:06:24 +0000 (23:06 -0700)]
imsm: fix activate spare to ignore foreign disks
A foreign disk is one that all other drives believe is not-in-sync but
does not have the 'failed' status bit set.
This also reverts, because that commit is addressing the wrong problem.
Ideally mdmon would kick "non-fresh" drives like the kernel does at
native-md activation time, but that is too awkward to implement at the
moment because mdadm owns container manipulations.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Mon, 23 Feb 2009 21:26:11 +0000 (14:26 -0700)]
imsm: fixup container spare uuids by default
Spares in the imsm case are marked with the "match-all" uuid of ffffffff-ffffffff-ffffffff-ffffffff. When performing incremental
assembly we need to associate such devices with a populated container
uuid. Also when performing --detail on a container with only spares
present we can make an attempt to return a real uuid.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Mon, 23 Feb 2009 21:26:10 +0000 (14:26 -0700)]
imsm: provide a simulated option-rom for regression tests
IMSM_NO_PLATFORM turns off checks that should be tested, so provide a
IMSM_TEST_OROM variable to allow testing the orom constraints in the
mdadm regression suite.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
NeilBrown [Thu, 5 Feb 2009 06:06:03 +0000 (17:06 +1100)]
Monitor: send --test message for arrays in /proc/mdstat that aren't in mdadm.conf
"mdadm --monitor --test --scan" currently only sends test messages for
arrays listed on the command line or in /etc/mdadm.conf. With this
patch it also reports on any active arrays, which is more in line with
the description in the manpage.
Thanks to Andrew Walrond <andrew@walrond.org> for reporting this error.
Dan Williams [Mon, 2 Feb 2009 17:54:58 +0000 (10:54 -0700)]
imsm: don't check raid1 chunk size
mdadm -C /dev/md/r1d2n1s0-5 -amd -l1 --size 5242880 -n 2 /dev/sdb /dev/sdc -R -f -v -c 64
mdadm: chunk size ignored for this level
mdadm: super0.90 cannot open /dev/sdb: Device or resource busy
mdadm: super1.x cannot open /dev/sdb: Device or resource busy
mdadm: platform does not support a chunk size of: 0
mdadm: device /dev/sdb not suitable for any style of array
Reported-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Tested-by: Jacek Danecki <jacek.danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
NeilBrown [Sun, 1 Feb 2009 23:03:20 +0000 (10:03 +1100)]
Fix the used device size in mdadm -D output.
As get_component_size() returns the number of used sectors of a device
we need to halve it before printing as K, and shift the value by 9, not 10,
before passing to human_size.
Thanks to Andre Noll <maan@systemlinux.org> for identifying the problem
(and a slightly different version of this patch)
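The arithmetic behind the fix, as a small sketch: a count of 512-byte sectors is halved to get K and shifted left by 9 to get bytes. The helper name is hypothetical.

```python
def used_dev_size(sectors):
    """Convert a count of 512-byte sectors to (KiB, bytes)."""
    kib = sectors // 2          # two 512-byte sectors per K
    size_bytes = sectors << 9   # 512 == 2**9, so shift by 9, not 10
    return kib, size_bytes
```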
2008-12-08 Bernhard Reutner-Fischer <rep.dot.nop@gmail.com>
* Makefile (dadm.uclibc): Remove misspelled and unneeded rule.
* md5.h: Include stdint.h for uClibc.
* mdadm.h: uClibc defines __UCLIBC__. If uClibc has LFS off
then use lseek instead of lseek64.
Signed-off-by: Bernhard Reutner-Fischer <rep.dot.nop@gmail.com>
Dan Williams [Fri, 23 Jan 2009 22:45:34 +0000 (15:45 -0700)]
imsm: fix failed disks are allowed back into the container
Failed disks do not have valid serial numbers which means we will not
pick up the 'failed' status bit from the metadata entry. Check for
dl->index == -2 to prevent failed disks from being incorporated into the
container.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>