Dan Williams [Wed, 14 Oct 2009 00:37:02 +0000 (17:37 -0700)]
mdmon: preserve socket over chroot
Connect to the monitor in the old namespace and use that connection for
WaitClean requests when stopping the victim mdmon instance. This allows
ping_monitor() to work post chroot().
Cc: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 14 Oct 2009 00:08:33 +0000 (17:08 -0700)]
mdmon: exec(2) when the switchroot argument is not "/"
Try to execute mdmon from the target namespace. When used for initramfs
handovers we need to drop all references to the initramfs filesystem for
that memory to be freed.
Cc: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 14 Oct 2009 00:41:57 +0000 (17:41 -0700)]
mdmon: avoid writes in the startup path for mdmon on root arrays
When killing a previous monitor be careful not to cause writes to the
filesystem until the reads necessary to get the monitor operational have
completed.
The code is already prepared for errors creating the pid and socket
files, so simply defer creation of these files until after the first
call to manage().
Cc: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 14 Oct 2009 00:41:57 +0000 (17:41 -0700)]
Detail: export MD_UUID from mapfile
The load_super() from an mdadm --detail call may race against an mdmon
update. When this happens the load_super sees an inconsistent metadata
block and returns an error. The fallback path to use the map file
contents lacks uuid reporting, so provide __fname_from_uuid for
generically printing a uuid.
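A generic uuid printer of this kind can be sketched as below. It assumes the colon-separated, four-word hex layout mdadm uses for UUID= output; the function name and signature are hypothetical and are not the real __fname_from_uuid.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical stand-in for a generic uuid printer: render four 32-bit
 * words as fixed-width lowercase hex, joined by 'sep'. buf needs at least
 * 36 bytes (32 hex digits, 3 separators, terminating NUL). Returns buf. */
char *format_uuid_sketch(const uint32_t id[4], char sep, char *buf, size_t len)
{
	snprintf(buf, len, "%08x%c%08x%c%08x%c%08x",
		 (unsigned)id[0], sep, (unsigned)id[1], sep,
		 (unsigned)id[2], sep, (unsigned)id[3]);
	return buf;
}
```

Taking the separator as a parameter lets the same helper produce both the ':' form used in config lines and other variants a caller might need.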
Reported-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 14 Oct 2009 00:41:53 +0000 (17:41 -0700)]
imsm: add --update=uuid support
When disks have conflicting container memberships (same container ids
but incompatible member arrays) --update=uuid can be used to move
offenders to a new container id by changing 'orig_family_num'.
Note that this only supports random updates of the uuid as the actual
uuid is synthesized. We also need to communicate the new
'orig_family_num' value to all disks involved in the update. A new
field 'update_private' is added to struct mdinfo to allow this
information to be transmitted.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 14 Oct 2009 00:41:53 +0000 (17:41 -0700)]
ddf: prevent superblock being zeroed on --update
The full fix would be to support updating ddf metadata, but this minimal
fix just prevents the superblock from being zeroed when someone
inadvertently passes an unsupported --update option during assembly.
Reported-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 14 Oct 2009 00:41:53 +0000 (17:41 -0700)]
imsm: fix/support --update
Fix init_super_imsm() to return an empty mpb when info == NULL, and
teach store_super_imsm() to simply write out the passed in mpb.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=523320 Reported-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 14 Oct 2009 00:41:53 +0000 (17:41 -0700)]
imsm: fix spare record writeout race
imsm_activate_spare() in the manager thread may race against
write_super_imsm_spares() in the monitor thread. Give
write_super_imsm_spares() its own private mpb buffer to prevent
confusing the manager.
This change uncovered cases where spares were not being assembled due to
a failed metadata version number check. Spares can freely associate
across metadata version numbers, so reduce the scope of the version check
in the spare assembly case.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 30 Sep 2009 18:45:41 +0000 (11:45 -0700)]
imsm: disambiguate family_num
This is a result of trawling through the Windows implementation to learn
how it disambiguates family_num. It is a continuation
of commit 148acb7b "imsm: fix family number handling" which introduced a
regression when reassembling a container with stale disks and rebuilt
members.
When rebuilding, a new family number is assigned to protect against the
"prodigal array member" problem. It prevents a former family member
from returning to the system and causing a rebuild to go the wrong
direction. However, this invalidates looking at the generation number to
determine the most up-to-date disk when comparing across family numbers.
Instead the assembly logic looks for agreement between a disk's local
family membership compared against a global list of all families in the
system. Whenever a disk's local metadata does not match a family number
on the global list that family number is marked offline.
It is possible that this logic results in multiple incompatible but
valid family numbers existing in a container. In this case mdadm.conf
cannot be consulted because it only records the uuid which is generated
from static fields in the metadata. The metadata lacks the data needed
to disambiguate "local" versus "foreign". The "foreign" array in this
case requires updating to change its container-id information
(orig_family_num), and possibly the member array names.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 30 Sep 2009 18:44:38 +0000 (11:44 -0700)]
imsm: kill close() of component device
None of the other formats close the passed in fd at load, and this
becomes a problem when trying to support --update where we need O_EXCL
protection across the entire operation.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Hans de Goede [Thu, 24 Sep 2009 13:52:06 +0000 (06:52 -0700)]
mdmon: fix freeing unallocated memory
mdmon was creating a supertype struct with malloc, and thus not
necessarily getting zeroed memory.
This was causing it to segfault when called like this from the initrd:
/sbin/mdmon /proc/mdstat /sysroot
The problem was that load_super_imsm would get called on the non-zeroed
super struct, which in turn calls free_super_imsm, which checks st->sb,
which should be zero but isn't, and then starts freeing bogus memory.
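The difference between malloc() and calloc() here can be shown with a trimmed-down structure. This is a sketch: struct supertype_sketch and both helpers are hypothetical stand-ins that keep only the ->sb field involved in the crash.

```c
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for mdadm's struct supertype:
 * only the field that matters for this bug is kept. */
struct supertype_sketch {
	void *sb;		/* loaded superblock, or NULL if none */
	int minor_version;
};

/* Free any previously loaded superblock. This is only safe if ->sb is
 * either a valid heap pointer or NULL, which is why the allocation must
 * start out zeroed. */
void free_super_sketch(struct supertype_sketch *st)
{
	free(st->sb);		/* free(NULL) is a no-op */
	st->sb = NULL;
}

/* calloc() zero-fills the block, so ->sb starts out NULL on the platforms
 * mdadm targets; malloc() would leave whatever bytes were already there. */
struct supertype_sketch *alloc_supertype_sketch(void)
{
	return calloc(1, sizeof(struct supertype_sketch));
}
```

With the zeroed allocation, a load routine can unconditionally call its free routine first, which is the pattern that crashed when the struct came from malloc().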
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Tue, 15 Sep 2009 18:35:28 +0000 (11:35 -0700)]
Examine: don't count containers as spares
mdadm -Ebs will include containers in the scanned device list.
Examine() falsely thinks they are spares when MD_DISK_SYNC is not set.
This could be fixed by forcing all formats to set this flag for
container devices, but this flag is currently used by imsm to identify
free-floating spares.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Tue, 15 Sep 2009 18:34:20 +0000 (11:34 -0700)]
Detail: fix for an imsm container with a spare
Spares for imsm arrays do not have any info about the container in their
metadata records. If Detail() inadvertently picks such a device for
->get_array_info() it will end up with less than useful info for the
container. So, continue to read from the disks until a non-spare device
is found.
This bug was found by timeouts waiting for udev to create the
user-friendly container name. To detect future UUID reporting problems,
add a debug print to the timeout case in wait_for().
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Tue, 15 Sep 2009 18:34:20 +0000 (11:34 -0700)]
imsm: fix spare promotion
1/ Fix an off by one error when detecting whether the device allocation
loop succeeded or not
2/ Update ->num_raid_devs before copying to avoid a segmentation fault
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
NeilBrown [Fri, 7 Aug 2009 04:17:40 +0000 (14:17 +1000)]
Examine/brief: put member arrays after container arrays.
A previous patch moved the '--examine --brief' reporting of
member arrays to before their containers. This breaks "mdadm -As"
assembly. So put them back, but still fix the problem addressed by
previous patch.
Reported-by: Artur Wojcik <artur.wojcik@intel.com> Reported-by: Jacek Danecki <jacek.danecki@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Sat, 1 Aug 2009 00:11:42 +0000 (17:11 -0700)]
imsm: fix spare-uuid assignment
imsm spares do not have container membership by default so we associate
them with the first container found in the configuration file. Some
ARRAY lines do not specify the metadata type so we cannot assume that
_cst will always be valid.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Sat, 1 Aug 2009 00:11:41 +0000 (17:11 -0700)]
platform: relax rom scanning alignment for ahci platforms
The PCI-3.0 Firmware specification allows for option-roms to have
512-byte alignment rather than 2048-byte. As there does not appear to
be a reliable method to detect a PCI-3.0 compliant BIOS from userspace
we allow the imsm platform detection code to presume that a system
modern enough to have an Intel AHCI controller does not have
dangerous/legacy ISA regions in the option-ROM memory space.
An environment variable to disable this behaviour, IMSM_SAFE_OROM_SCAN,
is added in case this presumption is ever proven wrong.
Reported-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Sat, 1 Aug 2009 00:11:41 +0000 (17:11 -0700)]
imsm: fix family number handling
The family_number field can change. The option-rom will change the
family number when it starts a rebuild process (flags a container for
rebuild). This was not seen previously as mdadm would usually start the
rebuild process, preserving the family number.
This is the mechanism that helps to prevent a prodigal array member from
being returned to its original system and causing a rebuild to go in the
wrong direction. With the change we will end up with a container that
will fail to assemble unless the device with the incompatible family
number is left out of the assembly.
So, take several actions:
1/ Convert uuid generation to use orig_family_num, being careful to
preserve the existing uuid in the case where orig_family_num is not
set (i.e. previous mdadm created imsm arrays)
2/ Set orig_family_num at Create. For arrays created by mdadm prior to
this release orig_family_num will be zero, so set it to family_num at
the first metadata write.
3/ Add checks for orig_family_num to compare_super_imsm
4/ Update the family number when initiating rebuild
5/ The option-rom mixes some random data into the family number, add
this functionality to the mdadm implementation.
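Points 1, 2 and 4 above can be sketched as follows. The struct is a hypothetical two-field subset of the imsm header, and the new family number is passed in rather than generated, since the exact way the option-rom mixes in random data is not described here.

```c
#include <stdint.h>

/* Hypothetical subset of the imsm metadata header. */
struct imsm_mpb_sketch {
	uint32_t family_num;
	uint32_t orig_family_num;	/* 0 on arrays created by older mdadm */
};

/* Value fed into uuid generation: prefer orig_family_num, but fall back
 * to family_num when it is unset so that existing uuids stay stable. */
uint32_t uuid_family_sketch(const struct imsm_mpb_sketch *mpb)
{
	return mpb->orig_family_num ? mpb->orig_family_num : mpb->family_num;
}

/* Before the first metadata write, back-fill orig_family_num on arrays
 * created before the field was in use. */
void fixup_orig_family_sketch(struct imsm_mpb_sketch *mpb)
{
	if (mpb->orig_family_num == 0)
		mpb->orig_family_num = mpb->family_num;
}

/* Starting a rebuild assigns a new family_num (the caller supplies it,
 * e.g. derived from random data); orig_family_num, and so the uuid,
 * is left alone. */
void start_rebuild_sketch(struct imsm_mpb_sketch *mpb, uint32_t new_family_num)
{
	fixup_orig_family_sketch(mpb);
	mpb->family_num = new_family_num;
}
```

Because the uuid tracks orig_family_num, a rebuild can change family_num to fence off a returning stale disk without changing the container's identity in mdadm.conf.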
Reported-by: Marcin Labun <marcin.labun@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Sat, 1 Aug 2009 00:11:41 +0000 (17:11 -0700)]
conditionally update uuids in the map file after Create()
The map file needs to be updated after adding the first member array to
an Intel metadata container. The uuid for an imsm container uses the
->family_num field of the metadata. This field is static, but is only
set after the first member array has been created. Prior to this all
devices are free floating spares and do not have any information that
can identify specific container membership. At Create() time we take
the uninitialized uuid from ->getinfo_super() prior to updating the
metadata. So the current result is:
So, before we write out the new metadata check to see if the member
array uuid has changed as a result of this addition. If it has, update
its uuid in the map file and flag its parent container for updating. In
support of updating the container uuid the semantics of
->write_init_super are changed to clear any metadata specific member
array cursors (e.g. ddf_super.currentconf or intel_super.current_vol)
such that a subsequent call to ->getinfo_super returns container
information.
Reported-by: Ignacy Kasperowicz <ignacy.kasperowicz@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Sat, 1 Aug 2009 00:11:41 +0000 (17:11 -0700)]
fix examine_brief segfault
When performing an "-Ebs -e <metadata type>" we segfault because the
superblock has been freed too early. We also leak memory for 'ddf' and
'imsm' because, unlike super[01], we do not implicitly free when
->load_super is called on an already loaded supertype.
So, fix up imsm and ddf to match type 0 and 1 ->load_super() semantics,
and update Examine to not free the superblock until all usages have been
exhausted.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Sat, 1 Aug 2009 00:11:41 +0000 (17:11 -0700)]
fix RebuildMap() to retrieve 'subarray' info
RebuildMap falsely returns container info for member arrays. Retrieving
the subarray and container_dev details prior to ->load_super() changes the
result from:
Dan Williams [Sat, 1 Aug 2009 00:08:22 +0000 (17:08 -0700)]
teach imsm and ddf what st->subarray means at load_super time
RebuildMap wants to poll through mdstat and retrieve a (kernel name,
uuid, user name) tuple for each array. Teach imsm and ddf to honor
st->subarray at ->load_super() time to set their internal subarray
pointers to the value specified in st->subarray, or return an error if
st->subarray specifies an invalid array.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Using pclose is probably the right thing to do seeing that we
used popen, but as there is no clear need to wait for sendmail
to finish, it isn't really important.
NeilBrown [Thu, 4 Jun 2009 02:44:32 +0000 (12:44 +1000)]
Examine: fix --examine --brief --verbose on containers.
With --verbose, --examine --brief prints dev= information after
the personality has done its bit.
But with containers, the member arrays are printed in between.
So in super-ddf and super-intel, move printing of the member
arrays to before printing of the container. This avoids
confusion.
NeilBrown [Thu, 4 Jun 2009 02:29:21 +0000 (12:29 +1000)]
super-intel: fix test on failed_disk_num.
We sometimes set failed_disk_num to ~0.
However we cannot test for equality with that as failed_disk_num
is 8-bit and ~0 is probably 32-bit with lots of 1's.
So test if ~failed_disk_num is 0 instead.
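The pitfall is C's integer promotion, shown in this sketch with a plain uint8_t standing in for the imsm field (the function names are hypothetical). Note that ~failed_disk_num is itself computed as an int, so the result has to be truncated back to 8 bits before the comparison with 0 works.

```c
#include <stdint.h>

/* 'failed_disk_num' is a plain uint8_t standing in for the imsm field,
 * with all ones (0xff) meaning "no failed disk". */

/* Broken: the uint8_t operand is promoted to int, so 0xff becomes 255,
 * while ~0 is an int with every bit set (-1). They never compare equal. */
int naive_is_unset(uint8_t failed_disk_num)
{
	int all_ones = ~0;

	return failed_disk_num == all_ones;
}

/* Working: complement, then truncate back to 8 bits before testing for 0. */
int fixed_is_unset(uint8_t failed_disk_num)
{
	return (uint8_t)~failed_disk_num == 0;
}
```

Comparing directly against 0xff would work equally well; the point is that both sides of the test must be evaluated at the field's 8-bit width.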
Reported-By: "Mr. James W. Laferriere" <babydr@baby-dragons.com> Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 2 Jun 2009 04:35:44 +0000 (14:35 +1000)]
Monitor: reduce default poll interval if mdstat is pollable.
Since 2.6.16, mdstat responds to select/poll.
So in that case, increase the default poll interval to about 15
minutes.
This ensures that the background load is insignificant.
NeilBrown [Tue, 2 Jun 2009 04:24:58 +0000 (14:24 +1000)]
Monitor: don't get confused if utime is never set.
Externally managed arrays do not (currently) cause utime in
GET_ARRAY_INFO to be updated. So if it is zero, just assume the
current time.
This will cause GET_DISK_INFO to be called more often, but as we do
the scan only every 60 seconds normally, a few extra syscalls isn't
going to make a big difference.
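The fallback amounts to substituting the current time when utime is zero. In this sketch the helpers are hypothetical; 'utime' mirrors the field returned by GET_ARRAY_INFO, and 'now' is passed in so the behaviour is easy to test.

```c
#include <time.h>

/* If GET_ARRAY_INFO reports utime == 0 (as with externally managed
 * arrays), treat the array as having just been updated. */
time_t effective_utime_sketch(time_t utime, time_t now)
{
	return utime ? utime : now;
}

/* Rescan member disks with GET_DISK_INFO whenever the effective utime
 * changes. With utime stuck at 0 this fires on every pass, which is the
 * extra cost the log entry accepts. */
int need_disk_scan_sketch(time_t last_seen, time_t utime, time_t now)
{
	return effective_utime_sketch(utime, now) != last_seen;
}
```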
Dan Williams [Mon, 18 May 2009 17:02:58 +0000 (10:02 -0700)]
imsm: kill "auto=" in brief_examine_super_imsm
The auto parameter is obsolete after kernel version 2.6.28 as all arrays
are partitionable via block device extended minor support. Environments
that require the mdp style of array can always edit the configuration
file to specify auto=mdp.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Mon, 18 May 2009 16:58:55 +0000 (09:58 -0700)]
imsm: fix num_domains
The 'num_domains' field simply identifies the number of mirrors. So it
is 2 for a 2-disk raid1 or a 4-disk raid10. The orom does not currently
support more than 2 mirrors, but a three disk raid1 for example would
increase num_domains to 3.
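The rule as described can be written as a small helper. This is a sketch based only on the text above: raid1 counts each member as a copy, the raid10 case assumes the two-copy layout the orom creates, and returning 1 for non-mirrored levels is an assumption.

```c
/* Hypothetical helper: num_domains counts mirror copies. 'level' uses md
 * level numbering and 'raid_disks' is the number of members. */
int num_domains_sketch(int level, int raid_disks)
{
	switch (level) {
	case 1:
		return raid_disks;	/* every member is a full copy */
	case 10:
		return 2;		/* assumed two-copy raid10 */
	default:
		return 1;		/* assumed: no mirroring */
	}
}
```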
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
NeilBrown [Mon, 11 May 2009 05:58:44 +0000 (15:58 +1000)]
create_mddev: don't replace /dev/mdX with /dev/md/X
If someone creates or assembles an array called "/dev/md0", don't force
it to be "/dev/md/0". Doing so isn't really necessary and is
likely to confuse people.
NeilBrown [Mon, 11 May 2009 05:58:42 +0000 (15:58 +1000)]
mapfile - when rebuilding, choose an appropriate name if none is found.
When rebuilding the mapfile (mdadm -Ir), if no appropriate name is
found in /dev/md/, try to find an appropriate name, either by looking
in mdadm.conf or by using the name in the metadata.
NeilBrown [Mon, 11 May 2009 05:47:10 +0000 (15:47 +1000)]
Fix printf compile warning.
It always helps to cast big things to (unsigned long long) before
printing as %llu - it seems there will always be one arch which
has something to complain about ....
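The portable pattern looks like this. uint64_t is 'unsigned long' on some 64-bit ABIs and 'unsigned long long' on others, so passing it straight to %llu draws a format warning on one of them; the cast makes the argument match %llu everywhere. The helper name is hypothetical.

```c
#include <stdio.h>
#include <stdint.h>

/* Print a 64-bit sector count portably: cast to the exact type %llu
 * expects. Returns the number of characters snprintf() would write. */
int format_sectors(char *buf, size_t len, uint64_t sectors)
{
	return snprintf(buf, len, "%llu sectors", (unsigned long long)sectors);
}
```

The PRIu64 macro from <inttypes.h> is the other common fix; the cast has the advantage of working unchanged for any unsigned type up to 64 bits.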
NeilBrown [Mon, 11 May 2009 05:47:10 +0000 (15:47 +1000)]
map_dev: prefer names in /dev/md/
Rather than preferring non-standard names (of which there are
many, like /dev/block/9:1), prefer names in /dev/md/ when finding
the name of an md device.
NeilBrown [Mon, 11 May 2009 05:47:10 +0000 (15:47 +1000)]
Be more consistent about keeping the host: prefix on array names.
If an array name contains a "hostname:" prefix, then
--assemble will tend to leave it there, while --incremental
will strip it off (when choosing a device name during auto-assembly).
Make this more consistent: strip the prefix off if we decide that
the name will be treated as 'local'. Leave it on if it will be
treated as 'foreign'.
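The rule can be sketched as a single decision on the name recorded in the metadata. The helper is hypothetical and takes the local/foreign verdict as an input rather than computing it.

```c
#include <string.h>

/* Return the name to use for the device node: if the metadata name has a
 * "host:" prefix and the array is being treated as local, drop the prefix;
 * a foreign array keeps its full name. */
const char *device_name_sketch(const char *meta_name, int is_local)
{
	const char *colon = strchr(meta_name, ':');

	if (is_local && colon)
		return colon + 1;
	return meta_name;
}
```

Keeping the prefix on foreign arrays means that an array brought in from another machine cannot silently take over a local array's short name.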
NeilBrown [Mon, 11 May 2009 05:46:46 +0000 (15:46 +1000)]
Allow homehost to be largely ignored when assembling arrays.
If mdadm.conf contains
HOMEHOST <ignore>
or commandline contains
--homehost=<ignore>
then the check that array metadata mentions the given homehost is
replaced by a check that the name recorded in the metadata is not
already used by some other array mentioned in mdadm.conf.
This allows more arrays to use their native name rather than having
an _NN suffix added.
This should only be used during boot time if all arrays required for
normal boot are listed in mdadm.conf.
If auto-assembly is used to find all arrays during boot, then the
HOMEHOST feature should be used to ensure there is no room for
confusion in choosing array names, and so it should not be set
to <ignore>.
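With HOMEHOST <ignore>, the replacement check can be sketched as follows: a name is usable unless an mdadm.conf ARRAY line gives the same name to an array with a different uuid. The struct and helper are hypothetical, and uuids are compared as plain strings for simplicity.

```c
#include <string.h>

/* Hypothetical flattened view of the ARRAY lines in mdadm.conf. */
struct conf_array_sketch {
	const char *name;
	const char *uuid;
};

/* Return 1 if 'name' may be used for the array identified by 'uuid',
 * 0 if some other configured array already claims that name. */
int name_usable_sketch(const char *name, const char *uuid,
		       const struct conf_array_sketch *conf, int n)
{
	for (int i = 0; i < n; i++)
		if (strcmp(conf[i].name, name) == 0 &&
		    strcmp(conf[i].uuid, uuid) != 0)
			return 0;	/* a different array owns this name */
	return 1;
}
```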
NeilBrown [Mon, 11 May 2009 05:18:25 +0000 (15:18 +1000)]
Fix tests on ->container and ->member
For container= and member= to be effective in an mdadm.conf line
they must both be present. So when checking for their absence we
need container == NULL || member == NULL.
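Because the two keys only work as a pair, the presence test needs && and its negation needs ||. This sketch uses a hypothetical two-field struct in place of mdadm's ARRAY identity.

```c
#include <stddef.h>

/* Hypothetical subset of an mdadm.conf ARRAY identity. */
struct ident_sketch {
	const char *container;	/* container= value, or NULL */
	const char *member;	/* member= value, or NULL */
};

/* container= and member= only take effect as a pair. */
int has_container_member(const struct ident_sketch *id)
{
	return id->container != NULL && id->member != NULL;
}

/* Negation by De Morgan: the pair is unusable if either key is missing. */
int lacks_container_member(const struct ident_sketch *id)
{
	return id->container == NULL || id->member == NULL;
}
```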
NeilBrown [Mon, 11 May 2009 05:18:20 +0000 (15:18 +1000)]
Make --brief even briefer.
Because --examine --brief and --detail --brief are
often used to create mdadm.conf, and because people don't want to
have to update their mdadm.conf unnecessarily, we don't want to
include information that might change.
And now that level changing is supported, that is almost everything
but UUID.
So move some more fields into the "Only print with --verbose" class.
NeilBrown [Mon, 11 May 2009 05:17:05 +0000 (15:17 +1000)]
config: support "ARRAY <ignore> ..." lines in mdadm.conf
Sometimes we want to ensure particular arrays are never
assembled automatically. This might include an array made of
devices that are shared between hosts.
To support this, allow ARRAY lines in mdadm.conf to use the word
"ignore" rather than a device name. Arrays which match such lines
are never automatically assembled (though they can still be assembled
by explicitly giving identification information on the mdadm command
line).
NeilBrown [Mon, 11 May 2009 05:16:49 +0000 (15:16 +1000)]
assemble: support arrays created with --homehost=any
If an array is created with --homehost=any, then --assemble and
--incremental will treat it as being local to 'this' host, no matter
what the name of this host is.
This is useful for arrays that will be given unique names and be
moved between machines.
NeilBrown [Mon, 11 May 2009 05:16:47 +0000 (15:16 +1000)]
create_dev - allow array names like mdX and /dev/mdX to appear 'numeric'
When choosing the minor number to use with an array, we currently base
the number on the 'name' stored in the metadata if that name is
numeric.
Extend that so that if it looks like a numbered md device name (/dev/md0
or just md0 or even /dev/md/0), then we use the number at the end to
suggest a minor number.
This means that if someone creates an array with "--name md0" or even
"--name /dev/md0" it will continue to do what they expect.
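The name matching described above can be sketched as a prefix strip followed by a strict number parse. The helper name and the prefix list are illustrative assumptions, not mdadm's implementation.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Return the minor number suggested by a name such as "0", "md0",
 * "/dev/md0" or "/dev/md/0", or -1 if the name is not of that form. */
int numeric_name_minor(const char *name)
{
	/* Longest prefixes first, so "/dev/md/0" is not misread. */
	static const char *const prefixes[] = { "/dev/md/", "/dev/md", "md", "" };

	for (size_t i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]); i++) {
		size_t plen = strlen(prefixes[i]);
		const char *digits = name + plen;
		char *end;
		long n;

		if (strncmp(name, prefixes[i], plen) != 0)
			continue;
		if (*digits < '0' || *digits > '9')
			continue;	/* must start with a digit, no sign */
		n = strtol(digits, &end, 10);
		if (*end == '\0' && n <= INT_MAX)
			return (int)n;	/* whole remainder was a number */
	}
	return -1;
}
```

A name like "/dev/md/home" falls through every prefix and returns -1, so only names that are purely numeric after the md prefix influence the minor number.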
Apparently the dereferencing of a type-punned pointer breaks strict
aliasing rules. And we wouldn't want to do that.
So just make a different array of the appropriate type and use memcpy.
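The memcpy idiom looks like this. Reading a byte buffer through a cast such as *(uint32_t *)buf breaks C's strict-aliasing rules (and may also be misaligned), while copying into an object of the right type is well defined; compilers typically reduce the copy to a plain load. The helper name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value (host byte order) from an arbitrary byte buffer
 * without type-punning the pointer. */
uint32_t read_u32(const unsigned char *buf)
{
	uint32_t v;

	memcpy(&v, buf, sizeof(v));
	return v;
}
```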