NeilBrown [Tue, 24 Nov 2009 05:32:01 +0000 (16:32 +1100)]
Various fixes for --kill
- When --kill-superblock is used with --metadata, find every
different superblock if there are several and kill them all.
- When creating a new array, kill off any old metadata. The code
to do this was already present but has become broken over time.
NeilBrown [Thu, 19 Nov 2009 04:54:49 +0000 (15:54 +1100)]
Create: warn when creating a raid1 using default metadata.
As some/most bootloaders don't understand md metadata, it might
be difficult to boot off an array with the default 1.1 metadata.
So if this is used for a RAID1, ask for confirmation.
NeilBrown [Tue, 17 Nov 2009 02:15:34 +0000 (13:15 +1100)]
Don't silently map --re-add to --add
As --add can destroy important data on a disk, and
--re-add is not supposed to, it is wrong to silently
try --add if --re-add fails.
So print a message and abort instead.
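
The rule above can be sketched as follows. This is an illustration in Python rather than mdadm's C, and every name in it (`manage_device`, `ReAddFailed`, the `re_add` and `add` callables) is hypothetical, introduced only for this example:

```python
class ReAddFailed(Exception):
    """Raised when a device cannot be re-added with its metadata intact."""


def manage_device(device, mode, re_add, add):
    """Dispatch a hot-add request without ever downgrading re-add to add.

    `re_add` and `add` are caller-supplied callables (hypothetical stand-ins
    for the real operations); each returns True on success.
    """
    if mode == "re-add":
        if re_add(device):
            return "re-added"
        # Do NOT fall through to add(): that could destroy data on the disk.
        raise ReAddFailed(f"--re-add for {device} failed; not attempting --add")
    if mode == "add":
        if add(device):
            return "added"
        raise RuntimeError(f"--add for {device} failed")
    raise ValueError(f"unknown mode: {mode}")
```

The point of the sketch is that a failed re-add is surfaced to the caller and the destructive path is only taken when it was asked for explicitly.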
NeilBrown [Tue, 17 Nov 2009 02:15:34 +0000 (13:15 +1100)]
Improve error messages when metadata handler does not support request.
->validate_geometry is called to validate overall parameters,
and to validate each individual device.
If it ever fails, it needs to report the reason, as common code
cannot possibly know.
NeilBrown [Tue, 17 Nov 2009 02:15:32 +0000 (13:15 +1100)]
Change default metadata from 0.90 to 1.1
1.1 is more flexible in a number of ways and is safer.
0.90 is still fully supported.
1.0 should possibly be used for RAID1 arrays that you
want to boot off, depending on your boot loader.
NeilBrown [Tue, 17 Nov 2009 02:08:55 +0000 (13:08 +1100)]
Increase default chunk size to 512K
This seems more appropriate for current (and recent) model drives than
64K.
64K is still the default for '--build' as changing that could corrupt
data.
64K is also the default rounding for 'linear' on kernels older than
2.6.16.
NeilBrown [Tue, 17 Nov 2009 01:31:10 +0000 (12:31 +1100)]
Assemble/super0: allow non-in-sync devices to be assembled without complaint.
Other metadata formats already did not worry about whether 'sync' was
missing or not. super0 needs that now, but only for 0.91 metadata
that is undergoing reshape.
NeilBrown [Tue, 17 Nov 2009 01:30:54 +0000 (12:30 +1100)]
Assemble: include ACTIVE but not in-sync devices as non-spares.
Previously such things did not exist: ACTIVE and SYNC were either both
set or both clear. Recent changes with reshape mean that a device
can be ACTIVE but not yet fully in-sync, so such devices need to be handled
and included in the array as active devices.
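
A rough sketch of the classification described above, in Python for brevity. The bit positions are an assumption meant to mirror the kernel's md disk-state flags, and `classify` is a hypothetical helper, not an mdadm function; faulty and removed devices are ignored here:

```python
# Assumed bit positions for the md disk-state flags (not taken from this log).
MD_DISK_ACTIVE = 1
MD_DISK_SYNC = 2


def has_flag(state, bit):
    """Return True if bit number `bit` is set in the integer `state`."""
    return bool(state & (1 << bit))


def classify(state):
    """Classify a device by its ACTIVE and SYNC flags."""
    active = has_flag(state, MD_DISK_ACTIVE)
    in_sync = has_flag(state, MD_DISK_SYNC)
    if active and in_sync:
        return "active"
    if active:
        # The new case: active in the array but still catching up during
        # a reshape. It must be treated as a member, not as a spare.
        return "active-not-in-sync"
    return "spare"
```

Before the reshape changes, only the first and last branches could occur, which is why treating "not in-sync" as "spare" used to be safe.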
NeilBrown [Fri, 6 Nov 2009 06:26:47 +0000 (17:26 +1100)]
Grow: do not allow size changes with other changes.
A change that reduces the size of an array always happens
before any other change. So it can cause data to be lost.
By themselves these changes are reversible. But once another
change has started, the data would be permanently lost.
So recommend data integrity be checked between a size change
and any other change.
NeilBrown [Fri, 6 Nov 2009 04:19:39 +0000 (15:19 +1100)]
Grow: restrict to 2.6.32
2.6.31 has a bug which can lead to unsafe reshaping.
So only allow a reshape with 2.6.32.
When the required fixes get into 2.6.31.y, this can be relaxed
slightly.
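
A version gate of this kind might look like the sketch below. It is written in Python for illustration and assumes that "with 2.6.32" means 2.6.32 or any later kernel; `parse_kernel_version` and `reshape_allowed` are hypothetical names:

```python
import re

# Oldest kernel assumed to be safe for reshape, per the message above.
MIN_RESHAPE_KERNEL = (2, 6, 32)


def parse_kernel_version(release):
    """Turn a release string such as '2.6.32-5-amd64' into (2, 6, 32)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def reshape_allowed(release):
    """Return True if the kernel is new enough for a safe reshape."""
    return parse_kernel_version(release) >= MIN_RESHAPE_KERNEL
```

On a live system the argument could come from `platform.release()`. Relaxing the check for fixed 2.6.31.y kernels would need the fourth version component, which this sketch deliberately drops.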
NeilBrown [Fri, 6 Nov 2009 03:18:49 +0000 (14:18 +1100)]
Grow: get component_size before using it.
We were using ->component_size while it hadn't been set.
This effectively meant that 'blocks' wasn't multiplied by
16 and reshape was even slower than it should have been.
Marco d'Itri [Wed, 28 Oct 2009 23:14:43 +0000 (10:14 +1100)]
vol_id was removed by the udev upstream maintainer in May 2009.
One should use
/sbin/blkid -o udev -p ...
(from util-linux >> 2.16) instead of
vol_id --export ...
Author: Marco d'Itri <md@linux.it>
Bug-Debian: http://bugs.debian.org/541884 Reviewed-by: martin f. krafft <madduck@debian.org> Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Thu, 22 Oct 2009 00:00:56 +0000 (11:00 +1100)]
Free some malloced memory that wasn't being freed.
As mdadm is normally a short-lived program it isn't always necessary
to free memory that was allocated, as the 'exit()' call will
automatically free everything. But it is more obviously correct if
the 'free' is there.
So this patch adds a few calls to 'free'.
NeilBrown [Wed, 21 Oct 2009 23:42:06 +0000 (10:42 +1100)]
Grow: update backup-metadata mtime every time we write it.
Originally the backup-metadata was only written once at the
start of a raid5 reshape that made the array bigger. So we only
set the mtime once.
Now that we can be writing metadata continually during an in-place
reshape, we need to update the mtime more often.
Also, allow the metadata mtime to be slightly in advance of the
array mtime. Normally the difference will be less than a second,
so 10 minutes should be plenty. The mtime check guards against an old
backup file being used to restart an array. The wider window weakens
that guard a little, but starting two reshapes within 10 minutes is
sufficiently unlikely, and the possibility of an accident is already
sufficiently small, that 10 minutes is probably fine.
Thanks to Guy Martin <gmsoft@tuxicoman.be> for discovering and
reporting that .mtime wasn't being updated properly.
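
The tolerance window can be sketched as below, in Python for illustration. Only the 10-minute "in advance" bound comes from the message above; the function name and the optional `max_behind` bound are assumptions, since the message does not say how far behind the array mtime a backup may be:

```python
# The backup-metadata mtime may lead the array mtime by up to 10 minutes.
MAX_AHEAD_SECONDS = 10 * 60


def backup_mtime_acceptable(backup_mtime, array_mtime, max_behind=None):
    """Decide whether a backup file is plausibly current for this array.

    Times are seconds since the epoch. `max_behind` is a hypothetical
    lower bound; None means no lower bound is enforced.
    """
    delta = backup_mtime - array_mtime
    if delta > MAX_AHEAD_SECONDS:
        # Too far in the future: probably belongs to a different reshape.
        return False
    if max_behind is not None and -delta > max_behind:
        return False
    return True
```

A sub-second lead, the normal case, passes easily, while a backup stamped more than 10 minutes after the array's last update is rejected.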
Dan Williams [Wed, 14 Oct 2009 00:37:02 +0000 (17:37 -0700)]
mdmon: preserve socket over chroot
Connect to the monitor in the old namespace and use that connection for
WaitClean requests when stopping the victim mdmon instance. This allows
ping_monitor() to work post chroot().
Cc: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 14 Oct 2009 00:08:33 +0000 (17:08 -0700)]
mdmon: exec(2) when the switchroot argument is not "/"
Try to execute mdmon from the target namespace. When used for initramfs
handovers we need to drop all references to the initramfs filesystem for
that memory to be freed.
Cc: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 14 Oct 2009 00:41:57 +0000 (17:41 -0700)]
mdmon: avoid writes in the startup path for mdmon on root arrays
When killing a previous monitor be careful not to cause writes to the
filesystem until the reads necessary to get the monitor operational have
completed.
The code is already prepared for errors creating the pid and socket
files, so simply defer creation of these files until after the first
call to manage().
Cc: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 14 Oct 2009 00:41:57 +0000 (17:41 -0700)]
Detail: export MD_UUID from mapfile
The load_super() from an mdadm --detail call may race against an mdmon
update. When this happens the load_super sees an inconsistent metadata
block and returns an error. The fallback path to use the map file
contents lacks uuid reporting, so provide __fname_from_uuid for
generically printing a uuid.
Reported-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
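
For illustration, a generic uuid printer of the kind mentioned above could look like this Python sketch. It assumes the uuid is held as four 32-bit words and printed as four colon-separated groups of eight hex digits; `format_md_uuid` and `export_line` are hypothetical names, not the C helper itself, and any byte-order handling is left out:

```python
def format_md_uuid(words):
    """Format four 32-bit words as 'xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx'."""
    if len(words) != 4:
        raise ValueError("an md uuid is expected to have exactly four words")
    return ":".join(f"{w & 0xFFFFFFFF:08x}" for w in words)


def export_line(words):
    """Build the MD_UUID line for export-style output."""
    return f"MD_UUID={format_md_uuid(words)}"
```

With a helper like this, the map-file fallback can report a uuid even when reading the superblock fails.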
Dan Williams [Wed, 14 Oct 2009 00:41:53 +0000 (17:41 -0700)]
imsm: add --update=uuid support
When disks have conflicting container memberships (same container ids
but incompatible member arrays) --update=uuid can be used to move
offenders to a new container id by changing 'orig_family_num'.
Note that this only supports random updates of the uuid as the actual
uuid is synthesized. We also need to communicate the new
'orig_family_num' value to all disks involved in the update. A new
field 'update_private' is added to struct mdinfo to allow this
information to be transmitted.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 14 Oct 2009 00:41:53 +0000 (17:41 -0700)]
ddf: prevent superblock being zeroed on --update
The full fix would be to support updating ddf metadata, but this minimal
fix just prevents the superblock from being zeroed when someone
inadvertently passes an unsupported --update option during assembly.
Reported-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 14 Oct 2009 00:41:53 +0000 (17:41 -0700)]
imsm: fix/support --update
Fix init_super_imsm() to return an empty mpb when info == NULL, and
teach store_super_imsm() to simply write out the passed in mpb.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=523320 Reported-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 14 Oct 2009 00:41:53 +0000 (17:41 -0700)]
imsm: fix spare record writeout race
imsm_activate_spare() in the manager thread may race against
write_super_imsm_spares() in the monitor thread. Give
write_super_imsm_spares() its own private mpb buffer to prevent
confusing the manager.
This change uncovered cases where spares were not being assembled due to
a failed metadata version number check. Spares can freely associate
across metadata version numbers, so reduce the scope of the version check
in the spare assembly case.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
NeilBrown [Mon, 12 Oct 2009 05:55:19 +0000 (16:55 +1100)]
Grow: ignore error from final wait_backup
The last time wait_backup is called, it might see reshape
finish and so return an error indicator.
But this is not an error, and we must go ahead and prepare
the array for full access.
Dan Williams [Wed, 30 Sep 2009 18:45:41 +0000 (11:45 -0700)]
imsm: disambiguate family_num
This is a result of trawling through the Windows implementation to learn
the mechanism of how it disambiguates family_num. It is a continuation
of commit 148acb7b "imsm: fix family number handling" which introduced a
regression when reassembling a container with stale disks and rebuilt
members.
When rebuilding, a new family number is assigned to protect against the
"prodigal array member" problem. It prevents a former family member
from returning to the system and causing a rebuild to go the wrong
direction. However, this invalidates looking at the generation number to
determine the most up-to-date disk when comparing across family numbers.
Instead the assembly logic looks for agreement between a disk's local
family membership compared against a global list of all families in the
system. Whenever a disk's local metadata does not match a family number
on the global list that family number is marked offline.
It is possible that this logic results in multiple incompatible but
valid family numbers existing in a container. In this case mdadm.conf
cannot be consulted because it only records the uuid which is generated
from static fields in the metadata. The metadata lacks the data needed
to disambiguate "local" versus "foreign". The "foreign" array in this
case requires updating to change its container-id information
(orig_family_num), and possibly the member array names.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 30 Sep 2009 18:44:38 +0000 (11:44 -0700)]
imsm: kill close() of component device
None of the other formats close the passed in fd at load, and this
becomes a problem when trying to support --update where we need O_EXCL
protection across the entire operation.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Hans de Goede [Thu, 24 Sep 2009 13:52:06 +0000 (06:52 -0700)]
mdmon: fix freeing unallocated memory
mdmon was creating a supertype struct with malloc, and thus not
necessarily getting zero-d memory.
This was causing it to segfault when called like this from the initrd:
/sbin/mdmon /proc/mdstat /sysroot
The problem was that load_super_imsm would get called on the non-zero'd
super struct, which in turn calls free_super_imsm, which checks st->sb,
which should be zero but isn't and then starts freeing bogus memory.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Tue, 15 Sep 2009 18:35:28 +0000 (11:35 -0700)]
Examine: don't count containers as spares
mdadm -Ebs will include containers in the scanned device list.
Examine() falsely thinks they are spares when MD_DISK_SYNC is not set.
This could be fixed by forcing all formats to set this flag for
container devices, but this flag is currently used by imsm to identify
free-floating spares.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Tue, 15 Sep 2009 18:34:20 +0000 (11:34 -0700)]
Detail: fix for an imsm container with a spare
Spares for imsm arrays do not have any info about the container in their
metadata records. If Detail() inadvertently picks such a device for
->get_array_info() it will end up with less than useful info for the
container. So, continue to read from the disks until a non-spare device
is found.
This bug was found by timeouts waiting for udev to create the
user-friendly container name. To detect future UUID reporting problems
add a debug print to the timeout case in wait_for().
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Tue, 15 Sep 2009 18:34:20 +0000 (11:34 -0700)]
imsm: fix spare promotion
1/ Fix an off-by-one error when detecting whether the device allocation
loop succeeded or not
2/ Update ->num_raid_devs before copying to avoid a segmentation fault
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
NeilBrown [Fri, 7 Aug 2009 04:17:40 +0000 (14:17 +1000)]
Examine/brief: put member arrays after container arrays.
A previous patch moved the '--examine --brief' reporting of
member arrays to before their containers. This breaks "mdadm -As"
assembly. So put them back, but still fix the problem addressed by
the previous patch.
Reported-by: Artur Wojcik <artur.wojcik@intel.com> Reported-by: Jacek Danecki <jacek.danecki@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Sat, 1 Aug 2009 00:11:42 +0000 (17:11 -0700)]
imsm: fix spare-uuid assignment
imsm spares do not have container membership by default so we associate
them with the first container found in the configuration file. Some
ARRAY lines do not specify the metadata type so we cannot assume that
_cst will always be valid.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Sat, 1 Aug 2009 00:11:41 +0000 (17:11 -0700)]
platform: relax rom scanning alignment for ahci platforms
The PCI-3.0 Firmware specification allows for option-roms to have
512-byte alignment rather than 2048-byte. As there does not appear to
be a reliable method to detect a PCI-3.0 compliant BIOS from userspace
we allow the imsm platform detection code to presume that a system
modern enough to have an Intel AHCI controller does not have
dangerous/legacy ISA regions in the option-ROM memory space.
An environment variable to disable this behaviour, IMSM_SAFE_OROM_SCAN,
is added in case this presumption is ever proven wrong.
Reported-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
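
The effect of the relaxed alignment can be sketched as below, in Python for illustration. The scan works on an in-memory byte buffer rather than real ROM space, and treating IMSM_SAFE_OROM_SCAN as enabled whenever it is set to any value is an assumption; `scan_alignment` and `find_option_roms` are hypothetical names:

```python
import os

LEGACY_ALIGN = 2048   # traditional option-ROM alignment
RELAXED_ALIGN = 512   # alignment permitted by the PCI-3.0 firmware spec
OROM_SIGNATURE = b"\x55\xaa"  # first two bytes of an option-ROM image


def scan_alignment(has_intel_ahci, environ=None):
    """Pick the scan step: relaxed only on AHCI systems without the override."""
    env = os.environ if environ is None else environ
    if has_intel_ahci and "IMSM_SAFE_OROM_SCAN" not in env:
        return RELAXED_ALIGN
    return LEGACY_ALIGN


def find_option_roms(region, align):
    """Return offsets in `region` (bytes) where a ROM signature starts."""
    return [off for off in range(0, len(region) - 1, align)
            if region[off:off + 2] == OROM_SIGNATURE]
```

A ROM placed at a 512-byte boundary that is not also a 2048-byte boundary is only found with the relaxed step, which is the case the change is meant to cover.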