NeilBrown [Tue, 21 May 2013 06:28:23 +0000 (16:28 +1000)]
Grow: use new_data_offset instead of backups for raid4/5/6 reshape.
If we can modify the data_offset, we can avoid doing any backups at all.
If we can't, fall back on the old approach - but not if --data-offset
was requested.
NeilBrown [Wed, 22 May 2013 02:17:32 +0000 (12:17 +1000)]
Grow: introduce min_offset_change to struct reshape.
raid10 currently uses the 'backup_blocks' field to store something
else: a minimum offset change.
This is bad practice, and we will shortly need both fields for RAID5/6,
so make a separate field.
NeilBrown [Wed, 15 May 2013 01:40:27 +0000 (11:40 +1000)]
Create: over-ride "start_ro" setting when creating an array.
If module parameter start_ro is set, arrays start readonly.
This is OK when assembling, but is very surprising when creating
an array as the resync won't start.
So over-ride the setting (unless --read-only was given) and make
arrays RW when created.
NeilBrown [Wed, 15 May 2013 01:10:54 +0000 (11:10 +1000)]
Suppress error messages from systemctl.
We call systemctl to see if systemd will run mdmon for us.
If it cannot, we run mdmon directly, so we aren't interested
in the error message.
So redirect stderr to /dev/null.
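
A minimal sketch of the idea, not the code in mdadm: run a helper
command with stderr pointed at /dev/null and look only at its exit
status. The name run_quiet() is hypothetical.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with stderr discarded.
 * Returns the command's exit status, or -1 if it could not be run
 * or did not exit normally. Hypothetical helper for illustration. */
static int run_quiet(char *const argv[])
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: point stderr at /dev/null, then exec the command. */
		int fd = open("/dev/null", O_WRONLY);

		if (fd >= 0) {
			dup2(fd, STDERR_FILENO);
			if (fd != STDERR_FILENO)
				close(fd);
		}
		execvp(argv[0], argv);
		_exit(127); /* exec itself failed */
	}
	/* Parent: wait for the child and report how it exited. */
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

A caller would treat any non-zero result as "systemd will not do it"
and go on to start mdmon directly.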
NeilBrown [Wed, 15 May 2013 01:03:25 +0000 (11:03 +1000)]
create_mddev: add support for /dev/md_XXX non-numeric names.
With the 'devnm' infrastructure fixed, it is quite easy to support
names like "md_home" for md arrays.
This currently defaults to "off" and can be enabled in mdadm.conf with
CREATE names=yes
This is in case other tools get confused by the new names.
NeilBrown [Mon, 13 May 2013 02:56:38 +0000 (12:56 +1000)]
misc_scan: don't trust the mapping file too much for device names.
misc_scan assumes that any device name found in the 'mapping' file
is usable. Usually it is but sometimes not, such as for inactive
devices.
Depending on it isn't really robust, so when a name is found, check that
it exists. If not, fall back on map_dev.
This will allow "--detail --scan" to notice inactive devices.
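
The check described above could look roughly like this. It is a sketch
under assumptions: pick_dev_name() is a hypothetical helper, 'mapped'
stands for the name taken from the mapping file, and 'fallback' stands
for whatever map_dev would return.

```c
#include <stddef.h>
#include <sys/stat.h>

/* Return 'mapped' only if it names something that exists right now;
 * otherwise return 'fallback'. Both parameter names are illustrative. */
static const char *pick_dev_name(const char *mapped, const char *fallback)
{
	struct stat st;

	if (mapped != NULL && stat(mapped, &st) == 0)
		return mapped;
	return fallback;
}
```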
NeilBrown [Mon, 13 May 2013 02:07:40 +0000 (12:07 +1000)]
Incremental: tell udisks to unmount when array looks to have disappeared.
If a device is removed which appears to be busy in an md array, then
it is very likely the array cannot be used.
We currently try to stop it, but that could fail if udisks had
automatically mounted it.
So tell udisks to unmount it, but ignore any error.
NeilBrown [Wed, 1 May 2013 00:23:40 +0000 (10:23 +1000)]
Wait: also wait if an action is about to start.
If a sync/recover action is about to start but hasn't actually begun
yet, /proc/mdstat won't show it, but md/sync_action will (it checks
MD_RECOVERY_NEEDED).
So when /proc/mdstat seems to say nothing is happening, double check
with md/sync_action.
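
A rough sketch of the double check, assuming the attribute reads
"idle" when no action is running or pending. sync_action_busy() is a
hypothetical name, and the caller is assumed to pass the path of the
array's md/sync_action file in sysfs.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if the attribute at 'path' reports anything other than
 * "idle", 0 if it reports "idle", and -1 if it cannot be read. */
static int sync_action_busy(const char *path)
{
	char buf[32];
	FILE *f = fopen(path, "r");

	if (f == NULL)
		return -1;
	if (fgets(buf, sizeof(buf), f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0'; /* strip the trailing newline */
	return strcmp(buf, "idle") != 0;
}
```

The wait loop would only conclude "nothing is happening" when
/proc/mdstat shows no activity and this returns 0.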
Linux 3.10 will allow more "--add" to be handled as "--re-add".
To be sure the tests work correctly we sometimes need to zero
the device to ensure it really is an --add that happens.
mwilck@arcor.de [Fri, 25 Oct 2013 10:07:37 +0000 (12:07 +0200)]
monitor: read_and_act: handle race conditions for resync_start
When arrays are stopped, sysfs attributes may be deleted by
the kernel, and attempts to read these attributes will fail.
Setting resync_start to 0 is wrong in this case, because it
may make is_resync_complete() erroneously return
FALSE for a clean array. It is better to leave resync_start
untouched (the previously read value for this array).
Otherwise set_array_state() will pass the wrong state information
to the metadata handler, which will write it to disk, and at
the next restart an unnecessary recovery is started for the
array.
It is also possible that resync_start is actually *not* deleted
yet when read_and_act is running, and an apparently valid
value of "0" is read from it, with the same effect as described
above. This happens if the kernel has already called md_clean()
on the array (setting recovery_cp = 0), but the delayed removal
of "resync_start" hasn't happened yet. Therefore, in "clear"
state, "resync_start" shouldn't be read at all.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
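
The "leave the previous value untouched" rule can be sketched as
follows. This is not the monitor code: refresh_resync_start() and
RESYNC_DONE are hypothetical names, and treating the text "none" as
"nothing left to resync" is an assumption about the attribute format.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define RESYNC_DONE (~0ULL) /* hypothetical marker: nothing left to resync */

/* Update *resync_start from the attribute at 'path' only when the read
 * succeeds. On any failure the previously read value is kept.
 * Returns 0 if the value was refreshed, -1 if it was left alone. */
static int refresh_resync_start(const char *path,
				unsigned long long *resync_start)
{
	char buf[32];
	char *end;
	unsigned long long val;
	FILE *f = fopen(path, "r");

	if (f == NULL)
		return -1; /* attribute already deleted: keep old value */
	if (fgets(buf, sizeof(buf), f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);
	if (strncmp(buf, "none", 4) == 0) {
		*resync_start = RESYNC_DONE;
		return 0;
	}
	val = strtoull(buf, &end, 10);
	if (end == buf)
		return -1; /* not a number: keep old value */
	*resync_start = val;
	return 0;
}
```

To cover the second race, a caller would skip this call altogether
while the array state is "clear".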
mwilck@arcor.de [Fri, 25 Oct 2013 10:07:36 +0000 (12:07 +0200)]
monitor: don't call pselect() on deleted sysfs files
It makes no sense to listen for events on files that have
been deleted. This happens when arrays are stopped and the
kernel removes the associated sysfs structures.
Calling pselect() on the deleted attributes may cause a storm
of wake events.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
mwilck@arcor.de [Fri, 25 Oct 2013 10:07:35 +0000 (12:07 +0200)]
DDF: add code to debug state changes
The 10ddf-create test case fails sporadically because wrong meta
data is written, making the array appear inconsistent when it's
restarted. Added code to aid debugging this.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
mwilck@arcor.de [Fri, 25 Oct 2013 10:07:34 +0000 (12:07 +0200)]
DDF: brief_detail_super_ddf: print correct UUID for subarrays
Commit c1ea5a98 caused brief_detail_super_ddf() to be called
for subarrays. But the UUID printed was always the one of the
container. This is wrong and actually worse than printing no UUID
at all, and causes the DDF test case (10ddf-create) to fail.
This patch adds code to determine the MD UUID of a subarray correctly.
The hard part is to figure out for which subarray the function is
called. Moved that to an extra function.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Manage_runstop: call flush_mdmon if O_EXCL fails on stopping mdmon array.
When stopping an mdmon array, a reshape might be being aborted,
which inhibits O_EXCL. So if that is possible, call flush_mdmon
to make sure mdmon isn't still busy.
imsm: monitor: do not finish migration if there are no failed disks
The transition from "degraded" to "recovery" made in OROM is slightly different
from the same transition in mdadm. A missing disk is not removed from the list of
raid devices, but just from the map. Therefore mdadm should not end migration
based on the existence of a list of missing disks, but should rely on the count of
failed disks.
Add updating component_size to manager thread of mdmon
Mdmon does not currently update component_size. This is wrong because when
the size is expanded, component_size is changed by mdadm but mdmon does not
reread the new value and keeps using the old one. As a result the metadata
is incorrect during size expansion: it contains no information that a
resync is in progress (and there is no checkpoint either). The metadata looks
as if the resync has already finished when it has not.
Component_size will be set to match information in sysfs. This value
will be updated by manager thread in manage_member() function.
Now mdmon uses the correct, current value of component_size and the
correct metadata (containing information about resync and checkpoint)
is written.
NeilBrown [Mon, 4 Mar 2013 23:36:21 +0000 (10:36 +1100)]
Create: default to bitmap=internal for large arrays.
Here, "large" means components are 100G or more. It is
usually beneficial to have write-intent bitmaps on such arrays.
They can be suppressed with --bitmap=none
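
As an illustration only, the default could be decided like this. The
KiB unit, the reading of "100G" as 100 GiB, and both names are
assumptions, and the sketch ignores the RAID level.

```c
#include <stdbool.h>

/* Threshold for a "large" component, in KiB (assumed unit). */
#define LARGE_COMPONENT_KIB (100ULL * 1024 * 1024) /* 100G */

/* Decide whether an internal bitmap should be added by default.
 * 'bitmap_none' is true when the user passed --bitmap=none. */
static bool want_default_bitmap(unsigned long long component_kib,
				bool bitmap_none)
{
	if (bitmap_none)
		return false;
	return component_kib >= LARGE_COMPONENT_KIB;
}
```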
NeilBrown [Mon, 4 Mar 2013 22:46:34 +0000 (09:46 +1100)]
Enhance incremental removal.
When asked to incrementally-remove a device, try marking the array
read-auto first. That will delay recording the failure in the
metadata until it is really relevant.
This way, if the devices are just unplugged when the array is not
really in use, the metadata will remain clean.
If marking the device as faulty fails because it is EBUSY, that
implies that the array would be failed without the device. As the
device has (presumably) gone, that means the array is dead. So try
to stop it. If that fails because it is in use, send a uevent to
report that it is gone. Hopefully whoever mounted it will now let go.
This means that if you plug in some devices and they are
auto-assembled, then unplugging them will auto-deassemble relatively
cleanly.
To be complete, we really need the kernel to disassemble the array
after the last close somehow. Maybe if a REMOVE has failed and a STOP
has failed and nothing else much has happened, it could safely stop
the array on last close.
mwilck@arcor.de [Fri, 1 Mar 2013 22:28:33 +0000 (23:28 +0100)]
Detail.c: call load_container for container subarrays
Without calling load_container at this point, the
info structure may be missing some important information.
In particular, information about secondary DDF RAID levels
may be wrong if information is only read from a single disk.
If this fails, fall back to the previous code.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
mwilck@arcor.de [Fri, 1 Mar 2013 22:28:32 +0000 (23:28 +0100)]
DDF: compare_super_ddf: merge local info of other superblock
If a match is found in compare_super_ddf, check the other SB
for local DDF information (VD config records, physical disk data)
which is not available in the current superblock, and add it
if needed.
This is important for mdmon - when disks are added to an
auto-read-only array, they must be present in the DDF structure
in order to guarantee consistent writeback of metadata to all
disks.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
mwilck@arcor.de [Fri, 1 Mar 2013 22:28:30 +0000 (23:28 +0100)]
DDF: __write_init_super_ddf: use correct VD conf
When writing back the DDF structure, make sure that on each disk
we write the configs that include this disk even if a secondary
RAID level is present. Otherwise the secondary RAID will not be
read correctly any more when we open the device next time.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
mwilck@arcor.de [Fri, 1 Mar 2013 22:28:23 +0000 (23:28 +0100)]
DDF: use existing locations for primary and secondary DDF structure
Some RAID BIOSes apparently use hard-coded LBA offsets (presumably
from the end of the disk) for the primary and secondary DDF
structure, ignoring the values given in the DDF anchor. This is
broken BIOS behavior, but it will cause any changes made by MD
(e.g. setting the init_state flag after a full initialization)
to be "forgotten" after the next reboot.
This patch fixes this by using the existing LBA locations if
available. Verified that this fixes MD+LSI Mega Software RAID
BIOS.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
mwilck@arcor.de [Fri, 1 Mar 2013 22:28:22 +0000 (23:28 +0100)]
DDF: cleanly save the secondary DDF structure
So far, mdadm only saved the header of the secondary structure.
With this patch, the full secondary DDF structure is saved
consistently, too. Some vendor DDF implementations need it.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Thu, 21 Feb 2013 06:02:21 +0000 (17:02 +1100)]
Grow: fix problem with reshaping RAID4 to RAID0.
As 'layout' doesn't map neatly from RAID4 to RAID5, we need to
set it correctly for RAID4.
Also, when no reshape is needed we should set re->level to the final
desired level.
Thomas Bächler [Sat, 9 Feb 2013 20:49:47 +0000 (21:49 +0100)]
udev: Fix order of execution of the md rules
Right now, the rules that run blkid on raid arrays are executed after
the assembly rules. This means incremental assembly will always fail
when raid arrays are again physical components of raid arrays.
Instead of simply reversing the order, split the rules up into two files,
one dealing with array properties and one dealing with assembly.
NeilBrown [Thu, 7 Feb 2013 00:51:21 +0000 (11:51 +1100)]
make --update=homehost work again
Commit 1e2b276535cea41c348292a019bdda8a58cb1679 (Report error in --update
string is not recognised) broke homehost updating functionality because it
depended on each string comparison being done even after we already found
a match. Make it work again by restructuring code.
Reported-by: (and original version by) Justin Maggard <jmaggard10@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 5 Feb 2013 04:34:17 +0000 (15:34 +1100)]
Avoid using BLKFLSBUF.
Now that we use O_DIRECT for all device IO, BLKFLSBUF is not needed to
ensure we get current data, and it can impose a cost if any flush-out
is needed. So remove it.
To be safe, add O_DIRECT to one place where it isn't currently used:
when reading a bitmap.
NeilBrown [Tue, 5 Feb 2013 04:32:49 +0000 (15:32 +1100)]
Detail: print correct size for large external-metadata arrays.
If externally managed metadata is in use, array.major_version will
be zero, so the test here to consider using get_component_size()
is wrong. So if sra is present, use the major_version from there.
Jes Sorensen [Fri, 1 Feb 2013 15:15:18 +0000 (16:15 +0100)]
Add support for launching mdmon via systemctl instead of fork/exec
If launching mdmon via systemctl fails, we fall back to the old method
of fork/exec. This allows for having mdmon launched via systemctl
which avoids problems with it getting killed by systemd due to it
ending up in the parent's cgroup (udev).
NeilBrown [Sun, 6 Jan 2013 23:38:46 +0000 (10:38 +1100)]
dev_open - don't bother trying map_dev
map_dev can be slow, and doesn't really provide a better result
than just creating a temporary device.
So discard it and use mknod/open/unlink to open a major:minor device.
NeilBrown [Sun, 6 Jan 2013 23:34:43 +0000 (10:34 +1100)]
platform-intel - cache 'intel_devices' for a few seconds.
find_intel_devices() can take a little while to run as it scans
some directory tree, and the result isn't likely to change
often.
So cache the value and only discard it after 10 seconds.
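
A sketch of such a time-limited cache, under assumptions: the
expensive_scan() stand-in, the scan_calls counter and the 'now'
parameter (passed in so the behaviour can be checked without sleeping)
are all hypothetical. Only the 10 second lifetime comes from the
commit message.

```c
#include <time.h>

#define CACHE_LIFETIME 10 /* seconds */

static int scan_calls; /* counts real scans, for illustration */

/* Hypothetical stand-in for a slow scan such as find_intel_devices(). */
static int expensive_scan(void)
{
	return ++scan_calls;
}

/* Return the cached result, rescanning only when there is no cached
 * value yet or the cached one is older than CACHE_LIFETIME seconds. */
static int cached_scan(time_t now)
{
	static int cached;
	static time_t stamp;
	static int valid;

	if (!valid || now - stamp > CACHE_LIFETIME) {
		cached = expensive_scan();
		stamp = now;
		valid = 1;
	}
	return cached;
}
```

Real callers would pass time(NULL) as 'now'.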
NeilBrown [Sun, 6 Jan 2013 23:17:04 +0000 (10:17 +1100)]
conditionally remove map_dev from find_free_devnum
map_dev can be slow so it is best to not call it when
not necessary.
The final test in "find_free_devnum" is not relevant when
udev is being used, so remove the test in that case.
NeilBrown [Wed, 5 Dec 2012 00:06:55 +0000 (11:06 +1100)]
Assemble: Don't auto-assemble arrays which conflict with mdadm.conf
When auto-assembling we might find an array which appears in
mdadm.conf.
This can happen if the array (based on UUID) doesn't match what is
in mdadm.conf.
For consistency we should avoid auto-assembling such an array just as
we avoid regular-assembling of the array.
Reported-by: Ross Boylan <ross@biostat.ucsf.edu>
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 27 Nov 2012 23:12:09 +0000 (10:12 +1100)]
Fix "--remove faulty" and similar commands.
A recent change to improve error messages for subdev management broke
all use cases where device names like %d:%d were used.
Re-arrange the code again so we use dev_open first - which understands
those names - and then only try 'stat' if that failed.
The important thing is to base the 'Cannot find' message on the result
of 'stat', not on the result of 'open'.
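
The ordering can be sketched like this. It is an illustration, not the
Manage code: classify_dev() and the enum are hypothetical, and the
sketch only classifies the name rather than opening the device.

```c
#include <stdio.h>
#include <sys/stat.h>

enum lookup { LOOKUP_DEVNUM, LOOKUP_PATH, LOOKUP_MISSING };

/* Classify 'name' as a "major:minor" pair, an existing path, or
 * something that cannot be found. *major and *minor are only
 * meaningful when LOOKUP_DEVNUM is returned. */
static enum lookup classify_dev(const char *name, int *major, int *minor)
{
	struct stat st;
	char trailing;

	/* Try the numeric form first; the extra %c rejects "8:16junk". */
	if (sscanf(name, "%d:%d%c", major, minor, &trailing) == 2)
		return LOOKUP_DEVNUM;
	/* Only now consult stat, so "Cannot find" reflects its result. */
	if (stat(name, &st) == 0)
		return LOOKUP_PATH;
	return LOOKUP_MISSING;
}
```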
It fixes the following uninitialized variables compilation-time error:
WARN - Grow.c: In function ‘reshape_array’:
WARN - Grow.c:2413:21: error: ‘min_space_after’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
WARN - Grow.c:2376:39: note: ‘min_space_after’ was declared here
WARN - Grow.c:2414:22: error: ‘min_space_before’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
WARN - Grow.c:2376:21: note: ‘min_space_before’ was declared here
WARN - cc1: all warnings being treated as errors
WARN - make: *** [Grow.o] Error 1
It occurs during compilation of mdadm on Fedora 17.
Marcin Tomczak [Fri, 9 Nov 2012 14:46:36 +0000 (15:46 +0100)]
imsm: Forbid spanning between multiple controllers.
Attaching disks to multiple controllers of the same type has been
allowed so far. Now spanning between multiple controllers is disallowed
entirely by IMSM metadata.