NeilBrown [Mon, 24 Jun 2013 06:59:37 +0000 (16:59 +1000)]
Make: CXFLAGS should be conditionally assigned.
As the Makefile encourages users to set CXFLAGS for extra flags,
we should only conditionally set it.
That way it can be over-ridden in the environment as well as on
the command line.
mwilck@arcor.de [Thu, 20 Jun 2013 20:21:05 +0000 (22:21 +0200)]
Detail: deterministic ordering in --brief --verbose
Have mdadm --Detail --brief --verbose print the list of devices in
alphabetical order.
This is useful for debugging purposes. E.g. the test script
10ddf-create compares the output of two mdadm -Dbv calls which
may be different if the order is not deterministic.
(I confess: I use a modified "test" script that always runs
"mdadm --verbose" rather than "mdadm --quiet", otherwise this
wouldn't happen in 10ddf-create).
Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 24 Jun 2013 04:08:41 +0000 (14:08 +1000)]
Grow: remove excess drives when converting to RAID0.
When converting to RAID0, all spares and non-data drives
need to be removed first.
It is possible that the first HOT_REMOVE_DISK will fail because the
personality hasn't let go of it yet, so retry a few times.
NeilBrown [Mon, 24 Jun 2013 03:04:38 +0000 (13:04 +1000)]
Grow: fix two problems with new_data_offset
1/ ignore failed devices - obviously
2/ We need to tell the kernel which direction the reshape should
progress even if we didn't choose the particular data_offset
to use.
NeilBrown [Mon, 24 Jun 2013 03:02:35 +0000 (13:02 +1000)]
Grow: Try hard to set new_offset.
Setting new_offset can fail if the v1.x "data_size" is too small.
So if that happens, try increasing it first by writing "0".
That can fail on spare devices due to a kernel bug, so if it doesn't
try writing the correct number of sectors.
NeilBrown [Wed, 19 Jun 2013 01:09:33 +0000 (11:09 +1000)]
Assemble: when forcing a single-degraded RAID6 array, trigger a 'repair'.
When an active/degraded RAID6 array is force-started we clear the
'active' flag, but it is still possible that some parity is
no in sync. This is because there are two parity block.
It would be nice to be able to tell the kernel "P is OK, Q maybe not".
But that is not possible.
So when we force-assemble such an array, trigger a 'repair' to fix up
any errant Q blocks.
This is not ideal as a restart during the repair will not be continued
after the restart, but it is the best we can do without kernel help.
NeilBrown [Wed, 19 Jun 2013 00:33:47 +0000 (10:33 +1000)]
sysfs_read: return devices in same order as in filesystem.
When we read devices from sysfs (../md/dev-*), store them in the same
order that they appear. That makes more sense when exposed to a
human (as the next patch will).
Bernd Schubert [Tue, 18 Jun 2013 09:09:26 +0000 (11:09 +0200)]
raid6check: Fix memory leaks detected by valgrind
==2389947== 24 bytes in 1 blocks are definitely lost in loss record 1 of 10
==2389947== at 0x4C2B3F8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2389947== by 0x408067: xmalloc (xmalloc.c:36)
==2389947== by 0x401B19: check_stripes (raid6check.c:151)
==2389947== by 0x4030C6: main (raid6check.c:521)
==2389947==
==2389947== 24 bytes in 1 blocks are definitely lost in loss record 2 of 10
==2389947== at 0x4C2B3F8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2389947== by 0x408067: xmalloc (xmalloc.c:36)
==2389947== by 0x401B67: check_stripes (raid6check.c:155)
==2389947== by 0x4030C6: main (raid6check.c:521)
==2389947==
Bernd Schubert [Tue, 18 Jun 2013 09:09:16 +0000 (11:09 +0200)]
raid6check: Fix build of raid6check
After recent git pull 'make raid6check' did not work anymore, as
sysfs_read() was called with a wrong argument and as check_env()
was used by use_udev(), but not defined.
Replace sysfs_read(..., -1, ...) by sysfs_read(..., NULL, ...)
NeilBrown [Mon, 17 Jun 2013 06:55:31 +0000 (16:55 +1000)]
Assemble/Incr: Don't include spares with too-high event count.
Some failure scenarios can leave a spare with a higher event count
than an in-sync device. Assembling an array like this will confuse
the kernel.
So detect spares with event counts higher than the best non-spare
event count and exclude them from the array.
Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 27 May 2013 05:09:38 +0000 (15:09 +1000)]
Grow: allow for different sized devices when updating data_offset.
It is possible that the devices in an array have different sizes, and
different data_offsets. So the 'before_space' and 'after_space' may
be different from drive to drive.
Any decisions about how much to change the data_offset must work on
all devices, so must be based on the minimum available space on
any devices.
So find this minimum first, then do the calculation.
NeilBrown [Thu, 23 May 2013 04:41:29 +0000 (14:41 +1000)]
Assemble: --update=metadata converts v0.90 to v1.0
This allows the smooth conversion of legacy 0.90 arrays
to 1.0 metadata.
Old metadata is likely to remain but will be ignored.
It can be removed with
mdadm --zero-superblock --metadata=0.90 /dev/whatever
NeilBrown [Tue, 21 May 2013 06:28:23 +0000 (16:28 +1000)]
Grow: use new_data_offset instead of backups for raid4/5/6 reshape.
If we can modify the data_offset, we can avoid doing any backups at all.
If we can't fall back on old approach - but not if --data-offset
was requested.
NeilBrown [Wed, 22 May 2013 02:17:32 +0000 (12:17 +1000)]
Grow: introduce min_offset_change to struct reshape.
raid10 currently uses the 'backup_blocks' field to store something
else: a minimum offset change.
This is bad practice, we will shortly need to have both for RAID5/6,
so make a separate field.
NeilBrown [Wed, 15 May 2013 01:40:27 +0000 (11:40 +1000)]
Create: over-ride "start_ro" setting when creating an array.
If module parameter start_ro is set, arrays start readonly.
This is OK when assembling, but is very surprising when creating
an array as the resync won't start.
So over-ride the setting (unless --read-only was given) make
arrays RW when created.
NeilBrown [Wed, 15 May 2013 01:10:54 +0000 (11:10 +1000)]
Suppress error messages from systemctl.
We call systemctl to see if systemd will run mdmon for us.
If it cannot, we run mdmon directly, so we aren't interested
in the error message.
So redirect stderr to /dev/null.
NeilBrown [Wed, 15 May 2013 01:03:25 +0000 (11:03 +1000)]
create_mddev: add support for /dev/md_XXX non-numeric names.
With the 'devnm' infrastructure fixed, it is quite easy to support
names like "md_home" for md arrays.
The currently defaults to "off" and can be enabled in mdadm.conf with
CREATE names=yes
This is incase other tools get confused by the new names.
NeilBrown [Mon, 13 May 2013 02:56:38 +0000 (12:56 +1000)]
misc_scan: don't trust the mapping file too much for device names.
misc_scan assumes that any device name found in the 'mapping' file
is usable. Usually it is but sometimes not, such as for inactive
devices.
Depending on it isn't really robust, when a name is found, check that
it exists. If not, fall back on map_dev.
This will allow "--detail --scan" to notice inactive devices.
NeilBrown [Mon, 13 May 2013 02:07:40 +0000 (12:07 +1000)]
Incrmental: tell udevs to unmount when array looks to have disappeared.
If a device is removed which appears to be busy in an md array, then
it is very like the array cannot be used.
We currently try to stop it, but that could fail if udisks had
automatically mounted it.
So tell udisks to unmount it, but ignore any error.
NeilBrown [Wed, 1 May 2013 00:23:40 +0000 (10:23 +1000)]
Wait: also wait if an action is about to start.
If a sync/recover action is about to start but hasn't actually begun
yet, /proc/mdstat won't show it, but md/sync_action will (it checks
MD_RECOVERY_NEEDED).
So when /proc/mdstat seems to say nothing is happening, double check
with md/sync_action.
Linux 3.10 will allow more "--add" to be handled as "--re-add".
To be sure the tests work correctly we sometimes need to zero
the device to ensure it really is an --add that happens.
mwilck@arcor.de [Fri, 25 Oct 2013 10:07:37 +0000 (12:07 +0200)]
monitor: read_and_act: handle race conditions for resync_start
When arrays are stopped, sysfs attributes may be deleted by
the kernel, and attempts to read these attributes will fail.
Setting resync_start to 0 is wrong in this case, because it
may make is_resync_complete() erroneously return
FALSE for a clean array. It is better to leave resync_start
untouched (the previously read value for this array).
Otherwise set_array_state() will pass thewrong state information
to the metadata handler, which will write it to disk, and at
the next restart an unnecessary recovery is started for the
array.
It is also possible that resync_start is actually *not* deleted
yet when read_and_act is running, and an apparently valid
value of "0" is read from it, with the same effect as described
above. This happens if the kernel has already called md_clean()
on the array (setting recovery_cp = 0), but the delayed removal
of "resync_start" hasn't happened yet. Therefore, in "clear"
state, "resync_start" shouldn't be read at all.
Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>
mwilck@arcor.de [Fri, 25 Oct 2013 10:07:36 +0000 (12:07 +0200)]
monitor: don't call pselect() on deleted sysfs files
It makes no sense to listen for events on files that have
been deleted. This happens when arrays are stopped and the
kernel removes the associated sysfs structures.
Calling pselect() on the deleted attributes may cause a storm
of wake events.
Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>
mwilck@arcor.de [Fri, 25 Oct 2013 10:07:35 +0000 (12:07 +0200)]
DDF: add code to debug state changes
The 10ddf-create test case fails sporadically because wrong meta
data is written, making the array appear inconsistent when it's
restarted. Added code to aid debugging this.
Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>
mwilck@arcor.de [Fri, 25 Oct 2013 10:07:34 +0000 (12:07 +0200)]
DDF: brief_detail_super_ddf: print correct UUID for subarrays
Commit c1ea5a98 caused brief_detail_super_ddf() to be called
for subarrays. But the UUID printed was always the one of the
container. This is wrong and actually worse than printing no UUID
at all, and causes the DDF test case (10ddf-create) to fail.
This patch adds code to determine the MD UUID of a subarray correctly.
The hard part is to figure out for which subarray the function is
called. Moved that to an extra function.
Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>
Manage_runstop: call flush_mdmon if O_EXCL fails on stopping mdmon array.
When stopping an mdmon array, at reshape might be being aborted
which inhibets O_EXCL. So if that is possible, call flush_mdmon
to make sure mdmon isn't still busy.
imsm: monitor: do not finish migration if there are no failed disks
Transition from "degraded" to "recovery" made in OROM is slightly different
than the same transision in mdadm. Missing disk is not removed from list of
raid devices, but just from map. Therefore mdadm should not end migration
basing on existence of list of missing disks but should rely on count of
failed disks.