Manage: check alignment when stopping an array undergoing reshape.
To be able to revert a reshape of a raid4/5/6 array which is changing
the number of devices, the reshape must have been stopped on a multiple
of both the old and new stripe sizes.
The kernel only enforces the new stripe size multiple.
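The two alignment conditions can be sketched as follows. This is only an illustration: it assumes positions and stripe sizes are expressed in the same unit (sectors) and that the reshape advances upward, and the function names are hypothetical rather than taken from mdadm.

```python
from math import gcd


def common_stripe_multiple(old_stripe, new_stripe):
    """Smallest size that is a multiple of both stripe sizes (their LCM)."""
    return old_stripe * new_stripe // gcd(old_stripe, new_stripe)


def is_revertible_stop(position, old_stripe, new_stripe):
    """True if position is a multiple of both the old and new stripe sizes."""
    return position % old_stripe == 0 and position % new_stripe == 0


def next_revertible_stop(position, old_stripe, new_stripe):
    """Round position up to the next point aligned to both stripe sizes."""
    step = common_stripe_multiple(old_stripe, new_stripe)
    return -(-position // step) * step  # ceiling division, then scale back
```

For example, with an old stripe of 3072 sectors and a new stripe of 4096 sectors, position 4096 satisfies only the new-stripe condition, so it would not be a safe place to stop.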
So we enforce the old-stripe-size multiple by careful use of
"sync_max" and monitoring "reshape_position".
NeilBrown [Thu, 27 Jun 2013 00:12:31 +0000 (10:12 +1000)]
Grow: report better message when --grow --chunk cannot work.
When changing the chunksize of an array, the new chunksize must
divide the device size.
If it doesn't, we report a very brief message.
Make this message a bit longer and suggest a way forward by reducing
the size of the array.
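A minimal sketch of the divisibility check and of the size that could be suggested, assuming both values are in KiB. The function name and return shape are hypothetical and do not reflect the actual mdadm code.

```python
def check_new_chunk(device_size_kib, new_chunk_kib):
    """Return (ok, usable_size_kib) for a proposed chunk size.

    ok is True when the chunk size divides the device size exactly.
    Otherwise usable_size_kib is the largest smaller size that the
    chunk size does divide, which could be suggested to the user.
    """
    remainder = device_size_kib % new_chunk_kib
    if remainder == 0:
        return True, device_size_kib
    return False, device_size_kib - remainder
```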
Reported-by: Mark Knecht <markknecht@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 25 Jun 2013 05:56:22 +0000 (15:56 +1000)]
Subject: Make wait_for and open_dev_excl faster
When we create or assemble an array, we wait for udev to create the
device file in /dev so that as soon as mdadm completes, the device can
be used.
This waiting is performed in multiples of 200ms, which can sometimes
be too long to wait.
So change to an exponential backoff. Wait 1, then 2, then 4 msec etc.
Once we get to 256msec, stop backing off and continue waiting 256ms at
a time until we reach the limit which is now 4.608sec rather than 5sec
which it was before.
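The backoff schedule described above can be sketched like this. The attempt count of 25 is an assumption (the message only gives the overall limit); with it, the total wait comes out close to the quoted 4.608sec. The names are hypothetical, and the sleep function is injectable so the loop can be exercised without real delays.

```python
import time


def backoff_delays(attempts=25, cap_ms=256):
    """Yield delays in msec: 1, 2, 4, ... doubling until cap_ms, then constant."""
    delay = 1
    for _ in range(attempts):
        yield delay
        if delay < cap_ms:
            delay *= 2


def wait_until(ready, attempts=25, cap_ms=256, sleep=time.sleep):
    """Poll ready() with exponential backoff; return True once it succeeds."""
    for delay_ms in backoff_delays(attempts, cap_ms):
        if ready():
            return True
        sleep(delay_ms / 1000.0)
    return ready()  # one last check after the final sleep
```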
NeilBrown [Tue, 25 Jun 2013 05:52:58 +0000 (15:52 +1000)]
Grow: fix bug in raid0 -> raid5 conversion.
The moment we change a RAID0 to a RAID5 it will try to recover. This
will abort quite quickly as there are no spare devices, but it could
confuse the attempt to freeze the array.
So allow 'freeze' to work even on a recovering array.
NeilBrown [Mon, 24 Jun 2013 06:59:37 +0000 (16:59 +1000)]
Make: CXFLAGS should be conditionally assigned.
As the Makefile encourages users to set CXFLAGS for extra flags,
we should only conditionally set it.
That way it can be over-ridden in the environment as well as on
the command line.
mwilck@arcor.de [Thu, 20 Jun 2013 20:21:05 +0000 (22:21 +0200)]
Detail: deterministic ordering in --brief --verbose
Have mdadm --detail --brief --verbose print the list of devices in
alphabetical order.
This is useful for debugging purposes. E.g. the test script
10ddf-create compares the output of two mdadm -Dbv calls which
may be different if the order is not deterministic.
(I confess: I use a modified "test" script that always runs
"mdadm --verbose" rather than "mdadm --quiet", otherwise this
wouldn't happen in 10ddf-create).
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 24 Jun 2013 04:08:41 +0000 (14:08 +1000)]
Grow: remove excess drives when converting to RAID0.
When converting to RAID0, all spares and non-data drives
need to be removed first.
It is possible that the first HOT_REMOVE_DISK will fail because the
personality hasn't let go of it yet, so retry a few times.
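The retry idea can be sketched as below. Here `remove` is a hypothetical callable standing in for the HOT_REMOVE_DISK request and is assumed to raise OSError while the device is still busy; the attempt count and delay are illustrative values, not the ones mdadm uses.

```python
import time


def remove_with_retry(remove, attempts=5, delay_s=0.05, sleep=time.sleep):
    """Call remove() until it succeeds or the attempts run out.

    Returns True on success and False if every attempt raised OSError.
    """
    for attempt in range(attempts):
        try:
            remove()
            return True
        except OSError:
            # Pause before the next try, but not after the last one.
            if attempt + 1 < attempts:
                sleep(delay_s)
    return False
```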
NeilBrown [Mon, 24 Jun 2013 03:04:38 +0000 (13:04 +1000)]
Grow: fix two problems with new_data_offset
1/ ignore failed devices - obviously
2/ We need to tell the kernel which direction the reshape should
progress even if we didn't choose the particular data_offset
to use.
NeilBrown [Mon, 24 Jun 2013 03:02:35 +0000 (13:02 +1000)]
Grow: Try hard to set new_offset.
Setting new_offset can fail if the v1.x "data_size" is too small.
So if that happens, try increasing it first by writing "0".
That can fail on spare devices due to a kernel bug, so if it doesn't
work, try writing the correct number of sectors.
NeilBrown [Wed, 19 Jun 2013 01:09:33 +0000 (11:09 +1000)]
Assemble: when forcing a single-degraded RAID6 array, trigger a 'repair'.
When an active/degraded RAID6 array is force-started we clear the
'active' flag, but it is still possible that some parity is
not in sync. This is because there are two parity blocks.
It would be nice to be able to tell the kernel "P is OK, Q maybe not".
But that is not possible.
So when we force-assemble such an array, trigger a 'repair' to fix up
any errant Q blocks.
This is not ideal, as a repair interrupted by a restart will not be continued
after the restart, but it is the best we can do without kernel help.
NeilBrown [Wed, 19 Jun 2013 00:33:47 +0000 (10:33 +1000)]
sysfs_read: return devices in same order as in filesystem.
When we read devices from sysfs (../md/dev-*), store them in the same
order that they appear. That makes more sense when exposed to a
human (as the next patch will).
Bernd Schubert [Tue, 18 Jun 2013 09:09:26 +0000 (11:09 +0200)]
raid6check: Fix memory leaks detected by valgrind
==2389947== 24 bytes in 1 blocks are definitely lost in loss record 1 of 10
==2389947== at 0x4C2B3F8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2389947== by 0x408067: xmalloc (xmalloc.c:36)
==2389947== by 0x401B19: check_stripes (raid6check.c:151)
==2389947== by 0x4030C6: main (raid6check.c:521)
==2389947==
==2389947== 24 bytes in 1 blocks are definitely lost in loss record 2 of 10
==2389947== at 0x4C2B3F8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2389947== by 0x408067: xmalloc (xmalloc.c:36)
==2389947== by 0x401B67: check_stripes (raid6check.c:155)
==2389947== by 0x4030C6: main (raid6check.c:521)
==2389947==
Bernd Schubert [Tue, 18 Jun 2013 09:09:16 +0000 (11:09 +0200)]
raid6check: Fix build of raid6check
After a recent git pull, 'make raid6check' no longer worked, as
sysfs_read() was called with a wrong argument and as check_env()
was used by use_udev(), but not defined.
Replace sysfs_read(..., -1, ...) by sysfs_read(..., NULL, ...)
NeilBrown [Mon, 17 Jun 2013 06:55:31 +0000 (16:55 +1000)]
Assemble/Incr: Don't include spares with too-high event count.
Some failure scenarios can leave a spare with a higher event count
than an in-sync device. Assembling an array like this will confuse
the kernel.
So detect spares with event counts higher than the best non-spare
event count and exclude them from the array.
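The exclusion rule can be sketched as follows, assuming each device is represented as a hypothetical (name, is_spare, events) tuple. This only illustrates the comparison and is not the actual Assemble code.

```python
def usable_devices(devices):
    """Drop spares whose event count exceeds the best non-spare event count.

    devices is a list of (name, is_spare, events) tuples.
    """
    non_spare_events = [events for _, is_spare, events in devices if not is_spare]
    if not non_spare_events:
        # Nothing to compare against, so leave the list unchanged.
        return list(devices)
    best = max(non_spare_events)
    return [d for d in devices if not (d[1] and d[2] > best)]
```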
Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 27 May 2013 05:09:38 +0000 (15:09 +1000)]
Grow: allow for different sized devices when updating data_offset.
It is possible that the devices in an array have different sizes, and
different data_offsets. So the 'before_space' and 'after_space' may
be different from drive to drive.
Any decisions about how much to change the data_offset must work on
all devices, so must be based on the minimum available space on
any device.
So find this minimum first, then do the calculation.
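A small sketch of finding the minimum first, assuming each device is described by a hypothetical (before_space, after_space) pair in sectors:

```python
def common_space(devices):
    """Return the (before_space, after_space) available on every device.

    devices is an iterable of (before_space, after_space) pairs; the
    result is the per-field minimum, so any offset change that fits
    within it fits on all devices.
    """
    devices = list(devices)
    min_before = min(before for before, _ in devices)
    min_after = min(after for _, after in devices)
    return min_before, min_after
```

Any data_offset change would then be computed from these two minima rather than from a single device's values.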
NeilBrown [Thu, 23 May 2013 04:41:29 +0000 (14:41 +1000)]
Assemble: --update=metadata converts v0.90 to v1.0
This allows the smooth conversion of legacy 0.90 arrays
to 1.0 metadata.
Old metadata is likely to remain but will be ignored.
It can be removed with
mdadm --zero-superblock --metadata=0.90 /dev/whatever
NeilBrown [Tue, 21 May 2013 06:28:23 +0000 (16:28 +1000)]
Grow: use new_data_offset instead of backups for raid4/5/6 reshape.
If we can modify the data_offset, we can avoid doing any backups at all.
If we can't, fall back on the old approach - but not if --data-offset
was requested.
NeilBrown [Wed, 22 May 2013 02:17:32 +0000 (12:17 +1000)]
Grow: introduce min_offset_change to struct reshape.
raid10 currently uses the 'backup_blocks' field to store something
else: a minimum offset change.
This is bad practice, and we will shortly need to have both for RAID5/6,
so make a separate field.
NeilBrown [Wed, 15 May 2013 01:40:27 +0000 (11:40 +1000)]
Create: over-ride "start_ro" setting when creating an array.
If module parameter start_ro is set, arrays start readonly.
This is OK when assembling, but is very surprising when creating
an array as the resync won't start.
So over-ride the setting (unless --read-only was given) and make
arrays RW when created.
NeilBrown [Wed, 15 May 2013 01:10:54 +0000 (11:10 +1000)]
Suppress error messages from systemctl.
We call systemctl to see if systemd will run mdmon for us.
If it cannot, we run mdmon directly, so we aren't interested
in the error message.
So redirect stderr to /dev/null.
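The same idea expressed as a short sketch: run a helper command with its stderr discarded and look only at whether it succeeded. The function name is hypothetical, and mdadm itself does this in C rather than as shown here.

```python
import subprocess


def run_quietly(argv):
    """Run argv with stderr sent to /dev/null.

    Returns True if the command exits with status 0, and False if it
    fails or cannot be started at all.
    """
    try:
        result = subprocess.run(argv, stderr=subprocess.DEVNULL)
    except OSError:
        return False
    return result.returncode == 0
```

A caller can then fall back to starting the program directly whenever this returns False.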
NeilBrown [Wed, 15 May 2013 01:03:25 +0000 (11:03 +1000)]
create_mddev: add support for /dev/md_XXX non-numeric names.
With the 'devnm' infrastructure fixed, it is quite easy to support
names like "md_home" for md arrays.
This currently defaults to "off" and can be enabled in mdadm.conf with
CREATE names=yes
This is in case other tools get confused by the new names.
NeilBrown [Mon, 13 May 2013 02:56:38 +0000 (12:56 +1000)]
misc_scan: don't trust the mapping file too much for device names.
misc_scan assumes that any device name found in the 'mapping' file
is usable. Usually it is but sometimes not, such as for inactive
devices.
Depending on it isn't really robust, so when a name is found, check that
it exists. If not, fall back on map_dev.
This will allow "--detail --scan" to notice inactive devices.
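The check-then-fall-back logic can be sketched as below. Here `fallback` is a hypothetical callable standing in for map_dev, and the function name is an assumption made for illustration.

```python
import os


def usable_name(mapped_name, fallback):
    """Return mapped_name if that path exists, else the fallback's result.

    mapped_name is the name recorded in the mapping file (may be None);
    fallback is called only when that name is missing or does not exist.
    """
    if mapped_name and os.path.exists(mapped_name):
        return mapped_name
    return fallback()
```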