Neil Brown [Thu, 15 May 2008 06:48:12 +0000 (16:48 +1000)]
Change write_init_super to be called only once.
The current model for creating arrays involves writing
a superblock to each device in the array.
With containers (as with DDF), that model doesn't work.
Every device in the container may need to be updated
for an array made from just some of the devices in a container.
So instead of calling write_init_super for each device,
we call it once for the array and have it iterate over
all the devices in the array.
To help with this, ->add_to_super now passes in an 'fd' and name for
the device. These get saved for use by write_init_super. So
add_to_super takes ownership of the fd, and write_init_super will
close it.
This information is stored in the new 'info' field of supertype.
As part of this, write_init_super now removes any old traces of raid
metadata rather than doing this in common code.
Neil Brown [Thu, 15 May 2008 06:48:08 +0000 (16:48 +1000)]
Reduce opening of dev in create.
Now that validate_geometry opens and checks the device,
we don't need to do it as much in top level Create.
We only need it to check for old array or filesystem info.
So only open the device at that place.
Bill Nottingham [Mon, 5 May 2008 09:44:04 +0000 (19:44 +1000)]
Simplistic locking for --incremental.
From: Bill Nottingham <notting@redhat.com>
mdadm --incremental doesn't really do any locking. If you get multiple
events in parallel for the same device (that has not yet started), they
will all go down the path to create the array. One will succeed, the
rest will have SET_ARRAY_INFO die with -EBUSY (md: array mdX already has disks!)
and will exit without adding the disk.
Original bug report is: https://bugzilla.redhat.com/show_bug.cgi?id=433932
This is solved by adding very very rudimentary locking. Incremental() now
opens the device with O_EXCL to ensure only one invocation is frobbing the
array at once. A simple loop just tries to open 5 times a second for 5
seconds. If the array stays locked that long, you probably have bigger
issues.
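The retry loop described above can be sketched roughly as follows. This is an illustration, not the code from the patch: the helper name open_excl_retry and the choice to retry only on EBUSY are assumptions.

```c
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not mdadm's actual code): open `path` with O_EXCL,
 * making up to 25 attempts 200ms apart, i.e. 5 tries a second for 5 seconds.
 * Returns an open fd on success, or -1 with errno set. */
static int open_excl_retry(const char *path)
{
    const struct timespec delay = { 0, 200 * 1000 * 1000 }; /* 200ms */
    int attempt;

    for (attempt = 0; attempt < 25; attempt++) {
        int fd = open(path, O_RDWR | O_EXCL);

        if (fd >= 0)
            return fd;      /* we now hold the device exclusively */
        if (errno != EBUSY)
            return -1;      /* a real error, not lock contention */
        nanosleep(&delay, NULL);
    }
    errno = EBUSY;          /* still locked after 5 seconds */
    return -1;
}
```

On Linux, opening a block device with O_EXCL fails with EBUSY while another holder has it open exclusively, which is what makes this usable as a crude lock.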
Neil Brown [Mon, 28 Apr 2008 06:30:31 +0000 (16:30 +1000)]
Small improvements to --incremental for arrays that are in the middle of reshape
There is still a problem: If array is partially assembled and started
read-only, the last device doesn't get added properly. Probably a kernel
problem.
Neil Brown [Mon, 28 Apr 2008 06:30:09 +0000 (16:30 +1000)]
Allow creation of a RAID6 with a single missing device.
This did not work before as we couldn't mark it clean as there would
be some parity blocks out of sync, and raid6 will not assemble a
dirty degraded array.
So make such arrays doubly degraded (the last device becomes a spare)
and clean.
Neil Brown [Mon, 28 Apr 2008 06:29:37 +0000 (16:29 +1000)]
Fix problems with array.size overflowing on large arrays.
array.size is 32 bits and counts kilobytes. So for arrays with
more than 4 terabytes, it can overflow.
The correct number can be read from sysfs, but there are still
a few places that use array.size and risk truncation. What is worse,
they compare a number of kilobytes with a number of sectors!
So use get_component_size() to read the sysfs information, and be
more consistent about units.
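To make the arithmetic concrete, here is a small sketch of the wrap-around and of the unit conversion. It assumes an unsigned 32-bit kilobyte counter (matching the 4-terabyte figure above) and 512-byte sectors; both function names are made up for illustration.

```c
#include <stdint.h>

/* Size in KiB as a 32-bit counter would hold it: wraps modulo 2^32,
 * so anything at or above 4 TiB is silently truncated. */
static uint32_t size_kib_32bit(uint64_t bytes)
{
    return (uint32_t)(bytes / 1024);
}

/* Convert KiB to 512-byte sectors in 64 bits, so that sizes are
 * compared in the same unit rather than mixing KiB with sectors. */
static uint64_t kib_to_sectors(uint64_t kib)
{
    return kib * 2;
}
```

For a 5 TiB array the true size is 5 * 2^30 KiB, but the 32-bit counter holds only 2^30 KiB, i.e. it reports 1 TiB.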
Neil Brown [Mon, 28 Apr 2008 06:29:12 +0000 (16:29 +1000)]
Fix for segfault when reading /proc/mdstat
Some kernel versions don't put a space between 'active' and '(auto-read-only)'
in /proc/mdstat. This causes a parsing problem leaving 'level' set to
NULL which causes a crash.
So synthesise a space there if it is missing, and check for 'level' to
be NULL and don't de-ref if it is.
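A minimal sketch of both halves of this fix, assuming the parser works on one line of /proc/mdstat at a time. The helper names are hypothetical and the real code in mdadm is structured differently.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy `line` into `out`, inserting a space between
 * "active" and "(" if the kernel omitted it. Other lines are copied as-is. */
static void fix_active_token(const char *line, char *out, size_t outlen)
{
    const char *p = strstr(line, "active(");

    if (p == NULL) {
        snprintf(out, outlen, "%s", line);
        return;
    }
    /* Everything up to and including "active", then a space, then the rest. */
    snprintf(out, outlen, "%.*s %s", (int)(p - line) + 6, line, p + 6);
}

/* Compare a level string that the parser may have left NULL. */
static int level_is(const char *level, const char *want)
{
    return level != NULL && strcmp(level, want) == 0;
}
```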
Neil Brown [Tue, 16 Oct 2007 03:52:35 +0000 (13:52 +1000)]
Fix restarting of a reshaping array.
The last release broke the ability to assemble an array that
was in the middle of a reshape.
This patch adds code to test whether the critical section needs
to be restored, so that if we fail to restore it
we know whether to fail or not.
martin f. krafft [Sun, 30 Sep 2007 12:22:56 +0000 (13:22 +0100)]
Fix segfault on assembly on amd64 with v1 superblocks
Commit a40b4fe introduced a temporary supertype variable tst, instead of
manipulating st directly. However, it was forgotten to pass &tst into the
recursive load_super1 call, causing an infinite recursion.
Signed-off-by: martin f. krafft <madduck@debian.org>
Neil Brown [Mon, 24 Sep 2007 03:14:13 +0000 (13:14 +1000)]
Make "--write-mostly" effective when re-adding a device to an array.
Fixes Debian Bug 442874
When we discover that we can 're-add' a drive, we forget to check the
write-mostly flag.
This highlights the fact that you cannot turn 'off' the write-mostly
flag at this point. I wonder if that is a problem...
Iustin Pop [Tue, 11 Sep 2007 14:20:19 +0000 (16:20 +0200)]
Explain the read-balancing algorithm for RAID1 better in md.4
From: Iustin Pop <iusty@k1024.org>
There are many questions on the mailing list about the RAID1 read
performance profile. This patch adds a new paragraph to the RAID1
section in md.4 that details what kind of speed-up one should expect
from RAID1.
Neil Brown [Mon, 20 Aug 2007 04:14:42 +0000 (14:14 +1000)]
Report error when grow cannot be restarted.
Make sure that if --assemble finds an array in the critical region
of a reshape, and cannot find the critical data to restart the
reshape, it gives an error message.
Neil Brown [Mon, 20 Aug 2007 04:14:25 +0000 (14:14 +1000)]
Fix problem with adding a device to a 1.x array created with older mdadm.
When adding a new disk to an array, don't reserve so much bitmap
space that the disk cannot store the required data. (Needed when
1.x array was created with older mdadm).
Ian Dall [Mon, 9 Jul 2007 01:29:04 +0000 (11:29 +1000)]
Allow "--write-behind=" to be done in grow mode.
From: Ian Dall <ian@beware.dropbear.id.au>
I have a small patch to mdadm which allows the write-behind amount to be
set at array grow time (instead of currently only at grow or create
time). I have tested this fairly extensively on some arrays built out of
loop back devices, and once on a real live array.
Enhance raid4 support: --assemble and --monitor weren't quite happy with it.
From: Doug Ledford <dledford@redhat.com>
This one actually does a couple things. Mainly related to raid4, but
kinda touches other raid levels some.
When creating a raid4 array, treat it like a raid5 array in that we
create it in degraded mode by default and add the last disk as a spare.
Besides speeding things up, this has a second effect that it makes mdadm
more consistent. In order to create a degraded raid5 array, you need
only pass missing as one of the devices. For a degraded raid4 array,
prior to this patch, you must pass assume-clean or else it refuses to
create the array. Even force won't make it work without assume-clean.
With the patch, raid4 behaves identically to raid5.
Separate from that, the monitor functionality completely ignores raid4
arrays. That seems to stem from the code that checks to see if the
array is part of a long list of types. It seems easier to check which
array types *aren't* redundant instead of listing the ones that are
redundant and missing some of them. This makes the monitor service
actually watch raid4 arrays.
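The "check by exclusion" idea might look roughly like this. It is a sketch only: the function name is made up, and the list of non-redundant levels (linear and raid0) is an assumption about what the monitor excludes, not a copy of the patch.

```c
#include <string.h>

/* Hypothetical helper: treat every level as redundant unless it is known
 * not to be. A level added later, or one forgotten from a positive list
 * (as raid4 was), is then monitored by default. */
static int level_is_redundant(const char *level)
{
    if (level == NULL)
        return 0;   /* unknown level: nothing to monitor */
    return strcmp(level, "linear") != 0 &&
           strcmp(level, "raid0") != 0;
}
```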
This one fixes a bug where once manage mode is set, the -a short option
is no longer parsed correctly (true of grow mode as well). This happens
because when you switch the short opts to the bitmap_auto version, it
specifies that the argument must follow a, yet the loop expects to get
an undecorated option and parse it as the disk dev instead of trying to
parse optarg. So, create a new short opt array that is used for manage
and grow that doesn't list a as having an argument.
Mark some files FD_CLOEXEC to protect sendmail from them.
From: Doug Ledford <dledford@redhat.com>
When running with SELinux enabled and using mdadm to monitor devices,
attempts to send emails to an admin will be blocked because mdadm is
holding open /proc/mdstat without setting the FD_CLOEXEC flag. As a
result, sendmail has an open descriptor to /proc/mdstat after the
popen() call, which SELinux decides isn't really any of sendmail's
business and so sendmail gets denied.
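Setting the flag uses the standard fcntl() calls. A minimal sketch follows; the helper name set_cloexec is made up, and the patch itself may set the flag differently.

```c
#include <fcntl.h>

/* Mark `fd` close-on-exec while preserving its other descriptor flags,
 * so that children started via popen()/exec do not inherit it.
 * Returns 0 on success, -1 on error. */
static int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0 ? -1 : 0;
}
```

It would be called once on the descriptor right after opening /proc/mdstat.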
Improve error message when trying to create an array that already exists.
From: Doug Ledford <dledford@redhat.com>
Simple bugfix. If an array already exists and we are asked to create
this array, error out with an error message that makes sense to people
instead of an error that the SET_ARRAY_INFO ioctl had an invalid
argument. Plus a typo correction.
Interpret "--metadata=1" with --assemble to imply any version-1, not just 1.0
From: Doug Ledford <dledford@redhat.com>
OK, this one fixes an issue where people were doing manual array
creation and specifying superblock types other than 1.0 (aka, 1.1, 1.2)
and then using mdadm -Ebs to populate their mdadm.conf file. The
general problem is that if you specify a superblock type in the ARRAY
line (or on the command line), then you must specify the superblock type
*exactly*, including the minor version. Unfortunately, mdadm -Ebs
prints out all version 1 superblocks, regardless of minor version, as
just plain old 1. This breaks the mdadm.conf file for anything other
than plain version 1 superblock devices.
So, since I thought it was basically backwards that the mdadm -E output
was lax on specifying the location of the superblock whereas the mdadm
-A input was strict, I reversed that. With this patch, the mdadm -E
output is now exact for any given superblock. But, in addition, the
mdadm -A input is now lax for any superblock that doesn't specifically
list the minor version, aka version 1 now means version 1, not version
0.90, but any minor version. So does default/large.
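The lax matching on the -A side can be illustrated with a small sketch. This is not mdadm's implementation: it assumes versions are compared as strings, and the function name is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: does the version requested on the ARRAY line or
 * command line accept the version found on disk?
 * "1" accepts "1.0", "1.1" and "1.2"; "1.1" accepts only "1.1". */
static int metadata_matches(const char *requested, const char *actual)
{
    size_t n = strlen(requested);

    /* A minor version was given: require an exact match. */
    if (strchr(requested, '.') != NULL)
        return strcmp(requested, actual) == 0;

    /* No minor version: same major number, any (or no) minor. */
    return strncmp(requested, actual, n) == 0 &&
           (actual[n] == '\0' || actual[n] == '.');
}
```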