Dan Williams [Thu, 15 May 2008 06:48:54 +0000 (16:48 +1000)]
add infrastructure to receive higher order commands, like remove_device
From: Dan Williams <dan.j.williams@intel.com>
Each md_message encapsulates a single command. A command includes an 'action'
member which describes what if any data comes after the action. Communication
with the monitor involves updating the active_cmd pointer and then writing to
mgr_pipe. Pass/fail status is returned via mon_pipe.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Thu, 15 May 2008 06:48:49 +0000 (16:48 +1000)]
handle disk failures
From: Dan Williams <dan.j.williams@intel.com>
Added curr_state as a parameter to set_disk. Handlers look at this to
record components failures, and set global 'degraded' or 'failed'
status.
When reading the state as faulty:
1/ mark the disk failed in the metadata
2/ write '-blocked' to the rdev state to allow the kernel's failure
mechanism to advance
3/ the kernel will take away the drive's role in remove_and_add_spares()
4/ once the disk no longer has a role writing 'remove' to the rdev state
will get the disk out of array.
There is a window after writing '-blocked' where the kernel will return
-EBUSY to remove requests. We rely on the fact that the disk will
continue to show faulty so we lazily wait until the kernel is ready to
remove the disk. If the manager thread needs to get the disk out of the
way it can ping the monitor and wait, just like the replace_array()
case.
[buglet fix: swap the parameters of attr_match in read_dev_state]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Neil Brown [Thu, 15 May 2008 06:48:12 +0000 (16:48 +1000)]
Change write_init_super to be called only once.
The current model for creating arrays involves writing
a superblock to each device in the array.
With containers (as with DDF), that model doesn't work.
Every device in the container may need to be updated
for an array made from just some the devices in a container.
So instead of calling write_init_super for each device,
we call it once for the array and have it iterate over
all the devices in the array.
To help with this, ->add_to_super now passes in an 'fd' and name for
the device. These get saved for use by write_init_super. So
add_to_super takes ownership of the fd, and write_init_super will
close it.
This information is stored in the new 'info' field of supertype.
As part of this, write_init_super now removes any old traces of raid
metadata rather than doing this in common code.
Neil Brown [Thu, 15 May 2008 06:48:08 +0000 (16:48 +1000)]
Reduce openning of dev in create.
Now that validate_geometry opens and checks the device,
we don't need to do it as much in top level Create.
We only need it to check for old array or filesystem info.
So only open the device at that place.
Bill Nottingham [Mon, 5 May 2008 09:44:04 +0000 (19:44 +1000)]
Simplistig locking for --incremental.
From: Bill Nottingham <notting@redhat.com>
mdadm --incremental doesn't really do any locking. If you get multiple
events in parallel for the same device (that has not yet started), they
will all go down the path to create the array. One will succeed, the
rest will have SET_ARRAY_INFO die with -EBUSY (md: array mdX already has disks!)
and will exit without adding the disk.
Original bug report is: https://bugzilla.redhat.com/show_bug.cgi?id=433932
This is solved by adding very very rudimentary locking. Incremental() now
opens the device with O_EXCL to ensure only one invocation is frobbing the
array at once. A simple loop just tries to open 5 times a second for 5
seconds. If the array stays locked that long, you probably have bigger
issues.
Neil Brown [Mon, 28 Apr 2008 06:30:31 +0000 (16:30 +1000)]
Small improvements to --incremental for arrays that are in the middle of reshape
There is still a problem: If array is partially assembled and started
read-only, the last device doesn't get added properly. Probably a kernel
problem.
Neil Brown [Mon, 28 Apr 2008 06:30:09 +0000 (16:30 +1000)]
Allow creation of a RAID6 with a single missing device.
This did not work before as we couldn't mark it clean as there would
be some parity blocks out of sync, and raid6 will not assemble a
dirty degraded array.
So make such arrays doubly degraded (the last device becomes a spare)
and clean.
Neil Brown [Mon, 28 Apr 2008 06:29:37 +0000 (16:29 +1000)]
Fix problems with array.size overflowing on large arrays.
array.size is 32bits and counts K. So for arrays with
more than 4Terrabytes, it can overflow.
The correct number can be read from sysfs, but there are still
a few places that use array.size and risk truncation. What is worse.
they compare a number of kilobytes with a number of sectors !!
So use get_component_size() to read the sysfs information, and be
more consistent about units.
Neil Brown [Mon, 28 Apr 2008 06:29:12 +0000 (16:29 +1000)]
Fix for segfault when reading /proc/mdstat
Some kernel versions don't put a space between 'active' and '(auto-read-only)'
in /proc/mdstat. This causes a parsing problem leaving 'level' set to
NULL which causes a crash.
So synthesise a space there if it is missing, and check for 'level' to
be NULL and don't de-ref if it is.
Neil Brown [Tue, 16 Oct 2007 03:52:35 +0000 (13:52 +1000)]
Fix restarting of a reshaping array.
The last release broke the ability to assemble an array that
was in the middle of a reshape.
This patch adds code to test if the critical section needs
to be restored or not so that - if we have failed to restore it,
we know whether to fail or not.
martin f. krafft [Sun, 30 Sep 2007 12:22:56 +0000 (13:22 +0100)]
Fix segfault on assembly on amd64 with v1 superblocks
Commit a40b4fe introduced a temporary supertype variable tst, instead of
manipulating st directly. However, it was forgotton to pass &tst into the
recursive load_super1 call, causing an infinite recursion.
Signed-off-by: martin f. krafft <madduck@debian.org>
Neil Brown [Mon, 24 Sep 2007 03:14:13 +0000 (13:14 +1000)]
Make "--write-mostly" effective when re-adding a device to an array.
Fixes Debian Bug 442874
When we discover that we can 're-add' a drive, we forget to check the
write-mostly flag.
This highlights the fact that you cannot turn 'off' the write-mostly
flag at this point. I wonder if that is a problem...
Iustin Pop [Tue, 11 Sep 2007 14:20:19 +0000 (16:20 +0200)]
Explain the read-balancing algorithm for RAID1 better in md.4
From: Iustin Pop <iusty@k1024.org>
There are many questions on the mailing list about the RAID1 read
performance profile. This patch adds a new paragraph to the RAID1
section in md.4 that details what kind of speed-up one should expect
from RAID1.
Neil Brown [Mon, 20 Aug 2007 04:14:42 +0000 (14:14 +1000)]
Report error when grow cannot be restarted.
Make sure that if --assemble find an array in the critical region
of a reshape, and cannot find the critical data to restart the
reshape, it gives an error message.