_avail_space1() is calls from both avail_space1() and validate_geometry1()
and does slightly different things.
The partial code sharing doesn't really help. In particularly the
responsibility for setting the size of the array is currently
confused.
So duplicate the code into the two locations - one where 'super' is
always NULL (validate_geometry1) and one where it is never NULL
(avail_space1), and simplify.
This call to validate_geometry is really rather gratuitous.
It is purely about the fact that super0 cannot use more than 4TB.
So just make it an explicit test - less confusing that way.
With this, validate_geometry is only called from Create, which
makes it easier to reason about.
Also validate_geometry is now never passed NULL for the 'chunk'
parameter, so we can remove those annoying tests for NULL.
Grow: don't hold array open while waiting for reshape.
If we will need to change array level when a reshape completes, a copy
of mdadm waits in the background.
Currently this copy hold the device (/dev/mdX) open. This prevents
the array from being stopped.
So close the file descriptor and re-open after the reshape completes.
super1: update data_size when performing "revert-reshape".
The "data_size" is with respect to "data_offset". When the kernel
changes "data_offset" it modifies "data_size" to match - see
md_finish_reshape() in the kernel.
So when mdadm switches the data_offset for the new data_offset, it
must update data_size correspondingly.
Let the first created array be RAID5 rather than RAID0. This makes
the test harder than before, because everything after the first
Create has do be done indirectly through mdmon.
This was waiting on "reshape_position" which doesn't
get update events.
Before sysfs_wait was introduced, the code to wait didn't
wait at all, so it spun.
With sysfs_wait, it would wait forever.
Change to wait in sync_completed which does get events.
* Aligned the spelling of “RAID” to use captial letters in all places.
* Aligned the spelling of the RAID level names (LINEAR, RAID1, …) to use capital
letters in all places, except for the string “faulty” in places where not the
RAID level was meant.
Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
Stop: fix up synchronising end of reshape to good boundary.
If we stop too soon after reshape starts (probably only during
testing), we can get confused by the status of the reshape.
If that might be happening - sleep a bit longer.
Also allow for reshape going unusually slowly (again, probably only
during testing).
Grow: use mdstat_wait to wait for delayed reshape.
Having a fix time for a wait is clumsy and can make us
wait much too long.
So use mdstat_wait and keep the mdstat_fd open.
This requires an 'mdstat_close' so it doesn't stay open
forever.
DDF load headers: if primary is invalid, don't check fields.
Currently we compare fields between primary and secondary
superblocks, before we check if the primary is even valid.
This is a bit backwards, so reverse it.
The "indirect" code path for adding VDs was not working correctly
for secondary RAID level. The "other BVDs" were not transmitted
to mdmon. Thus mdmon wouldn't build up correct information, and
RAID creation would fail when mdmon was already running on the container.
The return value of disk.raid_disk may be wrong.
The old code was using raiddisk, which is only valid with auto
layout. This leads to errors when arrays are created with
specified disks and mdmon is already running, like this:
Implement kill_subarray, for mdmon running and not running.
The way Kill_subarray() is implemented, this requires that the
DDF layer uses "currentconf" to remember the last subarray
queried with container_content(), and use it as the one to kill.
I don't like this much but IMSM does it the same way.
DDF: write_init_super_ddf: don't zero superblocks for subarrays
commit d682f344 inserted this call to "Kill" in write_init_super_ddf:
"Matching the functionality already in super0 and super1, when
we first create a container, remove any other recognisable metadata to
ensure it doesn't cause confusion."
But we should do this only at first container creation, not when
subarrays are created later.
Monitor: Don't write metadata in inactive array state
The kernel docs state that meta data is never written in states
clear, inactive, suspended, readonly, and read_auto.
Why should this be different for containers?
We need to write metadata when the array is disabled, though.
Tested with the DDF (10*) and IMSM (9*) tests, works.
DDF: add_to_super_ddf: Use same amount of workspace as other disks
If there are already disks in the container, reserve the same amount
of workspace as the first disk. Fill in the primary/secondary/
workspace LBA values.
Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>
DDF: use LBA_OFFSET macro instead of lba_offset field
Remove the lba_offset field from struct vcl. This field acted as
a "cache" for the address of the lba_offset field in the vd_config
structure. This isn't useful any more if there are multiple
vd_configs in a vcl.
This patch also adds __cpu_to_be64 in two places where it has been
quite obviously forgotten (ddf_set_disk, ddf_activate_spare).
Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>
Instead of allocating the other_bvds array and every element
separately, allocate all in a single chunk. Also, move allocation
in a subroutine as it's used in several places.
Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>
Support for RAID 10 makes it necessary to rewrite the algorithm
for deriving DDF layout from MD layout. The functions level_to_prl
and layout_to_rlq are combined in a single function that takes
md layout parameters and converts them to DDF.
Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>
DDF: layout_ddf2md: new DDF->md RAID layout conversion
layout_ddf2md() is a new RAID layout conversion routine.
It obsoletes the previous separate routines for obtaining
md level and layout (map_num1, rlq_to_layout).
Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>
The DDF code was assuming that the VD slots 0..populated_vdes
were used and the rest was unused. Remove this assumption and
deal with empty slots instead.
Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>
If secondary RAID level is taken into account, translation between
the md RAID member (raid_disk) and the index of a physical disk
in a BVD becomes more complex.
Also, take into account that the member list can have unused entries
(this is independent of secondary RAID level).
Adapt usage of find_vdcr() accordingly
Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>
This patch implements the previously unsupported case where
store_super_ddf is called with a non-empty superblock.
For DDF, writing meta data to just one disk makes no sense.
We would run the risk of writing inconsistent meta data
to the devices. So just call __write_init_super_ddf and
write to all devices, including the one passed by the caller.
This patch assumes that the device to store the superblock on
has already been added to the DDF structure. Otherwise, an
error message will be emitted.
Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>
Assemble: avoid a consistency check when --force is given.
mdadm will normally not include a device into an array if that device
reports that the "best" device has failed, as this normally implies
some sort of inconsistency.
However when --force is given it means that the given drives really
should be assembled if at all possible so in that case the test should
be avoided.
The particular case where this was a problem was a RAID5 were all
devices had the same event count but three of them reported that the
first two had failed.
As they all had the same event count the first was taken as the 'best'
and that caused the later ones to be excluded. Listing one of the
later ones first allowed the array to be assembled. So in this case
the test clearly just got in the way and did nothing useful.
Grow: notice when --stop is synchronising a reshape and don't mess it up.
--stop now tries to wait for a reshape to be at just the right spot.
However for a reducing reshape, mdadm will be running in the
background watching, and might adjust sync_max and mess things up.
So teach "progress_reshape" to notice when "sync_max" is modified, and
leave it alone.
progress_reshape() may not set reshape_completed if the reshape is
interrupted, so we need to initialize it to the current value before
hand, so the value used afterwards is credible.
Stop: improve synchronising of reshape with whole stripes.
It is possible for 'sync_completed' to be further ahead than
we deduced from 'reshape_position'. However we cannot read it while
the array is frozen, so it is hard to know.
Once that array is unfrozen, check and if sync_completed is ahead of
'sync_max', push 'sync_max' well ahead if 'sync_completed' so it
will all synchronise up properly.
Assemble: improve messages when restarting a reshape.
If the restarted reshape needs a backup file and we don't have one,
that should be reported before we try to start the array.
Also we shouldn't say the "Cannot grow" but "cannot complete".
Assemble: ignore devices= if container= is present.
If "container=" is present, then we are going to assemble from the
given container where that container is made of those devices or not.
So in this case the "devices=" is purely documentation and is best
ignored.
As part of this, move the test on the "container=" value when that
start with "/" up before the device is opened. There sooner we test
things, the better.
Reported-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>
Config: use better device names for "DEVICES container"
When "containers" appears on the "DEVICES" line (which is does by
default), use names from the mdadm map file instead of kernel names,
when possible.
This mean that the name will be more likely to appear in mdadm.conf
and so more likely to match "container=" tags.
If the container metadata doesn't know how many device to expect (as
is the case with IMSM), don't fail an --assemble which over-specifies
the number of devices.
Manage: check alignment when stopping an array undergoing reshape.
To be able to revert-reshape of raid4/5/6 which is changing
the number of devices, the reshape must has been stopped on a multiple
of the old and new stripe sizes.
The kernel only enforces the new stripe size multiple.
So we enforce the old-stripe-size multiple by careful use of
"sync_max" and monitoring "reshape_position".