Adam Kwolek [Fri, 26 Nov 2010 07:08:01 +0000 (08:08 +0100)]
imsm: Allow multiple spares to be collected.
Assumption for spares searching was that after picking new device, it
has to be added to array before next search. This causes returning
different disk on each call.
When creating a spare list during Online Capacity Expansion, we will
first collect the devices list and then all devices are added to md.
Picked device from spares pool has to be checked against picked
devices so far. If not, the same disk will be returned all the time.
Already picked devices are stored in the list and this list is used
for new devices verification also.
So add an extra arg to imsm_add_spare to hold a list of known
spares to ignore.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
Adam Kwolek [Mon, 29 Nov 2010 01:53:16 +0000 (12:53 +1100)]
imsm: FIX: core dump during imsm metadata writing
Wrong number of disks during metadata update causes core dump. New
disks number based on internal mdmon information has to used for
calculation (not previously read from metadata).
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
Adam Kwolek [Mon, 29 Nov 2010 01:28:01 +0000 (12:28 +1100)]
imsm: Add support for general migration
Internal IMSM procedures need to support the General Migration.
It is used during operations like:
- Online Capacity Expansion,
- migration initialization,
- finishing migration,
- apply changes to raid disks etc.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
Adam Kwolek [Mon, 29 Nov 2010 01:11:09 +0000 (12:11 +1100)]
Treat feature as experimental
Due to fact that IMSM Windows compatibility was not tested yet,
feature has to be treated as experimental until compatibility
verification will be performed.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
Adam Kwolek [Mon, 29 Nov 2010 00:57:51 +0000 (11:57 +1100)]
Disk removal support for Raid10->Raid0 takeover
Until now Raid10->Raid0 takeover was possible only if all the mirrors
where removed before md starts the takeover. Now mdadm, when
performing Raid10->raid0 takeover, will remove all unwanted mirrors
from the array before actual md takeover is called.
Signed-off-by: Maciej Trela <maciej.trela@intel.com> Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
Anna Czarnowska [Fri, 26 Nov 2010 13:29:53 +0000 (14:29 +0100)]
Monitor: fix writing autorebuild.pid
If /var/run/mdadm doesn't exist we can never succeed writing
so we should try to create it first. When we make sure it is there we
write pid file as before.
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
Anna Czarnowska [Fri, 26 Nov 2010 10:51:59 +0000 (11:51 +0100)]
Monitor: reset dev when size too small
Cc: linux-raid@vger.kernel.org, Williams, Dan J <dan.j.williams@intel.com>, Ciechanowski, Ed <ed.ciechanowski@intel.com>
Otherwise spare will be considered good anyway.
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
Anna Czarnowska [Sun, 28 Nov 2010 22:51:27 +0000 (09:51 +1100)]
Monitor: few bug fixes for spare migration
1. If array not changed we should still report any degraded
- another array may have a new spare that we can move.
2. Array with err=1 can't give a spare.
3. We look for spares in "from" not "st" which is supertype
and has devname=NULL.
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 28 Nov 2010 22:40:15 +0000 (09:40 +1100)]
Incremental - avoid including wayward devices.
If a devices - typically in a mirrored set - is assembled
independently of the other devices, and then attempted to be brought
back into the set, it could contain inconsistent data. It should not
be included.
So detect this situation by ensuring that the 'most recent' device is
believed to be active by every other device. If a device is wayward,
it will only consider fellow wayward devices to be active and will
think all others are failed or missing.
This patches fixes --incremental, --assemble was done in an earlier
patch.
NeilBrown [Thu, 25 Nov 2010 07:58:45 +0000 (18:58 +1100)]
Improve opt parsing, and distinguish long from short.
In several cases, two different long options map to the same short
option. So e.g. you could give '--brief' and it would be interpreted
as '--bitmap'. That isn't really good.
So for every shared short option, define an option number and return
that for the long option instead. Then always check for both the
short and long options.
Also give some bugs like " mode == 'G'" which should be '== GROW'.
NeilBrown [Thu, 25 Nov 2010 07:37:23 +0000 (18:37 +1100)]
Monitor: separate 'choose_spare' out from 'move_spare'
choosing a spare from a container is more complicated that
from a native array. So separate out choose_spare to make it easier
to use an alternate implementation
Dan Williams [Tue, 23 Nov 2010 05:39:58 +0000 (16:39 +1100)]
External reshape (step 2): Freeze container
When growing the number of raid disks the reshape process will promote
container-spares to subarray-spares (later the kernel promotes them to
subarray-members in raid5_start_reshape()). The automatic spare
promotion that mdmon performs upon seeing a degraded array must be
disabled until the reshape process has been initiated. Otherwise, mdmon
may start a rebuild before the reshape parameters can be specified.
In the external case we arrange for the monitor to be blocked, and
turn off the safemode delay.
Mdmon is updated to check sync_action is not frozen before initiating
recovery.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
Dan Williams [Thu, 18 Nov 2010 09:22:59 +0000 (10:22 +0100)]
External reshape (step 1): container reshape and ->reshape_super()
In the native metadata case Grow_reshape() and the kernel validate what
reshapes are possible / supported and the kernel handles all the metadata
updates. In the external case the metadata format may have specific
constraints above this baseline. External formats also introduce the
constraint of only permitting some reshapes at container scope versus subarray
scope. For exmaple imsm changes to 'raiddisks' must be applied to all arrays
in the container.
This operation assumes that its 'st' parameter has been obtained from
super_by_fd() (such that st->subarray is up to date), and that a snapshot of
the metadata has been loaded from the container.
Why a new method, versus extending an existing one?
->validate_geometry: this routine assumes it is being called from Create(),
adding reshape complicates the cases that this routine needs to handle. Where
we find that checks can be shared between the two cases those routines
refactored into common code internal to the metadata handler, i.e. no need to
provide a unified external interface. ->validate_geometry() also does not
expect to update the metadata.
->update_super: this is meant to update single fields at Assembly() and only at
the container scope. Reshape potentially wants to update multiple fields at
either container or subarray scope.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
Dan Williams [Thu, 18 Nov 2010 09:22:01 +0000 (10:22 +0100)]
Assemble: fix assembly in the delta_disks > max_degraded case
Incremental assembly works on such an array because the kernel sees the
disk as in-sync and that the array is reshaping. Teach Assemble() the
same assumptions.
This is only needed on kernels that do not initialize ->recovery_offset
when activating spares for reshape.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
Dan Williams [Tue, 23 Nov 2010 04:08:19 +0000 (15:08 +1100)]
Manage: allow manual control of external raid0 readonly flag
mdadm --readwrite <subarray> will clear the external readonly flag ('-'
to '/'), but only for redudant arrays. Allow raid0 arrays as well so
the user has a simple helper to control this flag.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
Dan Williams [Tue, 23 Nov 2010 04:00:54 +0000 (15:00 +1100)]
block monitor: freeze spare assignment for external arrays
In order to support reshape and atomic removal of spares from containers
we need to prevent mdmon from activating spares. In the reshape case we
additionally need to freeze sync_action while the reshape transaction is
initiated with the kernel and recorded in the metadata.
When reshaping a raid0 array we need to freeze the array *before* it is
transitioned to a redundant raid level. Since sync_action does not exist
at this point we extend the '-' prefix of a subarray string to flag
mdmon not to activate spares.
Mdadm needs to be reasonably certain that the version of mdmon in the
system honors this 'freeze' indication. If mdmon is not already active
then we assume the version that gets started is the same as the mdadm
version. Otherwise, we check the version of mdmon as returned by the
extended ping_monitor() operation. This is to catch cases where mdadm
is upgraded in the filesystem, but mdmon started in the initramfs is
from a previous release.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
Marcin Labun [Mon, 22 Nov 2010 09:58:07 +0000 (20:58 +1100)]
Policy is aware of metadata disk's controller domains.
Platform (metadata) domain let the metadata handlers differentiate
disk domains based on controllers that the disk belongs to.
Platform domain is sub-domain inside user specified domain
in mdadm.conf configuration files inheriting all parameters from it.
The metadata domain name is used disk domain matching functions.
The disk with the same metadata domain name belong to the same metadata
domain.
New metadata handler is added that retrieves platform domain string based
on disk path:
const char *(*get_disk_controller_domain)(const char *path);
Signed-off-by: Marcin Labun <marcin.labun@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 22 Nov 2010 09:58:07 +0000 (20:58 +1100)]
Monitor: teach spare migration about containers
When trying to move a spare, move to the container of a degraded
array, not to the array itself.
And don't try to move from a subarray, only from a native or container
array.
And don't move from a container which contains degraded subarrays.
NeilBrown [Mon, 22 Nov 2010 09:58:07 +0000 (20:58 +1100)]
Monitor: policy based spare migration.
Rather than only migrating between arrays with the same spare_group,
we now migrate based on domains set in the policy.
In order for spare_group to continue to work, we treat it as a domain
of the destination array, and a domain of any device we might remove
from a source array.
Anna Czarnowska [Mon, 22 Nov 2010 09:58:07 +0000 (20:58 +1100)]
imsm: create mdinfo list of disks in a container from supertype
If getinfo_super is called on a container supertype we only get information
on first disk. As a parameter it uses reference to preallocated
mdinfo structure. Amending getinfo_super to return full list of disks
would require ammending all previous calls and subsequently freeing memory
allocated for mdinfo list.
Function container_content that returns a mdinfo list is written
specifically for assembly, performing actions not needed to just fill
mdinfo. It also does not include spares so is unsuitable.
As an alternative a new function getinfo_super_disks is created
to obtain information about all disks states in array.
Existing function sysfs_free is used to free memory
allocated by getinfo_super_disks.
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com> Signed-off-by: Marcin Labun <marcin.labun@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
Anna Czarnowska [Mon, 22 Nov 2010 09:58:06 +0000 (20:58 +1100)]
mdadm: added --no-sharing option for Monitor mode
--no-sharing option disables moving spares between arrays/containers.
Without the option spares are moved if needed according to config rules.
We only allow one process moving spares started with --scan option.
If there is such process running and another instance of Monitor
is starting without --scan, then we issue a warning but allow it
to continue.
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
Anna Czarnowska [Mon, 22 Nov 2010 09:58:06 +0000 (20:58 +1100)]
Monitor: set err on arrays not in mdstat
mse can be NULL when the array was not in mdstat when we read it
but existed in statelist and was recreated after reading mdstat.
In this case we set err as we can't get full update on this array
this time.
If the same array is given twice in command line it appears twice
in statelist. The first one will mark mse->devnum=INT_MAX
so the second one can't find mse. We set err on the second one as
it's not needed. Also if it becomes degraded we would look for spares
twice for the same array.
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 22 Nov 2010 09:58:06 +0000 (20:58 +1100)]
Add action=spare-same-slot policy.
When "mdadm -I" is given a device with no metadata, mdadm tries to add
it as a 'spare' somewhere based on policy.
This patch changes the behaviour in two ways:
1/ If the device is at a 'path' where a previous device was removed
from an array or container, then we preferentially add the spare to
that array or container.
2/ Previously only 'bare' devices were considered for adding as
spares. Now if action=spare-same-slot is active, we will add
non-bare devices, but *only* if the path was previously in use
for some array, and the device will only be added to that array.
Based on code
From: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
NeilBrown [Mon, 22 Nov 2010 09:58:06 +0000 (20:58 +1100)]
incr/spare: recheck allowed action for each metadata.
The current act_spare tests only test if it is allowed for some
metadata.
As we check each array or partitioning type, we need to double-check
that sparing is allowed for that array or partitioning type.
extension of IncrementalRemove to store location (path-id) of removed device
If the disk is taken out from its port this port information is
lost. Only udev rule can provide us with this information, and then we
have to store it somehow. This patch adds writing 'cookie' file in
/dev/.mdadm/failed-slots directory in form of file named with value of
f<path-id> containing the metadata type and uuid of the array (or
container) that the device was a member of. The uuid is in exactly
the same format as in the mapfile.
FAILED_SLOTS_DIR constant has been added to hold the location of
cookie files.
added --path <path_id> to give the information on the 'path-id' of removed device
<path-id> allows to identify the port to which given device is plugged
in. In case of hot-removal, udev can pass this information for future
use (eg. write this name as 'cookie' allowing to detect the fact of
reinserting device to the same port).
--path <path-id> parameter has been added to device removal handle
(and char *path has been added to IncrementalRemove() to pass this
value) in order to pass path-id to this handler.
NeilBrown [Mon, 22 Nov 2010 09:58:06 +0000 (20:58 +1100)]
Assemble: simplify the handling of is_member_busy.
This is somewhat inconsistent with the last member of a
container getting special handling.
Just simplify it so the code seems to make sense and important
is easy to follow.
NeilBrown [Mon, 22 Nov 2010 09:58:05 +0000 (20:58 +1100)]
Remove content from mddev_dev
Now that the next_member loop is much smaller it is easy to
just use 'content' rather than stashing it in 'tmpdev->content'.
So we can remove the 'content' field from 'struct mddev_dev'.
NeilBrown [Mon, 22 Nov 2010 09:24:35 +0000 (20:24 +1100)]
Use new container_content rather than passing subarray to load_super.
Now that we can ask container_content for a specific subarray,
we don't need to pass the subarray name to load_super, and have it
secretly modify the returned state.
NeilBrown [Mon, 22 Nov 2010 08:35:25 +0000 (19:35 +1100)]
mapinfo: simplify subarray handling.
We don't need ->container_dev here, and we will soon be passing
subarray as an explicit arg to load_super.
So simplify extraction of subarray and move the strcpy close to
->load_super.
NeilBrown [Mon, 22 Nov 2010 08:35:25 +0000 (19:35 +1100)]
Assemble - avoid including wayward devices.
If a device - typically in a mirrored set - is assembled independently
of the other devices, and then attempted to be brought back into the
set it could contain inconsistent data. It should not be included.
So detect this situation by ensuring that the 'most recent' device is
believed to be active by every other device. If a device is wayward,
it will only consider fellow wayward devices to be active and will
think all others are failed or missing.
This patch only fixes --assemble, not --incremental