git.ipfire.org Git - thirdparty/mdadm.git/log

imsm: FIX: initalize reshape progress as it is stored in metatdata

reshape prodess cannot be restarted due to no checkpoint information
in mdinfo.
When metadata is read during reshape process or reshape restart,
rehape_progress (mdinfo field) has to be initialized to value
stored in metadata. This allows start reshape from stored
in metadata checkpoint.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

set default chunk in validate_geometry

When chunk size is not set from command line we need to guess it
depending on metadata given on command line or found on listed devices.

Validate_geometry sets the default for it's metadata if chunk is not set.
For external metadata chunk is set only when creating in a container.
For imsm validate_geometry_imsm_orom is responsible for finding default
chunk depending on container metadata loaded. Container will already know
which controller it is attached to, and have this controllers orom
available.
do_default_chunk indicates that we need to find default chunk and
if validate_geometry fails for some metadata it tells us to reset chunk
that may have been set.

Current solution would set default chunk correctly for imsm only if
container device was given on command line. With the list of devices
chunk was always set to 512.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

fix: memory leak in Create

match_metadata_desc allocates memory for st
which is not needed after validate_geometry fails

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

modified message on failure to read metadata in Manage

Loading container may fail if e.g. one of the disks in container
has been detached but udev has not realized the change.
Addition to such array will fail because reading superblock
from one of disks in array fails.
Current message is a bit confusing.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

FIX: sysfs_disk_to_scsi_id() adapted to current sysfs format

Problem: sysfs_disk_to_scsi_id() not returns correct scsi_id value.
Reason: sysfs format has been changed

This patch adapt sysfs_disk_to_scsi_id() to new sysfs format.

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

User space RAID-6 access fix

> I have applied some patch - with some formatting changes to make it consistent
> with the rest of the code.
>
> I don't really have time to look more deeply at it at the moment.
> Maybe someone else will?...

Hi Neil,

thanks for including this in git.

Actually I did it look at it :-) and I already found a
couple of issues, below is a small fix patch.

Signed-off-by: NeilBrown <neilb@suse.de>

FIX: Seg Fault in incremental if BBM log detected

Bug detected for imsm metadata.
Assembling of array using Incremental switch generate segmentation
fault if BBM log is detected.
Reason: missing return from Incremental_container if BBM is detected
and unnecessary list=NULL assignment.
This patch fix the problem and memory leak in this area.

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

mdadm.man add encouragement to shrink filesystem before shrinking array.

Before resizing an array with --size or --array-size, then filesystem
should be resized. mdadm cannot do this so the user should.

Reported-by: Gavin Flower <gavinflower@yahoo.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Fix regression with removing 'failed' and 'detached' devices.

If a request to remove all 'failed' or 'detached' devices chooses to
remove the first device, it will not actually try the removal and will
skip any following devices.

This fixes it.

Reported-by: Rémi Rérolle <rrerolle@lacie.com>
Tested-by: Rémi Rérolle <rrerolle@lacie.com>
Signed-off-by: NeilBrown <neilb@suse.de>

analyse_change: fix calculation of after.data_disks and ->delta_disks.

When changing level when a new number of raid disks was explicitly
specified, we much make sure that the change implied by the
change in level is properly incorporated into the final result.

So explicitly track the change in number of parity disks
(delta_parity) and use it together with delta_disks to determine
final data_disks.
Also set info->delta_disks so other code doesn't need to mirror
this analysis.

And add some errors in cases where a new number of disks was
requested but is not currently supported

Reported-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

FIX: Add raid5 to raid0 case to analyse_change()

Transition raid5 to raid0 was not covered in analyse_change()
Missing case added.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

imsm: FIX: Add spare disks information to array description

Spares that are specified on container can be used by any array in container.
this means that for every array in container they should be reported.
This let caller know how many spare devices (not used in any array)
are still available.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

FIX: Get spares from external metadata

For external metadata cases, information about number of spares cannot
be get via ioctl GET_ARRAY_INFO for particular array
(as info variable is initialized by). In md this information is present
in container object not array one.
This causes need to get spare disks number from external metadata.

This information is required for reshape_array() function to decide
if spare disks number satisfy operation requirements.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

FIX: delta_disk can have UnSet value

Delta_disk can be set to UnSet value.
This can a cause to pass wrong parameter to reshape_super().
To avoid such situations raid_disks and delta_disks parameters
have to be passed to reshape_super() separately.
It will be up to reshape_super() function validation
and usage of this parameters to avoid not valid values.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

fix: imsm: assemble doesn't restart recovery

Because IMSM_ORD_REBUILD is set in second map not first.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

fix: imsm: size must be in K for rounding to chunk

chunk is in K so size must be converted to K before it is rounded.
Otherwise we may get wrong freesize returned
resulting in creation failure.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

imsm: Add information about failed disk to '-E' option

During metadata printout in '-E' option failed disk map field
information is missing. Add this information to mdadm '-E' option
output.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

FIX: array after migration should be unfrozen

After level migration array is left frozen.
When process is not externally forked reshape_array() should
unfreeze array before exit (this is not container operation).

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

imsm: Use single migration type for all migrations

Use single enum definition/migration type for all migrations.
Using separate definitions causes limitation for number of changes
in metadata implementation during single update for migration/reshape.
Single CH_MIGRATION enum allows for many mtadata parameters change
in single update. It will be possible to change i.e. chunk size together
with raid level. In current implementation 2 metadata updates would be
required for such action, one using CH_CHUNK_MIGR and second using
CH_LEVEL_MIGRATION migration type.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

imsm: FIX: add raid5 to raid0 case to analyse_change()

Transition raid0 to raid5 is not possible
due to wrong condition in imsm_analyze_change().
Current condition blocks migration possibility instead allow for it.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

imsm: Test for raid1 -> raid0 takeover added

Patch introduces test for raid1 to raid0 takeover operation
verification for imsm metadata format.

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

UT FIX: Pass all UT in suit 13

Parameters in UT suit 13 were wrongly chosen. This causes that
computed number of backup blocks was too big comparing to loop devices
size used in test. Changes in test parameters (chunk, array size)
causes that backup blocks passes mdadm condition for very small
loop devices.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

FIX: md runs recovery instead reshape for growing single disk raid0 array

Problem occurs when we want to expand single disk raid0 array.
This is done via degraded 2 disks raid4 array. When new spare disk
for reshape is added to array, md immediately initiates recovery before
mdadm can configure and start reshape. This is due fact that 2 disk
raid4/5 array is special md case.
Mdmon does nothing here because container is blocked.
This is caused because after takeover array is not in frozen state in md.

Put array in to frozen state after takeover to allow mdadm to finish
configuration before reshape is executed in md.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

FIX: Container can be left frozen

When container operation fails before child process starts,
array can be left frozen because container_reshape() doesn't make
unfreeze() operation in all error cases, as it is responsible for.

add unfreeze() operation for error case scenarios in reshape_container()

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

UT FIX: imsm container can have different blocks number

When imsm container is created it have different blocks number
in /proc/mdstat depending on containing array raid level (raid0/raid5).
raid5 case:
md127 : inactive sdd[3](S) sde[2](S) sdc[1](S) sdb[0](S)
2884 blocks super external:imsm

raid0 case:
md127 : inactive sdd[3](S) sde[2](S) sdc[1](S) sdb[0](S)
836 blocks super external:imsm

Due to this, it cannot be compared to one value for both cases.

Test imsm container existence before unit test run only.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

imsm: FIX: Wrong output string format

On wrong starting container reshape conditions, mdadm returns information on console:
mdadmimsm: Operation is not allowed on this container

it is changed to:
mdadm: (imsm) Operation is not allowed on this container

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Spare migration tests updated

Added tests for cases when:
0 - there is no config file at all
0a - config file has no domains defined
9a - spare is in global domain
15 - spare is in global domain for $platform metadata

Test 4 pass condition changed for imsm metadata.

disk_policy only adds controller domain for imsm
when config_rules_has_path=1.
If there are any domains in config then every disk will have also
a controller domain.
As a result the array needing spares has at least
one controller domain even when it doesn't have any other
domains specified explicitly. In this case any spare in the same
controller domain matches when it has no other domains from config.
(This behaviour is different from the case when there is no paths
in config at all as array domain then is null and no spare matches)

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

fix: array is reassembled inactive if stopped during resync

If initial resync or recovery of a redundant array is not finished
before it is stopped then during assembly md will start it as inactive.
Writing readonly to array_state in assemble_container_content
fails because md thinks the array is during reshape.

In fact we only have a reshape if both current and previous map
states are the same. Otherwise we may have resync or recovery.
Setting reshape_active in such cases causes the issue.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

User space RAID-6 access

> test_stripe assumes that the data starts at the start of each device.
> AS you are using 1.2 metadata (the default), data starts about 1M in to
> the device (I think - you can check with --examine)
>
> You could fix test_stripe to put the right value in the 'offsets' array,
> or you could create the array with 1.0 or 0.90 metadata.

Hi Neil,

thanks for the info, maybe this should be a second patch.

In the meantime, please find attached a patch to restripe.c
of mdadm 3.2 (latest, I hope).

This should add the functionality to detect, in RAID-6,
which of the disks potentially has problems, in case of
parity errors.

Some checks take place in order to avoid false positives,
I hope these are correct and enough.

I'm not 100% happy of the interface (too much redundancy),
but for the time being it could be OK.

Of course, any improvement is welcome.

Please consider to include these changes to the next mdadm
whatever release.

bye,

Signed-off-by: NeilBrown <neilb@suse.de>

imsm: FIX: Size is already set in metadata

In metadata size is set already during migration initialization.
There is no reason for second /the same/ calculation.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

imsm: FIX: size have to be calculated based on first map

Before reshape finalization migration is still present in metadata.
After patch 'imsm: FIX: crash during getting map'
function get_imsm_map() returns correct value,
this means in our case from second (start) map.

We should calculate map size basing on first (final) map.
For this we should request it by setting second function parameter to '0'

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

imsm: move common code for array size calculation to function

Array size calculation is made in the same way in few places in code.
Make function imsm_set_device_size() for this common code.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

imsm: FIX: Debug strings cleanup

Some debug strings remains as they were introduced,
before code was moved to separate function.
Information displayed by debug information in not all cases
was correct.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

imsm: fix: imsm_num_data_members() can return error

imsm_num_data_members() can indicate error by returning 0 value
In such case size cannot be set based on 0 value.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

imsm: FIX: put expansion finalization in to one place

When a->last_checkpoint variable can reach array end,
reshape finalization can be put in to single place.

There is no need to reset migration variables.
imsm_set_disk() will call end_migration() and this sets
all migration variables to required values.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

imsm: FIX: array size is wrong

Calculation of size is almost ok, except concept of blocks.
Size for setting in md has to be divided by 2 to be correct.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

FIX: Last checkpoint is not set

When reshape is finished monitor has to set last checkpoint
to the array end to allow metatdata for reshape finalization.
Metadata has to know if reshape is finished or it is broken
On reshape finish metadata finalization is required.
When reshape is broken, metadata must remain as is to allow
for reshape restart from checkpoint.

This can be resolved based on reshape_position sysfs entry.
When it is equal to 'none', it means that md finishes work.
In such situation move checkpoint to the end of array.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

imsm: FIX: crash during getting map

When get_imsm_map() is called with second_map parameter == '-1'
and array is not in migration state NULL pointer is returned.
This is wrong. '-1' means return map as migration record points.

'-1' can be passed to get_imsm_map() from imsm_num_data_members().
imsm_num_data_members() is called to get current map members based
on migr_state information

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Release mdadm-3.2 - developer only release

Signed-off-by: NeilBrown <neilb@suse.de>

Various compile fixes.

Make "make everything" succeed.
This fixed some real bugs.

Signed-off-by: NeilBrown <neilb@suse.de>

Various man page fixes.

Signed-off-by: NeilBrown <neilb@suse.de>

tests: add IMSM_NO_PLATFORM to some places that were missing it.

Signed-off-by: NeilBrown <neilb@suse.de>

managemon: don't try to add spares when resync/recovery is happening.

kernel should reject this anyway, and we really should not be trying
as it can only lead to confusion.

Signed-off-by: NeilBrown <neilb@suse.de>

Allow explicitly listed spared to be included by default.

When the metadata doesn't identify which array a spare belongs to
we normally require an explicit domain match to connect a spare
with an array.
However when the spare is explicitly listed in argv, it should be
safe to include as long as there is no domain conflict.

Signed-off-by: NeilBrown <neilb@suse.de>

Allow domain_test to report that no domains were found.

Sometime we will need to know the difference between no domains found
and domains didn't match.
So allow domain_test to return different values and fix up all callers
to maintain current behaviour.

Signed-off-by: NeilBrown <neilb@suse.de>

test: remind where the log file is.

Signed-off-by: NeilBrown <neilb@suse.de>

test: remove all the environment handling.

Instead, just include the environ explicitly in the test file
or, where shared, source the shared file.

Signed-off-by: NeilBrown <neilb@suse.de>

Incr: don't exclude 'active' devices from auto inclusion in a container.

For containers, it is always appropriate to include a device in the
container.
Whether it should then be included in an array is a separate question.

Signed-off-by: NeilBrown <neilb@suse.de>

free_super after assembling a container

Else the devices are held open.

Signed-off-by: NeilBrown <neilb@suse.de>

Assemble: ignore unknown devices not listed on command line.

If we find a device that has not superblock, we currently fail
unless in auto_assem mode.
However we really should only fail if the device was explicitly listed
in the arg list. So add a test for that.

Signed-off-by: NeilBrown <neilb@suse.de>

Assemble: allow to assemble container with uuid=0:0:0:0

When there are any arrays in config file the spares with
domain not matching any array are not assembled because
auto assembly is not attempted.
Addition of ARRAY line with uuid=0:0:0:0 in config will work
with modified condition for gathering spares.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Monitor: do not move partitions to external container

Arrays on partitions are not supported for external metadata
so do not take such spare from native array.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

imsm: FIX: map coping causes mdmon crash

Too big map was copied (outside allocated memory) and this causes
mdmon crash for 2 raid0 arrays in container.
Map of correct (smaller) size should be copied,
to not overwrite any internal data.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

imsm: FIX: mdmon crash during 2 raid0 arrays expansion

When expansion is run on 2 raid0 arrays in container no update
is sent to mdmon because mdmon is off (mdadm performs update)
Memory size for first reshaped array is allocated to satisfy memory
requirements for expanded maps.
Memory for second device is allocated using old disks number, as in
metadata there is no information about this array reshape.
When mdmon initiates second array reshape it overwrites internal
structures and crashes). There is no place to keep expanded maps.
To avoid this situation during loading metadata, allocated memory
should be performed using the maximum used disks number in particular
container.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

imsm: Update metadata for second array

When second array reshape is about to start external metadata should
be updated by mdmon in imsm_set_array_state(). For this purposes
imsm_progress_container_reshape() is reused.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

imsm:FIX: change arrays reshape order

Reshape is started from second array, so it causes imsm incompatibility
and problems during second array start.

Reshape should be started in arrays metadata order.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Grow: make sure to break out of the backup loop when finished.

If there is nothing more to backup, then break out of the loop.

Signed-off-by: NeilBrown <neilb@suse.de>

Make sure odisks is consistent between creating and using the fdlist

reshape_prepare_fdlist and child_monitor currently have slightly
different ideas of the 'old number of raid devices' which can cause
major confusion.

So settle on one value, and assign it to odisks early and always use
it.

Signed-off-by: NeilBrown <neilb@suse.de>

Grow: round max_progress to old chunk size too.

kernel requires sync_max to be a multiple of the current
chunk size. This is not really 'correct', but we need to work
with it. So round down.

Signed-off-by: NeilBrown <neilb@suse.de>

Allow test to detect 'resync=DELAYED' state

There is no space around the '=' when resync is delayed,
so allow for that in pattern matching.

Signed-off-by: NeilBrown <neilb@suse.de>

Initialise all of file when opening backup file for reshape.

Due to a miscalculation we didn't initialise the whole file.
There is 4K (8 sectors) for the metadata, then the data.

Signed-off-by: NeilBrown <neilb@suse.de>

Grow: when restarting, do set new details if they are already set.

When restarting a reshape with internal metadata, the new geometry
is already set and the reshape has been start (but has not been
allowed to continue yet).

So in that case, don't set things and don't ask for a reshape.

Signed-off-by: NeilBrown <neilb@suse.de>

Grow:make sure 'array' is up-to-date before SET_ARRAY_INFO

The value of 'array' might not be current, so SET_ARRAY_INFO
and fail. Just refresh it before setting raid_disks.

Signed-off-by: NeilBrown <neilb@suse.de>

Grow: don't try setting new geometry when restarting a native reshape.

md won't let us change raid_disks in this case, so don't even try.

Signed-off-by: NeilBrown <neilb@suse.de>

open_mddev: open RDONLY if RDWR doesn't work.

If an array is read-only then "mdadm -S"
cannot open it to stop it without this fix.

Signed-off-by: NeilBrown <neilb@suse.de>

Call free_super before attempting to add a new device

Now that write_init_super doesn't close fds any more, we need
to call free_super before the ADD_NEW_DISK ioctl.
Also call free_super before some error returns, for cleanliness.

Signed-off-by: NeilBrown <neilb@suse.de>

Man pages update for policy framework

Includes description of POLICY line in /etc/mdadm.conf
and of changes in Monitor and Incremental related to autorebuild.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

11spare-migration: pass conditions for tests 9 and 12 should be reversed

Test 9: We do not block spare migration between different metadatas.
test 13: Migrated spare must belong the same domain as destination -
there is no additional condition for action.

Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

env-11spare-migration: imsm requires IMSM_NO_PLATFORM set with loop devices

By default IMSM checks if member device belongs to AHCI or ISCI controller.
When using loop devices one must disable these checks by setting
IMSM_NO_PLATFORM.

Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Call free_super earlier when creating an array.

As free_super now closes fds for member devices, rather than
write_init_super doing it, we need to call free_super earlier,
so that the device (on which we hold an O_EXCL open) is closed
before it is added to the array.

So close at the end of pass-1 rather than after pass-2.

Signed-off-by: NeilBrown <neilb@suse.de>

super1: fix regression in write_init_super.

Now that a 'supertype' container more information, the simplistic
copying of 'st' into 'refst' is incorrect and results in closing
some fds when load_super1(refst) calls free_super().

So do it more correctly using dup_super.

Reported-by: "Labun, Marcin" <Marcin.Labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

imsm: FIX: not all disks are released in free_imsm_disks()

Adding spare disks to imsm container fails due to problem with writing
new_dev to sysfs. This problem was caused by not closed handle
(opened exclusively) in Manage.c:803.

Disk handle was not closed by free_imsm().
This is due to not released disk_mgmt_list in free_imsm_disks().

Proper release of imsm metadata allows for spare adding without problems.
Memory leak was fixed also.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Monitor: avoid adding too many spares to container

Tests revealed that sometimes there are still more spares taken
than needed. The reason for this is that after adding one spare
to container with degraded subarray
if between ioctl in main loop and load_container in try_spare_migration
mdmon activates the spare we see active<raid but find no spares in parent
container and so add an extra spare.

To prevent such behaviour we count active disks in the list returned
by getinfo_super_disks and compare it with subarray->active.
If the number has increased it means new spare was added and activated
so there is no need for more.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

FIX: Meet SET_ARRAY_INFO ioctl requirements

Problem has been observed when raid10<->raid0 takeover operation
is executed.
In code updating layout, raid_disks and chunk_size for non-restriping
operations in reshape array functions SET_ARRAY_INFO ioctl call was
not succeeded.
Takeover process finish execution with error, mdadm shows message:
"mdadm: failed to set disks"

Cause is not meeting SET_ARRAY_INFO ioctl requirements:
- only one parameter may be changed at one time
- level of current array info and new info should be the same

Patch introduces solution for this issue.
At the beginning of discussed code we read current information
about array and then compare them with new values should be set.
If particular value is different (and should be set),
we are overwrite only this one in array info and then call ioctl.

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

FIX: Remove disks in mdmon for external metadata

For raid10 -> raid0 takeover operation we should reject disks
in mirror by marking them as 'failed' and then remove them from
array by writing "remove" to disk state.

For external metadata second action is executed by mdmon.
According the description in monitor.c:175 when monitor detect
"faulty" in device state, it blocks the device, mark it as failed
in metadata, unblocks the device and finally writes "remove"
to device state.
For external case writing "remove" to device state in mdadm
is not necessary and harmful.
It may cause following issues:
1. "remove" operation for external case in mdadm is not finish
with successful result because monitor may block the device or disk
has been already removed by monitor.
2. If disk is removed by mdadm earlier than mdmon catch "failed" state,
metadata is not properly updated- is not marked as failed.

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

WORKAROUND: mdadm hangs during reshape (PART #2)

After loop can occurs that due to 0 value reported by kernel
we have 0 in completed variable.
This is wrong. we are interested in real completed point.
0 value means that we reached sync point set in md,
so we can set completed variable to just reached point.
this point value is stored in max_progress variable.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

FIX: start_reshape status should be checked

mdadm should verify if reshape is started before it goes
in to check-pointing machine.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

FIX: Array after takeover has to be frozen

Problem occurs when we want to expand single disk raid0 array.
This is done via degraded 2 disks raid4 array. When new spare
is added to array, md immediately initiates recovery before
mdadm can configure and start reshape. This is due fact that 2 disk
raid4/5 array is special md case. Mdmon does nothing here because
container is blocked.
Put array in to frozen state allows mdadm to finish configuration
before reshape is executed in md.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

imsm: FIX: do not allow for container operation for the same disks number

imsm_reshape_super() currently allows for expansion when requested
raid_disks number is the same as current.
This is wrong. Existing in code condition is too weak.
We should allow for expansion when new disks_number is greater
than current only.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

fix extended partition detection

# mdadm --detail --export /dev/md127p1

Before:
MD_LEVEL=raid5
MD_DEVICES=4
MD_METADATA=0.90

After:
MD_LEVEL=raid5
MD_DEVICES=4
MD_CONTAINER=/dev/md0
MD_MEMBER=0
MD_UUID=55746a20:925d24a7:4f9bd7e2:9c9a411f

We parse the symlink target with a format:

../../block/mdXXX/mdXXXpYY

...and need the second '/' from the end of the string to read detect a
'md' device.

Reported-by: Krzysztof Wasilewski <krzysztof.wasilewski@intel.com>
Cc: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Dynamic hot-plug udev rules for policies

Neil,
Please consider this patch that once was discussed and I think agreed with in general direction. It was sent a while ago
but somehow did not merged into your devel3-2. This patch enables hot-plug of so called bare devices (as understand by domain policies rules in mdadm.conf).
Without this patch we do NOT serve hot-plug of bare devices at all.

Thanks,
Marcin Labun

Subject was: FW: Autorebuild, new dynamic udev rules for hot-plugs

>>From c0aecd4dd96691e8bfa6f2dc187261ec8bb2c5a2 Mon Sep 17 00:00:00 2001
From: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Date: Thu, 23 Dec 2010 16:35:01 +0100
Subject: [PATCH] Dynamic hot-plug udev rules for policies
Cc: linux-raid@vger.kernel.org, Williams, Dan J <dan.j.williams@intel.com>, Ciechanowski, Ed <ed.ciechanowski@intel.com>
When introducing policies, new hot-plug rules were added to support
bare disks. Mdadm was started for each hot plugged block device
to determine if it could be used as spare or as a replacement member for
degraded array.
This patch introduces limitation of range of devices that are handled
by mdadm.
It limits them to the ones specified in domains associated with
the actions: spare-same-port, spare and spare-force.
In order to enable hot-plug for bare disks one must update udev rules
with command

mdadm --activate-domains[=filename]

Above command writes udev rule configuration to stdout. If 'filename'
is given output is written to the file provided as parameter. It is up
to system administrator what should be done later. To make such rule
permanent (i.e. remain after reboot) rule should be writen to
/lib/udev/rules.d directory. Other cases will just need to write it to
/dev/.udev/rules.d directory where temporary rules lies. One should be
aware of the meaning of names/priorities of the udev rules.

After mdadm.conf is changed one is obliged to re-run
"mdadm --activate-domains" command in order to bring the system
configuration up to date.
All hot-plugged disks containing metadata are still handled by existing
rules.

Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Ignore/don't set data_disks for level=1

When analyse_change sets level=1, data_disks is meaningless
as is layout.
So don't set them, and make sure we ignore them.

Signed-off-by: NeilBrown <neilb@suse.de>

Mistake in raid1->raid5 migration

1. Mistake in target level comparison.
2. Initialize reshape->after.data_disks field
to proper spares_needed calculation

Signed-off-by: NeilBrown <neilb@suse.de>

Add raid1->raid0 takeover support

Add support for raid1 to raid0 takeover operation in user space.
This patch includes support for native and imsm metadata.

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

WORKAROUND: mdadm hangs during reshape

During reshape when reshape is finished in md, progress_reshape() hangs
on select().
This is because 'sync_completed' is reset to zero before 'sync_action'
becomes 'idle', and we don't look for notification on 'sync_action'.

So if completed becomes zero after reshape_progress has made some
progress, then deduce that reshape has finished.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

FIX: monitor doesn't handshake with md

when in container are present raid0 and raid5 arrays, and reshape order is:
1. raid0 array
2. raid5 array

mdadm cannot set new raid_disks for raid0 array. For this action md has to have
handshake with mdmon. We have the following conditions:
1. Raid0 is not monitored
2. raid0 has been just takeovered to raid4/5 (it has to be monitored
3. monitor has to start monitor new raid4/5 array
4. monitor is not started (it is started to second raid5 array)
In such situation pig_monitor is required to let know to m monitor about new array
(not in the starting monitor case only)

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

imsm: support for Intel SAS controller in get_disk_controller_domain handler

get_disk_controller_domain recognizes Intel (R) SAS controller (isci).
The function returns three different strings that differentiate disk attached
to AHCI, ISCI or unknown controller types to create separate domains
for each case.

Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

imsm: detail_platform_imsm supports Intel SAS controller (isci driver)

Added support in detail_platform_imsm for Intel (R) SAS controller.
Function supports AHCI and ISCI controllers. RAID properties are derived
from common OROM for both types.

Based on code From: Artur Wojcik <artur.wojcik@intel.com>
Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

imsm: prepare detail_platform_imsm to support different types of controllers

Pull out the AHCI specific parts of detail_platform_imsm to separate functions.
Introduce support new types of controllers.

Based on code From: Artur Wojcik <artur.wojcik@intel.com>
Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

imsm: support for Intel(R) SAS controller in imsm handler

add_to_super_imsm handler is able to recognize new type of controller.
It stores the controller information in its structures and blocks
mixing of different controller type in the same container.
In this way it maintains compatibility between Linux and Windows IMSM RAID
stacks. IMSM metadata does not allow arrays to span on devices attached to
different storage controllers.

Based on code From: Artur Wojcik <artur.wojcik@intel.com>
Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

imsm platform: support for Intel(R) SAS controller.

This patch adds platform support for SAS controller(s) built in Intel(R) Patsburg
chipset.

Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: Artur Wojcik <artur.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

FIX: Reset disk state if disk is missing

If we can't read actual disk state, it shoud be initiated
to 0.
Overwise it may be out of date value resulting false action
later in code (e.g. set disk to improper state).

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Check number of failed disks durig raid10->raid0 takeover

Number of failed disks MUST be half of initial number of disks.
If number of failed disks is different we should not update
metadata- data corruption may occur after array reassemlation.

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

raid0->raid10 takeover- process metadata update

Implementation of raid0->raid10 takeover metadata update
at process_update level.
- We are using memory previously allocated in prepare_update to
create two dummy disks will be inserted in the metadata and
new imsm_dev structure with expanded disk order table.
- Update indexes in disk list
- Update metadata map
- Update disk order table

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

raid0->raid10 takeover- allocate memory for added disks

Allocate memory will be used in process_update.
For raid0->raid10 takeover operation number of disks doubles
so we should allocate memory for additional disks
and one imsm_dev structure with extended order table.

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

raid0->raid10 takeover- create metadata update

Create metadata update for raid0 -> raid10 takeover.
Because we have no mdmon running for raid0 we have to
update metadata using local update mechanism

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Add raid10 -> raid0 takeover support

The patch introduces takeover from level 10 to level 0 for imsm
metadata. This patch contains procedures connected with preparing
and applying metadata update during 10 -> 0 takeover.
When performing takeover 10->0 mdmon should update the external
metadata (due to disk slot and level changes).
To achieve that mdadm calls reshape_super() and prepare
the "update_takeover" metadata update type.
Prepared update is processed by mdmon in process_update().

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Fix some issues with setting 'new' state of a reshape

- when reshaping a container, ->reshape_active is already set
  even though it isn't really active yet, so we need to set
  the new geometry even when reshape_active is set.  This is safe.

- When restarting a reshape, make sure the reshape_position is set
  appropriately when external metadata is used.

Signed-off-by: NeilBrown <neilb@suse.de>

Don't close fds in write_init_super

We previously closed all 'fds' associated with an array in
write_init_super .. sometimes, and sometimes at bad times.
This isn't neat and free_super is a better place to close them.

So make sure free_super always closes the fds that the metadata
manager kept hold of, and stop closing them in write_init_super.

Also add a few more calls to free_super to make sure they really do
get closed.

Reported-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Fix up analysis of reshape from RAID1 to RAID5.

Need to allow raid-disks to change at the same time.

NeilBrown <neilb@suse.de>