git.ipfire.org Git - thirdparty/mdadm.git/log

Assemble/Incremental: don't hold O_EXCL on mddev after assembly.

As soon as the array is assembled, udev or systemd might run
fsck and mount it. So we need to drop O_EXCL promptly.

Signed-off-by: NeilBrown <neilb@suse.de>

Two small fixes related to enough()

1/ enough_fd doesn't use avail_disks any more, so discard it.

2/ Manage_Add increments 'found' at the wrong place, so it can
waste time before calling enough().

Signed-off-by: NeilBrown <neilb@suse.de>

Incremental: improve support for "DEVICE" based restriction in mdadm.conf

--incremental currently fails if the device name passed does not
textually match the names permitted by the DEVICE line in mdadm.conf.
This is problematic when "mdadm -I" is run by udev as the name given
can be a temp name.

This patch makes two improvements:
1/ We generate a list of all existing devices that match the names
  in mdadm.conf, and allow rdev based matching
2/ We allow extra aliases to be provided on the command line, and
  perform textual matching on those.  This is particularly suitable
  for udev usages as ${DEVLINKS} can be provided even though the links
  may not yet be created.

Signed-off-by: NeilBrown <neilb@suse.de>
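
The textual alias matching in point 2 can be illustrated roughly as
follows. This is only a sketch: device_permitted() and the sample names
are hypothetical, Python's fnmatch stands in for whatever glob matching
mdadm performs, and the rdev based matching of point 1 is not shown.

```python
from fnmatch import fnmatchcase

def device_permitted(name, aliases, device_patterns):
    """Return True if the name or any alias matches a DEVICE pattern."""
    candidates = [name, *aliases]
    return any(
        fnmatchcase(candidate, pattern)
        for candidate in candidates
        for pattern in device_patterns
    )

# Hypothetical DEVICE line content and udev-style names.
patterns = ["/dev/sd*"]
temp_name = "/dev/.tmp-block-8:16"

# The temp name alone does not match; an alias supplied by udev does.
print(device_permitted(temp_name, [], patterns))
print(device_permitted(temp_name, ["/dev/sdb"], patterns))
```

The point of the aliases is that the match can succeed before the links
named in them exist on disk.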

Systemd integration for starting newly-degraded arrays.

Normally "mdadm -I" will not start an array if it has reason to
expect further devices.
This means that if a device is removed while the host is shut down,
"mdadm -I" will never start the device.

If the array is known to the host, it makes sense to start the array
anyway after a reasonable timeout.

This patch adds systemd/udev infrastructure so that 30 seconds after
a known array first becomes able to be assembled as a degraded array,
the array will be assembled even if more devices are still expected.

Signed-off-by: NeilBrown <neilb@suse.de>

Incremental: add --export handling.

If --export is given with --incremental, then
  MD_DEVNAME
is output which gives the name of the device (in /dev/md) that
is the array (or container) that the device would be added to.
Also
  MD_STARTED
is set to one of
  no
  unsafe
  yes
  nothing

to indicate if the array was started.  If MD_STARTED=unsafe
then it may be appropriate to run
  mdadm -R /dev/md/$MD_DEVNAME
after a timeout to ensure newly degraded arrays are started.

If
  MD_FOREIGN=yes
it might be appropriate to suppress this as the array is
probably not critical.

Signed-off-by: NeilBrown <neilb@suse.de>
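
A caller consuming this output might act on it along these lines. This
is a sketch under assumptions: the KEY=value line format is inferred
from the variable names above, the sample text is made up, and
parse_export() and should_schedule_run() are hypothetical helpers, not
part of mdadm.

```python
def parse_export(text):
    """Parse KEY=value lines into a dict, ignoring anything else."""
    env = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key:
            env[key] = value
    return env

def should_schedule_run(env):
    """True when a delayed 'mdadm -R' looks appropriate per the text above."""
    return env.get("MD_STARTED") == "unsafe" and env.get("MD_FOREIGN") != "yes"

# Hypothetical output, not captured from a real run.
sample = "MD_DEVNAME=home\nMD_STARTED=unsafe\n"
env = parse_export(sample)
if should_schedule_run(env):
    # Only print the command; actually running it is left to the caller.
    print("after timeout: mdadm -R /dev/md/" + env["MD_DEVNAME"])
```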

Restructure assemble_container_content and improve messages.

We lose one level of indent, and now get told the difference between
'not assembled because not safe' and 'not assembled because not enough
devices'.

Signed-off-by: NeilBrown <neilb@suse.de>

Incremental: don't abort container if one member explicitly disabled.

If a member of a container is explicitly disabled, others may not
be so we should continue.

Signed-off-by: NeilBrown <neilb@suse.de>

Incremental: remove test that can never succeed.

Incremental_container never returns 1, so this test is pointless.
It is a holdover from when we called "Incremental()" rather than
"Incremental_container()" at this point.

Signed-off-by: NeilBrown <neilb@suse.de>

IMSM metadata really should be ignored when found on partitions.

commit b31df43682216d1c65813eae49ebdd8253db8907
changed load_super_imsm to not insist on finding a partition if
ignore_hw_compat was set.
Unfortunately this is set for '--assemble' so arrays could get
assembled badly.

The comment says this was to allow e.g. --examine of image files.
A better fix for this is to change test_partitions to not report
a regular file as being a partition.
The errors from the BLKPG ioctl are:

ENOTTY : not a block device.
EINVAL : not a whole device (probably a partition)
ENXIO : partition doesn't exist (so not a partition)

Reported-by: "David F." <df7729@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
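
The errno table above maps onto a small classifier. This sketch only
restates that table; it does not issue the BLKPG ioctl, and the function
names and return labels are invented for illustration.

```python
import errno

def classify_blkpg_errno(err):
    """Map an errno from the BLKPG ioctl to what it says about the target."""
    if err == errno.ENOTTY:
        return "not-block-device"      # e.g. a regular image file
    if err == errno.EINVAL:
        return "probably-partition"    # not a whole device
    if err == errno.ENXIO:
        return "whole-device"          # partition doesn't exist
    return "unknown"

def looks_like_partition(err):
    """Only EINVAL should make the caller treat the target as a partition."""
    return classify_blkpg_errno(err) == "probably-partition"

print(classify_blkpg_errno(errno.ENOTTY))
print(looks_like_partition(errno.EINVAL))
```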

ddf tests: fix get_rootdev

Getting the major number from the hex device number should take
all-but-the-last-two digits, rather than just the first two digits.

Signed-off-by: NeilBrown <neilb@suse.de>
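
To see why the first two digits are wrong, consider a hex device number
such as 801 (major 8, minor 1). The sketch below assumes the minor
number occupies the last two hex digits; the helper names are
hypothetical and the test script itself is shell, not Python.

```python
def split_hex_devno(devno_hex):
    """Split a hex device number such as '801' into (major, minor)."""
    major_part = devno_hex[:-2] or "0"   # all but the last two digits
    minor_part = devno_hex[-2:]          # the last two digits
    return int(major_part, 16), int(minor_part, 16)

def major_from_first_two_digits(devno_hex):
    """The old behaviour: only correct when the major has two hex digits."""
    return int(devno_hex[:2], 16)

for devno in ("801", "fd00"):
    print(devno, split_hex_devno(devno), major_from_first_two_digits(devno))
```

For 801 the old approach reads "80" and yields 128 instead of 8; for
fd00 both approaches agree on 253.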

Add support for --add-spare

--add-spare is like --add, but a --re-add is never attempted.
So it is equivalent to two separate commands:

--zero-metadata
--add

Signed-off-by: NeilBrown <neilb@suse.de>

Fix typos in mdadm.8.in

I found a small bug in the documentation of mdadm. I fixed it in my
local git clone of git://neil.brown.name/mdadm. Here is the change:

Signed-off-by: NeilBrown <neilb@suse.de>

Assemble: fix bug in force_array - it wasn't forcing properly.

Since 'best' was expanded to hold replacement devices, we might
need to go up to raid_disks*2 to find devices to force.

Also fix another place where considering replacement drives would
be wrong (the 'chosen' device should never be a replacement).

Reported-by: John Yates <jyates65@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Monitor: write meta data in readonly state, sometimes

This patch reverts 24a216bf:
"Monitor: Don't write metadata in inactive array state".

While it's true that writing meta data is usually not necessary
in readonly state, there is one important exception: if a
disk goes faulty, we want to record that, even if the array is
inactive.

We might as well just revert 24a216bf, because with the recently
submitted patch
"Monitor: don't set arrays dirty after transition to read-only"
those meta data writes that are really annoying (for a clean, readonly,
healthy array during startup) are gone anyway.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

tests/10ddf-incremental-wrong-order: new unit test

This is a test simulating two temporary missing disks. These will
have less recent meta data than the other disks in the container.
When the array is reassembled, we expect mdadm to detect that
and react to it by using the meta data of the more recent disks
as reference.

This test FAILS with mdadm 3.3 for DDF.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

tests/10ddf-assemble-missing: new unit test

This is a test case for handling incremental
assembly correctly after disks had been missing once.

This test is the basis for other similar but more tricky
test cases involving inconsistent meta data.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

tests/env-ddf-template: add helper function for checksums

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

tests/10ddf-fail-readd-readonly: new unit test.

A test for my recent patch "Monitor: write meta data in readonly state,
sometimes". Test that a faulty disk is recorded in the meta data.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

tests/env-ddf-template: fix container name

/dev/md/ddf0 works also with assembly. /dev/md/ddf doesn't.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

DDF: add_to_super_ddf: be careful with workspace_lba

Some vendor DDF structures interpret workspace_lba
very differently than we do. Make a sanity check on the value
before using it for config_size.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

tests/10ddf-fail-stop-readd: New DDF unit test

This is similar to 10ddf-fail-readd. The difference is that the
array is stopped and incrementally assembled before the disk is
re-added.

Signed-off-by: NeilBrown <neilb@suse.de>

tests/10ddf-fail-readd: New DDF unit test

This unit test is for a simple fail/remove/readd scenario.

Signed-off-by: NeilBrown <neilb@suse.de>

Monitor: don't set arrays dirty after transition to read-only

This patch reverts commit 4867e068. Setting arrays dirty after
transition from inactive to anything else causes unnecessary
meta data writes and may wreak trouble unnecessarily when
a disk was missing during assembly but the array was never
written to.

The reason for 4867e068 was a special situation during reshape
from RAID0 to RAID4. I ran all IMSM test cases with it reverted
and found no regressions, so I believe the reshape logic for
IMSM works fine in mdadm 3.3 also without this.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

DDF: compare_super_ddf: fix sequence number check

The sequence number check in compare_super_ddf was broken:
the anchor sequence number is always -1.

With this patch, mdadm will refuse to add a disk with non-matching
sequence number.

This fixes Francis Moreau's problem reported with subject
"mdadm 3.3 fails to kick out non fresh disk".

FIXME: More work is needed here. Currently mdadm won't even add the
disk to the container, that's wrong. It should be added as a spare.

Signed-off-by: NeilBrown <neilb@suse.de>

DDF test: make sure mdmon isn't started by systemd

For testing we usually want the locally built mdmon, not the
one systemd prefers.

Signed-off-by: NeilBrown <neilb@suse.de>

DDF tests: allow to run on systems without /dev/sda

Some DDF test scripts assume that /dev/sda is always present.
That's wrong e.g. on VMs. Use a more general approach.

Signed-off-by: NeilBrown <neilb@suse.de>

Be consistent in return types from byteswap macros

The bswap_*() macros return int values. Make sure we return the
equivalent types in same byteorder pass-through functions to avoid
problems with the original type leaking through to printf() etc.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Remove bashism from Makefile

Makefile uses [ x == y ] construct which does not work
with POSIX shell. Since this is just testing a flag,
replace it with string comparison (=) operator instead.

Signed-off-By: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: NeilBrown <neilb@suse.de>

Give error if --incremental --scan also has a device name given.

Signed-off-by: NeilBrown <neilb@suse.de>

Make -IRs and --run work properly for containers.

We really need to make sure assemble_container_content()
gets called to finish the assembly of these.

Reported-by: Francis Moreau <francis.moro@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

DDF: brief_examine_subarrays_ddf: print array name

Print an array name in brief output, like IMSM does.

SUSE's YaST2 (libstorage) needs this in order to detect MD arrays
during installation.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

DDF: factor out array name generation

The same algorithm was used in getinfo_super_ddf_bvd and
container_content_ddf. Put it in a common function.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

mdmon: honour --offroot, again

commit 3e32ba9d removed support for --offroot, and a9c15847 made
mdmon use @ in argv[0] only when started from initrd.

This breaks mdadm in OpenSUSE 12.3, which starts mdmon from the
root file system and relies on --offroot to work as documented earlier.

Reintroducing --offroot as an undocumented option, as its use is going to
go away soon anyway.

If this can't be applied, it should probably be included as distro-specific
patch if mdadm 3.3 is built for OpenSUSE 12.3. I haven't checked if the
patch is necessary for OpenSUSE Factory, too.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

DDF: allow for possibility that there is no secondary copy of metadata.

If there isn't, we currently write the second copy at some
random location :-)

Reported-and-tested-by: Francis Moreau <francis.moro@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

config: set "auto_seen" after processing the auto line.

Otherwise when we process an empty autoline (to be sure to
capture the MDADM_CONF_AUTO environment variable) we can end up
setting everything to 'yes' which over-rides 'no'.

Signed-off-by: NeilBrown <neilb@suse.de>

Move ARRAY_SIZE macro to common include file.

That way super-ddf can use it.

Signed-off-by: NeilBrown <neilb@suse.de>

DDF: handle fake RAIDs with changing subarray UUIDs

Some fake RAID BIOSes (in particular, LSI ones) change the
VD GUID at every boot. These GUIDs are not suitable for
identifying an array. Luckily the header GUID appears to
remain constant.

We construct a pseudo-UUID from the header GUID and those
properties of the subarray that we expect to remain constant.
This is only array name and index; all else might change e.g.
during grow.

Don't do this for all non-MD arrays, only for those known
to use varying volume GUIDs.

This patch obsoletes my previous patch "DDF: new algorithm
for subarray UUID"

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

Manage.c: fix small memory leak

'avail' is dynamically allocated, so it should be freed.

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

managemon: fix a dprintk.

There is no guarantee that 'inst' is a number, and even if there
were, there is no point converting it str->int and then int->str again.

Signed-off-by: NeilBrown <neilb@suse.de>

Minor fixes in mdadm.conf.5 man page.

Signed-off-by: NeilBrown <neilb@suse.de>

Release mdadm-3.3

(and various cosmetic fixes)

Signed-off-by: NeilBrown <neilb@suse.de>

config: support MDADM_CONF_AUTO= env var.

If a distribution allows the choice between using mdadm and
dmraid for DDF and IMSM to be made by some config file
(/etc/defaults/ /sys/sysconfig/ etc) which is queried by
/etc/init.d scripts, then the fact that mdadm implements this
choice through the config file is not very helpful.

So allow the "AUTO" line to be specified in part using MDADM_CONF_AUTO
in environment.

Signed-off-by: NeilBrown <neilb@suse.de>

config: refactor load_conffile() to have a single exit.

This will make next patch cleaner.
No functional change.

Signed-off-by: NeilBrown <neilb@suse.de>

Config: multiple occurrences of lines is not an error.

As we now support config directories it is helpful if
lines are allowed to occur multiple times with one
over-riding the other.
So stop giving warnings when later lines are ignored.

Signed-off-by: NeilBrown <neilb@suse.de>

config: read /etc/mdadm.conf.d as well as /etc/mdadm.conf

If a configfile is explicitly given, just that file or directory
is read. Otherwise we now read both a file
/etc/mdadm.conf
and a directory
/etc/mdadm.conf.d

This allows a transition to directory based config, which in turn
allows easy control from scripts.

Signed-off-by: NeilBrown <neilb@suse.de>

Conf: allow conf file to be a directory.

If config file is a directory, process each file within with a name
ending in ".conf" that doesn't start with ".".
Files are processed in lexical order.

Signed-off-by: NeilBrown <neilb@suse.de>
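
The selection rule described above amounts to a filter plus a sort. In
this sketch conf_files() is a hypothetical helper working on a plain
list of names, and Python's default string ordering is assumed to be
close enough to the "lexical order" the commit refers to.

```python
def conf_files(names):
    """Return the names to process: '*.conf', not hidden, in sorted order."""
    return sorted(
        name
        for name in names
        if name.endswith(".conf") and not name.startswith(".")
    )

# Hypothetical directory listing.
listing = ["b.conf", ".hidden.conf", "a.conf", "notes.txt", "10-x.conf"]
for name in conf_files(listing):
    print(name)
```

Numeric prefixes such as "10-" sort ahead of letters, which is what makes
this scheme convenient for scripts that drop in their own fragments.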

Config: factor reading of file out into separate function.

This will make it easier to read multiple files in a conf.d/

Signed-off-by: NeilBrown <neilb@suse.de>

mdmon: make sure we set safe_mode on SIGTERM.

Without this, array may not go clean and mdmon will then
not exit.

A safe_mode of '0' (which is the only one that is handled differently
by this patch) means "never switch to 'active_idle'". We don't want
that when mdmon is stopping.

Signed-off-by: NeilBrown <neilb@suse.de>

Assemble: don't ever consider a 'spare' to be the 'most recent'.

If all devices have the same event count and the first one is a spare,
then that spare will be the 'most_recent'.
However then other devices will think the 'most_recent' has failed
(for v0.90 metadata) and will be rejected from the array.

So never consider a 'spare' to be 'most recent'.

Reported-by: Andreas Baer <synthetic.gods@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
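
The rule can be sketched as a selection loop that skips spares. The Dev
record and most_recent() below are hypothetical simplifications; the
real code tracks more state than an event count and a spare flag.

```python
from dataclasses import dataclass

@dataclass
class Dev:
    name: str
    events: int
    is_spare: bool

def most_recent(devices):
    """Pick the non-spare device with the highest event count.

    On a tie the earliest device wins, so without the spare check a
    leading spare with an equal event count would be chosen.
    """
    best = None
    for dev in devices:
        if dev.is_spare:
            continue  # never consider a spare to be 'most recent'
        if best is None or dev.events > best.events:
            best = dev
    return best

devices = [Dev("sdc", 100, True), Dev("sda", 100, False), Dev("sdb", 100, False)]
print(most_recent(devices).name)
```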

Make sure "mdmon" doesn't get called "@dmon".

The Anaconda installer (via its "loader" program) will try to kill
many processes at shutdown, but not "mdmon".

However when mdadm runs mdmon in the Anaconda environment, mdmon
sets argv[0][0] to '@' resulting in "@dmon" which confuses
"loader".

So change mdadm to set argv[0] to a path so that mdmon becomes e.g.
"@usr/sbin/mdmon"
which "loader" will recognise as being "mdmon".

Reported-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

Grow: fix hang when growing a RAID5.

Since:

commit 84d11e6c6a3b827b2daa32e16303235ce33d49f5
Author: NeilBrown <neilb@suse.de>
Date: Thu Aug 1 11:16:14 2013 +1000

Grow: exit background thread cleanly on SIGTERM.

removed the setting of "sync_max" from abort_reshape() we need
to do it explicitly here.

Signed-off-by: NeilBrown <neilb@suse.de>

in_initrd: fix gcc compiler error

On some systems, this code caused a "comparison between signed
and unsigned" error.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

DDF: increase default value for safe_mode_delay to 4000ms

That is the same value that IMSM uses. The current default of 200ms
seems to have been copied from the native MD meta data. That value
appears to be much too low for DDF, given that writing the DDF meta
data means that easily several MB worth of data need to be written to
disk.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

DDF: container_content_ddf: set safe_mode_delay > 0

Set safe_mode_delay to something >0, otherwise all container subarrays
assembled will have safe_mode_delay=0. That will break the assumption that
meta data becomes clean after running mdadm --wait-clean.

Use the same value as in getinfo_super_ddf_bvd. It would be cleaner
to call that directly from container_content_ddf, but I need to check
possible side effects first.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

DDF: export_examine_super_ddf: print MD_DEVICES

Have mdadm -E --export print the number of RAID devices,
like other meta data formats do. Anaconda (RHEL/CentOS installer)
depends on it.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

DDF: ddf_activate_spare: fix gcc -O2 uninitialized warning

At this point 'di' and 'rv' both have the same value. gcc doesn't
realise that and a human reader might not either.
'rv' makes more sense too, so use that.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

Add ANNOUNCE-3.2.6 from different branch

just for completeness...

Signed-off-by: NeilBrown <neilb@suse.de>

Add raid6check to .gitignore

Signed-off-by: NeilBrown <neilb@suse.de>

Change "mdadm --run" to use the same code as "mdadm -IRs".

Current "mdadm --run /dev/mdX" will not handle external metadata
properly. mdmon won't be started etc.

So use the code from "mdadm -IRs" instead - that already does all
the right things.

Reported-by: Francis Moreau <francis.moro@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

super1: fix setting of data_offset for 1.0 metadata.

commit 23bf42cc79d46de019d4b27c16354a191a98ed41
super1: simplify setting of array size.

removed the setting for sb->data_offset for 1.0 metadata for some reason,
and messed up the size calculation for 1.0 metadata too.

Signed-off-by: NeilBrown <neilb@suse.de>

Fix bug with adding to 0.90 array

commit 7ccc4cc4fc6889680bbe4ec673cab3f6aa49aad3
Manage: remove call to validate_geometry.

used entirely the wrong number for "4TB" !!

Signed-off-by: NeilBrown <neilb@suse.de>

DDF: ddf_open_new: check device status for new subarray

It is possible that mdadm creates a new subarray containing failed
devices. This may happen if a device has failed, but the meta data
containing that information hasn't been written out yet.

This code tests for this situation, and handles it in the monitor.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

tests/10ddf-fail-create-race: test handling of fail/create race

If a disk fails and simultaneously a new array is created, a race
condition may arise because the meta data on disk doesn't reflect
the disk failure yet. This is a test for that case.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

tests/10ddf-fail-spare: more sophisticated result checks

This test can succeed two ways, depending on timing.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

tests/10ddf-fail-two-spares: new unit test

This is one more unit test for failure/recovery, this time with
double redundancy, which isn't covered by the other tests.

Signed-off-by: NeilBrown <neilb@suse.de>

Create: fix warning about pre-existing filesystems.

An ext[234] filesystem larger than 2TB was being reported with
a negative size - which looks odd.

So fix it to use suitably large and unsigned values.

Reported-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: NeilBrown <neilb@suse.de>
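
One way such a negative size can arise is a value wrapping in a signed
32-bit integer. The sketch below assumes, purely for illustration, that
the size was held as a signed 32-bit count of KiB, which overflows just
past 2TiB; the commit does not state the actual type or unit.

```python
import ctypes

def as_int32(value):
    """Reinterpret an integer as a signed 32-bit value (wraps silently)."""
    return ctypes.c_int32(value).value

def as_uint64(value):
    """Reinterpret an integer as an unsigned 64-bit value."""
    return ctypes.c_uint64(value).value

size_kib = 3 * 1024**3          # a hypothetical 3 TiB filesystem, in KiB
print(as_int32(size_kib))       # wrapped, negative
print(as_uint64(size_kib))      # large and unsigned, as intended
```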

DDF: Write new conf entries with a single write.

The recent change to skip over invalid conf entries was bad because
it could leave garbage on the disk.
But we don't want to write each entry separately as the writes are
O_DIRECT and so synchronous, so it takes way too long.

So allocate a large buffer (probably the one used to read the config records)
and fill that then write it all at once.

Reported-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

test: allow LVM volumes or RAM disks as test devices

Allow other device types for testing; this allows testing on
a larger variety of devices.

Option --dev=[loop|lvm|ram] selects loop device (default), lvm,
and ram disk, respectively. To use RAM disks with DDF,
the kernel parameter ramdisk_size=65536 must be used.
For LVM, use --volgroup=<vg> to specify the name of the volume
group in which the test LVs will be created.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

DDF: get_extents: don't allocate space on failed disks

We should skip known failed disks when allocating space for
new arrays. This fixes the problem with 10ddf-fail-spare.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

tests/10ddf-fail-spare: new unit test

This is Albert Pauw's latest test. Note that this FAILS.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

tests/10ddf-fail-twice: remove hard-coded assumptions

This test has some randomness because it is not always deterministic
which of the two arrays gets the spare and which remains degraded.
Handle it.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

tests/env-ddf-template: some helper functions

helper functions to determine the list of devices in an array,
etc.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

Makefile: check that 'run' directory exists.

mdadm defaults to using /run/mdadm. However not all distros
provide /run yet. This can confuse people who build their own
mdadm.
So have "make" complain if the given directory doesn't exist.
This will make it harder to build an mdadm which doesn't work.

Reported-by: Albert Pauw <albert.pauw@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

mdmon: don't use 'ghost' values from an inactive array.

It is possible for mdmon to see (in /proc/mdstat) an array
in 'inactive' state once "mdadm -S" has written "inactive" to
"array_state".

In this state values such as "raid_disk" are not meaningful
and so should be ignored by manage_member().

Reported-by: "Dorau, Lukasz" <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

DDF: fix removal of failed devices.

Commit c7079c84 arranged for DDF to forget about any device
that is failed and not still marked as part of any array.

However such devices could still be part of the container and this
removal and updating of 'pdnum' can result in multiple devices having
the same pdnum. This in turn easily leads to confusion and
corruption.

So only discard pd entries for devices which are failed, not listed in
any virtual device, and for which we don't have a handle on the
device.

pd entries will not get removed until a new device is added after
the device has been removed from the container, either by
"mdadm --remove" or by assembling without the failed devices.

Reported-by: Albert Pauw <albert.pauw@gmail.com>
Analysed-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

test: ensure testing uses correct mdmon

When testing we want to run mdmon directly, not use
systemctl to get systemd to run it.

So allow an environment variable to make that choice.

Signed-off-by: NeilBrown <neilb@suse.de>

managemon: fix typo affecting incremental assembly.

This clearly should be 'st2'.
As it is the 'raid_disk' value being tested is completely
meaningless in the context of the new device.

Signed-off-by: NeilBrown <neilb@suse.de>

DDF: fix writing metadata updates.

Recent commit 273989b93a3185c0e4d54f0d1bc404248a92d157
skipped writing some large blocks of 0xFF, but didn't seek
over the space, so subsequent data was written wrongly.

When we don't write, we need to seek.

Signed-off-by: NeilBrown <neilb@suse.de>
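
The effect of skipping a write without seeking can be reproduced on an
in-memory buffer. This is a sketch only: write_update() and demo() are
hypothetical, the 4-byte blocks are made up, and io.BytesIO stands in
for the metadata area on disk.

```python
import io

def write_update(f, blocks, seek_over_skipped=True):
    """Write blocks in order, skipping blocks that are entirely 0xFF."""
    for block in blocks:
        if block == b"\xff" * len(block):
            if seek_over_skipped:
                # When we don't write, we need to seek.
                f.seek(len(block), io.SEEK_CUR)
            continue
        f.write(block)

def demo(seek_over_skipped):
    f = io.BytesIO(b"\xff" * 12)  # region already filled with 0xFF
    write_update(f, [b"AAAA", b"\xff" * 4, b"BBBB"], seek_over_skipped)
    return f.getvalue()

print(demo(seek_over_skipped=False))  # BBBB lands one block too early
print(demo(seek_over_skipped=True))   # BBBB lands at its own offset
```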

tests/10ddf-fail-twice: New unit test

This is the test by Albert Pauw. Fail 2 disks, and add one.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

DDF: no need for GET_LAYOUT any more

With the previous patch, mdmon will provide the layout property for us.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

mdmon: always get layout from sysfs

commit 71d68ff62 uses the array layout. It needs to be initialized.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

mdmon: don't lie to systemd.

Now that mdmon responds fairly well to SIGTERM, stop lying to
systemd about being started on the initrd.

Note that if mdmon is rerun (--takeover) for some reason, and systemd
chooses to kill processes before remounting / readonly, then the
unmount will hang.

If systemd ever lets us tell it that we don't want to be killed until
root is readonly, then we should do that.

Signed-off-by: NeilBrown <neilb@suse.de>

mdmon: clear safe_mode_delay on shutdown

When we receive a signal, set the safemode delay to very small
so that we can get clean arrays and exit quickly.

Signed-off-by: NeilBrown <neilb@suse.de>

DDF: differentiate between new metadata and metadata updates.

When writing an update, we don't need to overwrite lots of
empty fields. This makes updates somewhat faster.

Signed-off-by: NeilBrown <neilb@suse.de>

DDF: use some #defines instead of bare constants.

Signed-off-by: NeilBrown <neilb@suse.de>

Introduce devid2kname - slightly different to devid2devnm.

The purpose of devid2devnm is to return the kernel name of an
md device. Whether that device is a whole device or a partition,
we want the whole device: md4, never md4p2.

In one place I was using devid2devnm where I really wanted the
partition if there was one ... and wasn't really interested in it
being an md device.
So introduce a new 'devid2kname' for that case.

Signed-off-by: NeilBrown <neilb@suse.de>

Don't lie to systemd about mdadm's status.

Telling systemd that mdadm was started from the initrd
is often a lie and never necessary. Now that the reshape monitoring
thread handles SIGTERM gracefully it is OK for systemd to kill
any mdadm that it finds running.

mdmon still has a bit of a question mark over it so I won't remove
the '@' from there just yet.

Signed-off-by: NeilBrown <neilb@suse.de>

Grow: exit background thread cleanly on SIGTERM.

If the mdadm thread that monitors a reshape gets SIGTERM it should
exit cleanly and clear the 'suspended' region of the array.
However it mustn't clear 'sync_max' as that would allow the
reshape to continue unmonitored.

If the thread ever does get killed, the array should really be
shutdown soon after if possible.

Signed-off-by: NeilBrown <neilb@suse.de>

tests/env-ddf-template: helper for new unit test

I forgot to check in this helper script, similar to the one for IMSM.
It is needed by tests/10ddf-create-fail-rebuild.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

tests/10ddf-create-fail-rebuild: new unit test for DDF

This test adds a new unit test similar to 009imsm-create-fail-rebuild.
With the previous patches, it actually succeeds on my system.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

mdmon: manage_member: fix race condition during slow meta data writes

In order to track kernel state changes, the monitor needs to
notice changes in sysfs. If the changes are transient, and the
monitor is busy writing meta data, it can happen that the changes
are missed. This will cause the meta data to be inconsistent with
the real state of the array.

I can reproduce this in a test scenario with a DDF container and
two subarrays, where I set a disk to "failed" and then add a global
hot-spare. On a typical MD test setup with loop devices, I can
reliably reproduce a failure where the metadata show degraded members
although the kernel finished the recovery successfully.

This patch fixes this problem by applying two changes. First, when
a metadata update is queued, wait until it is certain that the monitor
actually applied these meta data (the for loop is actually needed to
avoid failures completely in my test case). Second, after triggering the
recovery, set prev_state of the changed array to "recover", in case
the monitor misses the transient "recover" state.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

mdmon: manage_member: debug messages for array state

Add debug messages to watch the manager's steps.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

mdmon: wait_and_act: fix debug message for SIGUSR1

Correctly print out wake reason if it was a signal. Previous code
would print misleading select events (pselect(2) man page says the
fdsets become undefined in case of error).

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

monitor: read_and_act: log status when called

read_and_act() currently prints a debug message only very late.
Print the status seen by mdmon right away, to track mdmon's
actions more closely. Add a time stamp to observe long delays
between read_and_act calls, e.g. caused by meta data writes.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

DDF: ddf_set_disk: add some debug messages

Adds more verbose debugging in ddf_set_disk, to understand failures
better.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

DDF: load_ddf_header: more error logging

Try to determine problem if load_ddf_header fails. May be useful
for determining compatibility problems with Fake RAID BIOSes.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

DDF: ddf_process_update: log offsets for conf changes

I needed this for tracking a bug with wrong offsets after array
creation.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

DDF: log disk status changes more nicely

In particular, include refnum for better tracking. This makes
it a little easier for humans to track what happened to which disk.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

DDF: ddf_activate_spare: bugfix for 62ff3c40

Move the check for good drives in the dl loop - otherwise dl
may be NULL and mdmon may crash.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

Fix is_resync_complete for RAID10

For RAID10, 'sync' numbers go up to the array size rather than the
component size. is_resync_complete() needs to allow for this.

Reported-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
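
The distinction can be sketched as a choice of limit. This function is
a hypothetical simplification that takes both sizes as parameters; it
does not show how mdadm derives the array size from the layout.

```python
def is_resync_complete(level, sync_position, component_size, array_size):
    """Compare the sync position against the limit that applies to the level."""
    # For RAID10 the 'sync' numbers run up to the array size; for other
    # levels they run up to the component size.
    limit = array_size if level == 10 else component_size
    return sync_position >= limit

# Hypothetical sizes: components of 1000 sectors, RAID10 array of 2000.
print(is_resync_complete(10, 1000, component_size=1000, array_size=2000))
print(is_resync_complete(10, 2000, component_size=1000, array_size=2000))
print(is_resync_complete(5, 1000, component_size=1000, array_size=3000))
```

With the component size as the limit, the first RAID10 case would be
reported complete when only half of the array has been synced.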