git.ipfire.org Git - thirdparty/mdadm.git/log

Grow: try to let "--grow --continue" from systemd complete a reshape.

If "--assemble" or "--incremental" is started by udev, then
monitoring the reshape in the background won't work.

So try asking systemd to start a grow-continue.

If that fails, just do it the old way.

Signed-off-by: NeilBrown <neilb@suse.de>

Grow: store a link to current backup file in /run/mdadm or similar.

Subsequent patch will allow the background part of "mdadm --grow" to
be run from systemd. This can require the passing of a backup file
name.
To do this, store that name as a symlink in /run/mdadm (or MAP_DIR)
and look for it when appropriate.

It might be useful to also store the name across reboot, but that
would be a different patch. We would need to use the uuid to identify
it, and store it in stable storage.

Signed-off-by: NeilBrown <neilb@suse.de>

Create: don't default to bitmap=internal when it is not supported

For large arrays (component size > 100GB) if write-intent bitmap is not
enabled, then it is set by default to "internal", even if the metadata
format does support internal bitmaps, which causes Create to fail.

This patch adds checking if add_internal_bitmap is set in the
superswitch before setting bitmap_file to "internal".

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Fix race between --create and --incremental

This modifies locking in Create to eliminate a situation where
--incremental can assemble a device between write_init_super() and
add_disk(), which causes Create to fail.

It sporadically occurs e.g. when metadata is written on a device,
causing an udev change event which triggers mdadm --incremental.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

systemd: various fixes for boot with container-arrays.

1/ Add systemd shutdown script to ensure DDF and IMSM are
   clean before we actually shutdown

2/ Get udev to tell systemd to run the mdmon@mdXXX.service
   units when a member array appears.

   If we boot off a member array (with dracut at least),
   the mdmon started in the initramfs will lose track of
   /sys etc, so we need to restart it.
   systemd will try to forget about it too (but not actually
   kill it because we said not to do this).
   Having udev tell it to start it will allow a new mdmon to
   run which can see /sys, and systemd will know about it.

3/ Always use --offroot and --takeover when starting mdmon with
   systemd
   --offroot is needed else shutdown will hang.
   --takeover is needed incase an mdmon was started earlier
   (e.g. in initramfs).
   Neither hurt if they aren't actually needed.

Signed-off-by: NeilBrown <neilb@suse.de>

DDF: Don't fail compare_super_ddf due to re-configure changes.

It is possible that one device has seem some reconfig but the other
hasn't.  In that case  they are still the "same" DDF, even though
one might be older.  Such age will be detected by 'seq' differences.

If A is new and B is old, then it is import that
  mdadm -I B
  mdadm -I A

doesn't get confused because A has the same uuid as B, but compare_super fails.

So: if the seq numbers are different, then just accept as two
different superblocks.
If they are the same, then look to copy data from new to old.

Signed-off-by: NeilBrown <neilb@suse.de>

DDF: fix possible mdmon crash when updating metadata.

Testing 'c' and then using 'vdc' assumes that the two are in sync,
but sometimes they aren't.
Testing 'vdc' is safer.
This avoids a crash in some cases when failing/removing/added devices
to a DDF.

Signed-off-by: NeilBrown <neilb@suse.de>

DDF: guard against ->pdnum being negative.

It is conceivable that ->pdnum could be -1, though only if
the metadata is corrupt.
We should be careful not to use it if it is.

Also remove an assignment for pdnum to ->container_member.
This is never used and cannot possibly mean anything.

Signed-off-by: NeilBrown <neilb@suse.de>

DDF: mark missing-on-assembly device properly.

As well as removing from the array we really should mark
it is 'failed', and mark the array as degraded.

Signed-off-by: NeilBrown <neilb@suse.de>

DDF: Fix assorted typos and do some reformatting.

..because it is more fun when new patches are harder to apply to old version :-)

Signed-off-by: NeilBrown <neilb@suse.de>

raid6check.c: move manual repair code to separate function

This patch cleans up a bit the code by moving
the second repair mode, that is the manual
repair, to a separate function.

Signed off: piergiorgio.sartor@nexgo.de

Signed-off-by: NeilBrown <neilb@suse.de>

raid6check.c: move autorepair code to separate function

This patch cleans up a bit the code by moving
the autorepair part into a separate function.

Signed off: piergiorgio.sartor@nexgo.de

Signed-off-by: NeilBrown <neilb@suse.de>

raid6check.c: lock the stripe until necessary

The stripe locking mechanism must be atomic between
the check and the, potential, autorepair.
For this reason, the autorepair code needs to be just
after the check and both parts (check and autorepair)
must be excuted under stripe lock.
Of course, the manual repair can operate as before.
This patch reorganize the code and provides the single,
atomic, stripe lock.
It should be confirmed that this new locking is not
too demanding.
In case it is, some other solutions will be required
(suggestions wellcome).

Signed off: piergiorgio.sartor@nexgo.de

Signed-off-by: NeilBrown <neilb@suse.de>

ddf-sudden-degraded test fix.

Change how sudden-degraded devices should appear.
We don't record failure, we record that the device isn't there.

Signed-off-by: NeilBrown <neilb@suse.de>

DDF: when first activating an array, record any missing devices.

We must remember they are missing so that if they re-appear we
don't get confused.

Signed-off-by: NeilBrown <neilb@suse.de>

DDF: report seq counter as events.

Also don't treat two devices with different seq numbers as completely
unrelated.

This allows split-brain detection to work properly for ddf.

Signed-off-by: NeilBrown <neilb@suse.de>

Work around architectures having statfs.f_type defined as long

Having RAMFS_MAGIC defined as 0x858458f6 causing problems when trying
to compare it directly against statfs.f_type being cast from long to
unsigned long.

This hack is extremly ugly, but it should at least do the right thing
for every situation.

Thanks to Arnd Bergmann for suggesting the fix.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

tests: add test that DDF marks missing devices as failed on assembly.

If we assemble a newly-degraded array, the missing devices must be marked
as 'failed' so we don't expect them in future.

Signed-off-by: NeilBrown <neilb@suse.de>

mdmon@.service: Change type of process start-up to 'forking'.

Mdadm does not wait enough time when mdmon is started by systemd.
It causes various problems with behaviour of a RAID volume with external metadata.
For example: mdmon does not update a value of checkpoint during migration
and second RAID5 volume is read-only after reboot done during
container reshape (both problems occur with IMSM matadata).
If a type of process start-up is changed to 'forking', systemctl will
wait until mdmon (parent) process exits after calling fork.
This way mdmon will always be fully initialized after start_mdmon
and these problems will not occur.
In this case it is recommended to add a path to PIDFile, so that systemd
does not have to guess a PID of the mdmon process.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Reviewed-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Assemble: change load_devices to return most_recent 'st' value.

This means that

st->ss->getinfo_super(st, content, NULL);
clean = content->array.state & 1;

will get an up-to-date value for 'clean'. This fix allows
tests/03r5assem-failed
to work.

Signed-off-by: NeilBrown <neilb@suse.de>

Assemble: re-arrange freeing of 'tst' in load_devices().

When we return in error, we need to free(tst), and ->free_super(tst);
Sometimes we didn't.

Also the final ->free_super(tst) should be followed by free(tst)
but wasn't.

Move that file free forward in the code a bit as we will want to use
the tst there in the next patch.

Signed-off-by: NeilBrown <neilb@suse.de>

Assemble: allow load_devices to change the 'st' which is passed in.

The given 'st' might not be best. Making this interface change
will allow load_devices to return a better 'st'.

Signed-off-by: NeilBrown <neilb@suse.de>

New test: 03r5assem-failed

This test currently fails, confirming a bug which was recently
reported.

Signed-off-by: NeilBrown <neilb@suse.de>

raid6check.c: reduce verbosity

This patch will remove some legacy code.
It is part of the verbosity "cleanup".
In any case, if information about the P
and Q parity mismatches is required, it
should go inside the code handling page
size blocks, not full stripe size.

Signed-off-by: NeilBrown <neilb@suse.de>

raid6check.c: add O_SYNC to open

It could be better to make sure the
data reaches the disks, so open the
drives with O_SYNC flag.

Signed off: piergiorgio.sartor@nexgo.de

Signed-off-by: NeilBrown <neilb@suse.de>

raid6check.c: fix Q parity generation

In the transition to 4K page processing,
the Q parity generation had a wrong offset
in the buffer.
This patche fix this.

Signed off: piergiorgio.sartor@nexgo.de

Signed-off-by: NeilBrown <neilb@suse.de>

raid6check.c: fix position printout

This patch make a bit more clear
the position, in the disk, where
an error is found.

Signed off: piergiorgio.sartor@nexgo.de

Signed-off-by: NeilBrown <neilb@suse.de>

raid6check.c: reduce verbosity

This patch removes some printouts, which
are not really useful here.
These could be re-added later, in case a
verbosity parameter will be provided.

Signed off: piergiorgio.sartor@nexgo.de

Signed-off-by: NeilBrown <neilb@suse.de>

raid6check.c add page size check and repair

raid6check current performs checks and repair on a whole chunk at a
time. This is often not ideal as corruption can happen with smaller
granularity.

This patches changes raid6check to use a page-size (4K) granularity.

We still process a chunk at a time, but within each chunk we process a
page at a time.

Signed-off-by: NeilBrown <neilb@suse.de>

mdmon@.service: remove over-ride of Standard IO.

Redirecting output to /dev/null is unnecessary and hides any error
messages there might be. So leave as defaults which are none,
journal, inherit.

Signed-off-by: NeilBrown <neilb@suse.de>

systemd/mdmon: set IMSM_NO_PLATFORM=1

As mdmon doesn't inherit environment from mdadm when it is started
by system, it cannot inherit IMSM_NO_PLATFORM.
But if an imsm array as assembled then mdmon really should handle it
whether there is a platform present or not.
So always set this var.

Signed-off-by: NeilBrown <neilb@suse.de>

mdmon: don't complain about notifying parent when there is no need

When run with --foreground mdmon has no need to notify any
parent, so it shouldn't even try, let alone complain when it fails.

Also close an end of a pipe which is no longer used.

Signed-off-by: NeilBrown <neilb@suse.de>

IMSM: don't crash when creating an array with missing devices.

'missing' devices are in a different list so when collection the
serial numbers of all devices we need to check both lists.

Signed-off-by: NeilBrown <neilb@suse.de>

Grow: fix problems with prematurely aborting of reshapes.

1/ when unfreezing, make sure the array is frozen first.
   If it isn't we might end up interrupting a reshape.
2/ When the child finishes, don't call abort_reshape() as that
   will interrupt the reshape.  Just set suspend_* etc
   explicitly.

Signed-off-by: NeilBrown <neilb@suse.de>

DDF: fix detection of failed devices during assembly.

When we call "getinfo_super", we report the working/failed status
of the particular device, and also (via the 'map') the working/failed
status of every other device that this metadata is aware of.

It is important that the way we calculate "working or failed" is
consistent.
As it is, getinfo_super_ddf() will report a spare as "working", but
every other device will see it as "failed", which leads to failure to
assemble arrays with spares.

For getinfo_super_ddf (i.e. for the container), a device is assumed
"working" unless flagged as DDF_Failed.
For getinfo_super_ddf_bvd (for a member array), a device is assumed
"failed" unless DDF_Online is set, and DDF_Failed is not set.

Reported-by: "David F." <df7729@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Assemble: avoid infinite loop when auto-assembling partial container.

When auto-assembling we loop until we get no successes.

If a device is found that look like it is part of an already-existing
container, but we subsequently fail to add that device, then the fact
that the container is running looks like a success. This can result
in infinite looping.
So if a container was already partially assemble, and is still only
partially assembled after we try to add devices, then don't treat that
as success.

Signed-off-by: NeilBrown <neilb@suse.de>

DDF - really ignore DDF metadata on partitions.

See commit 357ac1067835d1cdd5f80acc28501db0ffc64957
which made a similar change for super-intel, and really should have
fixed DDF at the same time.

Signed-off-by: NeilBrown <neilb@suse.de>

policy: NULL path isn't really acceptable - use the devname

According to:
commit b451aa4846c5ccca5447a6b6d45e5623b8c8e961
Fix handling for "auto" line in mdadm.conf

a NULL path isn't really acceptable and the devname should be used instead.

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Clarify scope of Rebuild events in mdadm manpage

To date, the manpage did not make it clear under which circumstances
Rebuild events are generated, leading to a question on the mailing
list as to whether it is normal for these events to be generated
while checking an array.
So clarify that all operations that act on the entire array are in
scope. The list is given as "e.g.", because it might grow in the
future as other full-array operations are added.

Reported-by: Mark Knecht <markknecht@gmail.com>
Signed-off-by: Jan Ceuleers <jan.ceuleers@computer.org>
Signed-off-by: NeilBrown <neilb@suse.de>

mdamd-last-resort: add a Conflicts line to stop the timer.

When the md device actually appears we want to stop the timer and not
bother with the mdadm-last-resort@.server. In particular, running
that causes confusing messages and is in general best avoided.

Fortuantely this can simply be achieved with a Conflicts= line

Signed-off-by: NeilBrown <neilb@suse.de>

udev rules: try "mdadm -I" on "change" events.

We need to attempt "mdadm -I" on "change" events as well as "add" events,
as the "change" make make a device ready to be part of an array.
This is particularly important for stacked md devices. When the
member devices are "add"ed they don't have any content visible yet.
That doesn't happen until a "change".

Idea taken from Fedora udev file.

Signed-off-by: NeilBrown <neilb@suse.de>

udev rules: add some by-pass rules from Fedora

1/ If ANACONDA is running, don't -I assemble any arrays, ANACONDA
needs to be in control
2/ honour "noiswmd" and "nodmraid" kernel command line options.

Signed-off-by: NeilBrown <neilb@suse.de>

Add mdmonitor.service systemd unit file.

This systemd unit file runs mdadm in --monitor mode.
It is started by a SYSTEMD_WANTS signal from udev whenever
an md array is started that would benefit from mdadm --monitor.

Commandline arguments can be provided by a script
/usr/lib/systemd/scripts/mdadm_env.sh
which should write an
MDADM_MONITOR_ARGS=....
line to /run/sysconfig/mdadm

A script to extra args from SUSE's /etc/sysconfig/mdadm file
is provided.
If no mdadm_env.sh is provided, then args are "--scan" which
requires "mail" or "program" to be set in /etc/mdadm.conf.
I believe this is suitable for Fedora.

Signed-off-by: NeilBrown <neilb@suse.de>

Assemble/Incremental: don't hold O_EXCL on mddev after assembly.

As soon as the array is assembled, udev or systemd might run
fsck and mount it. So we need to drop O_EXCL promptly.

Signed-off-by: NeilBrown <neilb@suse.de>

Two small fixes related to enough()

1/ enough_fd doesn't use avail_disks any more, so discard it.

2/ Manage_Add increments 'found' at the wrong place, so it can
waste time before calling enough().

Signed-off-by: NeilBrown <neilb@suse.de>

Incremental: improve support for "DEVICE" based restriction in mdadm.conf

--incremental currently fails if the device name passed does not
textually match the names permitted by the DEVICE line in mdadm.conf.
This is problematic when "mdadm -I" is run by udev as the name given
can be a temp name.

This patch makes two improvements:
1/ We generate a list of all existing devices that match the names
  in mdadm.conf, and allow rdev based matching
2/ We allows extra aliases to be provided on the command line, and
  perform textual matching on those.  This is particularly suitable
  for udev usages as ${DEVLINKS} can be provided even though the links
  make not yet be created.

Signed-off-by: NeilBrown <neilb@suse.de>

Systemd integration for starting newly-degraded arrays.

Normally "mdadm -I" will not start an array if it has reason to
expect further devices.
This means that if a device is removed while the host is shut down,
"mdadm -I" will never start the device.

If the array is know to the host, it make sense to start the array
anyway after a reasonable timeout.

This patch adds systemd/udev infrastructure so that 30 seconds after
a known array first becomes able to be assembled as a degraded array,
the array will be assembled even if more devices are still expected.

Signed-off-by: NeilBrown <neilb@suse.de>

Incremental: add --export handling.

If --export is given with --incremental, then
  MD_DEVNAME
is output which gives the name of the device (in /dev/md) that
is the array (or container) that the device would be added to.
Also
  MD_STARTED
is set to one of
  no
  unsafe
  yes
  nothing

to indicate if the array was started.  IF MD_STARTED=unsafe
then it may be appropriate to run
  mdadm -R /dev/md/$MD_DEVNAME
after a timeout to ensure newly degraded array are started.

If
  MD_FOREIGN=yes
it might be appropriate to suppress this as the array is
probably not critical.

Signed-off-by: NeilBrown <neilb@suse.de>

Restructure assemble_container_content and improve messages.

We lose one level of indent, and now get told the difference between
'not assemble because not safe' and 'not assembled because not enough
devices'.

Signed-off-by: NeilBrown <neilb@suse.de>

Incremental: don't abort container if one member explicitly disabled.

If a member of a container is explicitly disabled, others may not
be so we should continue.

Signed-off-by: NeilBrown <neilb@suse.de>

Incremental: remove test that can never succeed.

Incremental_container never returns 1, so this test is pointless.
It is a holdover from when we called "Incremental()" rather than
"Incremental_container()" at this point.

Signed-off-by: NeilBrown <neilb@suse.de>

IMSM metadata really should be ignored when found on partitions.

commit b31df43682216d1c65813eae49ebdd8253db8907
changed load_super_imsm to not insist on finding a partition if
ignore_hw_compat was set.
Unfortunately this is set for '--assemble' so arrays could get
assembled badly.

The comment says this was to allow e.g. --examine of image files.
A better fixes for this is to change test_partitions to not report
a regular file as being a partition.
The errors from the BLKPG ioctl are:

ENOTTY : not a block device.
EINVAL : not a whole device (probably a partition)
ENXIO : partition doesn't exist (so not a partition)

Reported-by: "David F." <df7729@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

ddf tests: fix get_rootdev

Getting the major number from the hex device number should take
all-but-the-last-two digits, rather than just the first two digits.

Signed-off-by: NeilBrown <neilb@suse.de>

Add support for --add-spare

--add-spare is like --add, but a --re-add is never attempted.
So it is equivalent to two separate commands:

--zero-metadata
--add

Signed-off-by: NeilBrown <neilb@suse.de>

Fix typos in mdadm.8.in

I found a small bug in the documentation of mdadm. I fixed it in my
local git clone of git://neil.brown.name/mdadm Here is the change:

Signed-off-by: NeilBrown <neilb@suse.de>

Assembe: fix bug in force_array - it wasn't forcing properly.

Since 'best' was expanded to hold replacement devices, we might
need to go up to raid_disks*2 to find devices to force.

Also fix another place when considering replacement drives would
be wrong (the 'chosen' device should never be a replacement).

Reported-by: John Yates <jyates65@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Monitor: write meta data in readonly state, sometimes

This patch reverts 24a216bf:
"Monitor: Don't write metadata in inactive array state".

While it's true that writing meta data is usually not necessary
in readonly state, there is one important exception: if a
disk goes faulty, we want to record that, even if the array is
inactive.

We might as well just revert 24a216bf, because with the recently
submitted patch
"Monitor: don't set arrays dirty after transition to read-only"
those meta data writes that really annoying (for a clean, readonly,
healthy array during startup) are gone anyway.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

tests/10ddf-incremental-wrong-order: new unit test

This is a test simulating two temporary missing disks. These will
have less recent meta data than the other disks in the container.
When the array is reassembled, we expect mdadm to detect that
and react to it by using the meta data of the more recent disks
as reference.

This test FAILS with mdadm 3.3 for DDF.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

tests/10ddf-assemble-missing: new unit test

This is a test case for handling incremental
assembly correctly after disks had been missing once.

This test is the basis for other similar but more tricky
test cases involving inconsitent meta data.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

tests/env-ddf-template: add helper function for checksums

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

tests/10ddf-fail-readd-readonly: new unit test.

A test for my recent patch "Monitor: write meta data in readonly state,
sometimes". Test that a faulty disk is recorded in the meta data.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

tests/env-ddf-template: fix container name

/dev/md/ddf0 works also with assembly. /dev/md/ddf doesn't.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

DDF: add_to_super_ddf: be careful with workspace_lba

Some vendor DDF structures interpret workspace_lba
very differently then us. Make a sanity check on the value
before using it for config_size.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

tests/10ddf-fail-stop-readd: New DDF unit test

This is similar to 10ddf-fail-readd. The difference is that the
array is stopped and incrementally assembled before the disk is
re-added.

Signed-off-by: NeilBrown <neilb@suse.de>

tests/10ddf-fail-readd: New DDF unit test

This unit test is for a simple fail/remove/readd scenario.

Signed-off-by: NeilBrown <neilb@suse.de>

Monitor: don't set arrays dirty after transition to read-only

This patch reverts commit 4867e068. Setting arrays dirty after
transition from inactive to anything else causes unnecessary
meta data writes and may wreak trouble unnecessarily when
a disk was missing during assembly but the array was never
written to.

The reason for 4867e068 was a special situation during reshape
from RAID0 to RAID4. I ran all IMSM test cases with it reverted
and found no regressions, so I believe the reshape logic for
IMSM works fine in mdadm 3.3 also without this.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

DDF: compare_super_ddf: fix sequence number check

The sequence number check in compare_super_ddf was broken,
anchor sequence number is always -1.

With this patch, mdadm will refuse to add a disk with non-matching
sequence number.

This fixes Francis Moreau's problem reported with subject
"mdadm 3.3 fails to kick out non fresh disk".

FIXME: More work is needed here. Currently mdadm won't even add the
disk to the container, that's wrong. It should be added as a spare.

Signed-off-by: NeilBrown <neilb@suse.de>

DDF test: make sure mdmon isn't started by systemd

For testing we usually want the locally built mdmon, not the
one systemd prefers.

Signed-off-by: NeilBrown <neilb@suse.de>

DDF tests: allow to run on systems without /dev/sda

Some ddf tests scripts assume that /dev/sda is always present.
That's wrong e.g. on VMs. Use a more general approach.

Signed-off-by: NeilBrown <neilb@suse.de>

Be consistent in return types from byteswap macros

The bswap_*() macros return int values. Make sure we return the
equivalent types in same byteorder pass-through functions to avoid
problems with the original type leaking through to printf() etc.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Remove bashism from Makefile

Makefile uses [ x == y ] construct which does not work
with POSIX shell. Since this is just testing a flag,
replace it with string comparison (=) operator instead.

Signed-off-By: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: NeilBrown <neilb@suse.de>

Give error if --incremental --scan also has a device name given.

Signed-off-by: NeilBrown <neilb@suse.de>

Make -IRs and --run work properly for containers.

We really need to make sure assemble_container_content()
gets called to finished the assembly of these.

Reported-by: Francis Moreau <francis.moro@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

DDF: brief_examine_subarrays_ddf: print array name

Print an array name in brief output, like IMSM does.

SUSE's YaST2 (libstorage) needs this in order to detect MD arrays
during installation.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

DDF: factor out array name generation

The same algorithm was used in getinfo_super_ddf_bvd and
container_content_ddf. Put it in a common function.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

mdmon: honour --offroot, again

commit 3e32ba9d removed support for --offroot, and a9c15847 made
mdmon use @ in argv[0] only when started from initrd.

This breaks mdadm in OpenSUSE 12.3, which starts mdmon from the
root file system and relies on --offroot to work as documented earlier.

Reintroducing --offroot as an undocumented option, as its use is going to
go away soon anyway.

If this can't be applied, it should probably be included as distro-specific
patch if mdadm 3.3 is built for OpenSUSE 12.3. I haven't checked if the
patch is necesary for OpenSUSE Factory, too.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

DDF: allow for possibility that there is no secondary copy of metadata.

If there isn't, we currently write the second copy at some
random location :-)

Reported-and-tested-by: Francis Moreau <francis.moro@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

config: set "auto_seen" after processing the auto line.

Otherwise when we process an empty autoline (to be sure to
capture the MDADM_CONF_AUTO environment variable) we can end up
setting everything to 'yes' which over-rides 'no'.

Signed-off-by: NeilBrown <neilb@suse.de>

Move ARRAY_SIZE macro to common include file.

That was super-ddf can use it.

Signed-off-by: NeilBrown <neilb@suse.de>

DDF: handle fake RAIDs with changing subarray UUIDs

Some fake RAID BIOSes (in particular, LSI ones) change the
VD GUID at every boot. These GUIDs are not suitable for
identifying an array. Luckily the header GUID appears to
remain constant.

We construct a pseudo-UUID from the header GUID and those
properties of the subarray that we expect to remain constant.
This is only array name and index; all else might change e.g.
during grow.

Don't do this for all non-MD arrays, only for those known
to use varying volume GUIDs.

This patch obsoletes my previous patch "DDF: new algorithm
for subarray UUID"

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

Manage.c: fix small memory leak

'avail' is dynamically allocated, so it should be freed.

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

managemon: fix a dprintk.

There is not guarantee that 'inst' is a number, and even if there
were there is no point converting it str->int and then int->str again.

Signed-off-by: NeilBrown <neilb@suse.de>

Minor fixes in mdadm.conf.5 man page.

Signed-off-by: NeilBrown <neilb@suse.de>

Release mdadm-3.3

(and various cosmetic fixes)

Signed-off-by: NeilBrown <neilb@suse.de>

config: support MDADM_CONF_AUTO= env var.

If a distribution allows the choice between using mdadm and
dmraid for DDF and IMSM to be made by some config file
(/etc/defaults/ /sys/sysconfig/ etc) which is queried by
/etc/init.d scripts, then the fact that mdadm implements this
choce through the config file is not very helpful.

So allow the "AUTO" line to be specified in part using MDADM_CONF_AUTO
in environment.

Signed-off-by: NeilBrown <neilb@suse.de>

config: refactor load_conffile() to have a single exit.

This will make next patch cleaner.
No functional change.

Signed-off-by: NeilBrown <neilb@suse.de>

Config: multiple occurences of lines is not an error.

As we now support config directories it is helpful if
lines are allowed to occur multiple times with one
over-riding the other.
So stop giving warnings when later lines are ignored.

Signed-off-by: NeilBrown <neilb@suse.de>

config: read /etc/mdadm.conf.d as well as /etc/mdadm.conf

If a configfile is explicitly given, just that file or directory
is read. Otherwise we now read both a file
/etc/mdadm.conf
and a directory
/etc/mdadm.conf.d

This allows a transition to directory based config, which in turn
allows easy control from scripts.

Signed-off-by: NeilBrown <neilb@suse.de>

Conf: allow conf file to be a directory.

If config file is a directory, process each file within with a name
ending in ".conf" that doesn't start with ".".
Files are processed in lexical order.

Signed-off-by: NeilBrown <neilb@suse.de>

Config: factor reading of file out into separate function.

This will make it easier to read multiple files in a conf.d/

Signed-off-by: NeilBrown <neilb@suse.de>

mdmon: make sure we set safe_mode on SIGTERM.

Without this, array may not go clean and mdmon will then
not exit.

A safe_mode of '0' (which is the only one that is handled differently
by this patch) means "never switch to 'active_idle'". We don't want
that when mdmon is stopping.

Signed-off-by: NeilBrown <neilb@suse.de>

Assemble: don't ever consider a 'spare' to be the 'most recent'.

If all devices have the same event count and the first one is a spare,
then that spare will be the 'most_recent'.
However then other devices will think the 'most_recent' has failed
(for v0.90 metadata) and will be rejected from the array.

So never consider a 'spare' to be 'most recent'.

Reported-by: Andreas Baer <synthetic.gods@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Make sure "mdmon" doesn't get called "@dmon".

The Anaconda installer (via its "loader" program) will try to kill
many processes at shutdown, but not "mdmon".

However when mdadm runs mdmon in the Anaconda environment, mdmon
sets argv[0][0] to '@' resulting in "@dmon" which confuses
"loader".

So change mdadm to set argv[0] to a path so that mdmon becomes e.g.
"@usr/sbin/mdmon"
which "loader" will recognise as being "mdmon".

Reported-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

Grow: fix hang when growing a RAID5.

Since:

commit 84d11e6c6a3b827b2daa32e16303235ce33d49f5
Author: NeilBrown <neilb@suse.de>
Date: Thu Aug 1 11:16:14 2013 +1000

Grow: exit background thread cleanly on SIGTERM.

removed the setting of "sync_max" from abort_reshape() we need
to do it explicitly here.

Signed-off-by: NeilBrown <neilb@suse.de>

in_initrd: fix gcc compiler error

On some systems, this code caused a "comparison between signed
and unsigned" error.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

DDF: increase default value for safe_mode_delay to 4000ms

That is the same value that IMSM uses. The current default of 200ms
seems to have been copied from the native MD meta data. That value
appears to be much too low for DDF, given that writing the DDF meta
data means that easily several MB worth of data need to be written to
disk.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

DDF: container_content_ddf: set safe_mode_delay > 0

Set safe_mode_delay to something >0, otherwise all container subarrays
assembled will have safe_mode_delay=0. That will break the assumption that
meta data becomes clean after running mdadm --wait-clean.

Use the same value as in getinfo_super_ddf_bvd. It would be cleaner
to call that directly from container_content_ddf, but I need to check
possible side effects first.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

DDF: export_examine_super_ddf: print MD_DEVICES

Have mdadm -E --export print the number of RAID devices,
like other meta data formats do. Anaconda (RHEL/CentOS installer)
depends on it.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

DDF: ddf_activate_spare: fix gcc -O2 uninitialized warning

At this point 'di' and 'rv' both have the same value. gcc doesn't
realise that and a human reader might not either.
'rv' makes more sense too, so use that.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>

Add ANNOUNCE-3.2.6 from different branch

just for completeness...

Signed-off-by: NeilBrown <neilb@suse.de>