git.ipfire.org Git - thirdparty/mdadm.git/log
8 years ago test: assume recovery has completed if sync_completed says so.
NeilBrown [Thu, 23 Jul 2015 01:17:10 +0000 (11:17 +1000)] 
test: assume recovery has completed if sync_completed says so.

The final completion of a recovery can be delayed, so use
sync_completed to check whether it has finished but not yet been reaped.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years ago tests: flushbufs after writing zeros
NeilBrown [Thu, 23 Jul 2015 01:09:19 +0000 (11:09 +1000)] 
tests: flushbufs after writing zeros

sometimes the removed device is re-added before the writes
get all the way to the md device - so the array doesn't need
any recovery and the test fails.
So flush first to be safe.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years ago test: add -F flag to mkfs
NeilBrown [Tue, 21 Jul 2015 23:58:41 +0000 (09:58 +1000)] 
test: add -F flag to mkfs

newer versions of mkfs.extX ask before creating a filesystem
on a device which appears to already have a filesystem.
We don't want that, so add the -F flag.
Also be explicit about fs type as one shouldn't depend on defaults.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years ago mdadm: document --homehost=any functionality.
NeilBrown [Tue, 21 Jul 2015 23:33:17 +0000 (09:33 +1000)] 
mdadm: document --homehost=any functionality.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years ago Assemble: improve tests for matching --name= request.
NeilBrown [Tue, 21 Jul 2015 23:24:36 +0000 (09:24 +1000)] 
Assemble: improve tests for matching --name= request.

If the name in the array has a home-host, then
require that it matches, or is "any", or requested
homehost is "any".
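
The rule above can be sketched in C as follows. This is only an
illustration, not the code from Assemble: name_matches() is a
hypothetical helper, and it assumes the stored name has the form
"homehost:name" when a home-host is present.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not mdadm's real function): does the name stored
 * in the array satisfy a --name= request made with this --homehost=?
 * Assumes a stored name is either "name" or "homehost:name". */
static bool name_matches(const char *stored, const char *wanted,
                         const char *homehost)
{
    const char *colon = strchr(stored, ':');

    if (colon == NULL)
        /* No home-host in the stored name: compare the names directly. */
        return strcmp(stored, wanted) == 0;

    /* The part after the colon must be the requested name. */
    if (strcmp(colon + 1, wanted) != 0)
        return false;

    size_t hostlen = (size_t)(colon - stored);

    /* The stored home-host is "any": accept. */
    if (hostlen == 3 && strncmp(stored, "any", 3) == 0)
        return true;
    /* The requested home-host is "any": accept. */
    if (homehost != NULL && strcmp(homehost, "any") == 0)
        return true;
    /* Otherwise the two home-hosts must be identical. */
    return homehost != NULL &&
           strlen(homehost) == hostlen &&
           strncmp(stored, homehost, hostlen) == 0;
}
```

With this sketch, "alpha:data" matches a request for "data" from
homehost "alpha" or "any", but not from homehost "beta".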

Signed-off-by: NeilBrown <neilb@suse.com>
8 years ago raid6check: use O_DIRECT instead of O_SYNC.
NeilBrown [Mon, 20 Jul 2015 07:17:37 +0000 (17:17 +1000)] 
raid6check: use O_DIRECT instead of O_SYNC.

O_DIRECT is more direct and is faster.
This requires aligned memory allocation, but that isn't hard.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago restripe: fix data block order in raid6_2_data_recov
NeilBrown [Mon, 20 Jul 2015 07:15:13 +0000 (17:15 +1000)] 
restripe: fix data block order in raid6_2_data_recov

... rather than relying on the caller getting them in the
correct order.
This is better engineering and fixes a bug, because the
failed_slotX numbers are used later with the assumption that
they weren't swapped.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago raid6check: various cleanup/fixes
NeilBrown [Mon, 20 Jul 2015 04:11:33 +0000 (14:11 +1000)] 
raid6check: various cleanup/fixes

- document meaning of various arrays. In particular:
   stripes[]
   blocks[]
   blocks_page[]
   block_index_for_slot[]

  It needs to be clear if these are indexed by raid_disk
  number or syndrome number.

- changed meaning of block_index_for_slot[].  It didn't seem
  to be used consistently.  It also made use of the block numbers
  in array data ordering, which is not directly relevant for syndrome
  calculations.

- reduced number of args to autorepair and manual_repair
  They don't need both stripes[] and blocks[].  And they don't need
  diskP or diskQ.
  blocks[-1] is the P chunk, blocks[-2] is the Q chunk.
  block_index_for_slot[] can be used to find the target device for
  a particular syndrome block.

- remove stripe locking from within manual_repair, and instead
  use the global stripe locking used for check and autorepair.

- this necessitated changes to raid6_datap_recov and raid6_2data_recov
  so the P and Q blocks could be before or after the data blocks.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago Assemble: really ensure stripe_cache is bit enough to handle new chunk size
NeilBrown [Fri, 17 Jul 2015 03:10:25 +0000 (13:10 +1000)] 
Assemble: really ensure stripe_cache is bit enough to handle new chunk size

Earlier patch:
  56fcbcbb6f17df0e5dedf59744deee037c5d5fbd
calculated the proper chunk size, but didn't use it.

Let's actually use it this time.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years ago raid6check
NeilBrown [Thu, 16 Jul 2015 01:55:27 +0000 (11:55 +1000)] 
raid6check

fix checking of DDF layouts.

Stuff probably still broken.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago raid6check: get device ordering correct for syndrome calculation.
NeilBrown [Thu, 16 Jul 2015 01:25:40 +0000 (11:25 +1000)] 
raid6check: get device ordering correct for syndrome calculation.

The order of devices used for the syndrome calculation is not
the same as the order of data in the array.
The D block immediately after Q is first, then they continue
cyclically in raid-disk order, skipping over the P disk if it is seen.

This gets the 'check' right for all layouts other than DDF, which is
quite different.

I haven't confirmed that this doesn't break repair.
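
A minimal sketch of the ordering described above, for the non-DDF
layouts only. syndrome_order() and its parameters are hypothetical
names chosen for illustration; this is not the raid6check code.

```c
#include <stddef.h>

/* Hypothetical helper: fill order[] with the raid-disk numbers of the
 * data blocks in the order used for the syndrome calculation.
 * Start at the disk just after Q, walk cyclically through the disks,
 * and skip P when it is encountered.  order[] must have room for
 * ndisks - 2 entries.  Returns the number of entries written. */
static size_t syndrome_order(int ndisks, int p_disk, int q_disk, int *order)
{
    size_t n = 0;

    /* step runs over every disk except Q itself */
    for (int step = 1; step < ndisks; step++) {
        int disk = (q_disk + step) % ndisks;

        if (disk == p_disk)
            continue;   /* P is not part of the data ordering */
        order[n++] = disk;
    }
    return n;
}
```

For example, with 5 disks, P on disk 2 and Q on disk 0, the data
blocks are taken from disks 1, 3 and 4, in that order.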

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago tests: slow down --stop a bit to allow revert-inplace to work.
NeilBrown [Wed, 15 Jul 2015 23:27:58 +0000 (09:27 +1000)] 
tests: slow down --stop a bit to allow revert-inplace to work.

revert-inplace would sometimes find that the original reshape had
finished.
So slow down the reshaping during --stop (which needs to be a little
bit fast so that stop doesn't timeout waiting) and don't wait quite
so long before stopping.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago tests: add 19raid6check
NeilBrown [Wed, 15 Jul 2015 22:02:52 +0000 (08:02 +1000)] 
tests: add 19raid6check

This checks that raid6check finds no errors in newly created array
with all different layouts.
(it doesn't...)

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago test: clear out old metadata from loop devices.
NeilBrown [Wed, 15 Jul 2015 21:49:14 +0000 (07:49 +1000)] 
test: clear out old metadata from loop devices.

Old metadata can tempt udev to assemble things, which
just gets in the way.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago raid6check: report role of suspect device.
NeilBrown [Fri, 10 Jul 2015 04:46:59 +0000 (14:46 +1000)] 
raid6check: report role of suspect device.

i.e. -2 for Q, -1 for P, 0-N for data.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago tests: save failure logs to logdir
NeilBrown [Fri, 10 Jul 2015 04:44:58 +0000 (14:44 +1000)] 
tests: save failure logs to logdir

If --save-logs is given we already save all logs to --logdir
If not, we should still save erroneous logs to --logdir.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years ago tests: do not try to 'flushbufs' after stopping a array
NeilBrown [Fri, 10 Jul 2015 04:42:20 +0000 (14:42 +1000)] 
tests: do not try to 'flushbufs' after stopping a array

If the array is stopped, there is nothing to flush, and
blockdev can signal an error.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years ago test: add dmesg output to logs on error.
NeilBrown [Mon, 6 Jul 2015 03:59:33 +0000 (13:59 +1000)] 
test: add dmesg output to logs on error.

This can help isolate the problem.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago test: check sync_action as well when checking for an action.
NeilBrown [Mon, 6 Jul 2015 03:58:19 +0000 (13:58 +1000)] 
test: check sync_action as well when checking for an action.

Some actions only appear in /proc/mdstat after a little delay,
so check in sync_action as well.

This applies when checking for recovery etc, and when waiting for idle.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago test: speed up reshape when stopping arrays.
NeilBrown [Mon, 6 Jul 2015 03:52:04 +0000 (13:52 +1000)] 
test: speed up reshape when stopping arrays.

--stop needs to wait for reshape to get to a suitable
spot, so having really slow resync isn't helpful.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago test: stop all arrays before starting test.
NeilBrown [Mon, 6 Jul 2015 03:48:59 +0000 (13:48 +1000)] 
test: stop all arrays before starting test.

As well as cleaning up loop devices, stop all arrays.
After all, we cannot do the one without the other.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years ago Grow: remove stray tracing message.
NeilBrown [Mon, 6 Jul 2015 03:46:38 +0000 (13:46 +1000)] 
Grow: remove stray tracing message.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years ago Manage/stop: don't stop during initial critical section.
NeilBrown [Mon, 6 Jul 2015 03:45:39 +0000 (13:45 +1000)] 
Manage/stop: don't stop during initial critical section.

If the array is reshaping to more devices, then stopping
during that initial critical section is a bad idea.
So check for it and wait a bit.

Should probably handle final critical section of a reduction
too.
same-size reshape should be handled correctly already.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago Manage/stop: improve some comments.
NeilBrown [Mon, 6 Jul 2015 03:37:19 +0000 (13:37 +1000)] 
Manage/stop: improve some comments.

This code always confuses me - this might help a bit.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years ago Manage/stop: guard against 'completed' being too large.
NeilBrown [Mon, 6 Jul 2015 03:33:20 +0000 (13:33 +1000)] 
Manage/stop: guard against 'completed' being too large.

A race can allow 'completed' to read as 2^63-1, which takes
a long time to count up to.
So guard against that possibility.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years ago Monitor: don't Wait forever on a 'frozen' array.
NeilBrown [Mon, 6 Jul 2015 03:26:41 +0000 (13:26 +1000)] 
Monitor: don't Wait forever on a 'frozen' array.

If Wait() finds the array resync is 'frozen', then wait
a little while to avoid races, but don't wait forever.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years ago sysfs: reject reads that use the whole buffer.
NeilBrown [Mon, 6 Jul 2015 03:21:33 +0000 (13:21 +1000)] 
sysfs: reject reads that use the whole buffer.

If a read fills the whole buffer, then we possibly
missed something off the end, and we definitely shouldn't
put a '\0' beyond the end, so just return an error.
This should never happen anyway.
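
As a rough sketch of the check (read_attr() is a hypothetical helper,
not the function in sysfs.c): a read that returns as many bytes as the
buffer holds is treated as an error, because the value may be
truncated and there is no room left for the terminating '\0'.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: read an attribute from fd into buf (len bytes)
 * and NUL-terminate it.  Returns the number of bytes read, or -1 on a
 * read error or when the read used the whole buffer. */
static ssize_t read_attr(int fd, char *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);

    if (n < 0)
        return -1;
    if ((size_t)n >= len)
        /* Buffer is full: data may have been cut off, and writing
         * buf[n] would be one byte past the end. */
        return -1;
    buf[n] = '\0';
    return n;
}
```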

Signed-off-by: NeilBrown <neilb@suse.com>
8 years ago Remove some trailing white space
NeilBrown [Wed, 1 Jul 2015 22:26:30 +0000 (08:26 +1000)] 
Remove some trailing white space

It looks ugly in my editor.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago Manage: fix no-op test in Manage_stop.
NeilBrown [Wed, 1 Jul 2015 22:16:59 +0000 (08:16 +1000)] 
Manage: fix no-op test in Manage_stop.

A 'devnm' never starts with '/', so this test is pointless.
The code should use the passed-in devname unless it is clearly
not usable.  So fix it to do that.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago mdstat: discard 'dev' field, just use 'devnm'
NeilBrown [Wed, 1 Jul 2015 22:15:10 +0000 (08:15 +1000)] 
mdstat: discard 'dev' field, just use 'devnm'

These both have the same value, and have done since the
'devnm' concept was introduced.
So discard the pointless duplicate.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago Grow: fix typo in comment
NeilBrown [Thu, 18 Jun 2015 05:51:45 +0000 (15:51 +1000)] 
Grow: fix typo in comment

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago Assemble: ensure stripe_cache is big enough to handle new chunk size
NeilBrown [Thu, 18 Jun 2015 05:49:52 +0000 (15:49 +1000)] 
Assemble: ensure stripe_cache is big enough to handle new chunk size

If you reshape to a larger chunk size, and need to restart,
it can have problems.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago Grow: fix a couple of typos.
NeilBrown [Thu, 28 May 2015 07:21:06 +0000 (17:21 +1000)] 
Grow: fix a couple of typos.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago test: make 'check wait' more reliable.
NeilBrown [Thu, 28 May 2015 06:53:26 +0000 (16:53 +1000)] 
test: make 'check wait' more reliable.

'recover' etc doesn't appear in /proc/mdstat immediately.
The "sync" thread must be started first.
But 'sync_action' shows it as soon as MD_RECOVERY_NEEDED is set
in the kernel.  So look there too.

Now maybe I can get rid of some of those silly 'sleep' calls.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago tests/imsm-grow-template change 'wait' to 'check wait'
NeilBrown [Thu, 28 May 2015 06:51:23 +0000 (16:51 +1000)] 
tests/imsm-grow-template change 'wait' to 'check wait'

'wait' is a shell builtin that isn't doing anything useful.
It should be calling 'check wait' I think.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago Grow: fix problem with --grow --continue
NeilBrown [Thu, 28 May 2015 06:43:15 +0000 (16:43 +1000)] 
Grow: fix problem with --grow --continue

If an array is being reshaped using backup space on a 'spare' device,
then
  mdadm --grow --continue
won't find it as by the time it runs, nothing looks like a spare any
more.  The spare has been added to the array, but has no data yet.

So allow reshape_prepare_fdlist to find a newly-incorporated spare and
report this so it can be used.

Reported-by: Xiao Ni <xni@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago tests: wait a bit long for reshape to complete.
NeilBrown [Mon, 25 May 2015 06:59:19 +0000 (16:59 +1000)] 
tests: wait a bit long for reshape to complete.

As the kernel now does less locking, 'check wait' doesn't
always wait long enough.  Add some pauses.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago Grow: another attempt to fix stop-during-reshape race.
NeilBrown [Mon, 25 May 2015 06:33:45 +0000 (16:33 +1000)] 
Grow: another attempt to fix stop-during-reshape race.

When the array is stopped during a critical section, we sometimes
erase the backup, which is bad.
This happens when 'completed' is zero.
This can happen easily when 'stop' freezes reshape.

So try to be more careful and check 'reshape_position'.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago Fix minor typo in mdadm manpage.
Andrew Burgess [Wed, 25 Mar 2015 17:17:49 +0000 (17:17 +0000)] 
Fix minor typo in mdadm manpage.

Apologies if this is the wrong mailing list for this patch.

This is a very small patch for the manual page for the mdadm utility.

Thanks,
Andrew

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago mdadm: monitor: fix nullptr dereference when get_md_name() returns NULL
Sergey Vidishev [Tue, 19 May 2015 19:02:46 +0000 (22:02 +0300)] 
mdadm: monitor: fix nullptr dereference when get_md_name() returns NULL

Function add_new_arrays() expects get_md_name() to return a pointer
to devname, but get_md_name() may also return NULL. So check the
pointer before using it in add_new_arrays().

Signed-off-by: Sergey Vidishev <sergeyv@yandex-team.ru>
Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago test: forcefully clean up old loop devices.
NeilBrown [Wed, 20 May 2015 03:16:00 +0000 (13:16 +1000)] 
test: forcefully clean up old loop devices.

sometimes these can get left around, and udev can be looking
at them at awkward times so they don't disappear.
So be forceful.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago Grow: be even more careful about handing a '0' completed value.
NeilBrown [Fri, 15 May 2015 05:11:48 +0000 (15:11 +1000)] 
Grow: be even more careful about handing a '0' completed value.

Some old kernels set 'completed' to '0' too soon.
But modern kernels don't.
And when 'mdadm --stop' freezes and resumes the grow,
'completed' goes back to zero briefly, which can confuse this
logic.
So only think '0' might be wrong from an old kernel when
the reshape has gone idle.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago tests/07reshape5intr : retry if writing 'check' fails.
NeilBrown [Fri, 15 May 2015 05:09:08 +0000 (15:09 +1000)] 
tests/07reshape5intr : retry if writing 'check' fails.

It can sometimes.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago tests/19raid6repair: don't flushbufs on non-existent array.
NeilBrown [Fri, 15 May 2015 02:34:27 +0000 (12:34 +1000)] 
tests/19raid6repair: don't flushbufs on non-existent array.

..that triggers an error.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago tests: wait for complete rebuild in integrity checks
NeilBrown [Thu, 14 May 2015 23:40:33 +0000 (09:40 +1000)] 
tests: wait for complete rebuild in integrity checks

'check wait' seems a bit racy now.
Wait for the array to be fully optimal before proceeding.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago Grow: retry when writing 'reshape' to 'sync_action' is EBUSY.
NeilBrown [Thu, 14 May 2015 04:50:42 +0000 (14:50 +1000)] 
Grow: retry when writing 'reshape' to 'sync_action' is EBUSY.

EBUSY can be returned if something has recently happened
to cause md to want to check if recovery is needed, but hasn't
had a chance yet.

This can easily happen in testing.

So retry a few times in that case.
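
The retry pattern might look roughly like this. It is a sketch under
assumptions: retry_on_ebusy(), attempt_fn and fake_write() are
hypothetical names, and the retry count and delay actually used in
Grow are not stated in this message.

```c
#include <errno.h>
#include <time.h>

/* One attempt at the operation: returns 0 on success, or -1 with
 * errno set on failure (as a write to sync_action would). */
typedef int (*attempt_fn)(void *ctx);

/* Hypothetical helper: call attempt() up to max_tries times, retrying
 * only when it fails with EBUSY.  Any other error is returned at once.
 * Sleeps delay_ms milliseconds between tries. */
static int retry_on_ebusy(attempt_fn attempt, void *ctx,
                          int max_tries, long delay_ms)
{
    for (int i = 0; i < max_tries; i++) {
        if (attempt(ctx) == 0)
            return 0;
        if (errno != EBUSY)
            return -1;
        if (i + 1 < max_tries) {
            struct timespec ts = { delay_ms / 1000,
                                   (delay_ms % 1000) * 1000000L };
            nanosleep(&ts, NULL);
        }
    }
    errno = EBUSY;  /* still busy after the last try */
    return -1;
}

/* Stand-in for the real sysfs write, for demonstration only: fails
 * with EBUSY until the counter pointed to by ctx reaches zero. */
static int fake_write(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        errno = EBUSY;
        return -1;
    }
    return 0;
}
```

Keeping the retry limit small means a genuinely busy array still
produces an error instead of hanging the caller.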

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago tests/05r6tor0: minor adjustments
NeilBrown [Thu, 14 May 2015 03:41:37 +0000 (13:41 +1000)] 
tests/05r6tor0: minor adjustments

1/ use correct data-offset for cmp - that has changed.
2/ flushbufs on the block device before reading to avoid cache issues

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago tests: 05r6tor0 - add some more waiting.
NeilBrown [Thu, 14 May 2015 02:27:25 +0000 (12:27 +1000)] 
tests: 05r6tor0 - add some more waiting.

I don't really know why this is needed, but there is a delay
between the reshape finishing and the level/etc changing.
So add some sleeps.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago tests/imsm-grow-template: sleep a bit more.
NeilBrown [Thu, 14 May 2015 02:14:26 +0000 (12:14 +1000)] 
tests/imsm-grow-template: sleep a bit more.

The current sleep/wait doesn't seem long enough,
particularly when two arrays are being reshaped in the one
container.

So wait a bit more...

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago Grow: be more careful if array is stopped during critical section.
NeilBrown [Thu, 14 May 2015 23:42:39 +0000 (09:42 +1000)] 
Grow: be more careful if array is stopped during critical section.

In that case, updating 'completed' to 'max_progress' is wrong.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago Grow: add missing space in message.
NeilBrown [Thu, 14 May 2015 23:41:12 +0000 (09:41 +1000)] 
Grow: add missing space in message.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago Grow: only warn about incompatible metadata when no fallback available.
NeilBrown [Thu, 14 May 2015 01:17:39 +0000 (11:17 +1000)] 
Grow: only warn about incompatible metadata when no fallback available.

We might be trying to set_new_data_offset() for RAID10, when it is
a necessary requirement, or for RAID5 where it is optional.
In the latter case, a message about metadata versions is not helpful.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago Manage: when re-adding, do check avail size if ->sb cannot be found.
NeilBrown [Wed, 13 May 2015 04:08:41 +0000 (14:08 +1000)] 
Manage: when re-adding, do check avail size if ->sb cannot be found.

avail_size1 requires ->sb, so we must only call it if ->sb
was loaded.

If ->sb wasn't loaded, then we are only proceeding on the basis that
the kernel might be able to work something out - we don't need to
do any tests on size.

Reported-by: Christoffer Hammarström <christoffer.hammarstrom@linuxgods.com>
Signed-off-by: NeilBrown <neilb@suse.de>
URL: https://bugs.debian.org/784874

8 years ago tests: don't "dd" indefinitely.
NeilBrown [Wed, 13 May 2015 03:24:33 +0000 (13:24 +1000)] 
tests: don't "dd" indefinitely.

This will trigger an error.  And now that errors are fatal....

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago tests: ignore failure status from mdadm -IRs
NeilBrown [Wed, 13 May 2015 03:11:02 +0000 (13:11 +1000)] 
tests: ignore failure status from mdadm -IRs

This can report non-zero if there was nothing to do,
and that isn't really an error.
If the array doesn't get started, something else
will complain.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago Assemble: don't check for pre-existing array when updating uuid.
NeilBrown [Wed, 13 May 2015 02:41:48 +0000 (12:41 +1000)] 
Assemble: don't check for pre-existing array when updating uuid.

This is very much a corner case, but the self-tests tripped on it,
and it makes sense not to trust the uuid when it is being changed.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago DDF: _write_super_to_disk: fix anchor header type
Martin Wilck [Mon, 11 May 2015 14:09:44 +0000 (16:09 +0200)] 
DDF: _write_super_to_disk: fix anchor header type

Since commit 30bee0201, the anchor is updated from the active
DDF header. This requires fixing the header type before the
anchor is written.

The LSI Software RAID code will reject DDF meta data with wrong
anchor type and will erase all meta data when it encounters
such a broken anchor. Thus starting Linux md once on a system
with LSI RAID BIOS may cause the meta data to get destroyed.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago tests: never fail if --wait fails.
NeilBrown [Thu, 7 May 2015 07:00:57 +0000 (17:00 +1000)] 
tests: never fail if --wait fails.

"--wait" will return non-zero status if it didn't need to wait.
This is not a reason to fail a test.

So ignore the return status from those commands.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years ago Add "Name" defines to some ancillary programs
NeilBrown [Thu, 7 May 2015 04:46:05 +0000 (14:46 +1000)] 
Add "Name" defines to some ancillary programs

All programs now need to declare their "Name".

Signed-off-by: NeilBrown <neilb@suse.de>
Fixes: d56dd607ba43 ("Change way of printing name of a process")
8 years ago Manage: fix test for 'is array failed'.
NeilBrown [Wed, 6 May 2015 05:03:50 +0000 (15:03 +1000)] 
Manage: fix test for 'is array failed'.

'active_disks' does not count spares, so if array is rebuilding,
this will not necessarily find all devices, so may report an array
as failed when it isn't.

Counting up to nr_disks is better.

Signed-off-by: NeilBrown <neilb@suse.de>
9 years ago IMSM: Count arrays per orom
Pawel Baldysiak [Wed, 8 Apr 2015 09:42:18 +0000 (11:42 +0200)] 
IMSM: Count arrays per orom

Until now, active arrays with IMSM metadata have been counted per hba.
This is wrong now that an orom can be shared between multiple
controllers, i.e. more arrays can be created than the orom supports.
This patch changes the way arrays are counted, so the result is the
sum of arrays under every hba supported by a specific orom.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
9 years ago Assemble/force: make it possible to "force" a new device in a reshape.
NeilBrown [Wed, 8 Apr 2015 02:04:00 +0000 (12:04 +1000)] 
Assemble/force: make it possible to "force" a new device in a reshape.

Normally we do not "force"-assemble devices which are in the
middle of recovery, as they are unlikely to have useful data.

However, when a reshape increases the number of devices,
the newly added devices appear to be recovering because they
do not have complete data on them yet, but then they aren't expected
to until the reshape completes.
So in this case, it can be appropriate to force-assemble them.

Reported-by: "Jonathan Harker (Jesusaurus)" <jesusaurus@gentlydownthe.net>
Signed-off-by: NeilBrown <neilb@suse.de>
9 years ago Assemble: remove stray ':' from error message.
NeilBrown [Wed, 8 Apr 2015 01:27:34 +0000 (11:27 +1000)] 
Assemble: remove stray ':' from error message.

Signed-off-by: NeilBrown <neilb@suse.de>
9 years ago Assemble: allow a RAID4 to assemble easily when parity devices is missing.
NeilBrown [Tue, 7 Apr 2015 23:36:55 +0000 (09:36 +1000)] 
Assemble: allow a RAID4 to assemble easily when parity devices is missing.

If the parity device of a RAID4 is missing, then there is no immediate
risk to data.  So it doesn't matter if the array is dirty or not.

This can be important when reshaping a RAID0, and is a much better
solution than that in the recently reverted commit
   b720636a5849397dbc6dc1b0f0b671d17034a28b

Reported-by: "Jonathan Harker (Jesusaurus)" <jesusaurus@gentlydownthe.net>
Signed-off-by: NeilBrown <neilb@suse.de>
9 years ago Revert "Assemble: support assembling of a RAID0 being reshaped."
NeilBrown [Tue, 7 Apr 2015 23:29:31 +0000 (09:29 +1000)] 
Revert "Assemble: support assembling of a RAID0 being reshaped."

This reverts commit b720636a5849397dbc6dc1b0f0b671d17034a28b.

As it said, this was a hack.  It causes problems when trying to
--force assemble a RAID4.  There is a better way.

Reported-by: "Jonathan Harker (Jesusaurus)" <jesusaurus@gentlydownthe.net>
Signed-off-by: NeilBrown <neilb@suse.de>
9 years ago Assemble: fix "no uptodate device" message.
NeilBrown [Tue, 7 Apr 2015 23:20:26 +0000 (09:20 +1000)] 
Assemble: fix "no uptodate device" message.

Since we introduced replacement devices, the 'i' used in
start_array() is twice the slot number.

So we need to adjust when printing.

Signed-off-by: NeilBrown <neilb@suse.de>
9 years ago Monitor: use the "space protocol" for "Wrong-Level".
NeilBrown [Tue, 7 Apr 2015 23:18:55 +0000 (09:18 +1000)] 
Monitor: use the "space protocol" for "Wrong-Level".

"Wrong-Level" is a reason, not a component device, so it should
start with a space to indicate this to alert().

Signed-off-by: NeilBrown <neilb@suse.de>
9 years ago Monitor: Obey "space protocol" when writing to syslog.
NeilBrown [Tue, 7 Apr 2015 23:17:17 +0000 (09:17 +1000)] 
Monitor: Obey "space protocol" when writing to syslog.

"alert" treats the "disc" arg differently if it starts with a space.

At least it does for sending email.  It doesn't for writing to syslog.

Make this consistent and obey the 'space protocol' when writing to
syslog.

Signed-off-by: NeilBrown <neilb@suse.de>
9 years ago reshape: support raid5 grow on certain older kernels.
NeilBrown [Wed, 25 Mar 2015 23:06:26 +0000 (10:06 +1100)] 
reshape: support raid5 grow on certain older kernels.

Kernels between
  c6563a8c38fde3c1c7fc925a v3.5-rc1~110^2~53
and
  b5254dd5fdd9abcacadb5101 v3.5-rc1~110^2~51

allow new_offset to be set, but don't then allow a RAID5
to be reshaped to change that offset.
Due to selective backports, this includes the SLES11-SP3 kernel.

It is quite easy to handle this case in mdadm, so we do.
Specifically: if the reshape with data-offset fails with EINVAL,
abort the data-offset change and try the "old" way.

Signed-off-by: NeilBrown <neilb@suse.de>
9 years ago IncRemove: Set "auto-read" only after successful excl open.
Pawel Baldysiak [Fri, 27 Feb 2015 14:47:54 +0000 (15:47 +0100)] 
IncRemove: Set "auto-read" only after successful excl open.

"mdadm -If" - triggered from udev rules when disk is removed from OS -
tries to set array in auto-read-only mode. This can interrupt rebuild
process which is started automatically, e.g. if array is mounted and
spare disk is available (I/O error is detected faster than removing
failed disk by mdadm).
This patch prevents "mdadm -If" from setting array into "auto-read-only",
by requiring exclusive open to succeed.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
9 years ago IMSM-orom: make sure, that device list is supported
Pawel Baldysiak [Fri, 27 Feb 2015 14:45:50 +0000 (15:45 +0100)] 
IMSM-orom: make sure, that device list is supported

The device list in the PCI Data Structure is supported only in
revision 3 and above. Make sure that this is checked.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
9 years ago imsm: simplified multiple OROMs support
Artur Paszkiewicz [Fri, 27 Feb 2015 13:39:42 +0000 (14:39 +0100)] 
imsm: simplified multiple OROMs support

Replaced oroms array with list, add_orom() now only appends to this list
and add_orom_device_id() only appends devid_list node to an orom_entry.

Reported-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
9 years ago Assemble: don't ignore the return value from stat.
NeilBrown [Wed, 4 Mar 2015 04:55:44 +0000 (15:55 +1100)] 
Assemble: don't ignore the return value from stat.

Static checkers complain about that.
So change the code to use 'fstat', as we really don't want
to see an error here.

Reported-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
9 years ago write_super_imsm_spares(): C statements are terminated by ;
Jes Sorensen [Tue, 24 Feb 2015 21:00:40 +0000 (16:00 -0500)] 
write_super_imsm_spares(): C statements are terminated by ;

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
9 years ago IncrementalScan(): Make sure 'st' is valid before dereferencing it
Jes Sorensen [Tue, 24 Feb 2015 21:00:39 +0000 (16:00 -0500)] 
IncrementalScan(): Make sure 'st' is valid before dereferencing it

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
9 years ago Grow.c: Fix classic readlink() buffer overflow
Jes Sorensen [Tue, 24 Feb 2015 21:00:36 +0000 (16:00 -0500)] 
Grow.c: Fix classic readlink() buffer overflow

The buffer passed on to readlink() needs to contain space for the
terminating \0. See 'man 3 readlink' for details.
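
A minimal sketch of the safe pattern (readlink_str() is a hypothetical
wrapper, not the code in Grow.c): pass one byte less than the buffer
size to readlink(), then add the terminator yourself, since readlink()
does not write one.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical wrapper: read the target of a symlink into buf as a
 * NUL-terminated string.  Returns the length of the string, or -1 on
 * error.  readlink() truncates silently, so a result of len - 1 may
 * mean the target did not fit. */
static ssize_t readlink_str(const char *path, char *buf, size_t len)
{
    if (len == 0)
        return -1;

    /* Reserve the last byte of buf for the terminating '\0'. */
    ssize_t n = readlink(path, buf, len - 1);

    if (n < 0)
        return -1;
    buf[n] = '\0';  /* n <= len - 1, so this is always in bounds */
    return n;
}
```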

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
9 years ago Don't break long strings onto multiple lines.
NeilBrown [Thu, 12 Feb 2015 02:46:53 +0000 (13:46 +1100)] 
Don't break long strings onto multiple lines.

It is best to keep strings all together so that they
are easier to search for in the source code.
If a string is so long that it looks ugly on one line,
then maybe it should be broken into multiple lines
for display too.

Only strings which contain a newline can be broken
into multiple lines:

 "It is OK to\n"
 "break this string\n"

Signed-off-by: NeilBrown <neilb@suse.de>
9 years ago Consistently print program Name and __func__ in debug messages.
NeilBrown [Thu, 12 Feb 2015 02:21:17 +0000 (13:21 +1100)] 
Consistently print program Name and __func__ in debug messages.

make dprintf() print program name and __func__, so that
this messaging is consistent.

Also remove all __func__ messages from pr_err(). We shouldn't
leak that internal data in error messages.
If we really want the function name there, a new pr_XXX might
be wanted.

Signed-off-by: NeilBrown <neilb@suse.de>
9 years ago Change way of printing name of a process
Pawel Baldysiak [Wed, 11 Feb 2015 21:25:03 +0000 (22:25 +0100)] 
Change way of printing name of a process

Sometimes mdadm prints messages with the wrong name "mdmon",
and vice versa.
This patch solves the problem by changing the method of determining
the process name.
"Name" is now set in a const at the start of a program;
previously it was hardcoded as a #define.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
9 years ago Monitor: fix for regression with container devices
Artur Paszkiewicz [Mon, 9 Feb 2015 10:13:50 +0000 (11:13 +0100)] 
Monitor: fix for regression with container devices

This patch fixes 2 problems introduced by commit 9a518d8: not closing a
file descriptor and ignoring container devices. Array state is always
"inactive" for containers, so we make sure that the device is not a
container by also reading the "level" sysfs entry.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Reviewed-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
9 years ago mdcheck: be careful when sourcing the output of "mdadm --detail --export"
NeilBrown [Tue, 3 Feb 2015 22:06:47 +0000 (09:06 +1100)] 
mdcheck: be careful when sourcing the output of "mdadm --detail --export"

The output of "mdadm --detail --export" isn't quoted properly so
fields that contain spaces can be a problem.
We only want the MD_UUID field, and it has a very well defined
format with no spaces.
So use 'grep' to limit the output to just that.

Signed-off-by: NeilBrown <neilb@suse.de>
9 years ago IMSM: Clear migration record on disks more often
Pawel Baldysiak [Tue, 20 Jan 2015 12:52:25 +0000 (13:52 +0100)] 
IMSM: Clear migration record on disks more often

Migration record is not always cleared after successful migration. This can
block another reshape from being started. Migration will not be continued via
systemd service due to error in verifying reshape position. This patch added
clearing migration record when disk is added to container, and after successful
migration.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
9 years agoutil: remove rounding error where reporting "human sizes".
NeilBrown [Thu, 18 Dec 2014 05:48:15 +0000 (16:48 +1100)] 
util: remove rounding error where reporting "human sizes".

The division
 1<<20 / 200
is not exact, so dividing by this to convert bytes into half-megs
is wrong and results in incorrect output.

As we are doing "long long" arithmetic, there is no risk of an overflow
until we reach 64 petabytes.
So change to
   * 200 / (1<<20).
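
A minimal sketch of the arithmetic described above, not the actual code
in util.c. The helper names are hypothetical, and the "+1 then /2"
rounding step is an assumption made so the result is in hundredths of a
megabyte. The old form divides by (1<<20)/200, which truncates from
5242.88 to 5242; the new form multiplies first and divides once.

```c
#include <stdio.h>

/* Old approach (hypothetical helper): divide by a truncated divisor.
 * (1LL << 20) / 200 is 5242 in integer arithmetic, not 5242.88, so
 * the error grows with the size being reported. */
static long long centi_mib_old(long long bytes)
{
	return (bytes / ((1LL << 20) / 200) + 1) / 2;
}

/* New approach (hypothetical helper): multiply first, then divide by
 * the exact power of two. The final "+1, /2" rounds to the nearest
 * hundredth of a MiB (an assumption for this sketch). */
static long long centi_mib_new(long long bytes)
{
	return (bytes * 200 / (1LL << 20) + 1) / 2;
}

/* Print both results for one size, as MiB with two decimals. */
static void show_sizes(long long bytes)
{
	long long o = centi_mib_old(bytes);
	long long n = centi_mib_new(bytes);

	printf("old: %lld.%02lld MiB  new: %lld.%02lld MiB\n",
	       o / 100, o % 100, n / 100, n % 100);
}
```

For exactly 1 GiB (1LL << 30), the old form yields 1024.17 MiB while
the new form yields 1024.00 MiB.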

Reported-by: Jan Echternach <jan@goneko.de>
Resolved-debian-bug: 763917
URL: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=763917
Signed-off-by: NeilBrown <neilb@suse.de>
9 years agoGrow: Fix wrong 'goto' in set_new_data_offset
Pawel Baldysiak [Thu, 27 Nov 2014 11:35:24 +0000 (12:35 +0100)] 
Grow: Fix wrong 'goto' in set_new_data_offset

Commit a821c95f114724b38df1ea99b2858178e0ed28ce
besides introducing additional message, also changed
direct return to "goto" instruction.
'goto release' will cause routine to return with '-1',
when previously '1' was returned.
Described behaviour breaks e.g. IMSM reshape process.
This patch fixes this issue by changing 'goto' to proper one -
the one that returns '1'.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
9 years agoMonitor: don't open md array that doesn't exist.
NeilBrown [Tue, 28 Oct 2014 21:48:02 +0000 (08:48 +1100)] 
Monitor: don't open md array that doesn't exist.

Opening a block-special-device for an array that doesn't
exist causes that array to be instantiated (as an empty array).
Races at array shutdown can cause the array to spontaneously
re-appear if some daemon notices a 'change' event and goes
to investigate.

Teach "mdadm --monitor" to avoid this race by checking the
"array_state" before opening the device.
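
A rough sketch of the idea, not the code in Monitor.c. The helper names
are hypothetical, and treating "clear" and "inactive" as "do not open"
is an assumption for illustration. The point is that the sysfs
attribute is read by path, so the block device node is never opened
and no empty array gets instantiated.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the text read from array_state
 * suggests there is an array worth opening, 0 otherwise.
 * Assumption: "clear" and "inactive" both mean "leave it alone". */
static int state_means_array_exists(const char *state)
{
	return strncmp(state, "clear", 5) != 0 &&
	       strncmp(state, "inactive", 8) != 0;
}

/* Hypothetical helper: check an array_state file, for example
 * "/sys/block/md0/md/array_state", without touching /dev. */
static int array_exists(const char *state_path)
{
	char buf[32];
	ssize_t n;
	int fd = open(state_path, O_RDONLY);

	if (fd < 0)
		return 0;	/* no sysfs entry, so nothing to open */
	n = read(fd, buf, sizeof(buf) - 1);
	close(fd);
	if (n <= 0)
		return 0;
	buf[n] = '\0';
	return state_means_array_exists(buf);
}
```

A caller would only open the block device when array_exists() returns
non-zero.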

Reported-by: Francis Moreau <francis.moro@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
9 years agoMakefile: binaries shouldn't directly depend on check_rundir
NeilBrown [Tue, 25 Nov 2014 00:44:18 +0000 (11:44 +1100)] 
Makefile: binaries shouldn't directly depend on check_rundir

check_rundir always needs to be "built", so making
mdadm and mdmon depend on it causes them to always be built.
i.e. running
   make ; make

will needlessly link the binaries a second time.

So change the makefile to use "order-only" pre-requisites.

Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: NeilBrown <neilb@suse.de>
9 years agoimsm: use efivarfs interface for reading UEFI variables
Artur Paszkiewicz [Thu, 20 Nov 2014 17:56:13 +0000 (18:56 +0100)] 
imsm: use efivarfs interface for reading UEFI variables

Read UEFI variables using the new efivarfs interface, falling back to
sysfs-efivars if that fails.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
9 years agoimsm: detail-platform improvements
Artur Paszkiewicz [Wed, 19 Nov 2014 12:53:29 +0000 (13:53 +0100)] 
imsm: detail-platform improvements

Print platform details per OROM, not per controller, differentiate
RST(e) platforms from legacy IMSM, print NVMe device paths, adjust port
printing to newer sysfs path.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
9 years agoimsm: add support for NVMe devices
Pawel Baldysiak [Wed, 19 Nov 2014 12:53:28 +0000 (13:53 +0100)] 
imsm: add support for NVMe devices

Recognize Intel(R) NVMe devices as IMSM-capable.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
9 years agoimsm: support for second and combined AHCI controllers in UEFI mode
Artur Paszkiewicz [Wed, 19 Nov 2014 12:53:27 +0000 (13:53 +0100)] 
imsm: support for second and combined AHCI controllers in UEFI mode

Grantly platform introduces a second AHCI controller (sSATA) and two new
UEFI variables for the RSTe firmware. This patch adds support for those
variables in order to correctly determine IMSM platform capabilities in
UEFI mode.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
9 years agoimsm: support for OROMs shared by multiple HBAs
Artur Paszkiewicz [Wed, 19 Nov 2014 12:53:26 +0000 (13:53 +0100)] 
imsm: support for OROMs shared by multiple HBAs

The IMSM platform code was based on an assumption that the OROM or UEFI
capability structure (represented by struct imsm_orom) always belongs to
only one HBA. This assumption is no longer valid, because of newer
platforms with dual AHCI HBAs. Each HBA can have a separate OROM, but
some versions have a combined OROM for both HBAs.

This patch implements this HBA-OROM relationship in struct orom_entry,
which matches an OROM with a list of HBA PCI ids. All the detected
orom_entries are stored and retrieved using a global array and the
functions add_orom(), add_orom_device_id() and get_orom_by_device_id().
This replaces the arrays: imsm_orom, populated_orom, imsm_efi,
populated_efi.

The scan() function is extended to find all HBAs for an OROM. The list
of their device ids is retrieved from the PCI Expansion ROM Data
Structure, hence the additional field devListOffset in struct
pciExpDataStructFormat.

In UEFI mode we can't read the PCI Expansion ROM Data Structure and the
imsm_orom structures are stored in UEFI variables. They do not provide a
similar device id list, so we also check the HBA PCI class to make sure
that the HBA has RAID mode enabled.

In super-intel.c there are changes which allow spanning of IMSM
containers over HBAs of the same type, but only if the HBAs share the
same OROM.  This is done by comparing imsm_orom pointers, which (outside
of platform-intel.c) always point to the global array containing all the
detected oroms. Additional warnings are added to
validate_container_imsm() to warn about potentially dangerous operations
in all the possible cases, e.g. when an array is assembled using disks
attached to HBAs with separate OROMs.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
9 years agoIncremental: don't be distracted by partition table when calling try_spare.
NeilBrown [Wed, 5 Nov 2014 05:21:42 +0000 (16:21 +1100)] 
Incremental: don't be distracted by partition table when calling try_spare.

Currently a partition table on a device makes "mdadm -I" think
the array has a particular metadata type and so will only
add it to an array of that (partition table) type .. which doesn't
make any sense.

So tell guess_super to only look for 'array' metadata.

Reported-by: Caspar Smit <c.smit@truebit.nl>
Signed-off-by: NeilBrown <neilb@suse.de>
9 years agoDetail: fix handling of 'disks' array.
NeilBrown [Mon, 3 Nov 2014 22:35:20 +0000 (09:35 +1100)] 
Detail: fix handling of 'disks' array.

Since the introduction of replacement devices, we reserve
two places in the "disks" array for each raid disk.
That means we should allocate twice "max_disks" as the array
could have that many raid_disks (though that would limit the
number of replacements).

A couple of other places need to use "max_disks*2" instead of
"max_disks" to co-ordinate with this.

Reported-by: Or Sagi <ors@reduxio.com>
Signed-off-by: NeilBrown <neilb@suse.de>
9 years agosuper1: remove some debugging printfs in update_super1
NeilBrown [Mon, 3 Nov 2014 01:56:37 +0000 (12:56 +1100)] 
super1: remove some debugging printfs in update_super1

These should never have been there.

Signed-off-by: NeilBrown <neilb@suse.de>
9 years agoRebuildmap: strip local host name from device name.
NeilBrown [Mon, 3 Nov 2014 01:49:05 +0000 (12:49 +1100)] 
Rebuildmap: strip local host name from device name.

When /run/mdadm/map is being rebuilt, e.g. by "mdadm -Ir",
if the device doesn't exist in /dev, we have to choose
a name.
Currently we don't strip the hostname which is wrong if
it is the local host.

Reported-by: Stephen Kent <smkent@smkent.net>
Signed-off-by: NeilBrown <neilb@suse.de>
9 years agomdcheck: don't git error if not /dev/md?* devices exist.
NeilBrown [Mon, 3 Nov 2014 00:58:06 +0000 (11:58 +1100)] 
mdcheck: don't git error if not /dev/md?* devices exist.

If there are no such devices, the 'for' will set '$dev' to
'/dev/md?*', which should be ignored.

Signed-off-by: NeilBrown <neilb@suse.de>
9 years agoGrow: fix resize of array component size to > 32bits
Justin Maggard [Sat, 25 Oct 2014 00:55:02 +0000 (17:55 -0700)] 
Grow: fix resize of array component size to > 32bits

If the requested --size to --grow an array to is larger
than 32 bits, then mdadm may make the wrong choice and
use ioctl instead of setting component_size via sysfs,
and the change is ignored.

Instead of using casts to check for a 32-bit overflow,
just check for set bits outside of INT32_MAX.
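
A minimal sketch of such a mask test, assuming the size is held in an
unsigned long long. The helper name is hypothetical and is not taken
from Grow.c.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if the value has no bits set outside
 * INT32_MAX (so it is safe to pass through a 32-bit signed field),
 * 0 if it needs the wider sysfs path instead. */
static int size_fits_in_int32(unsigned long long size)
{
	return (size & ~(unsigned long long)INT32_MAX) == 0;
}
```

With this test, INT32_MAX itself fits, while INT32_MAX + 1 and anything
with bit 32 or higher set does not.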

Fixes: 4e9a3dd16d656b269f5602624ac4f7109a571368
Signed-off-by: NeilBrown <neilb@suse.de>
9 years agomdmon: already read sysfs files once after opening.
NeilBrown [Wed, 17 Sep 2014 05:02:18 +0000 (15:02 +1000)] 
mdmon: already read sysfs files once after opening.

seq_file in the kernel will allocate a read buffer on
first read.  We want this to happen under the managemon thread,
not the 'monitor' thread, as the latter is not allowed to allocate
memory (might deadlock).
So do a first read after opening.
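
A sketch of the "read once after opening" pattern, under the assumption
that a throwaway read followed by a rewind is enough. The helper name
is hypothetical and this is not mdmon's actual code.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open a sysfs attribute and do one throwaway
 * read so that any buffer allocation happens in the calling thread.
 * Returns the open descriptor, or -1 on failure. */
static int open_and_prime(const char *path)
{
	char scratch[64];
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	/* First read: the data is discarded, only the side effect of
	 * the kernel setting up its read buffer matters here. */
	if (read(fd, scratch, sizeof(scratch)) < 0) {
		close(fd);
		return -1;
	}
	/* Rewind so a later reader sees the attribute from the start. */
	lseek(fd, 0, SEEK_SET);
	return fd;
}
```

The descriptor can then be handed to the thread that must not allocate
memory.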

Signed-off-by: NeilBrown <neilb@suse.de>
9 years agoGrow: Report when grow needs metadata update
Andy Smith [Fri, 29 Aug 2014 20:47:12 +0000 (20:47 +0000)] 
Grow: Report when grow needs metadata update

Report when the array's metadata needs updating instead of just
reporting the generic "kernel too old" message.

Signed-off-by: Andy Smith <andy@strugglers.net>
Signed-off-by: NeilBrown <neilb@suse.de>
9 years ago--update: add 'bbl' and 'no-bbl' to the list of known updates.
NeilBrown [Wed, 27 Aug 2014 11:04:59 +0000 (21:04 +1000)] 
--update: add 'bbl' and 'no-bbl' to the list of known updates.

so "mdadm -A --update=?" mentions them.

Reported-by: Peter Hoeg <peter@hoeg.com>
Signed-off-by: NeilBrown <neilb@suse.de>