git.ipfire.org Git - thirdparty/mdadm.git/log
8 years agoEnable create array with write journal (--write-journal DEVICE).
Song Liu [Fri, 9 Oct 2015 05:51:43 +0000 (22:51 -0700)] 
Enable create array with write journal (--write-journal DEVICE).

Specify the write journal device with --write-journal DEVICE

./mdadm --create -f /dev/md0 --assume-clean -c 32 --raid-devices=4 --level=5 /dev/sd[c-f] --write-journal /dev/sdb1
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.

Only one journal device is allowed. If multiple --write-journal
options are given, mdadm uses the first and ignores the others:

./mdadm --create -f /dev/md0 --assume-clean -c 32 --raid-devices=4 --level=5 /dev/sd[c-f] --write-journal /dev/sdb1 --write-journal /dev/sdx
mdadm: Please specify only one journal device for the array.
mdadm: Ignoring --write-journal /dev/sdx...
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
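
The "first one wins" rule above can be sketched in C as follows. This is only an illustration of the behaviour described in this message, not mdadm's actual option parser: the struct journal_choice type and the choose_journal() helper are hypothetical names, and the warning text is copied from the example output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical state, not an mdadm structure: the journal device that
 * was accepted and how many later ones were skipped. */
struct journal_choice {
    const char *device;   /* first --write-journal argument, or NULL */
    int ignored;          /* number of extra journal devices skipped */
};

/* Record the first journal device; warn about and skip any later one. */
static void choose_journal(struct journal_choice *jc, const char *dev)
{
    assert(jc != NULL && dev != NULL);

    if (jc->device == NULL) {
        jc->device = dev;
        return;
    }
    fprintf(stderr, "mdadm: Please specify only one journal device for the array.\n");
    fprintf(stderr, "mdadm: Ignoring --write-journal %s...\n", dev);
    jc->ignored++;
}
```

Calling choose_journal() once per --write-journal option leaves the first device in place regardless of how many follow.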

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoShow device as journal in --detail --examine
Song Liu [Fri, 9 Oct 2015 05:51:42 +0000 (22:51 -0700)] 
Show device as journal in --detail --examine

Example output:

./mdadm --detail /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Wed May 13 17:01:12 2015
     Raid Level : raid5
     Array Size : 11720662464 (11177.69 GiB 12001.96 GB)
  Used Dev Size : 3906887488 (3725.90 GiB 4000.65 GB)
   Raid Devices : 4
  Total Devices : 5
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Wed May 13 17:01:12 2015
          State : clean
 Active Devices : 4
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 32K

           Name : 0
           UUID : 8fb9ee05:3831d52f:e5c23825:28cd6881
         Events : 0

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       1       8       48        1      active sync   /dev/sdd
       2       8       64        2      active sync   /dev/sde
       3       8       80        3      active sync   /dev/sdf

       4       8       17        -      journal   /dev/sdb1

./mdadm -E /dev/sdb2
/dev/sdb2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x201
     Array UUID : 562b2334:35b9bcc1:add50892:1f30c4bd
           Name : 0
  Creation Time : Thu Aug 27 12:55:26 2015
     Raid Level : raid5
   Raid Devices : 15

 Avail Dev Size : 249796608 (119.11 GiB 127.90 GB)
     Array Size : 54696423936 (52162.57 GiB 56009.14 GB)
  Used Dev Size : 7813774848 (3725.90 GiB 4000.65 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : active
    Device UUID : 5015e522:d39ba566:5909cf3c:9c51f2ff

Internal Bitmap : 8 sectors from superblock
    Update Time : Thu Aug 27 13:16:55 2015
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 4e6fd76d - correct
         Events : 262

         Layout : left-symmetric
     Chunk Size : 256K

   Device Role : Journal
   Array State : AAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoadd macros for MD_DISK_ROLE_(SPARE/FAULTY)
Song Liu [Fri, 9 Oct 2015 05:51:41 +0000 (22:51 -0700)] 
add macros for MD_DISK_ROLE_(SPARE/FAULTY)

Replace special disk roles (0xffff, 0xfffe) with macros:

#define MD_DISK_ROLE_SPARE      0xffff
#define MD_DISK_ROLE_FAULTY     0xfffe

A macro for the journal device will be added in the next patch:
#define MD_DISK_ROLE_JOURNAL    0xfffd
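
As a rough sketch of how named roles read in code, the helper below maps a 16-bit role value to a label. The three macro values are the ones given in this message; role_name() is a hypothetical function for illustration and treating every other value as an active slot is a simplifying assumption.

```c
#include <assert.h>
#include <stdint.h>

#define MD_DISK_ROLE_SPARE   0xffff
#define MD_DISK_ROLE_FAULTY  0xfffe
#define MD_DISK_ROLE_JOURNAL 0xfffd

/* The special roles sit at the top of the 16-bit range, in this order. */
static_assert(MD_DISK_ROLE_JOURNAL < MD_DISK_ROLE_FAULTY &&
              MD_DISK_ROLE_FAULTY < MD_DISK_ROLE_SPARE,
              "special roles are expected to be distinct and ordered");

/* Hypothetical helper: describe a role value. Anything that is not a
 * special role is reported as an active raid slot in this sketch. */
static const char *role_name(uint16_t role)
{
    switch (role) {
    case MD_DISK_ROLE_SPARE:   return "spare";
    case MD_DISK_ROLE_FAULTY:  return "faulty";
    case MD_DISK_ROLE_JOURNAL: return "journal";
    default:                   return "active";
    }
}
```

Comparing against the macro names rather than the raw constants is what makes adding a further special role a one-line change.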

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoimsm: don't call abort_reshape() in imsm_manage_reshape()
Artur Paszkiewicz [Mon, 5 Oct 2015 13:18:11 +0000 (15:18 +0200)] 
imsm: don't call abort_reshape() in imsm_manage_reshape()

Calling abort_reshape() in imsm_manage_reshape() is unnecessary in case
of an error because it is handled by reshape_array(). Calling it when
reshape completes successfully is also unnecessary and leads to a race
condition:
- reshape ends
- mdadm calls abort_reshape() -> sets sync_action to idle
- MD_RECOVERY_INTR is set and md_reap_sync_thread() does not finish the
  reshape

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Konrad Dabrowski <konrad.dabrowski@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agore-add: make re-add try to write sysfs node first
Guoqing Jiang [Wed, 7 Oct 2015 02:06:54 +0000 (10:06 +0800)] 
re-add: make re-add try to write sysfs node first

If the sysfs node exists, we should first try to write "re-add" to it.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoMerge branch 'fix' of git://github.com/ldzhong/mdadm
NeilBrown [Wed, 30 Sep 2015 22:30:58 +0000 (08:30 +1000)] 
Merge branch 'fix' of git://github.com/ldzhong/mdadm

8 years agomdadm: make cluster raid also could support re-add
Guoqing Jiang [Thu, 20 Aug 2015 05:56:31 +0000 (13:56 +0800)] 
mdadm: make cluster raid also could support re-add

If the array is a cluster raid, disc.state needs to be
changed accordingly when doing a re-add.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoFix --incremental handling on cluster array.
Goldwyn Rodrigues [Wed, 26 Aug 2015 16:35:21 +0000 (11:35 -0500)] 
Fix --incremental handling on cluster array.

Commit 06bd679317a2 ("Skip clustered devices in incremental")
disabled incremental completely on clustered arrays.
What we really want is that mdadm should not start or create
a clustered array but still be able to add or readd to an existing
device. This would enable udev scripts to automatically add
or re-add a device after transient errors.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agosuper1: Do not create bad block log for clustered devices.
NeilBrown [Mon, 28 Sep 2015 01:49:53 +0000 (11:49 +1000)] 
super1: Do not create bad block log for clustered devices.

We currently have no synchronization techniques for the bad
block log, so disable it for the cluster.

Reported-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoIncrement version for clustered bitmaps
Goldwyn Rodrigues [Tue, 18 Aug 2015 21:38:27 +0000 (07:38 +1000)] 
Increment version for clustered bitmaps

Add BITMAP_MAJOR_CLUSTERED as 5, in order to prevent older kernels
from assembling a clustered device.

In order to maximize compatibility, the major version is set to
BITMAP_MAJOR_CLUSTERED *only* if the bitmap is clustered.

Also add MD_FEATURE_CLUSTERED, so that older kernels return an
error instead of assembling the MD device when the bitmap is
corrupted.
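
A minimal sketch of the version selection described above. BITMAP_MAJOR_CLUSTERED = 5 comes from this message; the values 3 and 4 for the older range, and both function names, are assumptions made for illustration. The second function only models the idea that an older kernel rejects a version it does not recognise.

```c
#include <assert.h>
#include <stdbool.h>

#define BITMAP_MAJOR_LO        3   /* assumed lower bound known to old kernels */
#define BITMAP_MAJOR_HI        4   /* assumed upper bound known to old kernels */
#define BITMAP_MAJOR_CLUSTERED 5   /* value named in this commit message */

static_assert(BITMAP_MAJOR_CLUSTERED > BITMAP_MAJOR_HI,
              "the clustered version must fall outside the old range");

/* Use the new major version only when the bitmap really is clustered. */
static int bitmap_major_for(bool clustered)
{
    return clustered ? BITMAP_MAJOR_CLUSTERED : BITMAP_MAJOR_HI;
}

/* Model of an older kernel: accept only the versions it knows about. */
static bool old_kernel_accepts(int major)
{
    return major >= BITMAP_MAJOR_LO && major <= BITMAP_MAJOR_HI;
}
```

Non-clustered arrays keep a version that older kernels accept, which is the compatibility point made above.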

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agomdadm: remove duplicate logic when c.delay is 0
Lidong Zhong [Wed, 26 Aug 2015 06:01:52 +0000 (14:01 +0800)] 
mdadm: remove duplicate logic when c.delay is 0

8 years agoMakefile: test -s flag and suppress echo when set.
NeilBrown [Wed, 5 Aug 2015 05:10:43 +0000 (15:10 +1000)] 
Makefile: test -s flag and suppress echo when set.

Some rules do their own tracing and so aren't affected
by -s.
So add a test for -s in MAKE_FLAGS and avoid echo when present.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agotests: raid6 repair is now tested on every different layout.
NeilBrown [Mon, 20 Jul 2015 04:17:28 +0000 (14:17 +1000)] 
tests: raid6 repair is now tested on every different layout.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoAssemble: correctly capture error from ->write_bitmap
NeilBrown [Wed, 5 Aug 2015 04:55:31 +0000 (14:55 +1000)] 
Assemble: correctly capture error from ->write_bitmap

else 'err' might be undefined.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agomain: remove use of uninitialized 'rv'.
NeilBrown [Wed, 5 Aug 2015 04:53:33 +0000 (14:53 +1000)] 
main: remove use of uninitialized 'rv'.

If c.homecluster was not NULL, might get an
error anyway.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoraid6check: don't ignore return value from posix_memalign.
NeilBrown [Wed, 5 Aug 2015 04:50:34 +0000 (14:50 +1000)] 
raid6check: don't ignore return value from posix_memalign.

Compilers don't like that.
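
For illustration, checking the result of posix_memalign() can look like the sketch below. alloc_aligned() is a hypothetical wrapper, not the code changed in raid6check.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: return an aligned, zeroed buffer, or NULL. */
static void *alloc_aligned(size_t alignment, size_t size)
{
    void *p = NULL;
    int err;

    assert(size > 0);

    /* posix_memalign() returns 0 on success and an error number on
     * failure; it does not report errors through errno. */
    err = posix_memalign(&p, alignment, size);
    if (err != 0)
        return NULL;

    memset(p, 0, size);
    return p;
}
```

The caller then only has to test for NULL, and the buffer is released with free() as usual.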

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoMerge branch 'mdadm-3.3.x'
NeilBrown [Mon, 3 Aug 2015 06:21:37 +0000 (16:21 +1000)] 
Merge branch 'mdadm-3.3.x'

8 years agoRelease mdadm-3.3.4 mdadm-3.3.x mdadm-3.3.4
NeilBrown [Mon, 3 Aug 2015 06:17:13 +0000 (16:17 +1000)] 
Release mdadm-3.3.4

Important bugfix release.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoAssemble: really don't assemble IMSM array without OROM.
NeilBrown [Mon, 3 Aug 2015 06:06:51 +0000 (16:06 +1000)] 
Assemble: really don't assemble IMSM array without OROM.

Previous patch missed one case.

Also print more useful information when rejecting
a device with IMSM metadata.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agomdassemble: include mapfile support.
NeilBrown [Mon, 3 Aug 2015 01:54:16 +0000 (11:54 +1000)] 
mdassemble: include mapfile support.

This does make mdassemble a bit bigger, but it also means
it actually works properly with named arrays.

Ref: https://bbs.archlinux.org/viewtopic.php?id=198196
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoAssemble: don't assemble IMSM array without OROM.
NeilBrown [Wed, 29 Jul 2015 04:38:37 +0000 (14:38 +1000)] 
Assemble: don't assemble IMSM array without OROM.

If someone has an IMSM array, and disables RAID in the BIOS
and uses the devices for some other purpose, then they really don't
want mdadm to start syncing the array.

So don't assemble if OROM doesn't confirm it is OK.

There can still be problems for crash-dump not being able to find
the OROM.   Some explicit work-around might be needed for that
rather than a more general workaround that can corrupt data.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agomdassemble: include mapfile support.
NeilBrown [Mon, 3 Aug 2015 01:54:16 +0000 (11:54 +1000)] 
mdassemble: include mapfile support.

This does make mdassemble a bit bigger, but it also means
it actually works properly with named arrays.

Ref: https://bbs.archlinux.org/viewtopic.php?id=198196
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agomdassemble: don't try to perform cluster check.
NeilBrown [Mon, 3 Aug 2015 01:53:01 +0000 (11:53 +1000)] 
mdassemble: don't try to perform cluster check.

mdassemble is meant to be small and simple, so avoid
trying to check for a cluster.
Currently it doesn't, but it still includes the code,
which doesn't build because the library isn't provided.

So just exclude the get_cluster_name code from mdassemble.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agomd-cluster: use %-64s to print cluster_name
Guoqing Jiang [Mon, 6 Jul 2015 08:52:11 +0000 (16:52 +0800)] 
md-cluster: use %-64s to print cluster_name

Left alignment is better for a cluster name shorter than 64
characters. It also aligns the cluster name with the other output.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agomdadm: fix wrong condition for go to abort
Guoqing Jiang [Mon, 6 Jul 2015 08:52:10 +0000 (16:52 +0800)] 
mdadm: fix wrong condition for go to abort

When parse_cluster_confirm_arg returns 0, it means the
arguments were parsed successfully, so change !rv to rv.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoAssemble: don't assemble IMSM array without OROM.
NeilBrown [Wed, 29 Jul 2015 04:38:37 +0000 (14:38 +1000)] 
Assemble: don't assemble IMSM array without OROM.

If someone has an IMSM array, and disables RAID in the BIOS
and uses the devices for some other purpose, then they really don't
want mdadm to start syncing the array.

So don't assemble if OROM doesn't confirm it is OK.

There can still be problems for crash-dump not being able to find
the OROM.   Some explicit work-around might be needed for that
rather than a more general workaround that can corrupt data.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoMerge branch 'cluster'
NeilBrown [Mon, 27 Jul 2015 01:01:08 +0000 (11:01 +1000)] 
Merge branch 'cluster'

Now that 3.3.3 is out, it is time to include the cluster-support code.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoRelease mdadm-3.3.3 mdadm-3.3.3
NeilBrown [Fri, 24 Jul 2015 05:35:53 +0000 (15:35 +1000)] 
Release mdadm-3.3.3

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agomdassemble: add "Name" definition.
NeilBrown [Fri, 24 Jul 2015 06:18:13 +0000 (16:18 +1000)] 
mdassemble: add "Name" definition.

That allows it to compile again :-(

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoDon't ignore return value from read and write
NeilBrown [Fri, 24 Jul 2015 06:11:23 +0000 (16:11 +1000)] 
Don't ignore return value from read and write

New gcc sometimes complains about this.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agobitmap: convert "inline" to "static inline"
NeilBrown [Fri, 24 Jul 2015 06:10:44 +0000 (16:10 +1000)] 
bitmap: convert "inline" to "static inline"

Otherwise new gcc ignores them with some compile options.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoAssemble: extend --homehost='<ignore>' to allow --name= to ignore homehost
NeilBrown [Fri, 24 Jul 2015 02:50:54 +0000 (12:50 +1000)] 
Assemble: extend --homehost='<ignore>' to allow --name= to ignore homehost

Also make --homehost='<ignore>' work properly.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agotest: assume recovery has completed if sync_completed says so.
NeilBrown [Thu, 23 Jul 2015 01:17:10 +0000 (11:17 +1000)] 
test: assume recovery has completed if sync_completed says so.

The final completion of a recovery can be delayed, so use
sync_completed to check if it is finished, just not been reaped.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agotests: flushbufs after writing zeros
NeilBrown [Thu, 23 Jul 2015 01:09:19 +0000 (11:09 +1000)] 
tests: flushbufs after writing zeros

sometimes the removed device is re-added before the writes
get all the way to the md device - so the array doesn't need
any recovery and the test fails.
So flush first to be safe.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agotest: add -F flag to mkfs
NeilBrown [Tue, 21 Jul 2015 23:58:41 +0000 (09:58 +1000)] 
test: add -F flag to mkfs

newer versions of mkfs.extX ask before creating a filesystem
on a device which appears to already have a filesystem.
We don't want that, so add the -F flag.
Also be explicit about fs type as one shouldn't depend on defaults.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agomdadm: document --homehost=any functionality.
NeilBrown [Tue, 21 Jul 2015 23:33:17 +0000 (09:33 +1000)] 
mdadm: document --homehost=any functionality.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoAssemble: improve tests for matching --name= request.
NeilBrown [Tue, 21 Jul 2015 23:24:36 +0000 (09:24 +1000)] 
Assemble: improve tests for matching --name= request.

If the name in the array has a home-host, then
require that it matches, or is "any", or requested
homehost is "any".

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoraid6check: use O_DIRECT instead of O_SYNC.
NeilBrown [Mon, 20 Jul 2015 07:17:37 +0000 (17:17 +1000)] 
raid6check: use O_DIRECT instead of O_SYNC.

O_DIRECT is more direct and is faster.
This requires aligned memory allocation, but that isn't hard.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agorestripe: fix data block order in raid6_2_data_recov
NeilBrown [Mon, 20 Jul 2015 07:15:13 +0000 (17:15 +1000)] 
restripe: fix data block order in raid6_2_data_recov

... rather than relying on the caller getting them in the
correct order.
This is better engineering and fixes a bug, because the
failed_slotX numbers are used later with the assumption that
they weren't swapped.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoraid6check: various cleanup/fixes
NeilBrown [Mon, 20 Jul 2015 04:11:33 +0000 (14:11 +1000)] 
raid6check: various cleanup/fixes

- document meaning of various arrays. In particular:
   stripes[]
   blocks[]
   blocks_page[]
   block_index_for_slot[]

  It needs to be clear if these are indexed by raid_disk
  number or syndrome number.

- changed meaning of block_index_for_slot[].  It didn't seem
  to be used consistently.  It also made use of the block numbers
  in array data ordering, which is not directly relevant for syndrome
  calculations.

- reduced number of args to autorepair and manual_repair
  They don't need both stripes[] and blocks[].  And they don't need
  diskP or diskQ.
  blocks[-1] is the P chunk, blocks[-2] is the Q chunk.
  block_index_for_slot[] can be used to find the target device for
  a particular syndrome block.

- remove stripe locking from within manual_repair, and instead
  use the global stripe locking used for check and autorepair.

- this necessitated changes to raid6_datap_recov and raid6_2data_recov
  so the P and Q blocks could be before or after the data blocks.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoAssemble: really ensure stripe_cache is big enough to handle new chunk size
NeilBrown [Fri, 17 Jul 2015 03:10:25 +0000 (13:10 +1000)] 
Assemble: really ensure stripe_cache is big enough to handle new chunk size

Earlier patch:
  56fcbcbb6f17df0e5dedf59744deee037c5d5fbd
calculated the proper chunk size but didn't use it.

Let's actually use it this time.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoraid6check
NeilBrown [Thu, 16 Jul 2015 01:55:27 +0000 (11:55 +1000)] 
raid6check

fix checking of DDF layouts.

Stuff probably still broken.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoraid6check: get device ordering correct for syndrome calculation.
NeilBrown [Thu, 16 Jul 2015 01:25:40 +0000 (11:25 +1000)] 
raid6check: get device ordering correct for syndrome calculation.

The order of devices used for the syndrome calculation is not
the same as the order of data in the array.
The D block immediately after Q is first, then they continue
cyclically in raid-disk order, skipping over the P disk if it is seen.

This gets the 'check' right for all layouts other than DDF, which is
quite different.

I haven't confirmed that this doesn't break repair.
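
The ordering rule above can be sketched as a small function. This is an illustration of the rule as stated here for non-DDF layouts, not the code in raid6check; syndrome_order() and its parameters are hypothetical names.

```c
#include <assert.h>

/* Fill order[] with the raid-disk numbers of the data blocks in the
 * order used for the syndrome calculation: start with the disk right
 * after Q, continue cyclically, and skip the P disk.
 * Returns the number of data blocks written to order[]. */
static int syndrome_order(int raid_disks, int p_disk, int q_disk, int *order)
{
    int n = 0;

    assert(raid_disks >= 4 && p_disk != q_disk);

    for (int i = 1; i < raid_disks; i++) {
        int d = (q_disk + i) % raid_disks;
        if (d == p_disk)
            continue;       /* P takes no part in the data ordering */
        order[n++] = d;
    }
    return n;
}
```

With 5 devices, P on disk 2 and Q on disk 3, the data blocks are taken from disks 4, 0, 1 in that order.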

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agotests: slow down --stop a bit to allow revert-inplace to work.
NeilBrown [Wed, 15 Jul 2015 23:27:58 +0000 (09:27 +1000)] 
tests: slow down --stop a bit to allow revert-inplace to work.

revert-inplace would sometimes find that the original reshape had
finished.
So slow down the reshaping during --stop (which needs to be a little
bit fast so that stop doesn't timeout waiting) and don't wait quite
so long before stopping.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agotests: add 19raid6check
NeilBrown [Wed, 15 Jul 2015 22:02:52 +0000 (08:02 +1000)] 
tests: add 19raid6check

This checks that raid6check finds no errors in newly created array
with all different layouts.
(it doesn't...)

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agotest: clear out old metadata from loop devices.
NeilBrown [Wed, 15 Jul 2015 21:49:14 +0000 (07:49 +1000)] 
test: clear out old metadata from loop devices.

Old metadata can tempt udev to assemble things, which
just gets in the way.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoraid6check: report role of suspect device.
NeilBrown [Fri, 10 Jul 2015 04:46:59 +0000 (14:46 +1000)] 
raid6check: report role of suspect device.

i.e. -2 for Q, -1 for P, 0-N for data.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agotests: save failure logs to logdir
NeilBrown [Fri, 10 Jul 2015 04:44:58 +0000 (14:44 +1000)] 
tests: save failure logs to logdir

If --save-logs is given we already save all logs to --logdir
If not, we should still save erroneous logs to --logdir.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agotests: do not try to 'flushbufs' after stopping a array
NeilBrown [Fri, 10 Jul 2015 04:42:20 +0000 (14:42 +1000)] 
tests: do not try to 'flushbufs' after stopping a array

If the array is stopped, there is nothing to flush, and
blockdev can signal an error.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agotest: add dmesg output to logs on error.
NeilBrown [Mon, 6 Jul 2015 03:59:33 +0000 (13:59 +1000)] 
test: add dmesg output to logs on error.

This can help isolate the problem.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agotest: check sync_action as well when checking for an action.
NeilBrown [Mon, 6 Jul 2015 03:58:19 +0000 (13:58 +1000)] 
test: check sync_action as well when checking for an action.

Some actions only appear in /proc/mdstat after a little delay,
so check in sync_action as well.

This applies when checking for recovery etc, and when waiting for idle.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agotest: speed up reshape when stopping arrays.
NeilBrown [Mon, 6 Jul 2015 03:52:04 +0000 (13:52 +1000)] 
test: speed up reshape when stopping arrays.

--stop needs to wait for reshape to get to a suitable
spot, so having really slow resync isn't helpful.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agotest: stop all arrays before starting test.
NeilBrown [Mon, 6 Jul 2015 03:48:59 +0000 (13:48 +1000)] 
test: stop all arrays before starting test.

As well as cleaning up loop devices, stop all arrays.
After all, we cannot do the one without the other.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoGrow: remove stray tracing message.
NeilBrown [Mon, 6 Jul 2015 03:46:38 +0000 (13:46 +1000)] 
Grow: remove stray tracing message.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoManage/stop: don't stop during initial critical section.
NeilBrown [Mon, 6 Jul 2015 03:45:39 +0000 (13:45 +1000)] 
Manage/stop: don't stop during initial critical section.

If the array is reshaping to more devices, then stopping
during that initial critical section is a bad idea.
So check for it and wait a bit.

Should probably handle final critical section of a reduction
too.
same-size reshape should be handled correctly already.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoManage/stop: improve some comments.
NeilBrown [Mon, 6 Jul 2015 03:37:19 +0000 (13:37 +1000)] 
Manage/stop: improve some comments.

This code always confuses me - this might help a bit.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoManage/stop: guard against 'completed' being too large.
NeilBrown [Mon, 6 Jul 2015 03:33:20 +0000 (13:33 +1000)] 
Manage/stop: guard against 'completed' being too large.

A race can allow 'completed' to read as 2^63-1, which takes
a long time to count up to.
So guard against that possibility.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoMonitor: don't Wait forever on a 'frozen' array.
NeilBrown [Mon, 6 Jul 2015 03:26:41 +0000 (13:26 +1000)] 
Monitor: don't Wait forever on a 'frozen' array.

If Wait() finds the array resync is 'frozen', then wait
a little while to avoid races, but don't wait forever.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agosysfs: reject reads that use the whole buffer.
NeilBrown [Mon, 6 Jul 2015 03:21:33 +0000 (13:21 +1000)] 
sysfs: reject reads that use the whole buffer.

If a read fills the whole buffer, then we possibly
missed something off the end, and we definitely shouldn't
put a '\0' beyond the end, so just return an error.
This should never happen anyway.
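
A minimal sketch of the check described above, assuming a plain read() into a caller-supplied buffer. read_attr() is a hypothetical helper, not the function in mdadm's sysfs code.

```c
#include <assert.h>
#include <unistd.h>

/* Read an attribute into buf and NUL-terminate it.
 * Returns the number of bytes read, or -1 on error or when the read
 * filled the whole buffer (data may have been truncated, and there
 * would be no room left for the terminating '\0'). */
static int read_attr(int fd, char *buf, int len)
{
    ssize_t n;

    assert(buf != NULL && len > 0);

    n = read(fd, buf, (size_t)len);
    if (n < 0 || n >= len)
        return -1;

    buf[n] = '\0';
    return (int)n;
}
```

Treating a completely full buffer as an error keeps the terminator inside the buffer without needing a second read.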

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoRemove some trailing white space
NeilBrown [Wed, 1 Jul 2015 22:26:30 +0000 (08:26 +1000)] 
Remove some trailing white space

It looks ugly in my editor.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoManage: fix no-op test in Manage_stop.
NeilBrown [Wed, 1 Jul 2015 22:16:59 +0000 (08:16 +1000)] 
Manage: fix no-op test in Manage_stop.

A 'devnm' never starts with '/', so this test is pointless.
The code should use the passed-in devname unless it is clearly
not usable.  So fix it to do that.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agomdstat: discard 'dev' field, just use 'devnm'
NeilBrown [Wed, 1 Jul 2015 22:15:10 +0000 (08:15 +1000)] 
mdstat: discard 'dev' field, just use 'devnm'

These both have the same value, and have done since the
'devnm' concept was introduced.
So discard the pointless duplicate.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoGrow: fix typo in comment
NeilBrown [Thu, 18 Jun 2015 05:51:45 +0000 (15:51 +1000)] 
Grow: fix typo in comment

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoAssemble: ensure stripe_cache is big enough to handle new chunk size
NeilBrown [Thu, 18 Jun 2015 05:49:52 +0000 (15:49 +1000)] 
Assemble: ensure stripe_cache is big enough to handle new chunk size

If you reshape to a larger chunk size, and need to restart,
it can have problems.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoReuse calc_bitmap_size to reduce code size cluster
Guoqing Jiang [Wed, 10 Jun 2015 05:42:13 +0000 (13:42 +0800)] 
Reuse calc_bitmap_size to reduce code size

We can use the newly added calc_bitmap_size function to remove some
redundant lines.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
8 years agomdadm: change the num of cluster node
Guoqing Jiang [Wed, 10 Jun 2015 05:42:12 +0000 (13:42 +0800)] 
mdadm: change the num of cluster node

This extends the nodes option to assemble mode, so that the number
of cluster nodes can be changed by the user.

Before that, it is necessary to ensure there is enough space
for those nodes, so calc_bitmap_size is introduced to calculate
the bitmap size of each node.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
8 years agomdadm: add the ability to change cluster name
Guoqing Jiang [Wed, 10 Jun 2015 05:42:11 +0000 (13:42 +0800)] 
mdadm: add the ability to change cluster name

To support changing the cluster name, this commit does the following:

1. extend the original write_bitmap function for the new scenario.
2. add the scenario to handle modification of the cluster's name
   in write_bitmap1.
3. let the cluster name also show in examine_super1 and detail_super1

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoSkip clustered devices in incremental
Guoqing Jiang [Wed, 10 Jun 2015 05:42:10 +0000 (13:42 +0800)] 
Skip clustered devices in incremental

We want the clustered devices to be started exclusively by a cluster
resource-agent. So, avoid starting using the incremental option.

This also skips a clustered md from starting during boot in inactive mode.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoConvert a bitmap=none device to clustered
Guoqing Jiang [Wed, 10 Jun 2015 05:42:09 +0000 (13:42 +0800)] 
Convert a bitmap=none device to clustered

This adds the ability to convert a regular md without bitmap
(--bitmap=none) to a clustered device (--bitmap=clustered).

To convert a device with --bitmap=internal or --bitmap=external,
you have to convert to --bitmap=none and then re-execute the
command with --bitmap=clustered.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoAdd a new clustered disk
Guoqing Jiang [Wed, 10 Jun 2015 05:42:08 +0000 (13:42 +0800)] 
Add a new clustered disk

A clustered disk is added by the traditional --add sequence.
However, other nodes need to acknowledge that they can "see"
the device. This is done by --cluster-confirm:

--cluster-confirm SLOTNUM:/dev/whatever (if disk is found)
or
--cluster-confirm SLOTNUM:missing (if disk is not found)

The node initiating the --add has the disk state tagged with
MD_DISK_CLUSTER_ADD, and the nodes confirming tag the disk with
MD_DISK_CANDIDATE.
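
The SLOTNUM:device argument format shown above could be split as in the following sketch. parse_confirm_arg() is a hypothetical helper written for illustration; it follows the convention of returning 0 on success, so a caller aborts when the result is non-zero.

```c
#include <assert.h>
#include <stdlib.h>

/* Split "SLOTNUM:/dev/whatever" or "SLOTNUM:missing".
 * On success store the slot number and a pointer to the text after
 * the colon, and return 0. Return -1 if the input is malformed. */
static int parse_confirm_arg(const char *input, const char **devname, int *slot)
{
    char *end;
    long n;

    assert(input != NULL && devname != NULL && slot != NULL);

    n = strtol(input, &end, 10);
    if (end == input || *end != ':' || end[1] == '\0')
        return -1;          /* no digits, no colon, or empty name */
    if (n < 0 || n > 0x7fff)
        return -1;          /* arbitrary sanity bound for this sketch */

    *slot = (int)n;
    *devname = end + 1;     /* either a device path or "missing" */
    return 0;
}
```

The caller can then compare the returned name with "missing" to decide whether the disk was found on this node.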

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoShow all bitmaps while examining bitmap
Guoqing Jiang [Wed, 10 Jun 2015 05:42:07 +0000 (13:42 +0800)] 
Show all bitmaps while examining bitmap

This adds the capability of examining the bitmaps corresponding to
all nodes/slots on the device.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoSet home-cluster while creating an array
Guoqing Jiang [Wed, 10 Jun 2015 05:42:06 +0000 (13:42 +0800)] 
Set home-cluster while creating an array

The home-cluster is stored in the bitmap super block of the
array. The device can be assembled on a cluster with the
cluster name same as the one recorded in the bitmap.

If home-cluster is not specified, it is auto-detected by using
dlopen on the corosync cmap library.

neilb: allow code to compile when corosync-devel is not installed.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoAdd nodes option while creating md
Guoqing Jiang [Wed, 10 Jun 2015 05:42:05 +0000 (13:42 +0800)] 
Add nodes option while creating md

Specifies the maximum number of nodes in the cluster that may use
this device simultaneously. This is equivalent to the number of
bitmaps created in the internal superblock (patches to follow).

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoCreate n bitmaps for clustered mode
Guoqing Jiang [Wed, 10 Jun 2015 05:42:04 +0000 (13:42 +0800)] 
Create n bitmaps for clustered mode

For a clustered MD, create bitmaps equal to number of nodes so
each node has an independent bitmap.

Only the first bitmap has the bits set, so that the first node
that assembles the device also performs the sync.

The bitmaps are aligned to 4k boundaries.

On-disk format:

0                    4k                     8k                    12k
-------------------------------------------------------------------
| idle                | md super            | bm super [0] + bits |
| bm bits[0, contd]   | bm super[1] + bits  | bm bits[1, contd]   |
| bm super[2] + bits  | bm bits [2, contd]  | bm super[3] + bits  |
| bm bits [3, contd]  |                     |                     |
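
As a rough illustration of the 4k alignment in the layout above, the sketch below computes where each node's bitmap superblock would start. It assumes every node's area (superblock plus bits) has the same size and is rounded up to a 4k boundary; the function names and parameters are hypothetical and the real on-disk sizes are not taken from the source.

```c
#include <assert.h>
#include <stdint.h>

#define ALIGN_4K 4096ULL

/* Round a byte count up to the next 4k boundary. */
static uint64_t round_up_4k(uint64_t bytes)
{
    return (bytes + ALIGN_4K - 1) & ~(ALIGN_4K - 1);
}

/* Byte offset of the bitmap superblock for node `slot`.
 * first_offset   : where bitmap 0 starts (8k in the diagram)
 * per_node_bytes : superblock plus bits for one node, before rounding */
static uint64_t node_bitmap_offset(uint64_t first_offset,
                                   uint64_t per_node_bytes,
                                   unsigned slot)
{
    assert(first_offset % ALIGN_4K == 0);
    return first_offset + (uint64_t)slot * round_up_4k(per_node_bytes);
}
```

If one node's superblock and bits need between 4k and 8k, this gives starts at 8k, 16k, 24k and 32k for nodes 0 to 3, which matches the diagram.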

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoGrow: fix a couple of typos.
NeilBrown [Thu, 28 May 2015 07:21:06 +0000 (17:21 +1000)] 
Grow: fix a couple of typos.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agotest: make 'check wait' more reliable.
NeilBrown [Thu, 28 May 2015 06:53:26 +0000 (16:53 +1000)] 
test: make 'check wait' more reliable.

'recover' etc doesn't appear in /proc/mdstat immediately.
The "sync" thread must be started first.
But 'sync_action' shows it as soon as MD_RECOVERY_NEEDED is set
in the kernel.  So look there too.

Now maybe I can get rid of some of those silly 'sleep' calls.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agotests/imsm-grow-template change 'wait' to 'check wait'
NeilBrown [Thu, 28 May 2015 06:51:23 +0000 (16:51 +1000)] 
tests/imsm-grow-template change 'wait' to 'check wait'

'wait' is a shell builtin that isn't doing anything useful.
It should be calling 'check wait' I think.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoGrow: fix problem with --grow --continue
NeilBrown [Thu, 28 May 2015 06:43:15 +0000 (16:43 +1000)] 
Grow: fix problem with --grow --continue

If an array is being reshaped using backup space on a 'spare' device,
then
  mdadm --grow --continue
won't find it, as by the time it runs nothing looks like a spare any
more.  The spare has been added to the array, but has no data yet.

So allow reshape_prepare_fdlist to find a newly-incorporated spare and
report this so it can be used.

Reported-by: Xiao Ni <xni@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
8 years agotests: wait a bit longer for reshape to complete.
NeilBrown [Mon, 25 May 2015 06:59:19 +0000 (16:59 +1000)] 
tests: wait a bit longer for reshape to complete.

As the kernel now does less locking, 'check wait' doesn't
always wait long enough.  Add some pauses.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoGrow: another attempt to fix stop-during-reshape race.
NeilBrown [Mon, 25 May 2015 06:33:45 +0000 (16:33 +1000)] 
Grow: another attempt to fix stop-during-reshape race.

When the array is stopped during a critical section, we sometimes
erase the backup, which is bad.
This happens when 'completed' is zero.
This can happen easily when 'stop' freezes reshape.

So try to be more careful and check 'reshape_position'.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoFix minor typo in mdadm manpage.
Andrew Burgess [Wed, 25 Mar 2015 17:17:49 +0000 (17:17 +0000)] 
Fix minor typo in mdadm manpage.

Apologies if this is the wrong mailing list for this patch.

This is a very small patch for the manual page for the mdadm utility.

Thanks,
Andrew

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agomdadm: monitor: fix nullptr dereference when get_md_name() returns NULL
Sergey Vidishev [Tue, 19 May 2015 19:02:46 +0000 (22:02 +0300)] 
mdadm: monitor: fix nullptr dereference when get_md_name() returns NULL

Function add_new_arrays() expects get_md_name() to return a pointer
to devname, but get_md_name() may also return NULL. So check the
pointer before using it in add_new_arrays().
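
A minimal sketch of the pattern, not the actual Monitor code: lookup_name() is a hypothetical stand-in for a lookup such as get_md_name() that can fail, and record_array() is a hypothetical caller that checks the result before touching it.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for a name lookup that may return NULL. */
const char *lookup_name(int minor)
{
	static char name[32];

	if (minor < 0)
		return NULL;	/* no such device */
	snprintf(name, sizeof(name), "/dev/md%d", minor);
	return name;
}

/*
 * Hypothetical caller: copies the name into 'out' and returns 0,
 * or returns -1 without dereferencing anything if the lookup failed.
 */
int record_array(int minor, char *out, size_t outlen)
{
	const char *name = lookup_name(minor);

	if (!name)
		return -1;
	snprintf(out, outlen, "%s", name);
	return 0;
}
```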

Signed-off-by: Sergey Vidishev <sergeyv@yandex-team.ru>
Signed-off-by: NeilBrown <neilb@suse.de>
8 years agotest: forcefully clean up old loop devices.
NeilBrown [Wed, 20 May 2015 03:16:00 +0000 (13:16 +1000)] 
test: forcefully clean up old loop devices.

Sometimes these can get left around, and udev can be looking
at them at awkward times so they don't disappear.
So be forceful.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoGrow: be even more careful about handling a '0' completed value.
NeilBrown [Fri, 15 May 2015 05:11:48 +0000 (15:11 +1000)] 
Grow: be even more careful about handling a '0' completed value.

Some old kernels set 'completed' to '0' too soon.
But modern kernels don't.
And when 'mdadm --stop' freezes and resumes the grow,
'completed' goes back to zero briefly, which can confuse this
logic.
So only think '0' might be wrong from an old kernel when
the reshape has gone idle.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agotests/07reshape5intr : retry if writing 'check' fails.
NeilBrown [Fri, 15 May 2015 05:09:08 +0000 (15:09 +1000)] 
tests/07reshape5intr : retry if writing 'check' fails.

It can sometimes.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agotests/19raid6repair: don't flushbufs on non-existent array.
NeilBrown [Fri, 15 May 2015 02:34:27 +0000 (12:34 +1000)] 
tests/19raid6repair: don't flushbufs on non-existent array.

..that triggers an error.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agotests: wait for complete rebuild in integrity checks
NeilBrown [Thu, 14 May 2015 23:40:33 +0000 (09:40 +1000)] 
tests: wait for complete rebuild in integrity checks

'check wait' seems a bit racy now.
Wait for the array to be fully optimal before proceeding.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoGrow: retry when writing 'reshape' to 'sync_action' is EBUSY.
NeilBrown [Thu, 14 May 2015 04:50:42 +0000 (14:50 +1000)] 
Grow: retry when writing 'reshape' to 'sync_action' is EBUSY.

EBUSY can be returned if something has recently happened
to cause md to want to check if recovery is needed, but hasn't
had a chance yet.

This can easily happen in testing.

So retry a few times in that case.
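
The retry can be sketched roughly as below. This is an illustration under stated assumptions rather than the code in Grow.c: attempt_fn, retry_while_busy() and busy_then_ok() are hypothetical names, the callback stands in for the write of 'reshape' to 'sync_action', and the retry count and delay are left to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Hypothetical callback: returns 0 on success, -1 with errno set. */
typedef int (*attempt_fn)(void *ctx);

/*
 * Call 'attempt' up to 'max_tries' times, pausing 'delay_ms' between
 * tries, for as long as it keeps failing with EBUSY.  Any other
 * result (success or a different error) is returned immediately.
 */
int retry_while_busy(attempt_fn attempt, void *ctx, int max_tries,
		     long delay_ms)
{
	struct timespec pause = { delay_ms / 1000,
				  (delay_ms % 1000) * 1000000L };
	int rv = -1;
	int i;

	for (i = 0; i < max_tries; i++) {
		rv = attempt(ctx);
		if (rv == 0 || errno != EBUSY)
			break;
		if (i + 1 < max_tries)
			nanosleep(&pause, NULL);
	}
	return rv;
}

/*
 * Demonstration stand-in for the sysfs write: reports EBUSY until
 * the counter that 'ctx' points to reaches zero.
 */
int busy_then_ok(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		errno = EBUSY;
		return -1;
	}
	return 0;
}
```

Only EBUSY is retried; any other error is passed straight back to the caller, so a genuine failure is still reported.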

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agotests/05r6tor0: minor adjustments
NeilBrown [Thu, 14 May 2015 03:41:37 +0000 (13:41 +1000)] 
tests/05r6tor0: minor adjustments

1/ use correct data-offset for cmp - that has changed.
2/ flushbufs on the block device before reading to avoid cache issues

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agotests: 05r6tor0 - add some more waiting.
NeilBrown [Thu, 14 May 2015 02:27:25 +0000 (12:27 +1000)] 
tests: 05r6tor0 - add some more waiting.

I don't really know why this is needed, but there is a delay
between the reshape finishing and the level/etc changing.
So add some sleeps.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agotests/imsm-grow-template: sleep a bit more.
NeilBrown [Thu, 14 May 2015 02:14:26 +0000 (12:14 +1000)] 
tests/imsm-grow-template: sleep a bit more.

The current sleep/wait doesn't seem long enough,
particularly when two arrays are being reshaped in the one
container.

So wait a bit more...

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoGrow: be more careful if array is stopped during critical section.
NeilBrown [Thu, 14 May 2015 23:42:39 +0000 (09:42 +1000)] 
Grow: be more careful if array is stopped during critical section.

In that case, updating 'completed' to 'max_progress' is wrong.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoGrow: add missing space in message.
NeilBrown [Thu, 14 May 2015 23:41:12 +0000 (09:41 +1000)] 
Grow: add missing space in message.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoGrow: only warn about incompatible metadata when no fallback available.
NeilBrown [Thu, 14 May 2015 01:17:39 +0000 (11:17 +1000)] 
Grow: only warn about incompatible metadata when no fallback available.

We might be trying to set_new_data_offset() for RAID10, when it is
a necessary requirement, or for RAID5 where it is optional.
In the latter case, a message about metadata versions is not helpful.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoManage: when re-adding, don't check avail size if ->sb cannot be found.
NeilBrown [Wed, 13 May 2015 04:08:41 +0000 (14:08 +1000)] 
Manage: when re-adding, don't check avail size if ->sb cannot be found.

avail_size1 requires ->sb, so we must only call it if ->sb
was loaded.

If ->sb wasn't loaded, then we are only proceeding on the basis that
the kernel might be able to work something out - we don't need to
do any tests on size.

Reported-by: Christoffer Hammarström <christoffer.hammarstrom@linuxgods.com>
Signed-off-by: NeilBrown <neilb@suse.de>
URL: https://bugs.debian.org/784874

8 years agotests: don't "dd" indefinitely.
NeilBrown [Wed, 13 May 2015 03:24:33 +0000 (13:24 +1000)] 
tests: don't "dd" indefinitely.

This will trigger an error.  And now that errors are fatal....

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agotests: ignore failure status from mdadm -IRs
NeilBrown [Wed, 13 May 2015 03:11:02 +0000 (13:11 +1000)] 
tests: ignore failure status from mdadm -IRs

This can report non-zero if there was nothing to do,
and that isn't really an error.
If the array doesn't get started, something else
will complain.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoAssemble: don't check for pre-existing array when updating uuid.
NeilBrown [Wed, 13 May 2015 02:41:48 +0000 (12:41 +1000)] 
Assemble: don't check for pre-existing array when updating uuid.

This is very much a corner case, but the self-tests tripped on it,
and it makes sense not to trust the uuid when it is being changed.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoDDF: _write_super_to_disk: fix anchor header type
Martin Wilck [Mon, 11 May 2015 14:09:44 +0000 (16:09 +0200)] 
DDF: _write_super_to_disk: fix anchor header type

Since commit 30bee0201, the anchor is updated from the active
DDF header. This requires fixing the header type before the
anchor is written.

The LSI Software RAID code will reject DDF meta data with wrong
anchor type and will erase all meta data when it encounters
such a broken anchor. Thus starting Linux md once on a system
with LSI RAID BIOS may cause the meta data to get destroyed.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agotests: never fail if --wait fails.
NeilBrown [Thu, 7 May 2015 07:00:57 +0000 (17:00 +1000)] 
tests: never fail if --wait fails.

"--wait" will return non-zero status if it didn't need to wait.
This is not a reason to fail a test.

So ignore the return status from those commands.

Signed-off-by: NeilBrown <neilb@suse.de>