git.ipfire.org Git - thirdparty/mdadm.git/log
8 years agoRelease mdadm-3.4 mdadm-3.4
NeilBrown [Thu, 28 Jan 2016 02:34:13 +0000 (13:34 +1100)] 
Release mdadm-3.4

My last release!

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoAssorted fixes for a "make everything" build
NeilBrown [Thu, 28 Jan 2016 02:28:58 +0000 (13:28 +1100)] 
Assorted fixes for a "make everything" build

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agosuper1: allow reshape that hasn't really started to be reverted.
NeilBrown [Thu, 28 Jan 2016 01:57:08 +0000 (12:57 +1100)] 
super1: allow reshape that hasn't really started to be reverted.

A simple revert doesn't work here because the reshape_position is
in the critical section.
The best approach is to let the reshape progress a bit and then
go backwards.
If that isn't possible, assembling with --update=revert-reshape and
--invalid-backup should work.

Reported-by-tested-by: George Rapp <george.rapp@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agosuper0: Fix reporting of devices between 2GB and 4GB
NeilBrown [Thu, 28 Jan 2016 00:57:54 +0000 (11:57 +1100)] 
super0: Fix reporting of devices between 2GB and 4GB

v0.90 metadata can handle devices between 2GB and 4GB, but we need
to treat the 'size' as unsigned.  In a couple of places we don't.
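
A minimal sketch of the signed/unsigned pitfall described above. The helper names are hypothetical and the unit of the field is deliberately left open; this is not the code in super0.c, only an illustration of why a 32-bit size has to be widened as unsigned:

```c
#include <stdint.h>

/* Hypothetical helpers, not taken from mdadm. */

/* Wrong: going through a signed 32-bit type turns any value at or
 * above 2^31 into a negative number before it is widened. */
static long long widen_signed(uint32_t raw)
{
    return (long long)(int32_t)raw;
}

/* Right: widen the field while it is still unsigned. */
static unsigned long long widen_unsigned(uint32_t raw)
{
    return (unsigned long long)raw;
}
```

Any size in the upper half of the 32-bit range is reported correctly only by the second helper.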

URL: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=809447
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agosystemd/mdadm-last-resort: add Conflicts to .service file.
NeilBrown [Thu, 28 Jan 2016 00:45:53 +0000 (11:45 +1100)] 
systemd/mdadm-last-resort: add Conflicts to .service file.

It seems that having the Conflicts in the .timer file is not sufficient.
Sometimes it works, but if the timer gets requested after the conflicting
block device appears (or was it "before" ...) the timer is not aborted.

Having the Conflicts in both files seems to work reliably.

URL: https://bugzilla.suse.com/show_bug.cgi?id=853944
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agosuper1: fix calculation of space_before
NeilBrown [Thu, 28 Jan 2016 00:44:27 +0000 (11:44 +1100)] 
super1: fix calculation of space_before

This code was meant to update 'earliest' but clearly never does.

This bug would only affect an array with a very large bitmap so it is unlikely
to be significant.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoutil: fix wrong return value of cluster_get_dlmlock
Guoqing Jiang [Wed, 20 Jan 2016 08:21:25 +0000 (16:21 +0800)] 
util: fix wrong return value of cluster_get_dlmlock

It is lksb.sb_status, not the return value of dlm_lock, that indicates
whether a node got the lock.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
8 years agoAdd casts for the addr arg of connect and bind
Khem Raj [Thu, 14 Jan 2016 06:32:39 +0000 (22:32 -0800)] 
Add casts for the addr arg of connect and bind

glibc allows the addr arg to connect and bind to be any of a number
of 'sockaddr_*' types, but musl requires 'const struct sockaddr *'
which is in line with open group specs.  So add casts to allow
compilation with musl.
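
The kind of cast described above can be sketched as follows. The address family (AF_INET with a kernel-chosen port) and the helper name bind_any() are assumptions chosen to keep the example self-contained; the sockets mdadm actually uses may differ:

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Hypothetical helper: bind a TCP socket to any local address on a
 * kernel-chosen port. Returns the socket fd, or -1 on failure. */
static int bind_any(void)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = 0;              /* let the kernel pick a port */

    /* Explicit cast to the pointer type that POSIX specifies for bind(). */
    if (bind(fd, (const struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The same cast applies to connect(); with it the call compiles against both glibc and musl headers.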

Signed-off-by: Khem Raj <raj.khem@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoDefine _POSIX_C_SOURCE if undefined
Khem Raj [Thu, 14 Jan 2016 06:32:38 +0000 (22:32 -0800)] 
Define _POSIX_C_SOURCE if undefined

config.c uses _POSIX_C_SOURCE which is defined in features.h when
glibc/uclibc is used, but isn't defined when musl is used.
So provide a reasonable default.
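
A sketch of such a guarded default. The value 200809L is an assumption for illustration, as is the helper posix_level(); the value chosen by the patch is not shown in this log:

```c
/* Assumed default: only takes effect when the C library headers have
 * not already defined the macro. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

/* Hypothetical helper that exposes the value the code was compiled with. */
static long posix_level(void)
{
    return _POSIX_C_SOURCE;
}
```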

Signed-off-by: Khem Raj <raj.khem@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoCreate: minor fix when adding a journal device
NeilBrown [Thu, 14 Jan 2016 03:09:57 +0000 (14:09 +1100)] 
Create: minor fix when adding a journal device

The check of "is there a filesystem here" is still appropriate for a
journal device.

Also set active_disks correctly - even though it is ignored.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoCreate: fix regression in setting raid_disk
NeilBrown [Thu, 14 Jan 2016 02:22:17 +0000 (13:22 +1100)] 
Create: fix regression in setting raid_disk

A recent commit caused 'missing' declarations to not be handled correctly.

Fixes: cc1799c3ddc9 ("Enable create array with write journal (--write-journal DEVICE).")
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agorestripe: fix compilation of "make test"
NeilBrown [Tue, 12 Jan 2016 23:01:02 +0000 (10:01 +1100)] 
restripe: fix compilation of "make test"

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoFix wrong description in manpage
Guoqing Jiang [Tue, 12 Jan 2016 15:08:24 +0000 (18:08 +0300)] 
Fix wrong description in manpage

The careless change was introduced by 'commit 7e6e839a2651
(mdadm: change the num of cluster node)'. It should be
reverted to avoid misunderstanding.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoimsm: don't update migration record when reshape is interrupted
Artur Paszkiewicz [Tue, 5 Jan 2016 16:16:16 +0000 (17:16 +0100)] 
imsm: don't update migration record when reshape is interrupted

Abort imsm_manage_reshape() without updating the migration record if any
error occurs when checking progress. If reshape is interrupted and the
migration record is then updated, the checkpoint will be wrong and will
cause reshape to fail when the array is restarted.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoimsm: use timeout when waiting for reshape progress
Artur Paszkiewicz [Tue, 5 Jan 2016 16:16:15 +0000 (17:16 +0100)] 
imsm: use timeout when waiting for reshape progress

Waiting for reshape progress is done by using select() on sync_completed
to block until an exception condition is signalled on the
filedescriptor. This happens when the attribute's value is updated by
the kernel, but if the array is stopped when mdadm is blocked on
select() this will never happen, because this attribute is then removed
and apparently the kernel doesn't do sysfs_notify() when removing a
sysfs attribute. So set a 3 second timeout for the sysfs_wait() call.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoIMSM: Add support for VMD
Pawel Baldysiak [Tue, 5 Jan 2016 16:03:04 +0000 (17:03 +0100)] 
IMSM: Add support for VMD

The Intel Volume Management Device (VMD) is an integrated
endpoint on the platform's PCIe root complex that acts
as a host bridge to a secondary PCIe domain.

This patch adds proper handling of NVMe devices attached to VMD domain.
Each VMD domain is treated as a separate controller (HBA).
Spanning between domains is forbidden.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoimsm: abort reshape if sync_action is not "reshape"
Artur Paszkiewicz [Wed, 23 Dec 2015 11:57:11 +0000 (12:57 +0100)] 
imsm: abort reshape if sync_action is not "reshape"

When reshape was interrupted, an incorrect checkpoint would be saved in
the migration record. Change wait_for_reshape_imsm() to return -1 when
sync_action is not "reshape" to abort early in imsm_manage_reshape()
without writing the migration record.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoGrow: close file descriptor earlier to avoid "still in use" when stopping
Artur Paszkiewicz [Wed, 23 Dec 2015 11:57:10 +0000 (12:57 +0100)] 
Grow: close file descriptor earlier to avoid "still in use" when stopping

Close fd2 as soon as it is no longer needed, before calling
Grow_continue(). Otherwise, we won't be able to stop an array with
external metadata during reshape, because mdadm running in background
will be keeping it open.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoDetail: fix wrong condition in recent change.
NeilBrown [Wed, 23 Dec 2015 01:15:32 +0000 (12:15 +1100)] 
Detail: fix wrong condition in recent change.

Now that we can print device details with a specific raid_disk but not
disk.number, the condition for "print either disk.number or disk.raid_disk"
must be made more specific.

Reported-by: Coly Li <colyli@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoCheck and remove bitmap first when reshape to raid0
Xiao Ni [Tue, 22 Dec 2015 03:09:34 +0000 (11:09 +0800)] 
Check and remove bitmap first when reshape to raid0

If a raid device with a bitmap is reshaped to raid0, the reshape will
start, but it then fails and loses some components. So the bitmap should
be removed first.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoin --add assign raid_disk of 0 to journal
Song Liu [Mon, 21 Dec 2015 19:23:42 +0000 (11:23 -0800)] 
in --add assign raid_disk of 0 to journal

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agomove journal to end of --detail list
Song Liu [Mon, 21 Dec 2015 19:23:41 +0000 (11:23 -0800)] 
move journal to end of --detail list

Because we give the journal device a raid_disk of 0, the output of --detail is:

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       5       8       24        0      journal   /dev/sdb8
       1       8       18        1      active sync   /dev/sdb2
       2       8       19        2      active sync   /dev/sdb3
       3       8       21        3      active sync   /dev/sdb5

       4       8       23        -      spare   /dev/sdb7

This patch changes it back to:
    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       18        1      active sync   /dev/sdb2
       2       8       19        2      active sync   /dev/sdb3
       3       8       21        3      active sync   /dev/sdb5

       4       8       23        -      spare   /dev/sdb7
       5       8       24        -      journal   /dev/sdb8

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoAdd --update=force-no-bbl.
NeilBrown [Mon, 21 Dec 2015 03:56:38 +0000 (14:56 +1100)] 
Add --update=force-no-bbl.

This forcibly removes the bad-block log.  There can be situations where it is hard to
remove bad blocks by writing to them - particularly on RAID5.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoMerge branch 'fix-unlikely-potential-overflows' of https://github.com/sjvs/mdadm
NeilBrown [Mon, 21 Dec 2015 02:01:10 +0000 (13:01 +1100)] 
Merge branch 'fix-unlikely-potential-overflows' of https://github.com/sjvs/mdadm

8 years agoMerge https://github.com/makelinux/mdadm
NeilBrown [Mon, 21 Dec 2015 01:57:06 +0000 (12:57 +1100)] 
Merge https://github.com/makelinux/mdadm

Fixes https://github.com/neilbrown/mdadm/issues/17

8 years agoDetail: don't assume a particular 'disk' number of missing devices.
NeilBrown [Fri, 18 Dec 2015 02:51:54 +0000 (13:51 +1100)] 
Detail: don't assume a particular 'disk' number of missing devices.

When a particular raid-disk is missing, we don't know which disk number
it should have, and reporting a number could result in duplicate
numbers (with v1.x metadata - never with the old 0.90).

So set the default to -1 and recognise that when printing.
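
Recognising the -1 default at print time might look like the sketch below. The helper name and the 4-character column width are assumptions; the real format strings live in Detail.c and are not reproduced here:

```c
#include <stdio.h>

/* Hypothetical helper: format a disk number for a table column.
 * A negative number means "unknown" and is shown as "-". */
static int format_disk_number(char *buf, size_t len, int number)
{
    if (number < 0)
        return snprintf(buf, len, "%4s", "-");
    return snprintf(buf, len, "%4d", number);
}
```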

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoDetail: report correct raid-disk for removed drives.
NeilBrown [Fri, 18 Dec 2015 02:49:30 +0000 (13:49 +1100)] 
Detail: report correct raid-disk for removed drives.

Back in
  Commit: 8057db46a15d ("Detail: fix handling of 'disks' array.")
when we doubled the size of the 'disks' array to handle primary and
replacement, we should have halved the setting of the default raid_disk
number.

Reported-by: Coly Li <colyli@suse.de>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agomdadm: improve the safeguard for change cluster raid's sb
Guoqing Jiang [Wed, 16 Dec 2015 17:54:26 +0000 (01:54 +0800)] 
mdadm: improve the safeguard for change cluster raid's sb

This commit does the following jobs:

1. rename is_clustered to dlm_funs_ready since it matches the
   function better.
2. st->cluster_name can't be used to identify whether the raid is
   clustered or not; we should check the bitmap's version to
   perform the identification.
3. cluster_get_dlmlock/cluster_release_dlmlock both just need the
   lockid as a parameter, since the cluster name can be obtained
   with get_cluster_name().

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agomdadm: do not try to hold dlm lock in free_super1
Guoqing Jiang [Wed, 16 Dec 2015 17:54:25 +0000 (01:54 +0800)] 
mdadm: do not try to hold dlm lock in free_super1

free_super1 doesn't actually change the sb; it just frees the
memory holding the sb. free_super1 is also called in lots of
places within mdadm, so remove the dlm lock code: the function
doesn't need the protection, and dropping it reduces latency.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agomdadm: do not display bitmap info if it is cleared
Guoqing Jiang [Tue, 1 Dec 2015 16:30:12 +0000 (00:30 +0800)] 
mdadm: do not display bitmap info if it is cleared

"mdadm -X DISK" is used to report information about a bitmap
file. It is better not to display any of the related information if
the bitmap has been cleared with "--bitmap=none" in grow mode.

To do that, locate_bitmap is changed slightly to return a value
based on MD_FEATURE_BITMAP_OFFSET.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agomdadm: don't show cluster name once the bitmap is cleared
Guoqing Jiang [Tue, 1 Dec 2015 16:30:11 +0000 (00:30 +0800)] 
mdadm: don't show cluster name once the bitmap is cleared

Don't show cluster name if bitmap is cleared.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agomdadm: output info more precisely when change bitmap to none
Guoqing Jiang [Tue, 1 Dec 2015 16:30:10 +0000 (00:30 +0800)] 
mdadm: output info more precisely when change bitmap to none

When changing the bitmap to none, the messages can be more accurate
if they are based on the existing bitmap type.

Also, s->bitmap_file is passed in from the "--bitmap=TYPE" option, so
remove s->bitmap_file from the error message: the message should mean
that changing the bitmap to that type failed, not that the type is
already present.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agomdadm: let cluster raid could also add disk within incremental mode
Guoqing Jiang [Tue, 1 Dec 2015 16:30:09 +0000 (00:30 +0800)] 
mdadm: let cluster raid could also add disk within incremental mode

For cluster raid, the disc.state needs to be changed accordingly in
incremental mode.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agorecreate journal in mdadm
Song Liu [Tue, 15 Dec 2015 01:43:43 +0000 (17:43 -0800)] 
recreate journal in mdadm

This patch tries to recreate a missing/faulty journal in mdadm.

Example:

./mdadm --fail /dev/md1 /dev/sdb2
mdadm: set /dev/sdb2 faulty in /dev/md1

./mdadm --stop /dev/md1
mdadm: stopped /dev/md1

./mdadm -A --scan --force
mdadm: Journal is missing or stale, starting array read only.
mdadm: /dev/md/1 has been started with 15 drives.

./mdadm --add-journal /dev/md1 /dev/sdb2
mdadm: added /dev/sdb2

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoadd sysfs_array_state to struct mdinfo
Song Liu [Tue, 15 Dec 2015 01:43:42 +0000 (17:43 -0800)] 
add sysfs_array_state to struct mdinfo

Add sysfs_array_state to struct mdinfo, and add GET_ARRAY_STATE to
options of sysfs_read.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agomdadm: Change timestamps to unsigned data type.
Deepa Dinamani [Tue, 8 Dec 2015 23:10:21 +0000 (15:10 -0800)] 
mdadm: Change timestamps to unsigned data type.

32 bit signed timestamps will overflow in the year 2038.

Change the user interface mdu_array_info_s structure timestamps:
ctime and utime values used in ioctls GET_ARRAY_INFO and
SET_ARRAY_INFO to unsigned int. This will extend the field to last
until the year 2106.

Add time_after/time_before and supporting typecheck from
the kernel to take care of unsigned time wraparound.

The long term plan is to get rid of ctime and utime values in
this structure as this information can be read from the on-disk
meta data directly.

v0.90 on disk meta data uses u32 for maintaining time stamps.
So this will also last until year 2106.
Assumption is that the usage of v0.90 will be deprecated by
year 2106.

Timestamp fields in the on disk meta data for v1.0 version already
use 64 bit data types.
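
The wraparound-safe comparison mentioned above can be sketched for 32-bit unsigned timestamps as follows. These are simplified function versions written for illustration; the kernel's time_after/time_before are macros with an additional typecheck, which is omitted here:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if timestamp 'a' is later than 'b', even when the counter has
 * wrapped, provided the two values are less than 2^31 seconds apart. */
static bool time_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/* True if timestamp 'a' is earlier than 'b'. */
static bool time_before(uint32_t a, uint32_t b)
{
    return time_after(b, a);
}
```

An unsigned 32-bit count of seconds since 1970 spans about 136 years, which is where the year 2106 comes from.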

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoDetail.c --test fix
Constantine Shulyupin [Thu, 10 Dec 2015 14:19:46 +0000 (16:19 +0200)] 
Detail.c --test fix

8 years agofix bug in assemble
Song Liu [Tue, 8 Dec 2015 01:08:39 +0000 (17:08 -0800)] 
fix bug in assemble

In Assemble, getinfo_super() over-writes journal_clean.  To
ensure correct journal_clean, keep it in a local variable
before getinfo_super().

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agomake sure 'path' buffer is large enough to fit 200 characters plus null terminator
Bas van Schaik [Thu, 3 Dec 2015 13:37:08 +0000 (13:37 +0000)] 
make sure 'path' buffer is large enough to fit 200 characters plus null terminator

8 years agoavoid confusion with parameter 'devname' with same name, ensure buffer is large enoug...
Bas van Schaik [Thu, 3 Dec 2015 13:28:32 +0000 (13:28 +0000)] 
avoid confusion with parameter 'devname' with same name, ensure buffer is large enough for two ints plus extras

8 years agoensure buffer is large enough for two ints and some extras
Bas van Schaik [Thu, 3 Dec 2015 13:23:18 +0000 (13:23 +0000)] 
ensure buffer is large enough for two ints and some extras

8 years agoadd crc32c and use it for r5l checksum
Song Liu [Wed, 28 Oct 2015 19:06:06 +0000 (12:06 -0700)] 
add crc32c and use it for r5l checksum

In kernel space, r5l checksum will use crc32c:
http://marc.info/?l=linux-raid&m=144598970529191
mdadm needs to change too.

This patch ports a simplified crc32c algorithm from kernel code,
and uses it in super1.c:write_empty_r5l_meta_block();
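
For reference, a bit-at-a-time CRC32C (Castagnoli polynomial, reflected form 0x82F63B78) can be sketched as below. This is a generic textbook version written for illustration, not the code ported by the patch, and the function name crc32c_update() is an assumption:

```c
#include <stddef.h>
#include <stdint.h>

/* Update a running CRC32C with 'len' bytes from 'buf'. The caller
 * supplies the seed and applies any final inversion. */
static uint32_t crc32c_update(uint32_t crc, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++) {
            /* Shift right; XOR in the polynomial if the low bit was set. */
            if (crc & 1)
                crc = (crc >> 1) ^ 0x82F63B78u;
            else
                crc >>= 1;
        }
    }
    return crc;
}
```

With an all-ones seed and a final inversion, the standard check string "123456789" yields 0xE3069283. Which seed the r5l metadata checksum uses is not stated in this message.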

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agomdadm: add test script for raid456 journal
Song Liu [Wed, 21 Oct 2015 18:35:16 +0000 (11:35 -0700)] 
mdadm: add test script for raid456 journal

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agomdadm: Add description of write journal to md.4
Song Liu [Wed, 21 Oct 2015 18:35:15 +0000 (11:35 -0700)] 
mdadm: Add description of write journal to md.4

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agomdadm: refactor write journal code in Assemble and Incremental
Song Liu [Wed, 21 Oct 2015 18:35:14 +0000 (11:35 -0700)] 
mdadm: refactor write journal code in Assemble and Incremental

As discussed, standalone require_journal() in struct superswitch
is not a very good idea. Instead, journal related information
fits well in struct mdinfo.

This patch simplifies journal support code in Assemble and
Incremental as:

- Add journal_device_required and journal_clean to struct mdinfo;
- Remove function require_journal from struct superswitch;
- Update Assemble and Incremental to use journal_device_required
and journal_clean from struct mdinfo (instead of separate var).

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoMake cmap_* also has same policy as dlm_*
Guoqing Jiang [Mon, 19 Oct 2015 08:03:20 +0000 (16:03 +0800)] 
Make cmap_* also has same policy as dlm_*

Let libcmap and its related functions also need only a one-time
setup for the lifetime of an mdadm run.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoSafeguard against writing to an active device of another node
Guoqing Jiang [Mon, 19 Oct 2015 08:03:19 +0000 (16:03 +0800)] 
Safeguard against writing to an active device of another node

Modifying an existing device's superblock or creating a new superblock
on an existing device needs to be checked because the device could be
in use by another node in another array. So, we check this by taking
all superblock locks in userspace so that we don't step onto an active
device used by another node and safeguard against accidental edits.
After the edit is complete, we release all locks and the lockspace so
that it can be used by the kernel space.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoAdd help message and man entry for --write-journal
Song Liu [Fri, 9 Oct 2015 05:51:46 +0000 (22:51 -0700)] 
Add help message and man entry for --write-journal

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoCheck write journal in incremental
Song Liu [Fri, 9 Oct 2015 05:51:45 +0000 (22:51 -0700)] 
Check write journal in incremental

If the journal device is missing, mdadm does not start the array and shows:

./mdadm -I /dev/sdf
mdadm: journal device is missing, not safe to start yet.

The array will be started when the journal device is attached with -I:

./mdadm -I /dev/sdb1
mdadm: /dev/sdb1 attached to /dev/md/0_0, which has been started.

To force start without journal device:

./mdadm -I /dev/sdf --run
mdadm: Trying to run with missing journal device
mdadm: /dev/sdf attached to /dev/md/0_0, which has been started.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoAssemble array with write journal
Song Liu [Fri, 9 Oct 2015 05:51:44 +0000 (22:51 -0700)] 
Assemble array with write journal

Example output:

./mdadm --assemble /dev/md0 /dev/sd[c-f] /dev/sdb1
mdadm: /dev/md0 has been started with 4 drives and 1 journal.

mdadm checks the superblock for journal devices. If the journal device
is missing or faulty, mdadm shows a warning:

./mdadm --assemble /dev/md0 /dev/sd[c-q] /dev/sdb1
mdadm: Not safe to assemble with missing or stale journal device, consider --force.

The user can insist on starting the array (read-only) with --force:

./mdadm --assemble /dev/md0 /dev/sd[c-q] /dev/sdb1 --force
mdadm: Journal is missing or stale, starting array read only.
mdadm: /dev/md0 has been started with 15 drives.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoEnable create array with write journal (--write-journal DEVICE).
Song Liu [Fri, 9 Oct 2015 05:51:43 +0000 (22:51 -0700)] 
Enable create array with write journal (--write-journal DEVICE).

Specify the write journal device with --write-journal DEVICE

./mdadm --create -f /dev/md0 --assume-clean -c 32 --raid-devices=4 --level=5 /dev/sd[c-f] --write-journal /dev/sdb1
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.

Only one journal device is allowed. If multiple --write-journal options
are given, mdadm will use the first and ignore the others:

./mdadm --create -f /dev/md0 --assume-clean -c 32 --raid-devices=4 --level=5 /dev/sd[c-f] --write-journal /dev/sdb1 --write-journal /dev/sdx
mdadm: Please specify only one journal device for the array.
mdadm: Ignoring --write-journal /dev/sdx...
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoShow device as journal in --detail --examine
Song Liu [Fri, 9 Oct 2015 05:51:42 +0000 (22:51 -0700)] 
Show device as journal in --detail --examine

Example output:

./mdadm --detail /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Wed May 13 17:01:12 2015
     Raid Level : raid5
     Array Size : 11720662464 (11177.69 GiB 12001.96 GB)
  Used Dev Size : 3906887488 (3725.90 GiB 4000.65 GB)
   Raid Devices : 4
  Total Devices : 5
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Wed May 13 17:01:12 2015
          State : clean
 Active Devices : 4
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 32K

           Name : 0
           UUID : 8fb9ee05:3831d52f:e5c23825:28cd6881
         Events : 0

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       1       8       48        1      active sync   /dev/sdd
       2       8       64        2      active sync   /dev/sde
       3       8       80        3      active sync   /dev/sdf

       4       8       17        -      journal   /dev/sdb1

./mdadm -E /dev/sdb2
/dev/sdb2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x201
     Array UUID : 562b2334:35b9bcc1:add50892:1f30c4bd
           Name : 0
  Creation Time : Thu Aug 27 12:55:26 2015
     Raid Level : raid5
   Raid Devices : 15

 Avail Dev Size : 249796608 (119.11 GiB 127.90 GB)
     Array Size : 54696423936 (52162.57 GiB 56009.14 GB)
  Used Dev Size : 7813774848 (3725.90 GiB 4000.65 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : active
    Device UUID : 5015e522:d39ba566:5909cf3c:9c51f2ff

Internal Bitmap : 8 sectors from superblock
    Update Time : Thu Aug 27 13:16:55 2015
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 4e6fd76d - correct
         Events : 262

         Layout : left-symmetric
     Chunk Size : 256K

   Device Role : Journal
   Array State : AAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoadd macros for MD_DISK_ROLE_(SPARE/FAULTY)
Song Liu [Fri, 9 Oct 2015 05:51:41 +0000 (22:51 -0700)] 
add macros for MD_DISK_ROLE_(SPARE/FAULTY)

Replace special disk roles (0xffff, 0xfffe) with macros:

define MD_DISK_ROLE_SPARE      0xffff
define MD_DISK_ROLE_FAULTY     0xfffe

Will add macro for journal device in next patch:
define MD_DISK_ROLE_JOURNAL    0xfffd
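
A short sketch of how such role macros read at a call site. The three values are the ones listed above; the helper role_name() and the strings it returns are hypothetical:

```c
#include <stdint.h>

#define MD_DISK_ROLE_SPARE   0xffff
#define MD_DISK_ROLE_FAULTY  0xfffe
#define MD_DISK_ROLE_JOURNAL 0xfffd

/* Hypothetical helper: map a 16-bit role to a label. Any value that is
 * not one of the special roles is treated as an active slot number. */
static const char *role_name(uint16_t role)
{
    switch (role) {
    case MD_DISK_ROLE_SPARE:
        return "spare";
    case MD_DISK_ROLE_FAULTY:
        return "faulty";
    case MD_DISK_ROLE_JOURNAL:
        return "journal";
    default:
        return "active";
    }
}
```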

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoimsm: don't call abort_reshape() in imsm_manage_reshape()
Artur Paszkiewicz [Mon, 5 Oct 2015 13:18:11 +0000 (15:18 +0200)] 
imsm: don't call abort_reshape() in imsm_manage_reshape()

Calling abort_reshape() in imsm_manage_reshape() is unnecessary in case
of an error because it is handled by reshape_array(). Calling it when
reshape completes successfully is also unnecessary and leads to a race
condition:
- reshape ends
- mdadm calls abort_reshape() -> sets sync_action to idle
- MD_RECOVERY_INTR is set and md_reap_sync_thread() does not finish the
  reshape

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Konrad Dabrowski <konrad.dabrowski@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agore-add: make re-add try to write sysfs node first
Guoqing Jiang [Wed, 7 Oct 2015 02:06:54 +0000 (10:06 +0800)] 
re-add: make re-add try to write sysfs node first

If the sysfs node exists, we should try to write "re-add" to it.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoMerge branch 'fix' of git://github.com/ldzhong/mdadm
NeilBrown [Wed, 30 Sep 2015 22:30:58 +0000 (08:30 +1000)] 
Merge branch 'fix' of git://github.com/ldzhong/mdadm

8 years agomdadm: make cluster raid also could support re-add
Guoqing Jiang [Thu, 20 Aug 2015 05:56:31 +0000 (13:56 +0800)] 
mdadm: make cluster raid also could support re-add

If it is a cluster raid, the disc.state needs to be
changed accordingly when doing a re-add.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoFix --incremental handling on cluster array.
Goldwyn Rodrigues [Wed, 26 Aug 2015 16:35:21 +0000 (11:35 -0500)] 
Fix --incremental handling on cluster array.

Commit 06bd679317a2 ("Skip clustered devices in incremental")
disabled incremental completely on clustered arrays.
What we really want is that mdadm should not start or create
a clustered array but still be able to add or readd to an existing
device. This would enable udev scripts to automatically add
or re-add a device after transient errors.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agosuper1: Do not create bad block log for clustered devices.
NeilBrown [Mon, 28 Sep 2015 01:49:53 +0000 (11:49 +1000)] 
super1: Do not create bad block log for clustered devices.

We currently have no synchronization techniques for the bad
block log, so disable it for the cluster.

Reported-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoIncrement version for clustered bitmaps
Goldwyn Rodrigues [Tue, 18 Aug 2015 21:38:27 +0000 (07:38 +1000)] 
Increment version for clustered bitmaps

Add BITMAP_MAJOR_CLUSTERED as 5, in order to prevent older kernels
from assembling a clustered device.

In order to maximize compatibility, the major version is set to
BITMAP_MAJOR_CLUSTERED *only* if the bitmap is clustered.

Also add MD_FEATURE_CLUSTERED so that older kernels, which would
otherwise assemble the MD device when the bitmap is corrupted,
return an error.
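
The version selection described above can be sketched as follows. Only the value 5 comes from this message; the helper names, and the idea of passing the non-clustered major version and the reader's highest known version as parameters, are assumptions for illustration:

```c
#define BITMAP_MAJOR_CLUSTERED 5

/* Hypothetical helper: use the clustered major version only when the
 * bitmap really is clustered, so ordinary bitmaps stay readable by
 * older kernels. */
static int bitmap_major(int clustered, int normal_major)
{
    return clustered ? BITMAP_MAJOR_CLUSTERED : normal_major;
}

/* Hypothetical model of a reader that refuses bitmap versions newer
 * than the highest one it knows about. */
static int can_assemble(int major, int max_known)
{
    return major <= max_known;
}
```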

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agomdadm: remove duplicate logic when c.delay is 0
Lidong Zhong [Wed, 26 Aug 2015 06:01:52 +0000 (14:01 +0800)] 
mdadm: remove duplicate logic when c.delay is 0

8 years agoMakefile: test -s flag and suppress echo when set.
NeilBrown [Wed, 5 Aug 2015 05:10:43 +0000 (15:10 +1000)] 
Makefile: test -s flag and suppress echo when set.

Some rules do their own tracing and so aren't affected
by -s.
So add a test for -s in MAKE_FLAGS and avoid echo when present.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agotests: raid6 repair is now tested on every different layout.
NeilBrown [Mon, 20 Jul 2015 04:17:28 +0000 (14:17 +1000)] 
tests: raid6 repair is now tested on every different layout.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoAssemble: correctly capture error from ->write_bitmap
NeilBrown [Wed, 5 Aug 2015 04:55:31 +0000 (14:55 +1000)] 
Assemble: correctly capture error from ->write_bitmap

else 'err' might be undefined.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agomain: remove use of uninitialized 'rv'.
NeilBrown [Wed, 5 Aug 2015 04:53:33 +0000 (14:53 +1000)] 
main: remove use of uninitialized 'rv'.

If c.homecluster was not NULL, we might get an
error anyway.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoraid6check: don't ignore return value from posix_memalign.
NeilBrown [Wed, 5 Aug 2015 04:50:34 +0000 (14:50 +1000)] 
raid6check: don't ignore return value from posix_memalign.

Compilers don't like that.
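
Checking the return value might look like the sketch below. The wrapper alloc_aligned() is hypothetical and is not the code in raid6check.c:

```c
#include <stdlib.h>

/* Hypothetical wrapper: return an aligned allocation, or NULL on failure. */
static void *alloc_aligned(size_t alignment, size_t size)
{
    void *p = NULL;

    /* posix_memalign() returns 0 on success and an error number on
     * failure; it reports errors through the return value, not errno. */
    if (posix_memalign(&p, alignment, size) != 0)
        return NULL;
    return p;
}
```

The caller releases the memory with free() as usual.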

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoMerge branch 'mdadm-3.3.x'
NeilBrown [Mon, 3 Aug 2015 06:21:37 +0000 (16:21 +1000)] 
Merge branch 'mdadm-3.3.x'

8 years agoRelease mdadm-3.3.4 mdadm-3.3.x mdadm-3.3.4
NeilBrown [Mon, 3 Aug 2015 06:17:13 +0000 (16:17 +1000)] 
Release mdadm-3.3.4

Important bugfix release.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoAssemble: really don't assemble IMSM array without OROM.
NeilBrown [Mon, 3 Aug 2015 06:06:51 +0000 (16:06 +1000)] 
Assemble: really don't assemble IMSM array without OROM.

Previous patch missed one case.

Also print more useful information when rejecting
a device with IMSM metadata.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agomdassemble: include mapfile support.
NeilBrown [Mon, 3 Aug 2015 01:54:16 +0000 (11:54 +1000)] 
mdassemble: include mapfile support.

This does make mdassemble a bit bigger, but it also means
it actually works properly with named arrays.

Ref: https://bbs.archlinux.org/viewtopic.php?id=198196
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoAssemble: don't assemble IMSM array without OROM.
NeilBrown [Wed, 29 Jul 2015 04:38:37 +0000 (14:38 +1000)] 
Assemble: don't assemble IMSM array without OROM.

If someone has an IMSM array, and disables RAID in the BIOS
and uses the devices for some other purpose, then they really don't
want mdadm to start syncing the array.

So don't assemble if OROM doesn't confirm it is OK.

There can still be problems for crash-dump not being able to find
the OROM.   Some explicit work-around might be needed for that
rather than a more general workaround that can corrupt data.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agomdassemble: include mapfile support.
NeilBrown [Mon, 3 Aug 2015 01:54:16 +0000 (11:54 +1000)] 
mdassemble: include mapfile support.

This does make mdassemble a bit bigger, but it also means
it actually works properly with named arrays.

Ref: https://bbs.archlinux.org/viewtopic.php?id=198196
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agomdassemble: don't try to perform cluster check.
NeilBrown [Mon, 3 Aug 2015 01:53:01 +0000 (11:53 +1000)] 
mdassemble: don't try to perform cluster check.

mdassemble is meant to be small and simple, so avoid
trying to check for a cluster.
Currently it doesn't, but it still includes the code,
which doesn't build because the library isn't provided.

So just exclude the get_cluster_name code from mdassemble.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agomd-cluster: use %-64s to print cluster_name
Guoqing Jiang [Mon, 6 Jul 2015 08:52:11 +0000 (16:52 +0800)] 
md-cluster: use %-64s to print cluster_name

Left alignment is better for cluster names shorter than 64 characters.
It also aligns the cluster name output with the other fields.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agomdadm: fix wrong condition for go to abort
Guoqing Jiang [Mon, 6 Jul 2015 08:52:10 +0000 (16:52 +0800)] 
mdadm: fix wrong condition for go to abort

When parse_cluster_confirm_arg returns 0, it means the
argument was parsed successfully, so change !rv to rv.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoAssemble: don't assemble IMSM array without OROM.
NeilBrown [Wed, 29 Jul 2015 04:38:37 +0000 (14:38 +1000)] 
Assemble: don't assemble IMSM array without OROM.

If someone has an IMSM array, and disables RAID in the BIOS
and uses the devices for some other purpose, then they really don't
want mdadm to start syncing the array.

So don't assemble if OROM doesn't confirm it is OK.

There can still be problems for crash-dump not being able to find
the OROM.   Some explicit work-around might be needed for that
rather than a more general workaround that can corrupt data.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoMerge branch 'cluster'
NeilBrown [Mon, 27 Jul 2015 01:01:08 +0000 (11:01 +1000)] 
Merge branch 'cluster'

Now that 3.3.3 is out, it is time to include the cluster-support code.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoRelease mdadm-3.3.3 mdadm-3.3.3
NeilBrown [Fri, 24 Jul 2015 05:35:53 +0000 (15:35 +1000)] 
Release mdadm-3.3.3

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agomdassemble: add "Name" definition.
NeilBrown [Fri, 24 Jul 2015 06:18:13 +0000 (16:18 +1000)] 
mdassemble: add "Name" definition.

That allows it to compile again :-(

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoDon't ignore return value from read and write
NeilBrown [Fri, 24 Jul 2015 06:11:23 +0000 (16:11 +1000)] 
Don't ignore return value from read and write

New gcc sometimes complains about this.

Signed-off-by: NeilBrown <neilb@suse.com>
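
One common way to honour the return value of write() is a loop that retries short writes, sketched below. The function name write_all is an assumption made for this example; the commit does not show how mdadm itself handles the result.

```c
#include <errno.h>
#include <unistd.h>

/* Write all 'len' bytes from 'data' to 'fd', retrying after short
 * writes and EINTR.  Returns 0 on success, -1 on error (errno is
 * left as set by write()). */
int write_all(int fd, const void *data, size_t len)
{
	const char *p = data;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}
```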
8 years agobitmap: convert "inline" to "static inline"
NeilBrown [Fri, 24 Jul 2015 06:10:44 +0000 (16:10 +1000)] 
bitmap: convert "inline" to "static inline"

Otherwise new gcc ignores them with some compile options.

Signed-off-by: NeilBrown <neilb@suse.com>
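
The commit does not say which compile options trigger the problem. A likely explanation, offered here as an assumption, is that under C99 and later semantics a plain "inline" definition does not provide an external definition, so a call that is not inlined can fail to link. The hypothetical helper below shows the "static inline" form, which gives each translation unit its own copy.

```c
/* Hypothetical bitmap helper, not taken from mdadm's bitmap code.
 * 'static inline' keeps the definition local to this translation
 * unit, so it is usable whether or not the compiler inlines it. */
static inline int bit_is_set(const unsigned char *map, unsigned long bit)
{
	return (map[bit / 8] >> (bit % 8)) & 1;
}
```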
8 years agoAssemble: extend --homehost='<ignore>' to allow --name= to ignore homehost
NeilBrown [Fri, 24 Jul 2015 02:50:54 +0000 (12:50 +1000)] 
Assemble: extend --homehost='<ignore>' to allow --name= to ignore homehost

Also make --homehost='<ignore>' work properly.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agotest: assume recovery has completed if sync_completed says so.
NeilBrown [Thu, 23 Jul 2015 01:17:10 +0000 (11:17 +1000)] 
test: assume recovery has completed if sync_completed says so.

The final completion of a recovery can be delayed, so use
sync_completed to check whether it has finished but not yet been reaped.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agotests: flushbufs after writing zeros
NeilBrown [Thu, 23 Jul 2015 01:09:19 +0000 (11:09 +1000)] 
tests: flushbufs after writing zeros

sometimes the removed device is re-added before the writes
get all the way to the md device - so the array doesn't need
any recovery and the test fails.
So flush first to be safe.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agotest: add -F flag to mkfs
NeilBrown [Tue, 21 Jul 2015 23:58:41 +0000 (09:58 +1000)] 
test: add -F flag to mkfs

newer versions of mkfs.extX ask before creating a filesystem
on a device which appears to already have a filesystem.
We don't want that, so add the -F flag.
Also be explicit about fs type as one shouldn't depend on defaults.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agomdadm: document --homehost=any functionality.
NeilBrown [Tue, 21 Jul 2015 23:33:17 +0000 (09:33 +1000)] 
mdadm: document --homehost=any functionality.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoAssemble: improve tests for matching --name= request.
NeilBrown [Tue, 21 Jul 2015 23:24:36 +0000 (09:24 +1000)] 
Assemble: improve tests for matching --name= request.

If the name in the array has a home-host, then
require that it matches, or is "any", or requested
homehost is "any".

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoraid6check: use O_DIRECT instead of O_SYNC.
NeilBrown [Mon, 20 Jul 2015 07:17:37 +0000 (17:17 +1000)] 
raid6check: use O_DIRECT instead of O_SYNC.

O_DIRECT is more direct and is faster.
This requires aligned memory allocation, but that isn't hard.

Signed-off-by: NeilBrown <neilb@suse.de>
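
A minimal sketch of the O_DIRECT pattern mentioned above, assuming Linux and a 4096-byte alignment. The function name direct_read and the IO_ALIGN constant are hypothetical; raid6check's real code may choose the alignment differently.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* O_DIRECT is a Linux extension */
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define IO_ALIGN 4096		/* assumed alignment for this sketch */

/* Read 'len' bytes at 'offset' from 'path' using O_DIRECT.
 * 'len' and 'offset' are expected to be multiples of the device's
 * logical block size.  Returns a buffer the caller must free(),
 * or NULL on any error. */
void *direct_read(const char *path, off_t offset, size_t len)
{
	void *buf = NULL;
	ssize_t n;
	int fd = open(path, O_RDONLY | O_DIRECT);

	if (fd < 0)
		return NULL;

	/* O_DIRECT transfers need an aligned user buffer. */
	if (posix_memalign(&buf, IO_ALIGN, len) != 0) {
		close(fd);
		return NULL;
	}

	n = pread(fd, buf, len, offset);
	close(fd);
	if (n != (ssize_t)len) {
		free(buf);
		return NULL;
	}
	return buf;
}
```

Not every filesystem accepts O_DIRECT, so the open() call itself can fail and has to be checked.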
8 years agorestripe: fix data block order in raid6_2_data_recov
NeilBrown [Mon, 20 Jul 2015 07:15:13 +0000 (17:15 +1000)] 
restripe: fix data block order in raid6_2_data_recov

... rather than relying on the caller getting them in the
correct order.
This is better engineering and fixes a bug: the
failed_slotX numbers are used later on the assumption that
they weren't swapped.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoraid6check: various cleanup/fixes
NeilBrown [Mon, 20 Jul 2015 04:11:33 +0000 (14:11 +1000)] 
raid6check: various cleanup/fixes

- document meaning of various arrays. In particular:
   stripes[]
   blocks[]
   blocks_page[]
   block_index_for_slot[]

  It needs to be clear if these are indexed by raid_disk
  number or syndrome number.

- changed meaning of block_index_for_slot[].  It didn't seem
  to be used consistently.  It also made use of the block numbers
  in array data ordering, which is not directly relevant for syndrome
  calculations.

- reduced number of args to autorepair and manual_repair
  They don't need both stripes[] and blocks[].  And they don't need
  diskP or diskQ.
  blocks[-1] is the P chunk, blocks[-2] is the Q chunk.
  block_index_for_slot[] can be used to find the target device for
  a particular syndrome block.

- remove stripe locking from within manual_repair, and instead
  use the global stripe locking used for check and autorepair.

- this necessitated changes to raid6_datap_recov and raid6_2data_recov
  so the P and Q blocks could be before or after the data blocks.

Signed-off-by: NeilBrown <neilb@suse.de>
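
The blocks[-1] / blocks[-2] convention described in the commit above can be obtained by pointing blocks two entries into a larger array. The sketch below is one way to do that; the struct, the MAX_DATA bound and the function name are hypothetical, and raid6check's own layout is not shown in the log.

```c
#include <stddef.h>

#define MAX_DATA 16	/* hypothetical upper bound on data blocks */

struct stripe_blocks {
	char *storage[MAX_DATA + 2];	/* [0] = Q, [1] = P, [2..] = data */
	char **blocks;			/* points at storage[2] */
};

/* Set up 's' so that s->blocks[-2] is Q, s->blocks[-1] is P and
 * s->blocks[0..] are the data blocks in syndrome order. */
void stripe_blocks_init(struct stripe_blocks *s, char *q, char *p)
{
	s->storage[0] = q;
	s->storage[1] = p;
	s->blocks = &s->storage[2];
}
```

The negative indices are valid C here because blocks - 2 still points inside storage[].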
8 years agoAssemble: really ensure stripe_cache is bit enough to handle new chunk size
NeilBrown [Fri, 17 Jul 2015 03:10:25 +0000 (13:10 +1000)] 
Assemble: really ensure stripe_cache is bit enough to handle new chunk size

Earlier patch:
  56fcbcbb6f17df0e5dedf59744deee037c5d5fbd
calculated the proper chunk size - but didn't use it.

Let's actually use it this time.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agoraid6check
NeilBrown [Thu, 16 Jul 2015 01:55:27 +0000 (11:55 +1000)] 
raid6check

fix checking of DDF layouts.

Stuff probably still broken.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoraid6check: get device ordering correct for syndrome calculation.
NeilBrown [Thu, 16 Jul 2015 01:25:40 +0000 (11:25 +1000)] 
raid6check: get device ordering correct for syndrome calculation.

The order of devices used for the syndrome calculation is not
the same as the order of data in the array.
The D block immediately after Q is first, then they continue
cyclically in raid-disk order, skipping over the P disk if it is seen.

This gets the 'check' right for all layouts other than DDF, which is
quite different.

I haven't confirmed that this doesn't break repair.

Signed-off-by: NeilBrown <neilb@suse.de>
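
The ordering rule in the commit above can be sketched as a small function. This is an illustration only: the name syndrome_order and its parameters are assumptions, and, as the commit notes, the rule does not apply to DDF layouts.

```c
/* Fill 'order' with the raid-disk numbers of the data blocks in
 * syndrome order: start with the disk immediately after Q, continue
 * cyclically in raid-disk order, and skip P when it is reached.
 * Assumes 0 <= p_disk, q_disk < raid_disks and p_disk != q_disk.
 * Returns the number of entries written (raid_disks - 2). */
int syndrome_order(int raid_disks, int p_disk, int q_disk, int *order)
{
	int n = 0;

	/* i runs over every disk except Q itself */
	for (int i = 1; i < raid_disks; i++) {
		int d = (q_disk + i) % raid_disks;

		if (d == p_disk)
			continue;
		order[n++] = d;
	}
	return n;
}
```

For example, with 5 disks, P on disk 3 and Q on disk 1, the data disks come out as 2, 4, 0.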
8 years agotests: slow down --stop a bit to allow revert-inplace to work.
NeilBrown [Wed, 15 Jul 2015 23:27:58 +0000 (09:27 +1000)] 
tests: slow down --stop a bit to allow revert-inplace to work.

revert-inplace would sometimes find that the original reshape had
finished.
So slow down the reshaping during --stop (which needs to be a little
bit fast so that stop doesn't timeout waiting) and don't wait quite
so long before stopping.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agotests: add 19raid6check
NeilBrown [Wed, 15 Jul 2015 22:02:52 +0000 (08:02 +1000)] 
tests: add 19raid6check

This checks that raid6check finds no errors in newly created array
with all different layouts.
(it doesn't...)

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agotest: clear out old metadata from loop devices.
NeilBrown [Wed, 15 Jul 2015 21:49:14 +0000 (07:49 +1000)] 
test: clear out old metadata from loop devices.

Old metadata can tempt udev to assemble things, which
just gets in the way.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agoraid6check: report role of suspect device.
NeilBrown [Fri, 10 Jul 2015 04:46:59 +0000 (14:46 +1000)] 
raid6check: report role of suspect device.

i.e. -2 for Q, -1 for P, 0-N for data.

Signed-off-by: NeilBrown <neilb@suse.de>
8 years agotests: save failure logs to logdir
NeilBrown [Fri, 10 Jul 2015 04:44:58 +0000 (14:44 +1000)] 
tests: save failure logs to logdir

If --save-logs is given, we already save all logs to --logdir.
If not, we should still save erroneous logs to --logdir.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agotests: do not try to 'flushbufs' after stopping a array
NeilBrown [Fri, 10 Jul 2015 04:42:20 +0000 (14:42 +1000)] 
tests: do not try to 'flushbufs' after stopping a array

If the array is stopped, there is nothing to flush, and
blockdev can signal an error.

Signed-off-by: NeilBrown <neilb@suse.com>
8 years agotest: add dmesg output to logs on error.
NeilBrown [Mon, 6 Jul 2015 03:59:33 +0000 (13:59 +1000)] 
test: add dmesg output to logs on error.

This can help isolate the problem.

Signed-off-by: NeilBrown <neilb@suse.de>