git.ipfire.org Git - thirdparty/mdadm.git/log
7 years ago Grow: support consistency policy change
Artur Paszkiewicz [Wed, 29 Mar 2017 09:54:20 +0000 (11:54 +0200)] 
Grow: support consistency policy change

Extend the --consistency-policy parameter to work also in Grow mode.
Using it changes the currently active consistency policy in the kernel
driver and updates the metadata to make this change permanent. Currently
this supports only changing between "ppl" and "resync" policies, that is
enabling or disabling PPL at runtime.
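
The restriction described above can be sketched roughly as follows. This is only an illustration under assumptions: the enum and the function names (`policy_from_name`, `grow_policy_change_allowed`) are hypothetical and are not taken from mdadm.

```c
#include <string.h>

/* Hypothetical policy identifiers; the names are illustrative only. */
enum policy {
	POLICY_UNKNOWN,
	POLICY_RESYNC,
	POLICY_PPL,
	POLICY_BITMAP,
	POLICY_JOURNAL
};

/* Map a policy name, as passed to --consistency-policy, to an identifier. */
enum policy policy_from_name(const char *name)
{
	if (strcmp(name, "resync") == 0)
		return POLICY_RESYNC;
	if (strcmp(name, "ppl") == 0)
		return POLICY_PPL;
	if (strcmp(name, "bitmap") == 0)
		return POLICY_BITMAP;
	if (strcmp(name, "journal") == 0)
		return POLICY_JOURNAL;
	return POLICY_UNKNOWN;
}

/* Return 1 if a runtime change is allowed: only ppl <-> resync. */
int grow_policy_change_allowed(enum policy from, enum policy to)
{
	return (from == POLICY_PPL && to == POLICY_RESYNC) ||
	       (from == POLICY_RESYNC && to == POLICY_PPL);
}
```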

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago Add 'ppl' and 'no-ppl' options for --update=
Artur Paszkiewicz [Wed, 29 Mar 2017 09:54:19 +0000 (11:54 +0200)] 
Add 'ppl' and 'no-ppl' options for --update=

This can be used with --assemble for super1 and with --update-subarray
for imsm to enable or disable PPL in the metadata.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago super1: PPL support
Artur Paszkiewicz [Wed, 29 Mar 2017 09:54:18 +0000 (11:54 +0200)] 
super1: PPL support

Enable creating and assembling raid5 arrays with PPL for 1.x metadata.

When creating, reserve enough space for PPL and store its size and
location in the superblock and set MD_FEATURE_PPL bit. Write an initial
empty header in the PPL area on each device. PPL is stored in the
metadata region reserved for internal write-intent bitmap, so don't
allow using bitmap and PPL together.

While at it, fix two endianness issues in write_empty_r5l_meta_block()
and write_init_super1().
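
As background for the endianness fixes: numeric fields in the 1.x superblock are stored little-endian on disk, so values must be converted on big-endian hosts. A minimal sketch of byte-order independent storage follows. The helper names `put_le32` and `get_le32` are made up for this example; mdadm has its own conversion macros.

```c
#include <stdint.h>

/* Store v at p as four little-endian bytes, independent of host byte order. */
void put_le32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v & 0xff);
	p[1] = (unsigned char)((v >> 8) & 0xff);
	p[2] = (unsigned char)((v >> 16) & 0xff);
	p[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Read four little-endian bytes at p back into a host-order value. */
uint32_t get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```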

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago imsm: PPL support
Artur Paszkiewicz [Wed, 29 Mar 2017 09:54:17 +0000 (11:54 +0200)] 
imsm: PPL support

Enable creating and assembling IMSM raid5 arrays with PPL. Update the
IMSM metadata format to include new fields used for PPL.

Add structures for PPL metadata. They are used also by super1 and shared
with the kernel, so put them in md_p.h.

Write the initial empty PPL header when creating an array. When
assembling an array with PPL, validate the PPL header and, if it is
not correct, allow it to be overwritten if --force was provided.

Write the PPL location and size for a device to the new rdev sysfs
attributes 'ppl_sector' and 'ppl_size'. Enable PPL in the kernel by
writing to 'consistency_policy' before the array is activated.
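
Writing a value to a sysfs attribute, as described above, amounts to opening the attribute file and writing a short string. A minimal sketch, assuming a hypothetical helper `write_attr` that is not part of mdadm:

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write a string to an attribute file; return 0 on success, -1 on error. */
int write_attr(const char *path, const char *value)
{
	size_t len = strlen(value);
	int fd = open(path, O_WRONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = write(fd, value, len);
	close(fd);
	/* A short write is treated as a failure. */
	return (n == (ssize_t)len) ? 0 : -1;
}
```

For instance, `write_attr("/sys/block/md0/md/consistency_policy", "ppl")` would request PPL for a hypothetical md0, assuming that path layout.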

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago Detail: show consistency policy
Artur Paszkiewicz [Wed, 29 Mar 2017 09:54:16 +0000 (11:54 +0200)] 
Detail: show consistency policy

Show the currently enabled consistency policy in the output from
--detail. Add 3 spaces to all existing items in Detail output to align
with "Consistency Policy : ".

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago Generic support for --consistency-policy and PPL
Artur Paszkiewicz [Wed, 29 Mar 2017 09:54:15 +0000 (11:54 +0200)] 
Generic support for --consistency-policy and PPL

Add a new parameter to mdadm: --consistency-policy=. It determines how
the array maintains consistency in case of unexpected shutdown. This
maps to the md sysfs attribute 'consistency_policy'. It can be used to
create a raid5 array using PPL. Add the necessary plumbing to pass this
option to metadata handlers. The write journal and bitmap
functionalities are treated as different policies, which are implicitly
selected when using --write-journal or --bitmap options.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago Detail: handle non-existent arrays better.
NeilBrown [Mon, 27 Mar 2017 03:36:56 +0000 (14:36 +1100)] 
Detail: handle non-existent arrays better.

If you call "mdadm --detail" with a device file for an array which
doesn't exist, such as by
  mknod /dev/md57 b 9 57
  mdadm --detail /dev/md57

you get an unhelpful message about an inactive RAID0, and the return
status is '0'.  This is confusing.

So catch this possibility and print a more useful message, and
return a non-zero status.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago Add 'force' flag to *hot_remove_disk().
NeilBrown [Mon, 27 Mar 2017 03:36:56 +0000 (14:36 +1100)] 
Add 'force' flag to *hot_remove_disk().

In rare circumstances, the short period that *hot_remove_disk()
waits isn't long enough for IO to complete.  This particularly happens
when a device is failing and many retries are still happening.

We don't want to increase the normal wait time for "mdadm --remove"
as that might be used just to test if a device is active or not, and a
delay would be problematic.
So allow "--force" to mean that mdadm should try extra hard for a
--remove to complete, waiting up to 5 seconds.

Note that this patch fixes a comment which claimed the previous
wait time was half a second, when it was really 50msec.
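
The retry behaviour can be sketched as below. This is an illustration, not the patch itself: the callback stands in for the HOT_REMOVE_DISK ioctl, and the 10 ms interval and the retry counts (5 and 500) are assumptions chosen to match the 50 msec and 5 second figures above. `busy_n_times` is a demonstration stand-in only.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Call try_remove() until it succeeds, fails with something other than
 * EBUSY, or the retry budget is used up. try_remove() stands in for the
 * removal request and must return -1 with errno set on failure.
 */
int remove_with_retry(int (*try_remove)(void *), void *arg, int force)
{
	int tries = force ? 500 : 5;	/* assumed: 500 * 10 ms = 5 s, 5 * 10 ms = 50 ms */
	int ret;

	while ((ret = try_remove(arg)) == -1 && errno == EBUSY && tries-- > 0)
		usleep(10000);		/* assumed 10 ms pause between attempts */
	return ret;
}

/* Demonstration stand-in: fails with EBUSY until *remaining reaches zero. */
int busy_n_times(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		errno = EBUSY;
		return -1;
	}
	return 0;
}
```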

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago Introduce sys_hot_remove_disk()
NeilBrown [Mon, 27 Mar 2017 03:36:56 +0000 (14:36 +1100)] 
Introduce sys_hot_remove_disk()

The new hot_remove_disk() will retry HOT_REMOVE_DISK
several times in the face of EBUSY.
However we sometimes remove a device by writing "remove" to the
"state" attribute.  This should be retried as well.
So introduce sys_hot_remove_disk() to repeat this action a few times.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago mdadm/Build:check the level parameter when build new array
Zhilong Liu [Tue, 28 Mar 2017 13:52:27 +0000 (21:52 +0800)] 
mdadm/Build:check the level parameter when build new array

Check whether the user forgot to specify --level
when building a new array, for example:
./mdadm -B /dev/md0 -n2 /dev/loop[0-1]

Signed-off-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago Retry HOT_REMOVE_DISK a few times.
NeilBrown [Mon, 27 Mar 2017 01:50:16 +0000 (12:50 +1100)] 
Retry HOT_REMOVE_DISK a few times.

HOT_REMOVE_DISK can fail with EBUSY if there are outstanding
IO requests that have not completed yet.  It can sometimes
be helpful to wait a little while for these to complete.

We already do this in impose_level() when reshaping a device,
but not in Manage.c in response to an explicit --remove request.

So create hot_remove_disk() to centralize this code, and call it
wherever it makes sense to wait for a HOT_REMOVE_DISK to succeed.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago udev-md-raid-assembly.rules: Skip non-ready devices
Hannes Reinecke [Mon, 27 Mar 2017 00:15:44 +0000 (11:15 +1100)] 
udev-md-raid-assembly.rules: Skip non-ready devices

If a device isn't fully initialized (e.g. if it should be
handled by multipathing) it should not be considered for
md/RAID auto-assembly.  Doing so can cause incorrect results
such as causing multipath to fail during startup.

There is a convention that the udev environment variable
SYSTEMD_READY be set to zero for such devices.  So change
the mdadm rules to ignore devices with SYSTEMD_READY==0.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago mdadm/bitmap:fixed typos in comments of bitmap.h
Zhilong Liu [Mon, 20 Mar 2017 10:46:39 +0000 (18:46 +0800)] 
mdadm/bitmap:fixed typos in comments of bitmap.h

bitmap.h: fixed trivial typos in comments

Signed-off-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago super1: ignore failfast flag for setting device role
Gioh Kim [Mon, 20 Mar 2017 09:51:56 +0000 (10:51 +0100)] 
super1: ignore failfast flag for setting device role

There is a corner case when setting the device role:
the new device may have the failfast flag set.
The failfast flag should be ignored.

Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago mdadm: Forced type conversion to avoid truncation
Xiao Ni [Sat, 18 Mar 2017 02:33:45 +0000 (10:33 +0800)] 
mdadm: Forced type conversion to avoid truncation

Gcc reports that it needs 19 bytes to write to disk->serial, because the
type of the argument i is int. But i is the failed disk
number, so it doesn't need 19 bytes.  Just add a type
conversion to avoid this build error.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago Replace snprintf with strncpy at some places to avoid truncation
Xiao Ni [Sat, 18 Mar 2017 02:33:44 +0000 (10:33 +0800)] 
Replace snprintf with strncpy at some places to avoid truncation

With gcc 7 there are build errors such as:
directive output may be truncated writing up to 31 bytes into a region of size 24
snprintf(str, MPB_SIG_LEN, "%s", mpb->sig);

Only one string needs to be copied to the target, so use strncpy instead.

For the line snprintf(str, MPB_SIG_LEN, "%s", mpb->sig):
because mpb->sig holds the version after the magic,
it is better to replace snprintf with strncpy there too.
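
A minimal sketch of copying only the leading part of a fixed-width field follows, assuming a hypothetical helper `copy_prefix`. It is not the code from the patch, and the caller must supply a destination with room for n + 1 bytes.

```c
#include <string.h>

/*
 * Copy the first n bytes of a fixed-width, possibly unterminated field
 * into dst and terminate the result. dst must hold at least n + 1 bytes.
 */
void copy_prefix(char *dst, const char *field, size_t n)
{
	strncpy(dst, field, n);	/* copies at most n bytes, pads with NULs if shorter */
	dst[n] = '\0';		/* strncpy does not terminate when field has n or more bytes */
}
```

The explicit terminator matters because strncpy leaves the destination unterminated whenever the source is at least n bytes long.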

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago mdadm/Monitor: Fix NULL pointer dereference when stat2devnm return NULL
Zhilong Liu [Mon, 20 Mar 2017 05:21:41 +0000 (13:21 +0800)] 
mdadm/Monitor: Fix NULL pointer dereference when stat2devnm return NULL

Wait(): stat2devnm() returns NULL for non-block devices. Check that the
pointer is valid before dereferencing it. This can happen when using --wait
on a regular file ('f') or a directory ('d'), causing a core dump,
for example: ./mdadm --wait /dev/md/
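
The pattern of the fix can be illustrated as follows. Both functions are hypothetical stand-ins, not mdadm code: `devname_or_null` mimics a lookup that yields NULL for anything that is not a block device, and the return value 2 is an arbitrary error code for this example.

```c
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical stand-in for a name lookup: NULL unless path is a block device. */
const char *devname_or_null(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0 || !S_ISBLK(st.st_mode))
		return NULL;
	return path;
}

/* The caller checks the result before using it; 2 means "not a block device". */
int wait_for(const char *path)
{
	const char *name = devname_or_null(path);

	if (name == NULL)
		return 2;
	/* ... the real work would use name here ... */
	return 0;
}
```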

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago mdadm/mdmon:deleted the abort_reshape never invoked
Zhilong Liu [Mon, 20 Mar 2017 05:21:24 +0000 (13:21 +0800)] 
mdadm/mdmon:deleted the abort_reshape never invoked

mdmon.c: abort_reshape() is already implemented in Grow.c,
so this function doesn't make a lot of sense here.

Signed-off-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago mdadm:it doesn't make sense to set --bitmap twice
Zhilong Liu [Mon, 20 Mar 2017 05:21:03 +0000 (13:21 +0800)] 
mdadm:it doesn't make sense to set --bitmap twice

mdadm.c: it doesn't make sense to set --bitmap twice.

Signed-off-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago mdadm:fixed some trivial typos in comments of mdadm.h
Zhilong Liu [Mon, 20 Mar 2017 05:20:06 +0000 (13:20 +0800)] 
mdadm:fixed some trivial typos in comments of mdadm.h

mdadm.h: fixed some trivial typos in comments

Signed-off-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago mdadm: Specify enough length when write to buffer
Xiao Ni [Fri, 17 Mar 2017 11:55:43 +0000 (19:55 +0800)] 
mdadm: Specify enough length when write to buffer

In Detail.c the buffer path in function Detail is defined as path[200],
but the maximum length of the content written to the buffer is
287, because the length of d_name in struct dirent is 255.
The build reports this error:
error: '%s' directive writing up to 255 bytes into a region of size 189
[-Werror=format-overflow=]

In function examine_super0 there is a buffer nb with length 5,
but it needs to hold an int argument. The maximum length
of an int is 10 digits, so the buffer length should be 11.

In the human_size function the length of buf is 30. The build
reports this error:
output between 20 and 47 bytes into a destination of size 30.
Change the length to 47.
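
Independently of choosing a large enough buffer, truncation can also be detected at run time from the return value of snprintf. A small sketch, assuming a hypothetical helper `join_path` that is not part of mdadm:

```c
#include <stdio.h>

/*
 * Build "<dir>/<name>" in buf. Returns 0 on success, -1 if the result
 * did not fit; snprintf returns the length it would have written.
 */
int join_path(char *buf, size_t buflen, const char *dir, const char *name)
{
	int n = snprintf(buf, buflen, "%s/%s", dir, name);

	return (n >= 0 && (size_t)n < buflen) ? 0 : -1;
}
```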

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago mdadm: Add Wimplicit-fallthrough=0 in Makefile
Xiao Ni [Fri, 17 Mar 2017 11:55:42 +0000 (19:55 +0800)] 
mdadm: Add Wimplicit-fallthrough=0 in Makefile

There are many errors like 'error: this statement may fall through'.
But the logic is right. So add the flag -Wimplicit-fallthrough=0
to disable the error messages. The method I use is from
https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wimplicit-fallthrough-375

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago mdadm:add checking clustered bitmap in assemble mode
Zhilong Liu [Tue, 7 Mar 2017 03:13:03 +0000 (11:13 +0800)] 
mdadm:add checking clustered bitmap in assemble mode

mdadm: Arrays with either a clustered or an internal bitmap don't need
--bitmap to be specified when assembling the array.

Signed-off-by: Zhilong Liu <zlliu@suse.com>
Acked-by: Coly Li <colyli@suse.de>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago mdadm:add man page for --symlinks
Zhilong Liu [Mon, 6 Mar 2017 02:39:57 +0000 (10:39 +0800)] 
mdadm:add man page for --symlinks

In build and create mode:
--symlinks
Auto creation of symlinks in /dev to /dev/md. The --symlinks option
must be 'no' or 'yes' and works with --create and --build.
In assemble mode:
--symlinks
See this option under Create and Build options.

Signed-off-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago examine: tidy up some code.
NeilBrown [Thu, 2 Mar 2017 23:57:00 +0000 (10:57 +1100)] 
examine: tidy up some code.

Michael Shigorin reports that the 'lcc' compiler isn't able
to deduce that 'st' must be initialized in

if (c->SparcAdjust)
st->ss->update_super(st, NULL, "sparc2.2",

just because the only times it isn't initialised, 'err' is set non-zero.

This results in a 'possibly uninitialised' warning.
While there is no bug in the code, this does suggest that maybe
the code could be made more obviously correct.

So this patch:
 1/ moves the "err" variable inside the for loop, so an error in
    one device doesn't stop the other devices from being processed
 2/ calls 'continue' early if the device cannot be opened, so that
    a level of indent can be removed, and so that it is clear that
    'st' is always initialised before being used
 3/ frees 'st' if an error occurred in load_super or load_container.

Reported-by: Michael Shigorin <mike@altlinux.org>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago mdadm:check the nodes when operate clustered array
Zhilong Liu [Wed, 1 Mar 2017 10:42:33 +0000 (18:42 +0800)] 
mdadm:check the nodes when operate clustered array

It doesn't make sense to call write_bitmap with fewer than 2 nodes.
To keep 'write_bitmap' from receiving an invalid number of nodes,
it is better to check the node count while parsing options with getopt.
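
Such an early check could look roughly like this. The function `parse_nodes` is a hypothetical sketch of validating a node-count argument at option-parsing time; it is not the code added by the patch.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a node-count argument. Returns the count, or -1 if the text is
 * not a plain decimal number or the count is below 2.
 */
int parse_nodes(const char *arg)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(arg, &end, 10);
	if (errno != 0 || end == arg || *end != '\0')
		return -1;
	if (v < 2 || v > INT_MAX)
		return -1;
	return (int)v;
}
```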

Signed-off-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago mdadm:fix typo in comment
Zhilong Liu [Wed, 1 Mar 2017 08:44:33 +0000 (16:44 +0800)] 
mdadm:fix typo in comment

Signed-off-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago Fix oddity where mdadm did not recognise a relative path
Wol [Tue, 17 Jan 2017 17:47:05 +0000 (17:47 +0000)] 
Fix oddity where mdadm did not recognise a relative path

mdadm assumed that a pathname started with a "/", while an array
name didn't. This alters the logic so that if the first character
is not a "/" it tries to open an array, and if that fails it drops
through to the pathname code rather than terminating immediately
with an error.

Signed-off-by: Wol <anthony@youngman.org.uk>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago imsm: fix missing error message during migration
Pawel Baldysiak [Tue, 24 Jan 2017 13:29:33 +0000 (14:29 +0100)] 
imsm: fix missing error message during migration

If the user tries to migrate from raid0 to raid5 and there is no spare
drive to perform it, mdadm will exit with an error code, but
no error message is printed.

Print an error instead of a debug message when this condition occurs,
so the user is informed why the requested migration was not started.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago Makefile: Fix date to be output in ISO format
Jes Sorensen [Tue, 10 Jan 2017 23:51:40 +0000 (18:51 -0500)] 
Makefile: Fix date to be output in ISO format

Updated the static version in the release, but forgot to fix the
Makefile generated version when extracting from git

Reported-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago Release mdadm-4.0 mdadm-4.0
Jes Sorensen [Mon, 9 Jan 2017 21:44:39 +0000 (16:44 -0500)] 
Release mdadm-4.0

My first release!

Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago imsm: show correct size for arrays with 4k disks
Maksymilian Kunt [Mon, 9 Jan 2017 14:16:04 +0000 (15:16 +0100)] 
imsm: show correct size for arrays with 4k disks

The number of blocks used to calculate the array size is based on a
512-byte block size, so the size displayed is incorrect for arrays with 4k disks.

Signed-off-by: Maksymilian Kunt <maksymilian.kunt@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago Add detail information when can not connect monitor
Xiao Ni [Sun, 8 Jan 2017 02:59:54 +0000 (10:59 +0800)] 
Add detail information when can not connect monitor

If mdadm can't connect to the monitor, the error message is currently just
"Error waiting for xxx to be clean". Add a detailed error message
in connect_monitor.

Suggested-by: Oleg Samarin <osamarin68@gmail.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago imsm: count arrays under VMD HBAs correctly
Alexey Obitotskiy [Mon, 9 Jan 2017 12:12:22 +0000 (13:12 +0100)] 
imsm: count arrays under VMD HBAs correctly

OROM defines maximum number of arrays supported. On array creation mdadm
checks if number of arrays doesn't exceed that limit, however it is not
calculated correctly for VMD now.

The current code performs a lookup of HBA using the id. VMD HBAs have
the same id so each lookup returns the same structure (first
encountered). Take a different approach for VMD HBAs. As id is not
unique and cannot be used for lookups, iterate over all VMD HBAs and
compare both id and HBA path.
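
The lookup by both id and path can be sketched as below. The `struct hba` here is a simplified, hypothetical record and `find_hba` is an illustrative name; neither reflects the actual structures in mdadm.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified HBA record; not the structure mdadm uses. */
struct hba {
	unsigned int id;
	const char *path;
	struct hba *next;
};

/* Find the entry matching both id and path; NULL if there is none. */
struct hba *find_hba(struct hba *list, unsigned int id, const char *path)
{
	struct hba *h;

	for (h = list; h != NULL; h = h->next)
		if (h->id == id && strcmp(h->path, path) == 0)
			return h;
	return NULL;
}
```

Matching on id alone would always return the first entry when several HBAs share an id, which is the miscount the commit describes.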

Signed-off-by: Alexey Obitotskiy <aleksey.obitotskiy@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
7 years ago Don't assume VMD sysfs path ends with a disk entry
Alexey Obitotskiy [Wed, 4 Jan 2017 10:45:24 +0000 (11:45 +0100)] 
Don't assume VMD sysfs path ends with a disk entry

When VMD is enabled but no drive is attached to the PCIe port, mdadm
crashes trying to parse the path. Skip entry if valid path has not been
returned. Do it early to avoid unnecessary memory allocation.

Signed-off-by: Alexey Obitotskiy <aleksey.obitotskiy@intel.com>
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years ago IMSM: Fix signed/unsigned comparisons
Pawel Baldysiak [Tue, 3 Jan 2017 14:20:13 +0000 (15:20 +0100)] 
IMSM: Fix signed/unsigned comparisons

Prior to this patch there was an error during compiling
on 32-bit arch. This patch fixes this issue.

Reported-by: Thomas Backlund <tmb@mageia.org>
Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years ago imsm: enable bad block support for imsm metadata
Tomasz Majchrzak [Wed, 28 Dec 2016 08:38:07 +0000 (09:38 +0100)] 
imsm: enable bad block support for imsm metadata

Enable bad block support for imsm metadata as commit e522751d605d
("seq_file: reset iterator to first record for zero offset") has been
accepted in upstream kernel. Prior to that patch mdmon had not been able
to read bad blocks sysfs file.

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years ago IMSM: Do not update metadata if not able to migrate
Pawel Baldysiak [Thu, 22 Dec 2016 12:10:47 +0000 (13:10 +0100)] 
IMSM: Do not update metadata if not able to migrate

This patch prevents mdadm from updating metadata if migration is
not possible. The same check is done in analyse_change(),
but in that place - metadata is already modified.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years ago Make get_component_size() work with named array.
NeilBrown [Thu, 22 Dec 2016 02:14:59 +0000 (13:14 +1100)] 
Make get_component_size() work with named array.

get_component_size() still assumes that all arrays are
 /sys/block/md%d or /sys/block/md_d%d
and so doesn't work with e.g. /sys/block/md_foo.

This causes "mdadm --detail" to report
   Used Dev Size : unknown
and causes problems when adding spares and in other circumstances.

So change it to use stat2devnm() which does the right thing with all
types of array names.

Reported-and-tested-by: Robert LeBlanc <robert@leblancnet.us>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years ago mdadm: add test case for raid5 write back cache
Song Liu [Fri, 16 Dec 2016 00:00:16 +0000 (16:00 -0800)] 
mdadm: add test case for raid5 write back cache

This test case checks data integrity of raid5 write back cache
under various scenarios:

degraded mode, non-overwrite, raid-5/6.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years ago Always return last partition end address in 512B blocks
Mariusz Dabrowski [Tue, 13 Dec 2016 13:31:02 +0000 (14:31 +0100)] 
Always return last partition end address in 512B blocks

For 4K disks 'endofpart' is an index of the last 4K sector used by partition.
mdadm is using number of 512-byte sectors, so value returned by
get_last_partition_end must be multiplied by 8 for devices with 4K sectors.
Also, unused 'ret' variable has been removed.
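
The conversion described above is a multiplication by the ratio of the device sector size to 512. A tiny sketch, with a hypothetical function name:

```c
#include <stdint.h>

/* Convert an end address counted in device sectors to 512-byte units. */
uint64_t end_in_512b(uint64_t endofpart, unsigned int sector_size)
{
	return endofpart * (sector_size / 512);
}
```

For a 4K device the factor is 4096 / 512 = 8, and for a 512-byte device the value is unchanged.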

Signed-off-by: Mariusz Dabrowski <mariusz.dabrowski@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years ago Use disk sector size value to set offset for reading GPT
Mariusz Dabrowski [Thu, 8 Dec 2016 11:13:15 +0000 (12:13 +0100)] 
Use disk sector size value to set offset for reading GPT

mdadm uses an invalid byte offset while reading the GPT header to get
partition info (size, first sector, last sector etc.). Currently this offset
is hardcoded to 512 bytes, which is not valid for disks with a sector
size other than 512 bytes because MBR and GPT headers are aligned
to LBA, so the valid offset for 4k drives is 4096 bytes.
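
Because the primary GPT header sits at LBA 1, its byte offset equals the sector size. The following sketch checks for the "EFI PART" signature in an in-memory disk image; `has_gpt_header` is a hypothetical helper written for illustration, not the function changed by the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if image holds the GPT signature "EFI PART" at LBA 1, that is
 * at byte offset sector_size, and 0 otherwise.
 */
int has_gpt_header(const unsigned char *image, size_t image_len,
		   size_t sector_size)
{
	size_t off = sector_size;	/* LBA 1 * sector size */

	if (image_len < off + 8)
		return 0;
	return memcmp(image + off, "EFI PART", 8) == 0;
}
```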

Signed-off-by: Mariusz Dabrowski <mariusz.dabrowski@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years ago imsm: set generation number when reading superblock
Mariusz Dabrowski [Thu, 8 Dec 2016 11:12:48 +0000 (12:12 +0100)] 
imsm: set generation number when reading superblock

IMSM doesn't set the 'events' field with the generation number, so sometimes mdadm
tries to re-assemble the container using metadata which isn't the most recent (e.g.
from a spare disk).

Signed-off-by: Mariusz Dabrowski <mariusz.dabrowski@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years ago IMSM: Add support for Non-Intel NVMe drives under VMD
Pawel Baldysiak [Mon, 12 Dec 2016 10:28:44 +0000 (11:28 +0100)] 
IMSM: Add support for Non-Intel NVMe drives under VMD

This patch adds a check of whether the platform (preOS) supports
non-Intel NVMe drives under a VMD domain
and, if so, allows creating an IMSM RAID volume
with those drives.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years ago mdopen: open md devices O_RDONLY
NeilBrown [Mon, 5 Dec 2016 06:27:03 +0000 (17:27 +1100)] 
mdopen: open md devices O_RDONLY

There is no need to request write access when opening
the md device, as we never write to it, and none of the
ioctls we use require write access.

If we do open with write access, then when we close, udev notices that
the device was closed after being open for write access, and it
generates a CHANGE event.

This is generally unwanted, and particularly problematic when mdadm is
trying to --stop the array, as the CHANGE event can cause the array to
be re-opened before it is completely closed, which results in a new mddev
being allocated.

So just use O_RDONLY instead of O_RDWR.

Reported-by: Marc Smith <marc.smith@mcc.edu>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years ago imsm: 4kn support for bad block log
Tomasz Majchrzak [Tue, 29 Nov 2016 13:02:35 +0000 (14:02 +0100)] 
imsm: 4kn support for bad block log

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years ago imsm: implement "--examine-badblocks" command
Tomasz Majchrzak [Tue, 29 Nov 2016 13:02:34 +0000 (14:02 +0100)] 
imsm: implement "--examine-badblocks" command

Implement "--examine-badblocks" command to provide list of bad blocks in
metadata for a disk.

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years ago imsm: provide list of bad blocks for an array
Tomasz Majchrzak [Fri, 2 Dec 2016 12:54:15 +0000 (13:54 +0100)] 
imsm: provide list of bad blocks for an array

Provide list of bad blocks using memory allocated in advance so it's
safe to call it from monitor.

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years ago imsm: clear bad blocks if disk becomes unavailable
Tomasz Majchrzak [Tue, 29 Nov 2016 13:02:32 +0000 (14:02 +0100)] 
imsm: clear bad blocks if disk becomes unavailable

If a disk fails or goes missing, clear the bad blocks associated with it
from metadata. If necessary, update disk ordinals.

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years ago imsm: clear bad block from bad block log
Tomasz Majchrzak [Tue, 29 Nov 2016 13:02:31 +0000 (14:02 +0100)] 
imsm: clear bad block from bad block log

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years ago imsm: record new bad block in bad block log
Tomasz Majchrzak [Tue, 29 Nov 2016 13:02:30 +0000 (14:02 +0100)] 
imsm: record new bad block in bad block log

Check for a duplicate first, or try to merge the new block with an existing
bad block. If the block range exceeds BBM_LOG_MAX_LBA_ENTRY_VAL (256) blocks,
it must be split into multiple ranges. Fail if the maximum number of bad
blocks has already been reached.
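
The splitting step can be sketched as follows. This illustrates only the splitting, not duplicate detection or merging. The 256-block limit comes from the message above; the struct and function names are hypothetical, and running out of output slots stands in for the "maximum number of bad blocks" failure.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRY_BLOCKS 256	/* per-entry limit quoted in the commit message */

/* Hypothetical log entry: a start block and a length of at most 256 blocks. */
struct bb_range {
	uint64_t start;
	uint32_t len;
};

/*
 * Split [start, start + len) into entries of at most MAX_ENTRY_BLOCKS
 * blocks. Returns the number of entries written, or -1 if out[] is too
 * small to hold them all.
 */
int split_range(uint64_t start, uint64_t len, struct bb_range *out,
		size_t max_out)
{
	size_t n = 0;

	while (len > 0) {
		uint32_t chunk = len > MAX_ENTRY_BLOCKS ?
				 MAX_ENTRY_BLOCKS : (uint32_t)len;

		if (n == max_out)
			return -1;
		out[n].start = start;
		out[n].len = chunk;
		n++;
		start += chunk;
		len -= chunk;
	}
	return (int)n;
}
```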

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years ago imsm: give md list of known bad blocks on startup
Tomasz Majchrzak [Wed, 30 Nov 2016 08:41:16 +0000 (09:41 +0100)] 
imsm: give md list of known bad blocks on startup

On create, set the bad block support flag for each drive. On assemble, also
provide a list of known bad blocks. Bad blocks are stored in metadata
per disk so they have to be checked against volume boundaries
beforehand.
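
Checking a per-disk range against volume boundaries is an interval intersection. A rough sketch follows, assuming the result should be expressed relative to the start of the volume; that detail, like the function name, is an assumption and not something stated in the commit.

```c
#include <stdint.h>

/*
 * Clip a per-disk bad block range to a volume occupying
 * [vol_start, vol_start + vol_len) on that disk. On overlap, store the
 * clipped range relative to the volume start and return 1; otherwise 0.
 */
int clip_to_volume(uint64_t bb_start, uint64_t bb_len,
		   uint64_t vol_start, uint64_t vol_len,
		   uint64_t *out_start, uint64_t *out_len)
{
	uint64_t bb_end = bb_start + bb_len;
	uint64_t vol_end = vol_start + vol_len;
	uint64_t s = bb_start > vol_start ? bb_start : vol_start;
	uint64_t e = bb_end < vol_end ? bb_end : vol_end;

	if (s >= e)
		return 0;	/* no overlap with this volume */
	*out_start = s - vol_start;
	*out_len = e - s;
	return 1;
}
```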

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years ago imsm: write bad block log on metadata sync
Tomasz Majchrzak [Tue, 29 Nov 2016 13:02:28 +0000 (14:02 +0100)] 
imsm: write bad block log on metadata sync

Pre-allocate memory for largest possible bad block section when monitor
is being opened to avoid a need for memory allocation on metadata sync.

If memory for a structure has been allocated in mpb buffer but it hasn't
been used yet, it will be taken by next buffer grow, leading to
insufficient memory on metadata flush. Start tracking such memory and
take it into account when growing a buffer. Also, an assert has been
added in debug mode to warn when more metadata has been written than
memory allocated.

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years ago imsm: parse bad block log in metadata on startup
Tomasz Majchrzak [Tue, 29 Nov 2016 15:40:11 +0000 (16:40 +0100)] 
imsm: parse bad block log in metadata on startup

Always allocate memory for all log entries to avoid a need for memory
allocation when monitor requests to record a bad block.

Also add some extra checks to make the static code analyzer happy.

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years ago Introduce enum flag_mode for setting and clearing flags.
NeilBrown [Tue, 29 Nov 2016 22:02:11 +0000 (09:02 +1100)] 
Introduce enum flag_mode for setting and clearing flags.

We currently use '1' to indicate that a flag (writemostly or failfast)
needs to be set, and '2' to indicate that it needs to be cleared.

Using magic numbers like this is not best practice.

So replace them with values from an enum.

No functional change.
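
The idea can be illustrated like this. The enumerator names and the helper `apply_flag` are assumed for the example and may differ from what the patch introduces.

```c
/* Named modes instead of the magic numbers 1 and 2; identifiers are assumed. */
enum flag_mode { FLAG_DEFAULT, FLAG_SET, FLAG_CLEAR };

/* Apply mode to one bit in flags and return the new value. */
unsigned int apply_flag(unsigned int flags, unsigned int bit,
			enum flag_mode mode)
{
	switch (mode) {
	case FLAG_SET:
		return flags | bit;
	case FLAG_CLEAR:
		return flags & ~bit;
	case FLAG_DEFAULT:
	default:
		return flags;	/* leave the flag as it is */
	}
}
```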

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years ago mdmon: bad block support for external metadata - clear bad blocks
Tomasz Majchrzak [Thu, 27 Oct 2016 08:53:45 +0000 (10:53 +0200)] 
mdmon: bad block support for external metadata - clear bad blocks

If an update of acknowledged bad blocks file is notified, read entire
bad block list from sysfs file and compare it against local list of bad
blocks. If any obsolete entries are found, remove them from metadata.

As mdmon cannot perform any memory allocation, new superswitch method
get_bad_blocks is expected to return a list of bad blocks in metadata
without allocating memory. It's up to metadata handler to allocate all
required memory in advance.
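
Removing obsolete entries without allocating can be done by compacting the local list in place. The sketch below is a simplification under assumptions: it treats entries as obsolete unless an exactly matching entry exists in the list read back from sysfs, and the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bad block entry. */
struct bb {
	uint64_t sector;
	uint32_t len;
};

/*
 * Drop entries from local[] that no longer appear in current[] (the list
 * read back from sysfs). Compacts local[] in place without allocating
 * and returns the new entry count.
 */
size_t drop_obsolete(struct bb *local, size_t nlocal,
		     const struct bb *current, size_t ncurrent)
{
	size_t kept = 0;
	size_t i, j;

	for (i = 0; i < nlocal; i++) {
		int found = 0;

		for (j = 0; j < ncurrent; j++) {
			if (local[i].sector == current[j].sector &&
			    local[i].len == current[j].len) {
				found = 1;
				break;
			}
		}
		if (found)
			local[kept++] = local[i];
	}
	return kept;
}
```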

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years ago mdmon: bad block support for external metadata - store bad blocks
Tomasz Majchrzak [Thu, 27 Oct 2016 08:53:44 +0000 (10:53 +0200)] 
mdmon: bad block support for external metadata - store bad blocks

If md has changed the state to 'blocked' and the metadata handler supports
bad blocks, try to process them first. If the metadata handler has successfully
stored a bad block, acknowledge it to md via the 'badblocks' sysfs file. If
the metadata handler has failed to store the new bad block (i.e. lack of
space), remove bad block support for a disk by writing "-external_bbl"
to the state sysfs file. If all bad blocks have been acknowledged, request
to unblock the array.

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Acked-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years ago mdmon: bad block support for external metadata - sysfs file open
Tomasz Majchrzak [Thu, 27 Oct 2016 08:53:43 +0000 (10:53 +0200)] 
mdmon: bad block support for external metadata - sysfs file open

Open 'badblocks' and 'unacknowledged_bad_blocks' sysfs files for each
disk in the array. Add them to the list of files observed by monitor.

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years ago mdadm: bad block support for external metadata - initialization
Tomasz Majchrzak [Mon, 28 Nov 2016 14:07:05 +0000 (15:07 +0100)] 
mdadm: bad block support for external metadata - initialization

If metadata handler provides support for bad blocks, tell md by writing
'external_bbl' to rdev state file (both on create and assemble),
followed by a list of known bad blocks written via sysfs 'bad_blocks'
file.

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agoIMSM: Update num_data_stripes during migration
Pawel Baldysiak [Thu, 24 Nov 2016 08:48:24 +0000 (09:48 +0100)] 
IMSM: Update num_data_stripes during migration

This patch adds updating of num_data_stripes during reshape.
Previously this field, once set during creation, was never updated.
Also, the num_data_stripes value multiplied by chunk_size is used
to set the proper component size for RAID5.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Maksymilian Kunt <maksymilian.kunt@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agoAdd failfast support.
NeilBrown [Thu, 24 Nov 2016 23:55:49 +0000 (10:55 +1100)] 
Add failfast support.

Allow per-device "failfast" flag to be set when creating an
array or adding devices to an array.

When re-adding a device which had the failfast flag, it can be removed
using --nofailfast.

failfast status is printed in --detail and --examine output.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agoIncrease buffer for sysfs disk state
Tomasz Majchrzak [Thu, 27 Oct 2016 09:34:16 +0000 (11:34 +0200)] 
Increase buffer for sysfs disk state

Bad block support has extended the sysfs disk state reported by the kernel
("external_bbl"), so it became longer than 20 bytes. This causes reshape to
fail as it reads a truncated entry from sysfs.

Increase buffer so it can accommodate the string including all state
values currently implemented in kernel at the same time.
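
A minimal sketch of the size check behind this fix. The helper name
state_fits() and the sample state string are illustrative assumptions, not
mdadm code; the point is only that the terminating NUL must fit as well.

```c
#include <string.h>

/* Return 1 if the state string plus its terminating NUL fits in a buffer
 * of bufsize bytes, 0 otherwise. Hypothetical helper for illustration. */
int state_fits(const char *state, size_t bufsize)
{
	return strlen(state) + 1 <= bufsize;
}
```

With an illustrative value such as "in_sync,external_bbl" (20 characters),
a 20-byte buffer is one byte short, so a read into it would be truncated.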

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agoIncrease buffer for sysfs path
Tomasz Majchrzak [Fri, 28 Oct 2016 08:35:50 +0000 (10:35 +0200)] 
Increase buffer for sysfs path

'unacknowledged_bad_blocks' is a long name for sysfs property and it
makes sysfs path over 50 characters long. Increase buffer to the double
length of the longest path available in sysfs at the moment.

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agoIMSM: 4Kn drives support - adapt general migration record
Pawel Baldysiak [Thu, 17 Nov 2016 13:58:38 +0000 (14:58 +0100)] 
IMSM: 4Kn drives support - adapt general migration record

Convert the general migration record for 4Kn drives prior to write and post
read. Calculate the record location based on sector size; don't just assume
it's 512. Ensure the buffer address is aligned to 4096 so the write
operation avoids caching.
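
One way to obtain such an aligned buffer is posix_memalign(). The sketch
below is an assumption about how this could be done, not the code from the
patch; alloc_aligned_record() and RECORD_ALIGN are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

#define RECORD_ALIGN 4096 /* alignment taken from the commit text */

/* Allocate a zeroed buffer of at least len bytes whose address is a
 * multiple of RECORD_ALIGN. The length is rounded up to a multiple of
 * RECORD_ALIGN as well. Returns NULL on failure; release with free(). */
void *alloc_aligned_record(size_t len)
{
	void *buf = NULL;
	size_t rounded = (len + RECORD_ALIGN - 1) / RECORD_ALIGN * RECORD_ALIGN;

	if (rounded == 0)
		rounded = RECORD_ALIGN;
	if (posix_memalign(&buf, RECORD_ALIGN, rounded) != 0)
		return NULL;
	memset(buf, 0, rounded);
	return buf;
}
```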

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agoIMSM: Add support for 4Kn sector size drives
Pawel Baldysiak [Thu, 17 Nov 2016 13:58:37 +0000 (14:58 +0100)] 
IMSM: Add support for 4Kn sector size drives

This patch adds support for drives with 4Kn sector size
for IMSM metadata. Mixing 4Kn and 512-byte member drives
is not allowed. Some offsets were aligned with sector size.
Internal metadata representation and all calculations
are still based on 512-byte sector sizes. This
implementation converts only sector-based values
when reading/writing to the drive, because they need to be
stored in metadata according to the actual member drive sector size.
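
The conversion described above can be sketched as follows. The function
names are hypothetical, and the sketch assumes sector_size is a multiple of
512 and that converted values divide evenly.

```c
#include <stdint.h>

#define BASE_SECTOR_SIZE 512u /* unit used for internal calculations */

/* Convert a count of 512-byte sectors to native sectors of the drive,
 * e.g. before writing a value to the metadata of a 4Kn drive. */
uint64_t sectors_to_native(uint64_t sectors_512, unsigned int sector_size)
{
	return sectors_512 / (sector_size / BASE_SECTOR_SIZE);
}

/* Convert a count of native sectors back to 512-byte sectors,
 * e.g. after reading a value from the metadata. */
uint64_t sectors_from_native(uint64_t native, unsigned int sector_size)
{
	return native * (sector_size / BASE_SECTOR_SIZE);
}
```

For a 512-byte drive both functions return their input unchanged, so the
same code path can serve both drive types.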

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agoIMSM: Read and store device sector size
Pawel Baldysiak [Thu, 17 Nov 2016 13:58:36 +0000 (14:58 +0100)] 
IMSM: Read and store device sector size

This patch adds retrieving the device sector size at startup
and stores it in intel_super, so it can be used in other places.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agoAdd function for getting member drive sector size
Pawel Baldysiak [Thu, 17 Nov 2016 13:58:35 +0000 (14:58 +0100)] 
Add function for getting member drive sector size

This patch introduces a function for getting the sector size of
a given device (fd).
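
On Linux, the logical sector size of a block device can be queried with
the BLKSSZGET ioctl. The sketch below shows that approach; it is an
assumption about the technique, and query_sector_size() is a hypothetical
name rather than the function added by the patch.

```c
#include <sys/ioctl.h>
#include <linux/fs.h> /* BLKSSZGET */

/* Store the logical sector size of the block device behind fd in *size.
 * Returns 0 on success, -1 if the ioctl fails (for example when fd does
 * not refer to a block device). *size is left untouched on failure. */
int query_sector_size(int fd, unsigned int *size)
{
	int ssz = 0;

	if (ioctl(fd, BLKSSZGET, &ssz) != 0 || ssz <= 0)
		return -1;
	*size = (unsigned int)ssz;
	return 0;
}
```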

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agosuper1: fix setting bad block log offset in write_init_super1()
Artur Paszkiewicz [Thu, 10 Nov 2016 10:50:54 +0000 (11:50 +0100)] 
super1: fix setting bad block log offset in write_init_super1()

Commit f79bbf4f6904 ("super1: don't put the bblog at the end of the free
space.") changed the location of the bad block log to be after the
write-intent bitmap, but a fixed offset was used and it can make bbl
overlap with the bitmap, especially when using a small bitmap chunk.
This patch changes it to use the actual offset and size of the bitmap.
It also joins the cases for v1.1 and v1.2 superblock because the code
was very similar.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agosuper1: make internal bitmap size calculations more consistent
Artur Paszkiewicz [Thu, 10 Nov 2016 10:50:53 +0000 (11:50 +0100)] 
super1: make internal bitmap size calculations more consistent

Determining internal bitmap size is performed using two different
functions (bitmap_sectors() and calc_bitmap_size()) and in
getinfo_super1() it is calculated in yet another way. Each of these
methods give slightly different results. The most accurate is
calc_bitmap_size() but it also has a rounding issue. So:

- fix the rounding issue in calc_bitmap_size() using bitmap_bits()
- replace usages of bitmap_sectors() and open-coded calculations with
  calc_bitmap_size()
- remove bitmap_sectors()
- move bitmap_bits() to mdadm.h as inline - otherwise mdassemble won't
  compile (it does not use bitmap.c)
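
The rounding concern can be illustrated with ceiling division. This is a
simplified sketch, not the mdadm implementation: the function names and the
header_bytes parameter are hypothetical, and the real calculation may apply
additional alignment.

```c
/* Number of bitmap bits needed to cover array_sectors 512-byte sectors
 * when each bit tracks chunk_bytes bytes. Rounds up so that a partial
 * trailing chunk still gets a bit. */
unsigned long long bits_needed(unsigned long long array_sectors,
			       unsigned long chunk_bytes)
{
	return (array_sectors * 512 + chunk_bytes - 1) / chunk_bytes;
}

/* Number of 512-byte sectors needed to store header_bytes of header
 * followed by one bit per chunk. Rounds up at each step. */
unsigned long long bitmap_sectors_needed(unsigned long long bits,
					 unsigned long header_bytes)
{
	unsigned long long bytes = (bits + 7) / 8 + header_bytes;

	return (bytes + 511) / 512;
}
```

Rounding down at either step would leave the last chunk of the array, or
the last bits of the bitmap, without backing storage.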

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agoLib.c: Fix getting devname for devices with long path
Pawel Baldysiak [Fri, 21 Oct 2016 09:37:51 +0000 (11:37 +0200)] 
Lib.c: Fix getting devname for devices with long path

In a scenario where VMD is enabled and an "x8" type NVMe drive is
plugged into a PCIe switch, the path will be longer than 200 chars
(additional VMD domain + 2 levels of PCIe switches).
This patch makes the buffer big enough to handle this kind of
configuration.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agoIMSM: Enable spanning between VMD domains
Pawel Baldysiak [Fri, 21 Oct 2016 09:37:50 +0000 (11:37 +0200)] 
IMSM: Enable spanning between VMD domains

Each VMD domain adds an additional PCI domain. This patch
enables RAID creation with NVMe drives from different
VMD domains.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agoIMSM: Add warning message when x8-type device is used
Pawel Baldysiak [Mon, 24 Oct 2016 08:19:52 +0000 (10:19 +0200)] 
IMSM: Add warning message when x8-type device is used

This patch adds a warning message when an x8-type device
is used with IMSM metadata. An x8 device is a special
NVMe drive: two of them share a single PCIe card.
This card could be a single point of failure for
RAID levels other than RAID0. x8 devices have
serial numbers ending with "-A/-B" or "-1/-2".

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agoimsm: load migration record from right disk
Tomasz Majchrzak [Mon, 24 Oct 2016 10:00:17 +0000 (12:00 +0200)] 
imsm: load migration record from right disk

The migration record is only stored on disks in the first and second
metadata slots. The function to load the record incorrectly passes the disk
slot as the disk index. If a rebuild has taken place for a container, the
disk slot doesn't match the disk index, so the migration record is read
from a disk it has not been written to. As a result the reshape operation
fails.

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agoraid6check.c: fix "misleading-indentation" error
Yilong Ren [Wed, 26 Oct 2016 08:10:38 +0000 (16:10 +0800)] 
raid6check.c: fix "misleading-indentation" error

To fix the following error info:

root@vm-lkp-nex04-8G-7 /tmp/mdadm# make test
cc -Wall -Werror -Wstrict-prototypes -Wextra -Wno-unused-parameter -ggdb -DSendmail=\""/usr/sbin/sendmail -t"\" -DCONFFILE=\"/etc/mdadm.conf\" -DCONFFILE2=\"/etc/mdadm/mdadm.conf\" -DMAP_DIR=\"/run/mdadm\" -DMAP_FILE=\"map\" -DMDMON_DIR=\"/run/mdadm\" -DFAILED_SLOTS_DIR=\"/run/mdadm/failed-slots\" -DNO_COROSYNC -DNO_DLM -DVERSION=\"3.4-43-g1dcee1c\" -DVERS_DATE="\"06th April 2016\"" -DUSE_PTHREADS -DBINDIR=\"/sbin\"  -c -o raid6check.o raid6check.c
raid6check.c: In function 'manual_repair':
raid6check.c:267:4: error: this 'else' clause does not guard... [-Werror=misleading-indentation]
    else
    ^~~~
raid6check.c:269:5: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the 'else'
     printf("Repairing D(%d) and P\n", failed_data);
     ^~~~~~
cc1: all warnings being treated as errors
<builtin>: recipe for target 'raid6check.o' failed
make: *** [raid6check.o] Error 1
root@vm-lkp-nex04-8G-7 /tmp/mdadm#

Cc: NeilBrown <neilb@suse.com>
Cc: linux-raid <linux-raid@vger.kernel.org>
Cc: LKP <lkp@eclists.intel.com>
Reviewed-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Yilong Ren <yilongx.ren@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agoFix bus error when accessing MBR partition records
James Clarke [Mon, 17 Oct 2016 20:16:01 +0000 (21:16 +0100)] 
Fix bus error when accessing MBR partition records

Since the MBR layout only has partition records as 2-byte aligned, the
32-bit fields in them are not aligned. Thus, they cannot be accessed on
some architectures (such as SPARC) by using a "struct MBR_part_record *"
pointer, as the compiler can assume that the pointer is properly aligned.
Instead, the records must be accessed by going through the MBR struct
itself every time.
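
The alignment problem can be seen from the on-disk layout: the partition
table starts at byte 446 of the 512-byte MBR, and 446 is not a multiple of
4. The sketch below sidesteps the issue with byte-wise reads, which is a
different technique from the one in the patch (the patch goes through the
MBR struct); mbr_part_first_lba() is a hypothetical helper.

```c
#include <stdint.h>

#define MBR_PART_TABLE_OFFSET 446 /* start of the four partition records */
#define MBR_PART_RECORD_SIZE  16
#define MBR_PART_LBA_OFFSET   8   /* first-sector LBA inside a record */

/* Read the little-endian 32-bit first-sector LBA of partition record
 * `index` (0..3) from a raw 512-byte MBR. Byte-wise access is safe at
 * any alignment and on any host endianness. */
uint32_t mbr_part_first_lba(const unsigned char *mbr, int index)
{
	const unsigned char *p = mbr + MBR_PART_TABLE_OFFSET +
				 index * MBR_PART_RECORD_SIZE +
				 MBR_PART_LBA_OFFSET;

	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}
```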

Signed-off-by: James Clarke <jrtc27@jrtc27.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agosuper-intel: Reduce excessive parenthesis abuse
Jes Sorensen [Wed, 19 Oct 2016 16:31:00 +0000 (12:31 -0400)] 
super-intel: Reduce excessive parenthesis abuse

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agoAllow level migration only for single-array container
Mariusz Dabrowski [Wed, 12 Oct 2016 12:29:42 +0000 (14:29 +0200)] 
Allow level migration only for single-array container

IMSM doesn't allow changing the RAID level of an array in a container with
two arrays, but the array count check is done too late (after removing
disks) and in some cases (e.g. RAID 0 and RAID 1 migrated to RAID 0) both
arrays become degraded. This patch adds an array count check before disks
are removed.

Signed-off-by: Mariusz Dabrowski <mariusz.dabrowski@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agoimsm: block chunk size change for RAID 10
Mariusz Dabrowski [Wed, 12 Oct 2016 12:28:42 +0000 (14:28 +0200)] 
imsm: block chunk size change for RAID 10

Chunk size change of a RAID 10 array fails because it is not supported, but
invalid values are still written to the metadata and the array cannot be
assembled after stop. The operation should be blocked before the metadata
update.

Signed-off-by: Mariusz Dabrowski <mariusz.dabrowski@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agosuper1: make write_bitmap1 compatible with previous mdadm versions
Guoqing Jiang [Wed, 12 Oct 2016 06:24:07 +0000 (02:24 -0400)] 
super1: make write_bitmap1 compatible with previous mdadm versions

Older mdadm versions use a different bitmap_offset for v1.x metadata,
so we can't assume all bitmaps are on a 4K boundary: writing 4K for
the bitmap could corrupt the superblock. Anthony reported this bug
at the link below.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=837964

So let's check the alignment of bitmap_offset before setting the
boundary to 4096, instead of setting it unconditionally. Thanks to Neil
for the detailed explanation.

Reported-by: Anthony DeRobertis <anthony@derobert.net>
Fixes: 95a05b37e8eb ("Create n bitmaps for clustered mode")
Cc: Neil Brown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agoFix some issues found by clang
NeilBrown [Fri, 7 Oct 2016 03:55:20 +0000 (14:55 +1100)] 
Fix some issues found by clang

The clang compiler complained about each of these.

The mdmon.h error will only affect 'far' RAID10 arrays using intel or DDF
metadata, and there is no such thing.

The mdopen.c will cause a problem if there are no free md device
numbers in the first 512.  That is fairly unlikely.

The restripe.c error would only affect the 'test_stripe' command, and
probably doesn't change its behaviour.

The super-intel.c fix is purely cosmetic.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agoimsm: retrieve nvme serial from sysfs
Artur Paszkiewicz [Thu, 6 Oct 2016 09:13:09 +0000 (11:13 +0200)] 
imsm: retrieve nvme serial from sysfs

Don't rely on the SCSI ioctl for reading NVMe serials: SCSI emulation for
NVMe devices can be disabled in the kernel config. Instead, try to get the
serial from /sys/block/nvme*/device/serial. If that fails for whatever
reason (e.g. no such attribute in old kernels), fall back to the SCSI
method.

This also moves some SCSI-specific code from imsm_read_serial() to
scsi_get_serial().
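
The sysfs lookup described above could look roughly like this. It is a
sketch under assumptions, not the patch itself: read_nvme_serial_sysfs() is
a hypothetical name, and a caller would try the SCSI method when it
returns -1.

```c
#include <stdio.h>
#include <string.h>

/* Read the serial of block device `blkname` (e.g. "nvme0n1") from
 * /sys/block/<blkname>/device/serial into `serial` (at most len bytes
 * including the NUL). Returns 0 on success, -1 if the attribute cannot
 * be read, in which case the caller should use another method. */
int read_nvme_serial_sysfs(const char *blkname, char *serial, size_t len)
{
	char path[256];
	FILE *f;
	int n;

	if (len == 0)
		return -1;
	n = snprintf(path, sizeof(path), "/sys/block/%s/device/serial",
		     blkname);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(serial, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	/* Drop the trailing newline that sysfs attributes usually carry. */
	serial[strcspn(serial, "\n")] = '\0';
	return 0;
}
```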

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Reviewed-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Reviewed-by: Alexey Obitotskiy <aleksey.obitotskiy@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agoFix RAID metadata check
Mariusz Dabrowski [Thu, 22 Sep 2016 07:02:11 +0000 (09:02 +0200)] 
Fix RAID metadata check

mdadm recognizes devices with a partition table as part of a RAID array
and an invalid warning message is displayed. After this fix, proper warning
messages are displayed for MBR/GPT disks and devices with RAID
metadata.

Signed-off-by: Mariusz Dabrowski <mariusz.dabrowski@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agoimsm: remove redundant characters from some error messages
Artur Paszkiewicz [Fri, 16 Sep 2016 13:25:14 +0000 (15:25 +0200)] 
imsm: remove redundant characters from some error messages

Fix the cases that produced messages like "mdadm: : The message".

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agoimsm: do not activate spares for uninitialized member arrays
Artur Paszkiewicz [Thu, 15 Sep 2016 07:53:58 +0000 (09:53 +0200)] 
imsm: do not activate spares for uninitialized member arrays

This fixes some issues when a member array is created with "missing"
devices in a container that has more devices than used in the member
array.

Reported-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agomdadm: fix a buffer overflow
Song Liu [Thu, 8 Sep 2016 18:21:07 +0000 (11:21 -0700)] 
mdadm: fix a buffer overflow

struct mdp_superblock_1.set_name is 32B long, but struct mdinfo.name
is 33B long. So we need strncpy instead of strcpy to avoid a buffer
overflow.
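
A sketch of a bounded copy between two such fields. Only one copy direction
is shown, from the 32-byte field into the 33-byte one; copy_set_name() and
the two macros are hypothetical names, and the sizes come from the text
above.

```c
#include <string.h>

#define SET_NAME_LEN  32 /* size of set_name, may lack a terminating NUL */
#define INFO_NAME_LEN 33 /* size of mdinfo.name, room for 32 chars + NUL */

/* Copy a fixed-size, possibly unterminated name into a buffer that is one
 * byte larger, and terminate it explicitly. strcpy() would keep reading
 * past src if all 32 bytes are in use. */
void copy_set_name(char dst[INFO_NAME_LEN], const char src[SET_NAME_LEN])
{
	strncpy(dst, src, SET_NAME_LEN);
	dst[SET_NAME_LEN] = '\0';
}
```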

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agomdopen: Prevent overrunning the devname buffer when copying devnm into it for long...
Robert LeBlanc [Wed, 24 Aug 2016 16:10:44 +0000 (10:10 -0600)] 
mdopen: Prevent overrunning the devname buffer when copying devnm into it for long md names.

Linux allows for 32-character device names. When using the maximum
size device name and also storing "/dev/", devname needs to be 37
characters long to store the complete device name,
e.g. "/dev/md_abcdefghijklmnopqrstuvwxyz12\0"
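
The arithmetic can be checked with a bounded snprintf(). The sketch assumes
that the 32 characters include the terminating NUL, as the example string
suggests (31 visible characters); build_devname() and the macros are
hypothetical names.

```c
#include <stdio.h>

#define DEVNM_MAX   32              /* device name size including NUL */
#define DEVNAME_LEN (5 + DEVNM_MAX) /* strlen("/dev/") + name + NUL = 37 */

/* Build "/dev/<devnm>" in out. Returns 0 on success, -1 if the result
 * would not fit in DEVNAME_LEN bytes. */
int build_devname(char out[DEVNAME_LEN], const char *devnm)
{
	int n = snprintf(out, DEVNAME_LEN, "/dev/%s", devnm);

	return (n < 0 || n >= DEVNAME_LEN) ? -1 : 0;
}
```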

Signed-off-by: Robert LeBlanc<robert@leblancnet.us>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agobitmap: Mark a number of local functions static
Jes Sorensen [Mon, 15 Aug 2016 20:35:28 +0000 (16:35 -0400)] 
bitmap: Mark a number of local functions static

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agobitmap: Handle errors when reading bitmap info for cluster nodes
Jes Sorensen [Mon, 15 Aug 2016 20:21:33 +0000 (16:21 -0400)] 
bitmap: Handle errors when reading bitmap info for cluster nodes

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agobitmap: Simplify code for bitmap_file_open()
Jes Sorensen [Mon, 15 Aug 2016 20:16:05 +0000 (16:16 -0400)] 
bitmap: Simplify code for bitmap_file_open()

By switching to open+fstat rather than stat+open the code can be
simplified and avoid duplicating the open handling.
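
The open+fstat pattern in general form, as a sketch rather than the actual
bitmap_file_open() code. The helper name and the regular-file check are
assumptions made for illustration.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open path read-only and inspect the already-open descriptor with
 * fstat(), so there is a single open() call and the file cannot change
 * between the check and the open. Returns the fd, or -1 on error or if
 * the path is not a regular file (hypothetical policy). */
int open_regular_file(const char *path, struct stat *st)
{
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (fstat(fd, st) != 0 || !S_ISREG(st->st_mode)) {
		close(fd);
		return -1;
	}
	return fd;
}
```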

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agosuper0: Clean up formatting in examine_super0()
Jes Sorensen [Mon, 15 Aug 2016 19:56:23 +0000 (15:56 -0400)] 
super0: Clean up formatting in examine_super0()

No functionality change - should be purely cosmetic

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agosuper0: Fix spelling of 'version' in comment and fix formatting
Jes Sorensen [Mon, 15 Aug 2016 19:49:59 +0000 (15:49 -0400)] 
super0: Fix spelling of 'version' in comment and fix formatting

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agosuper0: Use random_uuid() in init_super0()
Jes Sorensen [Mon, 15 Aug 2016 19:48:56 +0000 (15:48 -0400)] 
super0: Use random_uuid() in init_super0()

This shaves another 80 bytes off the mdadm binary.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agoIntroduce random_uuid() helper function
Jes Sorensen [Mon, 15 Aug 2016 19:41:34 +0000 (15:41 -0400)] 
Introduce random_uuid() helper function

This gets rid of 5 nearly identical copies of the same code, and
reduces the binary size of mdadm by over 700 bytes on x86_64.
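
A helper of this kind might look as follows. This is a sketch of the
general idea, not the code from the commit: fill_random_uuid() is a
hypothetical name, and the fallback to random() is an assumption.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Fill buf with 16 random bytes. Prefer /dev/urandom; if it cannot be
 * opened or read in full, fall back to the weaker random(). */
void fill_random_uuid(unsigned char buf[16])
{
	int fd = open("/dev/urandom", O_RDONLY);
	int i;

	if (fd >= 0) {
		ssize_t len = read(fd, buf, 16);

		close(fd);
		if (len == 16)
			return;
	}
	for (i = 0; i < 16; i++)
		buf[i] = (unsigned char)(random() & 0xff);
}
```

Having one such function lets every metadata format call it instead of
carrying its own copy of the open/read/fallback sequence.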

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agomdadm.h: Fix build problem against newer glibc
Jes Sorensen [Mon, 15 Aug 2016 15:30:39 +0000 (11:30 -0400)] 
mdadm.h: Fix build problem against newer glibc

Newer glibc requires direct include of sys/sysmacros.h in order to
access makedev().
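
For illustration, a small wrapper that compiles only when the header is
included directly on such glibc versions. dev_from_numbers() is a
hypothetical name.

```c
#include <sys/types.h>
#include <sys/sysmacros.h> /* makedev(), major(), minor() */

/* Combine major and minor numbers into a dev_t. */
dev_t dev_from_numbers(unsigned int maj, unsigned int min)
{
	return makedev(maj, min);
}
```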

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agomdadm: put journal device in right place of --detail
Song Liu [Fri, 12 Aug 2016 00:14:13 +0000 (17:14 -0700)] 
mdadm: put journal device in right place of --detail

When there are failed HDDs, the journal device is shown in the wrong place
in --detail:

    Number   Major   Minor   RaidDevice State
       4       8       24        -      journal   /dev/sdb8
       1       8       18        1      active sync   /dev/sdb2
       2       8       19        2      active sync   /dev/sdb3
       3       8       21        3      active sync   /dev/sdb5

       0       8       17        -      faulty   /dev/sdb1

This patch fixes the output to:

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       1       8       18        1      active sync   /dev/sdb2
       2       8       19        2      active sync   /dev/sdb3
       3       8       21        3      active sync   /dev/sdb5

       0       8       17        -      faulty   /dev/sdb1
       4       8       24        -      journal   /dev/sdb8

Reported-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agomdadm: add man page for --add-journal
Song Liu [Fri, 12 Aug 2016 00:10:04 +0000 (17:10 -0700)] 
mdadm: add man page for --add-journal

Add the following to man page:

--add-journal
      Recreate journal for RAID-4/5/6 array that lost a journal device.
      In the current implementation, this command cannot add a journal
      to an array that had a failed journal. To avoid interrupting
      on-going write operations, --add-journal only works for arrays in
      Read-Only state.

Reported-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agolib: Various coding style cleanups
Jes Sorensen [Thu, 11 Aug 2016 20:01:00 +0000 (16:01 -0400)] 
lib: Various coding style cleanups

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agolib: Avoid if and return on the same line
Jes Sorensen [Thu, 11 Aug 2016 19:53:29 +0000 (15:53 -0400)] 
lib: Avoid if and return on the same line

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agosysfs: Avoid if and return on the same line
Jes Sorensen [Thu, 11 Aug 2016 19:52:48 +0000 (15:52 -0400)] 
sysfs: Avoid if and return on the same line

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
7 years agosuper1: Avoid if and return on the same line
Jes Sorensen [Thu, 11 Aug 2016 19:52:02 +0000 (15:52 -0400)] 
super1: Avoid if and return on the same line

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>