Zhilong Liu [Wed, 12 Apr 2017 08:36:38 +0000 (16:36 +0800)]
mdadm/manpage:update manpage for readonly parameter
update readonly in manpage:
Currently both readwrite and readonly work well,
update the readonly section.
One commit in linux/driver/md. Cleared "MD_CLOSING bit" to Fixes: af8d8e6f0315 ("md: changes for MD_STILL_CLOSED flag") Signed-off-by: Zhilong Liu <zlliu@suse.com> Signed-off-by: Jes Sorensen <jsorensen@fb.com>
mdopen: use parameters/new_array to create arrays whenever possible.
In a sufficiently recent kernel, an md%d array can be
created by writing to .../parameters/new_array.
If mdadm does this consistently, then another new
feature, disabling create_on_open, can be enabled.
This avoids races on shutdown.
An added benefit of using new_array (where available)
is that it allows md arrays with numbers larger than 511
(e.g. md999) to be created. The old create_on_open
mechanism doesn't support such devices since
Commit: af5628f05db6 ("md: disable probing for md devices 512 and over.")
in Linux 3.17.
After a few more mdadm releases it would be good to
have mdadm disable create_on_open automatically.
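
As a rough illustration of the mechanism described above, the sketch below writes an array name such as "md999" to the new_array parameter file. The function name is hypothetical, and the full path of the parameter file is deliberately left to the caller; this is not the code mdadm itself uses.

```c
#include <stdio.h>

/*
 * Ask the kernel to create md<num> by writing its name to the
 * .../parameters/new_array file. 'param_path' is supplied by the
 * caller so the sketch can also be exercised against a plain file.
 * Returns 0 on success and -1 on failure.
 */
static int create_named_array(const char *param_path, int num)
{
	char name[32];
	FILE *f;
	int ok;

	snprintf(name, sizeof(name), "md%d", num);
	f = fopen(param_path, "w");
	if (!f)
		return -1;	/* e.g. older kernel: fall back to create_on_open */
	ok = fputs(name, f) >= 0;
	if (fclose(f) != 0)	/* the kernel may report the error on flush */
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller would try this first and only use the create_on_open path when it returns -1.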
Code is 80 characters wide, so let's try to respect that. In addition, we
should never have one-line 'if () action()' statements. Fixup various
whitespace abuse.
mdassemble doesn't handle container based arrays, no support for sysfs,
etc. It has not been actively maintained for years, so time to send it
off to retirement.
Jes Sorensen [Thu, 30 Mar 2017 20:02:36 +0000 (16:02 -0400)]
sysfs: Use the presence of /sys/block/<dev>/md as indicator of valid device
Rather than calling ioctl(RAID_VERSION), use the presence of
/sys/block/<dev>/md as indicator of the device being valid and sysfs
being active for it. The ioctl could return valid data even when sysfs is
not mounted, which renders sysfs_init() useless anyway.
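
A minimal sketch of such a presence check, assuming a hypothetical helper name and taking the sysfs block directory as a parameter (it would be /sys/block on a real system):

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Return 1 if <sys_block>/<devnm>/md exists and is a directory,
 * 0 otherwise. No ioctl is involved, so the answer also tells us
 * whether sysfs is usable for this device.
 */
static int md_sysfs_present(const char *sys_block, const char *devnm)
{
	char path[256];
	struct stat st;
	int n;

	n = snprintf(path, sizeof(path), "%s/%s/md", sys_block, devnm);
	if (n < 0 || (size_t)n >= sizeof(path))
		return 0;	/* name too long to be a valid device */
	return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```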
Gioh Kim [Thu, 30 Mar 2017 16:58:13 +0000 (18:58 +0200)]
mdadm.c: fix compile error "switch condition has boolean value"
Remove a boolean expression in switch condition
to prevent compile error of some compilers,
for example, gcc version 5.2.1 20151010 (Ubuntu 5.2.1-22ubuntu2).
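
The kind of change involved can be sketched as follows. The mode names and the function are purely illustrative and are not taken from mdadm.c; the point is only that a comparison result is tested with 'if' rather than used as a switch condition.

```c
/* Hypothetical mode values, for illustration only. */
enum { MODE_A = 1, MODE_B = 2 };

/*
 * Problematic form, which some compilers reject or warn about because
 * the switch condition is a boolean expression:
 *
 *	switch (mode == MODE_A) {
 *	case 1: ...
 *	case 0: ...
 *	}
 *
 * Equivalent form without a boolean switch condition:
 */
static const char *describe(int mode)
{
	if (mode == MODE_A)
		return "a";
	return "other";
}
```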
Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Tomasz Majchrzak [Thu, 30 Mar 2017 14:25:41 +0000 (16:25 +0200)]
imsm: use rounded size for metadata initialization
Array size is rounded to the nearest MB, however number of data stripes
and blocks per disk are calculated using size passed by the user. If
given size is not aligned, there is a mismatch. It's not possible to
assemble raid0 migrated to raid5 since raid5 arrays use number of data
stripes to calculate array size.
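
The mismatch can be illustrated with a little arithmetic. The helper names, the 512-byte sector unit and the choice to round down to a whole MiB are assumptions made for this sketch, not the imsm code itself.

```c
#include <stdint.h>

#define SECTORS_PER_MB 2048ULL	/* 1 MiB expressed in 512-byte sectors */

/* Round a size in sectors down to a whole number of MiB (assumed direction). */
static uint64_t round_to_mb(uint64_t sectors)
{
	return (sectors / SECTORS_PER_MB) * SECTORS_PER_MB;
}

/* Number of full data stripes that fit into a per-disk size. */
static uint64_t data_stripes(uint64_t sectors, uint32_t strip_sectors)
{
	return sectors / strip_sectors;
}
```

With a user-supplied size of 1000000 sectors and a 128-sector strip, the unrounded size gives 7812 stripes while the rounded size (999424 sectors) gives 7808. Deriving the stripe count from the rounded size keeps both values consistent.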
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Zhilong Liu [Thu, 30 Mar 2017 07:38:08 +0000 (15:38 +0800)]
mdadm/grow: reshape would be stuck from raid1 to raid5
systemctl doesn't interpret mdadm-grow-continue@.service
correctly due to the wrong argument provided in [Service];
%I should be corrected as %i. Otherwise the service
cannot be started by systemctl, and the reshape progress stays
stuck when growing an array from raid1 to raid5.
Jes Sorensen [Thu, 30 Mar 2017 14:39:29 +0000 (10:39 -0400)]
Grow: Remove unnecessary optimization
Per explanation by Neil, this optimization writes "size" to the
attribute of each device. However, when reducing the size of devices,
the size change isn't permitted until the array has been shrunk, so
this will fail anyway.
Jes Sorensen [Wed, 29 Mar 2017 18:40:36 +0000 (14:40 -0400)]
Incremental: Remove redundant call for GET_ARRAY_INFO
The code above just called md_get_array_info() and only reached this
point if it returned an error that isn't ENODEV, so it's pointless to
check this again here.
In addition it was incorrectly retrieving ioctl data into a
mdu_bitmap_file_t instead of mdu_array_info_t.
Fixes: ("8382f19 Add new mode: --incremental") Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Jes Sorensen [Wed, 29 Mar 2017 18:35:41 +0000 (14:35 -0400)]
util: Introduce md_get_array_info()
Remove most direct ioctl calls for GET_ARRAY_INFO, except for one,
which will be addressed in the next patch.
This is the start of the effort to clean up the use of ioctl calls and
introduce a more structured API, which will use sysfs and fall back to
ioctl for backup.
Extend the --consistency-policy parameter to work also in Grow mode.
Using it changes the currently active consistency policy in the kernel
driver and updates the metadata to make this change permanent. Currently
this supports only changing between "ppl" and "resync" policies, that is
enabling or disabling PPL at runtime.
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Enable creating and assembling raid5 arrays with PPL for 1.x metadata.
When creating, reserve enough space for PPL and store its size and
location in the superblock and set MD_FEATURE_PPL bit. Write an initial
empty header in the PPL area on each device. PPL is stored in the
metadata region reserved for internal write-intent bitmap, so don't
allow using bitmap and PPL together.
While at it, fix two endianness issues in write_empty_r5l_meta_block()
and write_init_super1().
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Enable creating and assembling IMSM raid5 arrays with PPL. Update the
IMSM metadata format to include new fields used for PPL.
Add structures for PPL metadata. They are used also by super1 and shared
with the kernel, so put them in md_p.h.
Write the initial empty PPL header when creating an array. When
assembling an array with PPL, validate the PPL header and, if it is
not correct, allow overwriting it if --force was provided.
Write the PPL location and size for a device to the new rdev sysfs
attributes 'ppl_sector' and 'ppl_size'. Enable PPL in the kernel by
writing to 'consistency_policy' before the array is activated.
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Show the currently enabled consistency policy in the output from
--detail. Add 3 spaces to all existing items in Detail output to align
with "Consistency Policy : ".
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Add a new parameter to mdadm: --consistency-policy=. It determines how
the array maintains consistency in case of unexpected shutdown. This
maps to the md sysfs attribute 'consistency_policy'. It can be used to
create a raid5 array using PPL. Add the necessary plumbing to pass this
option to metadata handlers. The write journal and bitmap
functionalities are treated as different policies, which are implicitly
selected when using --write-journal or --bitmap options.
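
A sketch of how such a parameter might be parsed and how the implicit selection could work. The enum, the table and the function names are illustrative assumptions; "resync" as the fallback policy is also an assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative policy set; names mirror the options mentioned above. */
enum policy {
	POLICY_UNKNOWN,
	POLICY_RESYNC,
	POLICY_BITMAP,
	POLICY_JOURNAL,
	POLICY_PPL,
};

static const struct {
	const char *name;
	enum policy policy;
} policies[] = {
	{ "resync", POLICY_RESYNC },
	{ "bitmap", POLICY_BITMAP },
	{ "journal", POLICY_JOURNAL },
	{ "ppl", POLICY_PPL },
};

/* Map the argument of --consistency-policy= to an enum value. */
static enum policy parse_policy(const char *s)
{
	size_t i;

	for (i = 0; i < sizeof(policies) / sizeof(policies[0]); i++)
		if (strcmp(s, policies[i].name) == 0)
			return policies[i].policy;
	return POLICY_UNKNOWN;
}

/* An explicit request wins; otherwise other options imply a policy. */
static enum policy effective_policy(enum policy requested,
				    int have_bitmap, int have_journal)
{
	if (requested != POLICY_UNKNOWN)
		return requested;
	if (have_journal)
		return POLICY_JOURNAL;
	if (have_bitmap)
		return POLICY_BITMAP;
	return POLICY_RESYNC;
}
```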
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
NeilBrown [Mon, 27 Mar 2017 03:36:56 +0000 (14:36 +1100)]
Add 'force' flag to *hot_remove_disk().
In rare circumstances, the short period that *hot_remove_disk()
waits isn't long enough for IO to complete. This particularly happens
when a device is failing and many retries are still happening.
We don't want to increase the normal wait time for "mdadm --remove"
as that might be used just to test if a device is active or not, and a
delay would be problematic.
So allow "--force" to mean that mdadm should try extra hard for a
--remove to complete, waiting up to 5 seconds.
Note that this patch fixes a comment which claimed the previous
wait time was half a second, where it was really 50msec.
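
The timing described above can be sketched as a bounded retry loop. The removal request is abstracted behind a function pointer so the sketch is self-contained; the names are hypothetical, and the 10 msec sleep per attempt is an assumption chosen so that 5 retries give about 50 msec and 500 retries give about 5 seconds.

```c
#include <errno.h>
#include <unistd.h>

/* Stand-in for the removal request: 0 on success, -1 with errno set. */
typedef int (*remove_fn)(void *ctx);

/* Retry while the request fails with EBUSY; 'force' extends the wait. */
static int remove_with_retry(remove_fn try_remove, void *ctx, int force)
{
	int tries = force ? 500 : 5;	/* about 5 s versus about 50 ms */
	int ret;

	while ((ret = try_remove(ctx)) == -1 &&
	       errno == EBUSY &&
	       tries-- > 0)
		usleep(10000);

	return ret;
}

/* Test double: reports EBUSY '*ctx' times, then succeeds. */
static int busy_then_ok(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		errno = EBUSY;
		return -1;
	}
	return 0;
}
```

Without 'force', a device that stays busy for more than five attempts still fails quickly, which keeps "mdadm --remove" usable as a cheap activity test.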
NeilBrown [Mon, 27 Mar 2017 03:36:56 +0000 (14:36 +1100)]
Introduce sys_hot_remove_disk()
The new hot_remove_disk() will retry HOT_REMOVE_DISK
several times in the face of EBUSY.
However we sometimes remove a device by writing "remove" to the
"state" attribute. This should be retried as well.
So introduce sys_hot_remove_disk() to repeat this action a few times.
NeilBrown [Mon, 27 Mar 2017 01:50:16 +0000 (12:50 +1100)]
Retry HOT_REMOVE_DISK a few times.
HOT_REMOVE_DISK can fail with EBUSY if there are outstanding
IO requests that have not completed yet. It can sometimes
be helpful to wait a little while for these to complete.
We already do this in impose_level() when reshaping a device,
but not in Manage.c in response to an explicit --remove request.
So create hot_remove_disk() to centralize this code, and call it
wherever it makes sense to wait for a HOT_REMOVE_DISK to succeed.
If a device isn't fully initialized (e.g. if it should be
handled by multipathing) it should not be considered for
md/RAID auto-assembly. Doing so can cause incorrect results
such as causing multipath to fail during startup.
There is a convention that the udev environment variable
SYSTEMD_READY be set to zero for such devices. So change
the mdadm rules to ignore devices with SYSTEMD_READY==0.
Gioh Kim [Mon, 20 Mar 2017 09:51:56 +0000 (10:51 +0100)]
super1: ignore failfast flag for setting device role
There is a corner case when setting the device role
if the new device has the failfast flag.
The failfast flag should be ignored.
Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com> Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Xiao Ni [Sat, 18 Mar 2017 02:33:45 +0000 (10:33 +0800)]
mdadm: Forced type conversion to avoid truncation
GCC reports that it needs 19 bytes to write to disk->serial, because the
type of the argument i is int. But i holds the failed disk
number, so it doesn't need 19 bytes. Just add a type
conversion to avoid this build error.
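
The idea can be sketched like this. The field size, the "missing:" prefix and the function name are assumptions made for illustration; the point is that narrowing the argument bounds the number of digits the compiler has to account for.

```c
#include <stdio.h>

#define SERIAL_LEN 16	/* hypothetical size of the serial field */

/*
 * "%d" applied to an int can produce up to 11 characters, so the
 * compiler assumes the worst case. Casting to an 8-bit type limits
 * the output to three digits, which fits the field.
 */
static void mark_missing(char serial[SERIAL_LEN], int failed_disk)
{
	snprintf(serial, SERIAL_LEN, "missing:%d",
		 (unsigned char)failed_disk);
}
```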
Signed-off-by: Xiao Ni <xni@redhat.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Xiao Ni [Sat, 18 Mar 2017 02:33:44 +0000 (10:33 +0800)]
Replace snprintf with strncpy at some places to avoid truncation
With gcc7 there are some build errors like:
directive output may be truncated writing up to 31 bytes into a region of size 24
snprintf(str, MPB_SIG_LEN, "%s", mpb->sig);
It just needs to copy one string to the target, so use strncpy instead.
For this line of code: snprintf(str, MPB_SIG_LEN, "%s", mpb->sig);
Because mpb->sig holds the version string after the magic,
it's better to use strncpy in place of snprintf there too.
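
A minimal sketch of the strncpy form, with a hypothetical field size and function name. Because strncpy does not terminate the destination when the source fills the whole field, the sketch adds the terminator explicitly.

```c
#include <string.h>

#define SIG_LEN 32	/* hypothetical size of the signature field */

/*
 * Copy a fixed-size, possibly unterminated signature field into 'dst',
 * which must provide at least SIG_LEN + 1 bytes.
 */
static void copy_sig(char *dst, const char *sig_field)
{
	strncpy(dst, sig_field, SIG_LEN);
	dst[SIG_LEN] = '\0';	/* strncpy alone does not guarantee this */
}
```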
Signed-off-by: Xiao Ni <xni@redhat.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Zhilong Liu [Mon, 20 Mar 2017 05:21:41 +0000 (13:21 +0800)]
mdadm/Monitor: Fix NULL pointer dereference when stat2devnm return NULL
Wait(): stat2devnm() returns NULL for non-block devices. Check that the
pointer is valid before dereferencing it. This can happen when using --wait
on other file types, such as 'f' and 'd', causing a core dump.
For example: ./mdadm --wait /dev/md/
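
The guard can be sketched as below. devnm_of() is a hypothetical stand-in for stat2devnm(), and the name it produces is made up; only the NULL check before use reflects the fix described above.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical stand-in for stat2devnm(): NULL unless a block device. */
static const char *devnm_of(const struct stat *st)
{
	static char name[32];

	if (!S_ISBLK(st->st_mode))
		return NULL;
	snprintf(name, sizeof(name), "blk-%lu", (unsigned long)st->st_rdev);
	return name;
}

/* Resolve 'path' to a device name, failing cleanly for non-block files. */
static int wait_target(const char *path, char *out, size_t outlen)
{
	struct stat st;
	const char *nm;

	if (stat(path, &st) != 0)
		return -1;
	nm = devnm_of(&st);
	if (!nm) {
		fprintf(stderr, "%s is not a block device\n", path);
		return -1;	/* bail out instead of dereferencing NULL */
	}
	snprintf(out, outlen, "%s", nm);
	return 0;
}
```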
Xiao Ni [Fri, 17 Mar 2017 11:55:43 +0000 (19:55 +0800)]
mdadm: Specify enough length when write to buffer
In Detail.c the buffer path in function Detail is defined as path[200],
but the maximum length of the content that needs to be written to the
buffer is 287, because the length of d_name in struct dirent is 255.
During building it reports the error:
error: ‘%s’ directive writing up to 255 bytes into a region of size 189
[-Werror=format-overflow=]
In function examine_super0 there is a buffer nb with length 5,
but it needs to show an int type argument. The maximum length of
an int is 10 digits, so the buffer length should be 11.
In the human_size function the length of buf is 30. During building
there is an error:
output between 20 and 47 bytes into a destination of size 30.
Change the length to 47.
Signed-off-by: Xiao Ni <xni@redhat.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Xiao Ni [Fri, 17 Mar 2017 11:55:42 +0000 (19:55 +0800)]
mdadm: Add Wimplicit-fallthrough=0 in Makefile
There are many errors like 'error: this statement may fall through',
but the logic is right. So add the flag -Wimplicit-fallthrough=0
to disable the error messages. The method I use is from
https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wimplicit-fallthrough-375
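
For context, the warning is triggered by code of the following shape. The function is a made-up example; the comments mark each fall-through as intentional, which GCC's default warning level also accepts as an alternative to disabling the warning globally.

```c
/* Count how many steps remain when starting from 'start'. */
static int count_from(int start)
{
	int n = 0;

	switch (start) {
	case 0:
		n++;
		/* fall through */
	case 1:
		n++;
		/* fall through */
	case 2:
		n++;
		break;
	default:
		break;
	}
	return n;
}
```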
Signed-off-by: Xiao Ni <xni@redhat.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Zhilong Liu [Mon, 6 Mar 2017 02:39:57 +0000 (10:39 +0800)]
mdadm:add man page for --symlinks
In build and create mode:
--symlinks
Auto creation of symlinks in /dev to /dev/md, option --symlinks
must be 'no' or 'yes' and work with --create and --build.
In assemble mode:
--symlinks
See this option under Create and Build options.
Signed-off-by: Zhilong Liu <zlliu@suse.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
NeilBrown [Thu, 2 Mar 2017 23:57:00 +0000 (10:57 +1100)]
examine: tidy up some code.
Michael Shigorin reports that the 'lcc' compiler isn't able
to deduce that 'st' must be initialized in
if (c->SparcAdjust)
st->ss->update_super(st, NULL, "sparc2.2",
just because the only times it isn't initialised, 'err' is set non-zero.
This results in a 'possibly uninitialised' warning.
While there is no bug in the code, this does suggest that maybe
the code could be made more obviously correct.
So this patch:
1/ moves the "err" variable inside the for loop, so an error in
one device doesn't stop the other devices from being processed
2/ calls 'continue' early if the device cannot be opened, so that
a level of indent can be removed, and so that it is clear that
'st' is always initialised before being used
3/ frees 'st' if an error occurred in load_super or load_container.
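
The three points above can be sketched with a simplified loop. All types and helpers here are hypothetical stand-ins rather than the mdadm functions; the numbered comments map to the list above.

```c
#include <stdlib.h>
#include <string.h>

struct st { int loaded; };	/* stand-in for the superblock handle */

/* Hypothetical helpers: names starting with '!' cannot be opened,
 * and the name "bad" fails to load. */
static int open_dev(const char *name)
{
	return name[0] == '!' ? -1 : 3;
}

static int load_super(struct st *st, const char *name)
{
	st->loaded = strcmp(name, "bad") != 0;
	return st->loaded ? 0 : 1;
}

/* Returns the number of devices examined successfully. */
static int examine_all(const char **names, int n)
{
	int ok = 0;
	int i;

	for (i = 0; i < n; i++) {
		int err;		/* 1/ error state is per device */
		struct st *st;

		if (open_dev(names[i]) < 0)
			continue;	/* 2/ early exit, one indent less */

		st = calloc(1, sizeof(*st));
		if (!st)
			continue;
		err = load_super(st, names[i]);
		if (err) {
			free(st);	/* 3/ no leak when loading fails */
			continue;
		}
		ok++;			/* 'st' is always initialised here */
		free(st);
	}
	return ok;
}
```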
Zhilong Liu [Wed, 1 Mar 2017 10:42:33 +0000 (18:42 +0800)]
mdadm:check the nodes when operate clustered array
It doesn't make sense to call write_bitmap with fewer than 2 nodes.
To avoid write_bitmap receiving an invalid number of nodes,
it is better to check the number of nodes during getopt processing.
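
Such an early check might look like the sketch below. The function name is hypothetical and the parsing rules (decimal only, no trailing characters) are assumptions of the sketch.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a node count given on the command line. Returns the count,
 * or -1 if the argument is not a number or is smaller than 2.
 */
static int parse_nodes(const char *arg)
{
	char *end;
	long n;

	errno = 0;
	n = strtol(arg, &end, 10);
	if (errno || end == arg || *end != '\0' || n < 2 || n > INT_MAX)
		return -1;
	return (int)n;
}
```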
Signed-off-by: Zhilong Liu <zlliu@suse.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Wol [Tue, 17 Jan 2017 17:47:05 +0000 (17:47 +0000)]
Fix oddity where mdadm did not recognise a relative path
mdadm assumed that a pathname started with a "/", while an array
name didn't. This alters the logic so that if the first character
is not a "/" it tries to open an array, and if that fails it drops
through to the pathname code rather than terminating immediately
with an error.
Signed-off-by: Wol <anthony@youngman.org.uk> Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Pawel Baldysiak [Tue, 24 Jan 2017 13:29:33 +0000 (14:29 +0100)]
imsm: fix missing error message during migration
If the user tries to migrate from raid0 to raid5 and there is no spare
drive to perform it, mdadm exits with an error code, but
no error message is printed.
Print an error instead of a debug message when this condition occurs,
so the user is informed why the requested migration is not started.
OROM defines maximum number of arrays supported. On array creation mdadm
checks if number of arrays doesn't exceed that limit, however it is not
calculated correctly for VMD now.
The current code performs a lookup of HBA using the id. VMD HBAs have
the same id so each lookup returns the same structure (first
encountered). Take a different approach for VMD HBAs. As id is not
unique and cannot be used for lookups, iterate over all VMD HBAs and
compare both id and HBA path.
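
A sketch of a lookup that matches on both fields. The structure and function are hypothetical simplifications, not the imsm data structures.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified HBA record. */
struct hba {
	int id;
	const char *path;
	struct hba *next;
};

/* Several entries may share 'id', so the path must match as well. */
static struct hba *find_hba(struct hba *list, int id, const char *path)
{
	struct hba *h;

	for (h = list; h; h = h->next)
		if (h->id == id && strcmp(h->path, path) == 0)
			return h;
	return NULL;
}
```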
Don't assume VMD sysfs path ends with a disk entry
When VMD is enabled but no drive is attached to the PCIe port, mdadm
crashes trying to parse the path. Skip entry if valid path has not been
returned. Do it early to avoid unnecessary memory allocation.
Tomasz Majchrzak [Wed, 28 Dec 2016 08:38:07 +0000 (09:38 +0100)]
imsm: enable bad block support for imsm metadata
Enable bad block support for imsm metadata as commit e522751d605d
("seq_file: reset iterator to first record for zero offset") has been
accepted in upstream kernel. Prior to that patch mdmon had not been able
to read bad blocks sysfs file.
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Pawel Baldysiak [Thu, 22 Dec 2016 12:10:47 +0000 (13:10 +0100)]
IMSM: Do not update metadata if not able to migrate
This patch prevents mdadm from updating metadata if migration is
not possible. The same check is done in analyse_change(),
but in that place - metadata is already modified.
Always return last partition end address in 512B blocks
For 4K disks 'endofpart' is an index of the last 4K sector used by partition.
mdadm is using number of 512-byte sectors, so value returned by
get_last_partition_end must be multiplied by 8 for devices with 4K sectors.
Also, unused 'ret' variable has been removed.
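
The conversion amounts to scaling by the ratio of the sector sizes. The helper name below is hypothetical.

```c
#include <stdint.h>

/* Convert an end address counted in native sectors to 512-byte sectors. */
static uint64_t end_in_512b_sectors(uint64_t endofpart,
				    unsigned int sector_size)
{
	return endofpart * (sector_size / 512);
}
```

For a 4K device the factor is 4096 / 512 = 8, so an 'endofpart' of 1000 becomes 8000; for a 512-byte device the value is unchanged.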
Use disk sector size value to set offset for reading GPT
mdadm is using an invalid byte offset while reading the GPT header to get
partition info (size, first sector, last sector etc.). Currently this offset
is hardcoded to 512 bytes, which is not valid for disks with a sector
size other than 512 bytes: MBR and GPT headers are aligned
to LBA, so the valid offset for 4k drives is 4096 bytes.
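
Since the GPT header lives at LBA 1, its byte offset equals the sector size. The sketch below uses a hypothetical helper that seeks by the sector size and checks for the "EFI PART" signature; it uses stdio for brevity and is not the mdadm implementation.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if a GPT header signature is found at LBA 1, else 0. */
static int has_gpt_header(const char *path, unsigned int sector_size)
{
	char sig[8];
	FILE *f = fopen(path, "rb");
	int found = 0;

	if (!f)
		return 0;
	/* LBA 1 starts one sector into the device: 512 or 4096 bytes. */
	if (fseek(f, (long)sector_size, SEEK_SET) == 0 &&
	    fread(sig, 1, sizeof(sig), f) == sizeof(sig))
		found = memcmp(sig, "EFI PART", sizeof(sig)) == 0;
	fclose(f);
	return found;
}
```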
imsm: set generation number when reading superblock
IMSM doesn't set the 'events' field with the generation number, so sometimes mdadm
tries to re-assemble the container using metadata which isn't the most recent (e.g.
from a spare disk).
Pawel Baldysiak [Mon, 12 Dec 2016 10:28:44 +0000 (11:28 +0100)]
IMSM: Add support for Non-Intel NVMe drives under VMD
This patch adds checking if platform (preOS) supports
non-Intel NVMe drives under VMD domain,
and - if so - allow creating IMSM Raid Volume
with those drives.
NeilBrown [Mon, 5 Dec 2016 06:27:03 +0000 (17:27 +1100)]
mdopen: open md devices O_RDONLY
There is no need to request write access when opening
the md device, as we never write to it, and none of the
ioctls we use require write access.
If we do open with write access, then when we close, udev notices that
the device was closed after being open for write access, and it
generates a CHANGE event.
This is generally unwanted, and particularly problematic when mdadm is
trying to --stop the array, as the CHANGE event can cause the array to
be re-opened before it is completely closed, which results in a new mddev
being allocated.
So just use O_RDONLY instead of O_RDWR.
Reported-by: Marc Smith <marc.smith@mcc.edu> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>