Pawel Baldysiak [Thu, 16 Jun 2016 09:12:20 +0000 (11:12 +0200)]
monitor: Make sure that last_checkpoint is set to 0 after sync
When a resync completes successfully, read_and_act sometimes (in the last
step) still reads sync_action as "resync" although sync_completed is
already set to component_size.
When this race occurs, the sync operation is
marked as finished, but last_checkpoint is
overwritten with sync_completed. This causes the next
sync operation (i.e. reshape) to be reported as complete immediately
after it starts, and mdmon writes a successful completion of the reshape
to the metadata. This patch sets last_checkpoint to 0 once the sync
is completed to prevent that.
Xiao Ni [Thu, 16 Jun 2016 01:41:02 +0000 (09:41 +0800)]
MDADM:Check mdinfo->reshape_active more times before calling Grow_continue
When reshaping a 3-drive raid5 to a 4-drive raid5, there is a chance that
the reshape cannot start. If the disks do not have enough space for
relocating the data_offset, mdadm needs to call start_reshape and then have
systemd run mdadm --grow --continue. But mdadm --grow --continue fails
because it checks info->reshape_active and finds that it is 0.
info->reshape_active is read from the superblock of the underlying devices.
start_reshape writes "reshape" to /sys/../sync_action, but mdadm --grow
--continue can be called before the latest superblock has been written to
the underlying devices, so info->reshape_active may still be 0. We should
wait longer for the superblock update before calling Grow_continue.
Signed-off-by: Xiao Ni <xni@redhat.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Mike Lovell [Wed, 18 May 2016 18:23:13 +0000 (12:23 -0600)]
Use dev_t for devnm2devid and devid2devnm
Commit 4dd2df0966ec added a trip through makedev(), major(), and minor() for
device major and minor numbers. This would cause mdadm to fail in operating
on a device with a minor number bigger than (2^19)-1 due to it changing
from dev_t to a signed int and back.
This was found as a problem when an array was created with a device
specified as a name like /dev/md/raidname and there were already 128 arrays
on the system. In this case, mdadm would choose 1048575 ((2^20)-1) for the
array and minor number. This would cause the major and minor number to become
negative when generated from devnm2devid() and passed to major() and minor()
in open_dev_excl(). open_dev_excl() would then call dev_open() which would
detect the negative minor number and call open() on the char * string
containing the major:minor pair, which isn't a valid file.
Signed-off-by: Mike Lovell <mlovell@bluehost.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
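The truncation described in this commit can be reproduced in isolation. The following is a minimal sketch, not mdadm code: demo_makedev(), demo_major(), demo_minor() and demo_through_int() are hypothetical helpers that assume the 64-bit dev_t bit layout used by glibc on Linux, and the narrowing cast relies on the wrap-around behaviour of gcc and clang.

```c
#include <stdint.h>

/* Hypothetical helpers assuming the glibc/Linux 64-bit dev_t layout. */
uint64_t demo_makedev(uint32_t maj, uint32_t min)
{
	return (((uint64_t)(maj & 0x00000fffu)) << 8) |
	       (((uint64_t)(maj & 0xfffff000u)) << 32) |
	       ((uint64_t)(min & 0x000000ffu)) |
	       (((uint64_t)(min & 0xffffff00u)) << 12);
}

uint32_t demo_major(uint64_t dev)
{
	return (uint32_t)(((dev >> 8) & 0xfffu) | ((dev >> 32) & 0xfffff000u));
}

uint32_t demo_minor(uint64_t dev)
{
	return (uint32_t)((dev & 0xffu) | ((dev >> 12) & 0xffffff00u));
}

/*
 * Round trip through a signed 32-bit int, as the problematic code path did.
 * The narrowing conversion is implementation-defined in C; gcc and clang
 * wrap modulo 2^32. Converting back sign-extends, so bit 31 smears into
 * the upper half of the 64-bit value.
 */
uint64_t demo_through_int(uint64_t dev)
{
	int32_t narrowed = (int32_t)dev;

	return (uint64_t)(int64_t)narrowed;
}
```

With this layout, bit 19 of the minor number lands on bit 31 of the encoded value, which is why minors up to (2^19)-1 survive the round trip and larger ones come back with negative major and minor numbers.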
Pawel Baldysiak [Tue, 17 May 2016 11:24:41 +0000 (13:24 +0200)]
IMSM: retry reading sync_completed during reshape
After a reshape is restarted (for example, after a reboot),
sync_completed reads "delayed" until mdmon changes the state.
With old kernels, mdadm does not wait for that change:
if this condition occurs, it exits and the reshape
does not continue. This patch retries reading sync_completed
with a delay, which gives mdmon time to change the "delayed" state.
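The retry-with-delay idea can be sketched as below. This is an illustration only, not the code from the patch: read_when_not_delayed(), the attr_reader callback type and demo_reader() are hypothetical names, and the attempt count, the delay and the "1024 / 4096" value are made-up placeholders.

```c
#include <string.h>
#include <time.h>

/* Hypothetical callback: fills buf with the attribute text, 0 on success. */
typedef int (*attr_reader)(char *buf, size_t len, void *ctx);

/*
 * Re-read the attribute until it no longer says "delayed".
 * Returns 0 once a different value was read, -1 on a read error or when
 * all attempts were used up.
 */
int read_when_not_delayed(attr_reader reader, void *ctx, char *buf,
			  size_t len, int attempts, long delay_ms)
{
	struct timespec ts;
	int i;

	ts.tv_sec = delay_ms / 1000;
	ts.tv_nsec = (delay_ms % 1000) * 1000000L;

	for (i = 0; i < attempts; i++) {
		if (reader(buf, len, ctx) != 0)
			return -1;
		if (strncmp(buf, "delayed", 7) != 0)
			return 0;
		nanosleep(&ts, NULL);	/* give the monitor time to act */
	}
	return -1;
}

/* Demo reader: reports "delayed" until the counter in ctx reaches zero. */
int demo_reader(char *buf, size_t len, void *ctx)
{
	int *remaining = ctx;

	if (len < 16)
		return -1;
	if (*remaining > 0) {
		(*remaining)--;
		strcpy(buf, "delayed");
	} else {
		strcpy(buf, "1024 / 4096");	/* placeholder value */
	}
	return 0;
}
```

Bounding the number of attempts keeps the caller from hanging forever if the state never changes.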
Guoqing Jiang [Wed, 11 May 2016 09:31:36 +0000 (17:31 +0800)]
super1: add more checks for NodeNumUpdate option
In some cases there is no need to check whether there is enough
space for the NodeNumUpdate option:
1. the array does not have a clustered bitmap.
2. the "--nodes" parameter is 0 (e.g. when adding a disk to a clustered raid).
3. the "--nodes" parameter is set to a smaller number than the
current bms->nodes.
Jes Sorensen [Thu, 12 May 2016 19:19:16 +0000 (15:19 -0400)]
mdadm: Make add_internal_bitmap() return 0 on success
add_internal_bitmap() returned 1 on success and 0 on error which is
inconsistent. This changes it to return 0 on success and use more
reasonable error codes on error.
super1: Clear memory allocated for superblock + bitmap before use
load_super1() did not clear memory allocated for the superblock +
bitmap. This causes issues if the superblock does not contain a bitmap
as later checks of bitmap features would rely on the bits being
cleared.
This bug has been around for a long time, but was only exposed in
mdadm-3.4 with the introduction of the clustering code.
Reported-by: Jan Stodola <jstodola@redhat.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
These are similar to stat2devnm() and fd2devnm() but not limited to md
devices. If the device is a partition they will return its kernel name,
not the whole device's name. For more information see commit: 8d83493 ("Introduce devid2kname - slightly different to devid2devnm.")
Also remove unused declaration for fmt_devname().
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
zhilong [Fri, 25 Mar 2016 02:22:03 +0000 (10:22 +0800)]
mdadm:Add '--nodes' option in GROW mode
Add a '--nodes' option in GROW mode, because
'Cluster nodes' is set to 4 by default if the nodes
parameter is not specified when switching the bitmap
from none to clustered.
Signed-off-by: Zhilong Liu <zlliu@suse.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Jes Sorensen [Wed, 9 Mar 2016 19:37:46 +0000 (14:37 -0500)]
mdadm: Cleanup conditionals
Be more consistent in the formatting of conditionals. Don't split on
multiple lines if not needed, don't overflow the 80 character line
length, put the condition operator at the end of the line of
multi-line conditionals, etc.
This should be purely cosmetic.... famous last words!
Yi Zhang [Fri, 11 Mar 2016 09:26:40 +0000 (17:26 +0800)]
Grow: analyse_change add notification about only 2-device can be convert from RAID1 to RAID5
Report "Can only convert a 2-device array to RAID5" instead of
"Impossibly level change request for RAID1" when converting from
RAID1 to RAID5 and the number of disks is not two, as is already
done for RAID4/5->RAID1.
Signed-off-by: Yi Zhang <yizhan@redhat.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Pawel Baldysiak [Fri, 11 Mar 2016 15:47:16 +0000 (16:47 +0100)]
super-intel: Simplify for() loop in ahci_enumerate_ports
This patch simplifies the for() loop used in
ahci_enumerate_ports() to make it more readable.
A similar change was made in b913501
({platform,super}-intel: Fix two resource leaks).
Pawel Baldysiak [Fri, 11 Mar 2016 15:47:15 +0000 (16:47 +0100)]
super-intel: Make print_vmd_attached_devs() return int again
This patch reverts a0abe1e
(super-intel: Make print_found_intel_controllers() return void)
and makes this function return int again.
It also adds interpretation of the return value.
Pawel Baldysiak [Fri, 11 Mar 2016 12:49:07 +0000 (13:49 +0100)]
Grow: close fd earlier to avoid "cannot get excl access" when stopping
If this file descriptor is not closed here, it remains open during
the reshape process, and stopping the array will fail with
"cannot get exclusive access to container".
Once this file descriptor is no longer needed, it can be closed.
Hannes Reinecke [Wed, 9 Mar 2016 05:20:18 +0000 (13:20 +0800)]
Fix regression during add devices
Commit d180d2aa2a17 ("Manage: fix test for 'is array failed'.")
introduced a regression which prevented re-adding
drives to a failed array.
Fixes: d180d2aa2a17 ("Manage: fix test for 'is array failed'.") Signed-off-by: Hannes Reinecke <hare@suse.de> Cc: Coly Li <colyli@suse.de> Cc: Neil Brown <neilb@suse.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Jes Sorensen [Mon, 7 Mar 2016 19:50:06 +0000 (14:50 -0500)]
super1: Fix potential buffer overflows when copying cluster_name
cmap_get_string() used to retrieve cluster_name does not restrict its
size. To prevent buffer overflows use the size of the destination
buffer, not strlen() of the source, and null terminate the copied
string.
Fixes: 0aa2f15b ("mdadm: add the ability to change cluster name") Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
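The copy pattern this commit describes (bound the copy by the destination size, then terminate explicitly) can be sketched as follows. copy_bounded() is a hypothetical helper for illustration, not a function from mdadm.

```c
#include <string.h>

/*
 * Hypothetical helper: copy src into dst, whose capacity is dst_size bytes.
 * The bound comes from the destination, never from strlen(src), and the
 * result is always NUL-terminated (truncating src if necessary).
 */
void copy_bounded(char *dst, size_t dst_size, const char *src)
{
	if (dst_size == 0)
		return;
	strncpy(dst, src, dst_size - 1);
	dst[dst_size - 1] = '\0';
}
```

A caller would pass sizeof() of the destination array, for example copy_bounded(name, sizeof(name), source), so an oversized source string is truncated rather than written past the end of the buffer.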
Jes Sorensen [Fri, 4 Mar 2016 21:49:38 +0000 (16:49 -0500)]
Grow: Grow_addbitmap(): Add check to quiet down static code checkers
Grow_addbitmap() is only ever called with s->bitmap_file != NULL, but
not all static code checkers catch this. This adds a check to quiet
down the false positive warnings.
Jes Sorensen [Fri, 4 Mar 2016 21:00:21 +0000 (16:00 -0500)]
load_sys(): Add a buffer size argument
This adds a buffer size argument to load_sys(), rather than relying on
a hard coded buffer size. The old behavior was safe because we knew
the kernel would never return strings overrunning the buffers, however
it was ugly, and would cause code checking tools to spit out warnings.
This caused a Coverity warning over the read into
sra->sysfs_array_state which is only 20 bytes.
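A size-aware reader along the lines described above might look like the sketch below. read_attr() is a hypothetical stand-in written for illustration; it is not mdadm's load_sys() and its exact return conventions are an assumption.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical size-aware reader: read the file at path into buf, which
 * holds len bytes. Returns 0 on success and -1 on any error, including
 * the case where the content would not fit together with its terminator.
 */
int read_attr(const char *path, char *buf, size_t len)
{
	ssize_t n;
	int fd;

	if (len == 0)
		return -1;
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	n = read(fd, buf, len);
	close(fd);
	if (n < 0 || (size_t)n >= len)
		return -1;
	buf[n] = '\0';
	/* sysfs values usually end in a newline; drop it if present */
	if (n > 0 && buf[n - 1] == '\n')
		buf[n - 1] = '\0';
	return 0;
}
```

Passing sizeof() of the destination at each call site lets a 20-byte buffer and a larger one share the same function without a hard-coded limit.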
Guoqing Jiang [Mon, 7 Mar 2016 09:31:02 +0000 (17:31 +0800)]
Fix wrong bitmap output for cluster raid
For cluster raid, we need to display bitmap-related
contents from different bitmaps, which are selected by node
number. So bitmap_file_open and locate_bitmap are changed
slightly for that purpose.
Reported-by: Jes Sorensen <Jes.Sorensen@redhat.com> Fixes: b98043a2f8 ("Show all bitmaps while examining bitmap") Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
NeilBrown [Thu, 18 Feb 2016 04:53:32 +0000 (15:53 +1100)]
super-intel: ensure suspended region is removed when reshape completes.
A recent commit removed a call to abort_reshape() when IMSM reshape
completed. An unanticipated result of this is that the suspended
region is not cleared as it should be.
So after a reshape, a region of the array will cause all IO to block.
Re-instate the required updates to suspend_{lo,hi} copied from
abort_reshape().
This is caught (sometimes) by the test suite.
Also fix a couple of typos found while exploring the code.
Reported-by: Ken Moffat <zarniwhoop@ntlworld.com> Cc: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Fixes: 2139b03c2080 ("imsm: don't call abort_reshape() in imsm_manage_reshape()") Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Jes Sorensen [Wed, 10 Feb 2016 19:15:38 +0000 (14:15 -0500)]
mdadm.h: rename bswap macros to avoid clash with uClibc definitions
uClibc exposes its own version of bswap_<X> macros. Rather than
pulling in random macros by chance, rename the mdadm ones to make sure
we know what we are getting.
Reported-by: "Maxin B. John" <maxin.john@gmail.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Maxin B. John [Mon, 8 Feb 2016 09:59:29 +0000 (11:59 +0200)]
Makefile: make the CC definition conditional
By hardcoding CC's definition in the Makefile, all the external gcc
parameters set by tune settings are lost. This causes compile failure
with the x32 toolchain.
Signed-off-by: Maxin B. John <maxin.john@intel.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Xiao Ni [Sat, 6 Feb 2016 01:18:41 +0000 (09:18 +0800)]
Fix some type comparison problems
As commit 26714713cd2bad9e0bf7f4669f6cc4659ceaab6c said, 32 bit signed
timestamps will overflow in the year 2038. That commit already changed
utime and ctime in struct mdu_array_info_s from int to unsigned
int. So we need to change the values that are compared with them to
unsigned int too.
Signed-off-by: Xiao Ni <xni@redhat.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
NeilBrown [Thu, 28 Jan 2016 01:57:08 +0000 (12:57 +1100)]
super1: allow reshape that hasn't really started to be reverted.
A simple revert doesn't work here because the reshape_position is
in the critical section.
The best approach is to let the reshape progress a bit and then
go backwards.
If that isn't possible, assembling with --update=revert-reshape and
--invalid-backup should work.
Reported-by-tested-by: George Rapp <george.rapp@gmail.com> Signed-off-by: NeilBrown <neilb@suse.com>
NeilBrown [Thu, 28 Jan 2016 00:45:53 +0000 (11:45 +1100)]
systemd/mdadm-last-resort: add Conflicts to .service file.
It seems that having the Conflicts in the .timer file is not sufficient.
Sometimes it works, but if the timer gets requested after the conflicting
block device appears (or was it "before" ...) the timer is not aborted.
Having the Conflicts in both files seems to work reliably.
Khem Raj [Thu, 14 Jan 2016 06:32:39 +0000 (22:32 -0800)]
Add casts for the addr arg of connect and bind
glibc allows the addr arg to connect and bind to be any of a number
of 'sockaddr_*' types, but musl requires 'const struct sockaddr *'
which is in line with open group specs. So add casts to allow
compilation with musl.
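The cast in question looks like this in practice. The sketch below is illustrative rather than taken from the patch: connect_unix() is a hypothetical helper, and the choice of a UNIX-domain stream socket is an assumption made for the example.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: connect to the UNIX-domain socket at path.
 * Returns the connected fd, or -1 on failure.
 */
int connect_unix(const char *path)
{
	struct sockaddr_un addr;
	int fd;

	if (strlen(path) >= sizeof(addr.sun_path))
		return -1;
	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strcpy(addr.sun_path, path);

	/* Explicit cast to the generic type named in the prototype, so the
	 * call compiles with both glibc and musl. */
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

The same cast applies to bind(), which takes its address argument with the same 'const struct sockaddr *' type.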
Khem Raj [Thu, 14 Jan 2016 06:32:38 +0000 (22:32 -0800)]
Define _POSIX_C_SOURCE if undefined
config.c uses _POSIX_C_SOURCE which is defined in features.h when
glibc/uclibc is used, but isn't defined when musl is used.
So provide a reasonable default.
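A guarded default of this kind is typically written as below. The value 200809L is an assumption chosen for illustration (it selects POSIX.1-2008), and posix_level() is a hypothetical helper added only so the result can be inspected.

```c
/*
 * Provide a default only when the C library headers have not already
 * defined the macro. The value is an assumption for this sketch.
 */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <stdio.h>

/* Hypothetical helper: report the feature level the file was built with. */
long posix_level(void)
{
	return _POSIX_C_SOURCE;
}
```

The #ifndef guard keeps whatever glibc or uClibc already set and only fills the gap on libraries such as musl that leave the macro undefined.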
imsm: don't update migration record when reshape is interrupted
Abort imsm_manage_reshape() without updating the migration record if any
error occurs when checking progress. If reshape is interrupted and the
migration record is then updated, the checkpoint will be wrong and will
cause reshape to fail when the array is restarted.
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: NeilBrown <neilb@suse.com>
imsm: use timeout when waiting for reshape progress
Waiting for reshape progress is done by using select() on sync_completed
to block until an exception condition is signalled on the
filedescriptor. This happens when the attribute's value is updated by
the kernel, but if the array is stopped when mdadm is blocked on
select() this will never happen, because this attribute is then removed
and apparently the kernel doesn't do sysfs_notify() when removing a
sysfs attribute. So set a 3 second timeout for the sysfs_wait() call.
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: NeilBrown <neilb@suse.com>
Pawel Baldysiak [Tue, 5 Jan 2016 16:03:04 +0000 (17:03 +0100)]
IMSM: Add support for VMD
The Intel Volume Management Device (VMD) is an integrated
endpoint on the platform's PCIe root complex that acts
as a host bridge to a secondary PCIe domain.
This patch adds proper handling of NVMe devices attached to VMD domain.
Each VMD domain is treated as a separate controller (HBA).
Spanning between domains is forbidden.
imsm: abort reshape if sync_action is not "reshape"
When reshape was interrupted, an incorrect checkpoint would be saved in
the migration record. Change wait_for_reshape_imsm() to return -1 when
sync_action is not "reshape" to abort early in imsm_manage_reshape()
without writing the migration record.
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: NeilBrown <neilb@suse.com>
Grow: close file descriptor earlier to avoid "still in use" when stopping
Close fd2 as soon as it is no longer needed, before calling
Grow_continue(). Otherwise, we won't be able to stop an array with
external metadata during reshape, because mdadm running in background
will be keeping it open.
NeilBrown [Wed, 23 Dec 2015 01:15:32 +0000 (12:15 +1100)]
Detail: fix wrong condition in recent change.
Now that we can print device details with a specific raid_disk but not
disk.number, the condition for "print either disk.number or disk.raid_disk"
must be made more specific.
Reported-by: Coly Li <colyli@suse.com> Signed-off-by: NeilBrown <neilb@suse.com>
Xiao Ni [Tue, 22 Dec 2015 03:09:34 +0000 (11:09 +0800)]
Check and remove bitmap first when reshape to raid0
When a RAID device with a bitmap is reshaped to raid0, the reshape
starts, but it then fails and loses some components. So the bitmap
should be removed first.
Signed-off-by: Xiao Ni <xni@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com>
Song Liu [Mon, 21 Dec 2015 19:23:41 +0000 (11:23 -0800)]
move journal to end of --detail list
As we give journal device raid_disk of 0, the output of --detail is:
    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       5       8       24        0      journal   /dev/sdb8
       1       8       18        1      active sync   /dev/sdb2
       2       8       19        2      active sync   /dev/sdb3
       3       8       21        3      active sync   /dev/sdb5
       4       8       23        -      spare   /dev/sdb7
This patch makes it back to:
    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       18        1      active sync   /dev/sdb2
       2       8       19        2      active sync   /dev/sdb3
       3       8       21        3      active sync   /dev/sdb5
NeilBrown [Fri, 18 Dec 2015 02:51:54 +0000 (13:51 +1100)]
Detail: don't assume a particular 'disk' number of missing devices.
When a particular raid-disk is missing, we don't know which disk number
it should have, and reporting a number could result in duplicate
numbers (with v1.x metadata - never with the old 0.90).
So set the default to -1 and recognise that when printing.
NeilBrown [Fri, 18 Dec 2015 02:49:30 +0000 (13:49 +1100)]
Detail: report correct raid-disk for removed drives.
Back in
Commit: 8057db46a15d ("Detail: fix handling of 'disks' array.")
when we doubled the size of the 'disks' array to handle primary and
replacement, we should have halved the setting of the default raid_disk
number.
Reported-by: Coly Li <colyli@suse.de> Signed-off-by: NeilBrown <neilb@suse.com>
Guoqing Jiang [Wed, 16 Dec 2015 17:54:26 +0000 (01:54 +0800)]
mdadm: improve the safeguard for change cluster raid's sb
This commit does the following jobs:
1. rename is_clustered to dlm_funs_ready since it matches the
function better.
2. st->cluster_name can't be used to identify whether the raid is
clustered or not; we should check the bitmap's version to
perform the identification.
3. the cluster_get_dlmlock/cluster_release_dlmlock funcs both
need only the lockid as a parameter, since the cluster
name can be obtained with get_cluster_name().
Guoqing Jiang [Wed, 16 Dec 2015 17:54:25 +0000 (01:54 +0800)]
mdadm: do not try to hold dlm lock in free_super1
free_super1 doesn't actually change the sb, it
just frees the address space of the sb. free_super1 is also
called in lots of places within mdadm, so remove the dlm
lock code, since the func doesn't need the protection,
which also reduces latency.
Guoqing Jiang [Tue, 1 Dec 2015 16:30:12 +0000 (00:30 +0800)]
mdadm: do not display bitmap info if it is cleared
"mdadm -X DISK" is used to report information about a bitmap
file. It is better not to display all the related information if
the bitmap was cleared with "--bitmap=none" in grow mode.
To do that, locate_bitmap is changed slightly to have a
return value based on MD_FEATURE_BITMAP_OFFSET.
Guoqing Jiang [Tue, 1 Dec 2015 16:30:10 +0000 (00:30 +0800)]
mdadm: output info more precisely when change bitmap to none
When changing the bitmap to none, the messages can be more accurate
if they are based on the existing bitmap type.
And s->bitmap_file is passed from the command line ("--bitmap=TYPE"), so
remove s->bitmap_file from the error message, since it would mean that
changing the bitmap to that type failed rather than that the type is
already present.
Deepa Dinamani [Tue, 8 Dec 2015 23:10:21 +0000 (15:10 -0800)]
mdadm: Change timestamps to unsigned data type.
32 bit signed timestamps will overflow in the year 2038.
Change the user interface mdu_array_info_s structure timestamps:
ctime and utime values used in ioctls GET_ARRAY_INFO and
SET_ARRAY_INFO to unsigned int. This will extend the field to last
until the year 2106.
Add time_after/time_before and supporting typecheck from
the kernel to take care of unsigned time wraparound.
The long term plan is to get rid of ctime and utime values in
this structure as this information can be read from the on-disk
meta data directly.
v0.90 on disk meta data uses u32 for maintaining time stamps.
So this will also last until year 2106.
The assumption is that the usage of v0.90 will be deprecated by
the year 2106.
Timestamp fields in the on disk meta data for v1.0 version already
use 64 bit data types.
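The wraparound-safe comparison mentioned in this commit can be sketched as below. These macros follow the general idea of the kernel's time_after()/time_before() but are a simplified assumption for illustration: they are fixed to 32-bit values and omit the kernel's typecheck.

```c
#include <stdint.h>

/*
 * Sketch of wraparound-safe comparisons for 32-bit unsigned timestamps.
 * The unsigned difference is reinterpreted as signed, so "a is after b"
 * stays true across a wrap as long as the two values are less than
 * 2^31 seconds apart.
 */
#define time_after(a, b)  ((int32_t)((uint32_t)(b) - (uint32_t)(a)) < 0)
#define time_before(a, b) time_after(b, a)
```

For example, time_after(5u, 0xFFFFFFF0u) is true: a timestamp of 5 taken shortly after the counter wrapped is treated as later than one taken just before the wrap, which a plain a > b comparison would get wrong.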