allow new_offset to be set, but don't then allow a RAID5
to be reshaped to change that offset.
Due to selective backports, this includes the SLES11-SP3 kernel.
It is quite easy to handle this case in mdadm, so we do.
Specifically: if the reshape with data-offset fails with EINVAL,
abort the data-offset change and try the "old" way.
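The fallback described above can be sketched roughly as follows. This is an illustration only: the callback type and the three stand-in functions are hypothetical and are not mdadm code; they just model "try with data-offset, retry the old way on EINVAL".

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical callback type: returns 0 on success, -1 with errno set. */
typedef int (*reshape_fn)(void *ctx);

/*
 * Try the data-offset based reshape first. If it is rejected with
 * EINVAL, fall back to the older method. Any other error is passed
 * back to the caller unchanged.
 */
int reshape_with_fallback(reshape_fn with_offset, reshape_fn old_way, void *ctx)
{
	assert(with_offset && old_way);
	if (with_offset(ctx) == 0)
		return 0;
	if (errno != EINVAL)
		return -1;
	return old_way(ctx);
}

/* Stand-ins for demonstration only (assumptions, not real mdadm functions). */
static int offset_rejected(void *ctx) { (void)ctx; errno = EINVAL; return -1; }
static int offset_io_error(void *ctx) { (void)ctx; errno = EIO; return -1; }
static int always_ok(void *ctx) { *(int *)ctx += 1; return 0; }
```

Only EINVAL triggers the retry, so unrelated failures (for example an I/O error) are not masked by the fallback.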
Pawel Baldysiak [Fri, 27 Feb 2015 14:47:54 +0000 (15:47 +0100)]
IncRemove: Set "auto-read" only after successful excl open.
"mdadm -If" - triggered from udev rules when disk is removed from OS -
tries to set array in auto-read-only mode. This can interrupt rebuild
process which is started automatically, e.g. if array is mounted and
spare disk is available (I/O error is detected faster than removing
failed disk by mdadm).
This patch prevents "mdadm -If" from setting array into "auto-read-only",
by requiring exclusive open to succeed.
NeilBrown [Thu, 12 Feb 2015 02:46:53 +0000 (13:46 +1100)]
Don't break long strings onto multiple lines.
It is best to keep strings all together so that they
are easier to search for in the source code.
If a string is so long that it looks ugly on one line,
then maybe it should be broken into multiple lines
for display too.
Only strings which contain a newline can be broken
into multiple lines:
NeilBrown [Thu, 12 Feb 2015 02:21:17 +0000 (13:21 +1100)]
Consistently print program Name and __func__ in debug messages.
Make dprintf() print program name and __func__, so that
this messaging is consistent.
Also remove all __func__ messages from pr_err(). We shouldn't
leak that internal data in error messages.
If we really want the function name there, a new pr_XXX might
be wanted.
Pawel Baldysiak [Wed, 11 Feb 2015 21:25:03 +0000 (22:25 +0100)]
Change way of printing name of a process
Sometimes mdadm prints messages with wrong name "mdmon",
and vice versa.
This patch solves this problem by changing method of determining
process name.
Now "Name" is set in a const at the start of the program;
previously it was hardcoded as a #define.
Monitor: fix for regression with container devices
This patch fixes 2 problems introduced by commit 9a518d8: not closing a
file descriptor and ignoring container devices. Array state is always
"inactive" for containers, so we make sure that the device is not a
container by reading also the "level" sysfs entry.
NeilBrown [Tue, 3 Feb 2015 22:06:47 +0000 (09:06 +1100)]
mdcheck: be careful when sourcing the output of "mdadm --detail --export"
The output of "mdadm --detail --export" isn't quoted properly so
fields that contain spaces can be a problem.
We only want the MD_UUID field, and it has a very well defined
format with no spaces.
So use 'grep' to limit the output to just that.
Pawel Baldysiak [Tue, 20 Jan 2015 12:52:25 +0000 (13:52 +0100)]
IMSM: Clear migration record on disks more often
Migration record is not always cleared after successful migration. This can
block another reshape from being started. Migration will not be continued via
systemd service due to an error in verifying the reshape position. This patch
clears the migration record when a disk is added to a container, and after a
successful migration.
Pawel Baldysiak [Thu, 27 Nov 2014 11:35:24 +0000 (12:35 +0100)]
Grow: Fix wrong 'goto' in set_new_data_offset
Commit a821c95f114724b38df1ea99b2858178e0ed28ce
besides introducing additional message, also changed
direct return to "goto" instruction.
'goto release' will cause routine to return with '-1',
when previously '1' was returned.
Described behaviour breaks e.g. IMSM reshape process.
This patch fixes this issue by changing 'goto' to proper one -
the one that returns '1'.
NeilBrown [Tue, 28 Oct 2014 21:48:02 +0000 (08:48 +1100)]
Monitor: don't open md array that doesn't exist.
Opening a block-special-device for an array that doesn't
exist causes that array to be instantiated (as an empty array).
Races at array shutdown can cause the array to spontaneously
re-appear if some daemon notices a 'change' event and goes
to investigate.
Teach "mdadm --monitor" to avoid this race by checking the
"array_state" before opening the device.
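A minimal sketch of the check described above, assuming the array's sysfs attribute lives at <sysfs_root>/<devnm>/md/array_state (on a running system the root would be /sys/block). The function names and the sysfs_root parameter are illustrative assumptions and are not taken from mdadm.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Return 1 if <sysfs_root>/<devnm>/md/array_state can be opened, else 0.
 * Opening a sysfs attribute does not instantiate an array, unlike
 * opening the block device node.
 */
int md_array_exists(const char *sysfs_root, const char *devnm)
{
	char path[256];
	int n, fd;

	assert(sysfs_root && devnm);
	n = snprintf(path, sizeof(path), "%s/%s/md/array_state",
		     sysfs_root, devnm);
	if (n < 0 || (size_t)n >= sizeof(path))
		return 0;
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return 0;
	close(fd);
	return 1;
}

/* Open the device node only when the array already exists; else -1. */
int open_md_if_exists(const char *sysfs_root, const char *devnm,
		      const char *devpath)
{
	if (!md_array_exists(sysfs_root, devnm))
		return -1;
	return open(devpath, O_RDONLY);
}
```

There is still a small window between the check and the open, so this narrows the race rather than removing it entirely.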
Reported-by: Francis Moreau <francis.moro@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Print platform details per OROM, not per controller, differentiate
RST(e) platforms from legacy IMSM, print NVMe device paths, adjust port
printing to newer sysfs path.
imsm: support for second and combined AHCI controllers in UEFI mode
Grantly platform introduces a second AHCI controller (sSATA) and two new
UEFI variables for the RSTe firmware. This patch adds support for those
variables in order to correctly determine IMSM platform capabilities in
UEFI mode.
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
The IMSM platform code was based on an assumption that the OROM or UEFI
capability structure (represented by struct imsm_orom) always belongs to
only one HBA. This assumption is no longer valid, because of newer
platforms with dual AHCI HBAs. Each HBA can have a separate OROM, but
some versions have a combined OROM for both HBAs.
This patch implements this HBA-OROM relationship in struct orom_entry,
which matches an OROM with a list of HBA PCI ids. All the detected
orom_entries are stored and retrieved using a global array and the
functions add_orom(), add_orom_device_id() and get_orom_by_device_id().
This replaces the arrays: imsm_orom, populated_orom, imsm_efi,
populated_efi.
The scan() function is extended to find all HBAs for an OROM. The list
of their device ids is retrieved from the PCI Expansion ROM Data
Structure, hence the additional field devListOffset in struct
pciExpDataStructFormat.
In UEFI mode we can't read the PCI Expansion ROM Data Structure and the
imsm_orom structures are stored in UEFI variables. They do not provide a
similar device id list, so we also check the HBA PCI class to make sure
that the HBA has RAID mode enabled.
In super-intel.c there are changes which allow spanning of IMSM
containers over HBAs of the same type, but only if the HBAs share the
same OROM. This is done by comparing imsm_orom pointers, which (outside
of platform-intel.c) always point to the global array containing all the
detected oroms. Additional warnings are added to
validate_container_imsm() to warn about potentially dangerous operations
in all the possible cases, e.g. when an array is assembled using disks
attached to HBAs with separate OROMs.
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 5 Nov 2014 05:21:42 +0000 (16:21 +1100)]
Incremental: don't be distracted by partition table when calling try_spare.
Currently a partition table on a device makes "mdadm -I" think
the array has a particular metadata type and so will only
add it to an array of that (partition table) type .. which doesn't
make any sense.
So tell guess_super to only look for 'array' metadata.
NeilBrown [Mon, 3 Nov 2014 22:35:20 +0000 (09:35 +1100)]
Detail: fix handling of 'disks' array.
Since the introduction of replacement devices, we reserve
two places in the "disks" array for each raid disk.
That means we should allocate twice "max_disk" as the array
could have that many raid_disks (though that would limit the
number of replacements).
A couple of other places need to use "max_disks*2" instead of
"max_disks" to co-ordinate with this.
Reported-by: Or Sagi <ors@reduxio.com>
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 3 Nov 2014 01:49:05 +0000 (12:49 +1100)]
Rebuildmap: strip local host name from device name.
When /run/mdadm/map is being rebuilt, e.g. by "mdadm -Ir",
if the device doesn't exist in /dev, we have to choose
a name.
Currently we don't strip the hostname which is wrong if
it is the local host.
Reported-by: Stephen Kent <smkent@smkent.net>
Signed-off-by: NeilBrown <neilb@suse.de>
Justin Maggard [Sat, 25 Oct 2014 00:55:02 +0000 (17:55 -0700)]
Grow: fix resize of array component size to > 32bits
If the requested --size to --grow an array to is larger
than 32 bits, then mdadm may make the wrong choice and
use ioctl instead of setting component_size via sysfs,
and the change is ignored.
Instead of using casts to check for a 32-bit overflow,
just check for set bits outside of INT32_MAX.
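The bit test mentioned above could look roughly like this. The function names are assumptions made for the illustration; the actual mdadm code may be written differently.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if 'size' has any bit set outside INT32_MAX, else 0. */
int exceeds_int32(unsigned long long size)
{
	return (size & ~(unsigned long long)INT32_MAX) != 0;
}

/* Boundary cases, checked with assert(). */
void exceeds_int32_examples(void)
{
	assert(!exceeds_int32(INT32_MAX));
	assert(exceeds_int32((unsigned long long)INT32_MAX + 1));
	assert(exceeds_int32(1ULL << 32));
}
```

Masking avoids relying on how an out-of-range value converts to a narrower signed type, which is what makes cast-based checks fragile.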
mdmon: always read sysfs files once after opening.
seq_file in the kernel will allocate a read buffer on
first read. We want this to happen under the managemon thread,
not the 'monitor' thread, as the latter is not allowed to allocate
memory (might deadlock).
So do a first read after opening.
Samuli Suominen [Thu, 21 Aug 2014 03:56:48 +0000 (06:56 +0300)]
Fix parallel make problem.
When make is called with, for example,
"make -j9 install install-systemd"
i.e. both install and install-systemd targets on the same
line and with a high -j value,
then the same install.tmp file is used, and udev rules
end up in systemd service files, or the other way around.
For more information, see:
http://www.spinics.net/lists/raid/msg46782.html
http://bugs.gentoo.org/show_bug.cgi?id=517218
NeilBrown [Fri, 15 Aug 2014 05:45:54 +0000 (15:45 +1000)]
super1: don't allow adding a bitmap if there is no space.
If the data is too close to the superblock there may be
no space for a bitmap.
If that happens, fail the adding of the bitmap rather than
corrupt data.
Reported-by: Lars Wijtemans <rhelbugzilla@lars.wijtemans.nl>
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=922944
NeilBrown [Mon, 11 Aug 2014 00:30:42 +0000 (10:30 +1000)]
Manage: fix removal of non-existent devices.
"--remove detached" and others stopped working a while
back when I refactored some code.
For 'remove' and 'fail', the device may not exist so
if it is "MM:mm", (e.g. added by "detached"), just parse
out the numbers.
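As a rough sketch of parsing "MM:mm" without consulting the filesystem, something like the following would do. The function name is an assumption made for the illustration; makedev() comes from <sys/sysmacros.h> on Linux.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/*
 * Parse a "major:minor" string into a dev_t without calling stat().
 * Returns 0 on success, -1 if the string is not of that form.
 */
int parse_major_minor(const char *s, dev_t *rdev)
{
	unsigned int mj, mn;
	char extra;

	assert(s && rdev);
	/* The trailing %c rejects input with anything after the minor. */
	if (sscanf(s, "%u:%u%c", &mj, &mn, &extra) != 2)
		return -1;
	*rdev = makedev(mj, mn);
	return 0;
}
```

Because no device node is opened or stat()ed, this works for devices that have already disappeared from /dev.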
Reported-by: Killian De Volder <killian.de.volder@megasoft.be>
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 11 Aug 2014 00:22:24 +0000 (10:22 +1000)]
Manage: simplify `rdev` handling in Manage_subdevs.
We only use 'struct stat stb' to get the 'rdev', and sometimes
we don't even use 'stat'.
So make 'rdev' a stand-alone variable, and only declare 'stb'
when we actually need it.
NeilBrown [Wed, 6 Aug 2014 05:56:12 +0000 (15:56 +1000)]
Detail: Avoid dereferencing some NULL pointers.
dm devices which only have a single underlying md device
will respond to md ioctls as though they were that md device.
This can confuse mdadm and lead it to violating its segments.
So add tests for NULL where appropriate. You might not get exactly
the right answer when you "mdadm -D" a dm device, but at least it won't
crash now.
Guy Menanteau [Mon, 4 Aug 2014 14:53:03 +0000 (16:53 +0200)]
DDF: cast print arguments in super-ddf.c
mdadm fails to build on ppc64 and ppc64le architectures.
===
super-ddf.c: In function '_set_config_size':
super-ddf.c:2849:4: error: format '%llx' expects argument of type 'long long unsigned int', but argument 6 has type '__u64' [-Werror=format=]
pr_err("%s: %x:%x: workspace size 0x%llx too big, ignoring\n",
^
super-ddf.c:2855:2: error: format '%llx' expects argument of type 'long long unsigned int', but argument 6 has type '__u64' [-Werror=format=]
dprintf("%s: %x:%x config_size %llx, DDF structure is %llx blocks\n",
^
cc1: all warnings being treated as errors
<builtin>: recipe for target 'super-ddf.o' failed
===
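The usual way to silence this class of error is an explicit cast at the call site, sketched below. uint64_t stands in for __u64 here, and format_hex64() is a hypothetical helper, not a function from super-ddf.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit value as hex. A 64-bit typedef may be 'unsigned long'
 * on some architectures, so the cast makes the argument match %llx
 * everywhere.
 */
int format_hex64(char *buf, size_t len, uint64_t value)
{
	assert(buf && len > 0);
	return snprintf(buf, len, "0x%llx", (unsigned long long)value);
}
```

The cast never loses information, since unsigned long long is at least 64 bits wide.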
Grow: fix problem preventing resize of array to 32bit size.
If the requested --size to --grow an array to is 32bits
(i.e. msb in bit 32) then mdadm makes the wrong choice and
uses ioctl instead of setting component_size via sysfs
and the change is ignored.
This is fixed by using correct casts.
Reported-and-tested-by: Killian De Volder <killian.de.volder@megasoft.be>
Signed-off-by: NeilBrown <neilb@suse.de>
Grow process did not check if reshape is already started
when deciding about restarting.
Sync_action should be checked in this case, and if
reshape is running - restart flag should not be set.
Otherwise, Grow process will fail to write data to
sysfs, and reshape will not be continued.
DDF: validate metadata_update size before using it.
process_update already checks update->len for everything except
the 'magic', but prepare_update doesn't check it at all.
So add tests to prepare_update that we don't exceed the buffer.
This will consequently protect process_update from looking
for a 'magic' which isn't there.
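The kind of length check described above might look like the following. struct update_buf is a simplified, hypothetical stand-in for the real metadata update structure, and read_update_magic() is not an mdadm function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a metadata update buffer. */
struct update_buf {
	const unsigned char *buf;
	size_t len;
};

/*
 * Copy out the leading 32-bit magic only if the buffer is large enough
 * to contain one. Returns 0 on success, -1 if the update is too short.
 */
int read_update_magic(const struct update_buf *u, uint32_t *magic)
{
	assert(u && magic);
	if (u->buf == NULL || u->len < sizeof(*magic))
		return -1;
	memcpy(magic, u->buf, sizeof(*magic));
	return 0;
}
```

Rejecting short updates up front means later code can dispatch on the magic without repeating the bounds check.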
Reported-by: Vincent Berg <vberg@ioactive.com>
Signed-off-by: NeilBrown <neilb@suse.de>
If 'prepare_update' fails for some reason there is little
point continuing on to 'process_update'.
For now only malloc failures are caught, but other failures
will be considered in future.
Pawel Baldysiak [Mon, 30 Jun 2014 12:22:22 +0000 (12:22 +0000)]
IMSM: Add warning message when assembling a spanned container
Due to several changes in the code, it is possible in some cases
to assemble a container whose disks span different controllers.
After an IMSM container is assembled, check the HBA of each
disk, and print a warning if a mismatch is detected.
mdmon: ensure Unix domain socket is created with safe permissions.
In the unlikely case that mdmon is started with an overly
permissive umask, we don't want to risk giving away world access.
All other "mkdir" and "O_CREAT" calls in mdmon and mdadm set
a suitably restrictive permission mask. 'bind' doesn't take an
explicit mask so it needs an implicit one.
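One way to give bind() an implicit mask is to tighten the umask around the call, as in this sketch. The function name and the choice of 077 are assumptions for illustration and may not match what mdmon actually does.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Create a Unix domain socket bound at 'path', accessible to the owner
 * only. bind() has no mode argument, so the mode comes from the umask,
 * which is tightened for the call and then restored.
 * Returns the socket fd, or -1 on failure.
 */
int bind_private_socket(const char *path)
{
	struct sockaddr_un addr;
	mode_t old_mask;
	int fd, rv;

	assert(path);
	if (strlen(path) >= sizeof(addr.sun_path))
		return -1;
	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strcpy(addr.sun_path, path);	/* length checked above */

	old_mask = umask(077);
	rv = bind(fd, (struct sockaddr *)&addr, sizeof(addr));
	umask(old_mask);

	if (rv < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

Note that the umask is process-wide, so in a multi-threaded program the change should happen before other threads create files.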
Reported-by: Vincent Berg <vberg@ioactive.com>
Signed-off-by: NeilBrown <neilb@suse.de>
As strncpy doesn't guarantee to nul-terminate, some static
analysers get upset that it is followed by a 'strncat'.
So just use a 'strcpy' - strlen(disk_by_path) is constant
and definitely less than PATH_MAX.
Pawel Baldysiak [Wed, 11 Jun 2014 15:18:44 +0000 (15:18 +0000)]
Grow: fix removal of line in wrong case
Commit 18d9bcfa33939cee345d4d7735bc6081bcc409c8
removed wrong line (in case RAID0->RAID4).
This patch corrects this mistake
(line should be removed in case RAID4->RAID4).
NeilBrown [Thu, 5 Jun 2014 05:58:31 +0000 (15:58 +1000)]
Incremental: remove old devices when assembling in container.
When assembling a native array we just give all devices to the kernel
and leave it to discard the 'old' ones (based on sequence/event
number).
For external/container arrays, mdadm needs to do that.
So in assemble_container_content, get list of current devices in
array and discard any that aren't in the 'content' given.
They must have been rejected by metadata manager.
If we cannot discard old devices the array must already be active, so
just leave it alone, but with a message.
Baldysiak, Pawel [Fri, 30 May 2014 14:40:11 +0000 (14:40 +0000)]
Grow: Do not fork via systemd if freeze_reshape is set
Mdadm should not run 'grow-continue' unit file for container if
'--freeze-reshape' argument is passed. Otherwise it will be ignored,
and reshape will start anyway.
Baldysiak, Pawel [Fri, 30 May 2014 14:38:09 +0000 (14:38 +0000)]
Do not set default 'before.layout' when reshaping from RAID4 to RAID4
Commit fdcad551e9a54c4aa8c4b63160b76e2c539a0441
brings some changes to reshape process.
Setting 'before.layout' when reshaping from RAID4 to another RAID4 is
not really necessary.
If reshape is restarted 'before.layout' will be compared with
'info->array.layout' in reshape_array(). Changes brought by mentioned
commit will cause this comparison to return false, because 'array.layout'
is always set to 'ALGORITHM_PARITY_N' in analyse_change() for RAID4, so
reshape will not be continued after reboot/stop.
This patch reverts unnecessary changes.
NeilBrown [Thu, 22 May 2014 06:00:39 +0000 (16:00 +1000)]
mdcheck: new script to help with regular checks of md arrays.
This script allows arrays to be 'checked' for a limited amount
of time on a regular basis.
For example, running
mdcheck --duration 6hours
early every Sunday morning and
mdcheck --continue 6hours
every other morning will check all arrays every week, but if that takes
more than 6 hours, it won't run into the day, but will be continued
the next morning, and the next ... etc.
NeilBrown [Thu, 22 May 2014 05:22:39 +0000 (15:22 +1000)]
--examine-bitmap: give useful message if no bitmap found on md array.
The bitmap is stored on member devices, not on the array, so
--examine-bitmap should be given the member device.
If --examine-bitmap is given an array, and it doesn't have a bitmap
on it (i.e. it isn't a member of some other array), then that
is probably a usage error, so print a helpful message.
NeilBrown [Wed, 21 May 2014 04:03:48 +0000 (14:03 +1000)]
DDF: remove some pointless code in validate_geometry
I'm not sure what this was supposed to do, but it isn't needed
as creating on a container and on individual devices (in a container)
work fine already.
NeilBrown [Wed, 21 May 2014 03:27:54 +0000 (13:27 +1000)]
DDF: ensure dl->devname is freed when processing a 'delete device' update.
As this code runs in 'monitor' it cannot just free memory,
it must add it to a list for 'manager' to free.
Fortunately update->space_list exists for just this purpose.
dl->devname might be small, so put it in update->space and
put dl in update->space_list.
NeilBrown [Tue, 13 May 2014 02:22:03 +0000 (12:22 +1000)]
tests: handle change to DDF assembly.
When a DDF array is assembled with missing devices, those devices
are now always marked as 'missing' and cannot just re-appear in the array
and be working again.