DDF: validate metadata_update size before using it.
process_update already checks update->len, for all but
the 'magic', prepare_update doesn't at all.
So add tests to prepare_update that we don't exceed the buffer.
This will consequently protect process_update from looking
for a 'magic' which isn't there.
Reported-by: Vincent Berg <vberg@ioactive.com> Signed-off-by: NeilBrown <neilb@suse.de>
If 'prepare_update' fails for some reason there is little
point continuing on to 'process_update'.
For now only malloc failures are caught, but other failures
will be considered in future.
Pawel Baldysiak [Mon, 30 Jun 2014 12:22:22 +0000 (12:22 +0000)]
IMSM: Add warning message when assemble spanned container
Due to several changes in code assemble with disks
spanned between different controllers can be obtained
in some cases. After IMSM container will be assembled, check HBA of
disks, and print proper warning if mismatch is detected.
mdmon: ensure Unix domain socket is created with safe permissions.
In the unlikely case that mdmon is started with an overly
permissive umask, we don't want to risk giving away world acccess.
All other "mkdir" and "O_CREAT" calls in mdmon and mdadm set
a suitably restrictive permission mask. 'bind' don't take an
explicit mask so it needs an implicit one.
Reported-by: Vincent Berg <vberg@ioactive.com> Signed-off-by: NeilBrown <neilb@suse.de>
As strncpy doesn't guarantee to nul-terminate, some static
analysers get upset that it is followed by a 'strncat'.
So just use a 'strcpy' - strlen(disk_by_path) is constant
and definitely less than PATH_MAX.
Pawel Baldysiak [Wed, 11 Jun 2014 15:18:44 +0000 (15:18 +0000)]
Grow: fix removal of line in wrong case
Commit 18d9bcfa33939cee345d4d7735bc6081bcc409c8
removed wrong line (in case RAID0->RAID4).
This patch corrects this mistake
(line should be removed in case RAID4->RAID4).
NeilBrown [Thu, 5 Jun 2014 05:58:31 +0000 (15:58 +1000)]
Incremental: remove old devices when assembling in container.
When assembling a native array we just give all devices to the kernel
and leave it to discard the 'old' ones (based on sequence/event
number).
For external/container arrays, mdadm needs to do that.
So in assemble_container_content, get list of current devices in
array and discard any that aren't in the 'content' given.
They must have been rejected by metadata manager.
If we cannot discard old devices the array must already be active, so
just leave it alone, but with a message.
Baldysiak, Pawel [Fri, 30 May 2014 14:40:11 +0000 (14:40 +0000)]
Grow: Do not fork via systemd if freeze_reshape is set
Mdadm should not run 'grow-continue' unit file for container if
'--freeze-reshape' argument is passed. Otherwise it will be ignored,
and reshape will start anyway.
Baldysiak, Pawel [Fri, 30 May 2014 14:38:09 +0000 (14:38 +0000)]
Do not set default 'before.layout' when reshaping from RAID4 to RAID4
Commit fdcad551e9a54c4aa8c4b63160b76e2c539a0441
brings some changes to reshape process.
Setting 'before.layout' when reshaping from RAID4 to another RAID4 is
not really necessary.
If reshape is restarted 'before.layout' will be compared with
'info->array.layout' in reshape_array(). Changes brought by mentioned
commit will cause this comparation return as false, becouse 'array.layout'
is always set to 'ALGORITHM_PARITY_N' in analyse_change() for RAID4, so
reshape will not be continued after reboot/stop.
This patch reverts unnecessary changes.
NeilBrown [Thu, 22 May 2014 06:00:39 +0000 (16:00 +1000)]
mdcheck: new script to help with regular checks of md arrays.
This script allows arrays to be 'checked' for a limited amount
of time on a regular basis.
For example, running
mdcheck --duration 6hours
early every Sunday morning and
mdcheck --continue 6hours
ever other morning will check all arrays every week, but if that take
more than 6 hours, will won't run into the day, but will be continued
the next morning, and the next ... etc.
NeilBrown [Thu, 22 May 2014 05:22:39 +0000 (15:22 +1000)]
--examine-bitmap: give useful message if no bitmap found on md array.
The bitmap is stored on member devices, not on the array, so
--examine-bitmap should be given the member device.
If --examine-bitmap is given an array, and it doesn't have a bitmap
on it (i.e. it isn't a member of some other array), then that
is probably a usage error, so print a helpful message.
NeilBrown [Wed, 21 May 2014 04:03:48 +0000 (14:03 +1000)]
DDF: remove some pointless code in validate_geometry
I'm not sure what this was supposed to do, but it isn't needed
as creating on a container and on individual devices (in a container)
work fine already.
NeilBrown [Wed, 21 May 2014 03:27:54 +0000 (13:27 +1000)]
DDF: ensure dl->devname is freed when processing a 'delete device' update.
As this code runs in 'monitor' it cannot just free memory,
it must add it to a list for 'manager' to free.
Fortunate update->space_list exists for just this purpose.
dl->devname might be small, so put it in update->space and
put dl in update->space_list.
NeilBrown [Tue, 13 May 2014 02:22:03 +0000 (12:22 +1000)]
tests: handle change to DDF assembly.
When a DDF array is assembled with missing devices, those devices
are now alway marked as 'missing' and cannot just re-appear in the array
and be working again.
We currently copy the anchor to both primary and secondary
blocks.
This assumes that the anchor is uptodate, but it might not be.
We should trust the 'active' block and copy from there.
DDF: update timestamp/seqnum for virtual disks when config changes.
- we weren't updating this timestamp at all
- the 'vd_config' seqnum was updated on every write of the metadata,
which is excessive. Just update it when there is a change.
DDF: avoid ref outside array in getinfo_super_ddf_bvd
As we are range-checking 'cd', there is a chance that it is not
in-range. In that case we should include all array indexes with 'cd'
inside the range-tested branch.
DDF: examine_pds to also list devices that aren't in the metadata.
The phys disks table should list all disks, but if the metadata
is corrupt, it might not even list the disk it was read from.
So check for and report any known disks that aren't listed.
The "used_pdes" value counts the number of physdisk entries that
are in used.
It may not be the last one in use as there may be unused slots in
the middle.
So when were are iterating over phys disks, we need to use max_pdes
and skip unused entries.
NeilBrown [Thu, 15 May 2014 04:23:16 +0000 (14:23 +1000)]
Grow: store a link to current backup file in /run/mdadm or similar.
Subsequent patch will allow the background part of "mdadm --grow" to
be run from systemd. This can require the passing of a backup file
name.
To do this, store that name as a symlink in /run/mdadm (or MAP_DIR)
and look for it when appropriate.
It might be useful to also store the name across reboot, but that
would be a different patch. We would need to use the uuid to identify
it, and store it in stable storage.
Create: don't default to bitmap=internal when it is not supported
For large arrays (component size > 100GB) if write-intent bitmap is not
enabled, then it is set by default to "internal", even if the metadata
format does support internal bitmaps, which causes Create to fail.
This patch adds checking if add_internal_bitmap is set in the
superswitch before setting bitmap_file to "internal".
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
This modifies locking in Create to eliminate a situation where
--incremental can assemble a device between write_init_super() and
add_disk(), which causes Create to fail.
It sporadically occurs e.g. when metadata is written on a device,
causing an udev change event which triggers mdadm --incremental.
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
systemd: various fixes for boot with container-arrays.
1/ Add systemd shutdown script to ensure DDF and IMSM are
clean before we actually shutdown
2/ Get udev to tell systemd to run the mdmon@mdXXX.service
units when a member array appears.
If we boot off a member array (with dracut at least),
the mdmon started in the initramfs will lose track of
/sys etc, so we need to restart it.
systemd will try to forget about it too (but not actually
kill it because we said not to do this).
Having udev tell it to start it will allow a new mdmon to
run which can see /sys, and systemd will know about it.
3/ Always use --offroot and --takeover when starting mdmon with
systemd
--offroot is needed else shutdown will hang.
--takeover is needed incase an mdmon was started earlier
(e.g. in initramfs).
Neither hurt if they aren't actually needed.
DDF: Don't fail compare_super_ddf due to re-configure changes.
It is possible that one device has seem some reconfig but the other
hasn't. In that case they are still the "same" DDF, even though
one might be older. Such age will be detected by 'seq' differences.
If A is new and B is old, then it is import that
mdadm -I B
mdadm -I A
doesn't get confused because A has the same uuid as B, but compare_super fails.
So: if the seq numbers are different, then just accept as two
different superblocks.
If they are the same, then look to copy data from new to old.
DDF: fix possible mdmon crash when updating metadata.
Testing 'c' and then using 'vdc' assumes that the two are in sync,
but sometimes they aren't.
Testing 'vdc' is safer.
This avoids a crash in some cases when failing/removing/added devices
to a DDF.
The stripe locking mechanism must be atomic between
the check and the, potential, autorepair.
For this reason, the autorepair code needs to be just
after the check and both parts (check and autorepair)
must be excuted under stripe lock.
Of course, the manual repair can operate as before.
This patch reorganize the code and provides the single,
atomic, stripe lock.
It should be confirmed that this new locking is not
too demanding.
In case it is, some other solutions will be required
(suggestions wellcome).
Jes Sorensen [Wed, 19 Mar 2014 13:26:02 +0000 (14:26 +0100)]
Work around architectures having statfs.f_type defined as long
Having RAMFS_MAGIC defined as 0x858458f6 causing problems when trying
to compare it directly against statfs.f_type being cast from long to
unsigned long.
This hack is extremly ugly, but it should at least do the right thing
for every situation.
Pawel Baldysiak [Thu, 6 Mar 2014 14:51:44 +0000 (15:51 +0100)]
mdmon@.service: Change type of process start-up to 'forking'.
Mdadm does not wait enough time when mdmon is started by systemd.
It causes various problems with behaviour of a RAID volume with external metadata.
For example: mdmon does not update a value of checkpoint during migration
and second RAID5 volume is read-only after reboot done during
container reshape (both problems occur with IMSM matadata).
If a type of process start-up is changed to 'forking', systemctl will
wait until mdmon (parent) process exits after calling fork.
This way mdmon will always be fully initialized after start_mdmon
and these problems will not occur.
In this case it is recommended to add a path to PIDFile, so that systemd
does not have to guess a PID of the mdmon process.
This patch will remove some legacy code.
It is part of the verbosity "cleanup".
In any case, if information about the P
and Q parity mismatches is required, it
should go inside the code handling page
size blocks, not full stripe size.
NeilBrown [Mon, 20 Jan 2014 22:46:07 +0000 (09:46 +1100)]
systemd/mdmon: set IMSM_NO_PLATFORM=1
As mdmon doesn't inherit environment from mdadm when it is started
by system, it cannot inherit IMSM_NO_PLATFORM.
But if an imsm array as assembled then mdmon really should handle it
whether there is a platform present or not.
So always set this var.
NeilBrown [Mon, 20 Jan 2014 04:31:45 +0000 (15:31 +1100)]
Grow: fix problems with prematurely aborting of reshapes.
1/ when unfreezing, make sure the array is frozen first.
If it isn't we might end up interrupting a reshape.
2/ When the child finishes, don't call abort_reshape() as that
will interrupt the reshape. Just set suspend_* etc
explicitly.
NeilBrown [Mon, 20 Jan 2014 04:27:29 +0000 (15:27 +1100)]
DDF: fix detection of failed devices during assembly.
When we call "getinfo_super", we report the working/failed status
of the particular device, and also (via the 'map') the working/failed
status of every other device that this metadata is aware of.
It is important that the way we calculate "working or failed" is
consistent.
As it is, getinfo_super_ddf() will report a spare as "working", but
every other device will see it as "failed", which leads to failure to
assemble arrays with spares.
For getinfo_super_ddf (i.e. for the container), a device is assumed
"working" unless flagged as DDF_Failed.
For getinfo_super_ddf_bvd (for a member array), a device is assumed
"failed" unless DDF_Online is set, and DDF_Failed is not set.
NeilBrown [Mon, 20 Jan 2014 04:23:31 +0000 (15:23 +1100)]
Assemble: avoid infinite loop when auto-assembling partial container.
When auto-assembling we loop until we get no successes.
If a device is found that look like it is part of an already-existing
container, but we subsequently fail to add that device, then the fact
that the container is running looks like a success. This can result
in infinite looping.
So if a container was already partially assemble, and is still only
partially assembled after we try to add devices, then don't treat that
as success.
Jan Ceuleers [Wed, 11 Dec 2013 07:45:55 +0000 (08:45 +0100)]
Clarify scope of Rebuild events in mdadm manpage
To date, the manpage did not make it clear under which circumstances
Rebuild events are generated, leading to a question on the mailing
list as to whether it is normal for these events to be generated
while checking an array.
So clarify that all operations that act on the entire array are in
scope. The list is given as "e.g.", because it might grow in the
future as other full-array operations are added.
Reported-by: Mark Knecht <markknecht@gmail.com> Signed-off-by: Jan Ceuleers <jan.ceuleers@computer.org> Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Thu, 12 Dec 2013 02:20:32 +0000 (13:20 +1100)]
mdamd-last-resort: add a Conflicts line to stop the timer.
When the md device actually appears we want to stop the timer and not
bother with the mdadm-last-resort@.server. In particular, running
that causes confusing messages and is in general best avoided.
Fortuantely this can simply be achieved with a Conflicts= line
NeilBrown [Wed, 11 Dec 2013 01:29:22 +0000 (12:29 +1100)]
udev rules: try "mdadm -I" on "change" events.
We need to attempt "mdadm -I" on "change" events as well as "add" events,
as the "change" make make a device ready to be part of an array.
This is particularly important for stacked md devices. When the
member devices are "add"ed they don't have any content visible yet.
That doesn't happen until a "change".
NeilBrown [Wed, 11 Dec 2013 01:25:02 +0000 (12:25 +1100)]
udev rules: add some by-pass rules from Fedora
1/ If ANACONDA is running, don't -I assemble any arrays, ANACONDA
needs to be in control
2/ honour "noiswmd" and "nodmraid" kernel command line options.
NeilBrown [Tue, 10 Dec 2013 23:47:54 +0000 (10:47 +1100)]
Add mdmonitor.service systemd unit file.
This systemd unit file runs mdadm in --monitor mode.
It is started by a SYSTEMD_WANTS signal from udev whenever
an md array is started that would benefit from mdadm --monitor.
Commandline arguments can be provided by a script
/usr/lib/systemd/scripts/mdadm_env.sh
which should write an
MDADM_MONITOR_ARGS=....
line to /run/sysconfig/mdadm
A script to extra args from SUSE's /etc/sysconfig/mdadm file
is provided.
If no mdadm_env.sh is provided, then args are "--scan" which
requires "mail" or "program" to be set in /etc/mdadm.conf.
I believe this is suitable for Fedora.