git.ipfire.org Git - thirdparty/mdadm.git/log

mdadm: add the ability to change cluster name

To support change the cluster name, the commit do the followings:

1. extend original write_bitmap function for new scenario.
2. add the scenarion to handle the modification of cluster's name
in write_bitmap1.
3. let the cluster name also show in examine_super1 and detail_super1

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Skip clustered devices in incremental

We want the clustered devices to be started exclusively by a cluster
resource-agent. So, avoid starting using the incremental option.

This also skips a clustered md from starting during boot in inactive mode.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Convert a bitmap=none device to clustered

This adds the ability to convert a regular md without bitmap
(--bitmap=none) to a clustered device (--bitmap=clustered).

To convert a device with --bitmap=internal or --bitmap=external,
you have to convert to --bitmap=none and then re-execute the
command with --bitmap=clustered.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Add a new clustered disk

A clustered disk is added by the traditional --add sequence.
However, other nodes need to acknowledge that they can "see"
the device. This is done by --cluster-confirm:

--cluster-confirm SLOTNUM:/dev/whatever (if disk is found)
or
--cluster-confirm SLOTNUM:missing (if disk is not found)

The node initiating the --add, has the disk state tagged with
MD_DISK_CLUSTER_ADD and the one confirming tag the disk with
MD_DISK_CANDIDATE.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Show all bitmaps while examining bitmap

This adds capability of exmining bitmaps corresponding to all
nodes/slots on the device.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Set home-cluster while creating an array

The home-cluster is stored in the bitmap super block of the
array. The device can be assembled on a cluster with the
cluster name same as the one recorded in the bitmap.

If home-cluster is not specified, this is auto-detected using
dlopen corosync cmap library.

neilb: allow code to compile when corosync-devel is not installed.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Add nodes option while creating md

Specifies the maximum number of nodes in the cluster that may use
this device simultaneously. This is equivalent to the number of
bitmaps created in the internal superblock (patches to follow).

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Create n bitmaps for clustered mode

For a clustered MD, create bitmaps equal to number of nodes so
each node has an independent bitmap.

Only the first bitmap is has the bits set so that the first node
that assembles the device also performs the sync.

The bitmaps are aligned to 4k boundaries.

On-disk format:

0                    4k                     8k                    12k
-------------------------------------------------------------------
| idle                | md super            | bm super [0] + bits |
| bm bits[0, contd]   | bm super[1] + bits  | bm bits[1, contd]   |
| bm super[2] + bits  | bm bits [2, contd]  | bm super[3] + bits  |
| bm bits [3, contd]  |                     |                     |

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Grow: fix a couple of typos.

Signed-off-by: NeilBrown <neilb@suse.de>

test: make 'check wait' more reliable.

'recover' etc doesn't appear in /proc/mdstat immediately.
The "sync" thread must be started first.
But 'sync_action' shows it as soon as MD_RECOVERY_NEEDED is set
in the kernel. So look there too.

Now maybe I can get rid of some of those silly 'sleep' calls.

Signed-off-by: NeilBrown <neilb@suse.de>

tests/imsm-grow-template change 'wait' to 'check wait'

'wait' is a shell builtin that isn't doing anything useful.
It should be calling 'check wait' I think.

Signed-off-by: NeilBrown <neilb@suse.de>

Grow: fix problem with --grow --continue

If an array is being reshaped using backup space on a 'spare' device,
then
mdadm --grow --continue
won't find it as by the time it runs, nothing looks like a spare are
more. The spare has been added to the array, but has no data yet.

So allow reshape_prepare_fdlist to find a newly-incorporated spare and
report this so it can be used.

Reported-by: Xiao Ni <xni@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

tests: wait a bit long for reshape to complete.

As the kernel now does less locking, 'check wait' doesn't
always wait long enough. Add some pauses.

Signed-off-by: NeilBrown <neilb@suse.de>

Grow: another attempt to fix stop-during-reshape race.

When the array is stopped during a critical section, we sometimes
erase the backup, which is bad.
This happens when 'completed' is zero.
This can happen easily when 'stop' freezes reshape.

So try to be more careful and check 'reshape_position'.

Signed-off-by: NeilBrown <neilb@suse.de>

Fix minor typo in mdadm manpage.

Appologies if this is the wrong mailing list for this patch.

This is a very small patch for the manual page for the mdadm utility.

Thanks,
Andrew

Signed-off-by: NeilBrown <neilb@suse.de>

mdadm: monitor: fix nullptr dereference when get_md_name() returns NULL

Function add_new_arrays() expects that function get_md_name() should
return pointer to devname, but also get_md_name() may return NULL. So
check the pointer before use it in add_new_arrays().

Signed-off-by: Sergey Vidishev <sergeyv@yandex-team.ru>
Signed-off-by: NeilBrown <neilb@suse.de>

test: forcefully clean up old loop devices.

sometimes these can get left around, and udev can be looking
at them at awkward times so they don't disappear.
So be forceful.

Signed-off-by: NeilBrown <neilb@suse.de>

Grow: be even more careful about handing a '0' completed value.

Some old kernels set 'completed' to '0' too soon.
But modern kernels don't.
And when 'mdadm --stop' freezes and resume the grow,
'completed' goes back to zero briefly, which can confuse this
logic.
So only think '0' might be wrong from an old kernel when
the reshape has gone idle.

Signed-off-by: NeilBrown <neilb@suse.de>

tests/07reshape5intr : retry if writing 'check' fails.

It can sometimes.

Signed-off-by: NeilBrown <neilb@suse.de>

tests/19raid6repair: don't flushbufs on non-existent array.

..that triggers an error.

Signed-off-by: NeilBrown <neilb@suse.de>

tests: wait for complete rebuild in integrity checks

'check wait' seems a bit racy now.
Wait for the array to be fully optimal before proceeding.

Signed-off-by: NeilBrown <neilb@suse.de>

Grow: retry when writing 'reshape' to 'sync_action' is EBUSY.

EBUSY can be returned if something has recently happened
to cause md to want to check if recovery is needed, but hasn't
had a chance yet.

This can easily happen in testing.

So retry a few times in that case.

Signed-off-by: NeilBrown <neilb@suse.de>

tests/05r6tor0: minor adjustments

1/ use correct data-offset for cmp - that has changed.
2/ flushbufs on the block device before reading to avoid cache issues

Signed-off-by: NeilBrown <neilb@suse.de>

tests: 05r6tor0 - add some more waiting.

I don't really know why this is needed, but there is a delay
between the reshape finishing and the level/etc changing.
So add some sleeps.

Signed-off-by: NeilBrown <neilb@suse.de>

tests/imsm-grow-template: sleep a bit more.

The current sleep/wait doesn't seem long enough,
particularly when two arrays are being reshaped in the one
container.

So wait a bit more...

Signed-off-by: NeilBrown <neilb@suse.de>

Grow: be more careful if array is stopped during critical section.

In that case, updating 'completed' to 'max_progress' is wrong.

Signed-off-by: NeilBrown <neilb@suse.de>

Grow: add missing space in message.

Signed-off-by: NeilBrown <neilb@suse.de>

Grow: only warn about incompatible metadata when no fallback available.

We might be trying to set_new_data_offset() for RAID10, when it is
a necessary requirement, or for RAID5 where it is optional.
In the latter case, a message about metadata versions is no helpful.

Signed-off-by: NeilBrown <neilb@suse.de>

Manage: when re-adding, do check avail size if ->sb cannot be found.

avail_size1 requires ->sb, so we must only call it if ->sb
was loaded.

If ->sb wasn't loaded, then we are only proceding on the basis that
the kernel might be able to work something out - we don't need to
do any tests on size.

Reported-by: Christoffer Hammarström <christoffer.hammarstrom@linuxgods.com>
Signed-off-by: NeilBrown <neilb@suse.de>
URL: https://bugs.debian.org/784874

tests: don't "dd" indefinitely.

This will trigger an error. And now that errors are fatal....

Signed-off-by: NeilBrown <neilb@suse.de>

tests: ignore failure status from mdadm -IRs

This can report non-zero if there was nothing to do,
and that isn't really an error.
If the array doesn't get started, something else
will complain.

Signed-off-by: NeilBrown <neilb@suse.de>

Assemble: don't check for pre-existing array when updating uuid.

This is a very corner-case, but the self-tests tripped on it,
and it makes sense not to trust the uuid when it is being changed.

Signed-off-by: NeilBrown <neilb@suse.de>

DDF: _write_super_to_disk: fix anchor header type

Since commit 30bee0201, the anchor is updated from the active
DDF header. This requires fixing the header type before the
anchor is written.

The LSI Software RAID code will reject DDF meta data with wrong
anchor type and will erase all meta data when it encounters
such a broken anchor. Thus starting Linux md once on a system
with LSI RAID BIOS may cause the meta data to get destroyed.

Signed-off-by: NeilBrown <neilb@suse.de>

tests: never fail if --wait fails.

"--wait" will return non-zero status if it didn't need to wait.
This is no a reason to fail a test.

So ignore the return status from those commands.

Signed-off-by: NeilBrown <neilb@suse.de>

Add "Name" defines to some ancillary programs

All programs now need to declare their "Name".

Signed-off-by: NeilBrown <neilb@suse.de>
Fixes: d56dd607ba43 ("Change way of printing name of a process")

Manage: fix test for 'is array failed'.

We 'active_disks' does not count spares, so if array is rebuilding,
this will not necessarily find all devices, so may report an array
as failed when it isn't.

Counting up to nr_disks is better.

Signed-off-by: NeilBrown <neilb@suse.de>

IMSM: Count arrays per orom

Active arrays with IMSM metadata are counted per hba so far.
This is bad due to new functionality of orom shared between multiple
controllers i.e. more arrays can be created than is supported by orom.
This patch changes the way of counting arrays, so the result will be
sum of arrays under every hba supported by specific orom.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Assemble/force: make it possible to "force" a new device in a reshape.

Normally we do not "force"-assemble devices which are in the
middle of recovery, as they are unlikely to have useful data.

However, when a reshape increases the number of devices,
the newly added devices appear to be recovering because they
do not have complete data on them yet, but then they aren't expected
to until the reshape completes.
So in this case, it can be appropriate to force-assemble them.

Reported-by: "Jonathan Harker (Jesusaurus)" <jesusaurus@gentlydownthe.net>
Signed-off-by: NeilBrown <neilb@suse.de>

Assemble: remove stray ':' from error message.

Signed-off-by: NeilBrown <neilb@suse.de>

Assemble: allow a RAID4 to assemble easily when parity devices is missing.

If the parity device of a RAID4 is missing, then there is no immediate
risk to data. So it doesn't matter if the array is dirty or not.

This can be important when reshaping a RAID0, and is a much better
solution that that in the resent-reverted.
b720636a5849397dbc6dc1b0f0b671d17034a28b

Reported-by: "Jonathan Harker (Jesusaurus)" <jesusaurus@gentlydownthe.net>
Signed-off-by: NeilBrown <neilb@suse.de>

Revert "Assemble: support assembling of a RAID0 being reshaped."

This reverts commit b720636a5849397dbc6dc1b0f0b671d17034a28b.

As it said, this was a hack. It causes problems when trying to
--force assemble a RAID4. There is a better way.

Reported-by: "Jonathan Harker (Jesusaurus)" <jesusaurus@gentlydownthe.net>
Signed-off-by: NeilBrown <neilb@suse.de>

Assemble: fix "no uptodate device" message.

Since we introduced replacement devices, the 'i' used in
start_array() is twice the slot number.

So we need to adjust when printing.

Signed-off-by: NeilBrown <neilb@suse.de>

Monitor: use the "space protocol" for "Wrong-Level".

"Wrong-Level" is a reason, not a component device, so it should
start with a space to indiciate this to alert().

Signed-off-by: NeilBrown <neilb@suse.de>

Monitor: Obey "space protocol" when writing to syslog.

"alert" treats the "disc" arg differently if it starts with a space.

At least it does for sending email. It doesn't for writing to syslog.

Make this consistent and obey the 'space protocol' when writing to
syslog.

Signed-off-by: NeilBrown <neilb@suse.de>

reshape: support raid5 grow on certain older kernels.

Kernels between
c6563a8c38fde3c1c7fc925a v3.5-rc1~110^2~53
and
b5254dd5fdd9abcacadb5101 v3.5-rc1~110^2~51

allow new_offset to be set, but don't then allow a RAID5
to be reshaped to change that offset.
Due to selective backports, this includes the SLES11-SP3 kernel.

It is quite easy to handle this case in mdadm, so we do.
Specifically: if the reshape with data-offset fails with EINVAL,
abort the data-offset change and try the "old" way.

Signed-off-by: NeilBrown <neilb@suse.de>

IncRemove: Set "auto-read" only after successful excl open.

"mdadm -If" - triggered from udev rules when disk is removed from OS -
tries to set array in auto-read-only mode. This can interrupt rebuild
process which is started automatically, e.g. if array is mounted and
spare disk is available (I/O error is detected faster than removing
failed disk by mdadm).
This patch prevents "mdadm -If" from setting array into "auto-read-only",
by requiring exclusive open to succeed.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

IMSM-orom: make sure, that device list is supported

Devices list in PCI Data Structure is supported only in
3 and above revision. Make sure that this is checked.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

imsm: simplified multiple OROMs support

Replaced oroms array with list, add_orom() now only appends to this list
and add_orom_device_id() only appends devid_list node to an orom_entry.

Reported-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Assemble: don't ignore the return value from stat.

static checkers complain about that.
So change the code to use 'fstat', as we really don't want
to see an error here..

Reported-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

write_super_imsm_spares(): C statements are terminated by ;

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

IncrementalScan(): Make sure 'st' is valid before dereferencing it

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Grow.c: Fix classic readlink() buffer overflow

The buffer passed on to readlink() needs to contain space for the
terminating \0. See 'man 3 readlink' for details.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Don't break long strings onto multiple lines.

It is best to keep strings all together so that they
are easier to search for in the source code.
If a string is so long that it looks ugly one line,
them maybe it should be broken into multiple lines
for display too.

Only strings which contain a newline can be broken
into multiple lines:

"It is OK to\n"
"break this string\n"

Signed-off-by: NeilBrown <neilb@suse.de>

Consistently print program Name and __func__ in debug messages.

make dprintf() print program name and __func__, so that
this messaging is consistent.

Also remove all __func__ messages from pr_err(). We shouldn't
leak that internal data in error message.
If we really want function name there, we new pr_XXX might
be wanted.

Signed-off-by: NeilBrown <neilb@suse.de>

Change way of printing name of a process

Sometimes mdadm prints messages with wrong name "mdmon",
and vice versa.
This patch solves this problem by changing method of determining
process name.
Now "Name" will be set in const at start of a program,
previously was hardcoded as #define.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Monitor: fix for regression with container devices

This patch fixes 2 problems introduced by commit 9a518d8: not closing a
file descriptor and ignoring container devices. Array state is always
"inactive" for containers, so we make sure that the device is not a
container by reading also the "level" sysfs entry.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Reviewed-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

mdcheck: be careful when sourcing the output of "mdadm --detail --export"

The output of "mdadm --detail --export" isn't quoted properly so
fields that contain spaces can be a problem.
We only want the MD_UUID field, and it has a very well defined
format with no spaces.
So use 'grep' to limit the output to just that.

Signed-off-by: NeilBrown <neilb@suse.de>

IMSM: Clear migration record on disks more often

Migration record is not always cleared after successful migration. This can
block another reshape from being started. Migration will not be continued via
systemd service due to error in verifying reshape position. This patch added
clearing migration record when disk is added to container, and after successful
migration.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

util: remove rounding error where reporting "human sizes".

The division
1<<20 / 200
is not exact, so dividing by this to convert bytes into half-megs
is wrong and results in incorrect output.

As we are doing "long long" arithmetic, there is no risk of an overflow
until we reach 64 petabytes.
So change to
* 200 / (1<<20).

Reported-by: Jan Echternach <jan@goneko.de>
Resolved-debian-bug: 763917
URL: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=763917
Signed-off-by: NeilBrown <neilb@suse.de>

Grow: Fix wrong 'goto' in set_new_data_offset

Commit a821c95f114724b38df1ea99b2858178e0ed28ce
besides introducing additional message, also changed
direct return to "goto" instruction.
'goto release' will cause routine to return with '-1',
when previously '1' was returned.
Described behaviour breaks e.g. IMSM reshape process.
This patch fixes this issue by changing 'goto' to proper one -
the one that returns '1'.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Monitor: don't open md array that doesn't exist.

Opening a block-special-device for an array that doesn't
exist causes that array to be instantiated (as an empty array).
Races at array shutdown can cause the array to spontaneously
re-appear if some deamon notices a 'change' event and goes
to investigate.

Teach "mdadm --monitor" to avoid this race by checking the
"array_state" before opening the device.

Reported-by: Francis Moreau <francis.moro@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Makefile: binaries shouldn't directly depend on check_rundir

check_rundir always needs to be "built", so making
mdadm and mdmon depend on it causes them to always be built.
i.e. running
make ; make

will needlessly link the binaries a second time.

So change the makefile to use "order-only" pre-requisites.

Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: NeilBrown <neilb@suse.de>

imsm: use efivarfs interface for reading UEFI variables

Read UEFI variables using the new efivarfs interface, fallback to
sysfs-efivars if that fails.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

imsm: detail-platform improvements

Print platform details per OROM, not per controller, differentiate
RST(e) platforms from legacy IMSM, print NVMe device paths, adjust port
printing to newer sysfs path.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

imsm: add support for NVMe devices

Recognize Intel(R) NVMe devices as IMSM-capable.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

imsm: support for second and combined AHCI controllers in UEFI mode

Grantly platform introduces a second AHCI controller (sSATA) and two new
UEFI variables for the RSTe firmware. This patch adds support for those
variables in order to correctly determine IMSM platform capabilities in
UEFI mode.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

imsm: support for OROMs shared by multiple HBAs

The IMSM platform code was based on an assumption that the OROM or UEFI
capability structure (represented by struct imsm_orom) always belongs to
only one HBA. This assumption is no longer valid, because of newer
platforms with dual AHCI HBAs. Each HBA can have a separate OROM, but
some versions have a combined OROM for both HBAs.

This patch implements this HBA-OROM relationship in struct orom_entry,
which matches an OROM with a list of HBA PCI ids. All the detected
orom_entries are stored and retrieved using a global array and the
functions add_orom(), add_orom_device_id() and get_orom_by_device_id().
This replaces the arrays: imsm_orom, populated_orom, imsm_efi,
populated_efi.

The scan() function is extended to find all HBAs for an OROM. The list
of their device ids is retrieved from the PCI Expansion ROM Data
Structure, hence the additional field devListOffset in struct
pciExpDataStructFormat.

In UEFI mode we can't read the PCI Expansion ROM Data Structure and the
imsm_orom structures are stored in UEFI variables. They do not provide a
similar device id list, so we also check the HBA PCI class to make sure
that the HBA has RAID mode enabled.

In super-intel.c there are changes which allow spanning of IMSM
containers over HBAs of the same type, but only if the HBAs share the
same OROM. This is done by comparing imsm_orom pointers, which (outside
of platform-intel.c) always point to the global array containing all the
detected oroms. Additional warnings are added to
validate_container_imsm() to warn about potentially dangerous operations
in all the possible cases, e.g. when an array is assembled using disks
attached to HBAs with separate OROMs.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Incremental: don't be distracted by partition table when calling try_spare.

Currently a partition table on a device makes "mdadm -I" think
the array has a particular metadata type and so will only
add it to an array of that (partition table) type .. which doesn't
make any sense.

So tell guess_super to only look for 'array' metadata.

Reported-by: Caspar Smit <c.smit@truebit.nl>
Signed-off-by: NeilBrown <neilb@suse.de>

Detail: fix handling of 'disks' array.

Since the introduction of replacement devices, we reserve
to places in the "disks" array for each raid disk.
That means we should allocate to twice "max_disk" as the array
could have that many raid_disks (though that would limit the
number of replacements).

A couple of other places need to use "max_disks*2" instead of
"max_disks" to co-ordinate with this.

Reported-by: Or Sagi <ors@reduxio.com>
Signed-off-by: NeilBrown <neilb@suse.de>

super1: remove some debugging printfs in update_super1

These should never have been there.

Signed-off-by: NeilBrown <neilb@suse.de>

Rebuildmap: strip local host name from device name.

When /run/mdadm/map is being rebuilt, e.g. by "mdadm -Ir",
if the device doesn't exist in /dev, we have to choose
a name.
Currently we don't strip the hostname which is wrong if
it is the local host.

Reported-by: Stephen Kent <smkent@smkent.net>
Signed-off-by: NeilBrown <neilb@suse.de>

mdcheck: don't git error if not /dev/md?* devices exist.

If there are no such devices, the 'for' will set '$dev' to
'/dev/md?*', which should be ignored.

Signed-off-by: NeilBrown <neilb@suse.de>

Grow: fix resize of array component size to > 32bits

If the request --size to --grow an array to is larger
than 32bits, then mdadm may make the wrong choice and
use ioctl instead of setting component_size via sysfs
and the change is ignored.

Instead of using casts to check for a 32-bit overflow,
just check for set bits outside of INT32_MAX.

Fixes: 4e9a3dd16d656b269f5602624ac4f7109a571368
Signed-off-by: NeilBrown <neilb@suse.de>

mdmon: already read sysfs files once after opening.

seq_file in the kernel will allocate a read buffer on
first read. We want this to happen under the managemon thread,
not the 'monitor' thread, as the latter is not allow to allocate
memory (might deadlock).
So do a first read after opening.

Signed-off-by: NeilBrown <neilb@suse.de>

Grow: Report when grow needs metadata update

Report when the array's metadata needs updating instead of just
reporting the generic "kernel too old" message.

Signed-off-by: Andy Smith <andy@strugglers.net>
Signed-off-by: NeilBrown <neilb@suse.de>

--update: add 'bbl' and 'no-bbl' to the list of known updates.

so "mdadm -A --update=?" mentions them.

Reported-by: Peter Hoeg <peter@hoeg.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Release mdadm-3.3.2

Minor bugfix/stability release.

Signed-off-by: NeilBrown <neilb@suse.de>

Fix parallel make problem.

When make is called with, for example,
   "make -j9 install install-system"
i.e. both install and install-systemd targets at the same
line and with high -j value,
then the same install.tmp file was used, and udev rules
ends up in systemd service files, or otherway around.

For more information, see:
  http://www.spinics.net/lists/raid/msg46782.html
  http://bugs.gentoo.org/show_bug.cgi?id=517218

Signed-off-by: NeilBrown <neilb@suse.de>

super1: make sure 'room' includes 'bbl_size' when creating array.

Because we then go ahead and subtrace bbl_size from room.

Signed-off-by: NeilBrown <neilb@suse.de>

super1: don't allow adding a bitmap if there is no space.

If the data is too close to the superblock there may be
no space for a bitmap.
If that happens, fail the adding of the bitmap rather than
corrupt data.

Reported-by: Lars Wijtemans <rhelbugzilla@lars.wijtemans.nl>
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=922944

Monitor: Stop monitoring devices that have disappeared.

If we are only monitoring a device because we found it in
/proc/mdstat, and it has been gone for 5 checks, forget
about it completely.

Signed-off-by: NeilBrown <neilb@suse.de>

mdadm: document some more magic environment variables.

Others are mostly for developers.

Signed-off-by: NeilBrown <neilb@suse.de>

Manage: fix removal of non-existent devices.

"--remove detached" and others stopped working a while
back when I refactored some code.
For 'remove' and 'fail', the device may not exist so
if it is "MM:mm", (e.g. added by "detached"), just parse
out the numbers.

Reported-by: Killian De Volder <killian.de.volder@megasoft.be>
Signed-off-by: NeilBrown <neilb@suse.de>

util: split get_maj_min() out from dev_open()

This allows other code to parse "8:3" style device names.

Signed-off-by: NeilBrown <neilb@suse.de>

Manage: simplify `rdev` handling in Manage_subdevs.

The only use 'struct stat stb' to get the 'rdev', and sometimes
we don't even use 'stat'.
So make 'rdev' a stand-alone variable, and only declare stb'
when we actually need it.

Signed-off-by: NeilBrown <neilb@suse.de>

config: new option to suppress adding bad block lists.

CREATE bbl=no

in mdadm.conf will cause any devices added to an array
to not have a bad block list. By default they do for 1.x
metadata.

This is useful if you are suspicious of the bad-block-list
implementation.

Reported-by: Ethan Wilson <ethan.wilson@shiftmail.org>
Signed-off-by: NeilBrown <neilb@suse.de>

md.4: replace "bad block log" with "bad block list"

Elsewhere we use the term "list", and it is more accurate.
Logs are usually append-only. This list isn't.

Signed-off-by: NeilBrown <neilb@suse.de>

mdmon: don't include super0 and super1 in mdmon

They are no needed, and future patch will add a dependency
yo super1 which mdmon doesn't have.

Signed-off-by: NeilBrown <neilb@suse.de>

super: make sure to ignore disk state flags that we don't understand.

This make it easier to add new flags that some super-types
don't understand.

Signed-off-by: NeilBrown <neilb@suse.de>

Detail: Avoid dereferencing some NULL pointers.

dm devices which only have a single underlying md device
will respond to md ioctls as though they were that md device.
This can confuse mdadm and lead it to violating its segments.

So add tests for NULL where appropriate. You might not get exactly
the right answer when you "mdadm -D" a dm device, but at least it won't
crash now.

Reported-by: Willy Weisz <Willy.Weisz@univie.ac.at>
Resolves: https://bugzilla.novell.com/show_bug.cgi?id=887821
Signed-off-by: NeilBrown <neilb@suse.de>

DDF: cast print arguments in super-ddf.c

mdadm fails to build on ppc64 and ppc64le architectures.
===
super-ddf.c: In function '_set_config_size':
super-ddf.c:2849:4: error: format '%llx' expects argument of type 'long long unsigned int', but argument 6 has type '__u64' [-Werror=format=]
    pr_err("%s: %x:%x: workspace size 0x%llx too big, ignoring\n",
    ^
super-ddf.c:2855:2: error: format '%llx' expects argument of type 'long long unsigned int', but argument 6 has type '__u64' [-Werror=format=]
  dprintf("%s: %x:%x config_size %llx, DDF structure is %llx blocks\n",
  ^
cc1: all warnings being treated as errors
<builtin>: recipe for target 'super-ddf.o' failed
===

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1125883
Signed-off-by: <menantea@linux.vnet.ibm.com>
Signed-off-by: Michel Normand <normand@linux.vnet.ibm.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Assemble: Only fail auto-assemble in face of mdadm.conf conflicts.

We should never auto-assemble things that conflict with mdadm.conf
However explicit assembly requests should be allowed.

Reported-by: olovopb
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1070245
Signed-off-by: NeilBrown <neilb@suse.de>

Grow: improve error message is "--grow -n2" used on Linear arrays.

Linear arrays don't respond to setting raid-disks, only to
adding a device.

Reported-by: mulhern
Reported-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1122146
Signed-off-by: NeilBrown <neilb@suse.de>

Grow: fix that preventing resize of array to 32bit size.

If the request --size to --grow an array to is 32bits
(i.e. msb in bit 32) then mdadm make wrong choice and
uses ioctl instead of setting component_size via sysfs
and the change is ignored.

This is fixed by using correct casts.

Reported-and-tested-by: Killian De Volder <killian.de.volder@megasoft.be>
Signed-off-by: NeilBrown <neilb@suse.de>

IMSM: move "validate_container_imsm" to be included in mdassemble

Commit 0c21b485e4beb7bcfe631412a231f7c1ea1067bc added new
function in imsm superswitch. This function should be
included in mdassemble.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Grow: Do not try to restart if reshape is running

Grow process did not check if reshape is already started
when deciding about restarting.
Sync_action should be checked in this case, and if
reshape is running - restart flag should not be set.
Otherwise, Grow process will fail to write data to
sysfs, and reshape will not be continued.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

IMSM: validate metadata_update size before using it.

Every case in prepare_update check that the size message
size is sufficient, so process_update doesn't need to check anything.

Reported-by: Vincent Berg <vberg@ioactive.com>
Signed-off-by: NeilBrown <neilb@suse.de>

DDF: validate metadata_update size before using it.

process_update already checks update->len, for all but
the 'magic', prepare_update doesn't at all.

So add tests to prepare_update that we don't exceed the buffer.
This will consequently protect process_update from looking
for a 'magic' which isn't there.

Reported-by: Vincent Berg <vberg@ioactive.com>
Signed-off-by: NeilBrown <neilb@suse.de>

mdmon: allow prepare_update to report failure.

If 'prepare_update' fails for some reason there is little
point continuing on to 'process_update'.
For now only malloc failures are caught, but other failures
will be considered in future.

Signed-off-by: NeilBrown <neilb@suse.de>

IMSM: Add warning message when assemble spanned container

Due to several changes in code assemble with disks
spanned between different controllers can be obtained
in some cases. After IMSM container will be assembled, check HBA of
disks, and print proper warning if mismatch is detected.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>