Mariusz Tkaczyk [Thu, 29 Feb 2024 11:52:06 +0000 (12:52 +0100)]
mdadm: drop get_required_spare_criteria()
Only IMSM implements get_spare_criteria, so load_super() in
get_required_spare_criteria() is dead code. It is moved inside
metadata handler, because only IMSM implements it.
Allow a devnode to be passed in and opened. With that, load_container(),
which is used only to fill spare criteria, can be hidden inside the handler,
which simplifies the generic code.
Add helper function for testing spare criteria in Incremental and
error messages.
File descriptor in get_spare_criteria_imsm() is always opened on purpose.
New functionality added in next patches will require it. For the same
reason, function is moved to other place.
Mariusz Tkaczyk [Thu, 29 Feb 2024 11:52:05 +0000 (12:52 +0100)]
mdadm: Add functions for spare criteria verification
Spare criteria are verified in a similar way in a few places. As a result,
two almost identical functions (dev_size_from_id() and
dev_sector_size_from_id()) are removed. The same file descriptor is now
used to send both ioctls.
Two extern functions are added; disk_fd_matches_criteria() is used in the
next patches.
Another optimization is zeroing struct spare_criteria inline. With that,
we don't need to reset values in get_spare_criteria_imsm().
A dedicated boolean field for checking whether criteria are filled is added.
We don't need to execute the code if it is not set.
Kinga Tanska [Tue, 27 Feb 2024 02:36:14 +0000 (03:36 +0100)]
Detail: remove duplicated code
Remove duplicated code from Detail(), where MD_UUID and MD_DEVNAME
are being set. Superblock is no longer required to print system
properties. Now it tries to obtain map in two ways.
Mariusz Tkaczyk [Fri, 23 Feb 2024 14:51:45 +0000 (15:51 +0100)]
mdadm: remove mkinitramfs stuff
This script uses mdadm.static, which has been abandoned (and has probably
not been working) for years. Mdadm is integrated with dracut
and mkinitramfs these days.
Kinga Tanska [Tue, 27 Feb 2024 06:36:39 +0000 (07:36 +0100)]
super-intel: respect IMSM_DEVNAME_AS_SERIAL flag
The IMSM_DEVNAME_AS_SERIAL flag was respected only when searching for the
serial using the nvme or scsi device was unsuccessful. This
flag shall be applied first, so that user settings have
the highest priority.
Mateusz Kusiak [Wed, 28 Feb 2024 15:37:20 +0000 (16:37 +0100)]
Monitor: Allow no PID in check_one_sharer()
Commit 5fb5479ad100 ("Monitor: open file before check in
check_one_sharer()") introduced a regression that prohibits monitor
from starting if PID file does not exist.
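The intent described above is that a missing PID file must not be fatal.
A minimal sketch of that behaviour is shown below; the helper name and its
return convention are hypothetical and this is not the actual
check_one_sharer() code.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Returns 1 and stores the PID if the file holds one,
 * 0 if the file does not exist (no other instance recorded),
 * -1 on any other error.
 */
int read_sharer_pid_sketch(const char *pid_path, int *pid_out)
{
	FILE *fp = fopen(pid_path, "r");
	int pid;

	if (!fp)
		return errno == ENOENT ? 0 : -1;
	if (fscanf(fp, "%d", &pid) != 1 || pid <= 0) {
		fclose(fp);
		return -1;	/* file exists but holds no usable PID */
	}
	fclose(fp);
	*pid_out = pid;
	return 1;
}
```

A caller would treat the 0 result as "nothing to compare against" and let
the monitor start.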
Mateusz Kusiak [Tue, 20 Feb 2024 16:04:44 +0000 (17:04 +0100)]
test: run tests on system level mdadm
The tests run with MDADM_NO_SYSTEMCTL flag by default, however it has
no effect on udev. In case of external metadata, even if flag is set,
udev will trigger systemd to launch mdmon.
This commit changes test execution level, so the tests are run on system
level mdadm, meaning local build must be installed prior to running
tests.
Add warning that the tests are run on system level mdadm and local
build must be installed first.
Do not call mdadm with "quiet" as it makes it not display critical
messages necessary for debug.
Remove forcing speed_limit and add restoring system speed_limit_max
after test execution.
Mateusz Kusiak [Tue, 20 Feb 2024 10:56:12 +0000 (11:56 +0100)]
mdmon: refactor md device name check in main()
Refactor mdmon main function to verify if fd is valid prior to checking
device name. This is due to static code analysis complaining after
change b938519e7719 ("util: remove obsolete code from get_md_name").
Mateusz Kusiak [Thu, 18 Jan 2024 10:30:18 +0000 (11:30 +0100)]
Grow: Move update_tail assign to Grow_reshape()
Due to e919fb0af245 ("FIX: Enable metadata updates for raid0") code
can't enter super-intel.c:3415, resulting in checkpoint not being
saved to metadata for second volume in matrix raid array.
This results in checkpoint being stuck at last value for the
first volume.
Move st->update_tail to Grow_reshape() so it is assigned for each
volume.
Mateusz Kusiak [Thu, 18 Jan 2024 10:30:17 +0000 (11:30 +0100)]
Super-intel: Fix first checkpoint restart
When imsm based array is stopped after reaching first checkpoint and
then assembled, first checkpoint is reported as 0.
This behaviour is valid only for initial checkpoint, if the array was
stopped while performing some action.
Last checkpoint value is not taken from metadata but always starts
with 0 and it's incremented when sync_completed in sysfs changes.
Simplifying, read_and_act() is responsible for checkpoint updates
and is executed each time a sysfs checkpoint update happens. For the first
checkpoint it is executed twice and, because the checkpoint is marked before
any action is triggered on the array, it is impossible to read
sync_completed from sysfs in just two iterations.
The workaround to this is not marking any checkpoint for first
sysfs checkpoint after RAID assembly, to preserve checkpoint value
stored in metadata.
Mateusz Kusiak [Thu, 18 Jan 2024 10:30:15 +0000 (11:30 +0100)]
Remove hardcoded checkpoint interval checking
Mdmon assumes that the kernel marks a checkpoint every 1/16 of the volume size
and that the checkpoints are equal in size. This is not true: the kernel may
mark checkpoints more frequently depending on several factors, including
sync speed. This results in checkpoints reported by mdadm --examine
falling behind the one reported by the kernel.
Mariusz Tkaczyk [Mon, 5 Feb 2024 14:50:29 +0000 (15:50 +0100)]
Revert "mdadm: remove container_enough logic"
The mentioned patch changes the way IMSM member arrays are assembled: they are
updated by every new drive's incremental process. Previously, member
arrays were created and filled once, by the last drive's incremental process.
We found regressions with various impact. Unfortunately, initial
testing didn't show them.
Regressions are connected to drive appearance order and may not be
reproducible on every configuration. There are at least two known
issues for now:
- sysfs attributes are filled using old metadata if there is
outdated drive and it is enumerated first.
- rebuild may be aborted and started from beginning after reboot,
if drive under rebuild is enumerated as the last one.
Mariusz Tkaczyk [Thu, 1 Feb 2024 11:32:41 +0000 (12:32 +0100)]
super1: remove support for name= in config
Only super1 provides "name=" to config. It is recorded in metadata
so there is no need to duplicate the same information.
UUID is our main key.
It is not used by Incremental, and Assemble handles an empty name well
because other supertypes don't set it in conf.
The expectation that the name in config is the same as in metadata is bug prone.
Config should be the place where the user can define customized settings.
Remove printing "name=" from mdadm config creation commands. Ignore
the name in config file to keep backward compatibility. Remove
description from man mdadm.conf.
Update 00conftest because "name" is no longer accepted.
As the name is ignored, error for mdadm --detail is not printed.
Reported-by: Stefan Fleischmann <sfle@kth.se>
Fixes: e2eb503bd797 ("mdadm: Follow POSIX Portable Character Set")
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Song Liu [Tue, 9 Jan 2024 23:07:16 +0000 (15:07 -0800)]
tests: Gate tests for linear flavor with variable LINEAR
linear flavor is being removed in the kernel [1], so tests for the linear
flavor will fail. Add detection for linear flavor and --disable-linear
option, with the same logic as multipath.
[1] https://lore.kernel.org/linux-raid/20231214222107.2016042-1-song@kernel.org/
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Pawel Piatkowski [Wed, 20 Dec 2023 09:32:49 +0000 (10:32 +0100)]
manage: adjust checking subarray state in update_subarray
Only changing bitmap related consistency_policy requires
subarray to be inactive.
consistency_policy with PPL or NO_PPL value can be changed on
active subarray.
It fixes regression introduced in commit db10eab68e652f141169 ("Fix --update-subarray on active volume")
Mateusz Grzonka [Tue, 21 Nov 2023 00:58:23 +0000 (01:58 +0100)]
Mdmonitor: Improve udev event handling
Mdmonitor is waiting for udev queue to become empty.
Even if the queue becomes empty, udev might still be processing last event.
However, we want to wait and wake up mdmonitor when udev has finished
processing events.
Also, the udev queue interface is considered legacy and should not be
used outside of udev.
Use udev monitor instead, and wake up mdmonitor on every event triggered
by udev for md block device.
We need to generate more change events from kernel, because they are
missing in some situations, for example, when rebuild started.
This will be addressed in a separate patch.
Move udev specific code into separate functions, and place them in udev.c file.
Also move use_udev() logic from lib.c into newly created file.
Pawel Piatkowski [Thu, 19 Oct 2023 14:35:25 +0000 (16:35 +0200)]
Fix assembling RAID volume by using incremental
After the change "mdadm: remove container_enough logic",
IMSM volumes are started immediately. If a volume is under
reshape, it will be blocked by block_subarray() during the
first mdadm -I <devname>. Assemble_container_content() for the
next disk will see the change, because the metadata version from
sysfs and the one from metadata don't match, and will execute
sysfs_set_array again. Then it fails to set the same
component_size, which is prohibited by the kernel.
If the array is frozen, the first character of the metadata version
is different ("/" vs "-"), so exclude it from the comparison.
All we want is to double check that base properties are set
and we don't need to call sysfs_set_array again.
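The comparison that skips the first character could be sketched as below.
The helper name is hypothetical and the strings used in the usage note are
illustrative only; the real code may compare different fields.

```c
#include <stdbool.h>
#include <string.h>

/* Compare two version strings while ignoring their first character. */
bool same_version_ignoring_first_sketch(const char *a, const char *b)
{
	if (a[0] == '\0' || b[0] == '\0')
		return a[0] == b[0];	/* both empty: equal; one empty: differ */
	return strcmp(a + 1, b + 1) == 0;
}
```

With this helper, "/md127/0" and "-md127/0" compare as equal, while
"/md127/0" and "/md127/1" do not.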
Pawel Piatkowski [Thu, 19 Oct 2023 14:35:24 +0000 (16:35 +0200)]
mdadm: remove container_enough logic
Arrays without enough disk count will be assembled but not
started.
Now RAIDs will be assembled always (even if they are failed).
RAID devices in all states will be assembled and exposed
to mdstat.
This change affects only IMSM (for ddf it wasn't used,
container_enough was set to true always).
Removed this logic from incremental_container as well with
runstop checking because runstop condition is being verified
in assemble_container_content function.
Xiao Ni [Tue, 17 Oct 2023 12:35:46 +0000 (20:35 +0800)]
mdadm/super1: Add MD_FEATURE_RAID0_LAYOUT if kernel>=5.4
Kernel v5.4 and later add the feature bit MD_FEATURE_RAID0_LAYOUT.
A layout must be specified for raid0 with more than one zone. But a
raid0 with one zone in fact also has a default layout.
Currently, for raid0 with one zone, an *unknown* layout is shown by the mdadm -D
command. The reason is that mdadm doesn't set MD_FEATURE_RAID0_LAYOUT for
raid0 with one zone. Then, in kernel space, super_1_validate sets mddev->layout
to -1 because MD_FEATURE_RAID0_LAYOUT is missing. In fact, the raid0 io path
uses the default layout. Set raid0_need_layout to true if the kernel version is >= v5.4.
Fixes: 329dfc28debb ('Create: add support for RAID0 layouts.')
Signed-off-by: Xiao Ni <xni@redhat.com>
Reviewed-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Signed-off-by: Jes Sorensen <jes@trained-monkey.org>
Xiao Ni [Wed, 11 Oct 2023 13:03:32 +0000 (21:03 +0800)]
mdadm/ddf: Abort when raid disk is smaller than 0 in getinfo_super_ddf
The metadata is corrupted when raid_disk < 0, so abort directly.
This also avoids a build error:
super-ddf.c:1988:58: error: array subscript -1 is below array bounds of ‘struct phys_disk_entry[0]’
Xiao Ni [Fri, 8 Sep 2023 08:44:35 +0000 (16:44 +0800)]
mdadm/tests: Don't run mknod before losetup
Sometimes it can fail:
losetup: /var/tmp/mdtest0: failed to set up loop device: No such device or address
/dev/loop0 and /var/tmp/mdtest0 are already created before losetup.
losetup can create the device node by itself, so remove mknod.
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Jes Sorensen <jes@trained-monkey.org>
Li Xiao Keng [Thu, 7 Sep 2023 11:37:44 +0000 (19:37 +0800)]
Fix race of "mdadm --add" and "mdadm --incremental"
Given a raid1 with sda and sdb, adding sdc to this raid
may return -EBUSY.
The main process of --add:
1. dev_open(sdc) in Manage_add
2. store_super1(st, di->fd) in write_init_super1
3. fsync(fd) in store_super1
4. close(di->fd) in write_init_super1
5. ioctl(ADD_NEW_DISK)
Steps 2 and 3 will add sdc to the metadata of raid1. There will be a
udev event (change of sdc) after step 4. Then "/usr/sbin/mdadm
--incremental --export $devnode --offroot $env{DEVLINKS}"
will be run, and sdc will be added to the raid1. Then
step 5 will return -EBUSY because it checks that the device isn't
already claimed in md_import_device()->lock_rdev()->blkdev_get_by_dev()
->blkdev_get().
This is confusing for users because sdc is being added for the first time.
"incremental" takes map_lock before adding sdc to raid1.
So take map_lock before write_init_super in "mdadm --add"
to fix the race between "add" and "incremental".
Signed-off-by: Li Xiao Keng <lixiaokeng@huawei.com>
Signed-off-by: Guanqin Miao <miaoguanqin@huawei.com>
Reviewed-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Signed-off-by: Jes Sorensen <jes@trained-monkey.org>
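The idea of the fix for the "--add" vs "--incremental" race, holding one
shared lock around the whole critical section, can be illustrated with a
generic sketch. It uses flock() on a lock file as a stand-in; mdadm's
map_lock() may be implemented differently, and all names here are
hypothetical.

```c
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Run fn(arg) while holding an exclusive lock on lock_path. */
int with_exclusive_lock_sketch(const char *lock_path,
			       int (*fn)(void *), void *arg)
{
	int rv;
	int fd = open(lock_path, O_RDWR | O_CREAT, 0600);

	if (fd < 0)
		return -1;
	if (flock(fd, LOCK_EX) != 0) {
		close(fd);
		return -1;
	}
	/*
	 * Steps 2-5 of --add (write superblock, fsync, close,
	 * ADD_NEW_DISK) would run here, so a concurrent --incremental
	 * taking the same lock has to wait until they are done.
	 */
	rv = fn(arg);
	flock(fd, LOCK_UN);
	close(fd);
	return rv;
}

/* Placeholder for the critical section; only counts its invocations. */
int demo_add_disk_sketch(void *arg)
{
	int *calls = arg;

	(*calls)++;
	return 0;
}
```

Because both code paths take the same lock, the udev-triggered incremental
run cannot claim the device between the metadata write and the ioctl.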
Coly Li [Sun, 13 Aug 2023 16:46:13 +0000 (00:46 +0800)]
Incremental: remove obsoleted calls to udisks
The udisks utility has been removed from udev upstream, so calling this
obsolete command in run_udisks() doesn't make any sense now.
This patch removes the call chain of udisks, which includes the routines
run_udisks() and force_remove(), and the 2 locations where force_remove() is
called. Since force_remove() is removed along with the udisks utility, it is
fair to remove the Manage_stop() call inside force_remove() as well.
In the two places where the force_remove() calls are removed,
the failure from Manage_subdevs() can be safely ignored, because:
1) udisks doesn't exist, so there is no need to check the return value in
order to umount the file system with udisks and remove the component disk again.
2) After the 'I' incremental remove, there is another 'r' hot remove
following up. The first incremental remove is a best-effort attempt.
Therefore, where force_remove() is removed in this patch, the return
value of Manage_subdevs() is not checked either.
Mariusz Tkaczyk [Thu, 1 Jun 2023 07:27:50 +0000 (09:27 +0200)]
mdadm: Follow POSIX Portable Character Set
When the user creates a device with a name that contains whitespace,
mdadm times out and throws an error. This issue is caused by udev, which
truncates the /dev/md link at the first whitespace.
This patch introduces prohibition of characters other than A-Za-z0-9.-_
in the device name. Also, it prohibits using leading "-" in device name,
so name won't be confused with cli parameter.
Set of allowed characters is taken from POSIX 3.280 Portable Character
Set. Also, device name length now is limited to NAME_MAX.
In some places, there are other requirements for string length (e.g. size
up to MD_NAME_MAX for device name). This routine is made to follow POSIX
and other, more strict limitations should be checked separately.
We are aware of the risk of regression in exceptional cases (as
escape_devname function is removed) that should be fixed by updating
the array name.
The POSIX validation is added for:
- 'name' parameter in every mode.
- first devlist entry, for Build, Create, Assemble, Manage, Grow.
- config entries, both devname and "name=".
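The validation rules listed in this commit (only A-Za-z0-9.-_ allowed, no
leading "-", length up to NAME_MAX) can be sketched as follows. The function
name is hypothetical, and rejecting NULL or empty names is an assumption of
the sketch rather than something stated above.

```c
#include <limits.h>	/* NAME_MAX */
#include <stdbool.h>
#include <string.h>

#ifndef NAME_MAX
#define NAME_MAX 255	/* fallback for the sketch only */
#endif

bool is_name_posix_compatible_sketch(const char *name)
{
	static const char allowed[] =
		"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
		"abcdefghijklmnopqrstuvwxyz"
		"0123456789.-_";
	size_t len;

	if (name == NULL)
		return false;
	len = strlen(name);
	if (len == 0 || len > NAME_MAX)
		return false;
	if (name[0] == '-')
		return false;	/* would look like a command-line option */
	/* strspn() gives the length of the leading run of allowed chars */
	return strspn(name, allowed) == len;
}
```

Stricter limits, such as MD_NAME_MAX for the array name, would still need
their own checks on top of this one.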
Mariusz Tkaczyk [Thu, 1 Jun 2023 07:27:49 +0000 (09:27 +0200)]
mdadm: define ident_set_devname()
Use a dedicated set method for ident->devname. Now, devname validation
is done early for modes where a device is created (Build, Create and
Assemble). The rules used for devname validation are derived from the
config file.
It could cause a regression in exceptional cases where an existing device
has a name which doesn't match the criteria for Manage and Grow modes. It is
low risk and those modes are not omitted from early devname validation.
The user can use the main numbered devnode to avoid this problem.
Messages exposed to the user are changed, so it might cause a regression
in negative scenarios. Error codes are not changed.
Mariusz Tkaczyk [Thu, 1 Jun 2023 07:27:48 +0000 (09:27 +0200)]
mdadm: refactor ident->name handling
Create dedicated setter for name in mddev_ident and propagate it.
Following changes are made:
- move duplicated code from config.c and mdadm.c into new function.
- Add error enum in mdadm.h.
- Use MD_NAME_MAX instead of hardcoded value in mddev_ident.
- Use secure functions.
- Add more detailed verification of the name.
- make error messages reusable for cmdline and config:
- for cmdline, these are errors so use pr_err().
- for config, these are just warnings, so use pr_info().
Mariusz Tkaczyk [Thu, 1 Jun 2023 07:27:47 +0000 (09:27 +0200)]
mdadm: set ident.devname if applicable
This patch tries to propagate the usage of struct mddev_ident for cmdline
where it is applicable. To avoid regression, this value is derived
from devlist->devname for applicable modes only.
As a result, the whole structure is passed to some functions. It produces
some changes for Build, Create and Assemble.
No functional changes intended.
The goal of the change is to unify devname validation which is done in
next patches.
Mariusz Tkaczyk [Thu, 1 Jun 2023 07:27:46 +0000 (09:27 +0200)]
tests: create 00confnames
The test is an attempt to document the current implementation of devnode
and name handling for config entries. It is focused on incremental, the
default way of assembling arrays on boot.
The expectations are aligned to the current implementation for native
metadata because it is the most complicated scenario: both variables
can be set.
Yu Kuai [Mon, 29 May 2023 13:28:22 +0000 (21:28 +0800)]
tests: add a regression test for raid456 deadlock
The deadlock is described in [1], as the last patch described, it's
fixed first by [2], however this fix will be reverted and the deadlock
is supposed to be fixed by [3].
Yu Kuai [Mon, 29 May 2023 13:28:21 +0000 (21:28 +0800)]
tests: add a regression test for raid10 deadlock
The deadlock is described in [1], it's fixed first by [2], however,
it turns out this commit will trigger other problems[3], hence this
commit will be reverted and the deadlock is supposed to be fixed by [1].
Yu Kuai [Mon, 29 May 2023 13:28:20 +0000 (21:28 +0800)]
tests: support to skip checking dmesg
Prepare to add a regression test for raid10 that requires error injection
to trigger the error path. The kernel will complain about io errors, so checking
dmesg for error logs would make it impossible to pass this test.
Commit e9fb93af0f76 ("Fix memory leak in file Assemble")
fixes a few memory leaks in Assemble, but it introduces a
problem with assembling RAID volumes. It was caused by
clearing metadata too early, not only on failure in the
select_devices() function.
This commit removes the redundant memory free.
It is essential to avoid buffer overflows and similar bugs as much as
possible.
According to Intel rules we are obligated to verify certain
compiler flags, so it will be much easier if they are added to the
Makefile.
Add gcc flags for prevention of buffer overflows and format string vulnerabilities,
stack protection to prevent stack overwrites, and ASLR enablement through -fPIE.
Also make the flags configurable.
The changes were verified on gcc versions 7.5, 8.3, 9.2, 10 and 12.2.
imsm: Add reading vmd register for finding imsm capability
Currently mdadm does not find imsm capability when running inside VM.
This patch adds the possibility to read from vmd register and check for
capability, effectively allowing to use mdadm with imsm inside virtual machines.
Additionally refactor find_imsm_capability() to make assignments in new
lines.
Xiao Ni [Fri, 25 Aug 2023 12:55:41 +0000 (20:55 +0800)]
mdadm: Stop mdcheck_continue timer when mdcheck_start service can finish check
mdcheck_continue is triggered by the mdcheck_start timer. It is used to
continue the check action if the raid is too big and the mdcheck_start
service can't finish the check action. If mdcheck_start can finish the check
action, the mdcheck_continue service is no longer needed. So stop
it when the mdcheck_start service can finish the check action.
Blazej Kucman [Fri, 16 Jun 2023 19:45:55 +0000 (21:45 +0200)]
Add secure gethostname() wrapper
gethostname() does not guarantee a null-terminated string
if the hostname is longer than the buffer length.
For security, a function s_gethostname() has been added
to ensure that "\0" is added to the end of the buffer.
Previously this had to be handled at each gethostname()
call site.
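A wrapper of this kind could look roughly like the sketch below. The name
s_gethostname_sketch, its signature and its argument checks are assumptions
for illustration and may not match the function added by this commit.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE	/* for the gethostname() declaration on glibc */
#endif
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

int s_gethostname_sketch(char *buf, size_t buf_len)
{
	int rv;

	if (buf == NULL || buf_len == 0) {
		errno = EINVAL;
		return -1;
	}
	rv = gethostname(buf, buf_len);
	/* terminate the buffer even if the name was truncated */
	buf[buf_len - 1] = '\0';
	return rv;
}
```

Callers can then treat the buffer as a C string without repeating the
termination at every call site.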
Mariusz Tkaczyk [Mon, 29 May 2023 13:52:38 +0000 (15:52 +0200)]
imsm: fix free space calculations
Between two volumes or between last volume and metadata at least
IMSM_RESERVED_SECTORS gap must exist. Currently the gap can be doubled
because metadata reservation contains IMSM_RESERVED_SECTORS too.
Divide reserve variable into pre_reservation and post_reservation to be
more flexible and decide separately if each reservation is needed.
Pre_reservation is needed only when a volume is created and it is not a
real first volume in a container (we can check that by extent_idx).
This type of reservation is not needed for expand.
Post_reservation is not needed only if real last volume is created or
expanded because reservation is done with the metadata.
The volume index in metadata cannot be trusted, because the real volume
order can be reversed. It is safer to use extent table, it is sorted by
start position.
Mariusz Tkaczyk [Mon, 29 May 2023 13:52:37 +0000 (15:52 +0200)]
imsm: return free space after volume for expand
merge_extends() routine searches for the biggest free space. For expand,
it works only in standard cases where the last volume is expanded and
the free space is determined after the last volume.
Add volume index to extent struct and use that to determine size after
super->current_vol during expand.
Limitation to last volume is no longer needed. It unblocks scenarios
where kill-subarray is used to remove first volume and later it is
recreated (now it is the second volume, even if it is placed before
existing one).
Kevin Friedberg [Thu, 16 Feb 2023 04:41:34 +0000 (23:41 -0500)]
enable RAID for SATA under VMD
Detect when a SATA controller has been mapped under Intel Alderlake RST
VMD, so that it can use the VMD controller's RAID capabilities. Create
new device type SYS_DEV_SATA_VMD and list separate controller to prevent
mixing with the NVMe SYS_DEV_VMD devices on the same VMD domain.
Signed-off-by: Kevin Friedberg <kev.friedberg@gmail.com>
Signed-off-by: Jes Sorensen <jes@trained-monkey.org>
Xiao Ni [Fri, 7 Apr 2023 00:45:28 +0000 (08:45 +0800)]
Remove the config files in mdcheck_start|continue service
We set MDADM_CHECK_DURATION in the mdcheck_start|continue.service files.
And mdcheck doesn't use any configs from the config file. So we can remove
the dependencies.
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Jes Sorensen <jes@trained-monkey.org>
When we execute mdadm --assemble, udev-md-raid-assembly.rules is triggered.
When we then stop the array, we find a coredump for mdadm --incremental.
The call stack is as follows:
#0 enough (level=10, raid_disks=4, layout=258, clean=1,
avail=avail@entry=0x0) at util.c:555
#1 0x0000562170c26965 in Incremental (devlist=<optimized out>,
c=<optimized out>, st=0x5621729b6dc0) at Incremental.c:514
#2 0x0000562170bfb6ff in main (argc=<optimized out>,
argv=<optimized out>) at mdadm.c:1762
enough() uses the array avail. avail is allocated in count_active(),
which may not allocate it, causing a coredump. We fix this coredump.
NeilBrown [Tue, 14 Mar 2023 00:06:25 +0000 (11:06 +1100)]
mdopen: always try create_named_array()
mdopen() will use create_named_array() to ask the kernel to create the
given md array, but only if it is given a number or name.
If it is NOT given a name and is required to choose one itself using
find_free_devnm() it does NOT use create_named_array().
On kernels with CONFIG_BLOCK_LEGACY_AUTOLOAD not set, this can result in
failure to assemble an array. This can particularly be seen when the
"name" of the array begins with a host name different to the name of the
host running the command.
NeilBrown [Mon, 13 Mar 2023 03:42:58 +0000 (14:42 +1100)]
mdmon: Improve switchroot interactions.
We need a new mdmon@mdfoo instance to run in the root filesystem after
switch root, as /sys and /dev are removed from the initrd.
systemd will not start a new unit with the same name running while the
old unit is still active, and we want the two mdmon processes to overlap
in time to avoid any risk of deadlock, which can happen when a write is
attempted with no mdmon running.
So we need a different unit name in the initrd than in the root. Apart
from the name, everything else should be the same.
This is easily achieved using a different instance name as the
mdmon@.service unit file already supports multiple instances (for
different arrays).
So start "mdmon@mdfoo.service" from root, but
"mdmon@initrd-mdfoo.service" from the initrd. udev can tell which
circumstance is the case by looking for /etc/initrd-release.
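The choice between the two instance names can be sketched as below. The
helper name and signature are hypothetical; in the commit this decision is
made from udev and through continue_from_systemd(), not by a function like
this one.

```c
#include <stdio.h>

/*
 * Build the mdmon unit name for a container device such as "mdfoo".
 * Returns the length of the name, or -1 if it did not fit in out.
 */
int mdmon_unit_name_sketch(char *out, size_t out_len,
			   const char *devnm, int in_initrd)
{
	int n = snprintf(out, out_len, "mdmon@%s%s.service",
			 in_initrd ? "initrd-" : "", devnm);

	if (n < 0 || (size_t)n >= out_len)
		return -1;
	return n;
}
```

For "mdfoo" this yields "mdmon@mdfoo.service" in the root filesystem and
"mdmon@initrd-mdfoo.service" in the initrd, so both units can be active at
the same time.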
continue_from_systemd() is enhanced so that the "initrd-" prefix can be
requested.
Teach mdmon that a container name like "initrd/foo" should be treated
just like "foo". Note that systemd passes the instance name
"initrd-foo" as "initrd/foo".
We don't need a similar mechanism at shutdown because dracut runs
"mdmon --takeover --all" when appropriate.
NeilBrown [Mon, 13 Mar 2023 03:42:58 +0000 (14:42 +1100)]
mdmon: Remove need for KillMode=none
mdmon needs to keep running during the switchroot out of (at boot) and
then back into (at shutdown) the initrd. It runs until a new mdmon
takes over.
KillMode=none is used to achieve this, with the help of --offroot which
sets argv[0][0] to '@' which systemd understands.
This is needed because mdmon is currently run in system-mdmon.slice
which conflicts with shutdown.target so without KillMode=none mdmon
would get killed early in shutdown when system-mdmon.slice is removed.
As described in systemd.service(5), this conflict with shutdown can be
resolved by explicitly requesting system.slice, which is a natural
counterpart to DefaultDependencies=no.
So add that, and also add IgnoreOnIsolate=true to avoid another possible
source of an early death. With these we no longer need KillMode=none
which the systemd developers have marked as "deprecated".
NeilBrown [Mon, 13 Mar 2023 03:42:58 +0000 (14:42 +1100)]
mdmon: don't test both 'all' and 'container_name'.
If 'all' is not set, then container_name must be NULL, as nothing else
can set it. So simplify the test to ignore container_name.
This makes the purpose of the code more obvious.
NeilBrown [Mon, 13 Mar 2023 03:42:58 +0000 (14:42 +1100)]
Use existence of /etc/initrd-release to detect initrd.
Since v183, systemd has used the existence of /etc/initrd-release to
detect if it is running in an initrd, rather than looking at the magic
number of the root filesystem's device. It is time for mdadm to do the
same.
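The detection described here amounts to an existence check on one file.
A minimal sketch, with hypothetical function names:

```c
#include <unistd.h>

/* Return 1 if the given path exists, 0 otherwise. */
int path_exists_sketch(const char *path)
{
	return access(path, F_OK) == 0;
}

/* Return 1 when running inside an initrd, 0 otherwise. */
int in_initrd_sketch(void)
{
	return path_exists_sketch("/etc/initrd-release");
}
```

Unlike a check of the root filesystem's magic number, this does not depend
on which filesystem type the initrd happens to use.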
Khem Raj [Wed, 18 Jan 2023 08:32:36 +0000 (00:32 -0800)]
Define alignof using _Alignof when using C11 or newer
WG14 N2350 made very clear that having type definitions
within "offsetof" is undefined behavior [1]. This patch enhances the
implementation of the macro alignof_slot to use the builtin "_Alignof" to
avoid undefined behavior when using std=c11 or newer.
clang 16+ has started to flag this [2].
Fixes the build when using -std >= gnu11 and clang 16+.
Older compilers (gcc < 4.9 or clang < 8) have a buggy _Alignof even though
they may support C11, so exclude those compilers too.
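The selection logic can be sketched with the preprocessor as shown below.
The macro names are hypothetical and the exact conditions in the patch may
differ; the version thresholds are the ones named in this message.

```c
#include <stddef.h>

/* Compilers named above as having a usable _Alignof. */
#if defined(__clang__)
# define SKETCH_GOOD_ALIGNOF (__clang_major__ >= 8)
#elif defined(__GNUC__)
# define SKETCH_GOOD_ALIGNOF \
	(__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 9))
#else
# define SKETCH_GOOD_ALIGNOF 1	/* assumption: trust other C11 compilers */
#endif

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
	SKETCH_GOOD_ALIGNOF
/* C11 or newer: use the builtin, no type definition inside offsetof */
# define ALIGNOF_SKETCH(type) _Alignof(type)
#else
/* Pre-C11 fallback: the traditional offsetof trick */
# define ALIGNOF_SKETCH(type) offsetof(struct { char c; type x; }, x)
#endif
```

Code then uses ALIGNOF_SKETCH(type) everywhere, and only the fallback branch
still relies on the construct that N2350 describes as undefined.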
Logan Gunthorpe [Wed, 1 Mar 2023 20:41:33 +0000 (13:41 -0700)]
mdadm: Add --write-zeros option for Create
Add the --write-zeros option for Create which will send a write zeros
request to all the disks before assembling the array. After zeroing
the array, the disks will be in a known clean state and the initial
sync may be skipped.
Writing zeroes is best used when there is a hardware offload method
to zero the data. But even still, zeroing can take several minutes on
a large device. Because of this, all disks are zeroed in parallel using
their own forked process and a message is printed to the user. The main
process will proceed only after all the zeroing processes have completed
successfully.
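The "one forked process per disk, continue only when all succeed" pattern
could be sketched as follows. The worker callbacks are placeholders that
stand in for the real zeroing request, and all names are hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*zero_fn_sketch)(int disk_index);

/* Fork one child per disk, then wait for all of them. */
int zero_all_disks_sketch(int ndisks, zero_fn_sketch zero_one)
{
	int started = 0, failed = 0;
	int i;

	for (i = 0; i < ndisks; i++) {
		pid_t pid = fork();

		if (pid < 0) {
			failed = 1;	/* could not start this worker */
			break;
		}
		if (pid == 0)
			_exit(zero_one(i) == 0 ? 0 : 1);	/* child */
		started++;
	}
	/* Reap every started child; any failure fails the whole operation. */
	for (i = 0; i < started; i++) {
		int status;

		if (wait(&status) < 0 || !WIFEXITED(status) ||
		    WEXITSTATUS(status) != 0)
			failed = 1;
	}
	return failed ? -1 : 0;
}

/* Placeholder worker: pretends zeroing always succeeds. */
int demo_zero_ok_sketch(int disk_index)
{
	(void)disk_index;
	return 0;
}

/* Placeholder worker: pretends the second disk fails. */
int demo_zero_fail_second_sketch(int disk_index)
{
	return disk_index == 1 ? -1 : 0;
}
```

Waiting for all children even after a failure avoids leaving zombie
processes behind before the error is reported.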
Logan Gunthorpe [Wed, 1 Mar 2023 20:41:31 +0000 (13:41 -0700)]
Create: Factor out add_disks() helpers
The Create function is massive with a very large number of variables.
Reading and understanding the function is almost impossible. To help
with this, factor out the two pass loop that adds the disks to the array.
This moves about 160 lines into three new helper functions and removes
a bunch of local variables from the main Create function. The main new
helper function add_disks() does the two pass loop and calls into
add_disk_to_super() and update_metadata(). Factoring out the
latter two helpers also helps to reduce a ton of indentation.
Logan Gunthorpe [Wed, 1 Mar 2023 20:41:30 +0000 (13:41 -0700)]
Create: remove safe_mode_delay local variable
Every .getinfo_super() call sets the info.safe_mode_delay variable
to a constant value, so no matter what the current state is
that function will always set it to the same value.
Create() calls .getinfo_super() multiple times while creating the array.
The value is stored in a local variable for every disk in the loop
to add disks (so the last disk's call takes precedence). The local
variable is then used in the call to sysfs_set_safemode().
This can be simplified by using info.safe_mode_delay directly. The info
variable had .getinfo_super() called on it early in the function so, by the
reasoning above, it will have the same value as the local variable which
can thus be removed.
Doing this allows for factoring out code from Create() in a subsequent
patch.