Blazej Kucman [Wed, 11 Mar 2026 12:02:28 +0000 (13:02 +0100)]
imsm: fix incorrect SATA TCG error msg
The error message for SATA drives with Trusted Computing support referenced incorrect kernel
parameter "libata.tpm_enabled=1". The correct parameter name is "libata.allow_tpm=1".
This corrects the error message to display the proper parameter name, helping users correctly
configure their system when encryption verification is blocked.
Abirami0904 [Mon, 9 Mar 2026 14:13:52 +0000 (19:43 +0530)]
sysfs: fix uuid endianness mismatch issue in sysfs_rules_apply()
Issue:
sync_speed_max configured in mdadm.conf is not applied to sysfs due
to a UUID endianness mismatch in sysfs_rules_apply().
Observed behavior:
The sysfs parameter is applied via sysfs_set_str(), which is invoked
from sysfs_rules_apply(). Whether this function is called depends on
a memcmp() comparison between the UUIDs stored in
struct dev_sysfs_rule *rules and struct mdinfo *dev.
This comparison fails due to inconsistent byte ordering of the UUIDs,
and therefore sysfs_set_str() is not invoked to update sync_speed_max
in sysfs.
Proposed fix:
To resolve this issue, the same_uuid() logic, which compares UUIDs
independent of byte order, is used instead of a raw memcmp().
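A minimal sketch of a byte-order-insensitive compare in the spirit of same_uuid(); the helper names below are illustrative stand-ins, not mdadm's actual implementation:

```c
#include <stdint.h>
#include <string.h>

/* Swap the bytes within a 32-bit word. */
static uint32_t swab32(uint32_t x)
{
	return ((x & 0x000000ffU) << 24) |
	       ((x & 0x0000ff00U) <<  8) |
	       ((x & 0x00ff0000U) >>  8) |
	       ((x & 0xff000000U) >> 24);
}

/* Two 16-byte UUIDs match if they are equal either as-is or after
 * swapping the bytes within each 32-bit word (hypothetical helper,
 * modeled on the idea behind same_uuid()). */
static int uuid_matches(const int a[4], const int b[4])
{
	uint32_t swapped[4];
	int i;

	if (memcmp(a, b, 16) == 0)
		return 1;
	for (i = 0; i < 4; i++)
		swapped[i] = swab32((uint32_t)b[i]);
	return memcmp(a, swapped, 16) == 0;
}
```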
Martin Wilck [Wed, 4 Mar 2026 15:18:22 +0000 (16:18 +0100)]
mdcheck: don't stop mdcheck_continue.timer
The systemd timers mdcheck_start.timer and mdcheck_continue.timer are
configured to fire at 0:45 once a month and at 1:00 daily, respectively. But
it has been observed in the field that these timers fire simultaneously at
00:45. This is due to strange behavior of systemd timers: when a timer is
started after having "elapsed" while disarmed, the timer fires
immediately. A systemd issue will be created about this, but the problem is
not likely to be fixed in systemd soon.
Because the initial RAID check will probably last longer than 15 minutes,
it will not be finished when mdcheck_continue.timer fires regularly at
01:00, which will cause the timer to fire immediately again when
mdcheck_continue.service finishes. On systems with large arrays, this may
cause the initial check to run twice as long as usual, and thus extend well
into regular work hours, slowing down the system noticeably.
Avoid this by not stopping mdcheck_continue.timer in mdcheck any more.
With this change, mdcheck_continue.timer will wake up once a day and start
mdcheck even if there's nothing to check because all checks have
finished. mdcheck will notice this and exit immediately. When
mdcheck_start.timer fires, it does not need to start
mdcheck_continue.timer, because the latter was never stopped. This is what
avoids the problem.
Not stopping mdcheck_continue.timer causes a small amount of unnecessary
system activity, but the overhead is very low (on the order of a few
milliseconds per day on the systems where I have tested).
Jean Delvare [Mon, 23 Feb 2026 16:11:41 +0000 (17:11 +0100)]
platform-intel: Deal with hot-unplugged devices
Don't assume that realpath() will always succeed; if a device gets
hot-unplugged, realpath() can fail. Handle the situation
gracefully, and report an error only if the failure is for a
different reason.
Reported-by: Jochen De Smet <jochen.desmet@dell.com>
Co-developed-by: Jochen De Smet <jochen.desmet@dell.com>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Maxin John [Tue, 24 Feb 2026 22:26:42 +0000 (00:26 +0200)]
Makefile: detect corosync and libdlm via pkg-config
The Makefile currently checks for corosync and libdlm headers by probing
/usr/include directly. This breaks in cross-compilation environments
because the headers may be located in a sysroot rather than the host
filesystem.
Use pkg-config to detect the presence of corosync and libdlm instead.
Blazej Kucman [Wed, 28 Jan 2026 13:28:04 +0000 (14:28 +0100)]
imsm: Fix UEFI backward compatibility for RAID10D4
The referenced commit introduced an incorrect RAID level setting for RAID10
with 4 drives, which must remain backward compatible with the UEFI VROC
driver. For such a RAID, VROC UEFI requires the MPB_ATTRIB_RAID1 level
attribute in the metadata and RAID1 in the map. However, the mentioned
change causes RAID10 to be written instead, which VROC UEFI cannot handle
correctly.
As a result, RAID10 with 4 disks is no longer recognized by VROC UEFI since version 9.3. On earlier
versions the incorrect metadata may even cause a platform hang during the UEFI boot phase.
The update_imsm_raid_level() function handles both creation and migration flows. During RAID
creation, the function receives an initial map where the `level` variable is set to 0. This causes
the code path responsible for the R0 -> R10 migration to run.
To prevent the above behavior, a new define IMSM_T_LEVEL_UNKNOWN is introduced and used to
initialize the `level` variable in map during volume creation, ensuring that the migration path
is not entered.
lilinzhe [Thu, 12 Feb 2026 05:05:42 +0000 (13:05 +0800)]
super-ddf.c: the header search now uses mmap.
Commit f2197b6b6c14 ("super-ddf: optimize DDF header search for widely
used RAID controllers") introduces a heuristic search for DDF headers,
commonly written by RAID controllers, within the last 32MB of the disk.
However, the 4096-byte block size used for the search proved to be too
slow on network drives, leading to timeouts.
By modifying to use mmap, the responsibility of efficiently fetching
data is delegated to the kernel, thereby accelerating operations in a
remote disk environment.
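A hedged sketch of the approach, not the actual super-ddf.c code; the tail length, alignment handling, and function name are illustrative:

```c
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map the last region of a device/file read-only, letting the kernel
 * handle efficient page fetching instead of issuing many small reads.
 * 32MB mirrors the search window mentioned above; the constant is
 * illustrative, not the actual DDF code. */
#define TAIL_LEN (32 * 1024 * 1024UL)

static const void *map_tail(int fd, size_t *len_out)
{
	struct stat st;
	size_t len;
	off_t off;
	void *p;

	if (fstat(fd, &st) < 0 || st.st_size <= 0)
		return NULL;
	len = (size_t)st.st_size < TAIL_LEN ? (size_t)st.st_size : TAIL_LEN;
	off = st.st_size - (off_t)len;
	off -= off % sysconf(_SC_PAGESIZE);  /* mmap offset must be page aligned */
	len = (size_t)(st.st_size - off);
	p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);
	if (p == MAP_FAILED)
		return NULL;
	*len_out = len;
	return p;                            /* caller should munmap() */
}
```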
Martin Wilck [Thu, 12 Feb 2026 20:34:49 +0000 (21:34 +0100)]
mdadm.conf: add "PROBING ddf_extended" option
Add a configuration line PROBING to mdadm.conf. If the parameter
"ddf_extended" is set on this line, use the extended DDF header search
introduced by commit f2197b6b6c14 ("super-ddf: optimize DDF header search
for widely used RAID controllers"), at the cost of slower probing.
Otherwise, just check for the header in the last sector of the disk,
as usual.
Xiao Ni [Sun, 25 Jan 2026 08:50:58 +0000 (16:50 +0800)]
mdadm/incremental: set sysfs name after assembling imsm array
The sysfs name is not set after assembling an imsm array, so sysfs_uevent
can't send the change event. The raid device's state depends on the
genuine events from the kernel. If the kernel genuine event is sent
after udev_unblock, the raid can be ready on time and can then be
mounted correctly during boot. If the kernel genuine event is sent
before udev_unblock, the mount will fail during boot.
Xiao Ni [Wed, 4 Feb 2026 13:08:31 +0000 (21:08 +0800)]
mdadm/imsm: use creation_time for ctime in container info
When a disk has both DDF and IMSM metadata (e.g., migrated from DDF
to IMSM or has remnant DDF metadata), guess_super_type() selects the
metadata with the later creation time. Previously, IMSM always
returned ctime=0 in getinfo_super_imsm(), causing it to lose to DDF
which extracts a real timestamp from its GUID.
This resulted in the wrong metadata being selected during assembly,
leading to boot failures when LVM activated raw PVs instead of MD
devices.
Fix this by extracting the actual creation time from the IMSM
metadata structure (mpb->creation_time) instead of hardcoding 0.
This ensures that when both metadata types are present, the more
recent one is correctly selected.
Martin Wilck [Tue, 20 Jan 2026 17:00:11 +0000 (18:00 +0100)]
super-intel.c: fix format overflow error
The following compile error has been observed with gcc 16:
super-intel.c: In function 'imsm_process_update':
super-intel.c:10069:43: error: '%d' directive writing between 1 and 11 bytes into a region of size 7 [-Werror=format-overflow=]
10069 | " MISSING_%d", du->index);
| ^~
In function 'apply_takeover_update',
inlined from 'imsm_process_update' at super-intel.c:10168:7:
super-intel.c:10069:33: note: directive argument in the range [-2147483647, 2147483647]
10069 | " MISSING_%d", du->index);
| ^~~~~~~~~~~~~
In file included from /usr/include/stdio.h:970,
from mdadm.h:42,
from super-intel.c:21:
In function 'sprintf',
inlined from 'apply_takeover_update' at super-intel.c:10068:4,
inlined from 'imsm_process_update' at super-intel.c:10168:7:
/usr/include/bits/stdio2.h:30:10: note: '__builtin_sprintf' output between 11 and 21 bytes into a destination of size 16
30 | return __builtin___sprintf_chk (__s, __USE_FORTIFY_LEVEL - 1,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
31 | __glibc_objsize (__s), __fmt,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
32 | __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~
Fix it by using an unsigned type for the index variable. This works
because map->num_members is an unsigned short.
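A minimal illustration of why the unsigned type silences the warning: with the index bounded by an unsigned short, " MISSING_%u" needs at most 15 bytes including the terminator, which fits a 16-byte buffer. The helper below is a hypothetical stand-in, not mdadm's code:

```c
#include <stdio.h>

/* With a signed int, gcc assumes %d may print up to 11 characters
 * ("-2147483648") and warns about overflowing a 16-byte buffer.
 * An unsigned index bounded by unsigned short prints at most 5 digits. */
static void format_missing(char *buf, size_t size, unsigned int index)
{
	snprintf(buf, size, " MISSING_%u", index);
}
```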
Suggested-by: Mariusz Tkaczyk <mtkaczyk@kernel.org>
Signed-off-by: Martin Wilck <mwilck@suse.com>
Martin Wilck [Tue, 20 Jan 2026 16:58:02 +0000 (17:58 +0100)]
mdadm: fix compilation errors for unused variables with GCC 16
In upcoming gcc 16, increments and assignments don't count as "uses",
causing errors like this:
Assemble.c: In function 'Assemble':
Assemble.c:1379:55: error: variable 'replcnt' set but not used [-Werror=unused-but-set-variable=]
1379 | unsigned int okcnt, sparecnt, rebuilding_cnt, replcnt, journalcnt;
| ^~~~~~~
See the porting-to guide for more details: https://gcc.gnu.org/gcc-16/porting_to.html
Xiao Ni [Sun, 4 Jan 2026 07:18:48 +0000 (15:18 +0800)]
mdadm: load md_mod first
Load md_mod before setting the module parameter legacy_async_del_gendisk.
Everything works well if md_mod is built into the kernel. If not, create and
assemble will fail.
Fixes: d354d314db86 ("mdadm: Create array with sync del gendisk mode")
Signed-off-by: Xiao Ni <xni@redhat.com>
The kernel block branch has started supporting the configuration of the
logical block size after merging commit 62ed1b582246 ("md: allow configuring
logical block size"), therefore a new parameter should be added to allow
specifying the logical block size when creating a RAID device.
Martin Wilck [Tue, 4 Nov 2025 16:03:23 +0000 (17:03 +0100)]
Makefile: install mdcheck and make its path configurable
The mdcheck script is called by the systemd units, but it is currently not
installed. Fix it, using the make variable MISCDIR as target directory for
the installation of mdcheck (defaults to current location
/usr/share/mdadm).
Also, make sure that mdcheck calls the mdadm executable that we installed.
Martin Wilck [Thu, 30 Oct 2025 10:31:20 +0000 (11:31 +0100)]
GitHub workflows: mdadm-test: new workflow for mdadm test suite
Since 3822896 ("github: disable self-runners"), the mdadm test suite
is generally disabled on GitHub. Introduce a new workflow which
executes on a standard GitHub-hosted runner.
Currently some tests fail. Fixing them is out of scope for this
patch. The main reason is that the kernel of the GitHub-hosted runners is
too old.
The workflow is configured such that it will only run when dispatched
manually, or on a push or PR against a branch named 'test_on_push' or
'test_on_pr', respectively.
Martin Wilck [Thu, 25 Sep 2025 21:09:07 +0000 (23:09 +0200)]
mdcheck: restore backward-compatible behavior without arguments
The previous patch changed the behavior of mdcheck when invoked
without --continue or --restart. Previously it would have started
a new check from zero on all arrays. After the previous patch,
it acted like --continue on arrays where a check had been started already,
and started a new check only for arrays where no Checked_* marker
was present.
Introduce a new run mode --start for the latter behavior. --start is the
mode which will be called by mdcheck_continue.service now.
For backward compatibility reasons, when mdcheck is called without a
mode argument (--continue, --start, or --restart), it starts a new
check from position zero on all MD arrays detected on the system.
Martin Wilck [Thu, 14 Aug 2025 15:09:35 +0000 (17:09 +0200)]
mdcheck: simplify start / continue logic and add "--restart" logic
The current logic of "mdcheck" is susceptible to races when multiple
mdcheck instances run simultaneously, as checks can be initiated from both
"mdcheck_start.service" and "mdcheck_continue.service".
The previous commit 8aa4ea95db35 ("systemd: start mdcheck_continue.timer
before mdcheck_start.timer") fixed this for the default configuration by
inverting the ordering of timers. But users can customize the timer
settings, which can cause the race to reappear.
This patch avoids this kind of race entirely, by changing the logic as
follows:
* When `mdcheck` has finished checking a RAID array, it will create a
marker `/var/lib/mdcheck/Checked_$UUID`.
* A new option `--restart` is introduced. `mdcheck --restart` removes all
`/var/lib/mdcheck/Checked_*` markers.
This is called from `mdcheck_start.service`, which is typically started
by a timer in large time intervals (default once per month).
* `mdcheck --continue` works as it used to. It continues previously started
checks (where the `/var/lib/mdcheck/MD_UUID_$UUID` file is present and
contains a start position for the check).
This usage is *not recommended any more*.
* `mdcheck` with no arguments is like `--continue`, but it also starts new
checks for all arrays for which no check has previously been
started, *except* for arrays for which a marker
`/var/lib/mdcheck/Checked_$UUID` exists.
`mdcheck_continue.service` calls `mdcheck` this way. It is called in
short time intervals, by default once per day.
* Combining `--restart` and `--continue` is an error.
This way, the only systemd service that actually triggers a kernel-level
RAID check is `mdcheck_continue.service`, which avoids races.
When all checks have finished, `mdcheck_continue.service` is a no-op.
When `mdcheck_start.service` runs, the checks are re-enabled and will be
started from 0 by the next `mdcheck_continue.service` invocation.
Martin Wilck [Mon, 27 Oct 2025 16:59:50 +0000 (17:59 +0100)]
mdcheck: remove short options from getopt command line
The short options for mdcheck have never worked. The current code silently
assumes that e.g. "-d $X" will translate to "--duration $X" on return from
getopt, which is not the case.
Remove the short options. Attempts to use them will now result in an error.
Martin Wilck [Thu, 14 Aug 2025 14:09:25 +0000 (16:09 +0200)]
mdcheck: replace deprecated "$[cnt+1]" syntax
This syntax used to be marked as deprecated [1]. In current bash
man pages, it isn't even mentioned any more. Use the POSIX compatible
syntax "$((X+=1))" instead [2, 3].
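A small sketch contrasting the two forms (the obsolete one is shown only in a comment, since some shells no longer document or accept it):

```shell
# Obsolete, removed by the patch:   cnt=$[cnt+1]
# POSIX-compatible replacements:
cnt=0
cnt=$((cnt + 1))    # arithmetic expansion
: "$((cnt += 1))"   # in-place increment inside arithmetic expansion
echo "$cnt"
```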
Martin Wilck [Thu, 14 Aug 2025 13:55:40 +0000 (15:55 +0200)]
mdcheck: loop over sync_action files in sysfs
The way mdcheck is currently written, it loops over /dev/md?*, which
will contain RAID partitions and arrays that don't support sync
operations, such as RAID0. This is inefficient and makes the script
difficult to trace.
Instead, loop over the sync_action files which actually matter for
checking.
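A hedged sketch of such a loop; the paths follow the usual sysfs layout for md devices, and the details differ from the actual mdcheck code:

```shell
# Visit only arrays that actually expose a sync_action file, instead of
# globbing /dev/md?* (which also matches partitions and RAID0 arrays).
for action_file in /sys/block/md*/md/sync_action; do
	[ -f "$action_file" ] || continue   # glob may match nothing
	dev=${action_file%/md/sync_action}  # strip trailing path
	dev=${dev##*/}                      # keep the mdX device name
	echo "$dev: $(cat "$action_file")"
done
```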
Xiao Ni [Mon, 27 Oct 2025 00:15:38 +0000 (08:15 +0800)]
mdadm/Assemble: alloc superblock in Assemble
Currently the superblock is allocated outside Assemble and the memory is
freed outside Assemble. But the memory can be freed and reallocated inside
Assemble, so freed memory may be dereferenced outside Assemble. This patch
moves the memory management into Assemble, which is safer and reduces the
number of input arguments.
This can be reproduced by:
mdadm -CR /dev/md0 -l1 -n2 /dev/loop0 /dev/loop1 --assume-clean
mdadm -Ss
mdadm -A -e 1.2 /dev/md0 /dev/loop0 /dev/loop1
Xiao Ni [Fri, 24 Oct 2025 07:17:29 +0000 (15:17 +0800)]
mdadm: Create array with sync del gendisk mode
kernel patch 9e59d609763f ('md: call del_gendisk in control path') calls
del_gendisk in a synchronous way. After the patch mentioned just now, the
device node (e.g. /dev/md0) disappears after the mdadm --stop command. It
resolves the problem that the raid can be created again, because a raid can
be created when its device node is opened. That interrupted regression tests.
But it causes an error when assembling an array, which has been fixed by
pr182. So people can't assemble an array if they use a new kernel with an
old mdadm. So in kernel space, 25db5f284fb8 ('md: add
legacy_async_del_gendisk mod') is used to fix this problem. The default is
async mode.
The async del mode will be removed in the future. We'll start using sync del
mode in the new mdadm version, so people will not see failures when
upgrading to the new mdadm version with sync del mode.
Xiao Ni [Fri, 17 Oct 2025 09:04:25 +0000 (17:04 +0800)]
mdadm/Incremental: wait a while before removing a member
We encountered a regression where a member disk can't be removed in
incremental remove mode:
mdadm -If /dev/loop0
mdadm: Cannot remove member device loop0 from md127
Removing a member is not allowed while the sync thread is running. mdadm -If
sets the member disk faulty first, then removes the disk. If the sync thread
is running, it will be interrupted by setting a member faulty. But the sync
thread hasn't been reaped yet, so we need to wait a while to let the kernel
reap the sync thread.
Wu Guanghao [Tue, 14 Oct 2025 03:29:37 +0000 (11:29 +0800)]
mdadm: Fix memory leak issue in Manage_stop()
The local variable 'mds' allocated in Manage_stop() is only released
under specific conditions in the for loop. This can lead to a memory leak
under other conditions.
Wu Guanghao [Tue, 14 Oct 2025 02:23:46 +0000 (10:23 +0800)]
mdadm: modify the order of free_super_xxx() to avoid memory leak
free_super_xxx() should be executed at the load_super_xxx() entry.
When there are additional checks, the supertype may not be
released, resulting in a memory leak.
According to the manual:
On success, udev_monitor_receive_device() returns a pointer to a newly
referenced device that was received via the monitor. The caller is
responsible to drop this reference when done.
Mingye Wang [Sat, 20 Sep 2025 06:21:40 +0000 (14:21 +0800)]
Update raid6check man page
This adds the autorepair and manual repair modes, which have been available for about 12 years. The description of the manual repair mode can probably use more work.
Signed-off-by: Mingye Wang <arthur200126@gmail.com>
Since commit e702f392959d ("Mdmonitor: Fix segfault"), when configuration
files used non-absolute ARRAY device names, commands like `mdadm --monitor
--scan` failed with `mdadm: error opening devname: No such file or
directory` unless run from the `/dev/md` directory.
Martin Wilck [Wed, 13 Aug 2025 20:12:53 +0000 (22:12 +0200)]
systemd: start mdcheck_continue.timer before mdcheck_start.timer
In the (unlikely but possible) case that a previously started md check
hasn't finished on the first Sunday of the following month,
mdcheck_start.service will start the scan from position 0, which is
probably not desired.
Have mdcheck_continue.service start first, so that it will pick up the
check where it left off, and that the subsequent mdcheck_start.service will
do nothing.
Martin Wilck [Wed, 13 Aug 2025 19:07:36 +0000 (21:07 +0200)]
mdcheck: make sure signals are processed immediately
"systemctl stop mdcheck_start.service" may hang for a long time,
because the shell doesn't handle signals until the sleep process in
the foreground returns. Fix this by starting sleep in the background
and waiting for it (the built-in "wait" receives the signal).
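A hedged sketch of the pattern; the helper name is illustrative, not the actual mdcheck code:

```shell
# Run sleep in the background and "wait" for it: a signal delivered to
# the script interrupts the built-in "wait" immediately, instead of
# being deferred until a foreground sleep returns.
interruptible_sleep() {
	sleep "$1" &
	_sleep_pid=$!
	# On termination, reap the background sleep so it doesn't linger.
	trap 'kill "$_sleep_pid" 2>/dev/null' TERM INT
	wait "$_sleep_pid"
}
```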
Martin Wilck [Wed, 13 Aug 2025 19:01:30 +0000 (21:01 +0200)]
mdcheck: reset sync_action to "idle" when stopped
When the mdcheck script stops because the pre-set duration is exceeded, it
will also set the sync action in the kernel to "idle". But when it is
stopped by a signal (e.g. when the systemd service running it is stopped),
it doesn't. This is inconsistent behavior.
Move the code that switches the sync_action to "idle" into a cleanup
function that is always executed on exit. This requires separate "trap"
statements for EXIT(0) and signals, because otherwise a race condition may
arise between the cleanup code and the script body.
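A hedged sketch of the trap arrangement, using a temp file as a stand-in for the sysfs sync_action file; the real mdcheck code differs in detail:

```shell
# One cleanup function; an EXIT trap for normal termination and separate
# traps for signals, so sync_action is reset to "idle" in both cases.
SYNC_ACTION_FILE=$(mktemp)   # stand-in for /sys/block/mdX/md/sync_action

cleanup() {
	trap - EXIT HUP INT TERM   # ensure cleanup runs only once
	echo idle > "$SYNC_ACTION_FILE"
}
trap cleanup EXIT
trap 'cleanup; exit 1' HUP INT TERM
```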
Martin Wilck [Wed, 13 Aug 2025 17:28:08 +0000 (19:28 +0200)]
systemd: use "Type=simple" for mdcheck services
"Type=oneshot" means that systemd considers the unit as started when the
started process exits. But the "mdcheck" script may run for several
hours. Thus systemd will regard the unit as "activating" all the
time. This can be easily tested by running "systemctl start
mdcheck_start.service" manually. The systemctl command will not finish
until the mdcheck utility has finished or Ctrl-C is typed, which is
broken.
Mariusz Tkaczyk [Thu, 26 Jun 2025 05:29:37 +0000 (06:29 +0100)]
mdadm: use lseek consistently
mdadm used both lseek and lseek64 for legacy reasons. These days, we just
need to define the __USE_LARGEFILE64 macro. Fixing this issue enables
musl compilation.
Add macro, and change all lseek64 to lseek. Fix style issues in these
lines.
Xiao Ni [Tue, 3 Jun 2025 00:49:29 +0000 (08:49 +0800)]
mdadm/assemble: Don't stop array after creating it
It stops the array that was just created. From the comment, the intent is to
stop the array if it has no content. But member disks haven't been added
yet, so it's a clean array, and stopping it is pointless.
Neil Brown pointed out in #159 that mdadm should be kept in the base-utility
style, allowing much more with no strict limitations until absolutely
necessary to prevent crashes.
This view, supported by regression #160, caused by the POSIX portable
character set requirement, leads me to revert it.
Revert the POSIX portable character set verification of name and
devname. Make it IMSM only.
Mounting an md device may fail during boot because mdadm's claim on
the device is not released before systemd attempts to mount it.
In this case it was found that there is essentially a race condition
in which the mount cannot happen without some kind of delay
being added BEFORE the mount itself triggers, or manual intervention
after a timeout.
The findings:
The inode was for a tmp block node made by mdadm for md0.
For the race condition, mdadm and udev have some infrastructure for making
the device be ignored while under construction. e.g.
$ cat lib/udev/rules.d/01-md-raid-creating.rules
do not edit this file, it will be overwritten on update
While mdadm is creating an array, it creates a file
/run/mdadm/creating-mdXXX. If that file exists, then
the array is not "ready" and we should make sure the
content is ignored.
KERNEL=="md*", TEST=="/run/mdadm/creating-$kernel", ENV{SYSTEMD_READY}="0"
However, this feature currently is only used by the mdadm create command.
See calls to udev_block/udev_unblock in the mdadm code as to where and when
this behavior is used. Any md array being started by incremental or
normal assemble commands does not use this udev integration. So assembly
of an existing array does not look to have any explicit protection from
systemd/udev seeing an array as in a usable state before an mdadm instance
with O_EXCL closes its file handle.
This is for the sake of showing the use case for such an option and why
it would be helpful to delay the mount itself.
While mdadm is still constructing the array via mdadm --incremental,
called from within /usr/lib/udev/rules.d/64-md-raid-assembly.rules,
there is an attempt to mount the md device, but in incremental mode
the "/run/mdadm/creating-xxx" file the rule is looking for is not
created. Therefore the device is not marked as SYSTEMD_READY=0 by
"/usr/lib/udev/rules.d/01-md-raid-creating.rules", and the
synchronization using the "/run/mdadm/creating-xxx" file is missing.
As to this change affecting containers or IMSM...
(container's array state is inactive all the time)
Even if "array_state" reports "inactive" while the previous components
are added, the mdadm call for the very last array component, which makes
the array usable/ready, still needs to be synced properly - mdadm needs
to drop the claim first by calling "close", then delete
"/run/mdadm/creating-xxx".
Then let udev know it is clear to act now (the "udev_unblock" in the
mdadm code generates a synthetic udev event so the rules are
re-evaluated). It's this processing of the very last array component
that is the issue here: not an IO error, but that trying to open the
device returns -EBUSY because of the exclusive claim that mdadm still
holds while the md device is already being processed by udev in
parallel, which is exactly what /run/mdadm/creating-xxx should prevent.
The patch to Incremental.c is to enable creating the
"/run/mdadm/creating-xxx" file during incremental mode.
For the change to Create.c, the unlink is called right before dropping
the exclusive claim on the device. This should be the other way round
to avoid the race 100%. That is, if there's a "close" call and a
"udev_unblock" call, the "close" should go first, followed by
"udev_unblock".
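A hedged sketch of the ordering argued for above; udev_unblock() is stubbed here, and the function is an illustrative stand-in rather than mdadm's actual code:

```c
#include <unistd.h>

/* Stand-in stub: in mdadm the real udev_unblock() generates a
 * synthetic udev event so the rules are re-evaluated. */
static void udev_unblock(void)
{
}

/* Finish assembling an array in the race-free order:
 * 1. close() drops the exclusive O_EXCL claim,
 * 2. unlink() removes the /run/mdadm/creating-xxx marker,
 * 3. udev_unblock() lets udev re-evaluate the rules. */
static void finish_array(int fd, const char *marker)
{
	close(fd);
	unlink(marker);
	udev_unblock();
}
```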
Xiao Ni [Thu, 8 May 2025 04:45:32 +0000 (12:45 +0800)]
mdadm/tests: mark 10ddf-fail-two-spares broken
Sometimes 10ddf-fail-two-spares fails because:
++ grep -q 'state\[1\] : Optimal, Consistent' /tmp/mdtest-5k3MzO
++ echo ERROR: /dev/md/vol1 should be optimal in meta data
ERROR: /dev/md/vol1 should be optimal in meta data
Xiao Ni [Thu, 8 May 2025 03:45:50 +0000 (11:45 +0800)]
mdadm: give more time to wait for the sync thread to be reaped
The 01r5fail case sometimes reports an error:
++ '[' -n '2248 / 35840' ']'
++ die 'resync or recovery is happening!'
++ echo -e '\n\tERROR: resync or recovery is happening! \n'
ERROR: resync or recovery is happening!
The sync thread is reaped in md_thread, so we need to give it more time
to be reaped.
Xiao Ni [Wed, 7 May 2025 10:34:20 +0000 (18:34 +0800)]
mdadm: add attribute nonstring for signature
It reports a build error in f42:
error: initializer-string for array of ‘unsigned char’ truncates NULL
terminator but destination lacks ‘nonstring’ attribute (5 chars into 4
available) [-Werror=unterminated-string-initialization]
Xiao Ni [Wed, 7 May 2025 10:26:08 +0000 (18:26 +0800)]
mdadm: fix building errors
Some build errors are found on the ppc64le platform:
format '%llu' expects argument of type 'long long unsigned int', but
argument 3 has type 'long unsigned int' [-Werror=format=]
Xiao Ni [Wed, 7 May 2025 10:06:59 +0000 (18:06 +0800)]
mdadm: use standard libc nftw
commit bd648e3bec3d ("mdadm: Remove klibc and uclibc support") removes the
macros HAVE_NFTW/HAVE_FTW and uses the libc header ftw.h. But it leaves
code in lib.c that makes the mdadm command call the nftw defined in lib.c.
This code needs to be removed.
The bug can be reproduced by:
mdadm -CR /dev/md0 --level raid5 --metadata=1.1 --chunk=32 --raid-disks 3
--size 10000 /dev/loop1 /dev/loop2 /dev/loop3
mdadm /dev/md0 --grow --chunk=64
mdadm: /dev/md0: cannot open component -unknown-
Fixes: bd648e3bec3d ("mdadm: Remove klibc and uclibc support")
Signed-off-by: Xiao Ni <xni@redhat.com>
Martin Wilck [Wed, 30 Apr 2025 19:18:36 +0000 (21:18 +0200)]
mdadm: allow any valid minor number in md device name
Since 25aa732 ("mdadm: numbered names verification"), it is no longer
possible to create arrays /dev/md${N} with N >= 127. The limit was later
increased to 1024, which is also artificial. The error message printed
by mdadm is misleading, as the problem here is not POSIX compatibility.
# mdadm -C -v /dev/md9999 --name=foo -l1 -n2 /dev/loop0 /dev/loop1
mdadm: Value "/dev/md9999" cannot be set as devname. Reason: Not POSIX compatible.
Given that mdadm creates an array with minor number ${N} if the argument is
/dev/md${N}, the natural limit for the number is the highest minor number
available, which is (1 << MINORBITS) with MINORBITS=20 on Linux.
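A minimal sketch of the resulting bound, assuming the valid range is 0 .. (1 << MINORBITS) - 1; the helper name is hypothetical, not mdadm's actual check:

```c
/* Linux packs device minors into MINORBITS bits. */
#define MINORBITS 20

/* A candidate md minor number is valid if it fits in MINORBITS bits
 * (hypothetical helper illustrating the new bound). */
static int md_minor_is_valid(long n)
{
	return n >= 0 && n < (1L << MINORBITS);
}
```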
Fixes: 25aa732 ("mdadm: numbered names verification")
Fixes: f786072 ("mdadm: Increase number limit in md device name to 1024.")
Signed-off-by: Martin Wilck <mwilck@suse.com>
Mariusz Tkaczyk [Mon, 10 Mar 2025 10:16:28 +0000 (11:16 +0100)]
mdadm: use kernel raid headers
For years we have redefined these headers in mdadm. We should reuse the
headers exported by the kernel to integrate the driver and mdadm better.
Include them and remove the mdadm-owned headers.
There are 3 defines not available in the kernel headers, so define them
directly, but put them in an ifndef guard to make them transparent later.
Use MD_FEATURE_CLUSTERED instead of MD_FEATURE_BITMAP_VERSIONED. The
value is the same; the kernel define has a different name.
Mariusz Tkaczyk [Fri, 7 Mar 2025 10:38:48 +0000 (11:38 +0100)]
mdadm: Remove klibc and uclibc support
Klibc compilation has not been working for at least 3 years because of the
following error:
mdadm.h:1912:15: error: unknown type name 'sighandler_t'
It would also conflict with the le/be_to_cpu() function family provided by
asm/byteorder.h, which will be included with raid/md_p.h. Therefore we
need to remove support for it. Also remove uclibc, because it is not
actively maintained.
Remove klibc and uclibc targets from Makefile and special klibc code.
Targets can be removed safely because using CC is recommended.
Allow RAID0 to be created with v0.90 metadata #161
It is not currently possible to create a RAID0 with 0.90 metadata.
This is because 0.90 cannot specify the layout of RAID0 (it is
traditionally ignored) and different kernels do different things with
RAID0 layouts.
However it should be possible to use --layout=dangerous as that
acknowledges the risk.
It also should be possible to create a RAID0 with all devices the same
size because in that case all layouts are identical.
The metadata handler can only check that all devices are the same size
quite late - in write_init_super(). By that time the default is
already set - set to a value that super0 cannot handle.
So this patch delays the setting of the default value and leaves it to
the metadata handler (or the Build handler).
super1 selects ORIG in that case.
intel and ddf don't support non-uniform RAID0 so they don't need any
change.
super0 now checks the sizes of devices if the default RAID0 layout was
requested and rejects the request if they are not the same.
validate_geometry0 now allows "dangerous" layouts for raid0.
Blazej Kucman [Mon, 31 Mar 2025 10:46:52 +0000 (12:46 +0200)]
imsm: Fix RAID0 to RAID10 migration
Support for RAID10 with more than 4 disks in IMSM introduced an
inconsistency between the VROC UEFI driver and Linux IMSM. VROC UEFI
does not support RAID10 with more than 4 disks, therefore appropriate
protections were added to the mdadm IMSM code that result in skipping
the processing of such a RAID in the UEFI phase. Unfortunately, the case
of migrating a 2-disk RAID0 to a 4-disk RAID10 was omitted; this case
requires maintaining compatibility with the VROC UEFI driver because it
is supported.
For RAID10 with more than 4 disks, the MPB_ATTRIB_RAID10_EXT attribute
is set in the metadata, thanks to which the UEFI driver does not process
such a RAID.
In the series adding support, a new metadata raid level value
IMSM_T_RAID10 was also introduced. It is not recognized by VROC UEFI.
The issue is caused by the fact that, in the case of the mentioned
migration, IMSM_T_RAID10 is entered into the metadata but the
MPB_ATTRIB_RAID10_EXT attribute is not, which causes an attempt to
process such a RAID in the UEFI phase. This situation results in a
platform hang during booting in the UEFI phase, and also results in
data loss after failed and interrupted RAID processing in VROC UEFI.
The above situation is a result of the update_imsm_raid_level()
function: for the mentioned migration, the function is executed on a map
with a not yet updated number of disks.
The fix is to explicitly handle migration in the function mentioned
above to maintain compatibility with the VROC UEFI driver.
Wu Guanghao [Tue, 11 Mar 2025 03:11:55 +0000 (03:11 +0000)]
super1: Clear extra flags when initializing metadata
When adding a disk to a RAID1 array, the metadata is read from the
existing member disks for sync. However, only the bad_blocks flag is
copied; the bad_blocks records are not, so the bad_blocks records are
all zeros. The kernel function super_1_load() detects the bad_blocks
flag and reads the bad_blocks record, then sets the bad blocks using
badblocks_set().
After kernel commit 1726c7746783 ("badblocks: improve badblocks_set()
for multiple ranges handling"), if the length of a bad_blocks record is
0, it will return a failure. Therefore the device addition will fail.
So when adding a new disk, some flags cannot be synced and need to be
cleared.
Junxiao Bi [Tue, 18 Feb 2025 18:48:31 +0000 (10:48 -0800)]
mdmon: imsm: fix metadata corruption when managing new array
When the manager thread detects a new array, it invokes manage_new().
For an imsm array, this further invokes imsm_open_new(). Since
commit bbab0940fa75 ("imsm: write bad block log on metadata sync"),
it preallocates the bad block log when opening the array, which requires
increasing the mpb buffer size.
For that, imsm_open_new() invokes function imsm_update_metadata_locally(),
which first uses imsm_prepare_update() to allocate a larger mpb buffer
and store it at "mpb->next_buf", and then invoke imsm_process_update()
to copy the content from current mpb buffer "mpb->buf" to "mpb->next_buf",
and then free the current mpb buffer and set the new buffer as current.
There is a small race window: when the monitor thread is syncing
metadata, it gets the current buffer pointer in
imsm_sync_metadata()->write_super_imsm(), but before it flushes the
buffer to disk, the manager thread performs the buffer switch described
above, freeing the current buffer. The monitor thread then runs into a
use-after-free and could cause on-disk metadata corruption.
If system keeps running, further metadata update could fix the corruption,
because after switching buffer, the new buffer will contain good metadata,
but if panic/power cycle happens while disk metadata is corrupted,
the system will run into bootup failure if array is used as root,
otherwise the array can not be assembled after boot if not used as root.
This issue will not happen for an imsm array with only one member
array, because the member array has not been opened yet and the monitor
thread will not do any metadata updates.
This can happen for an imsm array with at least two member arrays, in
the following two scenarios:
1. Restarting the mdmon process with at least two member arrays.
This happens during system boot or when the user restarts mdmon after an
mdadm upgrade.
2. Adding a new member array to an existing imsm array with at least one
member array.
To fix this, delay the buffer-switching operation to the monitor thread.
Fixes: bbab0940fa75 ("imsm: write bad block log on metadata sync")
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
lilinzhe [Mon, 16 Dec 2024 04:11:41 +0000 (12:11 +0800)]
super-ddf: optimize DDF header search for widely used RAID controllers
Implemented fallback logic to search the last 32MB of the device
for the DDF header (magic). If found, proceeds to load the DDF metadata
from the located position.
When clearing metadata as required by the mdadm --zero (function Kill),
also erase the last 32MB of data; otherwise, it may result in an
infinite loop.
According to the specification, the Anchor Header should be placed at
the end of the disk. However, some widely used RAID hardware, such as
LSI and PERC controllers, does not position it within the last 512 bytes
of the disk.
lilinzhe [Mon, 16 Dec 2024 04:00:02 +0000 (12:00 +0800)]
super-ddf: Prevent crash when handling DDF metadata
A dummy function is defined because the availability of ss->update_super
is not always verified before it is called.
This fix addresses a crash reported when assembling a RAID array using
mdadm with DDF metadata. For more details, see the discussion at:
https://lore.kernel.org/all/CALHdMH30LuxR4tz9jP2ykDaDJtZ3P7L3LrZ+9e4Fq=Q6NwSM=Q@mail.gmail.com/
The discussion centers on an issue with mdadm where attempting to
assemble a RAID array caused a null pointer dereference. The problem
was traced to a missing update_super() function in super-ddf.c, which
led to a crash in Assemble.c.
Ross Lagerwall [Wed, 29 Jan 2025 13:31:11 +0000 (13:31 +0000)]
platform-intel: Disable legacy option ROM scan on UEFI machines
The legacy option ROM memory range from 0xc0000-0xeffff is not defined
on UEFI machines so don't attempt to scan it. This avoids lockdown log
spam when Secure Boot is enabled (avoids use of /dev/mem).
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Yu Kuai [Fri, 27 Dec 2024 06:07:02 +0000 (14:07 +0800)]
mdadm: fix --grow with --add for linear
For the case of mdadm --grow with --add, s.btype has not been
initialized yet, hence BitmapUnknown should be checked instead of
BitmapNone.
Note that this behaviour is only supported by md-linear, which was
removed from the kernel; however, it turns out md-linear is widely used
in home NAS devices and we're planning to reintroduce it soon.
udev: persist properties of MD devices after switch_root
dracut installs in the initrd a custom udev rule for MD devices
(59-persistent-storage-md.rules) only to set the db_persist option (see
[1]). The main purpose is that if an MD device is activated in the initrd,
its properties are kept in the udev database after the transition from the
initrd to the rootfs. This was added to fix detection issues when LVM is
on top.
This patch allows the custom udev rule shipped by dracut to be removed
(63-md-raid-arrays.rules is already installed in the initrd), and it
will also benefit other initrd generators that do not want to create
custom udev rules.
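For illustration, the dracut rule in question is roughly of this shape
(paraphrased, not copied verbatim from dracut):

```
# keep udev db entries for MD devices across the initrd -> rootfs switch
SUBSYSTEM=="block", KERNEL=="md*", ACTION=="add|change", OPTIONS+="db_persist"
```

OPTIONS+="db_persist" is the udev mechanism that marks a device's
database entry to survive the initrd-to-rootfs transition; shipping the
equivalent in mdadm's own rules makes the dracut copy redundant.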
Coly Li [Wed, 22 Jan 2025 15:18:59 +0000 (23:18 +0800)]
mdopen: add sbin path to env PATH when call system("modprobe md_mod")
During the boot process, if mdadm is called in a udev context, sbin
paths like /sbin, /usr/sbin and /usr/local/sbin are normally not present
in the PATH environment variable, so calling system("modprobe md_mod")
in create_named_array() may fail with an 'sh: modprobe: command not
found' error message.
We don't want to move the modprobe binary into a udev private directory,
so setting the PATH environment variable is a more appropriate way to
avoid the above issue.
This patch sets the PATH environment variable to
"/sbin:/usr/sbin:/usr/local/sbin" before calling
system("modprobe md_mod"). The change only takes effect within the udev
worker context and is not seen by the global udev environment.
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Mariusz Tkaczyk <mtkaczyk@kernel.org>