- add imsm_chunk_ops struct for better code readability,
- move chunk size mapping to string into array,
- add function to print supported chunk sizes by IMSM controller.
To avoid repeating mdstat_read() in IncrementalRemove(), a new function,
mdstat_find_by_member_name(), is proposed. With it, IncrementalRemove()
handles its own copy of the mdstat content and there is no need to read
it again for an external stop.
Additionally, a few helpers are proposed to avoid repeating
mdstat_ent->metadata_version checks across the code.
It is essential to avoid vulnerabilities in code as much
as possible using safe compilation flags. It is easier if
they are added to the Makefile and applied during compilation.
Add new gcc flags and make them configurable, because they
may not be supported by some compilers.
Set FORTIFY_SOURCE to the highest value supported by the platform.
super0: use define for char array in examine_super0
Using nb with a length of 11 may cause format-truncation errors,
because snprintf could be given a 12-character input to write into
an 11-character output. Add a new define and use it to avoid this
error.
According to SCSI Primary Commands - 4 (SPC-4), only the first 7 bits
of the first byte of sense data store the response code. The current
verification compares all 8 bits against the response code.
Incorrect verification may make it impossible to use SATA disks with IMSM,
because IMSM requires verification of the encryption state before use.
There was an issue in kernel libata [1]. That issue hid the bug in mdadm
because the last bit was not set.
Example output with affected mdadm:
Port3 : /dev/sde (BTPR212503EK120LGN)
mdadm: Failed ata passthrough12 ioctl. Device: /dev/sde.
mdadm: Failed to get drive encryption information
The fix is to use only the first 7 bits of byte 0 when comparing with
the expected values.
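A minimal sketch of the masking described above, not the actual mdadm
code. SENSE_RESPONSE_CODE_MASK and sense_response_code() are hypothetical
names used only for illustration.

```c
#include <stdint.h>

/* SPC-4: bits 0..6 of byte 0 carry the response code; bit 7 is separate. */
#define SENSE_RESPONSE_CODE_MASK 0x7F

/* Hypothetical helper: extract the response code from raw sense data. */
static uint8_t sense_response_code(const uint8_t *sense)
{
	return sense[0] & SENSE_RESPONSE_CODE_MASK;
}
```

With the mask applied, a first byte of 0xF2 and a first byte of 0x72 give
the same response code, so the comparison no longer depends on the top bit.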
The mentioned commit (see Fixes) caused devices with a UUID equal to
uuid_zero not to be recognized properly. When there were several such
devices, the first one was always taken and the same information was
printed. This caused a regression: when several containers were created,
symlinks were generated only for the first one.
Check whether the uuid is uuid_zero and, if so, use devname to
differentiate devices.
The GH action uses the checkout plugin, which takes fetch-depth
as a parameter specifying the number of commits to fetch. Set it
to 0 to fetch the whole history of all branches and tags.
Do not allow '.' as the first character of a named MD device.
A leading dot might be confusing because an MD device cannot be hidden.
This also removes the possibility of creating an md device named '.'.
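The rule can be sketched as a single check on the first character. This
is an illustration only; md_name_has_leading_dot() is a hypothetical
name, not the function used in mdadm.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true for names starting with '.', including ".". */
static bool md_name_has_leading_dot(const char *name)
{
	return name != NULL && name[0] == '.';
}
```

A dot elsewhere in the name, as in "vol.1", is not rejected by this check.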
Fixing the following coding errors the coverity tools found:
* Event parameter_hidden: declaration hides parameter "dv".
* Event leaked_storage: Variable "mdi" going out of scope leaks the storage
it points to.
* Event overwrite_var: Overwriting "mdi" in "mdi = mdi->devs" leaks the
storage that "mdi" points to.
* Event leaked_handle: Handle variable "lfd" going out of scope leaks
the handle.
* Event leaked_handle: Returning without closing handle "fd" leaks it.
* Event fixed_size_dest: You might overrun the 32-character fixed-size
string "devnm" by copying the return value of "fd2devnm" without
checking the length.
* Event fixed_size_dest: You might overrun the 32-character fixed-size
string "nm" by copying "nmp" without checking the length.
* Event fixed_size_dest: You might overrun the 32-character fixed-size
string "devnm" by copying the return value of "fd2devnm" without
checking the length.
* Event assigned_value: Assigning value "-1" to "tfd" here, but that
stored value is overwritten before it can be used.
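One common way to address fixed_size_dest reports like the ones above is
a bounded copy that reports truncation. This is a sketch under that
assumption and not necessarily how the commit fixes them; DEVNM_SIZE and
copy_devnm() are hypothetical names.

```c
#include <stdio.h>

#define DEVNM_SIZE 32	/* matches the 32-character buffers in the reports */

/*
 * Hypothetical helper: copy src into a fixed-size name buffer.
 * Returns 0 on success, -1 if src does not fit (dst is then truncated
 * but still NUL-terminated).
 */
static int copy_devnm(char dst[DEVNM_SIZE], const char *src)
{
	/* snprintf returns the length it wanted to write, excluding the NUL. */
	int n = snprintf(dst, DEVNM_SIZE, "%s", src);

	return (n < 0 || n >= DEVNM_SIZE) ? -1 : 0;
}
```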
mdadm/clustermd_tests: adjust test cases to support md module changes
Since kernel commit db5e653d7c9f ("md: delay choosing sync action to
md_start_sync()") delays the start of the sync action, clustermd
array sync/resync jobs can happen on any leg of the array. This
commit adjusts the test cases to follow the new kernel layer behavior.
Fixing the following coding errors the coverity tools found:
* Calling "lseek64" without checking return value. This library function may
fail and return an error code.
* Overrunning array "anchor->pad2" of 3 bytes by passing it to a function
which accesses it at byte offset 398 using argument "399UL".
* Event leaked_storage: Variable "sra" going out of scope leaks the storage
it points to.
* Event leaked_storage: Variable "super" going out of scope leaks the storage
it points to.
* Event leaked_handle: Handle variable "dfd" going out of scope leaks the
handle.
* Event leaked_storage: Variable "dl1" going out of scope leaks the storage
it points to
* Event leaked_handle: Handle variable "cfd" going out of scope leaks the
handle.
* Variable "avail" going out of scope leaks the storage it points to.
* Passing unterminated string "super->anchor.revision" to "fprintf", which
expects a null-terminated string.
* You might overrun the 32-character fixed-size string "st->container_devnm"
by copying the return value of "fd2devnm" without checking the length.
* Event fixed_size_dest: You might overrun the 33-character fixed-size string
"dev->name" by copying "(*d).devname" without checking the length.
* Event uninit_use_in_call: Using uninitialized value "info.array.raid_disks"
when calling "getinfo_super_ddf"
V2: clean up validate_geometry_ddf() routine with Mariusz Tkaczyk recommendations.
V3: clean up spaces with Blazej Kucman recommendations.
V4: clean up recommended by Mariusz Tkaczyk.
V5: clean up recommended by Mariusz Tkaczyk.
* Event negative_returns: "fd", which is initially set to -1, is passed to a
parameter that cannot be negative.
* Event open_fn: Returning handle opened by "open_dev_excl".
* Event var_assign: Assigning: "container_fd" = handle returned from
"open_dev_excl(st->container_devnm)"
* Event leaked_handle: Handle variable "container_fd" going out of scope leaks the handle
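For the unchecked seek report in the list above, a minimal sketch of a
checked call follows. It uses lseek() with off_t rather than lseek64(),
and seek_checked() is a hypothetical name, so treat it as an illustration
rather than the change made in the commit.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: seek to an absolute offset and report failure
 * instead of ignoring the return value.
 * Returns 0 on success, -1 on error (errno is set by lseek).
 */
static int seek_checked(int fd, off_t offset)
{
	if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
		return -1;
	return 0;
}
```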
CI: use prepared checkpatch.conf file only for GH actions
Configuration file .checkpatch.conf is working properly only with
GH actions, because flags from GH plugin are used there. This file
shall not be placed in main repo directory, because it causes errors
while using checkpatch from Linux. Add step to review.yml to copy
this file before checkpatch action is started.
mdadm: Fix socket connection failure when mdmon runs in foreground mode.
While creating an IMSM RAID, mdadm will wait for the mdmon main process
to finish if mdmon runs in forking mode. This is because with
"Type=forking" in the mdmon service unit file, "systemctl start service"
will block until the main process of mdmon exits. At that moment, mdmon
has already created the socket, so the subsequent socket connect from
mdadm will succeed.
However, when mdmon runs in foreground mode (without "Type=forking" in
the service unit file), "systemctl start service" will return once the
mdmon process starts. This causes mdadm and mdmon to run in parallel,
which may lead to a socket connection failure since mdmon has not yet
initialized the socket when mdadm tries to connect. If the next
instruction/command is to access this device and try to write to it, a
permission error will occur since mdmon has not yet set the array to RW
mode.
Kinga Stefaniuk [Tue, 25 Jun 2024 08:48:33 +0000 (10:48 +0200)]
CI: fix excluded files in checkpatch.conf
--exclude flag in checkpatch.conf is configured to work on directories
only. When checkpatch.conf contains files, checkpatch scan is not started.
Remove file names and keep only directories which should be excluded.
Nigel Croxon [Tue, 25 Jun 2024 11:57:28 +0000 (07:57 -0400)]
mdadm: Assemble.c fix coverity issues
Fixing the following coding errors the coverity tools found:
* Event dereference: Dereferencing "pre_exist", which is known to be "NULL".
* Event parameter_hidden: Declaration hides parameter "c".
* Event leaked_storage: Variable "pre_exist" going out of scope leaks the
storage it points to.
* Event leaked_storage: Variable "avail" going out of scope leaks the
storage it points to.
connect_monitor() is called from ping_monitor() but this function is often
used as advice, without verification that mdmon is really working. This
produces hangs in many scenarios.
Gwendal Grignou [Wed, 15 May 2024 21:30:59 +0000 (14:30 -0700)]
Makefile: Do not call gcc directly
When mdadm is compiled with clang, calling gcc directly will fail.
Make sure to use the $(CC) variable instead.
Note that Clang does not support --help=warnings,
--print-diagnostic-options should be used instead.
So with Clang, the compilation will go through, but the
extra warning flags will never be added.
Mateusz Kusiak [Fri, 15 Mar 2024 20:03:09 +0000 (16:03 -0400)]
test: pass flags to services
Commit 4c12714d1ca0 ("test: run tests on system level mdadm") removed
the MDADM_NO_SYSTEMCTL flag from the test suite. This causes imsm tests
to fail, as mdadm no longer triggers mdmon and the flags exist only
within the session.
Use systemd set/unset-environment to pass the necessary flags.
Introduce colors to grab the user's attention to warnings and key messages.
Make the test suite set up the systemd environment.
Add setup/clean_systemd_env() functions.
Warn the user about altering the systemd environment.
Logan Gunthorpe [Tue, 4 Jun 2024 16:38:36 +0000 (10:38 -0600)]
mdadm: Fix hang race condition in wait_for_zero_forks()
Running a create operation with --write-zeros can randomly hang
forever waiting for child processes. This happens roughly once in
ten runs when using small (20MB) loop devices.
The bug is caused by the fact that signals can be coalesced into
one if they are not read by signalfd quickly enough. So if two children
finish at exactly the same time, only one SIGCHLD will be received
by the parent.
To fix this, wait on all processes with WNOHANG every time a SIGCHLD
is received and exit when all processes have been waited on.
Kinga Stefaniuk [Tue, 11 Jun 2024 05:58:49 +0000 (07:58 +0200)]
imsm: make freesize required to volume autolayout
Autolayout_imsm() shall be executed when IMSM_NO_PLATFORM=1 is set.
The listed commit fixed that by removing the super->orom check, but it
also removed the freesize check. Freesize is not set for operations on
a RAID volume with no size update, so autolayout_imsm() must not run
unconditionally.
Fix it by making autolayout_imsm() dependent on freesize.
Fixes: 46f192 ("imsm: fix first volume autolayout with IMSM_NO_PLATFORM")
Signed-off-by: Kinga Stefaniuk <kinga.stefaniuk@intel.com>
Mariusz Tkaczyk [Thu, 23 May 2024 10:06:36 +0000 (12:06 +0200)]
imsm: fix first volume autolayout with IMSM_NO_PLATFORM
Autolayout_imsm() is not executed if IMSM_NO_PLATFORM=1 is set.
As a result, the first volume cannot be created because disks for the
new volume are never configured.
Fix it by making autolayout_imsm() independent of super->orom, because
NULL there means that IMSM_NO_PLATFORM=1 is set. There are no platform
restrictions on creating a volume, we just analyze drives. It is safe.
Fixes: 6d4d9ab295de ("imsm: use same slot across container")
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Xiao Ni [Tue, 28 May 2024 13:51:47 +0000 (21:51 +0800)]
mdadm/tests: bitmap cases enhance
The test sometimes fails because the number of dirty bitmap bits is
smaller than 400. Comparing the dirty bits with a fixed number is
unreliable: depending on the test machine, the bitmap can be flushed
before the number is checked. So remove the related code.
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Xiao Ni [Wed, 22 May 2024 08:50:55 +0000 (16:50 +0800)]
mdadm/tests: 07changelevelintr
A power-of-2 array size must be specified when updating the array size.
Otherwise, the chunk size cannot be changed.
Also, the test sometimes reports that the reshape did not happen when in
fact it has already finished. There is no need to wait before checking
the reshape action, because the check function waits itself.
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Xiao Ni [Wed, 22 May 2024 08:50:53 +0000 (16:50 +0800)]
mdadm/tests: 07autoassemble
This test covers auto assembly of stacked arrays.
There are two cases, depending on whether the array is foreign or not.
If the array is foreign, the stacked array (md0 on top of md1 and md2)
can't be assembled with the name md0, because the udev rule runs when md1
and md2 are assembled and mdadm -I doesn't specify homehost. So it
treats the stacked array (md0) as a foreign array and chooses md127 as
the device node name (/dev/md127).
Add the case where the stacked array is local.
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Xiao Ni [Wed, 22 May 2024 08:50:44 +0000 (16:50 +0800)]
mdadm/tests: 03assem-incr enhance
It fails when the hostname length is > 32, because the super1 metadata
name doesn't include the hostname in that case. mdadm then treats the
array as foreign if no device link is specified when assembling the
array, and chooses a minor number starting from 127.
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Xiao Ni [Wed, 22 May 2024 08:50:43 +0000 (16:50 +0800)]
mdadm/tests: names_template enhance
For super1, if the length of the hostname is >= 32, the hostname is not
added to the metadata name. Fix this problem by checking the length of
the hostname. Because other cases may need this check, do it in
do_setup.
This patch also adds a check that the link /dev/md/name exists.
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Xiao Ni [Wed, 22 May 2024 08:50:41 +0000 (16:50 +0800)]
mdadm/tests: test enhance
There are two changes.
First, if the md module is not loaded, reading speed_limit_max gives an
error. So read the value after loading the md module, which
is done in do_setup.
Second, the test sometimes reports that the sync action did not
happen although dmesg shows that it completed. So limit the sync
speed before the test. This doesn't affect the test run time, because
check wait sets the max speed before waiting for the sync action.
Also record speed_limit_max/min in do_setup.
Fixes: 4c12714d1ca0 ('test: run tests on system level mdadm')
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Xiao Ni [Wed, 22 May 2024 08:50:39 +0000 (16:50 +0800)]
mdadm: Start update_opt from 0
Before f2e8393bd722 ('Manage&Incremental: code refactor, string to enum'),
NULL represented that no update was needed. So initialize UOPT_UNDEFINED
to 0. This problem was found by test case 05r6tor0.
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
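A sketch of why starting the enum at 0 matters: a zero-initialized
structure then means "no update", as NULL did before the refactor. Only
UOPT_UNDEFINED comes from the commit; the other enum values, struct
example_context and update_requested() are hypothetical.

```c
/* Hypothetical subset of the enum; only UOPT_UNDEFINED is from the commit. */
enum update_opt {
	UOPT_UNDEFINED = 0,	/* "no update requested", like the old NULL */
	UOPT_EXAMPLE_A,		/* hypothetical value */
	UOPT_EXAMPLE_B,		/* hypothetical value */
};

/* Hypothetical context struct holding the requested update. */
struct example_context {
	enum update_opt update;
};

/* Returns 1 when an update was requested, 0 otherwise. */
static int update_requested(const struct example_context *c)
{
	return c->update != UOPT_UNDEFINED;
}
```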
Valery Ushakov [Wed, 22 May 2024 14:07:38 +0000 (17:07 +0300)]
Makefile: fix make -s detection
Only check the first word of MAKEFLAGS for 's', that's where all the
single letter options are collected.
MAKEFLAGS contains _all_ make flags, so if any command line argument
contains the letter 's', the silent test gives a false positive. Think
e.g. make 'DESTDIR=.../aports/main/mdadm/pkg/mdadm' install
Mariusz Tkaczyk [Fri, 29 Mar 2024 14:21:54 +0000 (15:21 +0100)]
mdadm: deprecate bitmap custom file
This option has been deprecated in kernel by Christoph in commit 0ae1c9d38426 ("md: deprecate bitmap file support"). Do the same in
mdadm.
With this change, the user must acknowledge the deprecation; the prompt
cannot be skipped. The implementation of custom bitmap file looks
abandoned. It cannot be handled by Incremental, so it is not respected by
any udev based system, and it does not seem to be recorded in metadata.
The user must assemble such a volume manually.
Tests for custom bitmap files are removed because they can no longer
pass now that interaction with the user is mandatory.
Nigel Croxon [Wed, 22 May 2024 20:53:22 +0000 (16:53 -0400)]
mdadm: super-intel fix bad shift
In the expression "1 << i", left shifting by more than 31 bits has undefined behavior.
The shift amount, "i", is as much as 63. The operand has type "int" (32 bits) and will
be shifted as an "int". The fix is to change to a 64 bit int.
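A minimal sketch of the 64-bit form of the shift. bit64() is a
hypothetical name, and uint64_t is used here as a stand-in for whatever
64-bit type the actual fix uses.

```c
#include <stdint.h>

/* Build a mask with bit i set; valid for i in 0..63. */
static uint64_t bit64(unsigned int i)
{
	/* 1ULL is at least 64 bits wide, so shifts up to 63 are defined. */
	return 1ULL << i;
}
```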
Nigel Croxon [Wed, 22 May 2024 20:05:25 +0000 (16:05 -0400)]
mdadm: super-intel remove dead code
Execution cannot reach this statement: "while (devlist) { dv = de...".
Local variable "err" is assigned only once, to a constant value,
making it effectively constant throughout its scope.
Remove dead code.
Blazej Kucman [Wed, 15 May 2024 11:26:28 +0000 (13:26 +0200)]
mdadm: Fix compilation for 32-bit arch
Casting void pointer to __u64 works for 64-bit arch but fails to compile
on 32-bit arch like i686.
Fail on i686 platform:
drive_encryption.c: In function ‘nvme_security_recv_ioctl’:
drive_encryption.c:236:25: error: cast from pointer to integer of
different size [-Werror=pointer-to-int-cast]
236 | nvme_cmd.addr = (__u64)response_buffer;
| ^
drive_encryption.c: In function ‘nvme_identify_ioctl’:
drive_encryption.c:271:25: error: cast from pointer to integer of
different size [-Werror=pointer-to-int-cast]
271 | nvme_cmd.addr = (__u64)response_buffer;
| ^
cc1: all warnings being treated as errors
make: *** [Makefile:211: drive_encryption.o] Error 1
This change first casts the void pointer to uintptr_t to ensure that
the proper pointer size is used when converting from a pointer type. It
is then safe to cast the result to __u64 because it is treated as an
unsigned integer, regardless of whether the arch is 32-bit or 64-bit.
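The double cast can be sketched as below. ptr_to_u64() is a hypothetical
helper, and uint64_t stands in for __u64 so that the example does not
depend on kernel headers.

```c
#include <stdint.h>

/* Convert a pointer to a 64-bit integer field, e.g. an ioctl address. */
static uint64_t ptr_to_u64(const void *p)
{
	/* uintptr_t matches the pointer width; widening to 64 bits is then safe. */
	return (uint64_t)(uintptr_t)p;
}
```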
Introduce review.yml used by GitHub actions. Add make probe, checkpatch
and hardening-check on every pull request.
Add a dependabot.yml file which checks for updates of the actions used
in this repository. It automatically files a new PR with the action
updated to the latest version.
Kinga Stefaniuk [Tue, 7 May 2024 03:38:56 +0000 (05:38 +0200)]
Wait for mdmon when it is started via systemd
When mdmon is being started, it may need a few seconds to start.
Until now, we did not wait for it. Introduce the wait_for_mdmon()
function, which waits up to 5 seconds for mdmon to start completely.
Move -pie from LDLIBS to LDFLAGS and make LDFLAGS configurable to allow
the user to drop it by setting their own LDFLAGS (e.g. PIE could be
enabled or disabled by the buildsystem such as buildroot).
Xiao Ni [Thu, 18 Apr 2024 10:23:19 +0000 (18:23 +0800)]
tests/01r5fail enhance
After removing dev0, recovery starts because the array already has a
spare disk. It's good to check recovery at that point. But it's not
right to check recovery after adding dev3, because the recovery may
already have finished, depending on the recovery performance of the
testing machine. If it has finished, the test fails. And since dev3 is
only added as a spare disk, we can't expect a recovery to happen.
So remove the code that adds dev3.
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
VROC UEFI driver does not support RAID 10 with more than 4 drives.
Add user prompts if such layout is being created and for R0->R10
reshapes.
Refactor ask() function:
- simplify the code,
- remove dialog reattempts,
- do not pass '?' sign on function calls,
- highlight default option on output.
This patch completes adding support for R10D4+ to IMSM.
IMSM version 1.3 (called ATTRIBS) introduced attributes that define array
properties which require driver support. The goal of that change was
to avoid changing the version when adding new features.
For some reason the migration was never completed and currently (10
years after implementation) IMSM can still use older versions.
It is the right time to finally switch. There is no point in using old
versions, so use 1.3.00 as the minimal one.
Add imsm_level_ops struct for better handling and unifying raid level
support. Add helper methods and move "orom_has_raid[...]" methods from
header to source file.
RAID 1e is not supported under Linux, remove RAID 1e associated code.
Refactor imsm_analyze_change() and is_raid_level_supported().
Remove hardcoded check for 4 drives and make devNumChange a multiplier
for RAID 10.
For now, IMSM supports only 4 drive RAID 1+0. This patch is the first in
a series to add support for literal RAID 10 (with more than 4 drives) to
imsm.
Allow setting RAID 10 as raid level for imsm arrays.
Add update_imsm_raid_level() to handle raid level updates. Set RAID10 as
default level for imsm R0 to R10 migrations. Replace magic numbers with
defined values for RAID level checks/assigns.
Define FALLOC_FL_ZERO_RANGE if needed as FALLOC_FL_ZERO_RANGE is only
defined for aarch64 on uclibc-ng resulting in the following or1k build
failure since commit 577fd10486d8d1472a6b559066f344ac30a3a391:
Create.c: In function 'write_zeroes_fork':
Create.c:155:35: error: 'FALLOC_FL_ZERO_RANGE' undeclared (first use in this function)
155 | if (fallocate(fd, FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE,
| ^~~~~~~~~~~~~~~~~~~~
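The fallback described above can be sketched as a guarded define. The
value 0x10 is the one used by linux/falloc.h; the placement shown here is
an assumption, not a copy of the patch.

```c
#define _GNU_SOURCE
#include <fcntl.h>

/* Some libc headers omit this flag; 0x10 is the value in linux/falloc.h. */
#ifndef FALLOC_FL_ZERO_RANGE
#define FALLOC_FL_ZERO_RANGE 0x10
#endif
```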
Mariusz Tkaczyk [Tue, 26 Mar 2024 12:21:10 +0000 (13:21 +0100)]
mdadm: add CHANGELOG.md
Bring the changelog back to life. Remove ANNOUNCEs. The changelog uses
markdown format, to have one style. All releases are migrated to the new
changelog. It was an exercise I took on to familiarize myself with the
mdadm history.
Blazej Kucman [Fri, 22 Mar 2024 11:51:20 +0000 (12:51 +0100)]
imsm: drive encryption policy implementation
IMSM cares about drive encryption state. It is not allowed to mix disks
with different encryption states within one md device. This policy
ensures that an attempt to use disks with different encryption states
fails. Verification is performed for NVMe/SATA Opal and SATA devices.
There is one exception: Opal SATA drive encryption is not checked when
the ENCRYPTION_NO_VERIFY key with the "sata_opal" value is set in conf.
In that case such drives are treated as having no encryption support.
Blazej Kucman [Fri, 22 Mar 2024 11:51:19 +0000 (12:51 +0100)]
imsm: print disk encryption information
Print SATA/NVMe disk encryption information in --detail-platform.
Encryption Ability and Status will be printed for each disk.
There is one exception: Opal SATA drive encryption is not checked when
the ENCRYPTION_NO_VERIFY key with the "sata_opal" value is set in conf.
In that case such drives are treated as having no encryption support.
To test this feature, SATA/NVMe drives with Opal support or SATA drives
with encryption support have to be used.
Example outputs of --detail-platform:
Non Opal, encryption enabled, SATA drive:
Port0 : /dev/sdc (CVPR050600G3120LGN)
Encryption(Ability|Status): Other|Unlocked
NVMe drive without Opal support:
NVMe under VMD : /dev/nvme2n1 (PHLF737302GB1P0GGN)
Encryption(Ability|Status): None|Unencrypted
Unencrypted SATA drive with OPAL support:
- default allow_tpm, we will get an error from mdadm:
Port6 : /dev/sdi (CVTS4246015V180IGN)
mdadm: Detected SATA drive /dev/sdi with Trusted Computing support.
mdadm: Cannot verify encryption state. Requires libata.tpm_enabled=1.
mdadm: Failed to get drive encrytpion information.
- added "libata.allow_tpm=1" to boot parameters(requires reboot),
the status will be read correctly:
Port6 : /dev/sdi (CVTS4246015V180IGN)
Encryption(Ability|Status): SED|Unencrypted
Blazej Kucman [Fri, 22 Mar 2024 11:51:18 +0000 (12:51 +0100)]
Add key ENCRYPTION_NO_VERIFY to conf
Add the ENCRYPTION_NO_VERIFY config key to allow disabling the
encryption status check for a given type of drive.
The key is introduced because of SATA Opal disks, for which TPM commands
must be enabled in the libata kernel module (libata.allow_tpm=1);
otherwise it is impossible to verify the encryption status. TPM commands
are disabled by default.
Currently the key supports only the "sata_opal" value; if necessary,
the functionality is ready to support more types of disks. This
functionality will be used in the next patches.